N. Nucleic Acids – Primary Through Secondary Structure


The Subunits

Like proteins, polynucleic acids (DNA and RNA) are linear polymers of monomers that share chemical functionality but possess some structural diversity. The monomers comprising polynucleic acids are referred to as nucleotides. A nucleotide is composed of three simpler chemical groups, a nitrogenous base, a sugar and phosphate, that are combined by condensation reactions (the formal loss of water upon bond formation).

Nitrogenous Bases

The chemical diversity of nucleic acids is principally derived from the several nitrogenous bases found among them. These bases are classified as purines and pyrimidines, according to the parent heterocycle from which each is derived (Figure N.1). There are two purines that are typically incorporated in growing strands of DNA and RNA: adenine and guanine (Table N.1). They are distinguished by the “exocyclic” substituents on the purine ring: at C6 for adenine, and at both C6 and C2 for guanine. In addition, a third purine, hypoxanthine, is often used in structural studies and is found in modified RNA molecules in the cell (Table N.1). The common pyrimidines are cytosine, thymine and uracil. The latter two are chiefly incorporated into different polymers: thymine is commonly found in DNA, while uracil is commonly found in RNA. They differ in the presence (thymine) or absence (uracil) of an exocyclic methyl group at C5 of the pyrimidine ring (Table N.1). Note that modified forms of these bases may be found in both DNA and RNA. Common modifications include methylation at C5 of cytosine and at N6 of adenine.

Figure N.1 The structure of (A) purine and (B) pyrimidine. The numbering scheme for each heterocycle is shown in Table N.1.

Sugars

Biology has selected two sugars to act as part of the unvarying “backbone” of RNA and DNA. They are ribose and 2-deoxyribose (Figure N.2). Ribonucleic acids (RNA) and deoxyribonucleic acids (DNA) differ in the presence or absence of a hydroxyl group at C2 of the sugar. Ribose is a pentose, and is not isolable in the “open” form shown in Figure N.2A. The aldehyde at C1 combines with hydroxyl groups at either C4 or C5 to form furanoses or pyranoses, respectively, named for the heterocyclic compounds furan and pyran. Two anomers form for each ring conformation, in which the hydroxyl group at C1 is either directed above the ring (the β-anomer) or below the ring (the α-anomer) as shown in Figure N.2C. Nucleic acids typically use the β-furanose form of both ribose and deoxyribose. The anomeric carbon (C1) is particularly reactive via SN1 and SN2 chemistry due to the ability of the ring oxygen to stabilize partial positive charge on C1.

Figure N.2. (A) Fischer projection of D-ribose, which forms cyclic hemiacetals in either the furanose or pyranose conformations. (B) The two anomers of ribose can interconvert via substitution reactions at C1.

Phosphates

The phosphate component of nucleotides is simply derived from phosphoric acid (H3PO4). As with carboxylic acids, which form carboxylic esters, phosphoric acid may form mono-, di- and tri-esters (Figure N.3). These esters are key to the polymerization of nucleic acids, and are relatively inert to attack from external nucleophiles. The pKa of phosphodiesters at the phosphate oxygen is about 2, so these groups are typically deprotonated under physiological conditions.
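As a rough check on the claim that phosphodiesters are deprotonated under physiological conditions, the Henderson-Hasselbalch relation can be evaluated directly. This is only a sketch: the function name is hypothetical, the pKa of 2 is the approximate value quoted above, and pH 7 is an assumed stand-in for "physiological conditions".

```python
def fraction_deprotonated(pka, ph):
    """Fraction of an acid present as its conjugate base at a given pH.

    Uses the Henderson-Hasselbalch relation and assumes ideal,
    dilute-solution behavior (no activity corrections).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# pKa ~ 2 is the approximate phosphodiester value from the text;
# pH 7 is an assumed value for physiological conditions.
fraction = fraction_deprotonated(2.0, 7.0)
print(f"fraction deprotonated at pH 7: {fraction:.5f}")
```

With a gap of five pH units between the pKa and the solution pH, essentially every phosphodiester carries a negative charge, which is the point the paragraph relies on later when discussing the repulsion of nucleophiles.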

Table N.1. Nomenclature associated with bases, nucleosides and nucleotides.

*TMP assumes presence of 2-deoxy-5-phosphoribose, or more properly dTMP.


Figure N.3. (A) Formation of a carboxylic ester. (B) Formation, from left, of a mono-, di- and tri-phosphoester.

Nucleosides

Nucleosides are formed by the condensation of a nitrogenous base with ribose or deoxyribose. Purines form “glycosidic” bonds to the anomeric carbon via the nitrogen atom at position 9, while pyrimidines condense to form linkages via the nitrogen at position 1 (Figure N.4). Nucleoside numbering retains the numbering scheme of the bases and sugars, but the sugar atoms are now identified as 1’ through 5’ (the “prime” indicating a sugar atom). The names of the nucleosides are similar to those of the bases (Table N.1) and are easy to confuse with the base nomenclature, but the context of a discussion will usually make it clear which is meant.

Figure N.4. (A) Formation of adenosine, the nucleoside, from adenine and ribose (note updated numbering for ribose ring). (B) Formation of 2’-deoxycytidine, the nucleoside, from cytosine and 2-deoxyribose.

Nucleotides

Nucleosides are composed of only a base and sugar, while nucleotides are the whole tamale: the combination of base, sugar and phosphate. Phosphate esters may be formed with any of the hydroxyl groups of the nucleoside sugar, leading to terms such as 5’-adenosine monophosphate and 3’-thymidine monophosphate. In addition, polyphosphate anhydrides may be linked to nucleosides, creating nucleoside diphosphates and triphosphates (NDPs and NTPs; Figure N.5).

Figure N.5 Some nucleotides. From left: adenosine monophosphate (or 5’-adenylic acid; AMP), 3’-thymidine monophosphate and adenosine triphosphate (ATP).

Primary Structure of Nucleic Acids

Biological Synthesis

Just as proteins are condensation polymers of amino acids, polynucleotides are condensation polymers of 5’-nucleoside phosphates. Sort of. The chemistry used to form polynucleotides relies on the use of activated phosphate groups – the phosphate anhydrides of NTPs and dNTPs. Phosphodiester linkages are formed between the 3’ and 5’ hydroxyl groups of joined nucleosides. Because of the direction of synthesis in the cell, in which each nucleotide is added to the 3’ end of a growing chain, the chain’s primary structure (its sequence) is read from the 5’ to the 3’ direction. The 5’ end of the chain has a 5’ hydroxyl group or 5’ phosphate monoester, while the 3’ end has a free 3’ hydroxyl group (Figure N.6). The structural complexity of polynucleotides is generally less than that of proteins since there are only four nucleotide building blocks typically used for DNA (A, C, G, T) or RNA (A, C, G, U). However, modifications to RNA, in particular, can substantially expand the structural repertoire. A typical RNA or DNA sequence will be read: 5’-AGCACGA….-3’, though if there is some ambiguity about whether RNA or DNA is being discussed, one may see abbreviations such as: 5’-dAdU-3’ (it is possible to have deoxyuridine) or 5’-rArTrCrG-3’ (and ribothymidine). Short polynucleotides may be referred to as dinucleotides, trinucleotides… and up to oligonucleotides.


Chemical Synthesis

The chemical synthesis of DNA is quite straightforward; it is automated and is routinely performed commercially for surprisingly small amounts of money. The procedure uses solid-phase supports, removing the need for purification of intermediates, and proceeds in the 3’ to 5’ direction. Polynucleotides of up to 100 nucleotides are routinely synthesized. RNA, with an additional hydroxyl group to protect, is more difficult to synthesize chemically, and is often produced by enzymatic processes. The details of this are similar to those of solid-phase peptide synthesis. I hope to add later.

Chemical Stability of Polynucleotides

The phosphodiester linkage is thermodynamically unstable with respect to its hydrolysis products. However, as mentioned above, it is kinetically inert: the negative charge on the phosphate group generally repels nucleophiles, which by definition seek electron-poor atoms. Thus, DNA is a very stable polymer and can be recovered from remains that are hundreds of thousands of years old. RNA, on the other hand, is not stable. Although external nucleophiles do not react rapidly with the phosphate groups, there is an internal nucleophile always present and waiting. The 2’-OH group is positioned relatively well to attack the phosphoryl phosphorus atom, leading to a transesterification reaction in which a 3’,5’-phosphodiester linkage is cleaved and replaced with a 2’,3’-cyclic phosphodiester and a free 5’-OH on the leaving group (Figure N.7). The reaction is accelerated greatly by the presence of base, which may accept a proton from the 2’-OH group during the attack, so one typically handles RNA in slightly acidic solution, at about pH 6.

Figure N.7. The cleavage of a ribodinucleotide via internal attack on the phosphoryl phosphorus by the 2’-OH of the 5’ nucleotide. The products are a 2’,3’-cyclic monophosphate and a free nucleoside.

The Double Helix – 1st Pass

A Brief History1

In one of the most often told stories in biology/chemistry, Watson and Crick were able to produce a model of the double helix of DNA based on no experimental work of their own, but rather on the following points of information:

• Erwin Chargaff of Columbia University had previously shown that DNA from a number of sources contains roughly equal proportions of adenine and thymine, and of guanine and cytosine (that is, the ratios A/T and G/C are each roughly one).2
• Rosalind Franklin, working either alongside or under the supervision of Maurice Wilkins (the two of them disputed their working relationship), had developed diffraction data from DNA samples that indicated an antiparallel double helical structure for DNA. Watson & Crick infamously obtained the data through Wilkins, not Franklin.
• The famous organic chemist Alexander Todd (Nobel 1957), who had worked out the covalent structure of DNA and RNA, provided Watson & Crick with the correct tautomeric structures of the bases, which were not generally used up until that point.
• Linus Pauling had succeeded in modeling the α-helix based on crystallographic and stereochemical data, so why couldn’t Watson and Crick do the same? They did – beating Pauling to the punch (Pauling famously published an incredibly implausible model for DNA that placed the negatively charged phosphates together at the center of the helix).

Watson-Crick Base Pairs and the B Helix

The signal achievement of Watson & Crick was to identify a means by which purines and pyrimidines are able to structurally complement each other via hydrogen bonding. In this sense, their achievement closely mirrors that of Pauling in identifying the major forms of protein secondary structure. Any structure that would remove hydrogen bonding groups from solution (either on the peptide backbone or on the edges of nitrogenous bases) must provide a compensating opportunity for internal hydrogen bonding, else the structure will be an enthalpic disaster (as was Pauling’s failed model of DNA). Watson, playing around with some fabricated models of the bases, recognized that one could form complementary hydrogen bonding schemes between the two Chargaff pairs, A:T and G:C. He proposed two hydrogen bonds between each pair; the third hydrogen bond between G and C did not get added till later (Figure N.8). The notable success arising from this pairing scheme is that it provided a mechanism for the association of two strands of DNA in a regular structure. Since each purine pairs with a pyrimidine, and vice versa, the diameter of the double helix remains constant throughout the duplex. Their structure was derived from parameters that Franklin had obtained from a DNA form that she labeled “B”, hence the Watson-Crick double helix is the “B conformation” of DNA. (We’ll get to the A conformation later.)

1 Every educated person should read “The Eighth Day of Creation” by Horace Freeland Judson, which provides a reasonably unbiased review of this story. It also traces the birth of molecular biology, with special attention to the first protein structure ever solved (a heroic tale, unlike that of DNA).

2 One of the most bitter men in science, Chargaff later quipped, vis-à-vis Watson and Crick, “It is indeed late in the day when two pygmies cast such long shadows.”

Figure N.8. (A) The Watson-Crick base pairing scheme for A:T (on left) and C:G (on right). R is the deoxyribose moiety. (B) Cartoon diagram of strand complementarity in a duplex. Note that a large purine and a smaller pyrimidine match in each base pair to create a constant helix diameter.

DNA is virtually never found in nature in a single-stranded form, but rather as an antiparallel duplex that adopts a right-handed helical structure. One strand complements the other while running in the opposite direction (Figure N.8B), and both turn about a central helical axis. The interior of the double helix is composed of stacked, hydrogen-bonded bases. The solvent accessible portions of the double helix are composed of the phosphodeoxyribose “backbone” and the base pair edges (Figure N.9).
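The antiparallel, complementary relationship between the two strands can be made concrete with a few lines of Python. This is a minimal sketch, not a bioinformatics tool: the function names are hypothetical, only the four standard DNA bases are handled, and the example sequence is the illustrative one used earlier in these notes. It also shows why Chargaff's ratios fall out automatically for any Watson-Crick duplex.

```python
from collections import Counter

# Watson-Crick partners for the four standard DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand_5to3):
    """Return the complementary strand, also written 5'->3'.

    Because the strands are antiparallel, the partner is the
    complement of the input read in reverse order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand_5to3))


def duplex_composition(strand_5to3):
    """Count each base over both strands of the duplex."""
    return Counter(strand_5to3 + partner_strand(strand_5to3))


top = "AGCACGA"                      # written 5'->3'
bottom = partner_strand(top)         # partner, also written 5'->3'
counts = duplex_composition(top)

print(top, bottom)
print("A/T =", counts["A"] / counts["T"], " G/C =", counts["G"] / counts["C"])
```

Every A on one strand is matched by a T on the other, and every G by a C, so both ratios are exactly one for an ideal duplex; Chargaff's "roughly one" reflects experimental measurement on natural samples.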


Figure N.9. The B conformation of DNA. One strand is poly•dA (running 5’-3’, left to right), while the other is poly•dT (running 3’-5’, left to right). In this image the bases are shown as space-filling spheres reflecting vdW radii. Base carbon atoms are shown in yellow, except the exocyclic methyl group of thymine, which is purple. The phosphodeoxyribose backbone is shown with grey carbon atoms for the poly•dT strand and white carbon atoms for the poly•dA strand.

Stabilization of the Double Helix

Hydrogen Bonding or Not

It is relatively simple to monitor the stability of the DNA duplex using UV spectroscopy. The bases absorb UV light at 260 nm, but their molar absorptivity is substantially lower when they are stacked in the DNA duplex than when they are free in single strands. Thus, DNA denaturation to single strands is accompanied by a sharp increase in absorbance at 260 nm (this is the “hyperchromicity” of DNA). Thermal denaturation studies are routinely performed by raising the temperature of a DNA sample and observing the change in duplex content by following the increase in absorbance. From work with natural samples of duplex DNA, it has long been known that the melting temperature of DNA (the temperature at which it is 50% denatured) varies roughly linearly with the percent GC base pair content. The greater the GC content, the higher the melting temperature. Since DNA denaturation is clearly entropically favorable and is evidently enthalpically unfavorable (it is an endothermic process), the classic temptation has been to argue that “GC base pairs hold DNA together more tightly because they have three base H-bonds instead of two for AT base pairs.” This is easy to remember and is wrong. As with proteins, hydrogen bonding is not a particularly effective way of providing enthalpic stabilization of structure in water. H-bonds formed within the ordered structure are simply substituting for hydrogen bonds that could be made with water molecules in solution. Instead, it is better to understand hydrogen bonds in DNA as compensating for the loss of H-bonds to solvent and providing a mechanism for structural complementarity between strands. The evidence for this comes from nearest neighbor analyses of the stability of base pairs. It is hard to predict the stability of a single base pair without knowing its nearest neighbor (see Table N.2). In an analysis of the 10 possible duplexes formed by complementary dinucleotides, it has been shown that, in some circumstances, a duplex with an AT base pair is more enthalpically stable than a duplex with only GC base pairs. However, with entropic terms thrown in, the duplexes that contain only G and C are more stable than the others. While this is seemingly consistent with the hydrogen bonding scheme, something more subtle is going on.

Table N.2. Thermodynamic data for the contributions made by base pair doublets to duplex melting. Each sequence can be seen as two dinucleotides that complement each other. For example, AT/TA is 5’-AT-3’/3’-TA-5’.
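The count of 10 distinct doublets follows from the antiparallel pairing rule: there are 16 possible dinucleotides, but a doublet and its reverse complement describe the same duplex viewed from the other strand. The sketch below enumerates them using the top(5’-3’)/bottom(3’-5’) notation of the caption. The function names are hypothetical, and the choice of which member of each equivalent pair to print (the alphabetically earlier one) is an arbitrary assumption, so the labels may differ from those used in a published table.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def unique_doublets():
    """List the distinct nearest-neighbor doublets of duplex DNA.

    Each entry is written top(5'->3')/bottom(3'->5'), so the bottom
    half is the position-by-position complement of the top half.
    """
    seen = set()
    doublets = []
    for x, y in product("ACGT", repeat=2):
        top = x + y
        # The same duplex read from the other strand.
        key = min(top, reverse_complement(top))
        if key not in seen:
            seen.add(key)
            bottom_3to5 = "".join(COMPLEMENT[b] for b in key)
            doublets.append(f"{key}/{bottom_3to5}")
    return doublets


for d in unique_doublets():
    print(d)
print(len(unique_doublets()), "distinct doublets")
```

Four of the doublets (AT/TA, TA/AT, CG/GC and GC/CG) are their own reverse complements, and the remaining 12 dinucleotides collapse into 6 equivalent pairs, giving 4 + 6 = 10.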

Base Stacking

The importance of base stacking can be seen in solution studies of the bases. The bases do not hydrogen bond spontaneously with each other in aqueous solution, though an A:T base pair will form in deuterochloroform with an association equilibrium constant of 100 M⁻¹. However, simple dinucleotides, such as 5’-AA-3’, will adopt a stacked structure in aqueous solution without the complementary sequence (TT; Figure N.11), despite the entropic cost of fixing the molecule to a single conformation.

Enthalpically, this association into a stacked conformation is favored by a ΔH°stacking of -8.5 kcal/mol. Against that, the ΔS°stacking is -28.5 cal/mol•K. In methanol, the entropic term becomes even less favorable and the base stacking comes undone. The argument for this observation is that the hydrophobic effect helps stabilize the stacked conformation in water: the observed entropy of stacking includes a favorable solvent entropy that masks a much less favorable entropy for the dinucleotide itself. Thus the base stacking is favored by both an enthalpic contribution, presumably driven by vdW forces, and a “not-so-bad” entropic term that is mitigated by the hydrophobic effect.
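To see how closely the enthalpic and entropic terms are balanced, the two values quoted above can be combined through ΔG° = ΔH° - TΔS°. The sketch below is only an illustration: the function names are hypothetical, the temperatures are assumed (the notes do not state the temperature at which the values were measured), and ΔH° and ΔS° are treated as temperature independent.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def stacking_free_energy(delta_h_kcal, delta_s_cal, temp_k):
    """Delta G in kcal/mol from Delta H (kcal/mol) and Delta S (cal/mol K)."""
    return delta_h_kcal - temp_k * delta_s_cal / 1000.0


def equilibrium_constant(delta_g_kcal, temp_k):
    """Stacked/unstacked equilibrium constant for a given Delta G."""
    return math.exp(-delta_g_kcal / (R_KCAL * temp_k))


DELTA_H = -8.5    # kcal/mol, from the text
DELTA_S = -28.5   # cal/(mol K), from the text

# Assumed temperatures: 0, 25 and 37 degrees Celsius.
for temp_k in (273.15, 298.15, 310.15):
    dg = stacking_free_energy(DELTA_H, DELTA_S, temp_k)
    k = equilibrium_constant(dg, temp_k)
    print(f"T = {temp_k:.2f} K  dG = {dg:+.3f} kcal/mol  K = {k:.2f}")
```

Under these assumptions the two terms nearly cancel around room temperature, so stacking in an isolated dinucleotide is a marginal equilibrium that shifts toward the stacked form on cooling. That is consistent with the picture of a sizeable enthalpic gain offset by a "not-so-bad" entropic cost.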


Figure N.11. Diagram of an AA base stacked conformation. At left is a top view showing overlap between the adenine bases, while at right the side view is shown, with the planes of the bases separated by 3.4 Å, the combined vdW radii of the atoms in each plane.

A study of stacking in the context of duplex DNA was conducted by Eric Kool’s lab.3 Using a core hexanucleotide, CGCGCG, stacking was studied by adding an additional base to the 5’ end. Thus, a duplex would form with an overhanging nucleotide, “X” (Scheme N.1):

Scheme N.1
5’-XCGCGCG-3’
3’- GCGCGCX-5’

By comparing the thermal melting behavior of X-substituted duplexes to the unsubstituted oligonucleotide duplex, the contribution made by “X” to stacking can be evaluated (Table N.3). As can be seen, purines contribute more to the stability of the duplex than pyrimidines in this context. More interestingly, an “artificial base”, benzene, was attached to deoxyribose and compared to the other bases. It too can stabilize the duplex enthalpically and overall in free energy (the entropic contribution vs. the unmodified nucleotide can’t be compared because the added size leads to an additional entropic benefit to melting).

Table N.3. Thermodynamics of melting for X-substituted oligonucleotide duplexes.

3 Guckian et al. (2000) Factors Contributing to Aromatic Stacking in Water. Evaluation in the Context of DNA. J. Am. Chem. Soc. 122, 2213-2222.

As a final testament to the power of things that aren’t hydrogen bonding in stabilizing the DNA double helix, the Kool lab (yet again) created a wildly non-native “base pair” in which one member of the pair has no base at all (φ) while the second “base” is a polyaromatic hydrocarbon, pyrene (P). Obviously there are some issues with the geometric fit of such a pair in the context of purine-pyrimidine pairing, but the combination is instructive. Replacing thymine with φ renders the duplex less stable by 5.6 kcal/mol, not surprisingly: adenine loses a hydrogen bonding partner and the bases flanking the lost thymine lose stacking interactions. However, when the adenine is then replaced with P, all but 0.5 kcal/mol of the stabilization is restored (Table N.3).

Table N.3. Data for an unusual base pair, P:φ (shown at lower left).

Conformational Flexibility in DNA

A and B Conformations

As noted above, Rosalind Franklin had succeeded in obtaining two different types of diffraction images from DNA fibers sealed in capillary tubes. One, the B form, was obtained from DNA fibers held under conditions of high relative humidity. This was the form modeled by Watson and Crick in 1953. The second, the A form, was obtained at lower humidity and with a high salt concentration. The conformation of A-DNA is strikingly different than that of B-DNA, yielding two (of many) secondary structures for DNA.4 Both A- and B-DNA duplexes exist in right-handed helices with complementary Watson-Crick base pairing. However, while B-DNA has a graceful, extended appearance, A-DNA appears more collapsed and somewhat deformed (Figure N.12).

4 The third, famous, conformation of DNA is Z-DNA which appears as a left-handed helix. I won’t cover it in these notes, but it’s a funny looking bit of double helix, that’s for sure.


Figure N.12. Idealized models of B-DNA (left) and A-DNA (right). Both are constructed from 20mers of polyA complexed with polyT. The major groove of each duplex can be identified by the magenta color of the thymine methyl groups.

To understand the conformational differences between A- and B-DNA we need to introduce a means of describing two different structural regions of duplex DNA – the major and minor grooves (Figure N.13). The phosphosugar backbones of the two strands in the duplex can be envisioned as two ridges that trace the helical path around a cylinder. But, because the placement of the backbone atoms with respect to the cylinder is not symmetric, the grooves that form between the ridges are unequal. The major groove is composed of the “tops” of the base pairs as drawn in Figure N.8A above. The N7, C5, and C6 atoms of purines (and substituents on those atoms) and the C4 and C5 atoms of pyrimidines (with their substituents) face the major groove, while N2 and N3 of purines and O2 of pyrimidines line the minor groove. In Figure N.12, it can be seen that the major groove in B-DNA is quite accessible and open, while the minor groove is hidden from solution. The opposite is true in A-DNA, with its accessible minor groove substituents and inaccessible major groove.

Figure N.13. Diagram showing atoms facing the major and minor groove. For comparison, a line drawing of a T:A base pair in the same orientation is shown at right as well.

The measured differences between the two conformations are shown in Figure N.12. Most noticeable are the differing rise/base pair figures. A-DNA is somewhat compressed and unwound in comparison to B-DNA. Canonical B-DNA has 10 base pairs per turn, while A-DNA has 11. The unwinding allows the bases to shift over each other, allowing more extensive base stacking interactions, and that in turn tilts the base pairs with respect to the helical axis.

Aside from the shifting of base pairs and the unwinding of the helix, the chief conformational difference between A- and B-DNA is in the conformation of the phosphodeoxyribose backbone. In five-membered rings consisting of sp3 hybridized atoms, one atom of the ring must lie outside the plane of the four others. In DNA, it is usually the 2’ or 3’ carbon that is displaced from the ring. A displacement toward the base is termed an “endo-pucker” while displacement away from the base is an “exo-pucker”. Based on similar shapes, the four possible displacements are categorized as Northern puckers (2’-exo or 3’-endo) and Southern puckers (2’-endo or 3’-exo) because the ring seems to trace out the letter “N” or “S” accordingly (Figure N.14).
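The "unwinding" of A-DNA relative to B-DNA can be put in numbers from the helical repeats quoted above, since the average twist per base pair is simply 360° divided by the number of base pairs per turn. The following is a rough sketch with hypothetical function names. The 3.4 Å rise used for B-DNA is taken from the base-stacking separation quoted earlier in these notes and is assumed here to equal the rise per base pair; no rise is assumed for A-DNA because the notes give that value only in a figure.

```python
def twist_per_base_pair(bp_per_turn):
    """Average helical twist (degrees) between adjacent base pairs."""
    return 360.0 / bp_per_turn


def helical_pitch(bp_per_turn, rise_per_bp):
    """Length of one full helical turn, in the units of rise_per_bp."""
    return bp_per_turn * rise_per_bp


B_BP_PER_TURN = 10   # canonical B-DNA, from the text
A_BP_PER_TURN = 11   # A-DNA, from the text
B_RISE = 3.4         # angstroms; assumed equal to the stacking distance

print(f"B-DNA twist: {twist_per_base_pair(B_BP_PER_TURN):.1f} degrees/bp")
print(f"A-DNA twist: {twist_per_base_pair(A_BP_PER_TURN):.1f} degrees/bp")
print(f"B-DNA pitch: {helical_pitch(B_BP_PER_TURN, B_RISE):.1f} angstroms/turn")
```

Going from 10 to 11 base pairs per turn lowers the twist by only a few degrees per step, which is why A-DNA is described as unwound rather than as a different kind of helix.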


Figure N.14. Two puckers of deoxyribose. (A) A Northern pucker. This is either 3’-endo or 2’-exo (I can’t draw well enough to distinguish). (B) A Southern pucker. Either 3’-exo or 2’-endo.

The Effect of Environment on Conformation

As mentioned above, Franklin’s original observations on the A- and B-DNA conformations resulted from different environmental conditions associated with sample preparation. A-DNA dominated with relatively dry DNA fibers under high salt concentrations, while the B-form arose under conditions of relatively high humidity and low salt. It has been shown that the relative likelihood of finding a sample in the A- or B-form also depends on sequence, with AT-rich sequences preferring the B-conformation and GC-rich DNA more readily adopting the A-conformation. The combined effect of humidity and sequence is mapped out in Figure N.15. Note that all DNA sequences prefer the B-conformation at high humidity, but as the sample is gradually dried out, the A-conformation begins to appear, and at the lowest levels of humidity only GC-rich DNA is able to sustain a non-disordered conformation. Two issues are at work: (1) low humidity favors the A-conformation, and (2) high GC content favors A-DNA. Why?

Figure N.15. Relative preference of DNA preparations of differing GC content for the A- and B- conformations at differing humidities.

Saenger, Hunter & Kennard published a review of DNA crystal structures that investigated the “economics” of phosphate hydration.5 The hypothesis of this study is that the A-conformation is favored at low humidity because of the difficulty in hydrating phosphates fully under those circumstances. In B-DNA, it was observed that each phosphate group in the backbone is hydrated by, on average, seven water molecules. However, as the sample is dried out, the phosphates must learn to economize and, at some threshold, share the waters of hydration. This is achieved in A-DNA because the phosphate groups are closer to one another. In B-DNA, thanks to the 3’-exo pucker, phosphates are about 7 Å apart, but in A-DNA, the 3’-endo/2’-exo puckers draw the phosphates closer together, to about 5.4 Å. At that distance, two phosphates may be bridged by a single water molecule that hydrogen bonds to each phosphate, while at a 7 Å separation, each phosphate must obtain H-bonds from waters that interact with only one phosphate at a time (Figure N.16).

Figure N.16. At left, the phosphate separation is 7 Å in the B-conformation. Note the “Southern” pucker of the sugars. At right, the A-conformation places the phosphates 5.4 Å apart. Note the “Northern” pucker of the sugars. Note also that this is RNA, not DNA (there is a 2’ hydroxyl group). We’ll return to that point later.

As for the sequence dependence of the A- vs. B-conformations, it has been observed that in B-DNA, AT base pairs are particularly well hydrated. A so-called spine of hydration has been observed in the minor groove of AT-rich regions of B-DNA. Water molecules are fit snugly into the minor groove, occupying a region of space that would be occupied by the exocyclic C2-amino group of guanine, thus providing the C2 carbonyl oxygen of thymine and N3 of adenine with H-bonding partners (Figure N.17). The conformational change to A-DNA disrupts the geometry that supports the “spine of hydration” and the AT base pairs lose a significant source of enthalpic stabilization.

5 Saenger, W., Hunter, W. N., and Kennard, O. (1986) DNA Conformation is Determined by Economics in the Hydration of Phosphate Groups. Nature 324, 385-388.


Figure N.17. The spine of hydration in the minor groove of an AT-rich section of B-DNA. Note that the three water molecules (red spheres) act as H-bond donors to N3 atoms on adenine and O2 atoms from thymine. The N2 group of guanine would create steric repulsion and displace those waters.

RNA and the A-Conformation

Although RNA is not constrained functionally to the double helical conformation to the degree that DNA is, double stranded RNA is found in short stretches, and it uniformly adopts the A-conformation. The reasons for this are principally steric. The 3’-exo sugar conformation that is found in the B-conformation places the 2’ and 3’ hydroxyl groups of the ribose ring into an eclipsed conformation. Furthermore, the 2’ hydroxyl group makes significant steric clashes with adjacent atom groups in the B-conformation. Because of this, the major groove of RNA is considerably less accessible to other solutes than is the major groove of DNA, which has implications for how other molecules interact with each of these two species.

Figure N.18. B-RNA on left, and A-RNA on right. Note eclipsing interaction in B-RNA and clashes (with gray dashes) formed by 2’ hydroxyl group.
