

Atomic Structures of the Molecular Components in DNA and RNA based on Bond Lengths as Sums of Atomic Radii

Raji Heyrovská

(Institute of Biophysics, Academy of Sciences of the Czech Republic)

E-mail: [email protected]

Abstract

The author’s interpretation in recent years of bond lengths as sums of the relevant atomic/ionic radii has been extended here to the bonds in the skeletal structures of adenine, guanine, thymine, cytosine, uracil, ribose, deoxyribose and phosphoric acid. On examining the bond length data in the literature, it has been found that the averages of the bond lengths are close to the sums of the corresponding atomic covalent radii of carbon (tetravalent), nitrogen (trivalent), oxygen (divalent), hydrogen (monovalent) and phosphorus (pentavalent). Thus, the conventional molecular structures have been resolved here (for the first time) into probable atomic structures.

1. Introduction

The interpretation of the bond lengths in the molecules constituting DNA and RNA has been of interest ever since the discovery [1] of the molecular structure of nucleic acids. This work was motivated by the earlier findings [2a,b] that the inter-atomic distance between any two atoms or ions is, in general, the sum of the radii of the atoms (and/or ions) constituting the bond, whether the bond is partially or fully ionic or covalent. It was shown in [3] that the lengths of the hydrogen bonds in the Watson-Crick [1] base pairs (A, T and C, G) of nucleic acids, as well as in many other inorganic and biochemical compounds, are sums of the ionic or covalent radii of the hydrogen donor and acceptor atoms or ions and of the transient proton involved in the hydrogen bonding. Thus, it was shown [2, 3] that, in general, the bond length between any two atoms (or ions) can be considered as the sum of the radii of the atoms (or ions) constituting the bond.

In the case of covalent bonds in general, Pauling [4] considered the bond lengths as sums of the covalent radii of the constituent atoms. However, Pauling [4] did not extend this to the purines (A and G) and pyrimidines (T, C and U).

This article shows (for the first time) that the existing data on the inter-atomic distances in adenine (A), guanine (G), thymine (T) (in DNA), cytosine (C), uracil (U) (in RNA), ribose (in RNA), deoxyribose (in DNA) and phosphoric acid can be interpreted as sums of the appropriate covalent radii of carbon, nitrogen, hydrogen, oxygen and phosphorus.

2. Bond lengths as sums of the relevant atomic covalent radii (this work)

Pauling [4] observed that in all the bases (A, T, C, G and U), the CC bond length is around 1.40 Å and that the CN bond length has values around 1.32 Å and 1.36 Å. On examining the existing bond length data [4 - 7], it is found that in fact the CC, CN, CH, NH, CO, OH and PO bonds that constitute the structures of the above molecules have some essentially fixed bond lengths, and that the CC and CN bonds have more values than those suggested in [4].

For the interpretation here of the bond lengths, the atomic covalent radii, Rcov, defined in [4] as half the inter-atomic distance between two atoms of the same kind, have been used. Note that C has two types of single bonds as pointed out in [4]: the aliphatic single bonds with a covalent radius of 0.77 Å (as in diamond), and the aromatic single bonds with a covalent radius of 0.72 Å (as in graphite). The covalent double bond radius of C, also defined [4] as half the CC double bond length, is 0.67 Å. The above three radii of C are distinguished here using the denotations shown in Fig. 1: C(I): 0.77 Å, C(II): 0.72 Å and C(III): 0.67 Å. Similarly, Rcov for N(I) is half the NN covalent single bond length, for N(II) it is half the NN covalent double bond length, for H it is half the HH covalent single bond length, for O(I) it is half the OO covalent single bond length and for O(II) it is half the OO covalent double bond length. Rcov for P was obtained by subtracting Rcov of O(I) from the known [4, 7] PO(I) bond length.
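The additivity rule described above can be sketched as a simple lookup and sum. This is only an illustrative sketch: the radii are those listed in Fig. 1, while the names `R_COV` and `r_sum` are hypothetical and introduced here for convenience.

```python
# Covalent radii in angstroms, as listed in Fig. 1.
# The dictionary name and the helper below are illustrative only.
R_COV = {
    "C(I)": 0.77,    # aliphatic single-bond carbon
    "C(II)": 0.72,   # aromatic carbon
    "C(III)": 0.67,  # double-bond carbon
    "N(I)": 0.70,
    "N(II)": 0.62,
    "H": 0.37,
    "O(I)": 0.67,
    "O(II)": 0.60,
    "P": 0.92,
}


def r_sum(atom_a: str, atom_b: str) -> float:
    """Return the sum of two covalent radii, rounded to 0.01 angstrom."""
    return round(R_COV[atom_a] + R_COV[atom_b], 2)


# Homonuclear CC sums for the three carbon radii.
for label in ("C(I)", "C(II)", "C(III)"):
    print(label, "-", label, r_sum(label, label))
```

Rounding to two decimals matches the precision of the tabulated radii and avoids floating-point noise in the sums.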

An interesting observation from the data in [4 - 7] is, e.g., that the CN bond length of 1.34 Å in the aromatic ring in adenine is the same irrespective of whether it pertains to a single or a double bond. This is similar to the finding in [4] that in the case of benzene, all the six CC bonds are of length 1.39 (+/-) 0.01 Å with equal bond order of 1.5 (due to resonance, [4]), although represented by three alternating single and double bonds. The CC bond length in benzene (with bond order 1.5) was interpreted in [2] as the sum (0.72 + 0.67 Å) of the radii for C(II) and C(III).

Similarly, in adenine, the CN bond distances [4 - 7] for N1-C2 (single bond), C2-N3 (double bond), N3-C4 (single bond), and C6-N1 (double bond) (see column 2, Table 1) in the aromatic ring all have the same value of 1.34 (+/-) 0.02 Å. This is interpreted here as the sum of the covalent radii (Rcov) of C(II) and N(II), which is equal to (0.72 + 0.62 Å).
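The adenine example can be checked numerically against the literature values collected in Table 1 (Refs. 6, 7, 4, 5, in that order). The following is a minimal sketch; the variable and function names are illustrative, and the data are copied from Table 1.

```python
# Radii (angstroms) from Fig. 1 for aromatic carbon and double-bond nitrogen.
R_C_II = 0.72
R_N_II = 0.62

# Literature CN bond lengths in the six-membered ring of adenine,
# taken from Table 1 (Refs. 6, 7, 4, 5).
ADENINE_CN = {
    "N1-C2": [1.34, 1.34, 1.32, 1.34],
    "C2-N3": [1.34, 1.33, 1.32, 1.34],
    "N3-C4": [1.34, 1.34, 1.36, 1.34],
    "C6-N1": [1.34, 1.35, 1.32, 1.34],
}

predicted = round(R_C_II + R_N_II, 2)


def max_deviation(values, target):
    """Largest absolute deviation from the target, rounded to 0.01 angstrom."""
    return round(max(abs(v - target) for v in values), 2)


deviations = {bond: max_deviation(vals, predicted)
              for bond, vals in ADENINE_CN.items()}

print("R(sum) for C(II) + N(II):", predicted)
for bond, dev in deviations.items():
    print(bond, "max deviation:", dev)
```

For these four bonds the largest deviation from the radii sum does not exceed the 0.02 Å spread quoted in the text.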

3. Comparison of the bond length data with the sums of covalent radii

The existing bond length data in [4 - 7] and the sums R(sum) of the relevant atomic covalent radii (given in Fig. 1) are assembled in Table 1 for A, G, T and C and in Table 2 for U, ribose, deoxyribose and phosphoric acid.

In Table 3, the sums of the atomic radii, R(sum) (in column 2), are compared with the average values (in column 3) of the bond lengths in [4 - 7] (given in Tables 1 and 2). Fig. 2 shows a plot of the bond lengths versus R(sum). It can be seen that of the twelve bond lengths, R(sum) is close (+/- 0.03 Å) to the average values [4 - 7] for ten of them, and only two values deviate by 0.04 and 0.07 Å from R(sum), due to the higher values in [6, 7] than in [4, 5].
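The comparison in Table 3 can be reproduced with a few lines of code. This is a sketch using the twelve R(sum) and average values exactly as tabulated; the names `TABLE_3` and `TOLERANCE` are illustrative.

```python
# (bond, R(sum), average literature value [4 - 7]) in angstroms, from Table 3.
TABLE_3 = [
    ("C(III) - O(II)", 1.27, 1.23),
    ("C(III) - N(II)", 1.29, 1.32),
    ("C(II) - N(II)", 1.34, 1.36),
    ("C(III) - N(I)", 1.37, 1.36),
    ("C(II) - C(III)", 1.39, 1.41),
    ("C(II) - N(I)", 1.42, 1.35),
    ("C(I) - O(I)", 1.44, 1.43),
    ("C(I) - N(I)", 1.47, 1.48),
    ("C(I) - C(II)", 1.49, 1.48),
    ("P - O(II)", 1.52, 1.51),
    ("C(I) - C(I)", 1.54, 1.52),
    ("P - O(I)", 1.59, 1.58),
]

TOLERANCE = 0.03  # angstroms, the closeness criterion used in the text

# Absolute deviation of the average from R(sum), rounded to 0.01 angstrom.
deviations = [(bond, round(abs(avg - rsum), 2)) for bond, rsum, avg in TABLE_3]

within = [(bond, dev) for bond, dev in deviations if dev <= TOLERANCE]
outliers = [(bond, dev) for bond, dev in deviations if dev > TOLERANCE]

print("within tolerance:", len(within))
print("outliers:", outliers)
```

The two bonds falling outside the tolerance are C(III) - O(II) and C(II) - N(I), with deviations of 0.04 and 0.07 Å respectively.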

Therefore, it is concluded here that it is reasonable to interpret the bond lengths in the purines (A and G), pyrimidines (T, C and U), sugars (ribose and deoxyribose) and phosphoric acid, as sums of the appropriate covalent radii (Rcov) of the atoms involved.

Thus, the molecular structures conventionally written as in [1, 4 – 7] have now been resolved into atomic structures based on the individual atomic radii. See Fig. 3 for thymine (T) and adenine (A), Fig. 4 for cytosine (C) and guanine (G) and Fig. 5 for phosphoric acid, deoxyribose, uracil (U) and ribose. For these molecules, while confirming Pauling’s [4] two CN bonds of lengths 1.32 and 1.36 (+/- 0.02 Å) and one CC bond length of 1.40 Å, the sums of the atomic radii show here (see Table 3) that actually there are 1) five CN bonds (1.29, 1.34, 1.37, 1.42 and 1.47 Å), 2) three CC bonds (1.39, 1.49 and 1.54 Å), 3) two CO bonds (1.27 and 1.44 Å) and 4) two PO bonds (1.52 and 1.59 Å). The bond lengths of C, O and N with H are: O(I) - H (1.04 Å), N(I) - H (1.07 Å), C(III) - H (1.04 Å), C(II) - H (1.09 Å) and C(I) - H (1.14 Å) (not given in Tables 1 - 3). Note that the increase in the CH bond length in going from aromatic to aliphatic C is attributed by Pauling [4] to the change in the radius of H, whereas here it is attributed to those of C.
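The bond lengths enumerated above follow directly from the radii in Fig. 1 once the atom pairs of Table 3 and the bonds to H are specified. The sketch below groups the resulting sums by bond type; the names `PAIRS`, `element` and `by_type` are illustrative and not taken from the original work.

```python
from collections import defaultdict

# Covalent radii in angstroms, as listed in Fig. 1.
R_COV = {
    "C(I)": 0.77, "C(II)": 0.72, "C(III)": 0.67,
    "N(I)": 0.70, "N(II)": 0.62, "H": 0.37,
    "O(I)": 0.67, "O(II)": 0.60, "P": 0.92,
}

# Atom pairs from Table 3, followed by the bonds to H quoted in the text.
PAIRS = [
    ("C(III)", "O(II)"), ("C(III)", "N(II)"), ("C(II)", "N(II)"),
    ("C(III)", "N(I)"), ("C(II)", "C(III)"), ("C(II)", "N(I)"),
    ("C(I)", "O(I)"), ("C(I)", "N(I)"), ("C(I)", "C(II)"),
    ("P", "O(II)"), ("C(I)", "C(I)"), ("P", "O(I)"),
    ("O(I)", "H"), ("N(I)", "H"),
    ("C(III)", "H"), ("C(II)", "H"), ("C(I)", "H"),
]


def element(label: str) -> str:
    """Strip the radius-type suffix, e.g. 'C(II)' -> 'C'."""
    return label.split("(")[0]


# Collect the radii sums under a bond-type key such as 'CN' or 'PO'.
by_type = defaultdict(list)
for a, b in PAIRS:
    by_type[element(a) + element(b)].append(round(R_COV[a] + R_COV[b], 2))

for bond_type in by_type:
    by_type[bond_type].sort()
    print(bond_type, by_type[bond_type])
```

Grouping by element pair makes it easy to see that the distinct CN, CC, CO and PO lengths arise only from the different radii assigned to each atom.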

Thus, the sums of the covalent radii of the individual atoms (see Fig. 1) constituting the bonds are shown to be fair representations of the average bond lengths [4 - 7] (see Table 3) of all the individual molecular components of DNA and RNA. This has enabled the establishing of the most probable atomic structures as shown in Figs. 3 – 5. During the formation of the Watson-Crick base pairs [1, 3] by hydrogen bonding, the donor to acceptor bonds become partially ionic (electrostatic) by the sharing of the transient proton as described in [3].

Acknowledgments

The author is grateful for the grants AVOZ50040507 of the Academy of Sciences of the Czech Republic and LC06035 of the Ministry of Education, Youth and Sports of the Czech Republic.

References

[1] J. D. Watson and F. H. C. Crick, Nature (London), 171 (1953) 737.

[2] R. Heyrovska, a) Mol. Phys., 103 (2005) 877 and b) International Joint Meeting of ECS, USA and Japanese, Korean and Australian Societies, Honolulu, Hawaii, October 2004, Vol. 2004 - 2, Extd. Abs. C2-0551. http://www.electrochem.org/dl/ma/206/pdfs/0551.pdf

[3] R. Heyrovska, Chem. Phys. Lett., 432 (2006) 348, and the literature cited therein.

[4] L. Pauling, The Nature of the Chemical Bond, Cornell Univ. Press, Ithaca, New York (1960).

[5] A. L. Lehninger, Biochemistry, Worth Publishers, Inc., New York (1975).

[6] M. Preuss, G. Schmidt, K. Seino, J. Furthmuller, F. Bechstedt, J. Comput. Chem., 25 (2004) 112, see ref. 28 therein.

[7] a) http://ndbserver.rutgers.edu/archives/proj/valence/bases6.html; b) http://ndbserver.rutgers.edu/standards/ideal_geometries.html; c) http://ndbserver.rutgers.edu/education/education_DNA.html

Table 1. Bond lengths (Å) (+/- 0.02 Å) of adenine, guanine, thymine and cytosine from [4 – 7] compared with sums of atomic covalent radii, R(sum). The numberings of the atoms are as in [7b,c].

Adenine (values from Refs. 6, 7, 4, 5)

Bond      R(sum)                Values
N1-C2     1.34 = 0.62 + 0.72    1.34, 1.34, 1.32, 1.34
C2-N3     1.34 = 0.62 + 0.72    1.34, 1.33, 1.32, 1.34
N3-C4     1.34 = 0.62 + 0.72    1.34, 1.34, 1.36, 1.34
C4-C5     1.39 = 0.67 + 0.72    1.40, 1.38, 1.40, 1.44
C5-C6     1.39 = 0.67 + 0.72    1.39, 1.41, 1.40, 1.38
C6-N1     1.34 = 0.62 + 0.72    1.34, 1.35, 1.32, 1.34
C5-N7     1.29 = 0.67 + 0.62    1.37, 1.39, 1.32, 1.32
N7-C8     1.29 = 0.67 + 0.62    1.32, 1.31, 1.32, 1.28
C8-N9     1.37 = 0.67 + 0.70    1.37, 1.37, 1.33, 1.37
N9-C4     1.42 = 0.70 + 0.72    1.38, 1.37, 1.36, -
C6-N6     1.42 = 0.70 + 0.72    -, 1.34, 1.35, 1.34
N9-C1'    1.47 = 0.70 + 0.77    -, 1.46, 1.53, -

Guanine (values from Refs. 6, 7, 4)

Bond      R(sum)                Values
N1-C2     1.37 = 0.67 + 0.70    1.32, 1.37, 1.34
C2-N3     1.29 = 0.67 + 0.62    1.36, 1.32, 1.32
N3-C4     1.29 = 0.67 + 0.62    1.42, 1.35, 1.32
C4-C5     1.39 = 0.67 + 0.72    1.44, 1.38, 1.38
C5-C6     1.39 = 0.67 + 0.72    1.39, 1.42, 1.40
C6-N1     1.37 = 0.67 + 0.70    1.36, 1.39, 1.36
C5-N7     1.34 = 0.62 + 0.72    1.37, 1.39, 1.32
N7-C8     1.29 = 0.67 + 0.62    1.32, 1.31, 1.32
C8-N9     1.37 = 0.67 + 0.70    1.37, 1.37, 1.32
N9-C4     1.37 = 0.67 + 0.70    1.37, 1.38, 1.34
C2-N2     1.37 = 0.67 + 0.70    1.38, 1.34, 1.35
C6-O6     1.27 = 0.67 + 0.60    1.22, 1.24, 1.23
N9-C1'    1.47 = 0.70 + 0.77    -, 1.46, 1.53

Thymine (values from Refs. 6, 7, 4)

Bond      R(sum)                Values
N1-C2     1.37 = 0.67 + 0.70    1.38, 1.38, 1.36
C2-N3     1.37 = 0.67 + 0.70    1.38, 1.37, 1.36
N3-C4     1.37 = 0.67 + 0.70    1.40, 1.38, 1.36
C4-C5     1.39 = 0.67 + 0.72    1.46, 1.45, 1.40
C5-C6     1.39 = 0.67 + 0.72    1.35, 1.34, 1.40
C6-N1     1.37 = 0.67 + 0.70    1.37, 1.38, 1.36
C2-O2     1.27 = 0.67 + 0.60    1.22, 1.22, 1.23
C4-O4     1.27 = 0.67 + 0.60    1.22, 1.23, 1.23
C5-M5     1.49 = 0.72 + 0.77    1.49, 1.50, 1.53
N1-C1'    1.47 = 0.70 + 0.77    -, 1.47, 1.53

Cytosine (values from Refs. 6, 7, 4)

Bond      R(sum)                Values
N1-C2     1.37 = 0.67 + 0.70    1.41, 1.40, 1.36
C2-N3     1.37 = 0.67 + 0.70    1.37, 1.35, 1.34
N3-C4     1.29 = 0.67 + 0.62    1.31, 1.34, 1.32
C4-C5     1.39 = 0.67 + 0.72    1.44, 1.43, 1.40
C5-C6     1.39 = 0.67 + 0.72    1.36, 1.34, 1.40
C6-N1     1.37 = 0.67 + 0.70    1.35, 1.37, 1.34
C2-O2     1.27 = 0.67 + 0.60    1.22, 1.24, 1.23
C4-N4     1.37 = 0.67 + 0.70    1.36, 1.34, 1.35
N1-C1'    1.47 = 0.70 + 0.77    -, 1.47, 1.53

Table 2. Bond lengths (+/- 0.02 Å) of uracil, ribose, deoxyribose, phosphoric acid/phosphate from [4 – 7] compared with sums of atomic covalent radii, R(sum). The numberings of the atoms are as in [7b,c].

Uracil (values from Refs. 7, 4)

Bond         R(sum)                Values
N1-C2        1.37 = 0.67 + 0.70    1.38, 1.34
C2-N3        1.37 = 0.67 + 0.70    1.37, 1.38
N3-C4        1.37 = 0.67 + 0.70    1.38, 1.37
C4-C5        1.39 = 0.67 + 0.72    1.43, 1.41
C5-C6        1.39 = 0.67 + 0.72    1.34, 1.41
C6-N1        1.37 = 0.67 + 0.70    1.38, 1.34
C2-O2        1.27 = 0.67 + 0.60    1.22, 1.23
C4-O4        1.27 = 0.67 + 0.60    1.23, 1.24
N1-C1'       1.47 = 0.70 + 0.77    1.47, -

Ribose, deoxyribose (values for ribose, deoxyribose from Ref. 7)

Bond         R(sum)                Values
C1'-C2'      1.54 = 0.77 + 0.77    1.53, 1.52
C2'-C3'      1.54 = 0.77 + 0.77    1.53, 1.52
C3'-C4'      1.54 = 0.77 + 0.77    1.52, 1.53
C4'-O4'      1.44 = 0.77 + 0.67    1.45, 1.45
O4'-C1'      1.44 = 0.77 + 0.67    1.41, 1.42
C3'-O3'      1.44 = 0.77 + 0.67    1.42, 1.43
C5'-C4'      1.54 = 0.77 + 0.77    1.51, 1.51
C2'-O2'      1.44 = 0.77 + 0.67    1.41, -
C1'-N1/N9    1.47 = 0.70 + 0.77    1.47, 1.47
O5'-C5'      1.44 = 0.77 + 0.67    1.42, 1.42

**Phosphoric acid, phosphate (values from Refs. 7, 4)

Bond         R(sum)                Values
**P=O        1.52 = 0.92 + 0.60    1.49, 1.52
**P-O(H)     1.59 = 0.92 + 0.67    1.61, 1.57
P-O3'        1.59 = 0.92 + 0.67    1.61, 1.56
P-O5'        1.59 = 0.92 + 0.67    1.59, 1.56
(P)O5'-C5'   1.44 = 0.77 + 0.67    1.44, -
(P)O3'-C3'   1.44 = 0.77 + 0.67    1.43, -

Table 3. Average bond lengths (Å) compared with radii sum, R(sum).

Bonds             R(sum)    Average [4 - 7]
C(III) - O(II)    1.27      1.23
C(III) - N(II)    1.29      1.32
C(II) - N(II)     1.34      1.36
C(III) - N(I)     1.37      1.36
C(II) - C(III)    1.39      1.41
C(II) - N(I)      1.42      1.35
C(I) - O(I)       1.44      1.43
C(I) - N(I)       1.47      1.48
C(I) - C(II)      1.49      1.48
P - O(II)         1.52      1.51
C(I) - C(I)       1.54      1.52
P - O(I)          1.59      1.58

Legends for Figs. 1 – 5.

Fig. 1. The nine covalent radii (Rcov) (+/- 0.02 Å) [4] of the five atoms constituting the molecular components of DNA and RNA: tetravalent C (black), trivalent N (blue), monovalent H (green), divalent O (red) and pentavalent P (pink). Nature combines the above atoms (from mono to pentavalent) into unique atomic structures, see Figs. 3 - 5.

Fig. 2. Comparison of the average bond length from the data in [4 – 7] with the sum of atomic radii, R (sum). See Table 3 for the values.

Fig. 3. Structures of thymine (T) and adenine (A). Upper: Molecular structures with conventional bonds [7c] and lower: atomic structures with bond lengths as sums of the atomic covalent radii given in Fig. 1. (Base pairing [1] occurs when hydrogen bonds connect T with A along the broken lines; see [3] for the additivity of the radii in the hydrogen bond.)

Fig. 4. Structures of cytosine (C) and guanine (G). Upper: Molecular structures with conventional bonds [7c] and lower: atomic structures with bond lengths as sums of the atomic covalent radii given in Fig. 1. (Base pairing [1] occurs when hydrogen bonds connect C with G along the broken lines; see [3] for the additivity of the radii in the hydrogen bond.)

Fig. 5. Atomic structures of phosphoric acid, deoxyribose, uracil (U) and ribose, with bond lengths as sums of atomic covalent radii given in Fig. 1. Upper (for Uracil): Molecular structure with conventional bonds [7c]. By the elimination of water molecules as shown, the bases combine with sugars and the latter with phosphoric acid, to form nucleic acids [4 - 7].

Fig. 1.

Atom:        C(I)   C(II)  C(III)  N(I)   N(II)  H      O(I)   O(II)  P
Rcov (Å):    0.77   0.72   0.67    0.70   0.62   0.37   0.67   0.60   0.92

Fig. 2.

Fig. 3. Thymine and Adenine (upper: molecular structure [7c])

Thymine (Atomic structure)

Adenine (Atomic structure)

Fig. 4. Cytosine and Guanine (upper: molecular structure [7c])

Cytosine (Atomic structure)

Guanine (Atomic structure)

Annotations in the figure: N(9) bonds with C(1’); H + OH(1’) = H2O.

Fig. 5.

Phosphoric acid (Atomic structure)

Deoxyribose (Atomic structure)

Uracil (upper: molecular structure [7c])

Uracil (Atomic structure)

Ribose (Atomic structure)

Annotations in the figure: O bonds with C(5’); H + OH(5’) = H2O.