
Proc. Natl. Acad. Sci. USA Vol. 85, pp. 2653-2657, April 1988
Evolution

Directional mutation pressure and neutral molecular evolution

(guanine-plus-cytosine content/selective constraints/non-Darwinian evolution)

NOBORU SUEOKA

Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO 80309

Communicated by Norman H. Horowitz, November 9, 1987 (received for review August 13, 1987)

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

ABSTRACT A quantitative theory of directional mutation pressure proposed in 1962 explained the wide variation of DNA base composition observed among different bacteria and its small heterogeneity within individual bacterial species. The theory was based on the assumption that the effect of mutation on a genome is not random but has a directionality toward higher or lower guanine-plus-cytosine content of DNA, and that this pressure generates directional changes more in neutral parts of the genome than in functionally significant parts. Now that DNA sequence data are available, the theory allows the estimation of the extent of neutrality of directional mutation pressure against selection. Newly defined parameters were used in the analysis, and two apparently universal constants were discovered. Analysis of DNA sequences has revealed that practically all organisms are subject to directional mutation pressure. The theory also offers plausible explanations for the large heterogeneity in guanine-plus-cytosine content among different parts of the vertebrate genome.

Questions on the relative roles of Darwinian selection and of non-Darwinian or neutral mutation in molecular evolution have been the focus of discussion between neutralists and selectionists for the last two decades. It is now widely accepted that the neutral concept of evolution propounded by Kimura since 1968 (1, 2) and by King and Jukes since 1969 (3) is as important in evolution as selection is. The neutral theory of evolution is based mainly on two premises: (i) functionally neutral changes of the genome by base-substitution mutation escape the editing effects of Darwinian selection (1, 3), and (ii) these changes are fixed in the population by random drift (1, 2) as formulated by Wright (4). The theory explains why rates of DNA base substitution are much higher than the mutation rates previously estimated from phenotypically observable mutations (1). The two premises were reasonable; the first had previously been pointed out (5, 6). These premises are not sufficient, however, to explain a number of the major features of the neutral evolution of the genome. For example, the neutral theory elaborated by Kimura (2) does not explain the main features of DNA base composition: the relatively small heterogeneity within a microbial species (7-10), in contrast to the extremely wide variation, from approximately 25% to 75% guanine-plus-cytosine (G + C) content, among different species of bacteria, algae, and protozoa (11-13). By contrast, the quantitative theory of directional mutation pressure proposed in 1962 (5, 6) can explain a number of evolutionary changes in DNA base composition and nucleotide sequence that cannot be understood by the principles of the neutral theory alone. Moreover, as shown in this paper, parameters previously defined in the directional mutation theory allow us to estimate the degree of neutrality of mutational change in terms of DNA base composition (G + C content) in bacteria or in different genes of individual vertebrates.

The existence of a nonrandom overall mutation pressure, namely, a directional mutation pressure toward the α pair (A·T or T·A) or toward the γ pair (G·C or C·G), was first suspected from the variation of DNA base composition [expressed as G + C content, γ/(α + γ)] among DNAs of different species and from the heterogeneity within the genome of an individual species of bacteria (9, 10). Variation of G + C content is also reflected in the total amino acid composition in bacteria and Tetrahymena (14, 15). As predicted, when the genetic code was deciphered, it became evident that the correlation of each amino acid content with DNA G + C content depends on the G + C content of the codon set for that particular amino acid. It is therefore important to note that directional mutation pressure can exert some directional (nonrandom) changes on the amino acid composition of proteins as well. Results of some recent analyses of DNA sequence and codon usage have been explained by directional mutation pressure (16-23).

Theory of Directional Mutation Pressure. Based on the major features of variation and heterogeneity of DNA base composition, Sueoka (5) formulated a quantitative theory of directional mutation pressure in 1962. In the same year, Freese (6) also proposed a theory to explain some compositional features of DNA. The two theories are similar in principle and conclusions but differ in treatment. The directional mutation theory and its rationale, in the form presented by Sueoka (5), are briefly stated below.

The major cause of a change in the DNA G + C content of an organism is mutation (5) between an α pair and a γ pair. When there are no selective constraints,

$$\underset{(p)}{\gamma}\;\underset{v}{\overset{u}{\rightleftharpoons}}\;\underset{(1-p)}{\alpha} \tag{1}$$

where p is the fractional G + C content of DNA, and u (γ → α) and v (α → γ) are mutation rates per generation per base. At equilibrium (5),

$$\hat{p} = \frac{v}{u+v}, \qquad \text{or} \qquad \frac{v}{u} = \frac{\hat{p}}{1-\hat{p}} \tag{2}$$

Here, p̂ is the G + C content at equilibrium. The large variation of G + C content among DNAs of different bacteria was explained mainly by differences in v/u rather than by selection (5, 6).

Directional Mutation Pressure. A definition of directional mutation pressure is necessary for the quantitative analysis of the effect of directional mutation on the G + C content of DNA. The directional mutation pressure (μD) is now defined as (5)

$$\mu_D = \frac{v}{u+v} \tag{3}$$
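Eqs. 1-3 can be illustrated with a short numerical sketch. It assumes a deterministic, infinite-population reading of Eq. 1 with no selection, in which one generation changes p to p − up + v(1 − p). The function names and the rates used (u = 2 × 10⁻³, v = 10⁻³) are hypothetical, and the rates are far larger than real per-base rates; they are chosen only so that convergence is visible within a few thousand generations. The sketch checks that iteration approaches p̂ = v/(u + v) and that p̂/(1 − p̂) recovers v/u.

```python
"""Deterministic sketch of the two-state mutation model (Eqs. 1-3), no selection.

u: rate of gamma (G.C) -> alpha (A.T) change per base per generation
v: rate of alpha (A.T) -> gamma (G.C) change per base per generation
Function names and numeric rates are illustrative assumptions.
"""


def directional_pressure(u: float, v: float) -> float:
    """Eq. 3: mu_D = v / (u + v), which is also p_hat of Eq. 2."""
    return v / (u + v)


def rate_ratio(p_hat: float) -> float:
    """Second form of Eq. 2: v/u = p_hat / (1 - p_hat)."""
    return p_hat / (1.0 - p_hat)


def step(p: float, u: float, v: float) -> float:
    """One generation: G.C pairs lost at rate u, A.T pairs converted at rate v."""
    return p - u * p + v * (1.0 - p)


def simulate(p0: float, u: float, v: float, generations: int) -> float:
    """Iterate step() from an initial G + C content p0."""
    p = p0
    for _ in range(generations):
        p = step(p, u, v)
    return p


def closed_form(p0: float, u: float, v: float, generations: int) -> float:
    """Exact solution of the recursion: p_t = p_hat + (p0 - p_hat) * (1 - u - v)**t."""
    p_hat = directional_pressure(u, v)
    return p_hat + (p0 - p_hat) * (1.0 - u - v) ** generations


if __name__ == "__main__":
    u, v = 2e-3, 1e-3  # hypothetical rates with an A+T bias (mu_D < 0.5)
    print(directional_pressure(u, v))              # about 0.333
    print(simulate(0.5, u, v, 10_000))             # approaches 1/3 from 0.5
    print(rate_ratio(directional_pressure(u, v)))  # about 0.5, i.e. v/u
```

The closed form shows why only the ratio v/u fixes the equilibrium: u and v together set how fast p approaches p̂, through the factor (1 − u − v) per generation, but not where it ends up.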
In Eq. 3, μD > 0.5 indicates that the mutation pressure favors γ over α, and μD < 0.5 indicates a preference for α over γ. At equilibrium, the G + C content of a neutral nucleotide position equals the mutation pressure μD (5); p̂ is therefore termed the G + C content at "mutational equilibrium."

Analysis of directional mutation pressure is based on the relative values of the mutation rates u and v, not on their absolute values. Since the probability of fixation in the population is similar for individual neutral mutations, the values of v/u and μD also apply to the evolution (change in population) of P3, as long as P3 is in equilibrium with directional mutation pressure. On the other hand, the fixation of P12 and P123 in a population is subject to selective constraints, the extent of which is examined in this paper under the assumption that P̂12 and P̂123 are values in equilibrium with directional mutation pressure and selective constraints. In other words, the rate of directional change of DNA G + C content is not our concern in the present analysis.

In 1967, Cox and Yanofsky (24) reported experimental results indicating that directional mutation pressure does change the G + C content of Escherichia coli. They used a mutator strain, mutT (25); mutT enhances the mutation rate 10³-fold over the spontaneous level, and all mutational changes due to mutT are A·T to C·G (26). Continuous culture of the mutT strain for 1200-1600 generations changed the G + C content by 0.2-0.5%.

G + C Content of the Three Codon Positions. In the present analysis, the observed G + C contents of the first, second, and third codon positions (P1, P2, and P3, respectively) are corrected average G + C contents of the three codon positions, calculated from 56 of the 64 triplets. Because α and γ are not equally represented at the third position of certain codon sets, the three stop codons (TAA, TAG, and TGA) and the three codons for isoleucine (ATT, ATC, and ATA) were excluded in the calculation of P3, and the two single codons for methionine (ATG) and tryptophan (TGG) were excluded from all three (P1, P2, and P3). The G + C contents are calculated for individual genes or for the sum of genes of an organism (Table 1). Unless otherwise specified, we analyze the codon-usage frequencies recently calculated by Maruyama et al. (27) from the sequence data of GenBank* (29).

*EMBL/GenBank Genetic Sequence Database (1985) GenBank (Bolt, Beranek, and Newman Laboratories, Cambridge, MA), Tape Release 38.0.

Table 1. Major statistics for the organisms referred to in this article

                           No. of Avg. no. of  G + C content
Organism                   genes  codons/gene  Total*  P1              P2              P3              v/u†
Bacteria
  Bacillus subtilis        10     308.5        0.42    0.516 ± 0.027   0.382 ± 0.065   0.427 ± 0.038   0.75 ± 0.12
  Escherichia coli         110    438.2        0.50    0.607 ± 0.045   0.404 ± 0.032   0.564 ± 0.059   1.34 ± 0.31
Fungus
  Saccharomyces cerevisiae 49     333.2        0.35    0.465 ± 0.051   0.376 ± 0.048   0.392 ± 0.050   0.66 ± 0.15
Protozoan
  Hypotrichs‡              3      295.3        0.42    0.518 ± 0.031   0.401 ± 0.025   0.523 ± 0.063   1.10 ± 0.06
Invertebrates
  Caenorhabditis elegans§  8      486.5        0.36    0.579 ± 0.096   0.398 ± 0.136   0.489 ± 0.087   1.01 ± 0.31
  Drosophila melanogaster  11     272.3        0.40    0.548 ± 0.047   0.398 ± 0.061   0.788 ± 0.034   3.83 ± 0.73
  Sea urchin¶              8      148.3        0.42    0.581 ± 0.053   0.462 ± 0.054   0.621 ± 0.048   1.68 ± 0.34
Vertebrates
  Xenopus laevis           5      126.6        0.42    0.567 ± 0.018   0.370 ± 0.058   0.451 ± 0.144   1.01 ± 0.76
  Chicken                  28     257.6        0.44    0.588 ± 0.060   0.410 ± 0.059   0.692 ± 0.138   3.12 ± 2.49
  Mouse                    31     302.4        0.44    0.549 ± 0.049   0.431 ± 0.064   0.619 ± 0.105   1.84 ± 0.84
  Rat                      43     247.8        0.43    0.560 ± 0.065   0.408 ± 0.068   0.654 ± 0.102   2.15 ± 0.95
  Human                    90     311.4        0.43    0.552 ± 0.061   0.414 ± 0.052   0.631 ± 0.165   2.43 ± 2.12
Plant
  Zea mays                 8      269.3        0.46    0.616 ± 0.039   0.377 ± 0.017   0.468 ± 0.055   0.90 ± 0.23

Unless otherwise stated, codon-frequency data compiled by Maruyama et al. (27) were used. Each value for the three codon positions represents the average G + C content of genes with standard deviation.
*G + C content of the whole genome as generally accepted (28).
†Ratio of mutation rates, calculated by Eq. 4b.
‡Ciliates (data collection assisted by David Prescott).
§Nematodes (data collection assisted by Thomas Blumenthal).
¶Different species are combined (27).

Analysis of Equilibrium. The G + C content of the third codon position (P3) was used for estimating the directional mutation pressure (μD) in the present work. Because of the partially silent nature of the third codon position (30), P3 represents one of the most neutral regions within the genome as far as G + C content is concerned (8, 31). The marked response of P3 to directional mutation pressure is also manifested in its extreme values in bacteria (9% and 95%; Fig. 1a), in some genes of vertebrates (95%; refs. 32 and 33), and in mitochondrial genes of Drosophila yakuba (6.2%; ref. 19). These results indicate close neutrality of the third codon positions in G + C content. The near neutrality of P3 does not mean that every change at the third codon position is neutral. Obviously, some transversions are not neutral, since they cause amino acid changes. However, transitions should be sufficient to bring P3 into equilibrium with μD. Nevertheless, changes in P3 may not be completely neutral. If an independent method to assess the extent of neutrality of P3 becomes available in the future, small corrections may become necessary.

Unlike P3, P1 and P2 are subject to functional constraints against change, because a mutation at these positions usually leads to an amino acid change, except between some synonymous codons (for example, certain leucine and arginine codons that differ only at the first position). Other regions, such as introns and intergenic spacers, are abundant only in eukaryotes; their function is not well understood, and they may differ widely in their degree of functional neutrality.

Assuming that P3 is neutral with regard to directional mutation pressure and that a near equilibrium has been attained for P3, the best current estimates of μD and v/u can be obtained by approximating p̂ with P̂3 in Eqs. 2 and 3:

$$\mu_D = \hat{P}_3 \tag{4a}$$

$$\frac{v}{u} = \frac{\hat{P}_3}{1-\hat{P}_3} \tag{4b}$$

where P̂3 is the G + C content of the third codon position at equilibrium with the directional mutation pressure. In practice, P̂3 can be approximated by the observed P3. In this article, P values with a circumflex indicate values at equilibrium either with mutation alone (P̂3) or with both mutation pressure and selection (P̂1, P̂2, P̂12, and P̂123).

Extent of Directional Mutation Pressure and Equilibrium. To analyze the relation between directional mutation pressure and selective constraints, the data used by Muto and Osawa (16) are presented in Fig. 1a with some modification. The G + C contents of the first and second codon positions (ordinate) are plotted against the G + C content of the third codon position (P3; abscissa), instead of against the total DNA G + C content as originally reported (16). The total G + C content is not an ideal variable for quantitative arguments, because it includes all three letters of the codon as well as noncoding areas. Their data on the coding areas and on the spacer regions between genes of the same set of bacteria are also replotted against P3 (Fig. 1b).

As Muto and Osawa (16) pointed out, bacteria with lower and higher G + C content clearly have more extreme values at the third codon position than in their total G + C content or at the first and second codon positions (Fig. 1a). Even among bacteria whose DNA G + C content lies well within the limits of 25% and 75%, species with a high G + C content tend to have an even higher G + C content at the third codon position (G + C pressure), and species with a high A + T content tend to have an even higher A + T content there (A + T pressure). The consistently higher G + C content and steeper slope of the first codon position compared with the second arise because more abundant amino acids have more γ pairs at the first codon position than at the second (3, 14, 33).

The situation is shown diagrammatically in Fig. 2. The discrepancy in G + C content between the third codon position and the average of the first and second codon positions (P12; the average line in Figs. 1a and 2) grows proportionately larger on both sides of the intersection point (EP), as indicated by the two thick open arrows, one toward higher A + T content and the other toward higher G + C content. This discrepancy is interpreted to mean that directional mutation pressure is counteracted by selective constraints, thus forming a new type of equilibrium. As the parameter that represents the overall state of G + C equilibrium, the mutation-selection equilibrium coefficient ε is defined as the regression coefficient against P3. The equilibrium coefficient ε is 0 when directional mutation pressure has no effect (complete selective constraints) and 1 for complete equilibrium with μD (complete neutrality). Thus, ε12 represents the extent to which P12 is brought into equilibrium by directional mutation pressure against selective constraints. The values of ε and 1 − ε are convenient measures of the relative effects of neutrality and selective constraints, respectively.

Mutation-Selection Equilibrium. With the assumption of neutrality of P3 and of equilibrium between directional mutation pressure and selective constraints, the relationship between P3 and P12 presented in Fig. 2 can be expressed as

$$\hat{P}_{12} = E_P + \varepsilon_{12}\,(\hat{P}_3 - E_P) \tag{5a}$$

or

$$\hat{P}_{12} = \varepsilon_{12}\,\hat{P}_3 + (1 - \varepsilon_{12})\,E_P \tag{5b}$$

At the equilibrium point EP, the average (P̂12) of P̂1 and P̂2 equals P̂3; note that if ε12 equals 0 (complete selective constraint), P̂12 equals EP independent of P̂3 or μD (Eq. 5b). Exactly the same EP value applies to P12 and to the whole protein-coding area, including all three codon positions (P123; see Table 2 legend). EP thus represents the proportion of G + C nucleotides at positions 1 and 2, or at all three codon positions, when P12 and P123 are at equilibrium with directional mutation pressure (μD) and are not subject to selective constraints. The EP values calculated for different bacteria and the EP values for different genes of individual vertebrates are similar (Table 2). EP may, therefore, be a universal constant for P12 and for P123 (the entire coding area).

°0X00

~00.5-. + 0.5 0 1.0 0.5 1.0 G+C content of third position FIG. 1. Regression of G + C content of the first and second positions with the third codon position among bacterial species. (a) 1. Regression lines were drawn by linear least-squares analysis. G + C Amm ------..0 contents of the first codon position (o), those of the second codon position (A), and those of the average of the first and second codon positions (W) are shown. Data compiled by Muto and Osawa (16) G+C content of third position were replotted against the G + C content of the third codon position (P3). Because of large differences in the number of genes for FIG. 2. Diagrammatic representation of the equilibrium coeffi- different bacteria, regression analysis was performed using the cient, E12, the equilibrium point (Ep), and the minimal (Ami.) and average value for each species without weighting by the number of maximal (Am.) values of the average G + C content of the first and genes. Regression coefficients (or E values) and their standard second codon positions. Solid line represents the regression line of deviations for the first codon position, the second codon position, the average G + C content ofthe first and the second codon positions and the average of the two are 0.374 ± 0.013, 0.163 ± 0.012, and of various bacteria (Fig. la). Broken line is drawn parallel to the 0.269 ± 0.011, respectively. (b) Data of Muto and Osawa (16) on the regression line through the origin to visualize the equilibrium G + C content of coding areas (o) and noncoding areas (spacers, x) coefficient, E12 (degree of neutrality), and the disequilibrium coeffi- were replotted against the G + C content of the third codon position cient, 1 - E12 (degree of selective constraints). Open arrows indicate of bacteria. Regression coefficients for the coding areas and the deviation of the regression line from complete equilibrium with the spacers are 0.53 ± 0.02 and 0.66 ± 0.01, respectively. 
directional mutation pressure, tD = v/(u + v) (dotted line). Downloaded by guest on October 8, 2021 2656 Evolution: Sueoka Proc. Natl. Acad. Sci. USA 85 (1988)
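The regressions behind Figs. 1-3 can be sketched as an ordinary or weighted least-squares fit of P12 (or P123) on P3, reading the slope as ε and recovering EP from the intercept through Eq. 5b, since the intercept equals (1 − ε)EP. This is an illustration under stated assumptions, not the original computation: inputs are plain lists of per-species or per-gene values, optional weights stand in for the codons-per-gene weighting the legends describe for vertebrates, and all function names are hypothetical.

```python
"""Sketch: epsilon (slope) and E_P from a regression of P12 or P123 on P3 (Eq. 5b).

Assumptions: plain sequences of per-species or per-gene values; optional
weights (e.g. codons per gene). Function names are hypothetical.
"""
from typing import Optional, Sequence, Tuple


def weighted_linear_fit(
    x: Sequence[float], y: Sequence[float], w: Optional[Sequence[float]] = None
) -> Tuple[float, float]:
    """Weighted least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def equilibrium_parameters(
    p3: Sequence[float], p_sel: Sequence[float], weights: Optional[Sequence[float]] = None
) -> Tuple[float, float]:
    """Return (epsilon, E_P), reading Eq. 5b as P_sel = eps * P3 + (1 - eps) * E_P."""
    eps, intercept = weighted_linear_fit(p3, p_sel, weights)
    if eps >= 1.0:
        raise ValueError("E_P is undefined for epsilon >= 1 (complete neutrality)")
    return eps, intercept / (1.0 - eps)


def epsilon123_from_epsilon12(eps12: float) -> float:
    """Since P123 = (2 * P12 + P3) / 3, regressing P123 on the same P3 values with
    the same weights gives slope (2 * eps12 + 1) / 3 and leaves E_P unchanged."""
    return (2.0 * eps12 + 1.0) / 3.0
```

The last helper restates the argument of the Table 2 legend that EP is shared by P12 and P123. For example, the bacterial ε12 of 0.263 gives (2 × 0.263 + 1)/3 ≈ 0.509, in line with the tabulated ε123 of 50.8%.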

Table 2. Relative contributions of neutrality (ε) for P12 and P123, EP common to P12 and P123, and Amin and Amax for P123 for coding areas of various organisms

                      Neutrality,* %
System                ε12 × 100     ε123 × 100    EP†            Amin for P123, %  Amax for P123, %
Bacterial species‡    26.3 ± 6.6    50.8 ± 4.4    0.464 ± 0.076  22.8 ± 4.6        73.6 ± 4.6
Vertebrates§
  Xenopus             19.3 ± 12.9   46.2 ± 13.0   0.473 ± 0.114  25.4 ± 1.5        71.6 ± 1.5
  Chicken             15.4 ± 5.0    44.6 ± 5.4    0.464 ± 0.044  26.2 ± 2.3        69.8 ± 2.3
  Mouse               27.4 ± 5.4    51.6 ± 5.6    0.442 ± 0.051  21.4 ± 2.8        73.0 ± 2.8
  Rat¶                22.6 ± 5.8    40.3 ± 5.7    0.428 ± 0.051  22.1 ± 2.6        70.5 ± 2.6
  Human               19.4 ± 2.0    46.3 ± 2.2    0.447 ± 0.018  24.0 ± 2.3        70.3 ± 2.3

Coding areas represent the areas of DNA that encode amino acid sequences, including all three codon positions. For a visual presentation of these parameters, see Fig. 2. ε123 is the equilibrium coefficient for the coding area.
*Relative effects are expressed in percent neutrality (ε123) and selective constraints (1 − ε123) on the effective directional mutation pressure (P3 − EP) (Fig. 2; Eq. 5a).
†EP values were calculated from the regression of P12 or P123 [(P1 + P2 + P3)/3] against P3, without weighting by the number of genes for each species of bacteria and with weighting by the number of codons for each gene for vertebrates. The EP value for the whole coding area (P123) is the same as the EP value for P12, because at EP for P12, P12 = P3, and consequently, at EP, P123 = (2P12 + P3)/3 = P3, which is the same as for P12. Standard deviations for EP were calculated as $[(1-\varepsilon)^{-2}\,\mathrm{Var}(A_{\min}) + A_{\min}^{2}\,(1-\varepsilon)^{-4}\,\mathrm{Var}(\varepsilon)]^{1/2}$, where Var(Amin) and Var(ε) are the variances of Amin and of ε, respectively (see ref. 34).
‡Data compiled by Muto and Osawa (16) were used for calculation (Fig. 1b).
§Genes of each vertebrate (27) were used for calculation.
¶A parotid gland proline-rich protein gene was eliminated from the calculation.
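The Amin and Amax columns of Table 2 follow from treating P123 like Eq. 5b, P123 = ε123·μD + (1 − ε123)EP, and evaluating it at μD = 0 and μD = 1. The sketch below does this and also implements the first-order (delta-method) standard deviation of EP = Amin/(1 − ε) given in the legend, which carries no covariance term. The linear form for P123 and the function names are assumptions made for illustration.

```python
"""Sketch: A_min, A_max (P123 at mu_D = 0 and 1) and the first-order standard
deviation of E_P quoted in the Table 2 legend.

Assumes P123 follows the form of Eq. 5b with eps123:
    P123 = eps123 * mu_D + (1 - eps123) * E_P
Function names are hypothetical.
"""
import math


def expected_p123(mu_d: float, eps123: float, e_p: float) -> float:
    """Equilibrium G + C content of the whole coding area for a given mu_D."""
    return eps123 * mu_d + (1.0 - eps123) * e_p


def a_min_max(eps123: float, e_p: float) -> tuple[float, float]:
    """A_min at mu_D = 0 (pure A+T pressure), A_max at mu_D = 1 (pure G+C pressure)."""
    return expected_p123(0.0, eps123, e_p), expected_p123(1.0, eps123, e_p)


def sd_of_e_p(a_min: float, var_a_min: float, eps: float, var_eps: float) -> float:
    """Delta-method SD of E_P = A_min / (1 - eps), with no covariance term,
    matching the form given in the Table 2 legend."""
    one_minus = 1.0 - eps
    variance = var_a_min / one_minus**2 + a_min**2 * var_eps / one_minus**4
    return math.sqrt(variance)


if __name__ == "__main__":
    # Bacterial row of Table 2: eps123 = 0.508, E_P = 0.464
    lo, hi = a_min_max(0.508, 0.464)
    print(f"A_min = {lo:.3f}, A_max = {hi:.3f}")  # A_min = 0.228, A_max = 0.736
```

For the bacterial row this reproduces the tabulated 22.8% and 73.6%, which is the arithmetic behind attributing the observed extremes of roughly 25% and 75% in coding-area G + C content to μD approaching 0 or 1.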

Heterogeneity of P3 Within Vertebrate Species. As is evident from Fig. 3 and Tables 1 and 2, the intraspecific compositional heterogeneity of vertebrate DNA is larger than that of bacterial DNA. In vertebrates, P3 of individual genes of a species shows a much wider distribution, covering P3 values between 0.3 and 0.9, than the intraspecific P3 values of bacteria, yeast, nematode, Drosophila, and maize (Table 1). Thus, different genes of individual vertebrates have widely different base compositions, particularly at the third codon position. Moreover, as reported by Bernardi et al. (32) and Ikemura (33), a stretch of DNA with a given G + C content usually covers regions larger than individual genes, probably including several genes. In this connection, Hartley and Callan (35) reported that the giant chromatin loop of a newt lampbrush chromosome has a base composition different from those of other loops. Such compositionally distinct domains (on the order of 200 kilobases) were termed "isochores" by Bernardi et al. (32). Within a domain, the third codon position, introns, and 5' and 3' flanking regions show concertedly high or low G + C contents (32, 33). In this sense, the situation is very similar to the interspecies relationship found in bacteria. These results indicate that each domain of a vertebrate chromosome may have a common, but unique, μD value. Recently, Nomura et al. (36) reported that the bacterial intraspecific heterogeneity of DNA G + C content, although considerably smaller than that of vertebrates, is also significant and regional.
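A worked example shows how strongly selective constraints compress the response of the first two codon positions to the wide P3 range seen among vertebrate genes. The sketch applies Eq. 4b for v/u and Eq. 5a for the expected P12, using the human values ε12 = 0.194 and EP = 0.447 from Table 2; treating these two constants as applying unchanged across all human genes is an assumption of the illustration, and the names are hypothetical.

```python
"""Worked numbers (sketch): expected P12 and implied v/u as P3 spans the
0.3-0.9 range among vertebrate genes, via Eqs. 4b and 5a.

Assumption: the human eps12 and E_P from Table 2 hold for every gene.
Names are hypothetical.
"""

EPS12_HUMAN = 0.194
EP_HUMAN = 0.447


def expected_p12(p3: float, eps12: float = EPS12_HUMAN, e_p: float = EP_HUMAN) -> float:
    """Eq. 5a with observed P3 in place of its equilibrium value."""
    return e_p + eps12 * (p3 - e_p)


def v_over_u(p3: float) -> float:
    """Eq. 4b with observed P3 in place of its equilibrium value."""
    return p3 / (1.0 - p3)


if __name__ == "__main__":
    for p3 in (0.3, 0.5, 0.7, 0.9):
        print(f"P3={p3:.1f}  v/u={v_over_u(p3):5.2f}  expected P12={expected_p12(p3):.3f}")
```

Across this range v/u changes about 21-fold (from roughly 0.43 to 9.0), while the expected P12 moves only from about 0.42 to 0.53, which is the pattern of large P3 heterogeneity with comparatively stable P12 described above.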

[Figure 3: panels for E. coli and Human; G + C content of the first and second codon positions plotted against G + C content of third position.]

FIG. 3. Comparison of G + C content distributions of genes between E. coli and human. Regression lines of the G + C content of the first (P1), the second (P2), and the average of the first and second codon positions (P12) against the G + C content of the third codon position (P3) among different genes of human are also shown. The E. coli data do not have enough spread in P3 to show a significant regression coefficient. The regression coefficient (ε12) of the average G + C content of the first and second codon positions (P12) for the human data is 0.194 ± 0.020, and the equilibrium point (EP) is 0.447 ± 0.011 (see also Table 2). Each datum point was weighted by the number of codons of the gene for calculating the regression coefficient.

DISCUSSION

Both the directional mutation theory and the neutral theory base their arguments on the selectively neutral changes by mutation that play a major role in changing DNA base sequence. In that sense, the two theories belong to the same category. The most fundamental difference between the two theories lies in the treatment of the mutation effect itself. The directional mutation theory treats mutations as bidirectional events, as indicated in Eqs. 1 and 2. In addition, the theory incorporates the effect of selective constraints as an integral part. For each mutational event, the principle of random drift should be applicable. In the neutral theory, however, the directionality and uniformity of mutation within a species or within a domain are not taken into account.

The three constants ε12, ε123, and EP appear to be universal, as shown by the fact that, in the coding areas, they are similar both among different bacterial species and among different genes of individual vertebrates (Table 2). The near constancy of these values suggests that ε12, ε123, and EP reflect the nature of the code and the physicochemical constraints necessary for proteins to be stable and functional in the cytoplasm. This point is further supported by the data on P12 and P3 in yeast, nematode, and Drosophila presented in Fig. 4, where combining the data for the three organisms yields another set of ε12 (0.137 ± 0.030), ε123 (0.425 ± 0.020), and EP (0.428 ± 0.033) that fall within the range found in bacteria and vertebrates (Table 2). The extreme values (~25% and ~75%) in the G + C content of the coding area in bacteria, protozoa, and algae (10-13) are those corresponding to μD = 0 and μD = 1. This can be explained as the natural consequence of the universal values of ε123 and EP and the relatively small proportion of noncoding areas in these organisms (Fig. 1).

To explain the G + C heterogeneity, Cox (38) wrote in 1972 that mutation rates may vary in different parts of the chromosome and that genes may be organized on the chromosome by selection according to the extent of their functional constraints. Two plausible explanations based on directional mutation pressure for the wide intraspecific heterogeneity of P3 among proteins of higher vertebrates are as follows. (i) There might be several different directional mutation pressures (not necessarily mutation rates) in different locations on the genome, and the cause of this difference might reside in the local structural elements of the chromatin. Thus, the major mutagenic events (DNA replication and repair) may act differently in different domains. (ii) DNA replication and DNA repair synthesis may make replication errors differently, and the extent of DNA repair synthesis may vary among domains of the chromatin because of differences in the susceptibility of DNA to damage and repair due to differences in chromatin structure. Clarification of chromosome domains is an important future consideration. Whether or not domains are equivalent to loop structures of the chromatin (39) remains to be seen.

[Figure 4: P12 plotted against G + C content of third position for yeast, nematode, and D. melanogaster.]

FIG. 4. Combined distribution of P12 of yeast, nematode, and Drosophila. Distributions of P12 values of yeast, nematode, and D. melanogaster are combined. ε12, ε123, and EP values calculated by the least-squares method without weighting each datum point by the number of codons are 0.137 ± 0.030, 0.425 ± 0.020, and 0.428 ± 0.033, respectively. The point marked with an arrow represents the average value of the collagen genes (col-1 and col-2) of C. elegans (37); this point was eliminated from the calculation of the regression coefficient.

I am grateful to those people who have helped me by supplying data as acknowledged in this article and to Shawn Elliott, Charles Hambleton, and Todd Devine for their patient assistance in data calculation. I am much indebted also to those in my laboratory and in the Department for their help in improving the manuscript. Constructive comments by Drs. James F. Crow and Norman H. Horowitz on this paper are gratefully acknowledged. This work came out of the research supported by National Science Foundation Grant G-15080 (to N.S.) during the period 1959-1962.

1. Kimura, M. (1968) Nature (London) 217, 624-626.
2. Kimura, M. (1983) The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, Cambridge, U.K.).
3. King, J. L. & Jukes, T. H. (1969) Science 164, 788-798.
4. Wright, S. (1938) Science 87, 430-431.
5. Sueoka, N. (1962) Proc. Natl. Acad. Sci. USA 48, 582-592.
6. Freese, E. (1962) J. Theor. Biol. 3, 82-101.
7. Sueoka, N., Marmur, J. & Doty, P. (1959) Nature (London) 183, 1429-1431.
8. Rolfe, R. & Meselson, M. (1959) Proc. Natl. Acad. Sci. USA 45, 1039-1043.
9. Sueoka, N. (1959) Proc. Natl. Acad. Sci. USA 45, 1480-1490.
10. Sueoka, N. (1961) J. Mol. Biol. 3, 31-40.
11. Lee, K. Y., Wahl, R. & Barbu, E. (1956) Ann. Inst. Pasteur Paris 91, 212-224.
12. Belozersky, A. N. & Spirin, A. S. (1958) Nature (London) 182, 111-112.
13. Bak, A. L., Atkins, J. F. & Meyer, S. A. (1972) Science 175, 1391-1393.
14. Sueoka, N. (1961) Proc. Natl. Acad. Sci. USA 47, 1141-1149.
15. Sueoka, N. (1961) Cold Spring Harbor Symp. Quant. Biol. 26, 35-43.
16. Muto, A. & Osawa, S. (1987) Proc. Natl. Acad. Sci. USA 84, 161-165.
17. Bibb, M. J., Bibb, B. J., Ward, J. M. & Cohen, S. N. (1985) Mol. Gen. Genet. 199, 26-36.
18. Jukes, T. H. & Bhushan, V. (1986) J. Mol. Evol. 24, 39-44.
19. Clary, D. O. & Wolstenholme, D. R. (1985) J. Mol. Evol. 22, 252-271.
20. Helftenbein, E. (1985) Nucleic Acids Res. 13, 415-433.
21. Preer, J. R., Preer, L. B., Rudman, B. M. & Barnett, A. J. (1985) Nature (London) 314, 188-190.
22. Kuchino, Y., Hanyu, N., Tashiro, F. & Nishimura, S. (1985) Proc. Natl. Acad. Sci. USA 82, 4758-4762.
23. Jukes, T. H., Osawa, S. & Muto, A. (1987) Nature (London) 325, 668.
24. Cox, E. C. & Yanofsky, C. (1967) Proc. Natl. Acad. Sci. USA 58, 1895-1902.
25. Treffers, H. P., Spinelli, V. & Belser, N. O. (1954) Proc. Natl. Acad. Sci. USA 40, 1064-1071.
26. Yanofsky, C., Cox, E. C. & Horn, V. (1966) Proc. Natl. Acad. Sci. USA 55, 274-281.
27. Maruyama, T., Gojobori, T., Aota, S. & Ikemura, T. (1986) Nucleic Acids Res. 14, Suppl., r151-r197.
28. Normore, W. N. & Brown, J. R. (1970) in Handbook of Biochemistry: Selected Data for Molecular Biology, ed. West, R. C. (Chemical Rubber Co., Cleveland, OH), 2nd Ed., pp. H24-H103.
29. Bilofsky, H. S., Burks, C., Fickett, J. W., Goad, W. B., Lewitter, F. L., Rindone, W. P., Swindell, C. D. & Tung, C. S. (1986) Nucleic Acids Res. 14, 1-4.
30. Jukes, T. H. (1965) Am. Sci. 53, 477-487.
31. Yanofsky, C. & van Cleemput, M. (1982) J. Mol. Biol. 154, 235-246.
32. Bernardi, G., Olofsson, B., Filipski, J., Zerial, M., Salinas, J., Cuny, G., Meunier-Rotival, M. & Rodier, F. (1985) Science 228, 953-958.
33. Ikemura, T. (1985) Mol. Biol. Evol. 2, 13-34.
34. Meyer, S. L. (1975) Data Analysis for Scientists and Engineers (Wiley, New York), pp. 39-41.
35. Hartley, S. E. & Callan, H. G. (1978) J. Cell Sci. 34, 279-288.
36. Nomura, M., Sor, F., Yamagishi, M. & Lawson, M. (1987) Cold Spring Harbor Symp. Quant. Biol. 50, in press.
37. Kramer, J. M., Cox, G. N. & Hirsh, D. (1982) Cell 30, 599-606.
38. Cox, E. C. (1972) Nature (London) 239, 133-134.
39. Gasser, S. M. & Laemmli, U. K. (1987) Trends Genet. 3, 16-22.