Intra- and Interspecific DNA Variation and Codon Bias of the Alcohol Dehydrogenase (Adh) Locus in Arabis and Arabidopsis Species

Naohiko T. Miyashita,* Akira Kawabe,* Hideki Innan,*1 and Ryohei Terauchi†2
*Laboratory of Genetics, Graduate School of Agriculture, and †Laboratory of Plant Systematics, Faculty of Science, Kyoto University, Japan

Sequence variation at the alcohol dehydrogenase (Adh) locus was analyzed for six species each of the genera Arabis and Arabidopsis. Phylogenetic analysis showed that the investigated species were grouped into three clusters, and the generic classification did not correspond to the clustering. The results indicated that the genera could not be distinguished on the basis of the Adh variation. A significant difference in the ratio of silent to replacement sites was detected by the MK test in two comparisons with Arabidopsis thaliana polymorphism, owing to excess silent divergence. Silent changes were predominant in the evolution of the Adh locus in Arabis and Arabidopsis. To infer the evolutionary significance of silent substitutions, codon bias was studied. The degree of codon bias of the Adh region was relatively constant over Arabis and Arabidopsis species. "Preferred" codons of A. thaliana were determined. No evidence of natural selection on codon change was detected in the Adh regions of A. thaliana and Arabis gemmifera.

Key words: Adh, Arabis, Arabidopsis, phylogeny, codon bias.

Address for correspondence and reprints: Naohiko Miyashita, Laboratory of Plant Genetics, Graduate School of Agriculture, Kyoto University, Sakyo-ku, Kyoto 606-01, Japan. E-mail: [email protected].

1 Present address: Department of Biological Sciences, Graduate School of Science, University of Tokyo, Japan.
2 Present address: Biocenter, University of Frankfurt, Germany.

Mol. Biol. Evol. 15(11):1420-1429. 1998
© 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

Introduction

Arabidopsis thaliana provides an excellent opportunity to study DNA variation in natural plant populations. This species has been used as a model system for plant molecular biology (Meyerowitz and Somerville 1994). Accumulating sequence information on various genes can be readily utilized to study sequence variation of A. thaliana and its related species.

So far, only a few nuclear genes of this plant species have been analyzed from the perspective of population and evolutionary genetics. In a worldwide sample of A. thaliana, dimorphism of DNA variations was detected in the Adh region, i.e., two distinct arrays of DNA polymorphisms segregated throughout the region (Innan et al. 1996). Dimorphism was detected in the acidic chitinase (ChiA) region as well (Kawabe et al. 1997). These results suggest that dimorphism of DNA variations is a characteristic of the nuclear genome of this plant species. To explain the dimorphism, we hypothesized that fusion of two divergent populations occurred in the history of A. thaliana.

Levels of nucleotide variation at the Adh and ChiA regions in A. thaliana were comparable with those reported for other plant and Drosophila nuclear genes. The neutral mutation hypothesis (Kimura 1983) was not rejected in the Adh region, while significant deviation was detected in the ChiA region. We also compared intra- and interspecific variation at the two loci of A. thaliana and related Arabis species to infer the genetic mechanism acting on DNA variation (Miyashita, Innan, and Terauchi 1996; Kawabe et al. 1997). It was shown that the level and pattern of DNA polymorphism of the two loci were consistent with those of divergence between these species, except for an excess of replacement polymorphism in Arabis gemmifera.

The purpose of this study was to investigate genetic mechanisms acting on DNA variation in the Adh regions of Arabidopsis and Arabis species by comparing patterns and levels of intra- and interspecific DNA variation in these two genera. Knowledge of the phylogenetic relationship between species is important in interspecific comparisons, e.g., HKA (Hudson, Kreitman, and Aguade 1987) and MK (McDonald and Kreitman 1991) tests. Although Hanfstingl et al. (1994) compared a portion of exon 4 of the Adh to investigate the phylogenetic relationship between A. thaliana and related species, an analysis of a longer sequence is necessary to obtain a reliable phylogeny. First, we investigated the phylogenetic relationship of six species each of Arabis and Arabidopsis based on variation in the Adh region. We found that synonymous changes were predominant in sequence evolution of the Adh regions of these plant species. To investigate the evolutionary significance of synonymous changes, we analyzed codon bias of the Adh region.

Materials and Methods

Plant Materials

Six species each of the genera Arabis and Arabidopsis were used (table 1). Some of the species are polyploid, with a basic chromosome number of 16. The nature of polyploidy (auto- and alloploidy) and genome constitution have not been investigated. Arabidopsis suecica has been suggested to be amphiploid between A. thaliana and Cardaminopsis arenosa (Hylander 1957; Rédei 1972; O'Kane, Schaal, and Al-Shehbaz 1996). The A. thaliana and A. gemmifera materials were described previously (Miyashita, Innan, and Terauchi 1996; Innan et al. 1996). Seeds of the other Arabidopsis species were obtained from Professor N. Goto, Sendai Arabidopsis Seed Stock Center, Miyagi University of Education. Plants were grown in pots placed in an incubator under 24-h light conditions.

Table 1
Arabidopsis and Arabis Species Studied
Genus | Species | N | Sampling Location | 2n | Distribution
Arabidopsis | A. thaliana | 17 | See Innan et al. (1996) | 10 | Eurasia, North Africa
Arabidopsis | A. himalaica | 1 | Ganesh Himal, Nepal (JOS18)a | 16 | China, West Himalaya, Pakistan
Arabidopsis | A. wallichii | 1 | Unknown (JS5)a | 16 | Afghanistan, Central Asia, China, West Himalaya, Iran, Pakistan
Arabidopsis | A. korshinskyi | 1 | Unknown (JS4)a | 48 | Central Asia
Arabidopsis | A. griffithiana | 1 | Unknown (JS3)a | 32 | Afghanistan, Central and Southwest Asia, China, Iran, Pakistan, East Russia
Arabidopsis | A. suecica | 1 | Finland (JS6)a | 26 | North Europe
Arabis | A. gemmifera | 1 | Minou, Osaka Prefecture, 1994 | 16 | East Asia (Japan, Korea)
Arabis | A. lyrata subsp. kawasakiana | 1 | Oyodo, Mie Prefecture, 1992 | 32 | East Asia, North America
Arabis | A. glabra | 1 | Maiko, Shiga Prefecture, 1992 | 12 | Eurasia (Europe, Asia)
Arabis | A. hirsuta subsp. japonica | 1 | Yushima, Yamanashi Prefecture, 1994 | ? | East Asia (Japan, Korea, Manchuria)
Arabis | A. stelleri | 1 | Hamakurosaki, Toyama Prefecture, 1992 | 32 | East Asia (Japan, Korea, Kamchatka)
Arabis | A. flagellosa | 1 | Kifune, Kyoto Prefecture, 1992 | ? | East Asia (Japan)
NOTE: N = the number of accessions analyzed; 2n = diploid chromosome number.
a Code number of the Sendai Arabidopsis Seed Stock Center.

The other Arabis species were sampled
in Japan (table 1). As an outgroup for interspecific comparison, Brassica oleracea (cabbage) purchased at a market was used.

DNA Sequencing

Total DNAs of the Arabis and Arabidopsis species and B. oleracea were purified by a modified CTAB method (Terauchi and Konuma 1994) and used for PCR amplification of about 1.7 kb of the Adh gene. Primers for PCR amplification were 5'-ACC ACC GGA CAG ATT ATT CG-3' and 5'-CAC CCA TGG TGA TGA TGC ACC-3', which were located in the first and last (seventh) exons of the A. thaliana Adh region, respectively (Chang and Meyerowitz 1986). Because the number of Adh genes in these species was not known, direct sequencing was not possible. The PCR products were cloned into plasmid pUC18. One clone was sequenced for each species and used only for interspecific comparisons. The sequencing reaction followed the manufacturer's protocol (Pharmacia ALFexpress, AutoRead). Nucleotide sequences were determined in both strands by a Pharmacia ALFred sequencer. Newly determined Adh sequences were deposited in the DDBJ/GenBank/EMBL databases under accession numbers AB015498-AB015508.

Data Analyses

Program package DnaSP, version 2.5 (Rozas and Rozas 1997), was used to analyze intra- and interspecific variation by estimation of nucleotide diversity (π; Nei and Li 1979) and the tests of Hudson, Kreitman, and Aguade (1987), Tajima (1989), McDonald and Kreitman (1991), and Fu and Li (1993). Phylogenetic analyses, consisting of estimation of the number of nucleotide substitutions per site (Kimura 1980; Nei and Gojobori 1986) and construction and bootstrap probability estimation of the neighbor-joining (NJ) tree (Saitou and Nei 1987), were conducted with PHYLIP, version 3.57 (Felsenstein 1993), and MEGA, version 1.0.1 (Kumar, Tamura, and Nei 1993). A parsimony tree was obtained using PAUP 3.1.1 (Swofford 1993) with the heuristic search option. To analyze codon bias and GC content, MEGA and CODONS (Lloyd and Sharp 1992) were used. Codon usage data for A. thaliana were obtained from the CUTG (Codon Usage Tabulated from GenBank; Nakamura, Gojobori, and Ikemura 1997). After eliminating cDNA sequences of unknown function and YAC and BAC ORF sequences determined by the Arabidopsis Genome Initiative (AGI), 1,491 nuclear coding sequences were analyzed. To determine "preferred" codons in synonymous families, we compared codon usage frequencies in high- and low-biased genes, following the method of Sharp and Lloyd (1993) and Akashi (1994). For a synonymous family, codons showing significant increases in frequency between low- and high-biased genes examined by the G-test were considered "preferred." Significance of the G-test was examined by the sequential Bonferroni test (Rice 1989).

Results

Phylogenetic Relationship of Arabis and Arabidopsis Species

In the investigated Arabis and Arabidopsis species, the Adh genes have seven exons and six introns (table 2). Every intron starts with dinucleotide GT and ends with AG, which is typical of eukaryotes. Lengths of introns, especially intron 1, vary among species. It was not straightforward to align each intron over the 12 species. In A. thaliana, introns 2, 3, and 6 vary in length. This variation is associated with dimorphic indels (Innan et al. 1996). In A. gemmifera, the sizes of four introns vary, mainly owing to the variation in repeats of T (Miyashita, Innan, and Terauchi 1996).

The number of nucleotide substitutions per site (Nei and Gojobori 1986) in the coding region of the Adh gene between each pair of the 12 species was estimated (table 3). Generally, the number of synonymous substitutions per site (Ks) is more than 10-20 times as large as that of replacement substitutions per site (Ka). With
respect to Ks, A. thaliana has shorter distances to A. suecica (0.200), A. gemmifera (0.194), and Arabis lyrata (0.193) than to the other species. However, these distances are longer than that (0.046) between the two most divergent sequence types (1 and 6) of A. thaliana (Landsberg and Ci-0) (Innan et al. 1996). Distances between A. gemmifera and A. lyrata (0.029) and between Arabidopsis wallichii and Arabidopsis himalaica (0.046) are at the same level as that between the two sequence types of A. thaliana. It is notable that the distance between Arabis hirsuta and Arabis stelleri is extremely short (0.004).

Table 2
Exon-Intron Structures of the Adh Loci of Arabidopsis and Arabis Species
Exon lengths: Exon 1 (34), Exon 2 (137), Exon 3 (47), Exon 4 (485), Exon 5 (158), Exon 6 (162), Exon 7 (114)
Species | Intron 1 | Intron 2 | Intron 3 | Intron 4 | Intron 5 | Intron 6
Arabidopsis
A. thaliana | 132 | 86 and 103 | 86 and 87 | 86 | 86 | 74 and 79
A. himalaica | 106 | 85 | 87 | 84 | 87 | 78
A. wallichii | 106 | 85 | 88 | 84 | 87 | 91
A. korshinskyi | 172 | 86 | 101 | 85 | 89 | 80
A. griffithiana | 165 | 94 | 91 | 85 | 88 | 79
A. suecica | 160 | 84 | 82 | 85 | 87 | 74
Arabis
A. gemmifera | 169~174 | 87 and 88 | 85 and 86 | 85 and 90 | 87 | 73
A. lyrata | 179 | 85 | 96 | 85 | 87 | 73
A. glabra | 163 | 95 | 95 | 86 | 87 | 72
A. hirsuta | 123 | 90 | 80 | 90 | 90 | 77
A. stelleri | 123 | 91 | 80 | 90 | 88 | 77
A. flagellosa | 83 | 92 | 88 | 91 | 88 | 79
NOTE: The numbers of nucleotides are shown.

Table 3
The Number of Substitutions Per Site Between Each Pair of Species
Species | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14
1. Arabidopsis thaliana (Ci-0) | - | 0.004 | 0.021 | 0.022 | 0.011 | 0.017 | 0.015 | 0.017 | 0.021 | 0.017 | 0.026 | 0.024 | 0.019 | 0.026
2. Arabidopsis thaliana (Ldbg)a | 0.046 | - | 0.019 | 0.021 | 0.010 | 0.016 | 0.013 | 0.016 | 0.019 | 0.016 | 0.025 | 0.023 | 0.018 | 0.025
3. Arabidopsis himalaica | 0.327 | 0.320 | - | 0.005 | 0.012 | 0.016 | 0.016 | 0.016 | 0.019 | 0.016 | 0.024 | 0.023 | 0.016 | 0.024
4. Arabidopsis wallichii | 0.339 | 0.320 | 0.046 | - | 0.013 | 0.017 | 0.012 | 0.015 | 0.018 | 0.017 | 0.028 | 0.027 | 0.019 | 0.025
5. Arabidopsis korshinskyi | 0.314 | 0.326 | 0.308 | 0.301 | - | 0.006 | 0.006 | 0.008 | 0.012 | 0.011 | 0.018 | 0.018 | 0.013 | 0.015
6. Arabidopsis griffithiana | 0.320 | 0.339 | 0.296 | 0.289 | 0.104 | - | 0.012 | 0.012 | 0.016 | 0.015 | 0.025 | 0.024 | 0.018 | 0.020
7. Arabidopsis suecica | 0.195 | 0.205 | 0.315 | 0.320 | 0.284 | 0.296 | - | 0.005 | 0.008 | 0.015 | 0.019 | 0.019 | 0.012 | 0.016
8. Arabis gemmifera | 0.194 | 0.194 | 0.290 | 0.283 | 0.243 | 0.260 | 0.059 | - | 0.008 | 0.015 | 0.021 | 0.022 | 0.015 | 0.018
9. Arabis lyrata | 0.190 | 0.195 | 0.297 | 0.290 | 0.250 | 0.285 | 0.060 | 0.029 | - | 0.018 | 0.025 | 0.026 | 0.019 | 0.019
10. Arabis glabra | 0.283 | 0.283 | 0.226 | 0.215 | 0.204 | 0.210 | 0.226 | 0.194 | 0.200 | - | 0.026 | 0.022 | 0.021 | 0.024
11. Arabis hirsuta | 0.406 | 0.428 | 0.420 | 0.433 | 0.413 | 0.406 | 0.332 | 0.306 | 0.339 | 0.357 | - | 0.005 | 0.012 | 0.028
12. Arabis stelleri | 0.398 | 0.419 | 0.406 | 0.418 | 0.398 | 0.391 | 0.320 | 0.295 | 0.327 | 0.344 | 0.004 | - | 0.012 | 0.029
13. Arabis flagellosa | 0.451 | 0.474 | 0.444 | 0.457 | 0.454 | 0.450 | 0.408 | 0.384 | 0.413 | 0.428 | 0.092 | 0.091 | - | 0.023
14. Brassica oleracea | 0.686 | 0.655 | 0.753 | 0.727 | 0.772 | 0.773 | 0.646 | 0.637 | 0.660 | 0.650 | 0.744 | 0.738 | 0.802 | -
NOTE: Ks values are presented below the diagonal, and Ka values are presented above.
a Ldbg = the ecotype Landsberg.

Phylogenetic trees (fig. 1) were constructed by the neighbor-joining (Saitou and Nei 1987) and parsimony (Swofford 1993) methods based on nucleotide variation in the Adh coding sequence. For the NJ tree, distances estimated by Kimura's (1980) two-parameter method were used. By the parsimony method, two most-parsimonious trees were obtained, which differ in the position of Arabis glabra. One of them has a topology identical to that of the NJ tree. In the NJ and parsimony trees, the 12 species of Arabis and Arabidopsis are grouped into three clusters, which were well supported with high bootstrap probabilities. Arabidopsis thaliana is included in a cluster with A. suecica, A. lyrata, and A. gemmifera, but A. thaliana is relatively isolated. These four species are distributed in distant regions of the world. The second cluster consisted of species from central Asia and China (the Himalayan Arabidopsis) and a cosmopolitan species, A. glabra. Although central Asia and the Himalaya are the center of diversity of the genus Arabidopsis (Price, Palmer, and Al-Shehbaz 1994), A. thaliana is not included in the cluster of Himalayan Arabidopsis. The third cluster includes the East Asian Arabis species. It is clear that the genera Arabis and Arabidopsis are not distinguished from each other based on the Adh variation. When Ks was used for tree construction by the NJ method, the topology was the same as that shown. On the other hand, if Ka or distance obtained from amino acid sequence was used, the position of A. thaliana changed, although the number of clusters was the same. Arabidopsis thaliana was included in cluster II and became a sister group to A. glabra. However, bootstrap probability for each of the three clusters in the tree based on amino acid sequence was low. It should be mentioned that the Adh genes sequenced may not be orthologous because of high polyploidy in some of the species studied. Therefore, the phylogenetic trees obtained should be viewed with caution.

FIG. 1. 1, Neighbor-joining tree of 12 Arabis and Arabidopsis species. Bootstrap probabilities >50% are shown above branches. 2, One of two most-parsimonious trees. Bootstrap probabilities >50% are shown above branches, and numbers of mutations are shown below. Ecotype name of A. thaliana is shown in parentheses. Ldbg = the ecotype Landsberg. An accession sampled at Minou, Osaka Prefecture (Miyashita, Innan, and Terauchi 1996), was used for A. gemmifera.

A relative-rate test (Tajima 1993) was conducted to examine variation in substitution rates among lineages of the 12 species studied. To compare two species in a cluster, an outgroup species was chosen from the other clusters. Among 243 tests of this kind, 38 comparisons gave significant results, of which 33 were detected in the comparisons involving A. suecica and A. gemmifera. Some of the significant results are shown (table 4). The results indicate that substitution rates in the lineages to A. gemmifera and A. suecica are slower than those in the other lineages.

Table 4
Summary of Relative-Rate Tests on 12 Species of Arabis and Arabidopsis
Species 0 | Species 1 | Species 2 | No. of Sites Compared | m0 | m1 | m2 | s0 | s1 | s2 | v0 | v1 | v2 | 1D | 2D
Arabidopsis griffithiana | Arabidopsis thaliana | A. gemmifera | 1,080 | 43 | 34 | 16 | 29 | 24 | 12 | 14 | 10 | 4 | 6.48* | 6.57*
A. griffithiana | Arabis gemmifera | Arabis lyrata | 1,080 | 60 | 1 | 10 | 40 | 1 | 6 | 20 | 0 | 4 | 7.36** | 7.57*
Arabidopsis korshinskyi | A. thaliana | A. gemmifera | 1,080 | 36 | 33 | 18 | 25 | 23 | 13 | 11 | 10 | 5 | 4.41* | 4.44 NS
Arabidopsis wallichii | A. thaliana | A. gemmifera | 1,080 | 48 | 34 | 17 | 33 | 25 | 14 | 15 | 9 | 3 | 5.67* | 6.10*
Arabis hirsuta (or Arabis stelleri) | A. thaliana | Arabidopsis suecica | 1,065 | 61 | 33 | 15 | 37 | 22 | 11 | 24 | 11 | 4 | 6.75** | 6.93*
A. hirsuta (or A. stelleri) | A. thaliana | A. gemmifera | 1,065 | 60 | 36 | 14 | 35 | 25 | 12 | 25 | 11 | 2 | 9.68** | 10.8**
A. hirsuta (or A. stelleri) | A. gemmifera | A. lyrata | 1,065 | 75 | 1 | 11 | 46 | 1 | 6 | 29 | 0 | 5 | 8.33** | 8.57*
Arabis flagellosa | A. thaliana | A. gemmifera | 1,080 | 65 | 34 | 17 | 37 | 23 | 14 | 28 | 11 | 3 | 5.67* | 6.76*
NOTE: Species 0 is the outgroup. mi, si, and vi are numbers of total, transition, and transversion substitutions, respectively, found only in species i (i = 0, 1, and 2). 1D, one-degree-of-freedom method; 2D, two-degree-of-freedom method; * P < 0.05; ** P < 0.01; NS, nonsignificant.

Intraspecific Variation in A. thaliana and Interspecific Variation Among Arabis and Arabidopsis Species

Levels of nucleotide variation for A. thaliana (Innan et al. 1996) were reanalyzed for the entire region and for the exon, intron, and flanking regions (table 5).
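Table 5 reports nucleotide diversity and Tajima's D for each region. As a minimal sketch of how such quantities follow from an alignment, the Python below computes the mean number of pairwise differences (k), the number of segregating sites (S), and Tajima's (1989) D. The function names and the toy alignment are illustrative assumptions, not the Adh data or the DnaSP implementation, and sites with alignment gaps are assumed to have been removed beforehand.

```python
from itertools import combinations
from math import sqrt


def pairwise_diversity(seqs):
    """Return (k, S) for aligned, gap-free sequences.

    k is the mean number of pairwise differences; S is the number of
    segregating (polymorphic) sites.
    """
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    k = total / len(pairs)
    s_sites = sum(len(set(column)) > 1 for column in zip(*seqs))
    return k, s_sites


def tajimas_d(n, s_sites, k):
    """Tajima's (1989) D from sample size n, segregating sites S, and k."""
    if n < 4:
        raise ValueError("the variance term is zero for fewer than four sequences")
    if s_sites == 0:
        return float("nan")  # D is undefined without polymorphism
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * s_sites + e2 * s_sites * (s_sites - 1)
    return (k - s_sites / a1) / sqrt(variance)


# Toy alignment (hypothetical): four sequences, two segregating sites.
toy = ["ATGCATGC", "ATGCATGA", "ATGTATGC", "ATGTATGA"]
k, s_sites = pairwise_diversity(toy)
pi = k / len(toy[0])  # nucleotide diversity per site
print(f"S = {s_sites}, k = {k:.3f}, pi = {pi:.4f}, "
      f"D = {tajimas_d(len(toy), s_sites, k):.3f}")
```

A positive D, as in this toy case, means that pairwise diversity exceeds what the number of segregating sites would predict under neutrality; a negative D means the reverse.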

Table 5
Level of DNA Polymorphism in the Adh Region of Arabidopsis thaliana
A. thaliana (n = 17) | 5'-Flanking Region | Exon Region | Intron Region | 3'-Flanking Region | Entire Region
No. of nucleotides | 399 | 1,137 (252.7)a | 549 | 295 | 2,380
No. of polymorphic sites | 24 | 20 (13, 7)b | 21 | 11 | 76
π (synonymous site) | 0.0128 | 0.0192 (0.0022)c | 0.0082 | 0.0090 | 0.0115 (0.0080)d
Tajima test (D) | -1.13 NS | 0.73 NS | -1.10 NS | -0.68 NS | -0.58 NS
Fu and Li test (D*) | -2.22 NS | 0.07 NS | -1.53 NS | 0.61 NS | -1.05 NS
NOTE: NS indicates P > 0.05.
a The number of silent sites.
b (No. of silent sites, no. of replacement sites).
c π (nonsynonymous site).
d π for pooled data.
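The exon column of table 5 lists 13 silent and 7 replacement polymorphic sites. The McDonald and Kreitman (1991) comparison contrasts such counts with silent and replacement divergence in a 2 x 2 table. The sketch below applies a two-sided Fisher's exact test to that table. Fisher's exact test is an assumption made here for illustration, since the text does not state which statistic produced the significance levels in table 6, and `fisher_exact_two_sided` is a hypothetical helper rather than part of any package named in the text. The divergence counts are those listed in table 6 for A. korshinskyi and A. himalaica.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows are polymorphism and divergence; columns are silent and
    replacement changes. Sums the probabilities of all tables with the
    same margins that are no more likely than the observed one.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    col2 = n - col1

    def weight(x):
        # Unnormalized hypergeometric probability of x silent polymorphisms.
        return comb(col1, x) * comb(col2, row1 - x)

    lowest = max(0, row1 - col2)
    highest = min(row1, col1)
    observed = weight(a)
    extreme = sum(w for w in (weight(x) for x in range(lowest, highest + 1))
                  if w <= observed)
    return extreme / comb(n, row1)


# A. thaliana polymorphism: 13 silent, 7 replacement (table 5).
# Divergence counts (silent, replacement) as listed in table 6.
divergence = {"A. korshinskyi": (59, 7), "A. himalaica": (59, 14)}
for species, (silent, replacement) in divergence.items():
    p = fisher_exact_two_sided(13, 7, silent, replacement)
    print(f"{species}: P = {p:.3f}")
```

Under this assumption the A. korshinskyi comparison falls below 0.05 and the A. himalaica comparison does not, in line with table 6. Borderline comparisons may behave differently under another statistic such as the G-test.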

The exon region has the highest level of silent-site nucleotide diversity. Tajima (1989) and Fu and Li (1993) tests were conducted for each region. Test measures were negative but not significant, except for the exon region, in which slightly positive values were detected. The positive values are consistent with high nucleotide diversity for silent sites in the exon.

To test if the high level of silent polymorphism in the exon region of A. thaliana is consistent with the level of divergence, the HKA test was conducted for the three species in cluster I, using each of the 5'-flanking, intron, and 3'-flanking regions as reference regions. Significance was not detected in any comparisons. This result indicates that the level of silent polymorphism in the exon of A. thaliana is consistent with the divergence level, compared with noncoding regions.

The MK test was conducted to examine if the ratio of silent to replacement substitutions is consistent between intra- and interspecific comparisons (table 6). In all of the interspecific comparisons, silent divergence was detected more than replacement divergence. Significance was detected in two comparisons (Arabidopsis korshinskyi and Arabis flagellosa), and bias was toward more silent divergence.

Table 6
McDonald and Kreitman Test (1991) for the Adh Loci of Arabis and Arabidopsis Species
Comparison | Silent | Replacement | Significance
Polymorphism
Arabidopsis thaliana | 13 | 7 |
Divergence
Arabidopsis himalaica | 59 | 14 | NS
Arabidopsis wallichii | 60 | 16 | NS
Arabidopsis korshinskyi | 59 | 7 | *
Arabidopsis griffithiana | 61 | 12 | NS
Arabidopsis suecica | 38 | 10 | NS
Arabis gemmifera | 36 | 10 | NS
Arabis lyrata | 36 | 15 | NS
Arabis glabra | 52 | 12 | NS
Arabis hirsuta | 74 | 19 | NS
Arabis stelleri | 74 | 18 | NS
Arabis flagellosa | 81 | 14 | *
NOTE: The numbers of sites are shown. NS, nonsignificant; * P < 0.05.

With respect to allozyme variation of the Adh region in natural populations of A. thaliana, Hanfstingl et al. (1994) suggested that two amino acid sites in exon 4 (amino acid positions 106 and 155) were responsible for three allozyme forms. There were four combinations of amino acids, Gln (CAG)-Pro (CCG), Lys (AAG)-Pro (CCG), His (CAC)-Pro (CCG), and His (CAC)-Arg (CGG), for super fast, fast, fast, and slow forms, respectively. As to the dimorphic variation (Innan et al. 1996), sequence type 1 had the super fast form, and sequence type 6 had the fast form (His [CAC]-Pro [CCG]). The slow form of Adh was not detected in our sample. The other Arabis and Arabidopsis investigated in this study (including B. oleracea) have the His-Pro combination of the fast form, although three Arabis in cluster III and B. oleracea have the codon CCT for Pro at position 155. These results suggest that the fast form, His (CAC)-Pro (CCG), is the ancestral form of the Adh gene of A. thaliana and the other three forms are derived. In A. gemmifera, no nucleotide polymorphism was detected at these positions.

Codon Bias in the Adh Regions of Arabis and Arabidopsis

The analyses above show that silent substitutions were predominant in the Adh sequence evolution among the species investigated. Silent substitutions could be subject to natural selection through the efficiency of translation due to codon bias in genes of Escherichia coli (Sharp and Li 1987) and Drosophila (Moriyama and Hartl 1993). It is of interest to investigate whether silent changes in the Adh loci of Arabis and Arabidopsis species are related to natural selection on codon usage.

To measure codon bias, the scaled χ2 (Shields et al. 1988) and the effective number of codons (Wright 1990) were calculated for the Adh regions of the investigated species and other plants (table 7). The codon bias of the Adh region in Arabis and Arabidopsis species is intermediate among plant species and is relatively constant over Arabis and Arabidopsis species. The mean and standard deviation of the scaled χ2 of A. thaliana nuclear genes (n = 1,491) were 0.277 ± 0.122 (fig. 2). The codon bias of the Adh region can be considered higher than the average of A. thaliana nuclear genes. In Arabis and Arabidopsis species, the GC content seems to be constant over codon positions. No correlation between codon bias and GC content at the third codon position (GC3) was detected for the Adh data of these plant species.

Table 7
Codon Bias and Nucleotide Composition in the Adh Regions of Arabis and Arabidopsis
Species | χ2 | ENC | GCex | GC3 | GCin
Arabidopsis
A. thaliana (Ci-0) | 0.336 | 49.52 | 0.48 | 0.47 | 0.31
A. thaliana (Landsberg) | 0.356 | 49.53 | 0.48 | 0.48 | 0.29
A. himalaica | 0.307 | 51.31 | 0.47 | 0.46 | 0.31
A. wallichii | 0.286 | 52.42 | 0.48 | 0.47 | 0.30
A. korshinskyi | 0.304 | 49.63 | 0.48 | 0.46 | 0.33
A. griffithiana | 0.331 | 48.06 | 0.48 | 0.46 | 0.29
A. suecica | 0.307 | 48.73 | 0.47 | 0.45 | 0.31
Arabis
A. gemmifera | 0.351 | 48.86 | 0.48 | 0.46 | 0.29
A. lyrata | 0.363 | 47.84 | 0.47 | 0.45 | 0.29
A. glabra | 0.298 | 50.73 | 0.48 | 0.46 | 0.29
A. hirsuta | 0.356 | 50.61 | 0.47 | 0.42 | 0.33
A. stelleri | 0.357 | 50.91 | 0.47 | 0.43 | 0.33
A. flagellosa | 0.293 | 53.03 | 0.47 | 0.44 | 0.32
Other plant species
Gossypium hirsutum | 0.235 | 53.31 | 0.48 | 0.45 | -
Malus domestica | 0.145 | 58.24 | 0.48 | 0.47 | -
Trifolium repens | 0.291 | 49.93 | 0.46 | 0.40 | -
Pisum sativum | 0.294 | 50.81 | 0.46 | 0.42 | -
Nicotiana tabacum | 0.217 | 55.74 | 0.47 | 0.43 | -
Solanum lycopersicum | 0.348 | 48.96 | 0.43 | 0.33 | -
Solanum tuberosum | 0.425 | 45.75 | 0.43 | 0.33 | -
Zea mays (Adh 1) | 0.363 | 48.06 | 0.55 | 0.66 | -
Z. mays (Adh 2) | 0.658 | 39.58 | 0.60 | 0.83 | -
Oryza sativa | 0.211 | 56.04 | 0.51 | 0.52 | -
Hordeum vulgare | 0.267 | 51.95 | 0.55 | 0.66 | -
Pinus banksiana | 0.246 | 51.82 | 0.45 | 0.41 | -
NOTE: χ2 is the scaled χ2 (Shields et al. 1988). ENC is the effective number of codons (Wright 1990). GCex, GC3, and GCin are the G+C contents of the exon, the third codon position, and the intron, respectively.

In Drosophila, changes to "preferred" and "nonpreferred" codons are related to weak positive and negative selection, respectively (Akashi 1995). A preferred codon(s) in a synonymous group was defined as one increasing in frequency with codon bias (Sharp and Lloyd 1993; Akashi 1994, 1995). We determined preferred codons of A. thaliana, except for a twofold degenerate amino acid, Cys (table 8). Preferred codons of twofold degenerate amino acids end with G or C at the third position, while those of fourfold degenerate amino acids end with C, U, or A, but not with G. This pattern is different from that of Drosophila (Akashi 1995), in which preferred codons always end with G or C.

We classified silent polymorphisms in A. thaliana (Innan et al. 1996) and A. gemmifera (Miyashita, Innan, and Terauchi 1996) and silent fixations between the two species into preferred and nonpreferred changes based on the direction of mutations inferred by the parsimony method (table 9). The ratio of preferred to nonpreferred changes was compared by χ2 tests for polymorphism between species, for fixation between lineages, and between polymorphism and fixation in each lineage. None of the tests gave a significant result.

Discussion

Phylogenetic Relationship Between Arabis and Arabidopsis Species

This study showed that the investigated Arabis and Arabidopsis species were grouped into three clusters. Arabidopsis thaliana clusters with Arabis and Arabidopsis species of diverse geographical origins, but A. thaliana is relatively isolated from the three species. In addition, A. thaliana is not included in the cluster formed by the Himalayan Arabidopsis species. This result agrees with the relationship inferred on the basis of variation at the rDNA (O'Kane, Al-Shehbaz, and Schaal 1995) and the chloroplast rbcL (Price, Palmer, and Al-Shehbaz 1994; Tsukaya et al. 1997) and supports the proposal that A. thaliana and the Himalayan Arabidopsis be classified into different genera (O'Kane, Al-Shehbaz, and Schaal 1995).

The three clusters detected in this study do not correspond to the generic classification of Arabis and Arabidopsis. The genera are morphologically similar and differ in only a few characters. The main morphological character to distinguish Arabis from Arabidopsis is the
Table 8 ``Preferred'' Codons in Arabidopsis thaliana Amino Acid Codon Low High G Phe ...... UUU 1,237 920 UUC 1,390 1,683 74.6* Leu ...... UUA 797 352 UUG 1,277 1,151 91.9* Leu ...... CUU 1,391 1,597 105.2* CUC 1,122 1,204 41.9* CUA 710 291 CUG 779 346 Ile ...... AUU 1,462 1,467 6.53* AUC 1,234 1,633 146.2* AUA 946 300 Met AUG Val...... GUU 1,546 1,899 43.8* GUC 836 1,160 56.5* GUA 663 277 GUG 1,182 1,022 Ser...... UCU 1,235 1,373 22.8* UCC 686 767 14.6* UCA 994 713 UCG 574 413 Pro ...... CCU 949 1,203 6.31* CCC 385 384 CCA 866 111 7.16* CCG 517 460 Thr ...... ACU 1,080 1,315 23.29* ACC 672 1,078 108.1* ACA 990 672 ACG 565 370 Ala ...... GCU 1,469 2,696 254.7* GCC 830 1,007 GCA 1,121 839 GCG 606 524 Tyr ...... UAU 853 581 UAC 923 1,228 95.0* FIG. 2.Ð The frequency distribution of codon bias of A. thaliana Ter...... UAA nuclear genes. In total, 1,491 nuclear coding sequences were used. UAG His ...... CAU 796 617 CAC 542 755 57.4* Gln ...... CAA 1,195 858 relative position of the cotyledons to the radicle in the CAG 959 1,162 70.7* seed (Hedge 1968). The results of this study suggest that Asn ...... AAU 1,343 705 this morphological difference is not suf®cient to classify AAC 1,403 1,520 152.0* the two groups of crucifers. Furthermore, from the tree, Lys ...... AAA 1,828 1,429 it could be speculated that we sequenced the Adh gene AAG 1,925 3,111 256.1* Asp ...... GAU 2,042 1,931 of C. arenosa origin. While analyzing a part of the Adh GAC 1,158 1,322 13.5* region, Hanfstingl et al. (1994) also noted that C. areno- Glu ...... GAA 2,075 1,640 sa was close to A. thaliana. The genus Cardaminopsis GAG 2,001 2,308 70.9* might be included in the group of Arabis and Arabidop- Cys ...... UGU 576 523 0.04 UGC 470 419 sis. Mixing of these three genera is consistent with the Ter...... UGA phylogenetic relationship of the Brassicaceae (Price, Trp ...... UGG Palmer, and Al-Shehbaz 1994; Tsukaya et al. 1997). Arg ...... 
              CGU     469     844   224.3*
              CGC     210     191
              CGA     344     190
              CGG     341     115
Ser ......    AGU     879     593
              AGC     749     663    13.0*
Arg ......    AGA   1,058     905
              AGG     737     740     5.4*
Gly ......    GGU   1,242   2,409   110.3*
              GGC     614     691
              GGA   1,383   2,224    13.7*
              GGG     752     479

NOTE: Low and High indicate the number of codons used in 83 low and 268 high biased genes, respectively.
* "Preferred" codon showing significant increase in frequency with codon bias. Significance of the G value was tested by the sequential Bonferroni test (Rice 1989).

Price, Palmer, and Al-Shehbaz (1994) suggested that tribal and generic classifications of the crucifers based on morphological characters need to be revised using molecular data. Our results also suggest the necessity for a new assessment of the classification of the Brassicaceae, at least Arabis and Arabidopsis.

As mentioned, sequenced Adh genes might be paralogous. The trees based on the Adh variation may not reflect the true phylogenetic relationship of these species, although the results obtained in this study were consistent with previous results based on other regions. It would be difficult to determine if nuclear genes are orthologous or paralogous because of high polyploidy in some Arabis and Arabidopsis species. To solve this problem, which is related to the origin of polyploidy (auto- or alloploidy), extensive molecular (library construction, cloning, and sequencing) and cytological (chromosome pairing) studies are necessary. These studies should be conducted in future, not only for phylogenetic analysis of these species, but also for an understanding of the mechanism of polyploidy evolution.

The relative-rate test indicated that evolutionary rates in the lineages to A. gemmifera and A. suecica (C. arenosa) are slower than those in the other lineages. These two species and A. lyrata form a subcluster in cluster I. Although significance was not detected for A. lyrata in the relative-rate tests, these three species seem to have shorter branch lengths. Heterogeneity of evolutionary rates is caused by several factors (Kimura 1983). The three species in the subcluster are distributed in different parts of the world (table 1). We cannot find any common feature among the species to explain the slower evolutionary rate at the Adh locus.

Evolutionary Change in Codon Bias of the Adh Loci of Arabis and Arabidopsis

In this study, significance was not detected in the tests of neutrality. This result suggests that the Adh genes in these plant species are evolving under neutrality. We found that silent changes were predominant in this gene. Silent changes could be characteristic of molecular evolution of the Adh gene in Arabis and Arabidopsis. To investigate possible evolutionary significance of silent substitutions, codon bias of the Adh was analyzed.

The frequency distribution of codon bias of A. thaliana nuclear genes was unimodal, and shifted to lower bias compared to those of E. coli, yeast, and Drosophila (Powell and Moriyama 1997). The codon bias of the Adh gene of A. thaliana is higher than the average codon bias of A. thaliana nuclear genes. The level of codon bias of the Adh gene in Arabis and Arabidopsis is intermediate among plant species, and is relatively constant in the genera. It has been noted that dicot genes have lower codon bias and GC content, especially GC3, than monocot genes (Murray, Lotzer, and Eberle 1989; Campbell and Gowri 1990). This difference in codon bias is not evident in the Adh regions of investigated plant species (table 7), although GC3 of the Adh in monocots (Poaceae) is higher than that of dicots (). The maize Adh2 gene has the highest codon bias and GC3 among the Adh genes studied, but codon bias of the Adh genes in the other monocots is relatively low. As in other organisms (Ikemura 1981; Shields et al. 1988), codon bias is related to the expression of genes (abundance of proteins) in plants (Murray, Lotzer, and Eberle 1989), and codon change influences the level of expression of plant genes (Perlak et al. 1991; Rouwendal et al. 1997). The relatively high bias in dicots, especially Arabis and Arabidopsis, in the Adh (or the relatively low codon bias of the monocot) may be related to evolutionary change in expression and function of the Adh gene in the plant classes.

The pattern of codon usage is determined by natural selection through translation efficiency, reflecting frequencies of isoaccepting tRNAs and mutation pressure (GC pressure) related to the GC content of the genome (Grantham et al. 1980; Ikemura 1985; Shields et al. 1988). For twofold degenerate families, A. thaliana and Drosophila have the same preferred codons, but the preferred codons of fourfold degenerate families in A. thaliana are different from those of Drosophila (Akashi 1994, 1995). The difference in GC contents between Drosophila and A. thaliana suggests that these organisms are under different mutation pressures. The identical codon preference of twofold degenerate families suggests that mutation pressure does not contribute to codon usage in twofold degenerate families in these organisms. This could be a result of natural selection by translation efficiency due to the availability of tRNA for twofold degenerate amino acids. On the other hand, in fourfold degenerate families in A. thaliana, many preferred codons end with A and U at the third position. This pattern corresponds with low GC content in A. thaliana. The difference in preferred codons suggests that the mutation pressure influences codon usage in fourfold degenerate families.

Table 9
Classification of Silent Changes in the Adh of Arabidopsis thaliana and Arabis gemmifera

                        A. thaliana   A. gemmifera   Total
Polymorphism
  Preferred ......           3              3           6
  Nonpreferred ......        7              2           9
  Neutral(a) ......          3              0           3
  Total ......              13              5          18
Fixation (lineage)
  Preferred(b) ......        6.5            2           8.5
  Nonpreferred ......        7              3          10
  Neutral ......            11              5          16
  Total ......              24.5           10          34.5

(a) Silent change between codons of the same type.
(b) In a codon, two nucleotide changes were detected, of which one silent change was assigned to the lineage of A. thaliana with a probability of 0.5.

In the comparisons of the ratio of preferred to nonpreferred changes, we could not detect any evidence of natural selection on codon changes, contrary to Drosophila (Akashi 1995). Our results may suggest that codon changes are selectively neutral in these plant species. Another possibility is that even if natural selection operates, direction and intensity of natural selection are the same in the studied species, such that the effect of selection is not manifested. To test natural selection on codon changes, we may need to study polymorphism and divergence for more genes of Arabis and Arabidopsis species.

Acknowledgments

We thank N. Goto for Arabidopsis seeds and T. Endo, S. Nasuda, Y. Yasui, and T. Sasanuma for comments and suggestions. This is contribution 553 from Laboratory of Plant Genetics, Graduate School of Agriculture, Kyoto University.
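The comparison of preferred to nonpreferred changes discussed with table 9 can be illustrated by a small contingency-table calculation. The sketch below is not necessarily the procedure used in this study: it assumes a G-test of independence on the 2 × 2 table of preferred and nonpreferred changes (polymorphism versus fixation) for each species, with neutral changes left out. The function name g_test_2x2 and the dictionary layout are hypothetical and introduced only for illustration; the counts are taken from table 9.

```python
import math


def g_test_2x2(a, b, c, d):
    """G-test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (G, p), where p comes from the chi-square distribution with
    1 degree of freedom. Cells may be fractional (e.g. 6.5); cells equal
    to zero contribute nothing to G.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                expected = rows[i] * cols[j] / n
                g += 2.0 * o * math.log(o / expected)
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(G/2)).
    p = math.erfc(math.sqrt(max(g, 0.0) / 2.0))
    return g, p


# (preferred, nonpreferred) counts from table 9; neutral changes are omitted
# here, which is an assumption of this sketch.
table9 = {
    "A. thaliana": {"polymorphism": (3, 7), "fixation": (6.5, 7)},
    "A. gemmifera": {"polymorphism": (3, 2), "fixation": (2, 3)},
}

results = {}
for species, counts in table9.items():
    (poly_pref, poly_non) = counts["polymorphism"]
    (fix_pref, fix_non) = counts["fixation"]
    results[species] = g_test_2x2(poly_pref, poly_non, fix_pref, fix_non)
    g_value, p_value = results[species]
    print(f"{species}: G = {g_value:.3f}, P = {p_value:.3f}")
```

Under these assumptions neither species gives a G value that is significant at the 5% level, which agrees with the statement that no evidence of natural selection on codon changes was detected. With counts this small, an exact test might be preferred in practice.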

LITERATURE CITED

AKASHI, H. 1994. Synonymous codon usage in Drosophila melanogaster: natural selection and translational accuracy. Genetics 136:927–935.
AKASHI, H. 1995. Inferring weak selection from patterns of polymorphism and divergence at "silent" sites in Drosophila DNA. Genetics 139:1067–1076.
CAMPBELL, W. H., and G. GOWRI. 1990. Codon usage in higher plants, green algae, and cyanobacteria. Plant Physiol. 92:1–11.
CHANG, C., and E. M. MEYEROWITZ. 1986. Molecular cloning and DNA sequence of the Arabidopsis thaliana alcohol dehydrogenase gene. Proc. Natl. Acad. Sci. USA 83:1408–1412.
FELSENSTEIN, J. 1993. PHYLIP (phylogenetic inference package). Version 3.5. Distributed by the author, Department of Genetics, University of Washington, Seattle.
FU, Y. X., and W. H. LI. 1993. Statistical tests of neutrality of mutation. Genetics 133:693–709.
GRANTHAM, R., C. GAUTIER, M. GOUY, R. MERCIER, and A. PAVÉ. 1980. Codon catalog usage and the genome hypothesis. Nucleic Acids Res. 8:49–62.
HANFSTINGL, U., A. BERRY, E. A. KELLOG, J. T. COSTA III, W. RUDIGER, and M. AUSUBEL. 1994. Haplotype divergence coupled with lack of diversity at the Arabidopsis thaliana alcohol dehydrogenase locus: role for both balancing and directional selection? Genetics 138:811–828.
HEDGE, I. C. 1968. Arabidopsis. Fl. Iranica 57:328–334.
HUDSON, R. R., M. KREITMAN, and M. AGUADÉ. 1987. A test of neutral molecular evolution based on nucleotide data. Genetics 116:153–159.
HYLANDER, N. 1957. Cardaminopsis arenosa (Fr.) Hiit., a northern amphidiploid species. Bull. Jard. Bot. Brux. 27:591–604.
IKEMURA, T. 1981. Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes. J. Mol. Biol. 146:1–21.
IKEMURA, T. 1985. Codon usage and tRNA content in unicellular and multicellular organisms. Mol. Biol. Evol. 2:13–34.
INNAN, H., F. TAJIMA, R. TERAUCHI, and N. T. MIYASHITA. 1996. Intragenic recombination in the Adh locus of the wild plant Arabidopsis thaliana. Genetics 143:1761–1770.
KAWABE, A., H. INNAN, R. TERAUCHI, and N. T. MIYASHITA. 1997. Nucleotide polymorphism in the acidic chitinase locus (ChiA) region of the wild plant Arabidopsis thaliana. Mol. Biol. Evol. 14:1303–1315.
KIMURA, M. 1980. A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.
KIMURA, M. 1983. The neutral theory of molecular evolution. Cambridge University Press, Cambridge, England.
KUMAR, S., K. TAMURA, and M. NEI. 1993. MEGA: molecular evolutionary genetics analysis. Version 1.0.1. Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park.
LLOYD, A. T., and P. M. SHARP. 1992. CODONS: a microcomputer program for codon usage analysis. J. Hered. 83:239–240.
MCDONALD, J. H., and M. KREITMAN. 1991. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654.
MEYEROWITZ, E. M., and C. R. SOMERVILLE. 1994. Arabidopsis. Cold Spring Harbor Laboratory Press, New York.
MIYASHITA, N. T., H. INNAN, and R. TERAUCHI. 1996. Intra- and interspecific variation in the alcohol dehydrogenase locus region of wild plants Arabis gemmifera and Arabidopsis thaliana. Mol. Biol. Evol. 13:433–436.
MORIYAMA, E. N., and D. L. HARTL. 1993. Codon usage bias and base composition in Drosophila. Genetics 143:847–858.
MURRAY, E. E., J. LOTZER, and M. EBERLE. 1989. Codon usage in plant genes. Nucleic Acids Res. 17:477–498.
NAKAMURA, Y., T. GOJOBORI, and T. IKEMURA. 1997. Codon usage tabulated from the international DNA sequence databases. Nucleic Acids Res. 25:244–245.
NEI, M., and T. GOJOBORI. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3:418–426.
NEI, M., and W.-S. LI. 1979. Mathematical model for studying genetic variation in terms of restriction endonuclease. Proc. Natl. Acad. Sci. USA 76:5269–5273.
O'KANE, S. L., I. A. AL-SHEHBAZ, and B. A. SCHAAL. 1995. Phylogenetics of Arabidopsis: scope and content based on DNA sequences of nuclear rDNA internal transcribed spacers. Am. J. Bot. 82(Suppl.):154.
O'KANE, S. L., B. A. SCHAAL, and I. A. AL-SHEHBAZ. 1996. The origins of Arabidopsis suecica (Brassicaceae) as indicated by nuclear rDNA sequences. Syst. Bot. 21:559–566.
PERLAK, F. J., R. L. FUCHS, D. A. DEAN, S. A. MCPHERSON, and D. A. FISCHHOFF. 1991. Modification of the coding sequence enhances plant expression of insect control protein genes. Proc. Natl. Acad. Sci. USA 88:3324–3328.
POWELL, J. R., and E. N. MORIYAMA. 1997. Evolution of codon usage bias in Drosophila. Proc. Natl. Acad. Sci. USA 94:7784–7790.
PRICE, R. A., J. D. PALMER, and I. A. AL-SHEHBAZ. 1994. Systematic relationship of Arabidopsis: a molecular and morphological perspective. Pp. 7–19 in E. M. MEYEROWITZ and C. R. SOMERVILLE, eds. Arabidopsis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
RÉDEI, G. P. 1972. Compatibility of Arabidopsis thaliana with Cardaminopsis arenosa. Arabidopsis Inf. Serv. 9:42.
RICE, W. R. 1989. Analyzing tables of statistical tests. Evolution 43:223–225.
ROUWENDAL, G. J. A., O. MENDES, E. J. H. WOLBERT, and A. D. DE BOER. 1997. Enhanced expression in tobacco of the gene encoding green fluorescent protein by modification of its codon usage. Plant Mol. Biol. 33:989–999.
ROZAS, J., and R. ROZAS. 1997. DnaSP version 2.0: a novel software package for extensive molecular population genetics analysis. Comput. Appl. Biosci. 13:307–311.
SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.
SHARP, P. M., and W.-H. LI. 1987. The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol. Biol. Evol. 4:222–230.
SHARP, P. M., and A. T. LLOYD. 1993. Codon usage. Pp. 378–397 in G. MARONI, ed. An atlas of Drosophila genes: sequences and molecular features. Oxford University Press, New York.
SHIELDS, D. C., P. M. SHARP, D. G. HIGGINS, and F. WRIGHT. 1988. "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol. Biol. Evol. 5:704–716.
SWOFFORD, D. L. 1993. PAUP: phylogenetic analysis using parsimony. Version 3.1.1. Illinois Natural History Survey, Champaign.
TAJIMA, F. 1989. Statistical test for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
TAJIMA, F. 1993. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135:599–607.
TERAUCHI, R., and A. KONUMA. 1994. Microsatellite polymorphism in Dioscorea tokoro, a wild yam species. Genome 37:794–801.
TSUKAYA, H., J. YOKOYAMA, H. IKEDA, H. KUROIWA, T. KUROIWA, and K. IWATSUKI. 1997. Morphological, physiological and molecular genetic characterization of Arabidopsis himalaica, with reference to A. thaliana. J. Plant Res. 110:15–23.
WRIGHT, F. 1990. The 'effective number of codons' used in a gene. Gene 87:23–29.

NARUYA SAITOU, reviewing editor

Accepted July 10, 1998