
Copyright © 2003 by the Genetics Society of America

Duplication-Dependent CG Suppression of the Seed Storage Protein Genes of Maize

Gertrud Lund,*,¹ Massimiliano Lauria,* Per Guldberg† and Silvio Zaina‡

*Plant Biochemistry Laboratory, Department of Plant Biology, The Royal Veterinary and Agricultural University, DK-1871 Frederiksberg C, Denmark, †Institute of Cancer Biology, Danish Cancer Society, DK-2100 Copenhagen, Denmark and ‡Experimental Cardiovascular Research, Wallenberg Laboratory, Department of Medicine, University of Lund, 205 02 Malmö, Sweden

Manuscript received February 28, 2003
Accepted for publication June 13, 2003

ABSTRACT This study investigates the prevalence of CG and CNG suppression in single- vs. multicopy DNA regions of the maize genome. The analysis includes the single- and multicopy seed storage protein genes (zeins), the miniature inverted-repeat transposable elements (MITEs), and long terminal repeat (LTR) retrotransposons. Zein genes are clustered on specific chromosomal regions, whereas MITEs and LTRs are dispersed in the genome. The multicopy zein genes are CG suppressed and exhibit large variations in CG suppression. The variation observed correlates with the extent of duplication each zein has undergone, indicating that gene duplication results in an increased turnover of cytosine residues. Alignment of individual zein genes confirms this observation and demonstrates that CG depletion results primarily from polarized C:T and G:A transition mutations from a less to a more extensively duplicated gene. In addition, transition mutations occur primarily in a CG or CNG context, suggesting that CG suppression may result from deamination of methylated cytosine residues. Duplication-dependent CG depletion is likely to occur at other loci, as duplicated MITEs and LTR elements, or elements inserted into duplicated gene regions, also exhibit CG depletion.

Genetics 165: 835–848 (October 2003)

¹Corresponding author: Plant Biochemistry Laboratory, Department of Plant Biology, Thorvaldsensvej 40, DK-1871 Frederiksberg C, Denmark.

In many organisms, nuclear DNA is methylated at cytosine residues, resulting in 5-methylcytosine (5mC). In plants, symmetrical 5′-CpG-3′ (CG) and 5′-CpNpG-3′ (CNG) sites are the most frequent targets of cytosine methylation, whereas in mammals 90% of methylation is restricted to the CG dinucleotide (Gruenbaum et al. 1981, 1982). However, the degree and ratio of CG and CNG methylation can vary considerably between plant species (Jeddeloh and Richards 1996; Kovarik et al. 1997). For example, in maize ~28% of cytosine residues are methylated compared to only 6% in Arabidopsis (Leutwiler et al. 1984; Matassi et al. 1992; Montero et al. 1992). Furthermore, analysis of the rRNA genes from maize has shown that the external cytosine of the 5′-CCG-3′ sequence is twofold less methylated than the internal cytosine residue (Kovarik et al. 1997), indicating that CG methylation occurs more frequently than CNG methylation.

CG suppression, or depletion, refers to the underrepresentation of the CG dinucleotide compared to an expected value based on the G + C content of the sequence investigated. CG suppression is especially evident in the mammalian genome, where the frequency of the CG dinucleotide can be up to fivefold lower than the expected value (Bird 1980). In contrast, both monocot and dicot plant genes are only slightly CG depleted (an average of 75–80% of expected values), and CNG suppression is lacking or less severe than CG depletion (McClelland 1983; Gardiner-Garden et al. 1992; Ashikawa 2001).

The most commonly cited mechanism of CG depletion relates to the tendency of 5mC to undergo spontaneous deamination to thymine, resulting in C:T or G:A transition mutations (Couloundre et al. 1978). Interestingly, the mutability of CG has been shown to be one of the most important causes of germline point mutations in human genetic diseases and is a frequent occurrence in somatic mutations leading to cancer (Cooper and Krawczak 1989; Jones et al. 1992; Holstein et al. 1994). In addition to the mutability of 5mC, recent evidence has shown that cytosine deamination also contributes to CG suppression (Fryxell and Zuckerkandl 2000).

Although the majority of methylation in plants is associated with repetitive DNA sequences such as transposons, duplicated gene regions can also be methylated (Bianchi and Viotti 1988; Flavell et al. 1988; Bennetzen et al. 1994; Flavell 1994; Bender and Fink 1995; Ronchi et al. 1995; Rabinowicz et al. 1999). In Neurospora crassa, duplicated sequences are efficiently targeted by methylation, and a large number of C:T transition mutations are introduced following duplication [hence the name repeat-induced point mutation (RIP; Cambereri et al. 1989; Selker 1990)]. Similarly, in Ascobolus immersus a process referred to as methylation induced premeiotically (MIP) results in de novo methylation of a DNA sequence upon duplication (Goyon and Faugeron 1989). The observed consequences of de novo methylation in RIP and MIP include gene inactivation and a reduction in the frequency of recombination (Barry et al. 1993; Rountree and Selker 1997; Maloisel and Rossignol 1998). Similar roles of duplication-induced DNA methylation have been proposed to occur in plants (Flavell 1994; Bender 1998).

In mammals, duplicated genes are more CG suppressed than single-copy genes. This observation has led to the suggestion that duplicated genes have a history of methylation and subsequent mutation of methylated residues (Kricker et al. 1992). Likewise, in plants, the multigene families of 5S rRNA genes from Arabidopsis and rRNA genes from maize show elevated levels of transition mutations that are consistent with deamination of 5mC, in particular in the nonfunctional members of these gene families (Edward et al. 1996; Matieu et al. 2002). However, neither study can determine whether CG loss results from spontaneous deamination of methylated residues over time or whether depletion is the consequence of an active mechanism linked to duplication. We have analyzed the CG dinucleotide and CNG trinucleotide content of the large zein gene family, which encodes the seed storage proteins of maize. Because the subfamilies differ in gene copy number, the zein genes provide an ideal model system to analyze the effect of gene duplication on CG suppression. In addition, the highly abundant LTR-retrotransposons and MITEs have also been analyzed for evidence of CG suppression.

The zeins constitute 50–60% of total endosperm protein and can be divided into two major fractions, zein-1 and zein-2, on the basis of solubility characteristics (Esen 1986). The zein-1 fraction consists of the 19- and 22-kD polypeptides that are encoded by a large gene family, the α-zeins. On the basis of DNA sequence identity and hybridization characteristics, this family can be further divided into four subfamilies, z1A, z1B, z1C, and z1D (Heindecker and Messing 1986; Rubenstein and Geraghty 1986). Genes belonging to the z1A, z1B, and z1D subfamilies encode the 19-kD zeins, whereas the 22-kD zeins are encoded mainly by the z1C subfamily. The 19-kD genes have an estimated copy number of 56 per haploid genome, whereas the 22-kD zeins are presumed to be present in 15 copies per haploid genome (Hagen and Rubenstein 1981; Wilson and Larkins 1984). However, the exact copy number of the α-zein genes can show considerable variation among different inbred lines (Llaca and Messing 1998; Song and Messing 2002). In addition, within the 19-kD zein gene family, the z1A subfamily has the highest copy number, followed by z1B and z1D (Wilson and Larkins 1984; Heindecker and Messing 1986; Song and Messing 2002). In contrast, the 10-, 15-, 16-, and 27-kD proteins that represent the zein-2 fraction are encoded by one or two genes that show limited sequence similarity to the α-zeins (Prat et al. 1985; Kirihara et al. 1988; Swarup et al. 1995).

The majority of 22-kD zein genes form a dense gene cluster, ~168 kb in size, on chromosome 4 of maize (Llaca and Messing 1998), whereas the 19-kD zein genes are distributed over five unlinked genomic locations on maize chromosomes 1, 4, and 7 (Soave et al. 1981, 1982; Wilson et al. 1989; Woo et al. 2001; Song and Messing 2002). Phylogenetic analysis of the α-zein genes has revealed that the 19- and 22-kD zein genes share a common ancestor (Song and Messing 2002). Given that only the 22-kD zein genes have been identified in Coix lacryma-jobi, an ancestor of maize (Leite et al. 1990), it is probable that in maize the 19-kD zein genes have derived from the 22-kD zein genes. Interestingly, it has been estimated that the amplification of the α-zein gene family in maize occurred within the last 3–4 million years (Song et al. 2001; Song and Messing 2002).

In contrast to the clustered zein genes, MITEs and LTR elements are dispersed in the genome. MITEs are frequently associated with promoter and 3′ regulatory regions of genes, whereas the larger LTR-retrotransposons are typically found in intergenic regions (Kumar and Bennetzen 1999). The copy numbers of MITEs and LTR elements range from 3000 to 10,000 copies and from a few to 50,000 copies, respectively (Bureau and Wessler 1992, 1994; SanMiguel et al. 1996; Zhang et al. 2000). Similar to the amplification of the zein gene family, a majority of LTR-retrotransposons have colonized the maize genome during the last 5 million years (SanMiguel et al. 1998).

Our analysis shows that duplicated zein genes are CG suppressed and that the degree of suppression correlates with the copy number of each subfamily. Likewise, within the 19- and 22-kD zein gene families the extent of CG depletion correlates with the number of duplications each individual gene has undergone. Despite their high copy number, most MITEs and LTR elements are not CG suppressed, except when duplicated or located in a duplicated DNA sequence. This suggests that the process leading to CG depletion is activated upon duplication or is a consequence of the duplication process itself. We discuss the possible role of duplication-dependent CG depletion in the of the GC-poor isochores in which the zein genes are located.

MATERIALS AND METHODS

Sequences: The di- and trinucleotide composition of the coding regions of 32 zein genes belonging to the zein-1 fraction and 6 genes belonging to the zein-2 fraction was analyzed. All 22-kD zein genes were derived from the inbred line BSSS53 (af090447), whereas the 19-kD gene sequences were derived from the B73 inbred line (af546187, af546188; af546189, af546190; Song et al. 2001; Song and Messing 2002). Only full-length genomic and cDNA clones, including genes with in-frame stop codons, were analyzed. Clones in which the open reading frame was disrupted by insertions or deletions were omitted from the analysis. One exception was Z492M16-5, which was included because it is expressed despite a deletion in the open reading frame. The correct open reading frame was determined using GenBank annotations, and nucleotide compositions were generated using the Genetics Computer Group (GCG) analysis software package. Sequences of MITEs and LTR-retrotransposons were extracted from gene sequences according to the annotations of the authors (Bureau and Wessler 1992, 1994; SanMiguel et al. 1996; Tikhonov et al. 1999; Zhang et al. 2000). Only the LTR region and primer-binding site of the LTR-retrotransposons were analyzed.

CG analysis: To measure the extent of suppression of a given di- or trinucleotide, a score ρ was calculated by the formula ρ = O/E, where O and E denote the observed and expected counts, respectively. Overall expected counts for di- and trinucleotides were calculated by multiplying the observed counts of each nucleotide and dividing the product by the total number of nucleotides in the sequence. Position-dependent expected counts were calculated assuming the absence of any codon bias. The positions of the 5′ and 3′ nucleotides of a di- or trinucleotide relative to codon triplets are indicated by roman numerals; e.g., I-II denotes a dinucleotide including the first two nucleotides of a codon triplet. For the CG dinucleotide, position-dependent expected counts were calculated as follows: I-II = 2/3 of arginine-specifying codons; II-III = the sum of 1/6 of serine-specifying codons and 1/4 of proline-, threonine-, and alanine-specifying codons; III-I = NNC × GNN/T, where N represents any nucleotide and T represents the total number of triplets in the sequence. For the CNG trinucleotide, position-dependent expected counts were calculated as follows: I-III = the sum of 1/6 of arginine- and leucine-, 1/4 of proline-, and 1/2 of glutamine-specifying codons; II-I = NCN × GNN/T; and III-II = NNC × NGN/T.

Alignment between members of the α-zein gene family: Pairwise alignments of individual expressed members of the 19- and 22-kD zein genes were performed by GAP analysis (GCG analysis software package). For each alignment, the percentage of C:T and G:A transition mutations was compared to the total number of single-base-pair point mutations. To establish whether transition mutations occur in a polarized fashion, i.e., from a younger to an older duplication, or from a gene that has undergone fewer duplications to one that has undergone more, transition mutations of each gene were counted in individual alignments. Importantly, only transition mutations occurring in a CG or CNG context were considered.

Statistical analysis: Differences between O and E values were tested using the chi-square test. To test for differences in ρ-values between groups, the Mann-Whitney U-test was employed. All statistical tests were performed using the STATISTICA software package for Macintosh (StatSoft, Tulsa, OK).

Bisulfite analysis: Genomic DNA was extracted from young leaf tissue of the inbred lines W64A and W22 using the DNeasy kit (QIAGEN, Valencia, CA). The W64A and W22 inbred lines contain the Tourist element located in the 5′ flanking region of the single-copy or duplicated 27-kD zein gene, respectively (Das and Messing 1987). Between 1 and 2 μg of DNA were treated with bisulfite as described by Zeschnigk et al. (1997). For PCR analysis, 1/10 vol of bisulfite-treated DNA was employed in a standard PCR reaction. The primer pairs employed for amplification of the Tourist element from bisulfite-treated DNA were as follows: W64A, 5′-TAGGTATATGATTAGTGGTAATTTAATATT-3′ and 5′-ATTCTTAAAACTTTACATACCAATACATAA-3′; W22, 5′-GGGTATATAATTAGTGTAATTTAATATATG-3′ and 5′-ATTCTTAAAACTTTACATACCAATACATAA-3′. The resulting PCR products were cloned by TOPO cloning, and 16 individual clones were sequenced at MWG Biotech (Ebersberg, Germany). To confirm the previously published MITE sequences, the Tourist element was also amplified from genomic DNA employing the following primer pairs: W64A, 5′-CCTTGGTTGTTGGCTCATAAT-3′ and 5′-CAGATGAGTATGATCTCGGCA-3′; W22, 5′-ATAAGTGTTCTGGATATTGGTTGTT-3′ and 5′-TCAGATGAGTATGATCTCGCA-3′. These primers were also tested on bisulfite-treated DNA and failed to give a product of the expected size. To ensure that the observed patterns of methylation did not result from incomplete strand separation during the bisulfite reaction, the Tourist element from the W64A inbred was cloned, bisulfite treated, and amplified with the bisulfite primers. This element contains only one cytosine that can be methylated by the Escherichia coli dcm methylase (i.e., the internal cytosine residue of the CCWGG sequence). As expected, sequence analysis of 10 independent clones showed that this cytosine remained unmodified. All the remaining cytosine residues of the Tourist element had undergone modification to thymidine.

RESULTS

CG and CNG analysis of the zein genes: Table 1 shows ρ-values (observed/expected) of CG dinucleotides and CNG trinucleotides of genes belonging to the zein-1 and zein-2 fractions. All 19-kD zein genes analyzed were isolated from the B73 inbred line, whereas the 22-kD zein genes were derived from the BSSS53 inbred line (Song et al. 2001; Song and Messing 2002). A ρ-average was calculated for both zein fractions, for the 19- and 22-kD zein genes, and for each of the three 19-kD zein subfamilies, z1A, z1B, and z1D. The zein-1 fraction, representing the multicopy α-zeins, has a CG ρ-average of 0.40 (P < 0.001), whereas the ρ-average of the single-copy zein genes of the zein-2 fraction is 0.75 (P < 0.001). Indeed, the zein-1 fraction is more suppressed than the zein-2 fraction (P < 0.001). Furthermore, the average G + C content of the zein-1 fraction is 48% compared to 66% for the zein-2 fraction (results not shown), indicating that CG suppression is accompanied by a decrease in G + C (GC) content. In contrast, none of the zein fractions is suppressed at the CNG trinucleotide.

It can also be observed that the degree of suppression varies between subfamilies of the zein-1 fraction. The more abundant 19-kD zein genes are more CG suppressed than the less abundant 22-kD zein genes (0.34 and 0.49, respectively; P < 0.001) and, likewise, within the 19-kD gene family the degree of suppression is associated with the copy number of each subfamily. The z1A subfamily, which has the highest copy number, is more suppressed than the less abundant z1B and z1D subfamilies (P < 0.009 and P < 0.027, respectively). We also analyzed eighteen 19-kD genes from different genetic backgrounds and found no differences in the average CG and CNG scores (results not shown).

TABLE 1
Overall CG and CNG zein scores

Zein-1 fraction (multicopy α-zeins)

19-kD zein gene familyᵃ
Subfamily | Gene | bp | CG ρ | CNG ρ
z1A | Z448fF14-2 | 804 | 0.30 | 1.26
z1A | Z448F14-3 | 705 | 0.34 | 1.21
z1A | Z448F14-4 | 705 | 0.34 | 1.29
z1A | Z448F14-5 | 705 | 0.30 | 1.30
z1A | Z448F14-6 | 705 | 0.30 | 1.24
z1A | Z448F14-7 | 705 | 0.32 | 1.27
z1A | Z350DO7-1 | 702 | 0.29 | 1.34
z1A | Z350DO7-2 | 702 | 0.21 | 1.24
z1A average | | | 0.30 | 1.12
z1D | Z513H09-1 | 726 | 0.43 | 1.04
z1D | Z57A02-2 | 723 | 0.35 | 0.88
z1D average | | | 0.39 | 0.96
z1B | Z531H07-2 | 702 | 0.33 | 1.22
z1B | Z350DO7-3 | 726 | 0.34 | 0.93
z1B | Z492M16-1 | 723 | 0.40 | 1.03
z1B | Z492M16-2 | 726 | 0.38 | 1.26
z1B | Z492M16-4 | 726 | 0.37 | 1.29
z1B | Z492M16-5 | 723 | 0.37 | 1.25
z1B | Z492M16-6 | 722 | 0.29 | 1.29
z1B | Z531H07-1 | 723 | 0.39 | 1.18
z1B average | | | 0.36 | 1.18
19-kD average | | | 0.34*** | 1.20 NS
SD | | | 0.05 | 0.13

22-kD zein gene familyᵃ (subfamily z1C)
Gene | bp | CG ρ | CNG ρ
azs22;4 | 801 | 0.52 | 1.11
azs22;8 | 801 | 0.45 | 1.08
azs22;10 | 801 | 0.49 | 1.10
azs22;12 | 801 | 0.58 | 1.09
azs22;14 | 801 | 0.57 | 0.98
zp22/6 | 801 | 0.50 | 1.11
zp22/D87 | 714 | 0.44 | 1.18
azs22;2ᵇ | 801 | 0.48 | 1.03
azs22;5ᵇ | 801 | 0.50 | 1.05
azs22;6ᵇ | 807 | 0.50 | 1.16
azs22;11ᵇ | 801 | 0.43 | 1.07
azs22;15ᵇ | 801 | 0.45 | 1.08
azs22;20ᵇ | 798 | 0.44 | 1.11
azs22;21ᵇ | 801 | 0.48 | 1.06
22-kD average | | 0.49*** | 1.09 NS
SD | | 0.05 | 0.05

Zein-1 average | 0.40*** | 1.15 NS
SD | 0.09 | 0.12

Zein-2 fraction (single-copy β-, γ-, and δ-zeins): 10-, 15-, 16-, and 27-kD zein genes
Acc. no. | bp | CG ρ | CNG ρ
m12147 | 543 | 0.70 | 1.36
m16460 | 552 | 0.82 | 1.47
m16218 | 630 | 0.82 | 1.40
x53514 | 672 | 0.82 | 1.36
x02230 | 612 | 0.81 | 1.44
m23537 | 453 | 0.53 | 0.86
Zein-2 average | | 0.75*** | 1.32***
SD | | 0.12 | 0.23

ρ = O/E (O, observed; E, expected). *P < 0.05; **P < 0.01; ***P < 0.001; NS, not significant; bp, base pairs. ᵃ See MATERIALS AND METHODS for accession numbers. ᵇ Nonexpressed gene.

The amino acid content of the zein-1 and zein-2 fractions differs considerably, which could potentially influence overall CG and CNG scores. For example, the α-zeins are particularly rich in glutamine, leucine, proline, and alanine, whereas the single-copy γ-zeins have a high content of methionine. To address this problem, CG and CNG frequencies were analyzed in a position-dependent context (Table 2). Position-dependent frequencies correct for differences in amino acid content but not for codon bias. Essentially, position-dependent frequencies of the zein genes largely reflect overall CG and CNG frequencies. The multicopy α-zeins are suppressed at positions II-III and III-I (Table 2; ρ = 0.41 and 0.55, respectively; P < 0.001); in contrast, the single-copy zein genes are not CG suppressed at any position. This implies that the low overall CG score observed for the 10-kD zein gene, m23537 (Table 1), is related to the amino acid content of this gene. Within the zein-1 fraction, the 19-kD zein genes are CG suppressed at positions II-III and III-I (P < 0.001). The z1A subfamily is more suppressed than the z1B subfamily at position III-I, whereas the opposite is true at position II-III (P < 0.002 and P < 0.025, respectively). The 22-kD zein genes are also CG suppressed at position II-III (ρ = 0.52; P < 0.001), but to a lesser degree than the 19-kD zein genes (ρ = 0.33; P < 0.001). The CNG trinucleotide is suppressed only at position I-III of the zein-1 fraction (ρ = 0.60; P < 0.009; results not shown) and, again, the 19-kD zein genes are more suppressed than the 22-kD zein genes at this position (P < 0.001).

A recent analysis of the B73 and BSSS53 inbred lines has shown that only a relatively small number of α-zein genes are expressed (Song et al. 2001; Woo et al. 2001). Most of the nonexpressed genes contain in-frame stop codons or insertions/deletions (Spena et al. 1983; Liu and Rubenstein 1992; Llaca and Messing 1998; Song et al. 2001). Analysis of the average overall CG content of seven expressed vs. seven nonexpressed 22-kD genes (marked with a superscript b in Table 1) showed no significant differences (0.51 vs. 0.47, respectively; P = 0.225). However, position-dependent frequencies indicated that the inactive genes are more CG suppressed at position II-III (P = 0.025). Due to the small sample size, a similar calculation for the 19-kD genes was not undertaken. The relative contributions of the expressed α-zein genes have been assessed in the B73 inbred line (Woo et al. 2001). The 19-kD zein genes are the most highly expressed, whereas the 22-kD zein genes are expressed at a threefold lower level. An estimated five genes encode most 19-kD zein gene transcripts, and four genes account for most 22-kD zein gene transcripts. Interestingly, the average CG values of these 19- and 22-kD zein genes correlate inversely with expression (P = 0.49; ρ = 0.71). This is also true within the 22-kD zein gene family, and a similar tendency was observed for the 19-kD zein gene family (results not shown). This indicates that highly expressed genes are more CG suppressed than low-expressing genes.

To understand whether CG suppression is a general effect of a specific chromosomal region, the CG content of the intergenic regions of the 22-kD zein genes located on chromosome 4S was analyzed. The lengths of the 22-kD zein intergenic regions varied from 2517 to 14,438 bp. The CG ρ-average of the intergenic regions was 0.68, which is significantly higher than the ρ-average of 0.50 for the 22-kD genes (P < 0.001).

CG analysis of MITEs and LTR-retrotransposons: The observed copy number variation in CG suppression prompted us to investigate whether the high-copy-number LTR-retrotransposons and MITEs exhibit similar behavior. Three MITE families (Tourist, Stowaway, and Heartbreaker) and three groups of LTR-retrotransposons (Ty1-copia, Ty3-gypsy, and an unclassified group), differing in element copy number between and within each group, were analyzed for evidence of CG suppression. The ρ-value and G + C content were calculated for MITEs and LTR elements in different sequence contexts (Tables 3 and 4, respectively). Despite the high copy number of MITEs and LTR elements, no association was found between the degree of suppression and element copy number. Most transposons were not, or were only slightly, CG suppressed. For example, the ρ-averages of the Tourist and Heartbreaker families were 0.66 and 0.60 (P < 0.041 and P < 0.018, respectively), whereas no suppression was observed for the Stowaway family or the LTR elements. However, large differences in CG suppression were observed for both MITEs and LTR elements (ρ ranging from 0.16 to 1.46 and from 0.26 to 1.12, respectively). We found that the copy number of the insertion site could largely explain the variation in ρ; i.e., elements inserted into multicopy gene regions were more CG suppressed than elements inserted into single-copy genes. This is nicely illustrated by the ρ-values of Stowaway elements found 3′ of the single-copy 10- and 27-kD zein genes and of the multicopy 22-kD zein genes (0.66, 0.96, and 0.25, respectively; Table 3B; Bureau and Wessler 1994). Two ρ-values of Tourist and Stowaway elements located 5′ and 3′, respectively, of the 27-kD zein genes are indicated (x53514 and x56118; Table 3, A and B). x56118 represents an allele of a tandem duplication of a 27-kD zein gene (Das et al. 1991), whereas x53514 (zc2) represents a single copy of a 27-kD zein gene (Reina et al. 1990). For both Tourist and Stowaway elements, CG suppression is more severe upon insertion into the duplicated allele. In addition, suppression of the Stowaway element is accompanied by a 2% decrease in G + C content. Severe CG suppression was also observed for tandemly duplicated MITEs. An example of this can be seen by comparing the ρ-values of a single-copy and a duplicated Tourist element located in the 3′ region of the Adh1 locus (x17556 and x04049, respectively; Table 3A; Bureau and Wessler 1992). No suppression is observed for the single-copy Tourist element (ρ = 1.46).
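The overall ρ score used in Tables 1, 3, and 4 follows directly from the definition in MATERIALS AND METHODS. The sketch below is a minimal Python re-implementation under stated assumptions, not the GCG-based procedure the authors used: the expected count is taken as count(C) × count(G) divided by sequence length for both CG and CNG (the middle N of CNG matching any base), and the function names overall_rho and rho_from_counts are hypothetical. The last line recomputes ρ for the single-copy and duplicated Adh1 Tourist elements from the O and E values listed in Table 3A.

```python
from __future__ import annotations

from collections import Counter


def overall_rho(seq: str, motif: str = "CG") -> tuple[int, float, float | None]:
    """Return (observed, expected, rho) for the CG or CNG motif in `seq`.

    Assumption: expected = count(C) * count(G) / len(seq), i.e. the
    composition-based formula from MATERIALS AND METHODS. Because the
    middle N of CNG matches any base, the same expectation is used for CNG.
    rho is None (reported as NC in the tables) when the expected count is 0.
    """
    seq = seq.upper()
    n = len(seq)
    counts = Counter(seq)
    if motif == "CG":
        observed = sum(seq[i:i + 2] == "CG" for i in range(n - 1))
    elif motif == "CNG":
        observed = sum(seq[i] == "C" and seq[i + 2] == "G" for i in range(n - 2))
    else:
        raise ValueError("motif must be 'CG' or 'CNG'")
    expected = counts["C"] * counts["G"] / n if n else 0.0
    rho = observed / expected if expected else None
    return observed, expected, rho


def rho_from_counts(observed: int, expected: float) -> float:
    """rho = O/E for precomputed counts, e.g. the O and E columns of Table 3."""
    return observed / expected


if __name__ == "__main__":
    print(overall_rho("ACGTTGCACGGA", "CG"))  # (2, 1.0, 2.0)
    # Single-copy (x17556, O/E = 10/6.86) vs. duplicated (x04049, O/E = 1/6.26) Tourist at Adh1
    print(round(rho_from_counts(10, 6.86), 2), round(rho_from_counts(1, 6.26), 2))  # 1.46 0.16
```

Returning None when E is zero mirrors the NC (not calculated) entries in Tables 2 and 3; a significance test of O against E (the chi-square analysis named under Statistical analysis) would be a separate step.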

TABLE 2
Position-dependent CG and CNG zein scores

Group | I-II | II-III | III-I
Zein-1 fraction (multicopy α-zeins)
19-kD average | 0.67 NS | 0.33*** | 0.42***
SD | 0.46 | 0.11 | 0.17
22-kD average | 0.68 NS | 0.52*** | 0.71 NS
SD | 0.28 | 0.08 | 0.06
Zein-1 average | 0.68 NS | 0.41*** | 0.55***
Zein-2 fraction (single-copy β-, γ-, and δ-zeins; 10-, 15-, 16-, and 27-kD zein genes)
Zein-2 average | 0.81 NS | 1.65*** | 1.12 NS

I-II, CG dinucleotide in position 1 of open reading frame; II-III, CG in position 2 of open reading frame; III-I, CG in position 3 of open reading frame; I-III, CNG in position 1 of open reading frame; II-I, CNG in position 2 of open reading frame; III-II, CNG in position 3 of open reading frame; NC, not calculated; other symbols are as in Table 1.
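The position-dependent expected counts behind Table 2 can be sketched in the same way. The helper below (position_dependent, a hypothetical name) is a minimal sketch that assumes an in-frame coding sequence, the standard genetic code, and no codon bias, as stated in MATERIALS AND METHODS. For each codon position of CG and CNG it returns the observed count, the expected count from the formulas given there, and ρ, with None where the expected count is zero (NC in Table 2).

```python
from __future__ import annotations

from collections import Counter

# Standard genetic code; codons enumerated in TCAG order, "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into complete triplets."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def position_dependent(cds: str) -> dict[str, tuple[int, float, float | None]]:
    """(observed, expected, rho) for CG and CNG at each codon position."""
    cods = split_codons(cds)
    if not cods:
        raise ValueError("sequence is shorter than one codon")
    t = len(cods)  # T, total number of triplets
    aa = Counter(CODON_TABLE.get(c, "X") for c in cods)
    pairs = list(zip(cods, cods[1:]))  # adjacent codons, for motifs spanning a boundary

    def n_with(pos: int, base: str) -> int:
        # Number of codons with `base` at 0-based position `pos` (e.g. NNC, GNN).
        return sum(c[pos] == base for c in cods)

    expected = {
        "CG I-II": 2 / 3 * aa["R"],
        "CG II-III": aa["S"] / 6 + (aa["P"] + aa["T"] + aa["A"]) / 4,
        "CG III-I": n_with(2, "C") * n_with(0, "G") / t,
        "CNG I-III": (aa["R"] + aa["L"]) / 6 + aa["P"] / 4 + aa["Q"] / 2,
        "CNG II-I": n_with(1, "C") * n_with(0, "G") / t,
        "CNG III-II": n_with(2, "C") * n_with(1, "G") / t,
    }
    observed = {
        "CG I-II": sum(c[0] == "C" and c[1] == "G" for c in cods),
        "CG II-III": sum(c[1] == "C" and c[2] == "G" for c in cods),
        "CG III-I": sum(a[2] == "C" and b[0] == "G" for a, b in pairs),
        "CNG I-III": sum(c[0] == "C" and c[2] == "G" for c in cods),
        "CNG II-I": sum(a[1] == "C" and b[0] == "G" for a, b in pairs),
        "CNG III-II": sum(a[2] == "C" and b[1] == "G" for a, b in pairs),
    }
    return {
        key: (observed[key], expected[key],
              observed[key] / expected[key] if expected[key] else None)
        for key in expected
    }


if __name__ == "__main__":
    for key, value in position_dependent("ATGCGTCCGGCACAG").items():
        print(key, value)
```

Because the expected counts come from the encoded amino acids rather than from overall base composition, this comparison separates true CG depletion from differences in amino acid content, which is the reason given in the text for analyzing the zein genes position by position.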

By contrast, the tandemly duplicated Tourist element at the Adh1 locus (x04049) is severely CG suppressed (ρ = 0.16). Again, CG suppression of the duplicated element is accompanied by a 4% decrease in G + C content compared to the single-copy insertion. LTR regions also exhibit severe CG suppression upon duplication or when located in a duplicated DNA region (Table 4). For example, CG suppression is observed for x58700, a Hopscotch-like transposon inserted in the promoter region of a multicopy 19-kD zein gene (White et al. 1994), and for u68406, a tandem duplication of an element, Kake-1 (SanMiguel et al. 1996).

TABLE 3
CG scores of MITE families

Acc. no. | Gene | IS location | Copy no. of IS | bp | O | E | CG ρ | %G + C

A. Tourist (~10,000 copies)
x17556ᵃ | Adh-1Cm | 3′ | 1 | 126 | 10 | 6.86 | 1.46 | 47
x04049ᵇ | Adh-1S | 3′ | 1 | 137 | 1 | 6.26 | 0.16 | 43
x07940ᵃ | Bz-McC | 3′ | 1 | 136 | 1 | 2.24 | 0.45 | 26
S48688ᵃ | Wx-B2 | Exon | 1 | 128 | 2 | 7.27 | 0.28 | 48
x53514ᵃ | 27-kD zein | 5′ | 1 | 132 | 7 | 5.49 | 1.27 | 41
x56118ᵇ | 27-kD zein | 5′ | 2 | 131 | 5 | 5.72 | 0.87 | 42
j05212 | Oleosin, KD18 | 3′ | 3–4 | 142 | 1 | 4.75 | 0.21 | 37
x15406 | Pseudo-Gpa1 | 5′ | ~10 | 130 | 0 | 5.17 | NC | 40
x15407 | Pseudo-Gpa2 | 5′ | ~10 | 137 | 3 | 5.12 | 0.59 | 39
Tourist average | | | | | | | 0.66* | 40

B. Stowaway (copy number not reported)
z11879ᵃ | P-gene | Intron | 1 | 80 | 4 | 3.20 | 1.25 | 40
m23537ᵃ | 10-kD zein | 3′ | 1 | 153 | 2 | 3.07 | 0.66 | 28
x53514ᵃ | 27-kD zein | 3′ | 1 | 154 | 2 | 2.08 | 0.96 | 23
x56118ᵇ | 27-kD zein | 3′ | 2 | 163 | 0 | 1.72 | NC | 21
x73152ᵃ | Gpc4 | Intron | 4 | 157 | 4 | 3.92 | 1.02 | 32
x61085ᵇ | 22-kD zein | 3′ | ~15 | 156 | 1 | 3.98 | 0.25 | 32
Stowaway average | | | | | | | 0.83 NS | 29

C. Heartbreaker (3,000–4,000 copies)
af203730 | NA | NA | 1 | 314 | 9 | 14.54 | 0.62 | 44
af203733 | NA | NA | 1–2 | 314 | 9 | 14.01 | 0.64 | 43
af203731 | NA | NA | 2 | 314 | 9 | 14.36 | 0.63 | 44
af203729 | NA | NA | 2–4 | 313 | 10 | 15.11 | 0.66 | 45
af203732 | NA | NA | >10 | 314 | 6 | 13.75 | 0.44 | 43
Heartbreaker average | | | | | | | 0.60* | 44

IS, insertion site; NA, not available; Adh-1S, alcohol dehydrogenase 1S allele; Adh-1Cm, alcohol dehydrogenase 1Cm allele; Bz-McC, UDP glucose flavonoid glucosyl transferase; Wx, ADP glucose glucosyl-transferase; Gpa1 and Gpa2, pseudogenes of glyceraldehyde-3-phosphate dehydrogenase; P, anthocyanin gene; Gpc4, glyceraldehyde-3-phosphatase; other symbols are as in Table 1. ᵃ Insertion into single-copy sequence. ᵇ Insertion in clustered multicopy gene region or duplicated element.

An average ρ-value was calculated for elements inserted into single- vs. multicopy gene regions or tandemly duplicated elements, using selected data points (Tables 3 and 4; labeled with superscript a and b, respectively). The criterion for selecting data points was knowledge of the copy number of both the element and the insertion sequence. In addition, only clustered multicopy genes were represented in the multicopy group. Therefore, the Gpc4 gene, which belongs to a small dispersed multigene family, was analyzed as a single-copy insertion (Russell and Sachs 1991), and some elements inserted in multicopy sequences were not included because it is unknown whether these genes are clustered or dispersed (e.g., Gpa pseudogenes and oleosins). CG suppression was not observed for LTR elements and MITEs located in single-copy genic regions (ρ = 0.94), whereas tandemly duplicated elements or gene sequences were suppressed (ρ = 0.33; P < 0.020). Furthermore, the ρ-average of MITEs and LTR elements inserted in single-copy gene regions was higher than the average value of tandemly duplicated elements and elements inserted into duplicated DNA regions (P < 0.004). Simply analyzing the ρ-values of MITEs and LTR elements located in single- vs. multicopy gene regions produced results identical to those of the selected data set, the latter being less significant (P < 0.009).

CG depletion as a function of time or gene duplication: If CG suppression results from passive deamination of methylated residues, the extent of depletion should
TABLE 4 CG values of LTR-retrotransposons

Element Location CG Acc. no. Name Copy no. Gene IS IS copy no. Bp OE ␳ %GC Ty1-copia group u12626a Hopscotch 2–6 wx-K Exon 12 1 231 11 11.22 0.98 44 43 0.40 4.97 2 147 56ف x58700b Hopscotch 2–6 19-kD zein 5Ј af082134a Stonor 30–40 wx-Stonor Intron5/exon6 1 560 37 38.02 0.97 52 334B7.4 Exon 1 NA 1162 64 63.94 1.00 47 100ف u68401 Fourf Intergenic NA 100 1 3.84 0.26 40 100ف u68410 Victim u68408 Opie-2 Ͼ30,000 334B7.4 3Ј NA 1271 69 80.22 0.86 50 af090447 Prem-2 NA Intergenic NA 1424 82 104.09 0.79 54 u68405 Ji-3 50,000 Intergenic NA 1176 58 72.62 0.80 50 Average: 0.76 NS 48

Ty3-gypsy group af015269a Magellan 4–8 Pl Exon 1 1 336 19 19.54 0.97 49 U68409 Reina Ͻ 10 Intergenic NA 323 15 20.43 0.73 51 334B7.4 Exon 1 NA 1644 162 170.66 0.95 65 100ف U68404 Huck-2 U68403 Grande-zm1 Ͼ1300 Intergenic NA 645 39 43.00 0.90 52 U68402 Cinful 20,000 Intergenic NA 605 20 23.00 0.86 39 af090447 Zeon-1 20,000 Intergenic NA 669 25 23.93 1.04 38 U11059a Zeon-1 20,000 27-kD zein 5Ј 1 649 21 21.40 0.98 37 af090447 NA NA Intergenic NA 669 137 157.52 1.04 44 Average: 0.93 NS 47

Nonclassified group 334B7.4 3Ј NA 742 75 66.47 1.12 60 100ف U68407 Milt Intergenic NA 182 2 6.53 0.27 40 100ف U68406b Kake-1 Average: 0.70 NS 50 Symbols are as in Tables 1 and 3. a Insertion into single-copy sequence. b Insertion in clustered multicopy gene region or duplicated element.

reflect the number of years a sequence has been methylated. Table 5A compares ρ with the estimated time of insertion and the number of duplications of the seven expressed 22-kD zein genes clustered on chromosome 4S. Similarly, the estimated times of retrotransposon insertion at the Adh1-F locus have been compared to CG frequencies (Table 5B). Neither CG nor CNG suppression correlated with time of insertion of the 22-kD zein genes or of the LTR-retrotransposons. However, the 22-kD zein genes showed an inverse correlation between CG content and the number of duplications each gene has undergone (r = 0.85; P < 0.014). This correlation was specific to the CG dinucleotide. Similarly, analysis of the expressed 19-kD zein genes isolated from the B73 cluster also showed that the degree of CG suppression correlated with the extent of duplication (r = 0.52; P < 0.05; n = 14; results not shown).

Sequence divergence of duplicated DNA sequences: To identify the fate of CG dinucleotides, pairwise alignments of the seven expressed zein genes were conducted (Table 5). For each of the 21 alignments, the total number of C:T and G:A transition mutations and the total number of transition mutations occurring in a CG or CNG context were counted and compared to the total number of point mutations (results not shown). In addition, transition mutations occurring in a CG or CNG context were counted for each gene in the pairwise alignments. Transition mutations were the most common point mutations observed, representing on average 61% of total point mutations, and the vast majority occurred in a CG or CNG context (average 74%). In 7/21 alignments, it was possible to distinguish whether CG depletion resulted from the time or the extent of duplication, whereas the remaining alignments were noninformative. The informative alignments included zp22/D87 and azs22;8, azs22;10 and azs22;4, azs22;10 and azs22;14, azs22;12 and zp22/6, zp22/6 and azs22;14, azs22;12 and azs22;10, and zp22/6 and azs22;14. Four of these alignments showed that transition mutations occurring in a CG context were consistent with CG suppression resulting from duplication, whereas none were consistent with the time of amplification. For example, 11 transition mutations (occurring in a CG context)

were observed from azs22;12 to zp22/6, whereas only 7 were observed in the opposite direction. This is consistent with duplication-dependent CG suppression, as zp22/6 has undergone a larger number of duplications than azs22;12 (six vs. four, respectively). If CG suppression were the result of time-dependent deamination of cytosine residues, an equal number of polarized transition mutations would have been expected, given that both genes were amplified 0.6 MYA. In contrast, no association between transition mutations occurring at the CNG trinucleotide and duplication was found for the informative alignments. Interestingly, 5 of the informative alignments did show a time-dependent decrease in CNG content. For the noninformative alignments, the majority of transition mutations observed at CG dinucleotides (12/14) also occurred in a polarized fashion from a less to a more duplicated gene (or from a younger to an older duplication). However, only 6/14 alignments showed a similar behavior at the CNG trinucleotide. These results support the prior observation that only CG suppression correlates with the extent of duplication.

TABLE 5
CG scores of 22-kD zein genes and LTR regions compared to insertion time and duplication

A. 22-kD gene   Insertion (MYR)ᵃ   No. of duplications   ρ(CG)   ρ(CNG)
azs22;12        0.6                4                     0.58    1.09
azs22;10        0.6                6                     0.49    1.10
zp22/6          0.6                6                     0.50    1.11
azs22;14        1.4                6                     0.57    0.98
azs22;4         1.4                6                     0.52    1.11
zp22/D87        1.6                8                     0.44    1.18
azs22;8         2.3                7                     0.45    1.08

B. LTR          Insertion (MYR)ᵇ   ρ(CG)
Cinful          0.26               0.90
Grande-Zm1      0.12               0.90
Opie-1          0.18               0.90
Fourf           1.39               1.00
Milt            1.56               1.10
Ji-3            1.86               0.80
Reina           2.08               0.73
Huck-1          2.26               0.90
Victim          2.42               0.30
Zeon-1          2.75               1.00

ᵃ Insertion/duplication as estimated by Song et al. (2001).
ᵇ Insertion time as estimated by SanMiguel et al. (1998).

Alignments were also performed between genes belonging to the 19-kD zein subfamilies z1A, z1B, and z1D, which have undergone an average of eight, six, and four duplications, respectively. Two expressed genes representative of the z1A subfamily (Z448F14-3 and Z448F14-4) were aligned to all the expressed genes of z1B and z1C. In 16/18 alignments, a higher number of transition mutations at CG dinucleotides was observed from the less duplicated z1B and z1C genes to the more extensively duplicated z1A genes. The same conclusion could be drawn from alignments of two z1B genes (Z492M16-1 and Z492M16-4) to the two expressed z1D genes. Of the 18 alignments performed, transition mutations represented on average 57% of total single-base-pair mutations and, of these, 48% occurred in a CG or CNG context.

Similar conclusions were reached by aligning the duplicated 27-kD zein gene, x56118, to the single-copy 27-kD zein gene, zc2 (x53514). These genes exhibit 97% sequence similarity. Only nine point mutations were observed, six being C:T or G:A transition mutations. Again, transition mutations occurred in a polarized fashion from the single-copy to the duplicated gene (four vs. two, respectively). However, only the transition mutations observed from the zc2 to the x56118 allele were in a CG context (3/4). This pattern of mutation was also mirrored by alignment of the Tourist and Stowaway elements located in the 5′ and 3′ flanking regions of the single-copy and duplicated 27-kD zein gene, respectively. Finally, alignment of the single-copy and duplicated Tourist element 3′ of the Adh1 locus confirmed that C:T and G:A transition mutations are polarized (i.e., occur from the single-copy to the duplicated MITE sequence).

The degree of sequence identity between individual MITEs varies between 46 and 88% (Bureau and Wessler 1992). Interestingly, the average degree of similarity between duplicated MITEs and MITEs inserted in duplicated gene regions is higher than the average degree of similarity of MITEs inserted in single-copy regions (66 vs. 63%, respectively; P = 0.0431). This supports our observation that linked duplicated sequences evolve more similarly than single or dispersed multicopy sequences.

Methylation status of single- vs. multicopy genic regions: C:T and G:A transition mutations are the presumed products of deamination of methylated cytosines. Therefore, a possible explanation of the enhanced turnover of CG dinucleotides might be that duplicated sequences exhibit qualitative or quantitative differences in methylation compared to single-copy genes.

To test this possibility, we analyzed the methylation status of the Tourist element located in the single-copy or tandemly duplicated 27-kD zein gene (zc2 and x56118) by bisulfite sequencing. These elements were chosen because they exhibit small differences in CG and G + C content (see Table 3). The results showed that in both sequence contexts the MITE was hypermethylated at all CG dinucleotides, in addition to a proportion of methylated cytosines in a nonsymmetrical sequence context (see Figure 1). All 29 cytosine residues of the MITE located 5′ of the zc2 coding region were methylated, compared to 25/30 cytosine residues of the element inserted in the tandem duplication. In addition, quantitative analysis of 16 independent clones indicated that the MITE located in the duplicated 27-kD zein gene showed 19 and 37% reductions in symmetrical and asymmetrical methylation, respectively, compared to the element located in the single-copy gene.

Figure 1.—Methylation state of a Tourist element in the single-copy or duplicated 27-kD zein gene. Bisulfite sequencing of a Tourist element in the 5′ region of the single or duplicated 27-kD zein gene. M, methylation of symmetrical CG and CNG sequences; M, methylation of nonsymmetrical sequences. The methylation state of 16 independent clones is indicated, and each letter represents two observations.

DISCUSSION

Our results show that during a short evolutionary time span, individual α-zein genes have accumulated large variations in CG content. In particular, within the 19- and 22-kD zein gene families, extensively duplicated genes are more CG depleted than less duplicated genes. In addition, the 19-kD zein genes are more CG suppressed than the 22-kD zein genes, indicating that the former have undergone a greater number of gene duplications. This is supported by the fact that the 19-kD genes are more abundant than the 22-kD genes (Hagen and Rubenstein 1981; Wilson and Larkins 1984; Heindecker and Messing 1986).

Duplication-dependent depletion of the α-zein genes results largely from C:T and G:A transition mutations in a CG or CNG context. Elevated levels of transition mutations have also been identified in other plant multigene families, such as the GAPA and rDNA gene families of maize and the 5S RNA genes of Arabidopsis (Quigley et al. 1989; Edward et al. 1996; Matieu et al. 2002). For these gene families, a higher level of C:T (or G:A) transition mutations was observed in the nontranscribed genes, and it was argued that CG depletion resulted from relaxation of selective constraints at the transcriptional level (Quigley et al. 1989; Matieu et al. 2002). This explanation is, however, inadequate for the variation in CG suppression observed in the α-zein gene family, as the most highly expressed genes are the most CG depleted. We speculate that the high expression levels of the most CG-suppressed α-zeins are caused by the lack of CG dinucleotides available for methylation, thus relieving methylation-mediated transcriptional repression. Indeed, the fact that α-zein genes are methylated in both the coding and noncoding regions and exhibit an inverse relation between CG methylation and expression lends support to this idea (Bianchi and Viotti 1988; Lund et al. 1995; Sturaro and Viotti 2001). In addition, differences in the extent of methylation could explain the large variations in expression levels of individual zein genes (Woo et al. 2001; Song and Messing 2002).

The in vivo rate of deamination of methylated cytosine residues in plants is not known, whereas in mammals the estimated half-life of a cytosine residue is between 24 and 60 million years (Yang et al. 1996). However, the average plant gene is less depleted in CG than the average mammalian gene (average CG scores of 0.68 and 0.22, respectively), perhaps indicative of a decreased mutability of CG dinucleotides in plants (Gardiner-Garden and Frommer 1987; Gardiner-Garden et al. 1992). On an evolutionary time scale, many transposons represent recent insertions in the maize genome. For example, a large number of LTR-retrotransposons that map to the Adh1-F locus have inserted during the last 5 million years (SanMiguel et al. 1998). Although these LTR regions exhibit a twofold increase in transition mutations compared to nonmethylated intronic regions (SanMiguel et al. 1998), we found no correlation between CG suppression and insertion time of these sequences. Given that spontaneous deamination of 5mC is a very slow process, this suggests that the low CG values observed for duplicated elements, or for elements inserted into duplicated gene regions, also result from enhanced turnover of CG as a result of gene duplication.

If duplicated sequences are more heavily methylated than their single-copy counterparts, an increase in methylation-related deaminations might be expected, subsequently resulting in a reduction in CG content. However, bisulfite analysis of a MITE inserted in the single-copy or duplicated 27-kD zein gene locus showed that the element was methylated in both sequence contexts. In addition, as most transposons are hypermethylated (Rabinowicz et al. 1999; Tompa et al. 2002), it is probable that the majority of LTR regions analyzed in this study are methylated. Indeed, the observation that LTR regions of retrotransposons that map to the Adh1-F locus exhibit a twofold increase in transition mutations compared to nonmethylated intronic regions strongly suggests that these single-copy insertions are methylated (SanMiguel et al. 1998). However, despite indirect evidence that these elements are methylated, most elements were not CG suppressed. Likewise, although both single- and multicopy zein genes have been shown to be hypermethylated in plant tissue (Bianchi and Viotti 1988; Lund et al. 1995), only the multicopy zeins were found to be CG suppressed. Taken together, the data suggest that methylation status per se is not sufficient to explain differences in CG suppression. However, duplicated sequences may exhibit a specific methylation pattern that alters the mutability of 5mC compared to a single-copy methylated sequence. The Tourist element inserted in the 5′ region of the single-copy 27-kD zein gene showed a quantitative difference in methylation compared to the same element inserted in the duplicated 27-kD zein gene. The significance of these findings in relation to CG suppression can only be speculated upon. However, in rodents neither CG density nor methylation status could explain the observed mutation frequencies of CG dinucleotides in a transgene (Skopek et al. 1998; Monroe et al. 2001). Likewise, the methylation status of the expressed and nonexpressed 5S RNA genes in Arabidopsis failed to account for the elevated levels of C:T and G:A transition mutations observed in the nonexpressed genes (Matieu et al. 2002).

An alternative explanation for the observed differences in CG content is that selective pressures differ between chromosomal regions, which could result in different mutation rates of CG dinucleotides. For example, this might explain why the 22-kD zein genes, which map to chromosome 4S, are suppressed, whereas no suppression of DNA sequences that map to the Adh1-F chromosomal region was observed. However, analysis of intergenic and LTR regions of retrotransposons located in the 340-kb 22-kD zein gene cluster showed that CG suppression was localized to zein gene regions and was not a common feature of this chromosomal region.

That CG suppression is independent of chromosomal location is also supported by the fact that many of the sequences analyzed exhibit large variations in CG content despite being located in identical genomic regions. Examples include the Tourist and Stowaway elements located in the noncoding regions of the single-copy or duplicated 27-kD zein gene, and the single or duplicated Tourist element located in the 3′ region of the Adh1 locus.

A study of 101 maize genes has shown that 40% of codon-usage variation is due to a bias toward G- or C- vs. A- or U-ending codons (Fennoy and Bailey-Serres 1993). The bias toward C or G in the third codon position (GC3) is larger for the single-copy zeins, whereas the α-zeins have low GC3 values. We found that CG dinucleotides of the α-zeins were suppressed at codon positions II-III and III-I, whereas the single-copy genes showed an excess of CG at these positions. This is interesting, as CG suppression at position III-I seems to be specific to methylating species, whereas nonmethylating species show an excess of CG at this position (Schorderet and Gartler 1992). Therefore, we argue that the differences in GC3 bias between the single- and multicopy α-zein genes reflect the fact that the α-zein genes have undergone extensive gene duplication. They are not due to a bias in codon usage between the single- and multicopy zein genes.

Mammalian and plant genomes are made up of large regions of relatively homogeneous base composition known as isochores (Bernardi 2000). The debate is ongoing as to whether this mosaic structure is caused by mutation bias, natural selection, or biased gene conversion (reviewed by Eyre-Walker and Hurst 2001). Of particular interest in this context is the finding that cytosine deamination plays a primary role in the evolution of isochores (Fryxell and Zuckerkandl 2000).

In maize, most genes are confined to isochores with a narrow GC range. The exceptions are the α-zeins and the ribosomal genes, which are located in GC-poor and GC-rich fractions, respectively (Carels et al. 1995). We have previously argued that the low GC3 content of the α-zeins could be explained by duplication-dependent CG depletion. The GC content of a gene, and in particular its GC3 content, is highly correlated with the overall GC content of the isochore in which it is located (Bernardi et al. 1985; Clay et al. 1996; Eyre-Walker and Hurst 2001). Given this, duplication-dependent CG loss may, in part, explain the evolution of this particular GC-poor isochore.

We have shown that duplicated zein genes, LTR elements, and MITEs undergo specific changes in nucleotide sequence. These changes have been observed in CG dinucleotides and result in C:T and G:A transition mutations and a net reduction in GC content. Such a process is reminiscent of RIP in N. crassa, where duplications are de novo methylated and riddled with point mutations (Selker 1990). Unfortunately, this study cannot discern whether transition mutations occurred immediately upon duplication or resulted from subsequent mitoses. This question has been addressed in Arabidopsis by sequence analysis of multicopy insertions of transgenes after three sexual generations (Mittelstein-Scheid et al. 1994). None of the transition mutations characteristic of RIP were found, and it was argued that if RIP occurs in plants, it occurs at a much lower frequency. However, this experiment does not necessarily exclude the possibility of a RIP-like mechanism in plants. Indeed, if transition mutations are linked to the duplication process, merely inserting a multicopy locus into the Arabidopsis genome would be expected to yield no transition mutations.

From the detailed analysis of the 22-kD zein genes, it is clear that the rate of CG depletion of duplicated sequences is enhanced compared to the average deamination of methylated residues of dispersed multicopy sequences. Further analysis will reveal whether this mechanism is common to other duplicated gene families. Suppression of MITEs and LTR elements inserted in duplicated gene regions indicates that this might be the case. Interestingly, rapid responses to genome duplication have been shown to occur in wheat and Arabidopsis allotetraploids. These changes include nonrandom alterations in methylation and gene silencing caused by methylation or gene loss (Kashkush et al. 2002; Madlung et al. 2002). It is unclear whether the observed effects result from chromosome doubling or from the hybridization of different genomes. Nevertheless, these findings suggest that specific mechanisms are activated in plants in response to DNA amplification, presumably to maintain genome stability.

The authors thank Mik Noordeweir for assisting in part of the sequence analysis, Angelo Viotti and Vincenzo Rossi for critical reading of the manuscript, and E. Linton for estimates of insertion time of the 22-kD zein genes. This work was supported by a grant from the Danish National Research Foundation.

LITERATURE CITED

Ashikawa, I., 2001 Gene-associated CpG islands in plants as revealed by analyses of genomic sequences. Plant J. 26: 617–625.
Barry, C., G. Faugeron and J. L. Rossignol, 1993 Methylation induced premeiotically in Ascobolus: coextension with DNA repeat lengths and effect on transcription elongation. Proc. Natl. Acad. Sci. USA 90: 4557–4561.
Bender, J., 1998 Cytosine methylation of repeated sequences in eukaryotes: the role of DNA pairing. Trends Biochem. Sci. 23: 252–256.
Bender, J., and G. R. Fink, 1995 Epigenetic control of an endogenous gene family is revealed by a novel blue fluorescent mutant of Arabidopsis. Cell 83: 725–734.
Bennetzen, J. L., K. Schrick, P. S. Springer, W. E. Brown and P. SanMiguel, 1994 Active maize genes are unmodified and flanked by diverse classes of modified, highly repetitive DNA. Genome 37: 565–576.
Bernardi, G., 2000 Isochores and the evolutionary genomics of vertebrates. Gene 241: 3–17.
Bernardi, G., B. Olofsson, J. Filipski, M. Zerial, J. Salinas et al., 1985 The mosaic genome of warm-blooded vertebrates. Science 228: 953–958.
Bianchi, M. W., and A. Viotti, 1988 DNA methylation and tissue-specific transcription of the storage protein genes of maize. Plant Mol. Biol. 11: 203–214.
Bird, A. P., 1980 DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 8: 1499–1504.
Bureau, T. E., and S. R. Wessler, 1992 Tourist: a large family of small inverted repeat elements frequently associated with maize genes. Plant Cell 4: 1283–1294.
Bureau, T. E., and S. R. Wessler, 1994 Stowaway: a new family of inverted repeat elements associated with the genes of monocotyledonous and dicotyledonous plants. Plant Cell 6: 907–916.
Cambereri, E. B., B. C. Jensen, E. Schabtach and E. Selker, 1989 Repeat-induced G-C to A-T mutations in Neurospora. Science 244: 1571–1575.
Carels, N., A. Barakat and G. Bernardi, 1995 The gene distribution of the maize genome. Proc. Natl. Acad. Sci. USA 92: 11057–11060.
Clay, O., S. Caccio, Z. Zoubak, D. Mouchiroud and G. Bernardi, 1996 Human coding and noncoding DNA: compositional correlations. Mol. Phylogenet. Evol. 5: 2–12.
Cooper, D. N., and M. Krawczak, 1989 Cytosine methylation and the fate of CpG dinucleotides in vertebrate genomes. Hum. Genet. 83: 181–188.
Couloundre, C., J. H. Miller, P. J. Farabaugh and W. Gilbert, 1978 Molecular basis of base substitution hotspots in Escherichia coli. Nature 274: 775–780.
Das, O. P., and J. Messing, 1987 Allelic variation and differential expression of the 27-kDa zein locus in maize. Mol. Cell. Biol. 7: 4490–4497.
Das, O. P., K. Ward, S. Ray and J. Messing, 1991 Sequence variation between alleles reveals two types of copy correction at the 27 kDa zein locus of maize. Genomics 11: 849–856.
Edward, S., I. V. Bukler and T. P. Holtsford, 1996 Zea mays ribosomal repeat evolution and substitution patterns. Mol. Biol. Evol. 14: 623–632.
Esen, A., 1986 Separation of alcohol-soluble proteins (zeins) from maize into three fractions by differential solubility. Plant Physiol. 80: 623–627.
Eyre-Walker, A., and L. D. Hurst, 2001 The evolution of isochores. Nat. Rev. Genet. 2: 549–555.
Fennoy, S. L., and J. Bailey-Serres, 1993 Synonymous codon usage in Zea mays L. nuclear genes is varied by levels of C-ending and G-ending codons. Nucleic Acids Res. 23: 5294–5300.
Flavell, R. B., 1994 Inactivation of in plants as a consequence of specific sequence duplication. Proc. Natl. Acad. Sci. USA 91: 3490–3496.
Flavell, R. B., M. O'Dell and W. F. Thompson, 1988 Regulation of cytosine methylation in ribosomal DNA and nucleolus organizer expression in wheat. J. Mol. Biol. 204: 523–534.
Fryxell, K. J., and E. Zuckerkandl, 2000 Cytosine deamination plays a primary role in the evolution of mammalian isochores. Mol. Biol. Evol. 17: 1371–1383.
Gardiner-Garden, M., and M. Frommer, 1987 CpG islands in vertebrate genomes. J. Mol. Biol. 196: 261–282.
Gardiner-Garden, M., J. A. Sved and M. Frommer, 1992 Methylation sites in angiosperm genes. J. Mol. Evol. 34: 219–230.
Goyon, C., and G. Faugeron, 1989 Targeted transformation of Ascobolus immersus and de novo methylation of the resulting duplicated DNA sequences. Mol. Cell. Biol. 9: 2818–2827.
Gruenbaum, Y., T. Naveh-Many, H. Cedar and A. Razin, 1981 Sequence specificity of methylation in higher plant DNA. Nature 292: 860–862.
Gruenbaum, Y., H. Cedar and A. Razin, 1982 Substrate and sequence specificity of a eukaryotic DNA methylase. Nature 295: 620–622.
Hagen, G., and I. Rubenstein, 1981 Complex organization of zein genes in maize. Gene 13: 239–249.
Heindecker, G., and J. Messing, 1986 Structural analysis of plant genes. Annu. Rev. Plant Physiol. 37: 439–466.
Holstein, M., M. S. Greenblatt, K. Rice, T. Soussi, R. Fuchs et al., 1994 Database of p53 gene somatic mutations in human tumors and cell lines. Nucleic Acids Res. 22: 3551–3555.
Jeddeloh, J. A., and E. J. Richards, 1996 mCCG methylation in angiosperms. Plant J. 9: 579–586.
Jones, P. A., W. M. Rideout, J. C. Shen, C. H. Spruck and Y. C. Tsai, 1992 Methylation, mutation and cancer. Bioessays 14: 33–36.
Kashkush, K., M. Feldman and A. A. Levy, 2002 Gene loss, silencing and activation in a newly synthesized wheat allotetraploid. Genetics 160: 1651–1659.
Kirihara, J., J. B. Petri and J. Messing, 1988 Isolation and sequence of a gene encoding a methionine-rich 10-kDa zein protein from maize. Gene 71: 359–370.
Kovarik, A., R. Matyasek, A. Leitch, B. Gazdova, J. Fulnecek et al., 1997 Variability in CpNpG methylation in higher plant genomes. Gene 201: 25–33.
Kricker, M. C., J. W. Drake and M. Radman, 1992 Duplication-targeted DNA methylation and mutagenesis in the evolution of eukaryotic chromosomes. Proc. Natl. Acad. Sci. USA 89: 1075–1079.
Kumar, A., and J. L. Bennetzen, 1999 Plant retrotransposons. Annu. Rev. Genet. 33: 479–532.
Leite, A., L. M. M. Ottoboni, M. L. P. N. Targon, M. J. Silva, S. R. Turcinelli et al., 1990 Phylogenetic relationship of zein and coixins as determined by immunological cross-reactivity and Southern blot analysis. Plant Mol. Biol. 14: 743–751.
Leutwiler, L. S., B. R. Hough-Evans and E. M. Meyerowitz, 1984 The DNA of Arabidopsis thaliana. Mol. Gen. Genet. 194: 15–23.
Liu, C. N., and I. Rubenstein, 1992 Molecular characterization of two types of 22 kilodalton α-zein genes in a gene cluster in maize. Mol. Gen. Genet. 234: 244–253.
Llaca, V., and J. Messing, 1998 Amplicons of maize genes are conserved within genic but expanded and constricted in intergenic regions. Plant J. 15: 211–220.
Lund, G., P. Ciceri and A. Viotti, 1995 Maternal-specific demethylation and expression of specific alleles of zein genes in the endosperm of Zea mays L. Plant J. 8: 571–581.
Madlung, A., R. W. Masuelli, B. Watson, S. H. Reynolds, J. Davidson et al., 2002 Remodeling of DNA methylation and phenotypic and transcriptional changes in synthetic Arabidopsis allotetraploids. Plant Physiol. 129: 733–746.
Maloisel, L., and J. L. Rossignol, 1998 Suppression of crossing-over by DNA methylation in Ascobolus. Genes Dev. 12: 1381–1389.
Matassi, G., R. Melis, K. C. Kuo, G. Macaya, C. W. Gehrke et al., 1992 Large-scale methylation patterns in the nuclear genomes of plants. Gene 122: 239–245.
Matieu, O., Y. Yukawa, M. Sugiura, G. Pikard and S. Tourmente, 2002 5S rRNA genes expression is not inhibited by DNA methylation in Arabidopsis. Plant J. 29: 313–323.
McClelland, M., 1983 The frequency and distribution of methylatable DNA sequences in leguminous plant protein coding genes. J. Mol. Evol. 19: 346–354.
Mittelstein-Scheid, O., K. Afsar and J. Paszkowski, 1994 Gene inactivation in Arabidopsis thaliana is not accompanied by an accumulation of repeat-induced point mutations. Mol. Gen. Genet. 244: 325–330.
Monroe, J. J., M. G. Manjanatha and T. R. Skopek, 2001 Extent of CpG methylation is not proportional to the in vivo spontaneous mutation frequency at transgenic loci in Big Blue rodents. Mutat. Res. 476: 1–11.
Montero, L. M., J. Filipski, P. Gil, J. Capel, J. M. Martinez-Zapater et al., 1992 The distribution of 5-methylcytosine in the nuclear genome of plants. Nucleic Acids Res. 20: 3207–3210.
Prat, S., J. Cortadas, P. Puigdomenech and J. Palau, 1985 Multiple variability in the sequence of a family of maize endosperm proteins. Nucleic Acids Res. 13: 1493–1504.
Quigley, F., H. Brinkmann, W. F. Martin and R. Cerff, 1989 Strong functional GC pressure in a light-regulated maize gene encoding subunit GAPA of chloroplast glyceraldehyde-3-phosphate dehydrogenase: implications for the evolution of GAPA pseudogenes. J. Mol. Evol. 29: 412–421.
Rabinowicz, P. D., K. Schutz, N. Dedhia, C. Yordan, L. D. Parnell et al., 1999 Differential methylation of genes and retrotransposons facilitates shotgun sequencing of the maize genome. Nat. Genet. 23: 305–308.
Reina, M., P. Guillen, I. Ponte, A. Boronat and J. Palau, 1990 Sequence analysis of a genomic clone encoding a Zc2 protein from Zea mays W64A. Nucleic Acids Res. 18: 6425.
Ronchi, A. K., K. Petroni and C. Tonelli, 1995 The reduced expression of endogenous duplications (REED) in the maize R gene family is mediated by DNA methylation. EMBO J. 14: 5318–5328.
Rountree, M. R., and E. U. Selker, 1997 DNA methylation inhibits elongation but not initiation of transcription in Neurospora crassa. Genes Dev. 11: 2383–2395.
Rubenstein, I., and D. E. Geraghty, 1986 The genetic organization of zeins, pp. 297–315 in Advances in Cereal Science and Technology, edited by Y. Pomeranz. American Association of Cereal Chemists, St. Paul.
Russell, D. A., and M. M. Sachs, 1991 The maize cytosolic glyceraldehyde 3-phosphate dehydrogenase gene family: organ specific expression and genetic analysis. Mol. Gen. Genet. 229: 219–228.
SanMiguel, P., A. Tikhonov, Y. K. Jin, N. Motchoulskaia, D. Zakharov et al., 1996 Nested retrotransposons in the intergenic regions of the maize genome. Science 274: 737–738.
SanMiguel, P., B. S. Gaut, A. Tikhonov, Y. Nakajima and J. L. Bennetzen, 1998 The paleontology of intergene retrotransposons of maize. Nat. Genet. 20: 43–45.
Schorderet, D. F., and S. M. Gartler, 1992 Analysis of CpG suppression in methylated and non-methylated species. Proc. Natl. Acad. Sci. USA 89: 957–961.
Selker, E. U., 1990 Premeiotic instability of repeated sequences in Neurospora crassa. Annu. Rev. Genet. 24: 579–613.
Skopek, T., D. Marino, K. Kort, J. Miller, M. Trumbauer et al., 1998 Effect of target gene CpG content on spontaneous mutation in transgenic mice. Mutat. Res. 400: 77–88.
Soave, C., R. Reggiani, N. Difonzo and F. Salamini, 1981 Clustering of genes for 20 kd zein subunits in the short arm of maize chromosome 7. Genetics 97: 363–377.

Soave, C., R. Reggiani, N. Difonzo and F. Salamini, 1982 Genes for zein subunits on maize chromosome 4. Biochem. Genet. 20: 1027–1038.
Song, R., and J. Messing, 2002 Contiguous genomic DNA sequence comprising the 19-kD gene family from maize. Plant Physiol. 130: 1626–1635.
Song, R., V. Llaca, E. Linton and J. Messing, 2001 Sequence, regulation, and evolution of the maize 22-kD α-zein gene family. Genome Res. 11: 1817–1825.
Spena, A., A. Viotti and V. Pirotta, 1983 Two adjacent genomic zein sequences: structure, organization and tissue-specific restriction pattern. J. Mol. Biol. 169: 799–811.
Sturaro, M., and A. Viotti, 2001 Methylation of the Opaque2 box in zein genes is parent-dependent and affects O2 DNA binding activity in vitro. Plant Mol. Biol. 46: 549–560.
Swarup, S., M. C. P. Timmermans, S. Chaudhuri and J. Messing, 1995 Determinants of the high-methionine trait in wild and exotic germplasm may have escaped selection during early cultivation of maize. Plant J. 8: 35–40.
Tikhonov, A. P., P. J. SanMiguel, Y. Nakajima, N. M. Gorenstein and J. L. Bennetzen, 1999 Colinearity and its exceptions in orthologous adh regions of maize and sorghum. Proc. Natl. Acad. Sci. USA 96: 7409–7414.
Tompa, R., C. M. McCallum, J. Delrow, J. G. Henikoff, B. van Steensel et al., 2002 Genome-wide profiling of DNA methylation reveals transposon targets of CHROMOMETHYLASE3. Curr. Biol. 12: 65–68.
White, S. E., L. F. Habera and S. R. Wessler, 1994 Retrotransposons in the flanking regions of normal plant genes: a role for copia-like elements in the evolution of gene structure and expression. Proc. Natl. Acad. Sci. USA 91: 11792–11796.
Wilson, C. M., G. F. Sprague and T. C. Nelson, 1989 Linkage among zein genes determined by isoelectrical focusing. Theor. Appl. Genet. 77: 217–226.
Wilson, D. R., and B. A. Larkins, 1984 Zein gene organization in maize and related grasses. J. Mol. Evol. 20: 330–340.
Woo, Y. M., D. W. Hu, B. A. Larkins and R. Jung, 2001 Genomics analysis of genes expressed in maize endosperm identifies novel seed proteins and clarifies patterns of zein gene expression. Plant Cell 13: 2297–2317.
Yang, A. S., M. L. Gonzalgo, J. Zingg, R. P. Miller, J. Buckley et al., 1996 The rate of CpG mutation in Alu repetitive elements within the p53 tumor suppressor gene in the primate germline. J. Mol. Biol. 258: 240–250.
Zeschnigk, M., C. Lich, K. Buiting, W. Doerfler and B. Horsthemke, 1997 A single-tube PCR test for the diagnosis of Angelman and Prader-Willi syndrome based on allelic methylation differences at the SNRPN locus. Eur. J. Hum. Genet. 5: 94–98.
Zhang, Q., J. Arbuckle and S. R. Wessler, 2000 Recent, extensive, and preferential insertion of members of the miniature inverted-repeat transposable element family Heartbreaker into genic regions of maize. Proc. Natl. Acad. Sci. USA 97: 1160–1165.

Communicating editor: J. Birchler