Copyright © 1995 by the Genetics Society of America

Intraspecific and Interspecific Variation in 5S RNA Genes Are Decoupled in Diploid Wheat Relatives

Elizabeth A. Kellogg* and Rudi Appels†

*Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, and †Division of Plant Industry, CSIRO, Canberra, ACT, Australia

Manuscript received August 20, 1994
Accepted for publication February 13, 1995

ABSTRACT

5S RNAs form part of the ribosome in most organisms. In some, e.g., prokaryotes and some fungi, the genes are part of the ribosomal operon, but in most eukaryotes they are in tandem arrays of hundreds to thousands of copies separate from the main ribosomal array. 5S RNA genes can be aligned across kingdoms. We were therefore surprised to find that, for 28 diploid species of the wheat tribe (Triticeae), nucleotide diversity within an array is up to 6.2% in the genes, not significantly different from that of the nontranscribed spacers. Rates of concerted evolution must therefore be insufficient to homogenize the entire array. Between species, there are significantly fewer fixed differences in the gene than would be expected, given the high within-species variation. In contrast, the amount of variation between species in the spacer is the same as or greater than that within individuals. This leads to a paradox. High variation within an individual suggests that there is little selection on any particular gene within an array. But conservation of the gene across species implies that polymorphisms are periodically eliminated at a rate approximately equal to or greater than that of speciation. Levels of intraspecific polymorphism and interspecific divergence are thus decoupled. This implies that selective mechanisms exist to eliminate mutations in the gene without also affecting the spacer.
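The gene-versus-spacer contrast stated in the abstract is tested in the Results with 2 × 2 contingency tables and Fisher's exact test. As a minimal sketch, assuming a one-sided test on a table whose rows are gene and spacer and whose columns are polymorphisms and fixed differences, the Python below computes P for the T. comosum / T. tauschii comparison reported in the Results (polymorphisms 3:6, fixed differences 0:7). The function name and argument order are illustrative and are not the authors' code; the paper does not state whether the test was one-sided, so that is an assumption here.

```python
from math import comb


def fisher_one_sided(gene_poly, gene_fixed, spacer_poly, spacer_fixed):
    """One-sided Fisher's exact P for a gene/spacer x polymorphic/fixed table.

    Returns the probability, with all row and column totals held fixed, of
    observing `gene_fixed` or fewer fixed differences in the gene.
    (Hypothetical helper; sidedness is an assumption, not taken from the paper.)
    """
    gene_total = gene_poly + gene_fixed
    fixed_total = gene_fixed + spacer_fixed
    poly_total = gene_poly + spacer_poly
    n = fixed_total + poly_total
    denom = comb(n, gene_total)  # ways to choose which sites lie in the gene
    p = 0.0
    for k in range(gene_fixed + 1):
        # k gene sites are fixed differences, the other gene sites are polymorphic
        p += comb(fixed_total, k) * comb(poly_total, gene_total - k) / denom
    return p


# Counts quoted in the Results for T. comosum versus T. tauschii.
p_value = fisher_one_sided(gene_poly=3, gene_fixed=0, spacer_poly=6, spacer_fixed=7)
print(round(p_value, 3))
```

For these counts the one-sided value is 84/560 = 0.15, which agrees with the P reported for this comparison; a two-sided test would give a larger value.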

5S RNA is a small 120-bp molecule that forms part of the ribosome in most known organisms. In eukaryotes, it forms a complex with the large subunit of ribosomal RNA and several ribosomal proteins. The precise function of the 5S RNA in the ribosome is unknown. 5S RNA genes are part of the rRNA cistron in some organisms, including most bacteria and some fungi, and occur in the spacer between the large and small subunit genes. In all animals and green plants, however, the genes are in tandem arrays separate from the main rDNA array (LONG and DAWID 1980). There may be one [e.g., Drosophila (SAMSON and WEGNEZ 1988)], two (e.g., members of the Triticeae, see below), or many [e.g., Linum (GOLDSBROUGH et al. 1981; SCHNEEBERGER and CULLIS 1992); Neurospora (SELKER et al. 1981)] such arrays, with copy number ranging from several hundred to several thousand.

The sequence and structure of 5S RNA genes and their products are highly conserved, as might be expected for a gene with such a wide distribution. The length varies little from 120 bp. Certain positions and regions of the gene are sufficiently conserved that it can be aligned across broad taxonomic ranges. The alignment led to early hopes that the gene would be useful for inferring phylogenetic relationships among deep branches of eukaryotes. Many sites do not vary much, however, and those sites that do accumulate mutations do so rapidly enough that they are subject to multiple hits; evolution rapidly wipes out its own tracks (HALANYCH 1991; STEELE et al. 1991).

At least some of the conserved regions can be explained by analysis of gene transcription. 5S RNA genes are transcribed by RNA polymerase III (pol III), a polymerase that also transcribes tRNAs and some small nuclear RNAs. A convenient model for study of pol III has been the Xenopus oocyte, in which there are thousands of 5S RNA genes. Pol III does not bind to DNA directly, but rather depends on interaction with transcription factor IIIA (TFIIIA) (HAYES and TULLIUS 1992; PIELER et al. 1987). TFIIIA makes the initial contact with the DNA, binding strongly to a sequence called box C and somewhat less strongly to box A (Figure 1). In Xenopus, mutations in either box A or box C prevent transcription (HAYES and TULLIUS 1992), and changes in the distance between the boxes reduce transcription levels. It is thus no surprise to discover that these regions contain highly conserved bases.

In Xenopus oocytes, TFIIIA also binds and sequesters the newly synthesized RNA molecule (CLEMENS et al. 1993). The binding sites for the gene product (RNA) differ from those for DNA (Figure 1). If TFIIIA also binds the gene product in angiosperms, then mutations in these sites might result in a nonfunctional gene. Interactions with TFIIIA may thus affect selection on 5S RNA genes, quite independent of any functional constraints imposed by binding with the ribosome. Because of the strong similarity of the 5S genes across kingdoms, functions identified in Xenopus may well be important for 5S genes in general.

FIGURE 1.-Location of intraspecific polymorphisms in 5S RNA genes in the wheat tribe. Consensus sequence for the Triticeae is mapped onto the secondary structure for Xenopus 5S RNA, which was determined by using solution data and molecular modeling. Sites varying within individual plants are marked: short-spacer units, x; long-spacer units, circles. Note that the variable sites are spread across the molecule. Box A and Box C are highly conserved DNA binding domains. Other lines mark RNA binding domains. The conserved sequence GGAUCC at the 3' end of the third loop is a BamHI site that was used for cloning; units with mutations in this site were not sequenced.

Corresponding author: E. A. Kellogg, Harvard University Herbaria, 22 Divinity Ave., Cambridge, MA 02138. E-mail: [email protected]

Genetics 140: 325-343 (May, 1995)

5S RNA genes have been studied extensively in plants, with sequences now in the literature for a gymnosperm (Pinus radiata; MORAN et al. 1992), and many angiosperms including dicots, such as Acacia (Leguminosae; PLAYFORD et al. 1992), Linum (Linaceae; GOLDSBOROUGH et al. 1982), Arabidopsis (Brassicaceae; CAMPELL 1992), Pisum (Leguminosae; ELLIS et al. 1988), monocots of the grass family, such as rice (Oryza; MCINTYRE et al. 1992), and members of the wheat tribe (Triticeae; APPELS et al. 1992; SCOLES et al. 1988). The data base for the Triticeae is uniquely powerful, because of its size and density.

The Triticeae (wheat tribe) is a group of diploid and polyploid grasses. It contains ~500 species, among them wheat, barley, rye and their wild progenitors. The number of genera is a matter of contention, and depending on the taxonomist consulted, ranges from 1 to 38; we recognize an intermediate number here. Decades of cytogenetic work have shown that the seven chromosomes of each haploid genome are syntenic throughout the group. Thus chromosomes numbered 1 through 7 are presumed homologues.

Chromosomal locations of genes in the Triticeae: The genes for 5S RNA occur in large tandem arrays, in which genes alternate with nontranscribed spacers. The arrays are at two distinct chromosomal locations, and the two loci can be unambiguously distinguished by both the length and the sequence of the spacers (GERLACH and DYER 1980). For most members of the tribe, the array with short spacers occurs on the group 1 chromosomes and the array with long spacers on group 5 (SCOLES et al. 1988). This has been indicated by deletion mutations (LASSNER and DVOŘÁK 1986), nulli/tetrasomic combinations (DVOŘÁK et al. 1989), in situ hybridization (APPELS et al. 1980; KIM et al. 1993), and pulsed field gel electrophoresis (PFGE; RODER 1992). However, in the genera Critesion and Hordeum (both sometimes included in an expanded Hordeum), the short-spacer array occurs on the group 2 chromosomes, a result that has been confirmed by RFLP mapping and by in situ hybridization (KOLCHINSKY et al. 1990; LAURIE et al. 1993). MUIN et al. (1993) found a long-spacer array on chromosome 3. High resolution fluorescent in situ hybridization in barley indicates that there are 5S arrays on chromosomes 2, 3, 4, and 7 (LEITCH and HESLOP-HARRISON 1993). Other markers on the long arm of 2 (2H) in H. vulgare indicate colinearity with Triticum aestivum (wheat) and Secale cereale (rye). Hence, it appears that the short-spacer array from the group 1 chromosome has somehow "moved" onto chromosome 2 (LAURIE et al. 1993) without disruption of other linkage groups. In the genus Dasypyrum (with the single species D. villosum) there is no short-spacer array at all. DVOŘÁK et al. (1989) reported that the short units of T. umbellulatum are on chromosome 5, and long units were not found. They also noted that T. speltoides had only a long-spacer array, confirming previous reports by PEACOCK et al. (1981).

REDDY and APPELS (1989) suggested that there may be additional dispersed locations for 5S RNA genes, based on results of in situ hybridizations; however, their observations were at the limits of resolution for the technique. Using higher resolution techniques, LEITCH and HESLOP-HARRISON (1993) found only the two major arrays on chromosomes 1 and 5. In their studies of nulli-tetrasomic and ditelosomic lines of diploid and polyploid Triticeae, DVOŘÁK et al. (1989) determined the chromosomal location of 5S arrays by analyzing presence/absence of bands in Southern blots; they found bands corresponding to the short-spacer and long-spacer arrays. They consistently found that the appropriate band disappeared when one chromosome was deleted. If there were dispersed loci, then one would expect that in at least some taxa a band would persist even when the main array was deleted. Thus if there are dispersed loci, the copy number is below that detectable in Southern blots at standard hybridization conditions. There is also no evidence from any of the sequencing studies (described below) that there is any complete 5S DNA unit other than at the two major loci.

The two 5S arrays apparently have evolved independently (but see below). All evidence points to the distinctness of the two sets of spacer sequences (SCOLES et al. 1988; DVOŘÁK et al. 1989). Sequences are more similar within each class than they are between classes. Furthermore, the origin of the sequences remains identifiable even in allopolyploids (DVOŘÁK et al. 1989; LAGUDAH et al. 1989). We infer that there is little or no concerted evolution between the short-spacer and long-spacer arrays, nor among short-spacer arrays on different genomes in polyploids.

The copy number of the short-spacer unit is ca. 5000, and the long-spacer unit ca. 3000 per haploid genome (RODER et al. 1992). Variations in the size of the array are reported, however (RODER et al. 1992; KIM et al. 1993). For 10 lines of T. monococcum, KIM et al. (1993) reported that there were more short units than long, but in two other lines, the numbers were reversed. LAGUDAH et al. (1989) also describe variation in the relative abundance of short-spacer vs. long-spacer units in T. tauschii; they note that the size of the two arrays varies independently, i.e., increasing numbers of short-spacer units do not correlate with a decrease in long-spacer units. KANAZIN et al. (1993) also report fourfold variation in 5S rDNA copy number between two cultivars of barley. They examined the inbred progeny of a cross between the two and found that the two parental classes and two recombinant types could be detected, i.e., the size of the arrays was stable over at least two generations.

Transcription: It is not known if either locus is preferentially transcribed. DVOŘÁK et al. (1989) and REDDY and APPELS (1989) found that some units were methylated on C. BamHI digests (with the recognition sequence GGATCC) produced ladders, whereas digests with the methyl-insensitive MboI (recognition sequence GATC) produced a single band. The long-spacer locus has been reported to be more heavily methylated than the short-spacer (GRELLET and PENON 1984), but this was not confirmed by DVOŘÁK et al. (1989). Methylation may be fairly uniform within a locus; GOLDSBOROUGH et al. (1982) found that genes and spacers were about equally methylated in Linum, but there are no data on this in the Triticeae.

Direct sequencing of RNA has shown that the gene products are largely homogeneous, implying that there is only one invariant class of transcript. If, however, minor variants occurred at relatively low frequencies, they would not be detected by this method. Direct sequencing reveals only the consensus base; nonconsensus bases (polymorphisms) will only be detected if they are of sufficiently high frequency to produce a band nearly as bright as the consensus on autoradiograms.

In contrast, sequencing of multiple genes (DNA) from single individuals has shown that the genes and spacers within an array are equally variable. This has led to several papers using the spacer region in particular to assess relationships among species. Within the Triticeae, gene trees from the long-spacer and short-spacer array do not agree entirely. The high level of polymorphism among gene copies from a single individual means that close relationships often cannot be resolved.

Interestingly, genes within an array are at least as polymorphic as the spacers. This is very surprising for a gene that can be aligned across kingdoms. We suggest here that this may involve processes acting on the entire array, rather than on individual genes. We explore this possibility further in the present paper.

MATERIALS AND METHODS

The taxa studied include 34 diploid members of the Triticeae, including all but four of the diploid genera in the tribe (Eremopyrum, Heteranthelium, and Peridictyon, the latter three monotypic). One to seven 5S DNA units were sequenced from one or both arrays in each species (Table 1). Three species, Bromus inermis, Brachypodium pinnatum and Brachypodium sp., were included initially as outgroups, but were later excluded because of extensive sequence divergence (see below).

Most sequences were generated using standard methods of cloning and sequencing, as described by LAWRENCE and APPELS (1986). Briefly, DNA was fractionated on CsCl-actinomycin D gradients, cut with BamHI, which recognizes a conserved site in the gene sequence, cloned into pUC vectors and screened with a probe for 5S DNA. For 13 of the taxa, 5S units were amplified by the polymerase chain reaction (PCR) using primers described in APPELS et al. (1992). The PCR product was then cut with BamHI, cloned into pUC vectors, and sequenced using an ABI automated sequencer. 5S units with short spacers were preferentially amplified by these methods, resulting in more detailed sampling of short-spacer 5S arrays.

Initial alignments were done using TREE, a multiway alignment program (SMITH 1987), or ClustalV (HIGGINS et al. 1992). These were then checked by eye and modified as necessary. Alignments of 5S units from single individuals were easily done by eye, as were alignments between individuals of the same species. Among species within an array, alignments were also nonproblematical. Alignments between short-spacer and long-spacer sequences were generally more difficult. In general we have analyzed the two loci separately, as the gene duplication event clearly precedes the diversification of the tribe (see below).

Nucleotide diversity and its sampling variance were calculated following NEI (1987; equations 10.6 and 10.7, p. 256), either by hand or using an unpublished program (R. C. LEWONTIN).

Polymorphism within arrays was compared with that between arrays using a modification of the McDonald-Kreitman test (MCDONALD and KREITMAN 1991; SAWYER and HARTL 1992). In their paper, MCDONALD and KREITMAN used a 2 × 2 contingency table to test whether the ratio of replacement to synonymous polymorphisms within species was statistically different from the ratio of replacement to synonymous fixed differences between species. We constructed similar 2 × 2 contingency tables, but the rows were substitutions in the gene and spacer, rather than replacement and synonymous substitutions. The columns were fixed differences and polymorphisms, as in the original paper. We conducted the test only for those taxa for which we had three or more 5S units sequenced.

Phylogenetic analyses were done using PAUP 3.1.1 (SWOFFORD 1993) on a Macintosh Quadra 750. Searches were done using the heuristic algorithm with a simple addition sequence, TBR branch swapping, MULPARS on, and no character state weighting. Retention indices are high enough (>0.8) that multiple islands of trees are not likely (MADDISON 1991), so we felt that multiple searches with random addition sequences were unnecessary. Additional exploration of trees was done using MacClade 3.0 (MADDISON and MADDISON 1992).

RESULTS

Variation within individuals

Nucleotide diversity: Within the Triticeae, 152 5S units were sequenced and virtually all of them differed at one or more sites. Previous workers had noted that mutations appear not only in the spacers but within the genes as well. We verified and quantified this observation by calculating nucleotide diversity, a measure that gives variation per nucleotide and is unbiased with respect to sample size. We found that nucleotide diversity among the genes is as high as or higher than that in the spacer in many cases (Table 1). This high level of variation in the gene appears whether or not gaps are included in estimates of nucleotide diversity. The large number of species sequenced shows that the phenomenon is widespread, and does not reflect the history of a single aberrant individual.

We also tested the apparent similarity of nucleotide diversity across the study group as a whole. Table 2 summarizes the overall mean nucleotide diversity of genes and spacers for each of the arrays. The mean nucleotide diversity of the spacers is higher than that of the genes for both arrays, whether or not gaps are included in the estimate. However, this difference is significant only for the short-spacer array with gaps included in the estimate of diversity. For substitutions alone, and for the long-spacer sequences with or without gaps, there is no significant difference between the diversity in the gene and the diversity in the spacer. This is also illustrated in Figure 2, which shows that nucleotide diversity among the genes and spacers for both arrays is appreciable.

Most of the sequences were generated by standard cloning techniques rather than PCR followed by cloning (Table 1), so the variation cannot be ascribed to misincorporation by Taq polymerase. We tested for differences between PCR-generated vs. standard clones using an analysis of variance and found no significant difference (P = 0.89). Furthermore, because we are comparing gene and spacer sequences from the same clones, PCR errors should affect both equally. In other words, even if PCR introduces a very large number of artifacts, they should not affect the relative nucleotide diversity in the sequences. A similar point could be made for automated sequencing, which was used for some but by no means all of the sequences.

LAGUDAH et al. (1989) reported that the long-spacer array in T. tauschii is more variable than the short-spacer array, whereas REDDY and APPELS (1989) reported the opposite for species of Secale. These differences can be seen by simple inspection of Table 1. The long-spacer array in Australopyrum velutinum may be marginally more variable than the short spacer. In general, however, there are too few species with both long and short units sequenced to make a clear comparison.

Substitutions: We investigated whether or not the polymorphic sites were all localized at a single hot spot within the gene. Nucleotide substitutions occur throughout the 5S unit across both spacer and gene (Figures 1, 3 and 4). SCOLES et al. (1988) had noted a marked drop in substitutions in the gene relative to the spacer, but this reflected only changes among consensus sequences; the drop is not seen when nonconsensus sites are included. Figure 1 shows the consensus sequence for the Triticeae, with sites that vary within individuals marked. There are 73 variable sites (61% of the total), distributed across the gene.

Transition/transversion ratios are low, generally <2, and often <1 (Table 3). This may reflect either multiple transitions at many sites or low constraints on base pairing. This contradicts earlier observations (GOLDSBOROUGH et al. 1982) that most mutations in 5S arrays are transitional. Because the variation reflects change within a single species, we suggest that multiple hits are unlikely unless there is a notably elevated mutation rate. Because there is no other reason to suspect elevated rates of mutation, it seems more likely that mutations are only slightly biased.

Insertion/deletion events: Insertion/deletions (indels) are also common among 5S units within an individual. Deletions that result in loss of one end of the 5S unit (e.g., from midspacer or midgene through the BamHI site) appear periodically and have been interpreted as cloning artifacts. Deletions occurring within

[TABLE 1: table body not recoverable from the extracted text]

the unit, however, probably reflect actual variation in the array.

Single-base repetitions or deletions are most frequent, and often occur in short runs of a single base. This pattern is consistent with a model of replication slippage. As with substitutions, these changes occur in both gene and spacer. Polymorphic indels are rarely fixed except for the C at position 397 in the short-spacer array of T. longissimum.

Larger insertion/deletion events are also common. The most dramatic is a duplication of 110 bp in the spacer of A. pectinatum 5S units. The sequence places it clearly with the short-spacer genes of chromosome 1, although its length is comparable to a long spacer. Other smaller duplications occur in A. velutinum, Agropyron cristatum, Psathyrostachys juncea, and Hordeum bulbosum (cf. SCOLES et al. 1988; SASTRI et al. 1992).

Pseudogenes: We addressed the possibility that many of the 5S RNA genes in an array are actually pseudogenes. While most of the variant gene sequences in the Triticeae are not obviously defective in transcription initiation sites, some (17%) do have mutations in either Box A or Box C or both; we infer that these are pseudogenes. However, genes without mutations in the transcription sites are variable elsewhere. This can be seen from simple inspection of Table 1; many species that lack obvious pseudogenes still have high nucleotide diversity among gene copies. Even if obvious pseudogenes are excluded, therefore, the remaining presumably functional genes are still surprisingly variable.

We have used a conservative criterion for recognizing pseudogenes; other changes could impair or completely disrupt gene function. This is particularly true for modifications in the RNA binding sites, which presumably affect the transcript pool rather than transcription itself. We cannot therefore rule out the possibility that there are many more pseudogenes or genes with reduced function than we have identified. It may be

TABLE 2
Comparison of nucleotide diversities over all genes and spacers in Table 1

          Chromosome 1 array (short spacers)    Chromosome 5 array (long spacers)
          Gaps included    Gaps excluded        Gaps included    Gaps excluded
Gene      0.023 ± 0.019    0.017 ± 0.028        0.015 ± 0.023    0.018 ± 0.016
Spacer    0.040 ± 0.028    0.033 ± 0.042        0.025 ± 0.023    0.033 ± 0.021
P         <0.040           <0.159               <0.111           <0.196

Values (mean ± SD) were compared by t-tests. The difference between the diversity of genes and spacers is significant only in the short-spacer array, and only if gaps are included in the estimate of diversity.

[FIGURE 2 plots: panel A, short-spacer arrays; panel B, long-spacer arrays; horizontal axis, nucleotide diversity of the gene]

FIGURE 2.-Comparison of nucleotide diversities for genes and spacers in short-spacer arrays (A) and long-spacer arrays (B). Dashed lines have slope of 1.0 and indicate equality of values for gene and spacer. Estimates with gaps included and with gaps excluded (i.e., substitutions only) are plotted with different symbols. For short-spacer arrays, r² = 0.388 with gaps included, r² = 0.431 with gaps excluded; for long-spacer arrays, r² = 0.025 with gaps included, r² = 0.004 with gaps excluded.

that a large proportion of the genes in the array is not fully functional.

Variation between individuals

Only a few species have data on multiple individuals. These are Secale cereale, T. monococcum, and T. tripsacoides for the short-spacer array, and T. monococcum and T. speltoides for the long-spacer array. S. cereale and T. speltoides are the only two for which there are multiple sequences from each individual, and in both cases the second individual has only two sequences. There are no shared polymorphisms. This could mean that the sample size is so small that shared polymorphisms are not likely to be detected. Evidence presented in the next section leads us to favor the latter explanation.

Variation between species

Interspecific variation in gene sequences: We have found that variation among genes within an array is surprisingly high. At any variable site, however, at least one sequence per array is found with the consensus nucleotide. In other words, nonconsensus nucleotides are almost never fixed within a species. The consensus for each site was determined as the base present in the majority of the sequences. Reference to Figure 3 shows that this is easily done by inspection.

Comparison of gene sequences across the 34 Triticeae species reveals only seven fixed differences despite the very high level of nucleotide diversity in an array. The seven positions at which one species has a nonpolymorphic base different from the consensus are marked with † (Figure 3); two such positions are in the short-spacer array and five in the long-spacer array. Except for these seven cases, polymorphisms are not fixed between species. Thus, for example, at position 460 in the short-spacer array (Figure 3A), there are species that have both A and G represented among the members of the array, but we have found no species that is fixed for A. This is completely counter to the neutral expectation, which dictates that fixation of polymorphism should be random and that therefore within-species polymorphism should be reflected as fixed differences between species.

The lack of divergence is reflected in the fixed interspecific differences among genes, which are consistently 0 or nearly so (Table 4). Note that this is a parsimony-based calculation, in which polymorphism within a species does not contribute to the estimation of differences between species. Nearly all of the nonconsensus sites that occur in an individual are eliminated over evolutionary time so that only the consensus sequence is conserved.

Some polymorphisms appear in congeners, and may have been retained across speciation events, e.g., in the short-spacer array, K-462 in Secale sylvestre and S. vavilovii, Y-446 and M-463 in Thinopyrum bessarabicum and T. junceiforme, M-425 in T. sharonense and T. uniaristatum, Y-447 in T. bicorne and T. longissimum, and R-491 in A. pectinatum and A. velutinum.

Clearly selection is acting on the genes in the array to maintain the consensus sequence. However, the high nucleotide diversity within an array indicates that selection is not acting continuously or with high intensity on individual copies. This affects hypotheses of mechanisms, as will be discussed below.

Interspecific variation in spacer sequences: Fixed interspecific substitutions in spacers are also high and comparable to intraspecific variation (Table 4); this is expected for a nontranscribed sequence. Unlike the genes, polymorphisms in the spacers are reflected in fixed differences between species. For example, examination of position 22 (short-spacer array) shows three taxa polymorphic for C and T, two fixed for T and the remainder for C. Similarly at position 36 in the long-spacer array, one species is polymorphic for C and G, two are fixed for G, and the rest for C.

Comparing spacer sequences among species reveals that conspecifics are more similar than congeneric species, which are in turn more similar than genera in the tribe. This pattern is much more conventional than that seen for gene sequences.

As for the gene, some congeneric species share polymorphisms in the spacer (e.g., species of Secale, Triticum, and Thinopyrum). The shared polymorphisms among congeners contradict the lack of shared polymorphisms among conspecific plants noted above. However, we suspect that the sample size among the conspecifics may have been too small to detect shared polymorphisms.

Tests for departure from the neutral expectation: The foregoing data suggest that polymorphism and divergence in the genes are decoupled, whereas they are correlated for the spacers. We have tested this using a modification of the McDonald-Kreitman test (MCDONALD and KREITMAN 1991), where the number of polymorphic sites for the genes and spacers of two species is compared to the number of fixed differences for genes and spacers between species. The null hypothesis is that the proportion of substitutions in the gene is independent of whether they are fixed or polymorphic. Because the number of fixed differences in the gene is 0 or 1 for almost all possible pairwise comparisons, we used Fisher's exact test (SOKAL and ROHLF 1981, p. 740) to compute values of P.

We tested all possible pairwise combinations of species for which we had sequences of three or more 5S units. Contingency tables are available from the first author on request; P values are summarized in Table 5. These include some taxa that are so closely related that they have not completely diverged and still share polymorphisms (e.g., Secale cereale and S. vavilovii). The majority, however, are more distant, yet are not so diverged that multiple mutations in the spacer region are likely to be a problem.

Of the 114 comparisons in Table 5, 93 are significant at the 5% level and another six at the 10% level, indicating that the number of fixed differences in the genes is significantly less than expected no matter which taxa are used to construct the contingency table. In the cases where the lack of fixed differences in the gene is not significant, the taxa are very closely related and/or not very polymorphic (e.g., T. bicorne, in which nucleotide diversity is 0). It appears that the species in the large Triticum clade (T. tauschii, T. comosum, T. tripsacoides, and T. sharonense) have not accumulated enough fixed differences in the spacer to be significantly different from 0. For example, in the comparison of T. comosum with T. tauschii, the ratio of polymorphisms in gene to spacer is 3:6, whereas for the fixed differences the ratio is 0:7 (P = 0.15).

Saturation of fixed interspecific substitutions is unlikely to be a problem. Saturation would only affect the spacer region, and would lead to an overestimate of the gene:spacer ratio. For these data, increasing the ratio of fixed differences between gene and spacer would make it closer to the ratio observed for polymorphisms, and would lead to a nonsignificant test. This may be creating some of the marginally significant values in comparisons in

FIGURE 3.-Multiple alignments of 5S DNA sequences. Alignments begin with spacer sequences followed by gene; gene region is in boldface. Species names abbreviated from Table 1. (A) Short-spacer array; (B) Long-spacer array. Large duplications have been omitted for clarity. Polymorphisms are indicated by standard IUPAC symbols; additional symbols: 1, A or -; 2, C or -; 3, G or -; 4, T or -. Fixed interspecific differences in the genes are indicated by t.
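The T. comosum / T. tauschii comparison reported in the text (polymorphisms in gene to spacer 3:6, fixed differences 0:7, P = 0.15) can be checked with a short calculation. The sketch below is not the authors' procedure or code. It assumes a one-tailed Fisher's exact test on the 2 x 2 table with all margins held fixed, an assumption made here only because that tail gives the published value; a two-tailed test would give a larger P. The function name is hypothetical.

```python
from math import comb


def mk_fisher_one_tailed(poly_gene, poly_spacer, fixed_gene, fixed_spacer):
    """One-tailed Fisher's exact P for a 2 x 2 McDonald-Kreitman-style table.

    Rows are polymorphic vs. fixed sites, columns are gene vs. spacer.
    Returns the probability, with all margins held fixed, of observing
    `fixed_gene` or fewer fixed differences in the gene.
    (Assumption: the one-tailed form is inferred, not stated in the paper.)
    """
    n_total = poly_gene + poly_spacer + fixed_gene + fixed_spacer
    n_gene = poly_gene + fixed_gene      # column total: gene sites
    n_fixed = fixed_gene + fixed_spacer  # row total: fixed differences
    # Hypergeometric numerator summed over tables at least as extreme,
    # i.e. k fixed gene differences for k = 0 .. observed.
    ways = sum(
        comb(n_gene, k) * comb(n_total - n_gene, n_fixed - k)
        for k in range(fixed_gene + 1)
    )
    return ways / comb(n_total, n_fixed)


# T. comosum vs. T. tauschii: polymorphisms 3:6, fixed differences 0:7.
p_value = mk_fisher_one_tailed(3, 6, 0, 7)
print(round(p_value, 4))  # 0.15
```

With these counts the sum has a single term, C(13, 7) / C(16, 7) = 1716 / 11440, which is why so few fixed spacer differences cannot yield a significant result even when the gene has none.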

FIGURE 4.-Distributions of variant nucleotides across 5S units (panels: short units; long units). Vertical axis is number of units with a nonconsensus base at that site. Horizontal axis is nucleotide position in multiple alignment (base pairs). Arrow indicates start (5' end) of gene.
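The ratio columns of Table 3 are simple quotients of the transition and transversion counts. The sketch below illustrates that arithmetic and the standard definition of a transition (purine to purine or pyrimidine to pyrimidine). The function names are hypothetical, and the two formatting conventions (a dash when there are transitions but no transversions, 0.00 when both counts are zero) are inferred from the table rows rather than stated by the authors.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(base_a, base_b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {base_a.upper(), base_b.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(transitions, transversions):
    """Format a transition:transversion ratio as Table 3 appears to.

    Assumed conventions (inferred from the table, not from the text):
    "-" when transitions occur without transversions, "0.00" when both
    counts are zero.
    """
    if transversions == 0:
        return "0.00" if transitions == 0 else "-"
    return f"{transitions / transversions:.2f}"


print(ts_tv_ratio(8, 3))  # 2.67
print(ts_tv_ratio(3, 0))  # -
```

With only a handful of variant sites per gene, most gene ratios rest on very small counts, which is worth keeping in mind when comparing them with the spacer ratios.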

TABLE 3 Transition:transversion ratios for genes and spacers within arrays

                                 Gene                              Spacer
Species                          Transitions Transversions Ratio   Transitions Transversions Ratio
Agropyron cristatum (L)                    8             3  2.67            20            18  1.11
Australopyrum pectinatum (S)               4             3  1.33             7            10  0.70
Australopyrum retrofractum (L)             2             2  1.00            26            20  1.30
Australopyrum velutinum (L)                2             5  0.40            19            20  0.95
Australopyrum velutinum (S)                3             4  0.75             4             6  0.67
Critesion bogdanii (L)                     2             2  1.00             5             5  1.00
Crithopsis delileana (S)                   2             2  1.00            15            16  0.94
Dasypyrum villosum (L)                     0             0  0.00             7            10  0.70
Henrardia persica (S)                      9             4  2.25            39            30  1.30
Hordeum bulbosum (L)                       0             0  0.00             3             2  1.50
Psathyrostachys juncea (S)                 1             4  0.25             5             4  1.25
Pseudoroegneria libanotica (S)             1             4  0.25            27            19  1.42
Pseudoroegneria spicata (S)                0             2  0.00             3             3  1.00
Secale cereale 1 (L)                       1             2  0.50             4             4  1.00
Secale cereale 1 (S)                       4             4  1.00            45            19  2.37
Secale cereale 2 (S)                       0             1  0.00             2             1  2.00
Secale montanum (L)                        3             2  1.50             7             3  2.33
Secale montanum (S)                        1             0     -             7            12  0.58
Secale sylvestre (L)                       1             0     -             2             5  0.40
Secale sylvestre (S)                       4             3  1.33            13            10  1.30
Secale vavilovii (L)                       1             3  0.33             4             4  1.00
Secale vavilovii (S)                       3             4  0.75            13            11  1.18
Taeniatherum caput-medusae (S)             1             3  0.33             7             7  1.00
Thinopyrum bessarabicum (S)                1             1  1.00             3             0     -
Thinopyrum junceiforme (S)                 1             1  1.00             3             0     -
Triticum bicorne (S)                       2             1  2.00             7             3  2.33
Triticum comosum (S)                       0             1  0.00             3             1  3.00
Triticum dichasium (S)                     0             1  0.00             0             0  0.00
Triticum longissimum (S)                   0             0  0.00             0             0  0.00
Triticum monococcum (S)                    5             0     -             9             5  1.80
Triticum sharonense (S)                    1             2  0.50             2             5  0.40
Triticum speltoides 1 (L)                  2             1  2.00            15            16  0.93
Triticum speltoides 2 (L)                  1             1  1.00             1             7  0.14
Triticum tauschii (L)                      5             6  0.83            13            12  1.08
Triticum tauschii (S)                      1             0     -             0             8  0.00
Triticum tripsacoides (S)                  0             0  0.00             4             9  0.44
Triticum umbellulatum (L)                  4             4  1.00             5             4  1.25
Triticum uniaristatum (S)                  3             0     -             7             5  1.40

involving the species of Australopyrum, but will not lead to Type I errors elsewhere in the data set.

The comparisons summarized in Table 5 are not independent of each other. Even those that involve different taxa may reflect changes along the same branch in the tree. For example, the comparison of the short units of T. tauschii with those in Taeniatherum caput-medusae is not actually independent of the comparison between T. monococcum and Crithopsis delileana. The two Triticum species share a common ancestor, and therefore some of the fixed differences in their spacer regions are shared.

In summary, virtually all comparisons among sister taxa indicate that the lack of fixed differences in the gene (Table 4) is significantly different from what would be predicted by the neutral theory.

TABLE 5 Results of McDonald-Kreitman tests for pairwise comparisons among taxa for which at least three 5S units were sequenced

A. Group I (short-spacer) array
           Aus. pec.  Aus. vel.  Cri. del.  Hen. per.  Sec. cer.  Sec. vav.  Tae. cap.  Tri. bic.  Tri. com.  Tri. mon.  Tri. sha.  Tri. tau.
Aus. pec.
Aus. vel.  0.0147*
Cri. del.  0.0629     0.0018*
Hen. per.  0.0999     0.0017*    0.0141*
Sec. cer.  0.0211*    0.0021*    0.0008*    0.0036*
Sec. vav.  0.0026*    0.0001*    0.0001*    0.0004*    0.8437
Tae. cap.  0.0188*    0.0001*    0.0141*    0.0034*    0.0001*    <0.0001*
Tri. bic.  0.0643     0.0004*    0.0471*    0.0017*    0.0099*    0.0005*    0.0040*
Tri. com.  0.0321*    0.0001*    0.0056*    0.0004*    0.0040*    0.0002*    0.0003*    0.1923
Tri. mon.  0.0193*    0.0001*    0.0005*    0.0005*    0.0039*    0.0002*    <0.0001*   0.0091*    0.0112*
Tri. sha.  0.0373*    0.0002*    0.0038*    0.0005*    0.0023*    0.0001*    0.0003*    1.0000     0.0372*    0.0045*
Tri. tau.  0.0454*    0.0002*    0.0149*    0.0010*    0.0070*    0.0003*    0.0009*    0.4286     0.1500     0.0116*    0.0827
Tri. tri.  0.1398     0.0023*    0.0640     0.0050*    0.0199*    0.0029*    0.0135*    1.0000     0.4872     0.4790     0.2574     0.6957

B. Group V (long-spacer) array
           Agr. cri.  Aus. ret.  Aus. vel.  Cri. bog.  Sec. cer.  Sec. vav.  Tri. spe.  Tri. tau.
Agr. cri.
Aus. ret.  0.0011*
Aus. vel.  0.1574     0.0021*
Cri. bog.  <0.0001*   0.0023*    0.0024*
Sec. cer.  <0.0001*   0.0061*    0.0092*    <0.0001*
Sec. vav.  0.0132*    0.0205*    0.0001*    <0.0001*   1.0000
Tri. spe.  0.0130*    0.1146     0.0954     0.0032*    0.0193*    0.0320*
Tri. tau.  <0.0001*   0.0102*    0.0083*    <0.0001*   <0.0001*   0.1024     0.0002*
Tri. umb.  <0.0001*   0.0009*    0.0013*    <0.0001*   <0.0001*   0.0109*    0.0171*    <0.0001*

Contingency tables are available from the first author on request. Values are for Fisher's exact test; *P < 0.05. Note that comparisons are not all independent. See text for discussion.

Gene trees: Phylogenetic trees of the 5S genes from both long and short loci are shown in Figures 5 and 6. Analysis of the long locus arrays produced 500 trees of length 640, with a consistency index (CI; KLUGE and FARRIS 1969; FARRIS 1989) = 0.692, and a retention index (RI; FARRIS 1989) = 0.885. Analysis of the short-spacer arrays produced 300 trees, length 815, CI = 0.566, RI = 0.867. These retention indices are much higher than the cutoff found by MADDISON (1991) below which multiple islands of trees become a problem. Differences in terminal taxa reflect differences in sample due to PCR and cloning bias, lack of a short-spacer array in Dasypyrum and T. speltoides, and paralogy of both the long- and short-spacer arrays in Critesion and Hordeum.

Trees were initially rooted with Bromus, shown by several independent data sets to be the sister group to the Triticeae (DAVIS and SORENG 1993; HSIAO et al. 1994; P. CATALAN and R. OLMSTEAD, unpublished data) and with Brachypodium. The latter has been shown recently to be very distantly related (DAVIS and SORENG 1993; HSIAO et al. 1994; P. CATALAN and R. OLMSTEAD, unpublished data). This is reflected in the sequences of the 5S RNA spacers, which are difficult to align with the other Triticeae. Depending on the choice of alignment parameters, Bromus inermis is 52-66% similar to representatives of the Triticeae, and Brachypodium pinnatum is only 44-62% similar. We have therefore not included them here.

In general, the 5S units for each species are more closely related to each other than they are to units from other species. This is also true in four of the five cases for which we have sampled arrays from two conspecific plants. There are exceptions, however, notably among the species of Secale, in which either a complex history of lineage sorting or introgressive hybridization is evident. Within most species, apparently all the units of the array share a common ancestor more recent than the speciation event.

There is also some incongruence in the trees, but this is difficult to evaluate because of differences in terminal taxa. We therefore produced trees using only shared taxa (Figure 7). These trees have only one clear conflict, which involves the position of T. monococcum (A), but this conflict cannot be easily explained by lineage sorting. Rather, it points to some different history of the two 5S arrays in that species.

DISCUSSION

There are two surprising aspects to the data presented here. First, concerted evolution has not resulted in completely homogenized arrays. Second, selection on the gene appears to vary in intensity and is independent of any effect on the spacer. We will discuss each of these in turn.

Lack of concerted evolution: is gene product really homogeneous? The amount of variation among genes within an array implies that there is as little constraint on the sequence of a particular 5S RNA gene as there is on the sequence of the nontranscribed spacer. This can be easily explained by the high copy number of the genes. With many genes in an array, the organism is

7ITritkum smitoldes - 1 Triticum speitoi&s’- 2 ‘] Triticum speitoi&s - 2 Triticum sharonense

Triticum tauschii

Triticumumbeiiuiatum 4 41LA I Thinopyrum elongatum

Agropyron cristatum FIGURE5.-Phylogenetic tree of long- spacer units, showing that for most species, e1Pseudoroegneriaspicata gene trees coalesce within species. Strict consensus of 500 trees, consistency index (GI; KLUGE and Fms1969; Fms1989) Australopyrum = 0.692; retention index FMS 1989) retrofractum (RI; = 0.885.

Australopyrum velutlnum

Triticum monOcOccum - 1 Gitlcummonococcum - 2

Secale cereale Secale cereale Secale vavilovii Secale montanum

7Dasypyrum vliiosum buffered against the deleterious effects of mutation in We do not know how many of the 5s RNA genes are any one copy. represented in the transcript pool. Because availableRNA The forces of concerted evolution are clearly insuffi- sequences were generated by direct sequencing, any sub cient to homogenize the arrays of 5s RNA genes in stitution occurring at a low frequency would go unde- the Triticeae. That this is true for most of the 35 taxa tected. Hence, the homogeneity of RNA sequences could indicates that it is a general phenomenon, and does represent either transcripts from only a small subset of not reflect a single peculiar plant. We conclude that if genes, or inability to detect heterogeneity in the gene replication slippage, unequal crossing over and dele- products, or both. Because we have found so many variant tion/amplification events areoccurring (as seems genes, though, we believe that the number of “perfect” likely) , their effect is quite local and/or they may be copies in any individual may be quite small. relatively infrequent. This is consistent with observa- There are two explanations for this high variation. tions by SAMSONand WEGNEZ(1988) on Drosophila First, it may reflect mutation across the entire array. and by SZOSTAKand Wu ( 1980) in Saccharomyces cerevis- Second, there may be sufficient interspecific gene flow iae, which indicate that genetic exchange from unequal to keep introducing variants into an array. We cannot crossing overaffects sequences no more than six or rule out the second possibility. However,some circum- eight units apart. stantial evidence renders it unlikely. Most diploid Triti- We have sequenced only a tiny fraction of the 5s units ceae are intersterile; the hybridization for which the in an array, even in our best-sampled species (seven tribe is famous occurs predominantly among poly- sequences out of ca. 4000). The fact that many of them ploids. 
Also, high variation is found in 5s arrays in every deviate from the consensus and almost no two are iden- plant species that has been investigated (see below). tical suggests that variant sequences are very common. In some cases, this may be due to dispersed arrays, but Within the gene alone, 70% (107/ 152) of the units others do not fit this explanation. differ from the consensus. Forcesacting on spacer different fromthose on 340 E. A. Kellogg and R. Appels

Triticum tauschii

Triticum comosum

Triticum dichasians Triticum tripsacoides- 1 Triticumtripsacoides-2 - Triticum searsii Tritfcum bicornis C7 Triticum sharonense J Triticum longissimum 3 Triticum sharonense Triticum monococcum-1

I ” Crithopsis delileana FIGURE6.-Phylogenetic tree of short- spacer units. Strict consensus of 300 trees, CI = 0.566, RI = 0.867. Taeniatherum CaDut-medusae 7PseudoroGneria libanotica IPseudoroegneria spicata Australopyrum pectinatum Australopyrum 1 velutinum Pseudoroegneria libanotica - Thinopyrum elongatum 1Henrardia persica

- 7 Secale silvestre Secale vavilovii - 7 Psathyrostachys juncea cereale Secale gene: Remarkably, the variation in the gene is not sig- the spacer. This rules out unequal crossing over, and nificantly different from that of the spacer within an replication slippage as mechanisms homogenizing the array, yetsignificantly different from the spacer be- genes, because they would also act on the spacers. tween species. This implies that the evolutionary forces The sizeof the 5s array isknown to fluctuate. If acting on the gene are different from those acting on these fluctuations resulted in periods of very low copy

Australopyrumret. 51 vel. Australopyrum <50 1 1 Pseudoroegneria spi. 98 speltoides Triticum - FIGURE7.-Comparison of trees 5 Triticumtauschii for both 5s RNA arrays, showing dif- 92 ferent histories for the arrays in %ti- Triticum monococ. cum monococcum. Comparisonwas 2 5 Thinopyrum elong. - done usingonly taxa in common. Numbersabove lines indicateper- Secalecereale 10 Secale montanum 100>8 numberscentage of below lines are100 val-replicates;bootstrap decay >8 ,‘, ues. Dasypyrum c Psathyrostachys

spacer arrayshort spacer arraylong spacer Variation in 5s RNA Genes 341 number, they would tend tohomogenize the array. The evidence that any particular 5s RNA mutant is selec- fact that the array is comparatively nonhomogeneous tively maintained. However, this could only be ad- indicates that such deletion events either occur very dressed rigorously by a population-level study, which infrequently or do not reduce thecopy number enough has not been done in plants. to eliminate all variation. Certainly the appearance of Our results are also somewhat similar to those re- shared polymorphisms in several members of a clade, ported recently by SCHLOTTERER and TAUTZ( 1994) for as noted for Secale, some Triticum species, and some ribosomal DNA sequences in Drosophila. Their data Thinopyrum species, suggests that sharp reductions in suggest that homogenization of repeats along chromo- array size may be infrequent. somes proceeds more rapidly than recombination be- The frequency of deletion/amplifkation events is un- tween chromosomes and that therefore concerted evo- known in plants. If the array contracted sharply in size lution may proceed by selection on the linear array, during gametophyte formation, and was then reamplified, rather thanon any singlerepeat unit.This is also consis- it seems unlikelythat one would see the level of variation tent with other results from modeling variation in rDNA reported here, unless replication were abnormally error copy number ( LYCKEGAARLIand CLARK1991 ) . prone. We suspect therefore that deletion and amplifica- How general is the pattern? Other plants exhibit sim- tion occur less frequently. This is supported by data of ilar patterns of polymorphism. Mutations are spread KANAZIN et al. ( 1993) ,which indicate that size of the array across both gene and spacer inArabidopsis, Acacia, does not fluctuate with stage of the life cycle nor do size Pinus, Oryza, and Linum. The interpretation of the fluctuations occur in every generation. 
variation is somewhat complicated in Linum, which is Deletion/ amplification cannot by itself explain the known to have many small dispersed 5s arrays, rather difference in between-species variation in geneand than a couple of large localized ones ( GOLDSBOROUGH spacer. Loss of variation due to deletion / amplification et al. 1982). Oryza may be similar, in that sequences of events should act similarly on both gene and spacer. 5s units do not fall into discrete classes as they do in Selection on the 5s array: We suggest that, in addi- the Triticeae ( MCINTYREet al. 1992). In these groups tion to gene level processes, there may be selection on it is therefore possible that entire arrays are non-func- the 5s array as a whole, which would be selection for tional and that populations of 5s genes appear and numbers of functional copies. In this model, the 5s disappear around the genome. In such a situation the DNA array accumulates mutations throughout its lack of demonstrable concerted evolution is less surpris- length, more or less randomly with respect to position ing. The situation in Acacia and Pinus is more similar in spacer and gene.As long as the number of impaired to Triticeae, in that there appear to be a couple of genes is below a certain level, there is no effect on the dicrete loci ( MORANet al. 1992; PLAYFORDet al. 1992) , organism. As the number of nonfunctional or partially and in Arabidopsis there is apparently only one. How- functional genes reaches a threshhold, however, the ever, the Acaciaspecies studied by PLAYFORDet al. organism’s fitness declines and the array is removed (1992) are unknown cytogenetically. Thus 5s RNA from the population. This threshhold effect would be genes theTriticeae may be representative of seed plants intensified if the copy number of the genes happened in general. 
The 5s arraysin Drosophila are much to be low, as the total number of units, and hence the smaller than in plants, but there are also reports of buffering effect of extra copies, is reduced. Spacers with mutations in the genes. mutations would be as likely to be associated with func- We conclude that gene-level processes are not the tional genes as with nonfunctional ones. Hence, selec- only factors influencing variation in the highly repeated tion for functional genes could still fix substitutions in 5s RNA genes. If gene conversion occurs, it must be the spacers (essentially a hitchhiking effect). very precise and absolutelybiased toward functional Based on the distribution of within-plant variation on copies. An alternative is that selection on the array as the gene trees, we suggest that selection on the array a whole produces the observed pattern in which 5s is occurring continually, but there areepisodes of more RNA genes are more variable within individual plants that they are between genera. intense selection that cause removal of arrays carrying defective genes from the population. These episodes We thank D. ACKERLY,A. BERRY,D. LEWONTIN,D. HARTL, D. HIE BETT,L. LANDwEBER,R. MASON, C. SCHLOTTERER,P. WILSONand may not correlate with speciation. other membersof the populationgenetics labs at Harvard for helpful This model is similar to that shown to operate in the discussion, to R. MASON, P. F. STEVENS,and L. VAWTERfor comments abnormal abdomen ( aa) mutants in Drosophila mercatorum on the manuscript, and to D. LEWONTIN foraccess to an unpublished ( TEMPLETONet al. 1993). In these, insertion of a trans- computer program. This work was supported by National Science Foundation grantDEE9106581 to E.A.K. and by Commonwealth Sci- posable element into rRNA genes disrupts members entific and Industrial Research Organization in Canberra. of the rRNA array. 
The aa phenotype only appears, however, when more than one-third of the genes are LITERATURE CITED APPELS, R., and R.L. HONEYCUTT, 1986 rDNA evolution over a disrupted. In the case of aa, selection appears to main- billion years, pp. 81-125 in DNA Systematics II. Plant DNA, edited tain the gene in the population, whereas there is no by S. K. DUTTA.CRC Press, Boca Raton, FL. 342 E. A. Kellogg and R. Appels

APPELS, R., W. L. GERLACH, E. S. DENNIS, H. SWIFT and W. J. PEACOCK, 1980 Molecular and chromosomal organization of DNA sequences coding for the ribosomal RNAs in cereals. Chromosoma 78: 293-311.
APPELS, R., B. R. BAUM and B. C. CLARKE, 1992 The 5S DNA units of bread wheat (Triticum aestivum). Plant Syst. Evol. 183: 183-194.
ARNHEIM, N., 1983 Concerted evolution of multigene families, pp. 38-61 in Evolution of Genes and Proteins, edited by M. NEI and R. K. KOEHN. Sinauer Associates, Sunderland, MA.
CAMPELL, B. R., 1992 Sequence and organization of 5S ribosomal RNA-encoding genes of Arabidopsis thaliana. Gene 112: 225-228.
CLEMENS, K. R., V. WOLF, S. J. MCBRYANT, P. ZHANG, X. LIAO et al., 1993 Molecular basis for specific recognition of both RNA and DNA by a zinc finger protein. Science 260: 530-533.
DAVIS, J. I., and R. J. SORENG, 1993 Phylogenetic structure in the grass family (Poaceae), as determined from chloroplast DNA restriction site variation. Am. J. Bot. 80: 1444-1454.
DVORAK, J., H.-B. ZHANG, R. S. KOTA and M. LASSNER, 1989 Organization and evolution of the 5S ribosomal RNA gene family in wheat and related species. Genome 32: 1003-1016.
ELLIS, T. H. N., D. LEE, C. M. THOMAS, P. R. SIMPSON, W. G. CLEARY et al., 1988 5S rRNA genes in Pisum: sequence, long range and chromosomal organization. Mol. Gen. Genet. 214: 333-342.
FARRIS, J. S., 1989 The retention index and the rescaled consistency index. Cladistics 5: 417-419.
GERLACH, W. L., and T. A. DYER, 1980 Sequence organization of the repeating units in the nucleus of wheat which contain 5S rRNA genes. Nucleic Acids Res. 8: 4851-4865.
GOLDSBOROUGH, P. B., T. H. N. ELLIS and C. A. CULLIS, 1981 Organization of the 5S RNA genes in flax. Nucleic Acids Res. 9: 5895-5904.
GOLDSBOROUGH, P. B., T. H. N. ELLIS and G. D. LOMONOSSOFF, 1982 Sequence variation and methylation of the flax 5S RNA genes. Nucleic Acids Res. 10: 4501-4514.
GRELLET, F., and P. PENON, 1984 Chromatin organization and methylation patterns of wheat 5S RNA genes (Triticum aestivum var. Hardi). Plant Sci. Lett. 37: 129-136.
HALANYCH, K. M., 1991 5S ribosomal RNA sequences inappropriate for phylogenetic reconstruction. Mol. Biol. Evol. 8: 249-253.
HAYES, J. J., and T. D. TULLIUS, 1992 Structure of the TFIIIA-5S DNA complex. J. Mol. Biol. 227: 407-417.
HIGGINS, D. G., A. J. BLEASBY and R. FUCHS, 1992 CLUSTAL V: improved software for multiple sequence alignment. Comput. Appl. Biosci. 8: 189-191.
HSIAO, C., N. J. CHATTERTON, K. H. ASAY and K. B. JENSEN, 1994 Phylogenetic relationships of the monogenomic species of the wheat tribe, Triticeae (Poaceae), inferred from nuclear rDNA (ITS) sequences 1,2. Genome (in press).
KANAZIN, V., E. ANANIEV and T. BLAKE, 1993 The genetics of 5S
nizer region, seed protein and isozyme loci on chromosome 1R in rye. Theor. Appl. Genet. 71: 742-749.
LEITCH, I. J., and J. S. HESLOP-HARRISON, 1993 Physical mapping of the four sites of 5S DNA sequences and one site of the α-amylase-2 gene in barley (Hordeum vulgare). Genome 36: 517-523.
LONG, E. O., and I. B. DAWID, 1980 Repeated genes in eukaryotes. Annu. Rev. Biochem. 49: 727-764.
LYCKEGAARD, E. M. S., and A. G. CLARK, 1991 Evolution of ribosomal RNA gene copy number on the sex chromosomes of Drosophila melanogaster. Mol. Biol. Evol. 8: 454-474.
MADDISON, D. R., 1991 The discovery and importance of multiple islands of most-parsimonious trees. Syst. Zool. 40: 315-328.
MADDISON, W. P., and D. R. MADDISON, 1992 MacClade: Analysis of Phylogeny and Character Evolution, Version 3. Sinauer Associates, Sunderland, MA.
MCDONALD, J. H., and M. KREITMAN, 1991 Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652-654.
MCINTYRE, C. L., B. WINBERG, K. HOUCHINS, R. APPELS and B. BAUM, 1992 Relationships between Oryza species (Poaceae) based on 5S DNA sequences. Plant Syst. Evol. 183: 249-264.
MORAN, G. F., D. SMITH, J. C. BELL and R. APPELS, 1992 The 5S RNA genes in Pinus radiata and the spacer region as a probe for relationships between Pinus species. Plant Syst. Evol. 183: 209-221.
NEI, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
PEACOCK, W. J., W. L. GERLACH and E. S. DENNIS, 1981 Wheat Science-Today and Tomorrow, pp. 41-60, edited by L. T. EVANS and W. J. PEACOCK. Cambridge University Press, Cambridge.
PIELER, T., J. HAMM and R. G. ROEDER, 1987 The 5S gene internal control region is composed of three distinct sequence elements, organized as two functional domains with variable spacing. Cell 48: 91-100.
PLAYFORD, J., R. APPELS and B. BAUM, 1992 The 5S DNA units of Acacia species (Mimosaceae). Plant Syst. Evol. 183: 235-247.
REDDY, P., and R. APPELS, 1989 A second locus for the multigene 5S family in Secale L.: sequence divergence in two lineages of the family. Genome 32: 456-467.
RODER, M. S., M. E. SORRELLS and S. D. TANKSLEY, 1992 5S ribosomal gene clusters in wheat: pulsed field gel electrophoresis reveals a high degree of polymorphism. Mol. Gen. Genet. 232: 215-220.
SAMSON, M.-L., and M. WEGNEZ, 1988 Bipartite structure of the 5S ribosomal gene family in a Drosophila melanogaster strain, and its evolutionary implications. Genetics 118: 685-691.
SASTRI, D. C., K. HILU, R. APPELS, E. S. LAGUDAH, J. PLAYFORD et al., 1992 An overview of evolution in plant 5S DNA. Plant Syst. Evol. 183: 169-181.
SAWYER, S. A., and D. L. HARTL, 1992 Population genetics of polymorphism and divergence. Genetics 132: 1161-1176.
TEMPLETON, A. R., H. HOLLOCHER and J. S. JOHNSTON, 1993 The molecular through ecological genetics of abnormal abdomen in Drosophila mercatorum. V. Female phenotypic expression on natural genetic backgrounds and in natural environments. Genetics 134: 475-485.
WOLTERS, J., and V. A. ERDMANN, 1986 Cladistic analysis of 5S rRNA and 16S rRNA secondary and primary structure - the evolution of eukaryotes and their relation to archaebacteria. J. Mol. Evol. 24: 152-166.

Communicating editor: A. G. CLARK