Analysis and mapping of randomly chosen bacterial artificial chromosome clones from hexaploid bread wheat

Katrien M. Devos*†‡, Jianxin Ma§¶, Ana C. Pontaroli§, Lee H. Pratt†, and Jeffrey L. Bennetzen‡§

Departments of *Crop and Soil Sciences, †Plant Biology, and §, University of Georgia, Athens, GA 30602

PLANT BIOLOGY

Contributed by Jeffrey L. Bennetzen, November 2, 2005

The current view of wheat genome composition is that genes are compartmentalized into gene-rich and gene-poor regions. This model can be tested by analyzing randomly selected bacterial artificial chromosome (BAC) clones for gene content, followed by placement of these BACs onto physical and genetic maps. Map localization could be difficult for BACs that consist entirely of repeated elements. We therefore developed a technique where repeat junctions are used to generate unique markers. Four BAC clones from hexaploid wheat variety Chinese Spring were randomly selected and sequenced at 4- to 6-fold redundancy. About 50% of the BAC sequences corresponded to previously identified repeats, mainly LTR retrotransposons, whereas most of the remaining DNA consisted of sequences with unknown origin or function. The average gene content was <1%, although each BAC contained one or two identified genes. Repeat boundaries were amplified and used to map each clone to a chromosome arm. Extrapolation from wheat–rice comparative knowledge suggests that three of the four BAC clones originate from “gene-rich” regions of the wheat genome. Nevertheless, because these BACs carry only a single gene (two BACs) or two genes (one BAC), the predicted gene density is ≈1 gene per 75 kb, which is considerably lower than previously estimated gene densities (one gene per 5–20 kb) for gene-rich regions in wheat. This analysis of randomly selected wheat BAC clones suggests that genes are more evenly distributed in wheat than previously believed and substantiates the need for large-scale random BAC sequencing to determine wheat genome organization.

gene density | genome organization | repeat boundaries | repeat markers | Triticum aestivum

Conflict of interest statement: No conflicts declared.
Abbreviations: BAC, bacterial artificial chromosome; SSCP, single-strand conformation polymorphism; NT, nullisomic-tetrasomic.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AY772732–AY772735).
‡To whom correspondence may be addressed. E-mail: [email protected] or [email protected].
¶Present address: Department of Agronomy, Purdue University, West Lafayette, IN 47907.
© 2005 by The National Academy of Sciences of the USA
www.pnas.org/cgi/doi/10.1073/pnas.0509473102 PNAS | December 27, 2005 | vol. 102 | no. 52 | 19243–19248

Now that sequencing of the rice genome has been nearly completed (1–3) and sequencing of the maize genome is in progress (4, 5), consideration of whether to comprehensively sequence other cereal genomes is warranted (6). Bread wheat ranks third in world production, after maize and rice. Its allohexaploid genome, however, is 40 times larger than that of rice and 6 times that of maize (7). In maize, sequencing efforts to date have concentrated on the gene-rich fraction of the genome (4, 5). Gene contigs can be ordered by using end sequences of methylation-spanning linker library clones, which consist of blocks of repetitive DNA flanked on either side by genic regions (8), by superimposing the contigs on a low-redundancy sequence of the entire genome, or by anchoring them to a genetic map. It is estimated that the gene discovery ability of these combined technologies is significantly >95% (9). Alternative strategies are based on the fact that genes are not evenly distributed over a genome (10–13). In Lotus japonicus, which has a 500-Mb genome, only bacterial artificial clones that contain ESTs are being sequenced (14, 15). The level of gene discovery provided by this approach depends on the EST coverage and the genome organization.

Sample sequence analysis of a few thousand randomly selected clones with inserts from Aegilops tauschii, the D genome donor of bread wheat, has shown that its nuclear DNA contains ≈91.6% repetitive sequences (16). How these repeats are associated with each other and with genes is not well understood. The little information we have to date on sequence interspersion in wheat has been gained primarily through sequencing of individual bacterial artificial chromosomes (BACs) containing genes of agronomic interest (17–23) and through large-scale physical mapping of ESTs to chromosome regions defined by deletions (“bins”; ref. 24). Most BACs preselected to contain at least one gene have been shown to be composed of one or two small gene islands that are separated by 5- to 150-kb blocks of repetitive DNA, primarily retroelements (17, 18, 20). The mapping of ESTs to sets of overlapping deletion lines further suggests that gene-containing BACs are clustered in the genome to form gene-rich regions that can be cytologically defined (24). A similar study, but using a larger number of deletion lines and combining the data across the three homoeologous genomes, has reported that 29% of the genome contains 94% of the genes, with 60% of the genes being concentrated in only 11% of the genome (25). Assuming that this is an accurate representation of the gene distribution in wheat, identification of 94% of the genes would require sequencing of some 5,000 Mb of DNA of the hexaploid wheat genome with a BAC by BAC approach. This is twice the size of the maize genome.

To make an informed decision on the best strategy to sequence the wheat genome, more hard data are needed on the distribution of genes. This can be best accomplished by sequencing a random selection of wheat BACs (26). One useful, but challenging, addition to such a project would be to link the BAC clones to previously established regions of high or low gene density. Based on the current perception of the organization of the wheat genome, more than half of the randomly selected BACs would contain only repeats, which are difficult to map, and no genes. This study develops the amplification of unique repeat boundaries as markers for BAC mapping, using sequences identified on four randomly chosen BAC clones from the hexaploid wheat variety Chinese Spring. Linking of these BACs to chromosome regions with known EST densities suggests that gene densities in “gene-rich” regions are considerably lower, and that gene distribution in wheat is more even, than calculated in previous publications.

Materials and Methods

BAC Clone Sequencing. Four BACs were chosen randomly from among 1,200,000 hexaploid wheat (Triticum aestivum L. cv. Chinese Spring) BAC clones (27). To reduce the chance that duplicate clones would be selected, the four clones were taken from different plates. Shotgun libraries for each of these four BACs were constructed as described (28). A total of 384 subclones were sequenced from both directions by using ABI PRISM BigDye Terminator Chemistry (Applied Biosystems, Foster City, CA) and run on an ABI3730xl capillary sequencer. Base calling and quality assessment were done by using PHRED (29), and reads were assembled with PHRAP. The resulting data yielded 4- to 6-fold average sequence redundancy across the four BACs. The contigs for each BAC were ordered by using CONSED (30). PHRED, PHRAP, and CONSED were used under the default parameters, as described (28–31). Unordered contigs were concatenated at the end of each of the four “working draft” sequences. The sequences from BACs TA350B2, TA574A4, TA1353M1, and TA1426L2 have been deposited in GenBank under accession nos. AY772734, AY772735, AY772732, and AY772733, respectively.

Sequence Analysis. The gene prediction program FGENESH, with the monocot (maize, rice, wheat, and barley) training set (www.softberry.com), was used to predict genes. The entire sequence, including the gene candidates predicted by FGENESH, was subsequently investigated by CROSS_MATCH searches against the Gramineae repeat databases (www.tigr.org/tdb/e2k1/plant.repeats; http://wheat.pw.usda.gov/ITMI/Repeats/index.shtml; http://data.genomics.purdue.edu/~pmiguel/projects/retros). Predicted genes that did not match retroelements or other transposons were used as queries in BLASTX searches against the GenBank protein database. They were considered “real” genes if they detected homology at an expect value of <E−10 in any species that was not a member of the Triticeae. Our previous studies have shown that this is a rigorous criterion for gene identification (31). The requirement of a low expect value excludes most transposable elements that are misannotated as genes, partly because these mobile DNAs evolve more rapidly in sequence than do the other coding sequences in plant nuclear genomes.

Primer Design. To establish the origin of BACs that consist mainly of repetitive DNA, primers were designed so that one of the primers or the amplification product spanned a repeat boundary (Fig. 1). Primers were designed across four, five, one, and four boundaries, respectively, for BACs TA350B2, TA574A4, TA1353M1, and TA1426L2. In addition, two sets of nested primers were designed for BAC TA1426L2. If genes were present, these were used as queries against the EST_others section of GenBank to establish the presence and position of introns. Where possible, primers against genes were made to span an intron. This process increased the likelihood that the A, B, and D genome products would be distinguishable in either length or base composition, and hence could be separated by single-strand conformation polymorphism (SSCP). The primers that were used to generate the BAC fragments that were mapped are given in Table 1.

Fig. 1. Primer design across a repeat boundary on BAC TA1353M1 and an agarose gel showing the mapping of the corresponding amplification product on a set of 21 NT lines. The absence of the fragment in the line nullisomic for 3B (N3B) allocates BAC TA1353M1 to chromosome 3B.

Mapping. DNA from nullisomic-tetrasomic (NT) and ditelosomic lines was kindly provided by P. Stephenson and J. Beales, John Innes Centre, Norwich, United Kingdom. PCRs were carried out in a total volume of 20 µl of 1× PCR buffer containing 100 ng of template DNA, 1.5 mM MgCl2, 200 µM dNTPs, 500 nM forward primer, 500 nM reverse primer, and 0.8 units of Taq

Table 1. Primer sequences that were used for mapping, the type of sequence in which they are located, and the chromosomal location of the amplification products

Primer           Sequence                        Repeats                        Chromosomal origin
TA350B2_18F1     ACG ATG TTT ACC CAG GTT CG      LTR retrotransposon Fatima     1B
TA350B2_18R1     CTT GTC CTT CCT TTC GCT TG      CACTA element Jorge
TA350B2_18F2     TAC GAG AGA AAC GCA CAT CG      CACTA element Jorge            1BL
TA350B2_18R2     ATC CCG GAA ACC GAA TTA TC      LTR retrotransposon Fatima
TA574A4_10F1     CAT GGT GTC TAT CTA CCT CTC     EST                            3D
TA574A4_10R1     GGA CGA CAT CCT GAG GTT C       EST
TA574A4_10F2     GGG TGG CAT TTC TGG ATC AG      EST                            3B or 3D
TA574A4_10R2     TTG AGC GCC TCC TGA TGA G       EST
TA574A4_27F1     GGC AGA TCA CTG GAA GAA GC      CACTA element Caspar           3DL
TA574A4_27R1     ACA GCA TCA GTC CGG GTT AG      LTR retrotransposon Wham
TA1353M1_5F1     ATA AAT TTC CGC CGA TTC AG      CACTA element Sherlock         3BL
TA1353M1_5R1     TGC GCC ATT AGT GTC AAT TC      CACTA element TAT-1
TA1353M1_13F1    GCT GAT TGG AGC CTA CAA TC      EST                            3B
TA1353M1_13R1    TCT GGT AAC AGT GAG ATA AC      EST
TA1426L2_16F2    TGA TCC AGT GAT AAC CCA CAA G                                  5B
TA1426L2_16R2    GCC CCT GGT TCT TAT TTA CG      LTR retrotransposon Egug
TA1426L2_17F1    ACA CCC AAA GCA TTC CTA CG      LTR retrotransposon WIS        5BL
TA1426L2_17F2    TGC GAC GAT TGG ATT ACA AG      CACTA element Caspar
TA1426L2_28F1    GAA AGA CCT CGG TGA AGC TG      LTR retrotransposon Angela     5BL
TA1426L2_28F2    AAA CCG GAT GGC TCT AGT CC      LTR retrotransposon Fatima
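The logic behind the chromosomal assignments in Table 1 can be illustrated with a minimal sketch, assuming that each NT line has been scored simply as product present or absent. The function names and the scoring dictionary are hypothetical and are not part of the published protocol; the only rule taken from the text is that a product missing solely from the line nullisomic for one chromosome is assigned to that chromosome.

```python
from typing import Dict, List, Optional

# The 21 wheat chromosomes: homoeologous groups 1-7 in the A, B, and D genomes.
CHROMOSOMES: List[str] = [f"{group}{genome}" for group in range(1, 8) for genome in "ABD"]


def absent_lines(band_present: Dict[str, bool]) -> List[str]:
    """Return the chromosomes whose nullisomic line lacks the PCR product.

    band_present maps a chromosome name (e.g. "3B") to True when the product
    was amplified from the NT line that is nullisomic for that chromosome.
    This scoring scheme is an assumption made for illustration.
    """
    unscored = [c for c in CHROMOSOMES if c not in band_present]
    if unscored:
        raise ValueError(f"panel is incomplete, no score for: {unscored}")
    return [c for c in CHROMOSOMES if not band_present[c]]


def call_location(band_present: Dict[str, bool]) -> Optional[str]:
    """Assign a chromosome only when exactly one nullisomic line lacks the product."""
    absent = absent_lines(band_present)
    if len(absent) == 1:
        return absent[0]
    # Product present in every line, or missing from several: no unambiguous call.
    return None


if __name__ == "__main__":
    panel = {c: True for c in CHROMOSOMES}
    panel["3B"] = False  # fragment absent only in the line nullisomic for 3B
    print(call_location(panel))
```

A panel in which the product is amplified from every line returns None, which corresponds to the case where agarose gels could not resolve the origin and a finer separation such as SSCP was required.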

Table 2. Sequence composition of four randomly selected BACs from hexaploid wheat cultivar Chinese Spring

                                         BACs
Composition            TA350B2    TA574A4    TA1353M1    TA1426L2    All
DNA composition
  Size, kb             117        100        75          124         416
Genes
  Number               1          1          1           2           5
  Size, bp             477        534        1,655       783         3,449
  Percent              0.4        0.5        2.2         0.6         0.8
Retrotransposons
  Size, bp             43,714     47,007     2,041       61,772      154,534
  Percent              37.2       47.2       2.7         50.0        37.2
DNA transposons
  Size, bp             14,806     5,792      7,426       7,264       35,288
  Percent              12.6       5.8        9.9         5.9         8.5
Other repeats
  Size, bp             0          1,061      0           590         1,651
  Percent              0.0        1.1        0.0         0.5         0.4
Total repeats
  Size, bp             58,520     53,860     9,467       69,626      191,473
  Percent              49.8       54.1       12.6        56.4        46.1
Uncharacterized DNA
  Size, bp             58,400     45,147     63,748      53,120      220,415
  Percent              49.7       45.4       85.1        43.0        53.1
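The percentages in Table 2, and the gene densities that follow from the BAC sizes and gene counts, can be re-derived with a short calculation. The sketch below assumes that each BAC total is the sum of the gene, total-repeat, and uncharacterized base pairs, which reproduces the rounded kilobase sizes in the table; the dictionary layout and function names are illustrative only. The genome size (17,000 Mb) and gene number (150,000) are the values used in the text.

```python
from typing import Dict

# Values transcribed from Table 2 (sizes in bp, rounded BAC size in kb).
TABLE2: Dict[str, Dict[str, int]] = {
    "TA350B2":  {"size_kb": 117, "genes": 1, "gene_bp": 477,   "repeat_bp": 58_520, "unchar_bp": 58_400},
    "TA574A4":  {"size_kb": 100, "genes": 1, "gene_bp": 534,   "repeat_bp": 53_860, "unchar_bp": 45_147},
    "TA1353M1": {"size_kb": 75,  "genes": 1, "gene_bp": 1_655, "repeat_bp": 9_467,  "unchar_bp": 63_748},
    "TA1426L2": {"size_kb": 124, "genes": 2, "gene_bp": 783,   "repeat_bp": 69_626, "unchar_bp": 53_120},
}


def composition(row: Dict[str, int]) -> Dict[str, float]:
    """Percent of a BAC in genes, known repeats, and uncharacterized DNA.

    Assumption: the BAC total is the sum of the three classes.
    """
    total = row["gene_bp"] + row["repeat_bp"] + row["unchar_bp"]
    return {
        "total_bp": total,
        "gene_pct": round(100 * row["gene_bp"] / total, 1),
        "repeat_pct": round(100 * row["repeat_bp"] / total, 1),
        "unchar_pct": round(100 * row["unchar_bp"] / total, 1),
    }


def kb_per_gene(total_kb: float, genes: int) -> float:
    """Average spacing between genes, in kb per gene."""
    return total_kb / genes


# The three BACs assigned to gene-rich bins in the text.
GENE_RICH = ["TA574A4", "TA1353M1", "TA1426L2"]

if __name__ == "__main__":
    for name, row in TABLE2.items():
        print(name, composition(row))

    rich_kb = sum(TABLE2[b]["size_kb"] for b in GENE_RICH)
    rich_genes = sum(TABLE2[b]["genes"] for b in GENE_RICH)
    print("gene-rich BACs:", kb_per_gene(rich_kb, rich_genes), "kb per gene")

    # Random distribution of 150,000 genes over a 17,000-Mb genome.
    print("random expectation:", kb_per_gene(17_000 * 1_000, 150_000), "kb per gene")
```

Under these assumptions the three mapped BACs give roughly one gene per 75 kb, against roughly one gene per 113 kb for a random distribution, so the difference is less than 2-fold.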

DNA polymerase (Promega). PCR conditions consisted of an initial denaturation step of 3 min at 94°C, followed by 45 cycles of 94°C for 30 s, touchdown from 62°C to 55°C at a rate of 0.7°C per cycle, 72°C for 1 min, and a final extension of 3 min at 72°C. For nested PCR, conditions were as above but 1 µl of the previous amplification reaction mixture was used as DNA template. Amplification products were separated on 1.2% agarose gels and visualized after staining with ethidium bromide or separated by SSCP on 0.5× mutation detection enhancement (MDE) gels (Cambrex, East Rutherford, NJ) (32). The MDE gels were run overnight at a constant power of 8 W and silver-stained.

Results and Discussion

BAC Composition. Four BAC clones were randomly picked from four different plates of the 1.2 million clone library containing the Chinese Spring wheat hexaploid genome inserted into vector pIndigoBAC5 at 9× redundancy (27). For each BAC, 384 subclones with 3- to 5-kb inserts were selected for sequencing. This approach results in a 4–6× average coverage for BACs in the size range from 75,000 to 125,000 bp. After assembly, 117 kb of sequence with 13 gaps organized into three scaffolds was obtained for BAC TA350B2, 100 kb of sequence with 20 gaps in eight scaffolds for TA574A4, 75 kb of sequence with 9 gaps in two scaffolds for TA1353M1, and 124 kb of sequence with 20 gaps in seven scaffolds for TA1426L2 (see Fig. 3, which is published as supporting information on the PNAS web site). Previous reconstruction of sequence coverage in BACs that were completely sequenced from wheat, maize, and rice indicates that 3× coverage is sufficient to identify >95% of all genes and abundant repeats (W. Ramakrishna and J.L.B., unpublished observations). Hence, the 4–6× coverage in this study was sufficient to establish the repeat and genic content of each BAC (Table 2).

The compositions of BACs TA350B2, TA574A4, and TA1426L2 were very similar. Each BAC carried one (TA350B2 and TA574A4) or two genes (TA1426L2), and ≈50% of the sequence consisted of previously characterized repeats, mainly members of LTR retrotransposon families (Table 2 and Table 5, which is published as supporting information on the PNAS web site). Given the many gaps derived from the low redundancy sequence across each of these BACs, efforts were not made to find new transposons or other repeat families by structural criteria (33, 34). The LTR retrotransposons found most frequently on these four BACs were Fatima, Sabrina, WIS, and Sukkula, with a respective four, four, five, and six copies, none of them solo LTRs. The most numerous DNA transposons identified were Jorge and Caspar (Table 5), CACTA elements that had been previously seen to be the most abundant large DNA elements in wheat (16, 34).

The remaining 50% of the BAC sequence had no hits against the databases used and remains uncharacterized. It is likely that much of this uncharacterized sequence consists of repeat families that have not yet been cataloged. BAC TA1353M1 also carried one gene but, in contrast to the other three BACs, only one LTR retrotransposon could be identified, so that ≈80% of the DNA remained uncharacterized (Tables 2 and 5). The gene and DNA transposon content in TA1353M1 is similar to that of the other three BACs sequenced. We therefore hypothesize that TA1353M1 carries numerous, as yet uncharacterized, transposable elements. As previously noted in other studies of wheat and in other members of the Triticeae, CACTA DNA transposons were found to be much more abundant on all four of these BACs than in those from other well characterized plant genomes, namely Arabidopsis, rice, and maize (34, 35).

A total of 71 genes were predicted by FGENESH to be present across the four BACs (Table 3). However, 54 matched transposable elements and only four fulfilled our stringent criterion for a “true” gene (31). This criterion is homology at the protein level to a predicted or known gene from a species outside the Triticeae tribe at an E value of <10⁻¹⁰ (Table 4). One predicted gene on BAC TA1426L2 showed homology to a rice hypothetical protein at an expect value of 3E−5. Although this E value was higher than our cut-off value of 10⁻¹⁰, this gene was considered a true gene because (i) the gene structure predicted by FGENESH represented only the 3′ end of the gene, (ii) BLASTX analysis of the complete wheat gene derived from a wheat EST had homology to the rice hypothetical protein at an E value <10⁻¹⁰,

and (iii) this gene had homology to both Triticeae and non-Triticeae (sugarcane) ESTs. Furthermore, the rice genes orthologous to these two true genes on wheat BAC TA1426L2 were both located on the same rice BAC clone. The other 13 predicted gene models did not exhibit strong homology with any known gene or EST in rice, maize, or other monocotyledonous or dicotyledonous plant. Our previous experience with annotating rice and other cereal genomes (31, 36) has shown that all or most of these 13 candidates will eventually be characterized as portions of as-yet-uncharacterized transposons. Hence, we predict that these BACs encode five protein-encoding genes. The single “certified” genes on BACs TA574A4 and TA1353M1, and one of the genes on TA1426L2, are complete. The second gene on TA1426L2 and the gene on TA350B2, both of which are near the end of a contig, have been predicted to be complete genes by FGENESH. Comparative data, however, suggest that the TA1426L2 gene lacks the 5′ end and the TA350B2 gene lacks an intact 3′ end.

Table 3. Predicted genes and their EST matches in cereals

                                           Nos. of predicted genes matching ESTs
Types of predicted genes      Total no.    ESTs in wheat, <E−30    ESTs in cereals, <E−20
Total predicted genes         71*          48                      56
Retrotransposon homology      47†          40                      40
DNA transposon homology       7†           4                       5
No protein homology           13‡          2                       7
Non-Triticeae homology        4‡           2                       4

*Predicted by FGENESH.
†Identified by CROSS_MATCH and BLASTX (<E−10).
‡Identified by BLASTX (<E−10) against the nr Peptide Sequence Section of GenBank.

Table 4. Predicted genes in four sequenced BACs from Chinese Spring

             Gene location
BAC          From, bp    To, bp     Homology                                          E value
TA350B2      99,418      99,894     Rice putative RNA apurinic site-specific lyase    3E−23
TA574A4      105,884     106,417    Rice hypothetical protein                         7E−24
TA1353M1     67,203      68,857     Rice MADS-box protein-like                        4E−28
TA1426L2     66,023      66,583     Rice hypothetical protein                         8E−21
TA1426L2     115,435     115,656    Rice hypothetical protein                         3E−5

Mapping. Because we wanted to develop a technology that could be used to locate on the genetic map any sequenced wheat BAC, even those that contained zero genes and perhaps no single copy DNA, we decided to see whether repeat boundaries could be used as unique PCR products. For virtually all plant transposable element insertions that have been investigated, the insertion site has been unique, although often into another repetitive DNA (e.g., another transposable element) (37). Hence, in theory, the use of primers that flank such an insertion point should yield a unique product, even when each primer is specific to a different DNA repeat. If successfully amplified, these markers should have a very good chance to detect polymorphism because the transposable element complement of any genome tends to be its most dynamic component.

BLAST alignments between the wheat BAC sequences and previously identified repeats were manually inspected to precisely locate the repeat boundaries/junctions. Primers were designed so that either one of the primers or the amplification product spanned a boundary. Primers were tested for amplification in Chinese Spring, the variety that was the DNA source for the sequenced BAC clones. Of the 14 primer sets made across repeat boundaries, two failed to amplify, nine produced a single amplification product, and two generated two or three bands as assessed by separation of the reaction products on a 1.2% agarose gel (data not shown). Primer pairs that produced the strongest signals were subsequently used for amplification in a set of 21 Chinese Spring NT lines. An NT line lacks one chromosome pair, the absence of which is compensated by the presence of an extra pair of homoeologous chromosomes. Such lines are available for each of the 21 wheat chromosomes (38). For BAC TA350B2, the primer sets TA350B2_18F1/R1 and F2/R2 that spanned different boundaries between a Fatima LTR retrotransposon and a Jorge CACTA transposon amplified a single product in all NT lines with the exception of the line nullisomic for 1B, indicating that the amplification product, and thus BAC TA350B2, originated from chromosome 1B (Table 1). BAC TA350B2 was further mapped to the long arm of 1B by using the ditelosomic lines 1BS and 1BL that carry the short and long arms of chromosome 1B, respectively. In a similar manner, BAC TA1353M1 was shown to originate from chromosome arm 3BL by using primer set TA1353M1_5F1/R1 that amplified the junction between two CACTA elements belonging to the Sherlock and TAT-1 families (Fig. 1). The chromosomal location of this BAC clone was confirmed by using a primer set designed against the single gene present in TA1353M1 (Table 1).

The three primer sets designed against BAC TA1426L2 that produced amplification products in Chinese Spring also generated fragments in all NT lines. To enhance the specificity of the reaction, nested PCR was carried out across two repeat boundaries, but failed to reveal the origin of BAC TA1426L2. In the course of the amplification experiments for BACs TA350B2 and TA1353M1, it had been noted that the absence of the “perfect” priming sites often resulted in the amplification of a secondary fragment (data not shown). To test whether a secondary fragment of similar molecular weight as the primary fragment had been generated in one of the NT lines, amplification products generated by three primer sets were separated by SSCP gel electrophoresis, a technology that detects tiny differences in size and nucleotide composition. All primer sets now unambiguously located BAC TA1426L2 to chromosome 5B (Fig. 2).

Fig. 2. Separation of the amplification products generated by primer set TA1426L2_16F2/R2 in the group 5 NT lines by SSCP. A comparison of the banding patterns in the N5A, N5B, and N5D lines shows that specific amplification products are absent in the N5B line, which indicates that chromosome 5B is the origin of BAC clone TA1426L2.

BAC TA574A4 was mapped to chromosome arm 3DL by using primers designed across the boundary of a Caspar CACTA element and a Wham LTR retrotransposon (Table 1). The 3D location was confirmed by using primer sets against the identified EST. Primers designed against exons often amplify fragments of similar molecular weight from two or three of the homoeologous wheat genomes. The homoeologous genes usually differ in a few base pairs, which allows their separation by SSCP. Primer set TA574A4_10F2/R2 indeed produced amplification products from both chromosomes 3B and 3D. TA574A4_10F1/R1, however, was specific for 3D and unambiguously located BAC TA574A4.

These experiments demonstrate that it is possible to map any BAC clone, regardless of the gene content. In fact, whereas primers designed against ESTs often amplify from multiple homoeoloci, allowing allocation of the marker only to a chromosome group in a polyploid rather than to an individual chromosome, boundaries between repeats, even between high-copy-number LTR retrotransposons, are generally unique features in the genome that can be exploited to establish the chromosomal origin of a sequenced BAC clone. Although we only located BAC clones to chromosome arms, a more precise map position can be obtained by using either chromosome deletion lines (39) or segregation analysis. Physical mapping onto the deletion lines requires only intergenomic polymorphisms (between the A, B, and D genomes). The presence of intragenomic polymorphisms, however, is a prerequisite for genetic mapping. In maize, the organization of repeats in a genome is variety dependent (40, 41). If this is also the case in wheat, it should be possible to genetically map repeat boundaries in progeny derived from crosses in which one of the parents is the source of the BAC DNA.

Wheat Genome Organization. Previously reported BAC analyses, and consequently any conclusions regarding gene densities in wheat, have been based on BACs that had been preselected to contain a gene of agronomic importance (17–23, 42, 43). The sequence of randomly selected BACs, on the other hand, provides unbiased information on the organization of the wheat genome, at least to the degree in which the library itself provides an unbiased representation of the genome. To provide a comprehensive and statistically significant description of the entire hexaploid wheat genome, we have estimated that a minimum of ≈60 randomly chosen BACs would need to be sequenced from each of the A, B, and D genomes (unpublished observations). Hence, this preliminary study is not meant to describe the entire wheat genome, but only the properties of a few randomly chosen BACs that were subsequently mapped to known locations.

Although none of the four BACs in our study were preselected to contain a gene, they each carried a single (three BACs) or two (one BAC) well supported gene candidate(s). To correlate the observed gene density with previously established high and low gene-density regions of the wheat genome, the sequenced wheat BACs were located in silico to cytologically defined chromosome bins (39). We first identified the rice BAC/P1 artificial chromosome (PAC) clone that had the highest homology to the gene identified on each of the wheat BACs. A putative orthologous origin of the homologous wheat and rice genes was supported if they were located on syntenic chromosomes (44), which was the case for three of the wheat genes. The gene on BAC TA574A4, which originated from wheat chromosome 3D, identified a putative ortholog on rice chromosome 1 PAC P0503C12. Similarly, TA1353M1 (located on wheat chromosome 3B) had putative orthology to rice chromosome 1 PAC P0459B04 and TA1426L2 (5B) to rice chromosome 3 BAC OSJNBa0057G07. The highest homology of the gene on TA350B2 (1B) was with a region on rice chromosome 7 BAC P0045F02. As rice chromosome 7 is nonsyntenic to the wheat group 1 chromosomes (44), no in silico map position was obtained for TA350B2. The bin position of the other three rice BAC/PAC clones was derived from the available knowledge on the colinearity between deletion-mapped wheat ESTs and the rice genomic sequence (ref. 45 and www.tigr.org/tdb/e2k1/tae1/wheat_synteny.shtml). TA574A4 was located to deletion interval 3L-0.42-0.50, TA1353M1 to 3L-0.81-1.00, and TA1426L2 to 5L-0.76-0.79. All three bins represent regions of high gene density (46, 47). It thus appears that at least three of the randomly selected BACs are derived from gene-rich regions. Combined, these three BACs comprise 300 kb, yielding a calculated density of one gene per 75 kb. Bin 3L-0.42-0.50 comprises 8% of the long arm and corresponds to ≈40 Mb of DNA. Based on the number of mapped ESTs in bin 3L-0.42-0.50 and a total gene number for wheat of 150,000, we expect, on average, a gene density of one gene per 62 kb in this gene-rich bin. The expected gene density in bin 3L-0.81-1.00 is one gene per 107 kb. One gene per BAC, as obtained in our study, is in line with these expectations. Interestingly, the gene density of these regions is <2-fold higher than the value of one gene per 113 kb expected for a random distribution of ≈150,000 genes in the 17,000-Mb hexaploid wheat genome.

The lower gene density on the randomly selected BACs in this study, compared with preselected gene-containing BACs, can be partly explained by the fact that most wheat BACs sequenced so far carried disease resistance and storage protein genes (18–20, 23, 43). These genes often occur in clusters, leading to an overestimation of overall gene densities in gene-rich regions of the wheat genome. Gene density around other genes such as the vernalization response gene Vrn-Am1 in Triticum monococcum, a close relative of T. aestivum, which is also located in a gene-rich portion of the genome, is in the expected range of one to two genes per BAC (21). These data suggest that the previous estimates of one gene per 5–20 kb in the area of disease resistance and storage protein genes do not accurately represent the organization of the gene-rich fraction of the genome. Sequencing and bin mapping of a larger number of randomly selected BAC clones is needed to establish in detail the overall organization of the wheat genome.

We thank G. Moore (John Innes Centre, Norwich, United Kingdom) for the wheat BAC clones; P. Stephenson and J. Beales for donation of DNA of the NT and ditelosomic lines; B. Keller and M. Sorrells for their helpful comments concerning this manuscript; and T. Thomas and M. Estep for expert technical assistance. This research was supported by the Doris and Norman Giles Professorship (J.L.B.).

1. Goff, S. A., Ricke, D., Lan, T. H., Presting, G., Wang, R., Dunn, M., Glazebrook, J., Sessions, A., Oeller, P., Varma, H., et al. (2002) Science 296, 92–100.
2. Yu, J., Hu, S. N., Wang, J., Wong, G. K. S., Li, S. G., Liu, B., Deng, Y. J., Dai, L., Zhou, Y., Zhang, X. Q., et al. (2002) Science 296, 79–92.
3. Sasaki, T., Matsumoto, T., Yamamoto, K., Sakata, K., Baba, T., Katayose, Y., Wu, J. Z., Niimura, Y., Cheng, Z. K., Nagamura, Y., et al. (2002) Nature 420, 312–316.
4. Whitelaw, C. A., Barbazuk, W. B., Pertea, G., Chan, A. P., Cheung, F., Lee, Y., Zheng, L., van Heeringen, S., Karamycheva, S., Bennetzen, J. L., et al. (2003) Science 302, 2118–2120.
5. Palmer, L. E., Rabinowicz, P. D., O’Shaughnessy, A. L., Balija, V. S., Nascimento, L. U., Dike, S., de la Bastide, M., Martienssen, R. A. & McCombie, W. R. (2003) Science 302, 2115–2117.
6. Gill, B. S., Appels, R., Botha-Oberholster, A.-M., Buell, C. R., Bennetzen, J. L., Chalhoub, B., Chumley, F. G., Dvořák, J., Iwanaga, M., Keller, B., et al. (2004) Genetics 168, 1087–1096.
7. Bennett, M. D. & Leitch, I. J. (1995) Ann. Bot. (London) 76, 113–176.
8. Yuan, Y. N., SanMiguel, P. J. & Bennetzen, J. L. (2002) Genome Res. 12, 1345–1349.
9. Springer, N. M., Xu, X. & Barbazuk, W. B. (2004) Plant Physiol. 136, 3023–3033.
10. Gill, K. S., Gill, B. S., Endo, T. R. & Boyko, E. V. (1996) Genetics 143, 1001–1012.
11. Mudge, J., Huihuang, Y., Denny, R. L., Howe, D. K., Danesh, D., Marek, L. F., Retzel, E., Shoemaker, R. C. & Young, N. D. (2004) Genome 47, 361–372.
12. Qi, X., Pittaway, T. S., Lindup, S., Liu, H., Waterman, E., Padi, F. K., Hash, C. T., Zhu, J., Gale, M. D. & Devos, K. M. (2004) Theor. Appl. Genet. 109, 1485–1493.
13. Tanksley, S. D., Ganal, M. W., Prince, J. P., de Vicente, M. C., Bonierbale, M. W., Broun, P., Fulton, T. M., Giovannoni, J. J., Grandillo, S., Martin, G. B., et al. (1992) Genetics 132, 1141–1160.
14. Kato, T., Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E. & Tabata, S. (2003) DNA Res. 10, 277–285.
15. Sato, S., Kaneko, T., Nakamura, Y., Asamizu, E., Kato, T. & Tabata, S. (2001) DNA Res. 8, 311–318.
16. Li, W., Zhang, P., Fellers, J. P., Friebe, B. & Gill, B. S. (2004) Plant J. 40, 500–511.
17. SanMiguel, P. J., Ramakrishna, W., Bennetzen, J. L., Busso, C. S. & Dubcovsky, J. (2002) Funct. Integrative Genomics 2, 70–80.
18. Wicker, T., Stein, N., Albar, L., Feuillet, C., Schlagenhauf, E. & Keller, B. (2001) Plant J. 26, 307–316.
19. Wicker, T., Yahiaoui, N., Guyot, R., Schlagenhauf, E., Liu, Z.-D., Dubcovsky, J. & Keller, B. (2003) Plant Cell 15, 1186–1197.
20. Gu, Y. Q., Coleman-Derr, D., Kong, X.-Y. & Anderson, O. D. (2004) Plant Physiol. 135, 459–470.
21. Yan, L., Loukoianov, A., Tranquilli, G., Helguera, M., Fahima, T. & Dubcovsky, J. (2003) Proc. Natl. Acad. Sci. USA 100, 6263–6268.
22. Faris, J. D., Fellers, J. P., Brooks, S. A. & Gill, B. S. (2003) Genetics 164, 311–321.
23. Brooks, S. A., Huang, L., Gill, B. S. & Fellers, J. P. (2002) Genome 45, 963–972.
24. Qi, L. L., Echalier, B., Chao, S., Lazo, G. R., Butler, G. E., Anderson, O. D., Akhunov, E. D., Dvorak, J., Linkiewicz, A. M., Ratnasiri, A., et al. (2004) Genetics 168, 701–712.
25. Erayman, M., Sandhu, D., Sidhu, D., Dilbirligi, M., Baenziger, P. S. & Gill, K. S. (2004) Nucleic Acids Res. 32, 3546–3565.
26. Bennetzen, J. L. (2003) in Proceedings of the 10th International Wheat Genetics Symposium, eds. Pogna, N. E., Romanò, M., Pogna, E. A. & Galterio, G. (Scalabrini International Migration Institute, Rome), pp. 215–220.
27. Allouis, S., Moore, G., Bellec, A., Sharp, R., Faivre-Rampant, P., Mortimer, K., Pateyron, S., Foote, T. N., Griffiths, S., Caboche, M., et al. (2003) Cereal Res. Commun. 31, 331–338.
28. Dubcovsky, J., Ramakrishna, W., SanMiguel, P. J., Busso, C. S., Yan, L. L., Shiloff, B. A. & Bennetzen, J. L. (2001) Plant Physiol. 125, 1342–1353.
29. Ewing, B. & Green, P. (1998) Genome Res. 8, 186–194.
30. Gordon, D., Abajian, C. & Green, P. (1998) Genome Res. 8, 195–202.
31. Ma, J., SanMiguel, P., Lai, J., Messing, J. & Bennetzen, J. L. (2005) Genetics 170, 1209–1220.
32. Martins-Lopes, P., Zhang, H. & Koebner, R. M. D. (2001) Plant Mol. Biol. Reporter 19, 159–162.
33. McCarthy, E. M., Liu, J., Lizhi, G. & McDonald, J. F. (2002) Genome Biol. 3, 1–11.
34. Wicker, T., Guyot, R., Yahiaoui, N. & Keller, B. (2003) Plant Physiol. 132, 52–63.
35. Langdon, T., Jenkins, G., Hasterok, R., Jones, R. N. & King, I. P. (2003) Genetics 163, 1097–1108.
36. Bennetzen, J. L., Coleman, C., Liu, R., Ma, J. & Ramakrishna, W. (2004) Curr. Opin. Plant Biol. 7, 732–736.
37. Bennetzen, J. L. (2000) Plant Mol. Biol. 42, 251–269.
38. Sears, E. R. (1954) Missouri Agric. Exp. Station Res. Bull. 572, 1–59.
39. Endo, T. R. & Gill, B. S. (1996) J. Hered. 87, 295–307.
40. Fu, H. & Dooner, H. K. (2002) Proc. Natl. Acad. Sci. USA 99, 9573–9578.
41. Brunner, S., Fengler, K., Morgante, M., Tingey, S. & Rafalski, A. (2005) Plant Cell 17, 343–360.
42. Anderson, O. D., Rausch, C., Moullet, O. & Lagudah, E. S. (2003) Funct. Integrative Genomics 3, 56–68.
43. Feuillet, C. & Keller, B. (1999) Proc. Natl. Acad. Sci. USA 96, 8265–8270.
44. Devos, K. M. (2005) Curr. Opin. Plant Biol. 8, 155–162.
45. La Rota, M. & Sorrells, M. E. (2004) Funct. Integrative Genomics 4, 34–46.
46. Linkiewicz, A. M., Qi, L. L., Gill, B. S., Ratnasiri, A., Echalier, B., Chao, S., Lazo, G. R., Hummel, D. D., Anderson, O. D., Akhunov, E. D., et al. (2004) Genetics 168, 665–676.
47. Munkvold, J. D., Greene, R. A., Bermudez-Kandianis, C. E., La Rota, C. M., Edwards, H., Sorrells, S. F., Dake, T., Benscher, D., Kantety, R., Linkiewicz, A. M., et al. (2004) Genetics 168, 639–650.
