<<

Weed Science 2007 55:193–203

Characterization of an EST Database for the Perennial Weed Leafy Spurge: An Important Resource for Weed Biology Research James V. Anderson, David P. Horvath, Wun S. Chao, Michael E. Foley, Alvaro G. Hernandez, Jyothi Thimmapuram, Lie Liu, George L. Gong, Mark Band, Ryan Kim, and Mark A. Mikel*

Genomics programs in the weed science community have not developed as rapidly as that of other crop, horticultural, forestry, and model plant systems. Development of genomic resources for selected model weeds are expected to enhance our understanding of weed biology, just as they have in other plant systems. In this report, we describe the development, characteristics, and information gained from an expressed sequence tag (EST) database for the perennial weed leafy spurge. ESTs were obtained using a normalized cDNA library prepared from a comprehensive collection of tissues. During the EST characterization process, redundancy was minimized by periodic subtractions of the normalized cDNA library. A success rate of 88% yielded 45,314 ESTs with an average read length of 671 nucleotides. Using bioinformatic analysis, the leafy spurge EST database was assembled into 23,472 unique sequences representing 19,015 unigenes (10,293 clusters and 8,722 singletons). Blast similarity searches to the GenBank nonredundant protein database identified 18,186 total matches, of which 14,205 were nonredundant. These data indicate that 77.4% of the 23,472 unique sequences and 74.7% of the 19,015 unigenes are similar to other known proteins. Further analysis indicated that 2,950, or 15.5%, of the unigenes have previously not been identified suggesting that some may be novel to leafy spurge. Functional classifications assigned to leafy spurge unique sequences using Munich Information Center for Protein or Ontology were proportional to functional classifications for of arabidopsis, with the exception of unclassified or unknowns and transposable elements which were significantly reduced in leafy spurge. Although these EST resources have been developed for the purpose of constructing high-density leafy spurge microarrays, they are already providing valuable information related to sugar metabolism, cell cycle regulation, dormancy, terpenoid secondary metabolism, and flowering. Nomenclature: Leafy spurge, Euphorbia esula L. EPHES; arabidopsis, Arabidopsis thaliana (L.) Heynh. Key words: Expressed sequence tag, , leafy spurge, perennial weeds.

The era of genomics and bioinformatics has increased our some weed species representing perennial and annual grassy knowledge of plant genome structure and organization, gene and broad-leafed classes should be promoted as models (Basu function, marker development, plant physiology, and . et al. 2004; Chao et al. 2005). Although extensive studies have The recent explosion of genetic and genomic data for a wide been carried out on a few model species, little is still known range of plant and animal species has led to a proliferation of about basic physiological processes and signal transduction publicly available information databases throughout the inter- pathways that control growth, development, dormancy, and net (http://www.ncbi.nlm.nih.gov/dbEST, http://arabidopsis. biotic and abiotic stress resistance in weedy species. De- org, http://rgp.dna.affrc.go.jp). In addition, complete plant velopment of genomic resources (i.e., EST databases, genomes are available for arabidopsis (The Arabidopsis microarrays, and genome sequences) for selected model weed Genome Initiative 2000) and rice (Oryza sativa L.; In- species would enhance our ability to answer fundamental ternational Rice Genome Sequencing Project 2005). EST questions about how weeds survive and adapt to natural and databases, which are composed of partial sequences (200 to manmade stresses and about what evolutionary or environ- 800 base pairs [bp]) of expressed genes, compose a large mental changes cause them to become invasive. Many of these portion of these publicly available DNA sequence databases mechanisms could be exploited to enhance adaptability of (Ohlrogge and Benning 2000). To date, at least 34 gene indices domestic crops to environmental stress. for plants are publicly available (Quackenbush et al. 2001; Leafy spurge is currently being used as a model for the http://compbio.dfci.harvard.edu/tgi/plant/html) and other tran- study of perennial broadleaf weeds (Chao et al. 2005). Leafy script assembles are available for over 200 species (http:// spurge is an invasive perennial weed causing economic losses plantta.tigr.org) that integrate data from international EST to range, recreational, and right-of-way lands in North sequencing, genome sequencing, and gene research projects. American plains and prairies (Bangsund et al. 1999; Leitch Historically, the weed science community has not received et al. 1996). The perennial nature of leafy spurge is attributed the attention or funding needed to sustain genomics programs to vegetative propagation from an abundance of underground similar to those of crops or other model plants (e.g., adventitious buds (more commonly referred to as crown and arabidopsis). As a consequence, weed genomics initiatives root buds). Dormancy-imposed inhibition of new shoot are currently at the grassroots stage of development, and growth from crown and root buds is one of the key consensus within the weed science community suggests that characteristics leading to the persistence of perennial weeds like leafy spurge (Coupland et al. 1955). Although biocontrol programs have been successful at reducing the spread of leafy DOI: 10.1614/WS-06-138.1 * First through fourth authors: USDA-ARS, Biosciences Research Laboratory, spurge in some ecosystems, natural enemies are not successful 1605 Albrecht Boulevard, Fargo, ND 58105; fifth through tenth authors: in all ecosystems. In addition, seemingly eradicated patches of University of Illinois, W. M. Keck Center for Comparative and Functional leafy spurge are able to regenerate through seeds which can Genomics, 1201 West Gregory Drive, Edward R. Madigan Laboratory, Urbana, remain dormant in the soil bank for up to 8 yr (Chao and IL 61801; University of Illinois, Department of Crop Sciences and Roy J. Carver Biotechnology Center, 1206 West Gregory Drive, 2610 Institute for Genomic Anderson 2004). Biology, Urbana, IL 61801. Corresponding author’s E-mail: The development of genomic resources for leafy spurge has [email protected]. been an ongoing project for the last 5 to 10 yr. In addition to

Anderson et al.: An EST database for leafy spurge N 193 the development of cDNA and genomic libraries, a small-scale From the garden plot, leaf, stem, and meristem were collected EST project was initiated that resulted in identification of every 2 wk from August 22, 2003, through October 31, 2003, a limited set of sequences (Anderson and Horvath 2001) and which represents a period of senescence; seeds were collected low-density microarrays (Horvath et al. 2005, 2006). EST August 20, 2003; and crown buds were collected monthly databases are important for identifying expressed genes that from January 15, 2002 through September 10, 2003. Crown can be used for developing high-density DNA microarrays buds collected over this 21-month period experienced (Richmond and Sommerville 2000). cDNA microarray environmentally stressful conditions, including cold, freezing, technology, first developed by Schena et al. (1995), has dehydration, and heat. From the wild population, shoots, resulted in an explosion of scientific papers related to immature and mature flowers, galls resulting from Spurgia expression (Marshall 2004). For example, in esulae (an introduced biocontrol agent of leafy spurge that lays plants, this technology has been used to identify specific gene eggs in the flowers, resulting in a gall that interferes with functions (Aharoni et al. 2000; Gutierrez et al. 2002), evaluate flowering/seed production), and mature and developing seed transcript profiles during development (Kloosterman et al. pods with seeds were collected in June 2003. From 2005), study well-defined phases of dormancy in various greenhouse-grown plants, crown and root buds of 3- to 4- tissues (Cadman et al. 2006; Schrader et al. 2004) or responses mo-old plants were collected at 0, 1, 2, 3, 4, and 5 d after to various physiological or environmental conditions (Lee et decapitation of aerial tissue, which breaks paradormancy in al. 2002; Oztur et al. 2002; Potokina et al. 2002; Reymond et greenhouse-grown leafy spurge root and crown buds (Horvath al. 2000; van Hal et al. 2000; Zhu et al. 2003), and evaluate et al. 2005). Greenhouse-grown plants were also used to transcript profiles from genetically modified species (van Hal obtain vernalized leaf, stem, and crown buds from 3- to 4-mo- et al. 2000). However, because few large-scale EST projects old plants that were collected at 0, 1, 2, 4, 5, 7, 14, 15, 21, 35, have been undertaken for non–crop-related weeds, the 55, and 90 d during incubation at , 4 to 7 C. Vernalization 1 development of high-density microarrays for model weedy was accomplished with the use of an incubator with an species that could be used for transcriptome analysis similar to 8 : 16 h light : dark photoperiod. Light intensity during those described above has so far been limited. vernalization was 50 photosynthetically active radiation. In addition to EST projects, full genome sequencing projects for many plant species are underway and are expected Library Construction. Total RNA and mRNA Isolation. Total to affect future plant research. For example, the use of RNA was extracted separately from each ground tissue sample coordinately expressed genes identified by microarray analysis by the ‘‘pine tree’’ method (Chang et al. 1993). The integrity in combination with genome sequences that include noncod- and purity of total RNA was verified by denaturing agarose ing portions of these genes would provide a mechanism for gels and by spectrophotometry (ratio A260/280). Total RNA detecting regulatory sequences such as short from different samples was pooled in approximately equal factor binding sites shared among coordinately regulated gene amounts before isolation of mRNA. Poly(A)+mRNA was clusters. Such an approach can be a powerful tool for isolated twice from total RNA from each tissue with the identifying the specific genes and signaling components Oligotex Direct mRNA kit.2 responsible for sensing environmental or developmental cues involved in regulating specific phases of weed growth and cDNA Synthesis. Reverse transcription of mRNA into double- development or responses to stress and other factors. stranded cDNA was accomplished by the SuperScript Choice In an attempt to develop more robust microarrays, System3 with a NotI/oligo(dT) primer [59-AACTGGAA- including genes expressed in multiple tissues at different GAATTCGCGGCCGCACGCA(T) V-39]. developmental stages or under different environmental 18 conditions, and to increase the potential of developing leafy spurge as a model for perennial weed studies (Chao et al. Directional Cloning. Double-stranded cDNAs of $ 450 bp 2005), we have greatly increased the number of unigenes were selected by agarose gel electrophoresis. EcoRI adaptors within our leafy spurge EST database. In this report, we (59-AATTCCATTGTGTTGGG-39) were ligated to the describe the generation of an EST database and its cDNAs, followed by digestion with Not I. The cDNAs were bioinformatic analysis, and we discuss how this resource then directionally ligated into EcoRI–Not I-digested pBlue- script II SK(+) phagemid vector.4 Ligated cDNAs were might benefit other researchers studying specific physiological TM 5 responses in leafy spurge and related species. electroporated into DH10B cells.

Titers and Inserts Size. The total number of white colony- Methods and Materials forming units (cfu) in the primary library before amplification 6 Library Development. Plant tissues used to initiate this was 1.0 3 10 . More than 99% of the clones were project originated from three sources that included (1) recombinant. The average insert size of random clones on outdoor garden plants located at the USDA/ARS, Biosciences the basis of polymerase chain reaction (PCR) was 1.2 kilo- Research Laboratory in Fargo, ND, (2) wild plants from base pairs. a population located , 1.6 km (1 mile) north of the Biosciences Research Laboratory, and (3) greenhouse-grown Library Normalization. Normalization was performed plants. Leafy spurge (biotype 1984-ND001) plants main- following the procedure of Bonaldo et al. (1996). Briefly, tained in a greenhouse were propagated as previously purified plasmid DNA from the primary library was described (Anderson and Davis 2004). Outdoor garden plants converted to single-stranded circles by digestion with GeneII located at the laboratory originated from a portion of the leafy and Exonuclease III and used as a template for PCR spurge greenhouse population that was transplanted in 1998. amplification with the use of the T7 and T3 priming sites

194 N Weed Science 55, May–June 2007 flanking the cloned cDNA inserts. The purified PCR used to trim off vector sequences, ambiguous regions, and products, representing the entire cloned cDNA population, user-defined sequence patterns (see Table 1 for definitions were used as a driver for normalization. Hybridization and links to genomics terms used). Sequences of . 200 bp in between the single-stranded library and the PCR products length after processing were considered useful for further were carried out for 44 h at 30 C. Nonhybridized single- analysis. A report summarizing success rates and quality scores stranded DNA circles were separated from hybridized DNA were generated for each plate. Subsequently, sequences were by hydroxyapatite, rendered partially double-stranded by screened for contaminated sequences, such as bacterial primer extension with the use of standard M13 reverse chromosomal DNA, viral DNA or RNA, rRNA, mitochon- primer, and electroporated into DH10B cells to generate the dria, and chloroplast DNA with the BLASTN program normalized library. The total number of clones with insert was (Altschul et al., 1990). Contaminated sequences were 6 3 106 cfu and . 99% of the clones were recombinant. excluded from the final set of ‘‘cleaned’’ sequences. Repeats and low-complexity sequences were screened by RepeatMas- Library Subtraction. Subtraction was carried out as pre- ker program. (The cleaned sequences, their associated features, viously described (Bonaldo et al. 1996). Briefly, purified and their original raw sequences were stored in SeqDB, an in- DNAs from previously sequenced clones were pooled and house Oracle database developed at the Bioinformatics Unit, used as template for PCR amplification with standard T7 and and could be accessed through a web-based system, the T3 primers. The purified PCR products were used as a driver Management System (Liu et al. 2000), also for subtraction. Purified plasmid DNA from the normalized developed by the Bioinformatics Unit.) library was converted to single-stranded circles by digestion with GeneII and Exonuclease III and used as a tracer for the Protocol for Clustering, Assembling, and Assessing subtraction reaction. Hybridization between the single- Redundancy. The final clean sequences were used for stranded tracer library and the PCR products was carried clustering and assembly by Paracel TranscriptAssembler out for 88 h at 30 C. Nonhybridized, single-stranded DNA (PTA). EST sequences were clustered on the basis of local circles were separated from hybridized DNA by hydroxyap- similarity of pairwise comparison with 88% identity over 100 atite, rendered partially double-stranded by primer extension nucleotides. Clusters containing only one sequence are called with standard M13 reverse primer, and electroporated into singlets. The clusters were then assembled into contigs DH10B cells to generate the subtracted library. (contiguous sequence) by the CAP4 program, which is an Subtractions were carried out when overall redundancy improved version of CAP3 (Huang and Madan 1999) with approached 40%. In all, three subtractions were performed: the criteria of 95% identity and a minimum overlap of 30 subtracted library 1, with the use of , 12,000 clones nucleotides. A consensus sequence for each contig was sequenced from the normalized library as driver; subtracted generated. The sequences in a cluster that could not be library 2, with , 25,000 clones sequenced from the first assembled into contigs are called cluster_singlets. The subtracted library as driver; and subtracted library 3, from redundancy is calculated as {1 2 [(no. of contigs + no. of which , 39,000 previously sequenced clones were used as cluster_singlets + no. of singlets)/total no. of sequences]} 3 driver. The total number of clones with inserts from the first 100. subtraction was 3.5 3 106 cfu ml21 and . 98% of the clones were recombinant. In the second subtraction, the titer was 5 3 106 cfu ml21, and empty clones made up 4% of the Results and Discussion library. In the third subtraction, the titer was 107 total cfu, , EST Database Characterization. The use of a normalized and the background of empty clones was 4%. whole-plant leafy spurge cDNA library in combination with three library subtractions and a sequencing success rate of DNA Sequencing. The libraries were plated on agar and 88% yielded 45,314 clean ESTs with an average read length 6 individual colonies were picked by a Genetix Q-Pix robot and of 671 nucleotides (Table 2). Overall, our data indicate that racked as glycerol stocks in 384-well plates. After overnight the leafy spurge EST database assembled into 23,472 unique growth of the glycerol stocks, bacteria were inoculated into 96- sequences (10,436 contigs, 4,314 cluster_singlets, and 8,722 well deep cultures with Luria Bertani medium and carbenicillin 21 singlets) that represent 19,015 unigenes (10,293 clusters and (100 mg ml ). Plasmid DNA was purified from the bacterial 8,722 singlets). It has been estimated that arabidopsis contains cultures after 24 h of growth at 37 C by Qiagen 8000 and , 7 26,000 genes (The Arabidopsis Genome Initiative 2000), Qiagen 9600 robots. Sequencing reactions were performed and rice has as many as 37,000 genes (International Rice with standard T7 primer (providing sequence for the 59 end of Genome Sequencing Project 2005). Thus, our EST database cloned inserts only) and ABI BigDye terminator chemistry on 8 could potentially contain between 60 and 90% of the genes ABI 3730XL capillary systems. present in leafy spurge. The high number of coding sequences demonstrates the efficiency and cost effectiveness of EST- Sequence Processing Protocol. EST sequences were pro- based studies for initial genome analysis. cessed in batch by the bioinformatics pipeline developed in The increased number of unique sequences compared with the Bioinformatics Unit at the W. M. Keck Center, University unigenes can be explained by several facts: During assembly of of Illinois, Urbana-Champaign. The steps in the pipeline are clusters, some will not form any contigs and include only described as follows. After base-calling by the PHRED cluster_singlets because these singlets do not overlap, and program (Ewing et al., 1998; Ewing and Green, 1998), the some clusters have multiple contigs, with or without Cross-match program (http://bozeman.mbt.washington.edu/ cluster_singlets. Thus, the number of contigs + cluster_sing- phrap.docs/phrap.html), and Qualtrim script (a quality lets will not add up to the number of clusters. Multiple control script developed by the Bioinformatics Unit) were contigs within a single cluster can be due to a lack of sequence

Anderson et al.: An EST database for leafy spurge N 195 Table 1. A list of terms, definitions, and website resources commonly used in genomics research. Term Definition Website References BLAST Basic Local Alignment Search Tool. Finds regions of local http://www.ncbi.nlm.nih.gov/BLAST/ Altschul et al. 1990; similarity between sequences. McGinnis and Madden 2004 Clustering and Paracel Transcript Assembler using CAP4 program generates No web site available Huang and Madan 1999 assembly putative transcripts. Cluster A set of sequences grouped together on the basis of No web site available Huang and Madan 1999 pairwise comparison, clone name similarity, or both. Cluster_singlet Sequences in a cluster that could not be assembled into No web site available Huang and Madan 1999 contigs are called cluster_singlets. Contig A contiguous sequence based on overlapping regions No web site available Huang and Madan 1999 of ESTs in a given cluster. Cross_Match A program for comparing any two sets of DNA sequences. http://bozeman.mbt.washington.edu/ NA. Website has phrap.docs/phrap.html documentation dbEST Expressed Sequence Tag database. A division of http://www.ncbi.nlm.nih.gov/dbEST/ Boguski et al. 1993 GenBank that contains sequence data and other information on ‘‘single-pass’’ cDNA sequences from a number of organisms. GO Gene Ontology. Provides a controlled vocabulary to describe http://www.geneontology.org/ Gene Ontology gene and gene products in terms of their associated Consortium 2004 biological processes, cellular components, and molecular functions in any organism. GenBank National Institutes of Health genetic sequence database, an http://www.ncbi.nlm.nih.gov/Genbank/ Benson et al. 2005 annotated collection of all publicly available DNA sequences. MIPs Munich Information Center for Protein. Automatically http://mips.gsf.de/ Mewes et al. 2002 generated and manually annotated genome-specific databases; develops systematic classification schemes for functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. NCBI National Center for Biotechnology Information. A national http://www.ncbi.nlm.nih.gov/ Wheeler et al. 2004 resource for molecular biology information to aid in the understanding of fundamental molecular and genetic processes. Phred A base-calling program that examines DNA sequence http://www.phrap.org/phredphrap/phred.html Ewing et al. 1998; traces and assigns a quality score to each base call. Ewing and Green 1998 QualTrim A script used to trim off vector sequences, ambiguous NA Kumar et al. 2004 regions, and user-defined sequence patterns. RepeatMasker A program that screens DNA sequences for interspersed http://www.repeatmasker.org/ Smit et al. 1996–2004 repeats and low-complexity DNA sequences. Singlet Clusters containing only one sequence are called singlets. No web site available Huang and Madan 1999 TAIR The Arabidopsis Information Resource. A database of http://godot.ncgr.org/ Garcia-Hernandez et al. genetic and molecular biology data for Arabidopsis thaliana. 2002 TIGR The Institute for Genomic Research. A not-for-profit http://www.tigr.org/ NA center dedicated to deciphering and analyzing genomes.

information to form an overlap, , gene families, or the presence of chimera in the clusters. For example, Figure 1 shows an example of two contigs Table 2. Summary report for leafy spurge expressed sequence tags (ESTs). (CV03_9.sd.6.C1.Contig10398 and CV03_9.sd.6.C2.Con- Feature Statistic tig10399) from a single cluster that both translate into full- length cyclophilins (cyclophilins belong to a gene family in Clean ESTs 45,314 Average EST length (nucleotides) 671 arabidopsis). The two contigs have 66% identity at the Unique sequences 5 (contigs + cluster_singlets + singlets) nucleotide level, but 90% similarity at the amino acid level. Contigs 10,436 Because assembly was not stringent (95% over 30 nucleotides) Cluster_singlets 4,314 and the parameters used for clustering (88% over 100 Singlets 8,722 nucleotides) and assembly were different, it is unlikely that total 5 23,472 Unigenes 5 (clusters + singlets) chimera within a cluster caused ESTs to break away during Clusters 10,293 assembly. Singlets 8,722 In a BLASTX search of the GenBank nonredundant total 5 19,015 protein database (National Center for Biotechnology In- Clusters with . 1 contig 723 Contigs with . 1 hit to nonredundant and 58 formation [NCBI]) a total of 18,186 out of 23,472 leafy arabidopsis protein database spurge unique transcripts had at least one match, of which Contig ranges 14,205 were nonredundant at an expectation value (E-value) 201–500 nucleotides 870 cutoff of 1025 (Figure 2). Only 28 of the unique leafy spurge 501–1,000 nucleotides 6,357 1,001–1,500 nucleotides 2,739 sequences had matches to insect transcripts that likely 1,501–2,692 nucleotides 470 occurred from including gall tissue resulting from Spurgia Clusters with . 50 ESTs 5 esulae during the development of the normalized cDNA Highest number of ESTs in a cluster 79 library. Further BLAST searches, including arabidopsis pro-

196 N Weed Science 55, May–June 2007 Figure 1. Consensus nucleotide (nt) and predicted protein sequence (aa) for leafy spurge expressed sequence tag contigs CV03_9.sd.6.C1.Contig10398 and CV03_9.sd.6.C2.Contig10399. Translation start (ATG) and stop (TGA) codons are in bold text in each nucleotide consensus sequence. For protein sequences, methionine residues (M) are referenced in bold lettering. tein (TAIR, The Arabidopsis Information Resource), Rice BLAST matches to various species. As a result, only 801 protein (NCBI), soybean UniGene (NCBI), Medicago clusters and 2,149 singlets have no matches to other plant UniGene (NCBI), Poplar UniGene (NCBI), Medicago_TCs species, leading to a conclusion that there are 2,950 sequences (TIGR, The Institute for Genomic Research), and Pop- (15 to 16% of the total 19,015 unigenes) that, to our lar_TCs (TIGR) databases, identified 4,463 leafy spurge knowledge, are reported for the first time in leafy spurge. transcripts with no matches to other species (potential leafy Although further verification is needed to support this spurge–specific transcripts). conclusion, the 2,950 leafy spurge unigenes identified for A BLASTN search of the 4,463 putative leafy spurge– the first time in this report are similar to the 2,859 sequences specific transcripts against the plant dbEST database (NCBI) that were found to be novel to rice (International Rice identified an additional 117 matches not found in other leafy Genome Sequence Project 2005). spurge databases (Anderson and Horvath 2001; Chao et al. On the basis of the identification of 23,472 unique 2006). After adjusting for matches identified in dbEST and transcripts from a total of 45,314 clean sequences (ESTs), our for 26 no-matches caused by low-complexity regions within results suggest an overall redundancy rate of 48%. Ideally, the sequences, an estimated total of 4,320 putative leafy ESTs generated from a total cDNA library should represent spurge–specific transcripts were identified that represent all the expressed genes in the tissue from which the library was 3,733 unigenes (1,476 clusters and 2,257 singlets). In some constructed. However, because the expression level of each cases, one or more of the sequences within these clusters had gene in a given tissue is different, it is difficult to capture rare

Anderson et al.: An EST database for leafy spurge N 197 develop our leafy spurge EST database not only reduced the redundancy rate but also increased the overall cost-effective- ness of the project. These conclusions are supported by the data shown in Table 2, in which only 5 clusters contained . 50 ESTs and the highest number of ESTs found in any one cluster was 79.

Functional Classification and Gene Ontology. On the basis of the two fully sequenced genomes of plants (arabidopsis and rice), BLASTX results indicated that 23,472 unique leafy spurge sequences had greater homology to arabidopsis (17,711 total matches and 11,509 nonredundant matches) than to rice (16,523 total matches and 10,166 nonredundant matches) (Figure 2). This is not surprising because leafy spurge and arabidopsis are both dicots and thus are more Figure 2. BLASTX total and nonredundant matches obtained using 23,472 unique closely related to each other than to rice, which is a monocot. leafy spurge sequences and known genes from arabidopsis (Arabidopsis-TAIR.aa), rice (rice.aa), and all known genes accessions in the GenBank nonredundant amino However, , 23% of the unique transcripts identified in leafy acid database (nr.aa). All matches represent a stringency of E 25. spurge produced no matches with arabidopsis. Of the 17,711 unique leafy spurge transcripts with BLASTX matches to arabidopsis (Figure 2), 56.6% are categorized as unclassified mRNAs from cDNA libraries. This problem also leads to proteins and 4.27% as classification not yet clear (Figure 3) redundant sequencing of some abundant clones, thereby on the basis of the Munich Information Center for Protein affecting the efficiency and cost effectiveness of the EST Sequences (MIPs) functional categorization (Mewes et al. approach (Bonaldo et al. 1996). 2002). For those genes that have been classified in arabidopsis To overcome the redundancy problem in large-scale cDNA (Schoof et al. 2002), our unique leafy spurge transcripts had sequencing projects, a cDNA library normalization approach the greatest matches with genes classified in categories of is used to generate uniform abundances of cDNA classes metabolism (6.92%) and localization (5.03%). Eight other within a library (Soares et al. 1994). This system has been classifications with matches ranging from , 2 to 3% included adopted successfully in other systems. For example, normal- transcription, protein synthesis, protein fate, protein binding ized cDNA libraries developed from different human tissues function, cellular transport, cellular communication/signal and organs have proven effective in representing rare and low- transduction, cell rescue/defense and virulence (including abundance mRNAs (Bonaldo et al. 1996). Recently, ESTs stress responses), and biogenesis of cellular components. Four generated from normalized cDNA libraries have provided categories had hits ranging between 1 and 2% and included comprehensive analyses of genes expressed in the model energy, cell cycle and DNA processing, cell fate, and dicotyledonous plant arabidopsis (Asamizu et al. 2000b), development. The remaining seven categories shown in trefoil (Lotus japonicus, Asamizu et al. 2000a), and wheat Figure 3 had hits with , 1%, with the lowest number of (Triticun aestivum, Ali et al. 2000). The strategy used to matches categorized as transposable elements.

Figure 3. Classification of leafy spurge sequences on the basis of Munich Information Center for Protein Sequences functional classification of arabidopsis. Each piece of the pie chart represents percentage of leafy spurge genes that had matches at a stringency of E 25.

198 N Weed Science 55, May–June 2007 Figure 4. Comparison of Munich Information Center for Protein Sequences functional classifications between leafy spurge unique sequences and genes of arabidopsis.

The results described above are reasonably proportional to cellular components, active in one or more biological processes, the functional categorization of genes from arabidopsis with during which it performs one or more molecular functions (see several exceptions (Figure 4). For example, transcripts catego- www.geneontology.org for detailed documentation). We used rized as ‘‘unclassified proteins’’ in leafy spurge were significantly BLAST results of our 23,472 unique leafy spurge sequences reduced compared with arabidopsis (11,862 vs. 19,650, against the arabidopsis protein database to infer gene ontology respectively). Leafy spurge transcripts categorized as ‘‘classifi- terms for leafy spurge sequences (see supplemental data 1, cation not yet clear-cut’’ and ‘‘transposable elements, viral and http://www.ars.usda.gov/SP2UserFiles/Place/54420510/supp_ plasmid proteins’’ were also reduced compared with arabidopsis data_1.xls). Because not all leafy spurge sequences had (Figure 4). Combined, these results suggest that although the homology to arabidopsis proteins and not all arabidopsis gene distribution of gene functions in our library is fairly similar to products are associated with a GO term yet, we do not have GO arabidopsis, it is likely that some gaps in gene representation terms or annotation for all leafy spurge unique sequences. Each exist because of the choice of tissues used for library of these GO annotated sequences can be associated with more construction. Also, the comparison of transcripts (ESTs) vs. than one category or more than one GO term. For example, genes identified by whole-genome sequencing projects could Contig7285 (CV03_9.6868.C2.Contig7285) can be described explain the disproportional number of arabidopsis genes (annotated) by the molecular function term ATP binding, the categorized as unclassified and transposable elements. This biological process terms apoptosis and defense response, and result could also reflect that very little genomics work has been the cellular component term endomembrane system. Thus, done on perennials, and many genes perhaps have not been a given sequence might be present on one or more of the GO studied in other model organisms. It will be interesting to see categories. Because GO annotation, curation, or both of the what proportion of genes from the poplar genome project, for reference databases change with time, the GO annotation of example, will be placed in these categories. leafy spurge is also expected to change as new terms and genes Gene Ontology (GO) describes gene products in terms of are added. their associated biological processes, cellular components, and Of the 17,711 leafy spurge sequences with BLAST hits to molecular functions in a species-independent manner. A gene the arabidopsis protein database (Figure 2), 11,371 have GO product might be associated with or located in one or more terms associated with them. But almost half of them (5,427)

Anderson et al.: An EST database for leafy spurge N 199 fell into the unknown category. Different GO categories of the Release of the leafy spurge EST database to the public leafy spurge transcripts classified are given in Figure 5. Similar through NCBI (dbEST) has already generated interest from to MIPs functional categories, leafy spurge was proportional to outside groups. For example, leafy spurge clones coding for the GO of arabidopsis with the exception of ‘‘unknowns.’’ For casbene synthase are being used for the expression of full- example, , 27% of leafy spurge transcripts were listed under length gene products in engineered microbial hosts to study the GO term ‘‘biological process unknown’’ compared with the the production of diterpenoid precursors such as prostratin 48% listed under the same GO term for arabidopsis (data (12-deoxyphorbaol 13-acetate) and DPP (12-deoxyphorbol available at http://www.tigr.org/tigr-scripts/tgi/GO_browser. 13-phenylacetate) at the Keasling Lab at the University of pl?species+arab&gi_dir5agi). Similarly, 26 and 2% of leafy California–Berkeley. These studies hold the potential of spurge transcripts were listed under GO terms ‘‘cellular meeting the demand for a new HIV treatment. component unknown’’ and ‘‘molecular function unknown,’’ respectively (Figure 5), compared with the , 62 and 52% Ontology of Cross-Hybridizing Genes and Implications. listed under the same GO terms for arabidopsis. Again, as Leafy spurge is a member of the genetically diverse previously mentioned, these results likely reflect differences Euphorbiaceae family that includes important crop, horticul- associated with EST vs. whole-genome sequencing projects. tural, weedy, and endangered species (see Anderson et al. 2004). For example, cassava (Manihot esculenta Crantz) is one Analysis of Repetitive Elements. Excessive representation of of the most important human food crops in the world, repetitive elements in a library can bias data mining and is whereas Castor bean (Ricinus communis L.) is a major source indicative of a poor EST database. The leafy spurge unique of castor oil and has gained attention because of the escalating sequences were analyzed for known interspersed repeats with threat of bioterrorism resulting from the production of ricin. RepeatMasker (Smit et al. 1996–2004) and TIGR’s plant Other important members of the family include rubber tree repeat database as the custom library. Various repeat elements (Hevea brasiliensis Mu¨ll. Arg.); poinsettia (Poinsettia pulcher- identified and the lengths they occupy are given in the rima L.), which is a weed in some ecosystems; and endangered Table 3 (a complete description of the repeat elements is species such as Akoka (Chamaesyce spp.) and telephus spurge provided in supplemental data 2, http://www.ars.usda.gov/ (Euphorbia telephioides Chapm.). SP2UserFiles/Place/54420510/supp_data_2.xls). Only 1.7% The data we present in this report suggests that nearly 15 to of the total length of the unique sequences is represented in 16%, or 2,950 unigenes, identified from this leafy spurge EST these repetitive elements, suggesting that transposons and database have the potential to be classified as leafy spurge– other repetitive elements are not overrepresented in our specific. Future experiments with microarrays that include database. Because we used an existing repeat database and the these leafy spurge unigenes will determine their hybridization entire genome sequencing is not complete, the repeat content efficiency with other Euphorbiaceae family members. For of the leafy spurge genome might be a low estimation. example, it will be interesting to see how many of these genes have the potential to be classified as ‘‘leafy spurge–specific’’ or Current Benefits of the Leafy Spurge EST Database. As ‘‘novel to leafy spurge’’ by not hybridizing to expressed genes indicated above, our leafy spurge EST database contains in related Euphorbiaceae species such as cassava, castor bean, a multitude of transcripts spread across a range of functional poinsettia, and rubber tree. Additionally, because a significant classifications. Some of these ESTs have already provided understanding of the conservation and diversity of genes benefits to our understanding of growth and development in between members of the Euphorbiaceae family is lacking, it is leafy spurge. For example, because of our interest in the role currently difficult to design treatments or breeding programs that sugar sensing plays in bud dormancy (Anderson et al. to improve genetic stocks of desirable species or to develop 2005; Chao et al. 2006) and growth inhibition (Chao et al. methods to control the growth of undesirable species. Many 2006; Horvath et al. 2002), ESTs representing transcripts of these problems could be solved by the development of coding for sugar-metabolizing enzymes have enhanced our sequence databases and genomic-based research strategies for understanding of pathways regulating sugar flux and signaling members of this family. However, until these resources are (Anderson et al. 2005), whereas other ESTs have provided generated, it should be possible to use the resources developed further insights into cell cycle regulation (Sonju and Horvath for leafy spurge. 2005). Comparative genomic hybridization of DNA samples from Additionally important to the invasive nature of leafy other Euphorbia species will show the utility of the planned spurge is seed production and dormancy. Because seemingly microarrays for cross-species experiments and evolutionary eradicated patches of leafy spurge can re-establish from studies. An analysis of sequence similarity between . 9,000 dormant seeds located within the soil bank for up to 8 yr unigenes from cassava and the . 19,000 unigenes from leafy (Chao and Anderson 2004), an understanding of seed spurge indicates that . 50% of the cassava genes have dormancy could contribute to the control of leafy spurge functional orthologues in leafy spurge (data available at: infestations. Numerous genes in the leafy spurge EST database http://titan.biotec.uiuc.edu/cassava/). Additionally, hybridiza- with homologues to arabidopsis genes that are involved in tion of low-density leafy spurge microarrays with transcripts of germination and dormancy can be used as probes to enhance poinsettia (considered a weed in some native environments) our understanding of seed dormancy. The leafy spurge EST suggests that . 60% of the leafy spurge genes hybridize with database also contains a wealth of clones related to flower their counterparts in poinsettia and that the leafy spurge genes regulation, including FLOWERING LOCUS T-like (FT) could be used as probes or sources of primer sequences for and CONSTANS (CO), which interestingly have also been cloning the orthologous genes from poinsettia (data not linked to regulation of growth cessation, bud set, and bud shown). In comparison, , 47% of the cDNAs present on the dormancy in trees (Bo¨hlenius et al. 2006). Arabidopsis Functional Genomics Consortium arabidopsis

200 N Weed Science 55, May–June 2007 Figure 5. Characterization of leafy spurge expressed sequence tag database on the basis of Gene Ontology matches. Numbers in parentheses indicate the total number of unique sequences within biological process (A), cellular component (B), and molecular function (C). Bars show the number of sequences for each subcategory within A, B, and C.

Anderson et al.: An EST database for leafy spurge N 201 Table 3. Unique leafy spurge sequences with known interspersed repeats 3 SuperScriptTM Choice System, Invitrogen, Carlsbad, CA identified by RepeatMasker software and TIGR plant repeat database as the 92008. custom library. Various repeat elements identified and the lengths they occupy 4 are shown. pBS II SK(+) phagemid vector, Stratagene, 1834 State Highway 71 West, Cedar Creek, TX 78612. Number of sequences 23,472 5 Total length 18,242,199 base pairs (bp) DH10B cells, Invitrogen, Carlsbad, CA 92008. 6 Genetix Q-Pix robot, Genetix, Hampshire, UK. Classification No. of elements Length occupied (bp) 7 Qiagen 8000 and Qiagen 9600 robots, Qiagen Inc., 28159 Transposable elements Avenue Stanford, Valencia, CA 91355. Retrotransposons 178 26,384 8 ABI BigDye terminator chemistry and ABI 3730XL capillary Transposons 76 6,177 MITEsa 44 8,460 systems, Applied Biosystems, Foster City, CA 94404. Centromere-related Centromere-specific 0 retrotransposons Literature Cited Centromeric satellite 146Aharoni, A., L.C.P. Keizer, and H.J. Bouwmeester, et al. 2000. Identification of repeats the SAAT gene involved in strawberry flavor biogenesis by use of DNA Unclassified centromere 6 813 microarrays. Plant Cell 12:647–661. sequences Ali, S., B. Holloway, and W.C. Taylor. 2000. Normalization of cereal endosperm Telomere-related EST libraries for structural and functional genomic analysis. Plant Mol. Biol. Telomere sequences 2 79 Rep. 18:123–132. Telomere-associated 3 516 Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic Unclassified 12 2,797 local alignment search tool. J. Mol. Biol. 215:403–410. Anderson, J.V. and D.G. Davis. 2004. Abiotic stress alters transcript profiles and Total interspersed repeats 322 45,272 activity of enzymes involved in glutathione-metabolism in Euphorbia esula. Ribosomal RNA genes Physiol. Plant. 120:421–433. 45S rDNA 113 18,869 Anderson, J.V., M. Delseny, and M.A. Fregene, et al. 2004. An EST resource for 5S rDNA 66 10,323 cassava and other species of Euphorbiaceae. Plant Mol. Biol. 56:527–539. Anderson, J.V., R.W. Gesch, Y. Jia, W.S. Chao, and D.P. Horvath. 2005. Simple repeats 2,752 10,5400 Seasonal shifts in dormancy status, carbohydrate metabolism, and related gene Low complexity 3,038 129,130 expression in crown buds of leafy spurge. Plant Cell Environ. 28:1567–1578. a Anderson, J.V. and D.P. Horvath. 2001. Random sequencing of cDNAs and Abbreviation: MITE, miniature inverted-repeat transposable element. identification of mRNAs. Weed Sci. 49:590–597. cDNA microarrays hybridized significantly to total labeled Arabidopsis Genome Initiative, The. 2000. Analysis of the genome sequenceof the flowering plant Arabidopsis thaliana. Nature 408:796–815. cDNA from leafy spurge (Horvath et al. 2003). Asamizu, E., Y. Nakamura, S. Sato, and S. Tabata. 2000a. Generation of 7137 non-redundant expressed sequence tags from a legume, Lotus japonicus. DNA Future Uses. The main goal of developing the whole-plant Res. 7:127–130. leafy spurge EST database is to provide a resource for the future Asamizu, E., Y. Nakamura, S. Sato, and S. Tabata. 2000b. Large scale analysis of cDNA in Arabidopsis thaliana: Generation of 12,028 non-redundant expressed development of high-density microarrays. Although low- sequence tags from normalized and size-selected cDNA libraries. DNA Res. density leafy spurge microarrays have provided information 7(3):175–180. about known and novel signaling processes associated with Bangsund, D.A., F.L. Leistritz, and J.A. Leitch. 1999. Assessing economic well-defined phases of dormancy in leafy spurge (Horvath et al. impacts of biological control of weeds: The case of leafy spurge in northern 2005, 2006), they often do not provide a clear understanding of Great Plains of the United States. J. Environ. Manag. 56:35–43. Basu, C., M.D. Halfhill, T.C. Mueller, and C.N. Stewart. 2004. Weed genomics: physiological changes that are associated with genes that are New tools to understand weed biology. Trends Plant Sci. 9:391–398. coordinately regulated (Horvath et al. 2006). Microarrays Benson, D.A., I. Karsch-Mizrach, D.J. Lipman, J. Ostell, and D.L. Wheeler. developed with the use of the 19,015 leafy spurge unigene set 2005. GenBank: Update. Nucleic Acids Res. 33:D34–D38. described in this paper will represent the first known high- Boguski, M.S., T.M.J. Lowe, and C.M. Tolstoshev. 1993. dbEST-database for density microarrays for a perennial weed species. This EST ‘‘expressed sequence tags,’’ Nat. Genet. 4:332–333. Bo¨hlenius, H., T. Huang, L. Charbonnel-Campaa, A.M. Brunner, S. Jansson, database should serve as a source of clones and sequences for S.H. Strauss, and O. Nilsson. 2006. CO/FT regulatory module controls timing other researchers studying specific physiological responses in of flowering and seasonal growth cessation in trees. Science 312:1040–1043. leafy spurge and related species. Although the EST database and Bonaldo, M.F., G. Lennon, and M.B. Soares. 1996. Normalization and microarrays will provide greatly needed tools for studying subtraction: Two approaches to facilitate gene discovery. Genome Res. various aspects of weed biology, the development of resources 6:791–806. Cadman, C.S.C., P.E. Toorop, H.W.M. Hilhorst, and W.E. Finch-Savage. 2006. for identifying regulatory sequences is still needed. Thus, the profiles of arabidopsis Cvi seeds during dormancy cycling next step toward a functional genomics characterization of leafy indicate a common underlying dormancy control mechanism. Plant J. spurge should include the development of a high-density 46:805–822. Bacterial Artificial Chromosome library from which regulatory Chang, S., J. Puryer, and J. Cairney. 1993. A simple and efficient method for isolating RNA from pine trees. Plant Mol. Biol. Rep. 11:113–116. sequences from important genes can be readily obtained. Chao, W.S. and J.V. Anderson. 2004. Euphorbia esula. In: Crop Protection However, care should be taken when constructing this library Compendium, 2004 ed. Wallingford, UK: CAB International. because it could also serve as the backbone of any future Chao, W.S., D.P. Horvath, J.V. Anderson, and M.P. Foley. 2005. Potential genome sequencing project. model weeds to study genomics, ecology, and physiology in the 21st century. Weed Science 53:929–937. Chao, W.S., M.D. Serpe, J.V. Anderson, R.W. Gesch, and D.P. Horvath. 2006. Sources of Materials Sugars, hormones, and environment affect the dormancy status in un- derground adventitious buds of leafy spurge (Euphorbia esula L.). Weed Sci. 1 Model 1-68, Percival Scientific Inc., Boone, IA 50036. 54:59–68. 2 Coupland, R.T., G.W. Selleck, and J.F. Alex. 1955. Distribution of vegetative Oligotex Direct mRNA kit, Qiagen, 28159 Avenue Stanford, buds on the underground parts of leafy spurge (Euphorbia esula L.). Can. J. Valencia, CA 91355-1106. Agric. Sci. 35:161–167.

202 N Weed Science 55, May–June 2007 Ewing, B. and P. Green. 1998. Base-calling of automated sequencer traces using Mewes, H.W., D. Frishman, and U. Gu¨ldener, et al. 2002. MIPS: A database for phred. II. Error probabilities. Genome Res. 8:186–194. genomes and protein sequences. Nucleic Acids Res. 30:31–34. Ewing, B., L. Hillier, M.C. Wendl, and P. Green. 1998. Base-calling of Ohlrogge, J. and C. Benning. 2000. Unravelling plant metabolism by EST automated sequencer traces using phred. I. Accuracy assessment. Genome Res. analysis. Curr. Opin. Plant Biol. 3:224–228. 8:175–185. Oztur, Z.N., V. Talame, M. Deyholos, C.B. Michalowski, D.W. Galbraith, N. Garcia-Hernandez, M., T.Z. Berardini, and G. Chen, et al. 2002. TAIR: A Gozukirmizi, R. Tuberosa, and H.J. Bohnert. 2002. Monitoring large-scale resource for integrated arabidopsis data. Funct. Integr. Genomics 2:239–253. changes in transcript abundance in drought- and salt-stressed barley. Plant Gene Ontology Consortium. 2004. The gene ontology (GO) database and Mol. Biol. 48:551–573. informatics resource. Nucleic Acids Res. 32:D258–D261. Potokina, E., N. Sreenivasulu, L. Altschmied, W. Michalek, and A. Graner. 2002. Gutierrez, R.A., R.M. Ewing, J.M. Cherry, and P.J. Green. 2002. Identification Differential gene expression during seed germination in barley (Hordeum of unstable transcripts in arabidopsis by cDNA microarray analysis: Rapid vulgare L.). Funct. Integr. Genomics 2:28–39. decay is associated with a group of touch and specific clock-controlled genes. Quackenbush, R.C., J. Cho, and D. Lee, et al. 2001. The TIGR Gene Indices: Proc. Natl. Acad. Sci. USA. 99(17):11,513–11,518. Analysis of gene transcript sequences in highly sampled eukaryotic species. Horvath, D.P., J.V. Anderson, M. Soto, and W.S. Chao. 2006. Transcriptome Nucleic Acids Res. 29:159–164. analysis of leafy spurge (Euphorbia esula L.) crown buds during shifts in well- Reymond, P., H. Weber, M. Damond, and E.E. Farmer. 2000. Differential gene defined phases of dormancy. Weed Sci. 54:821–827. expression in response to mechanical wounding and insect feeding in Horvath, D.P., W.S. Chao, and J.V. Anderson. 2002. Molecular analysis of arabidopsis. Plant Cell 12:707–719. signals controlling dormancy and growth in underground adventitious budsof Richmond, T. and S. Somerville. 2000. Chasing the dream: Plant EST leafy spurge (Euphorbia esula L.). Plant Physiol. 128:1439–1446. microarrays. Curr. Opin. Plant Biol. 3:108–116. Horvath, D.P., R. Schaffer, M. West, and E. Wisman. 2003. Arabidopsis Schena, M., D. Shalon, R.W. Davis, and P.O. Brown. 1995. Quantitative microarrays identify conserved and differentially-expressed genes involved in monitoring of gene expression patterns with a complementary DNA shoot growth and development from distantly related plant species. Plant J. microarray. Science 270:467–470. 34:125–134. Schoof, H., P. Zaccaria, H. Gundlach, K. Lemcke, S. Rudd, G. Kolesov, R. Horvath, D.P., M. Soto, Y. Jia, W.S. Chao, and J.V. Anderson. 2005. Arnold, H.W. Mewes, and K.F.X. Mayer. 2002. MIPS Arabidopsis thaliana Transcriptome analysis of paradormancy release in root buds of leafy spurge database (MAtDB): An integrated biological knowledge resource based on the (Euphorbia esula). Weed Sci. 53:795–801. first complete plant genome. Nucleic Acids Res. 30:91–93. Huang, H. and A. Madan. 1999. CAP3: A DNA sequence assembly program. Genome Res. 9:868–877. Schrader, J., R. Moyle, R. Bhalerao, M. Hertzberg, J. Lundeberg, P. Nilsson, and International Rice Genome Sequencing Project. 2005. The map-based sequence R.P. Bhalerao. 2004. Cambial meristem dormancy in trees involves extensive of the rice genome. Nature 436:793–800. remodeling of the transcriptome. Plant J. 40:173–187. Kloosterman, B., O. Vorst, R.D. Hall, R.G.F. Visser, and C.W. Bachem. 2005. Smit, A.F., R. Hubley, and P. Green. 1996–2004. RepeatMasker Open-3.0. Tuber on a chip: Differential gene expression during potato tuber http://www.repeatmasker.org. development. Plant Biotech. J. 3:505–519. Soares, M.B., M.F. Bonaldo, P. Jelene, L. Su, L. Lawton, and F. Efstratiadis. Kumar, C.G., R. LeDuc, G. Gong, L. Roinishivili, H.A. Lewin, and L. Liu. 1994. Construction and characterization of a normalized cDNA library. Proc. 2004. ESTIMA, a tool for EST management in a multi-project environment. Natl. Acad. Sci. USA. 91:9228–9232. BMC Bioinf. 5:176. Sonju, R. and D.P. Horvath. 2005. Cloning and expression of Krp genes from Lee, J.M., M.E. Williams, S.V. Tingey, and J.A. Rafalski. 2002. DNA array adventitious buds of the perennial weed leafy spurge. Page 31 in 2005 Midwest profiling of gene expression changes during maize embryo development. American Society of Plant Biology Sectional Meeting. Donald Danforth Plant Funct. Integr. Genomics 2:13–27. Science Center, St. Louis, MO. July 18–19. [Abstract]. Leitch, J.A., F.L. Leistritz, and D.A. Bangsund. 1996. Economic effect of leafy van Hal, N.L.W., O. Vorst, A.M.M.L. van Houwelingen, E.J. Kok, A. spurge in the upper Great Plains: Methods, models, and results. Impact Assess. Peijnenburg, A. Aharoni, A.J. van Tunen, and J. Keijer. 2000. The application 14:419–433. of microarrays in gene expression analysis. J. Biotech. 78:271–280. Liu, L., L. Roinishvili, X. Pan, Z. Liu, and C. Kumar. 2000. GPMS A Web based Wheeler, D.L., D.M. Church, and R. Edgar, et al. 2004. Database resources of Genome Project Management System. Pages 62–67 in Proceeding of the 4th the National Center for Biotechnology Information: Update. Nucleic Acids World Multi-conference on Systematics, Cybernectics, and Informatics Res. 32:D35–D40. SCI2000. Zhu, T., P. Budworth, and W. Chen, et al. 2003. Transcriptional control of Marshall, E. 2004. Getting the noise out of gene arrays. Science 306:630–631. nutrient partitioning during rice grain filling. Plant Biotech. J. 1:59–70. McGinnis, S. and T.L. Madden. 2004. BLAST: At the core of a powerful and diverse set of sequence analysis tools. Nucleic Acid Res. 32:W20–W25. Received August 16, 2006, and approved December 22, 2006.

Anderson et al.: An EST database for leafy spurge N 203