US 2006O195944A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2006/0195944 A1 Heard et al. (43) Pub. Date: Aug. 31, 2006

(54) TRANSCRIPTION FACTOR SEQUENCES (60) Provisional application No. 60/713.952, filed on Aug. FOR CONFERRING ADVANTAGEOUS 31, 2005. Provisional application No. 60/336,049, PROPERTIES TO filed on Nov. 19, 2001. Provisional application No. 60/310,847, filed on Aug. 9, 2001. Provisional appli (75) Inventors: Jacqueline E. Heard, Stonington, CT cation No. 60/338,692, filed on Dec. 11, 2001. Pro (US); Jose Luis Riechmann, Pasadena, visional application No. 60/465,809, filed on Apr. 24. CA (US); Oliver Ratcliffe, Oakland, 2003. Provisional application No. 60/434,166, filed CA (US); Omaira Pineda, Vero Beach, on Dec. 17, 2002. Provisional application No. 60/411, FL (US) 837, filed on Sep. 18, 2002. Correspondence Address: Mendel Biotechnology, Inc. Publication Classification 21375 Cabot Blvd. Hayward, CA 94545 (US) (51) Int. Cl. (73) Assignee: Mendel Biotechnology, Inc., Hayward, AOIH I/00 (2006.01) CA CI2N 5/82 (2006.01) CI2N 5/04 (2006.01) (21) Appl. No.: 11/375,241 (52) U.S. Cl...... 800/287: 435/419, 435/468 (22) Filed: Mar. 13, 2006 (57) ABSTRACT Related U.S. Application Data (63) Continuation-in-part of application No. 10/225,067, The invention relates to transcription factor polypep filed on Aug. 9, 2002, and which is a continuation tides, polynucleotides that encode them, homologs from a in-part of application No. 09/837,944, filed on Apr. variety of plant species, and methods of using the polynucle 18, 2001, now abandoned, and which is a continua otides and polypeptides to produce transgenic plants having tion-in-part of application No. 10/171.468, filed on advantageous properties compared to a reference or control Jun. 14, 2002, now abandoned. plant, including increased plant size, seed size, increased Continuation-in-part of application No. 10/714,887, leaf size, lignification, water deprivation tolerance, cold filed on Nov. 13, 2003. tolerance, or altered flowering time. Sequence information Continuation-in-part of application No. 10/666,642, related to these polynucleotides and polypeptides can also be filed on Sep. 18, 2003. used in bioinformatic search methods and is also disclosed.

Patent Application Publication Aug. 31, 2006 Sheet 1 of 9 US 2006/0195944 A1

Fagales Cucurbitales Rosales Fabales Oxalidales Malpighiales Sapindales Malvales Brassicales Myrtales GeranialesDipSacales Asterales Apiales Aquifoliales Solanales Lamiales Gentianales Garryales Ericales Cornales Saxifragales Santalales Caryophyllales Proteales Ranunculales Zingiberales Commelinales Poales Arecales Pandanales Dioscoreales Asparagales Alismatales AcoralesPiperales Magnoliales Laurales Ceratophyllales FIG. 1 Patent Application Publication Aug. 31, 2006 Sheet 2 of 9 US 2006/0195944 A1 Patent Application Publication Aug. 31, 2006 Sheet 3 of 9 US 2006/0195944 A1

64 G3652 O. Satiya 95 G3653 O. sativa 97. G3655 O. sativa 99. G3654 O. sativa G2576 A. thaliana 98. G872 A. thaliana G3657 O. sativa 100. G2115 A. thaliana 80G2294 A. thaliana G1090 A. thaliana 86 CBF4A. thaliana 100 CBF2 A. thaliana 89 CBF3A. thaliana 50 CBF1 A. thaliana 80 G3644 O. Satiya 100 G3651 O. Sativa G3650 Z. mavs G3649 O. Satiya 93 G3643 G. max G3647 Z. elegans 56 98 G2133 A. thaliana G3646 B. oleracea 99 G3645 B. rapa 98 G47 A. thaliana G3656 Z. mays 97 -G12 A. thaliana 99 G24 A. thaliana 93. -G1277 A. thaliana 99. G1379 A. thaliana G867 A. thaliana FIG. 3

Patent Application Publication Aug. 31, 2006 Sheet 5 of 9 US 2006/0195944 A1

G47 (SEO ID NO: 66) and G2133 (SEO ID NO: 152) conserved domains G47 : QSKYKGIRRRKWGKWVSEIRVPGTRDRLWLGSFSTAEGAAVAHDVA QSKYKGIRRRKWGKWVSEIRVPGTR RLWLGSFSTAEGAAVAHDVA G21.33: QSKYKGIRRRKWGKWVSEIRVPGTRORLWLGSFSTAEGAAVAHDVA

G47 : FFCLHOPDSL--ESLNFPHLLNPSLV F-CLH-P SIL ES NFPHILL SIL G2133 : FYCLHRPSSLDDESFNFPHILLTTSILA

G47 (SEO ID NO: 66) and G3643 (SEQ ID NO: 158) conserved domains G47 : SKYKGIRRRKWGKWVSEIRVPGTRDRLWLGSFSTAEGAAVAHDVAF --K KG-RRRKWGKWVSEIRVPGT----RLWLG------T E AAVAHDVA G3643 : NKLKGVRRRKWGKWVSEIRVPGTOERLWLGTYATPEAAAVAHDVAV

G47 : FCLHOPDSLESLNFPHLLN +CL +P SL-- INFP T-- G3643 : YCLSRPSSLDKLNFPETLS

G47 (SEQ ID NO 66) and G3649 (SEO ID NO: 154) conserved domains G47 : KYKGIRRRKWGKWVSEIRVPGTRDRIWGSFSTAEGAAVAHDVA --Y--G--RRR- WGKWVSEIRVPGTR-RLWLGS----TAE AAVAHD A G3649 : RYRGVRRRRWGKWVSEIRVPGTRERLWLGSYATAEAAAVAHDAA

G47 (SEO ID NO: 6) and G3644 (SEO ID NO: 156) conserved domains G47 : KYKGIRRRKWGKWVSEIRVPGTRDRLWLGSFSTAEGAAVAHDVA +Y+G+RRR--WGKWVSEIRVPGTR-RLWLGS----T E AAVAHD A G3644 : RYRGVRRRRWGKWVSEIRVPGTRERLWLGSYATPEAAAVAHDTA

G47: FFCL + Lu G3644 : VYFL FIG. 5 Patent Application Publication Aug. 31, 2006 Sheet 6 of 9 US 2006/0195944 A1

Patent Application Publication Aug. 31, 2006 Sheet 7 of 9 US 2006/0195944 A1

FIG. 7B Patent Application Publication Aug. 31, 2006 Sheet 8 of 9 US 2006/0195944 A1

Patent Application Publication Aug. 31, 2006 Sheet 9 of 9 US 2006/0195944 A1

FIG. 9 US 2006/0195944 A1 Aug. 31, 2006

TRANSCRIPTION FACTOR SEQUENCES FOR result in a transgenic plant having the desired phenotype CONFERRING ADVANTAGEOUS PROPERTIES TO with the enhanced agronomic trait. Therefore, methods to PLANTS select individual transgenic events from a population may be 0001) This application claims the benefit of U.S. Provi required to identify those transgenic events that are charac sional Application No. 60/713,952, filed Aug. 31, 2005 terized by the enhanced agronomic trait. (pending); and this application is a continuation-in-part of 0004) Other aspects and embodiments of the invention prior U.S. application Ser. No. 10/225,067, filed Aug. 9. are described below and can be derived from the teachings 2002 (pending) which claims the benefit of U.S. Provisional of this disclosure as a whole. Application No. 60/336,049, filed Nov. 19, 2001 (now abandoned), U.S. Provisional Application No. 60/310,847, BACKGROUND OF THE INVENTION filed Aug. 9, 2001 (now abandoned) and U.S. Provisional Application No. 60/338,692, filed Dec. 11, 2001 (now 0005 Transcription factors can modulate gene expres abandoned); and, prior U.S. application Ser. No. 10/225,067, Sion, either increasing or decreasing (inducing or repressing) filed Aug. 9, 2002 (pending) is a continuation-in-part of U.S. the rate of transcription. This modulation results in differ Non-provisional application Ser. No. 09/837,944, filed Apr. ential levels of gene expression at various developmental 18, 2001 (now abandoned), and U.S. Non-provisional appli stages, in different tissues and cell types, and in response to cation Ser. No. 10/171.468, filed Jun. 14, 2002 (now aban different exogenous (e.g., environmental) and endogenous doned); and, this application is a continuation-in-part of stimuli throughout the life cycle of the organism. prior U.S. application Ser. No. 10/714,887, filed Nov. 13, 0006 Because transcription factors are key controlling 2003 (pending); and, this application is a continuation-in elements of biological pathways, altering the expression part of prior U.S. application Ser. No. 10/666,642, filed Sep. levels of one or more transcription factors can change entire 18, 2003 (pending) which claims the benefit of U.S. Provi biological pathways in an organism. For example, manipu sional Application No. 60/465,809, filed Apr. 24, 2003 (now lation of the levels of selected transcription factors may abandoned), U.S. Provisional Application No. 60/434,166, result in increased expression of economically useful pro filed Dec. 17, 2002 (now abandoned) and U.S. Provisional teins or metabolic chemicals in plants or to improve other Application No. 60/411,837, filed Sep. 18, 2002. Each of agriculturally relevant characteristics. Conversely, blocked these applications is hereby incorporated by reference in or reduced expression of a transcription factor may reduce their entirety. biosynthesis of unwanted compounds or remove an unde sirable trait. Therefore, manipulating transcription factor FIELD OF THE INVENTION levels in a plant offers tremendous potential in agricultural 0002 This invention relates to the field of plant biology. biotechnology for modifying a plant's traits. More particularly, the present invention pertains to compo 0007. The present invention provides novel transcription sitions and methods for phenotypically modifying a plant. factors useful for modifying a plant's phenotype in desirable ways. INTRODUCTION 0003 Transgenic plants with improved traits, including SUMMARY OF THE INVENTION enhanced yield, environmental stress tolerance, pest resis 0008. The present invention pertains to transgenic plants, tance, herbicide tolerance, improved seed compositions, and and methods for producing the transgenic plant, that have the like are desired by both farmers and consumers. desirable characteristics relative to wild-type or control Although considerable efforts in plant breeding have pro plants. The desirable characteristics in the transgenic plants, vided significant gains in desired traits, the ability to intro which have been transformed with a sequence that is closely duce specific DNA into plant genomes provides further or phylogenetically related to G47, polynucleotide SEQ ID opportunities for generation of plants with improved and/or NO: 65 and polypeptide SEQ ID NO: 66, include increased unique traits. Fortunately, a plant's traits, such as its bio size and/or biomass, tolerance to osmotic stress or drought, chemical, developmental, or phenotypic characteristics, may and/or increased lignification. The transgenic plants may be controlled through a number of cellular processes. One also be delayed in their flowering, relative to a control or important way to manipulate that control is through tran wild-type plant of the same species. The transgenic plants Scription factors—proteins that influence the expression of a are made by first producing an expression vector that particular gene or sets of genes. Transformed and transgenic comprises a nucleotide sequence encoding a polypeptide plants that comprise cells having altered levels of at least one with a conserved domain, said domain having at least 69%, selected transcription factor, for example, possess advanta or at least 73%, or at least 80%, or at least 87% amino acid geous or desirable traits. Strategies for manipulating traits identity to the conserved domain of G47 (amino acid coor by altering a plant cells transcription factor content can dinates 11-80 of G47 or SEQ ID NO: 66). The expression therefore result in plants and crops with commercially vector is next introduced into a suitable target plant, and the valuable properties. Polynucleotides encoding transcription polypeptide is overexpressed in this now transgenic plant. factors have been identified, transformed into transgenic plants, and the plants have been analyzed for a variety of This results in the transgenic plant having increased size important improved traits. In so doing, important polynucle and/or biomass, tolerance to the osmotic stress or drought, otide and polypeptide sequences for producing commer delayed flowering, and/or increased lignification. cially valuable plants and crops as well as the methods for 0009 Methods for increasing plant size and/or biomass, making and using them were identified. In some cases, increasing osmotic stress or drought tolerance of a plant, because of epigenetic effects, positional effects, or the like, increasing lignin content, or causing a delay in development introducing recombinant DNA into a plant genome does not or flowering are also encompassed by the invention. US 2006/0195944 A1 Aug. 31, 2006

BRIEF DESCRIPTION OF THE SEQUENCE 0017 FIG. 4 shows a Clustal W alignment of the AP2 LISTING, TABLES, AND FIGURES domains of the G47 clade and other representative AP2 0010. The Sequence Listing provides exemplary poly proteins. The three residues indicated by the boxes define the nucleotide and polypeptide sequences of the invention. The G47 clade; clade members (indicated by the vertical line at traits associated with the use of the sequences are included left) have a two valines and a histidine residue at these in the Examples. positions, respectively. 0011. This application contains a Sequence Listing. CD 0018 FIG. 5 shows the conserved domain of G47 (SEQ ROMs Copy 1 and Copy 2, and the CRF copy of the ID NO: 66) aligned against the conserved domains of Sequence Listing under CFR Section 1.821 (e), are read Arabidopsis paralog sequence G2133 (SEQID NO: 152; 62 only memory computer-readable compact discs. Each con of 71 or 87% identical residues) and three orthologs, soy tains a copy of the Sequence Listing in ASCII text format. G3643 (SEQ ID NO: 158; 45 of 65 or 69% of residues are The Sequence Listing is named “MBI-0036-3CIPST25.txt”, identical), rice G3649 (SEQ ID NO: 154; 35 of 44 or 80% the electronic file of the Sequence Listing contained on each of residues identical) and rice G3644 (SEQID NO: 156; 35 of these CD-ROMs was created on Mar. 13, 2006, and is 516 of 48 or 73% of residues identical). Alignments and per kilobytes in size. The copies of the Sequence Listing on the centage identity were determined from BLASTp analysis in CD-ROM discs are hereby incorporated by reference in their which the conserved domain of G47, amino acid coordinates entirety. 11-80, were queried against a database containing the G47 homologs, with default settings of a wordlength (W) of 3, an 0012 FIG. 1 shows a phylogenic tree of related plant expectation (E) of 10, and the BLOSUM62 scoring matrix families adapted from Daly et al. (2001 Plant Physiology (see Henikoff & Henikoff (1989) supra). 127: 1328-1333). 0.019 FIGS. 6A-6C show Arabidopsis G47, SEQ ID 0013 FIG. 2 shows a phylogenic dendogram depicting NO: 66 (FIG. 6A, plant at left), soy G3649, SEQ ID NO: phylogenetic relationships of higher plant taxa, including 154 (FIG. 6B, plants at left and center), and rice G3643, clades containing tomato and Arabidopsis; adapted from Ku SEQ ID NO: 158 (FIG. 6C, plants at left and center) et al. (2000) Proc. Natl. Acad. Sci. USA 97: 9121-9126; and overexpressors at 58, 44, and 33 days after planting, respec Chase et al. (1993) Ann. Missouri Bot. Gard. 80: 528-580. tively. The overexpressors generally developed later, and 0014 FIG. 3 shows a phylogenetic tree and multiple Some lines had larger rosettes and an increased amount of sequence alignments of G47 and related full length proteins vegetative tissue compared to the control plants at the right were constructed using ClustalW (CLUSTAL W Multiple of each photograph. Sequence Alignment Program version 1.83, 2003) and 0020 FIGS. 7A-7B compare seedlings ectopically MEGA2 (http://www.megaSoftware.net) software. expressing rice sequence G3644, SEQ ID NO: 156 (FIG. Sequences closely related to G47, SEQ ID NO: 66, fall 7A) and wild-type seedling controls. The 35S::G3644 seed within the G47 clade and descend from a common ancestral lings (FIG. 7A) were generally larger and greener after sequence represented by the arrow at an ancestral node of germination in a 150 mM NaCl than the wild-type control the tree. These phylogenetically-related sequences within seedlings exposed to the same conditions (FIG. 7B). The the G47 clade that have thus far been shown to have a small pale seedlings in FIG. 7A represent wild-type seg transcriptional regulatory activity of G47 by conferring regants, based on kanamycin resistance segregation data similar morphological and physiological characteristics from the same population. have conserved domains that are at least 69% identical to the conserved domain of G47 (amino acid coordinates 11-80). 0021. As shown in FIGS. 8A-8B, seedlings ectopically The percentage identity was determined by BLASTp analy expressing rice sequence G3649, SEQ ID NO: 154 (FIG. sis against a database containing G47 homologs, with 8A) were generally larger and greener after germination in default settings of a wordlength (W) of 3, an expectation (E) a medium containing 0.3 uMabscisic acid than the wild-type of 10, and the BLOSUM62 scoring matrix (see Henikoff & control seedlings exposed to the same conditions (FIG. 8B). Henikoff (1989) Proc. Natl. Acad. Sci. USA 89: 10915 0022 FIG. 9 illustrates a dramatic example of osmotic 10919). ClustalW multiple alignment parameters for FIG. 3 stress tolerance. Seedlings overexpressing Arabidopsis were as follows: G2133, SEQ ID NO: 152, in the pot at the left were 0.015 Gap Opening Penalty: 10.00; Gap Extension significantly greener and more vigorous than the wild-type Penalty: 0.20; Delay divergent sequences: 30%; DNA control seedlings, seen at right, after both sets of plants had Transitions Weight: 0.50; Protein weight matrix: Gon been exposed to the same severe drought conditions and net series; DNA weight matrix: IUB: Use negative rewatered. The overexpressors readily recovered from the matrix: OFF. severe treatment after resumption of watering, whereas the few control plants at right that survived had been severely 0016 A FastA formatted alignment was then used to and adversely affected by the drought treatment. generate a phylogenetic tree in MEGA2 using the neighbor joining algorithm and a p-distance model. A DETAILED DESCRIPTION OF EXEMPLARY test of phylogeny was done via bootstrap with 1000 EMBODIMENTS replications and Random Seed set to default. Cut off values of the bootstrap tree were set to 50%. Orthologs 0023 The present invention relates to polynucleotides of G47 are considered as being those proteins within and polypeptides for modifying phenotypes of plants, par the node of the tree below with a bootstrap value of 93, ticularly those associated with increased biomass, increased bounded by G3644 and G47, as indicated by the disease resistance, and/or abiotic stress tolerance. Through sequences within the box. out this disclosure, various information sources are referred US 2006/0195944 A1 Aug. 31, 2006 to and/or are specifically incorporated. The information encodes a transcription factor polypeptide, which may be Sources include Scientific journal articles, patent documents, functional or require processing to function as an initiator of textbooks, and World Wide Web browser-inactive page transcription. addresses. While the reference to these information sources 0028 Operationally, genes may be defined by the cis clearly indicates that they can be used by one of skill in the trans test, a genetic test that determines whether two muta art, each and every one of the information sources cited tions occur in the same gene and that may be used to herein are specifically incorporated in their entirety, whether determine the limits of the genetically active unit (Rieger et or not a specific mention of “incorporation by reference' is al. (1976)). A gene generally includes regions preceding noted. The contents and teachings of each and every one of (“leaders'; upstream) and following (“trailers’: down the information Sources can be relied on and used to make stream) the coding region. A gene may also include inter and use embodiments of the invention. vening, non-coding sequences, referred to as “introns'. 0024. As used herein and in the appended claims, the located between individual coding segments, referred to as singular forms “a”, “an', and “the include the plural “exons'. Most genes have an associated promoter region, a reference unless the context clearly dictates otherwise. Thus, regulatory sequence 5' of the transcription initiation codon for example, a reference to “a host cell includes a plurality (there are some genes that do not have an identifiable of such host cells, and a reference to “a stress” is a reference promoter). The function of a gene may also be regulated by to one or more stresses and equivalents thereof known to enhancers, operators, and other regulatory elements. those skilled in the art, and so forth. 0029. A “recombinant polynucleotide' is a polynucle otide that is not in its native state, e.g., the polynucleotide Definitions comprises a nucleotide sequence not found in nature, or the 0.025 "Nucleic acid molecule” refers to an oligonucle polynucleotide is in a context other than that in which it is otide, polynucleotide or any fragment thereof. It may be naturally found, e.g., separated from nucleotide sequences DNA or RNA of genomic or synthetic origin, double with which it typically is in proximity in nature, or adjacent Stranded or single-stranded, and combined with carbohy (or contiguous with) nucleotide sequences with which it drate, lipids, protein, or other materials to perform a par typically is not in proximity. For example, the sequence at ticular activity Such as transformation or form a useful issue can be cloned into a vector, or otherwise recombined composition Such as a peptide nucleic acid (PNA). with one or more additional nucleic acid. 0026 “Polynucleotide' is a nucleic acid molecule com 0030. An "isolated polynucleotide' is a polynucleotide, prising a plurality of polymerized nucleotides, e.g., at least whether naturally occurring or recombinant, that is present about 15 consecutive polymerized nucleotides. A polynucle outside the cell in which it is typically found in nature, otide may be a nucleic acid, oligonucleotide, nucleotide, or whether purified or not. Optionally, an isolated polynucle any fragment thereof. In many instances, a polynucleotide otide is subject to one or more enrichment or purification comprises a nucleotide sequence encoding a polypeptide (or procedures, e.g., cell lysis, extraction, centrifugation, pre protein) or a domain or fragment thereof. Additionally, the cipitation, or the like. polynucleotide may comprise a promoter, an intron, an 0031. A “polypeptide' is an amino acid sequence com enhancer region, a polyadenylation site, a translation initia prising a plurality of consecutive polymerized amino acid tion site, 5' or 3' untranslated regions, a reporter gene, a residues e.g., at least about 15 consecutive polymerized selectable marker, or the like. The polynucleotide can be amino acid residues. In many instances, a polypeptide single-stranded or double-stranded DNA or RNA. The poly comprises a polymerized amino acid residue sequence that nucleotide optionally comprises modified bases or a modi is a transcription factor or a domain or portion or fragment fied backbone. The polynucleotide can be, e.g., genomic thereof. Additionally, the polypeptide may comprise: (i) a DNA or RNA, a transcript (such as an mRNA), a cDNA, a localization domain; (ii) an activation domain; (iii) a repres PCR product, a cloned DNA, a synthetic DNA or RNA, or sion domain; (iv) an oligomerization domain; (v) a DNA the like. The polynucleotide can be combined with carbo binding domain; or the like. The polypeptide optionally hydrate, lipids, protein, or other materials to perform a comprises modified amino acid residues, naturally occurring particular activity Such as transformation or form a useful amino acid residues not encoded by a codon, non-naturally composition such as a peptide nucleic acid (PNA). The occurring amino acid residues. polynucleotide can comprise a sequence in either sense or antisense orientations. “Oligonucleotide' is substantially 0032 “Protein’ refers to an amino acid sequence, oli equivalent to the terms amplimer, primer, oligomer, element, gopeptide, peptide, polypeptide or portions thereof whether target, and probe and is preferably single-stranded. naturally occurring or synthetic. 0027 “Gene' or “gene sequence” refers to the partial or 0033 “Portion', as used herein, refers to any part of a complete coding sequence of a gene, its complement, and its protein used for any purpose, but especially for the screening 5' or 3' untranslated regions. A gene is also a functional unit of a library of molecules which specifically bind to that of inheritance, and in physical terms is a particular segment portion or for the production of antibodies. or sequence of nucleotides along a molecule of DNA (or 0034. A “recombinant polypeptide' is a polypeptide pro RNA, in the case of RNA viruses) involved in producing a duced by translation of a recombinant polynucleotide. A polypeptide chain. The latter may be subjected to Subsequent “synthetic polypeptide' is a polypeptide created by consecu processing Such as chemical modification or folding to tive polymerization of isolated amino acid residues using obtain a functional protein or polypeptide. A gene may be methods well known in the art. An "isolated polypeptide.” isolated, partially isolated, or found with an organism’s whether a naturally occurring or a recombinant polypeptide, genome. By way of example, a transcription factor gene is more enriched in (or out of) a cell than the polypeptide in US 2006/0195944 A1 Aug. 31, 2006

its natural state in a wild-type cell, e.g., more than about 5% erably at least nine base pairs (bp) in length. A conserved enriched, more than about 10% enriched, or more than about domain (for example, a DNA binding domain) with respect 20%, or more than about 50%, or more, enriched, i.e., to presently disclosed polypeptides refers to a domain that alternatively denoted: 105%, 110%, 120%, 150% or more, exhibits at least about 38% sequence identity, or at least enriched relative to wild type standardized at 100%. Such an about 55% sequence identity, or at least about 62% sequence enrichment is not the result of a natural response of a identity, or at least about 69%, or at least about 70%, or at wild-type plant. Alternatively, or additionally, the isolated least about 73%, or at least about 76%, or at least about 78%, polypeptide is separated from other cellular components or at least about 80%, or at least about 82%, or at least about with which it is typically associated, e.g., by any of the 85%, or at least about 87%, or at least about 89%, or at least various protein purification methods herein. about 95%, amino acid residue sequence identity, to a conserved domain of a polypeptide of the invention. 0035) “Homology” refers to sequence similarity between Sequences that possess or encode for conserved domains a reference sequence and at least a fragment of a newly that meet these criteria of percentage identity, and may have sequenced clone insert or its encoded amino acid sequence. comparable biological activity to the present transcription 0036) “Identity” or “similarity” refers to sequence simi factor sequences. This is particularly true for sequences that larity between two polynucleotide sequences or between two derive from a common ancestral sequence that had the same polypeptide sequences, with identity being a more strict or similar function, and for which the function has been comparison. The phrases “percent identity” and “96 identity” retained. These sequences, which are closely and phyloge refer to the percentage of sequence similarity found in a netically related, being members of a particular clade of comparison of two or more polynucleotide sequences or two transcription factor polypeptides, are encompassed by the or more polypeptide sequences. “Sequence similarity” refers invention. A fragment or domain can be referred to as to the percent similarity in base pair sequence (as deter outside a conserved domain, outside a consensus sequence, mined by any suitable method) between two or more poly or outside a consensus DNA-binding site that is known to nucleotide sequences. Two or more sequences can be any exist or that exists for a particular transcription factor class, where from 0-100% similar, or any integer value family, or Sub-family. In this case, the fragment or domain therebetween. Identity or similarity can be determined by will not include the exact amino acids of a consensus comparing a position in each sequence that may be aligned sequence or consensus DNA-binding site of a transcription for purposes of comparison. When a position in the com factor class, family or Sub-family, or the exact amino acids pared sequence is occupied by the same nucleotide base or of a particular transcription factor consensus sequence or amino acid, then the molecules are identical at that position. consensus DNA-binding site. Furthermore, a particular frag A degree of similarity or identity between polynucleotide ment, region, or domain of a polypeptide, or a polynucle sequences is a function of the number of identical, matching otide encoding a polypeptide, can be “outside a conserved or corresponding nucleotides at positions shared by the domain if all the amino acids of the fragment, region, or polynucleotide sequences. A degree of identity of polypep domain fall outside of a defined conserved domain(s) for a tide sequences is a function of the number of identical amino polypeptide or protein. Sequences having lesser degrees of acids at corresponding positions shared by the polypeptide identity but comparable biological activity are considered to sequences. A degree of homology or similarity of polypep be equivalents. tide sequences is a function of the number of amino acids at 0039. As one of ordinary skill in the art recognizes, corresponding positions shared by the polypeptide conserved domains may be identified as regions or domains Sequences. of identity to a specific consensus sequence (see, for 0037 “Alignment” refers to a number of nucleotide bases example, Riechmann et al. (2000) Science 290: 2105-2110, or amino acid residue sequences aligned by lengthwise Riechmann et al. (2000b) Curr. Opin. Plant Biol. 3: 423 comparison so that components in common (i.e., nucleotide 434). Thus, by using alignment methods well known in the bases or amino acid residues at corresponding positions) art, the conserved domains of the plant transcription factors, may be visually and readily identified. The fraction or for example, for the AT-hook proteins (Reeves and Becker percentage of components in common is related to the bauer (2001) Biochim. Biophys. Acta 1519: 13-29; and homology or identity between the sequences. Alignments Reeves (2001) Gene 277: 63-81), may be determined. such as those of FIG. 4 or FIG. 5 may be used to identify 0040. The conserved domains for many of the transcrip conserved domains and relatedness within these domains. tion factor sequences of the invention are listed in Table 4. An alignment may suitably be determined by means of A comparison of the regions of these polypeptides allows computer programs known in the art, such as MACVEC one of skill in the art (see, for example, Reeves and Nissen TOR software (1999) (Accelrys, Inc., San Diego, Calif.). (1995) Prog. Cell Cycle Res. 1: 339-349) to identify 0038 A “conserved domain or “conserved region” as domains or conserved domains for any of the polypeptides used herein refers to a region in heterologous polynucleotide listed or referred to in this disclosure. or polypeptide sequences where there is a relatively high 0041) “Complementary refers to the natural hydrogen degree of sequence identity between the distinct sequences. bonding by base pairing between purines and pyrimidines. For example, an “AT-hook” domain', such as is found in a For example, the sequence A-C-G-T (5'->3') forms hydrogen polypeptide member of AT-hook transcription factor family, bonds with its complements A-C-G-T (5'->3') or A-C-G-U is an example of a conserved domain. An “AP2’ domain’. (5'->3'). Two single-stranded molecules may be considered Such as is found in a polypeptide member of AP2 transcrip partially complementary, if only some of the nucleotides tion factor family, is another example of a conserved bond, or “completely complementary’ if all of the nucle domain. With respect to polynucleotides encoding presently otides bond. The degree of complementarity between disclosed transcription factors, a conserved domain is pref nucleic acid strands affects the efficiency and strength of US 2006/0195944 A1 Aug. 31, 2006 hybridization and amplification reactions. “Fully comple 0046. In general, the term “variant” refers to molecules mentary” refers to the case where bonding occurs between with some differences, generated synthetically or naturally, every base pair and its complement in a pair of sequences, in their base or amino acid sequences as compared to a and the two sequences have the same number of nucleotides. reference (native) polynucleotide or polypeptide, respec 0042. The terms “highly stringent” or “highly stringent tively. These differences include substitutions, insertions, condition” refer to conditions that permit hybridization of deletions or any desired combinations of Such changes in a DNA strands whose sequences are highly complementary, native polynucleotide of amino acid sequence. wherein these same conditions exclude hybridization of 0047. With regard to polynucleotide variants, differences significantly mismatched DNAS. Polynucleotide sequences between presently disclosed polynucleotides and polynucle capable of hybridizing under stringent conditions with the otide variants are limited so that the nucleotide sequences of polynucleotides of the present invention may be, for the former and the latter are closely similar overall and, in example, variants of the disclosed polynucleotide many regions, identical. Due to the degeneracy of the sequences, including allelic or splice variants, or sequences genetic code, differences between the former and latter that encode orthologs or paralogs of presently disclosed nucleotide sequences may be silent (i.e., the amino acids polypeptides. Nucleic acid hybridization methods are dis encoded by the polynucleotide are the same, and the variant closed in detail by Kashima et al. (1985) Nature 313: polynucleotide sequence encodes the same amino acid 402-404, Sambrook et al. (1989) Molecular Cloning A sequence as the presently disclosed polynucleotide. Variant Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor nucleotide sequences may encode different amino acid Laboratory, Cold Spring Harbor, N.Y., 1989 (“Sambrook”), sequences, in which case such nucleotide differences will and by Haymes et al. (1985) Nucleic Acid Hybridization: A result in amino acid substitutions, additions, deletions, inser Practical Approach, IRL Press, Washington, D.C., which tions, truncations or fusions with respect to the similar references are incorporated herein by reference. disclosed polynucleotide sequences. These variations may 0043. In general, stringency is determined by the tem result in polynucleotide variants encoding polypeptides that perature, ionic strength, and concentration of denaturing share at least one functional characteristic. The degeneracy agents (e.g., formamide) used in a hybridization and wash of the genetic code also dictates that many different variant ing procedure (for a more detailed description of establish polynucleotides can encode identical and/or substantially ing and determining stringency, see the section “Identifying similar polypeptides in addition to those sequences illus Polynucleotides or Nucleic Acids by Hybridization'. trated in the Sequence Listing. below). The degree to which two nucleic acids hybridize under various conditions of stringency is correlated with the 0048 Also within the scope of the invention is a variant extent of their similarity. Thus, similar nucleic acid of a transcription factor nucleic acid listed in the Sequence sequences from a variety of Sources, such as within a plants Listing, that is, one having a sequence that differs from the genome (as in the case of paralogs) or from another plant (as one of the polynucleotide sequences in the Sequence List in the case of orthologs) that may perform similar functions ing, or a complementary sequence, that encodes a function can be isolated on the basis of their ability to hybridize with ally equivalent polypeptide (i.e., a polypeptide having some known transcription factor sequences. Numerous variations degree of equivalent or similar biological activity) but are possible in the conditions and means by which nucleic differs in sequence from the sequence in the Sequence acid hybridization can be performed to isolate transcription Listing, due to degeneracy in the genetic code. Included factor sequences having similarity to transcription factor within this definition are polymorphisms that may or may sequences known in the art and are not limited to those not be readily detectable using a particular oligonucleotide explicitly disclosed herein. Such an approach may be used probe of the polynucleotide encoding polypeptide, and to isolate polynucleotide sequences having various degrees improper or unexpected hybridization to allelic variants, of similarity with disclosed transcription factor sequences, with a locus other than the normal chromosomal locus for Such as, for example, encoded transcription factors having the polynucleotide sequence encoding polypeptide. 38% or greater identity with the conserved domain of 0049) “Allelic variant” or “polynucleotide allelic variant” disclosed transcription factors. refers to any of two or more alternative forms of a gene 0044) The terms “paralog and “ortholog” are defined occupying the same chromosomal locus. Allelic variation below in the section entitled “Orthologs and Paralogs”. In arises naturally through mutation, and may result in pheno brief, orthologs and paralogs are evolutionarily related genes typic polymorphism within populations. Gene mutations that have similar sequences and functions. Orthologs are may be 'silent or may encode polypeptides having altered structurally related genes in different species that are derived amino acid sequence. “Allelic variant' and “polypeptide by a speciation event. Paralogs are structurally related genes allelic variant may also be used with respect to polypep within a single species that are derived by a duplication tides, and in this case the terms refer to a polypeptide event. encoded by an allelic variant of a gene. 0045. The term “equivalog describes members of a set 0050 “Splice variant” or “polynucleotide splice variant” of homologous proteins that are conserved with respect to as used herein refers to alternative forms of RNA transcribed function since their last common ancestor. Related proteins from a gene. Splice variation naturally occurs as a result of are grouped into equivalog families, and otherwise into alternative sites being spliced within a single transcribed protein families with other hierarchically defined homology RNA molecule or between separately transcribed RNA types. This definition is provided at the Institute for molecules, and may result in several different forms of Genomic Research (TIGR) World Wide Web (www) web mRNA transcribed from the same gene. Thus, splice variants site, “tigr.org under the heading “Terms associated with may encode polypeptides having different amino acid TIGRFAMs. sequences, which may or may not have similar functions in US 2006/0195944 A1 Aug. 31, 2006

the organism. “Splice variant' or “polypeptide splice vari comprise a region that encodes an conserved domain of a ant’ may also refer to a polypeptide encoded by a splice transcription factor. Exemplary fragments also include frag variant of a transcribed mRNA. ments that comprise a conserved domain of a transcription 0051. As used herein, "polynucleotide variants' may also factor. Exemplary fragments include fragments that com refer to polynucleotide sequences that encode paralogs and prise an conserved domain of a transcription factor, for orthologs of the presently disclosed polypeptide sequences. example, amino acid residues 11-80 of G47 (SEQ ID NO: “Polypeptide variants' may refer to polypeptide sequences 66). that are paralogs and orthologs of the presently disclosed 0054 Fragments may also include subsequences of polypeptide sequences. polypeptides and protein molecules, or a Subsequence of the polypeptide. Fragments may have uses in that they may have 0.052. Differences between presently disclosed polypep antigenic potential. In some cases, the fragment or domain tides and polypeptide variants are limited so that the is a Subsequence of the polypeptide which performs at least sequences of the former and the latter are closely similar one biological function of the intact polypeptide in Substan overall and, in many regions, identical. Presently disclosed tially the same manner, or to a similar extent, as does the polypeptide sequences and similar polypeptide variants may intact polypeptide. For example, a polypeptide fragment can differ in amino acid sequence by one or more Substitutions, comprise a recognizable structural motif or functional additions, deletions, fusions and truncations, which may be domain such as a DNA-binding site or domain that binds to present in any combination. These differences may produce a DNA promoter region, an activation domain, or a domain silent changes and result in a functionally equivalent tran for protein-protein interactions, and may initiate transcrip scription factor. Thus, it will be readily appreciated by those tion. Fragments can vary in size from as few as 3 amino acid of skill in the art, that any of a variety of polynucleotide residues to the full length of the intact polypeptide, but are sequences is capable of encoding the transcription factors preferably at least about 30 amino acid residues in length and transcription factor homolog polypeptides of the inven and more preferably at least about 60 amino acid residues in tion. A polypeptide sequence variant may have “conserva length. tive changes, wherein a Substituted amino acid has similar structural or chemical properties. Deliberate amino acid 0055. The invention also encompasses production of substitutions may thus be made on the basis of similarity in DNA sequences that encode transcription factors and tran polarity, charge, Solubility, hydrophobicity, hydrophilicity, scription factor derivatives, or fragments thereof, entirely by and/or the amphipathic nature of the residues, as long as a synthetic chemistry. After production, the synthetic significant amount of the functional or biological activity of sequence may be inserted into any of the many available the transcription factor is retained. For example, negatively expression vectors and cell systems using reagents well charged amino acids may include aspartic acid and glutamic known in the art. Moreover, synthetic chemistry may be acid, positively charged amino acids may include lysine and used to introduce mutations into a sequence encoding tran arginine, and amino acids with uncharged polar head groups Scription factors or any fragment thereof. having similar hydrophilicity values may include leucine, isoleucine, and valine; glycine and alanine; asparagine and 0056 “Derivative” refers to the chemical modification of glutamine; serine and threonine; and phenylalanine and a nucleic acid molecule or amino acid sequence. Chemical tyrosine. More rarely, a variant may have “non-conserva modifications can include replacement of hydrogen by an tive changes, e.g., replacement of a glycine with a tryp alkyl, acyl, or amino group or glycosylation, pegylation, or tophan. Similar minor variations may also include amino any similar process that retains or enhances biological acid deletions or insertions, or both. Related polypeptides activity or lifespan of the molecule or sequence. may comprise, for example, additions and/or deletions of 0057 The term “plant” includes whole plants, shoot one or more N-linked or O-linked glycosylation sites, or an vegetative organs/structures (for example, leaves, stems and addition and/or a deletion of one or more cysteine residues. tubers), roots, flowers and floral organs/structures (for Guidance in determining which and how many amino acid example, bracts, sepals, petals, stamens, carpels, anthers and residues may be substituted, inserted or deleted without ovules), seed (including embryo, endosperm, and seed coat) abolishing functional or biological activity may be found and fruit (the mature ovary), plant tissue (for example, using computer programs well known in the art, for vascular tissue, ground tissue, and the like) and cells (for example, DNASTAR software (see U.S. Pat. No. 5,840, example, guard cells, egg cells, and the like), and progeny of 544). same. The class of plants that can be used in the method of the invention is generally as broad as the class of higher and 0053 “Fragment', with respect to a polynucleotide, lower plants amenable to transformation techniques, includ refers to a clone or any part of a polynucleotide molecule ing angiosperms (monocotyledonous and dicotyledonous that retains a usable, functional characteristic. Useful frag ments include oligonucleotides and polynucleotides that plants), gymnosperms, ferns, horsetails, psilophytes, lyco may be used in hybridization or amplification technologies phytes, bryophytes, and multicellular algae (see for or in the regulation of replication, transcription or transla example, FIG. 1, adapted from Daly et al. (2001) supra, tion. A polynucleotide fragment” refers to any Subsequence FIG. 2, adapted from Ku et al. (2000) supra; and see also of a polynucleotide, typically, of at least about 9 consecutive Tudge (2000) in The Variety of Life, Oxford University nucleotides, preferably at least about 30 nucleotides, more Press, New York, N.Y. pp. 547-606. preferably at least about 50 nucleotides, of any of the 0058. A “control plant’ as used in the present invention sequences provided herein. Exemplary polynucleotide frag refers to a plant cell, seed, plant component, plant tissue, ments are the first sixty consecutive nucleotides of the plant organ or whole plant used to compare against trans transcription factor polynucleotides listed in the Sequence genic or genetically modified plant for the purpose of Listing. Exemplary fragments also include fragments that identifying an enhanced phenotype in the transgenic or US 2006/0195944 A1 Aug. 31, 2006

genetically modified plant. A control plant may in some limited to, an enhance agronomic trait characterized by cases be a transgenic plant line that comprises an empty enhanced plant morphology, physiology, growth and devel vector or marker gene, but does not contain the recombinant opment, yield, nutritional enhancement, disease or pest polynucleotide of the present invention that is expressed in resistance, or environmental or chemical tolerance. In more the transgenic or genetically modified plant being evaluated. specific aspects of this invention enhanced trait is selected In general, a control plant is a plant of the same line or from group of enhanced traits consisting of enhanced water variety as the transgenic or genetically modified plant being use efficiency, enhanced cold tolerance, increased yield, tested. A Suitable control plant would include a genetically enhanced nitrogen use efficiency, enhanced seed protein and unaltered or non-transgenic plant of the parental line used to enhanced seed oil. In an important aspect of the invention generate a transgenic plant herein. the enhanced trait is enhanced yield including increased yield under non-stress conditions and increased yield under 0059 A “transgenic plant” refers to a plant that contains environmental stress conditions. Stress conditions may genetic material not found in a wild-type plant of the same include, for example, drought, shade, fungal disease, viral species, variety or cultivar. The genetic material may include disease, bacterial disease, insect infestation, nematode infes a transgene, an insertional mutagenesis event (such as by tation, cold temperature exposure, heat exposure, osmotic transposon or T-DNA insertional mutagenesis), an activation stress, reduced nitrogen nutrient availability, reduced phos tagging sequence, a mutated sequence, a homologous phorus nutrient availability and high plant density. “Yield recombination event or a sequence modified by chimera can be affected by many properties including without limi plasty. Typically, the foreign genetic material has been tation, plant height, pod number, pod position on the plant, introduced into the plant by human manipulation, but any number of internodes, incidence of pod shatter, grain size, method can be used as one of skill in the art recognizes. efficiency of nodulation and nitrogen fixation, efficiency of 0060 A transgenic plant may contain an expression vec nutrient assimilation, resistance to biotic and abiotic stress, tor or cassette. The expression cassette typically comprises carbon assimilation, plant architecture, resistance to lodging, a polypeptide-encoding sequence operably linked (i.e., percent seed germination, seedling vigor, and juvenile traits. under regulatory control of) to appropriate inducible or Yield can also affected by efficiency of germination (includ constitutive regulatory sequences that allow for the con ing germination in stressed conditions), growth rate (includ trolled expression of polypeptide. The expression cassette ing growth rate in stressed conditions), ear number, seed can be introduced into a plant by transformation or by number per ear, seed size, composition of seed (starch, oil, breeding after transformation of a parent plant. A plant refers protein) and characteristics of seed fill. to a whole plant as well as to a plant part, Such as seed, fruit, 0064. Increased yield of a transgenic plant of the present leaf, or root, plant tissue, plant cells or any other plant invention can be measured in a number of ways, including material, e.g., a plant explant, as well as to progeny thereof, plant Volume, plant biomass, test weight, seed number per and to in vitro systems that mimic biochemical or cellular plant, seed weight, seed number per unit area (i.e. seeds, or components or processes in a cell. weight of seeds, per acre), bushels per acre (bu?a), tonnes per 0061 “Wild type' or “wild-type', as used herein, refers acre, tons per acre, and/or kilo per hectare. For example, to a plant cell, seed, plant component, plant tissue, plant maize yield may be measured as production of shelled corn organ or whole plant that has not been genetically modified kernels per unit of production area, for example in bushels or treated in an experimental sense. Wild-type cells, seed, per acre or metric tons per hectare, often reported on a components, tissue, organs or whole plants may be used as moisture adjusted basis, for example at 15.5 percent mois controls to compare levels of expression and the extent and ture. Increased yield may result from improved utilization of nature of trait modification with cells, tissue or plants of the key biochemical compounds, Such as nitrogen, phosphorous same species in which a transcription factor expression is and carbohydrate, or from improved responses to environ altered, e.g., in that it has been knocked out, overexpressed, mental stresses, such as cold, heat, drought, salt, and attack or ectopically expressed. by pests or pathogens. Recombinant DNA used in this invention can also be used to provide plants having 0062. A “trait” refers to a physiological, morphological, improved growth and development, and ultimately increased biochemical, or physical characteristic of a plant or particu yield, as the result of modified expression of plant growth lar plant material or cell. In some instances, this character regulators or modification of cell cycle or photosynthesis istic is visible to the human eye, such as seed or plant size, pathways. Also of interest is the generation of transgenic or can be measured by biochemical techniques, such as plants that demonstrate enhanced yield with respect to a seed detecting the protein, starch, or oil content of seed or leaves, component that may or may not correspond to an increase in or by observation of a metabolic or physiological process, overall plant yield. Such properties include enhancements in e.g. by measuring tolerance to water deprivation or particu seed oil, seed molecules such as tocopherol, protein and lar salt or Sugar concentrations, or by the observation of the starch, or oil particular oil components as may be manifest expression level of a gene or genes, e.g., by employing by an alteration in the ratios of seed components. Northern analysis, RT-PCR, microarray gene expression assays, or reporter gene expression systems, or by agricul 0065 “Trait modification” refers to a detectable differ ence in a characteristic in a plant ectopically expressing a tural observations such as hyperosmotic stress tolerance or polynucleotide or polypeptide of the present invention rela yield. Any technique can be used to measure the amount of tive to a plant not doing so. Such as a wild-type plant. In comparative level of, or difference in any selected chemical Some cases, the trait modification can be evaluated quanti compound or macromolecule in the transgenic plants, how tatively. For example, the trait modification can entail at eVe. least about a 2% increase or decrease, or an even greater 0063 As used herein an "enhanced trait” means a char difference, in an observed trait as compared with a control or acteristic of a transgenic plant that includes, but is not wild-type plant. It is known that there can be a natural US 2006/0195944 A1 Aug. 31, 2006

variation in the modified trait. Therefore, the trait modifi also refers to altered expression patterns that are produced cation observed entails a change of the normal distribution by lowering the levels of expression to below the detection and magnitude of the trait in the plants as compared to level or completely abolishing expression. The resulting control or wild-type plants. expression pattern can be transient or stable, constitutive or 0.066 When two or more plants have “similar morpholo inducible. In reference to a polypeptide, the term “ectopic gies”, “substantially similar morphologies”, “a morphology expression or altered expression further may relate to that is substantially similar, or are “morphologically simi altered activity levels resulting from the interactions of the lar, the plants have comparable forms or appearances, polypeptides with exogenous or endogenous modulators or including analogous features such as overall dimensions, from interactions with factors or as a result of the chemical height, width, mass, root mass, shape, glossiness, color, stem modification of the polypeptides. diameter, leaf size, leaf dimension, leaf density, internode 0071. The term “overexpression” as used herein refers to distance, branching, root branching, number and form of a greater expression level of a gene in a plant, plant cell or inflorescences, and other macroscopic characteristics, and plant tissue, compared to expression in a wild-type plant, the individual plants are not readily distinguishable based on cell or tissue, at any developmental or temporal stage for the morphological characteristics alone. gene. Overexpression can occur when, for example, the genes encoding one or more transcription factors are under 0067 “Modulates' refers to a change in activity (biologi the control of a strong promoter (e.g., the cauliflower mosaic cal, chemical, or immunological) or lifespan resulting from virus 35S transcription initiation region). Overexpression specific binding between a molecule and either a nucleic may also under the control of an inducible or tissue specific acid molecule or a protein. promoter. Thus, overexpression may occur throughout a 0068. The term “transcript profile” refers to the expres plant, in specific tissues of the plant, or in the presence or sion levels of a set of genes in a cell in a particular state, absence of particular environmental signals, depending on particularly by comparison with the expression levels of that the promoter used. same set of genes in a cell of the same type in a reference state. For example, the transcript profile of a particular 0072. Overexpression may take place in plant cells nor transcription factor in a Suspension cell is the expression mally lacking expression of polypeptides functionally levels of a set of genes in a cell knocking out or overex equivalent or identical to the present transcription factors. pressing that transcription factor compared with the expres Overexpression may also occur in plant cells where endog sion levels of that same set of genes in a suspension cell that enous expression of the present transcription factors or has normal levels of that transcription factor. The transcript functionally equivalent molecules normally occurs, but Such profile can be presented as a list of those genes whose normal expression is at a lower level. Overexpression thus expression level is significantly different between the two results in a greater than normal production, or "overproduc treatments, and the difference ratios. Differences and simi tion of the transcription factor in the plant, cell or tissue. larities between expression levels may also be evaluated and 0073. The term “transcription regulating region” refers to calculated using statistical and clustering methods. a DNA regulatory sequence that regulates expression of one or more genes in a plant when a transcription factor having 0069. With regard to transcription factor gene knockouts one or more specific binding domains binds to the DNA as used herein, the term “knockout” refers to a plant or plant regulatory sequence. Transcription factors of the present cell having a disruption in at least one transcription factor invention possess an conserved domain. The transcription gene in the plant or cell, where the disruption results in a factors of the invention also comprise an amino acid Sub reduced expression or activity of the transcription factor sequence that forms a transcription activation domain that encoded by that gene compared to a control cell. The regulates expression of one or more abiotic stress tolerance knockout can be the result of for example, genomic dis genes in a plant when the transcription factor binds to the ruptions, including transposons, tilling, and homologous regulating region. recombination, antisense constructs, sense constructs, RNA silencing constructs, or RNA interference. A T-DNA inser Traits which May be Modified tion within a transcription factor gene is an example of a genotypic alteration that may abolish expression of that 0074 Trait modifications of particular interest include transcription factor gene. those to seed (such as embryo or endosperm), fruit, root, flower, leaf stem, shoot, seedling or the like, including: 0070 "Ectopic expression or altered expression' in ref enhanced tolerance to environmental conditions including erence to a polynucleotide indicates that the pattern of freezing, chilling, heat, drought, water saturation, radiation expression in, e.g., a transgenic plant or plant tissue, is and oZone; improved tolerance to microbial, fungal or viral different from the expression pattern in a wild-type plant or diseases; improved tolerance to pest infestations, including a reference plant of the same species. The pattern of expres nematodes, mollicutes, parasitic higher plants or the like; sion may also be compared with a reference expression decreased herbicide sensitivity; improved tolerance of heavy pattern in a wild-type plant of the same species. For metals or enhanced ability to take up heavy metals; example, the polynucleotide or polypeptide is expressed in improved growth under poor photoconditions (e.g., low light a cell or tissue type other than a cell or tissue type in which and/or short day length), or changes in expression levels of the sequence is expressed in the wild-type plant, or by genes of interest. Other phenotype that can be modified expression at a time other than at the time the sequence is relate to the production of plant metabolites, such as varia expressed in the wild-type plant, or by a response to different tions in the production of taxol, tocopherol, tocotrienol, inducible agents, such as hormones or environmental sig sterols, phytosterols, vitamins, wax monomers, anti-oxi nals, or at different expression levels (either higher or lower) dants, amino acids, lignins, cellulose, tannins, prenyllipids compared with those found in a wild-type plant. The term (such as chlorophylls and carotenoids), glucosinolates, and US 2006/0195944 A1 Aug. 31, 2006 terpenoids, enhanced or compositionally altered protein or proteins and distinguish them from other members of the oil production (especially in seeds), or modified Sugar AP2/EREBP protein family. (See Jaglo et al. (2001) supra.) (insoluble or soluble) and/or starch composition. Physical plant characteristics that can be modified include cell devel Polypeptides and Polynucleotides of the Invention opment (such as the number of trichomes), fruit and seed 0079 The present invention provides, among other size and number, yields of plant parts Such as stems, leaves, things, transcription factors (TFS), and transcription factor inflorescences, and roots, the stability of the seeds during homologue polypeptides, and isolated or recombinant poly storage, characteristics of the seedpod (e.g., Susceptibility to nucleotides encoding the polypeptides, or novel variant shattering), root hair length and quantity, internode dis polypeptides or polynucleotides encoding novel variants of tances, or the quality of seed coat. Plant growth character transcription factors derived from the specific sequences istics that can be modified include growth rate, germination provided here. These polypeptides and polynucleotides may rate of seeds, vigor of plants and seedlings, leaf and flower be employed to modify a plant's characteristic. senescence, male sterility, apomixis, flowering time, flower 0080 Exemplary polynucleotides encoding the polypep abscission, rate of nitrogen uptake, osmotic sensitivity to tides of the invention were identified in the Arabidopsis soluble Sugar concentrations, biomass or transpiration char thaliana GenBank database using publicly available acteristics, as well as plant architecture characteristics Such sequence analysis programs and parameters. Sequences ini as apical dominance, branching patterns, number of organs, tially identified were then further characterized to identify organ identity, organ shape or size. sequences comprising specified sequence strings corre Transcription Factors Modify Expression of Endogenous sponding to sequence motifs present in families of known Genes transcription factors. In addition, further exemplary poly 0075 Expression of genes which encode transcription nucleotides encoding the polypeptides of the invention were factors that modify expression of endogenous genes, poly identified in the plant GenBank database using publicly nucleotides, and proteins are well known in the art. In available sequence analysis programs and parameters. addition, transgenic plants comprising isolated polynucle Sequences initially identified were then further character otides encoding transcription factors may also modify ized to identify sequences comprising specified sequence expression of endogenous genes, polynucleotides, and pro strings corresponding to sequence motifs present in families teins. Examples include Peng et al. (1997) Genes Develop. of known transcription factors. Polynucleotide sequences 11:3194-3205) and Peng et al. (1999) Nature 400: 256-261). meeting Such criteria were confirmed as transcription fac In addition, many others have demonstrated that an Arabi tOrS. dopsis transcription factor expressed in an exogenous plant 0081 Additional polynucleotides of the invention were species elicits the same or very similar phenotypic response. identified by Screening Arabidopsis thaliana and/or other See, for example, Fu et al. (2001) Plant Cell 13: 1791-1802); plant cDNA libraries with probes corresponding to known Nandi et al. (2000) Curr. Biol. 10: 215-218); Coupland transcription factors under low stringency hybridization (1995) Nature 377: 482483); and Weigel and Nilsson (1995) conditions. Additional sequences, including full length cod Nature 377: 482-500). ing sequences were Subsequently recovered by the rapid amplification of cDNA ends (RACE) procedure, using a 0076). In another example, Mandel et al. (1992) Cell commercially available kit according to the manufacturers 71-133-143) and Suzuki et al. (2001) Plant J. 28: 409–418) instructions. Where necessary, multiple rounds of RACE are teach that a transcription factor expressed in another plant performed to isolate 5' and 3' ends. The full length cDNA species elicits the same or very similar phenotypic response was then recovered by a routine end-to-end polymerase of the endogenous sequence, as often predicted in earlier chain reaction (PCR) using primers specific to the isolated 5' studies of Arabidopsis transcription factors in Arabidopsis and 3' ends. Exemplary sequences are provided in the (see Mandel et al. (1992) supra; Suzuki et al. (2001) supra). Sequence Listing. 0077. Other examples include Müller et al. (2001) Plant 0082 The polynucleotides of the invention can be or J. 28: 169-179); Kim et al. (2001) Plant J. 25: 247-259); were ectopically expressed in overexpressor or knockout Kyozuka and Shimamoto (2002) Plant Cell Physiol. 43: plants and the changes in the characteristic(s) or trait(s) of 130-135); Boss and Thomas (2002, Nature 416: 847-850); the plants observed. Therefore, the polynucleotides and He et al. (2000) Transgenic Res. 9: 223-227); and Robson et polypeptides can be employed to improve the characteristics al. (2001) Plant J. 28: 619-631). of plants. 0078. In yet another example, Gilmour et al. (1998) Plant J. 16: 433-442) teach an Arabidopsis AP2 transcription 0083. The polynucleotides of the invention can be or factor, CBF1, which, when overexpressed in transgenic were ectopically expressed in overexpressor plant cells and plants, increases plant freezing tolerance. Jaglo et al (2001) the changes in the expression levels of a number of genes, Plant Physiol. 127: 910-017) further identified sequences in polynucleotides, and/or proteins of the plant cells observed. Brassica napus which encode CBF-like genes and that Therefore, the polynucleotides and polypeptides can be transcripts for these genes accumulated rapidly in response employed to change expression levels of a genes, polynucle to low temperature. Transcripts encoding CBF-like proteins otides, and/or proteins of plants. were also found to accumulate rapidly in response to low 0084. The polynucleotide sequences of the invention temperature in wheat, as well as in tomato. An alignment of encode polypeptides that are members of well-known tran the CBF proteins from Arabidopsis, B. napus, wheat, rye, Scription factor families, including plant transcription factor and tomato revealed the presence of conserved amino acid families, as disclosed in Table 4. Generally, the transcription sequences, PKK/RPAGRXKFXETRHP and DSAWR, which factors encoded by the present sequences are involved in cell bracket the AP2/EREBP DNA binding domains of the differentiation and proliferation and the regulation of US 2006/0195944 A1 Aug. 31, 2006

growth. Accordingly, one skilled in the art would recognize (Wu et al. (1997) Plant Physiol. 114: 1421-1431); the that by expressing the present sequences in a plant, one may polycomb (PCOMB) family (Goodrich et al. (1997) Nature change the expression of autologous genes or induce the 386: 44-51); the teosinte branched (TEO) family (Luo et al. expression of introduced genes. By affecting the expression (1996) Nature 383: 794-799); the AB13 family (Giraudat et of similar autologous sequences in a plant that have the al. (1992) Plant Cell 4: 1251-1261); the triple helix (TH) biological activity of the present sequences, or by introduc family (Dehesh et al. (1990) Science 250: 1397-1399); the ing the present sequences into a plant, one may alter a plants EIL family (Chao et al. (1997) Cell 89: 113344); the phenotype to one with improved traits. The sequences of the AT-HOOK family (Reeves and Nissen (1990).J. Biol. Chem. invention may also be used to transform a plant and intro 265: 8573-8582); the SIFA family (Zhou et al. (1995) duce desirable traits not found in the wild-type cultivar or Nucleic Acids Res. 23: 1165-1169); the bZIPT2 family (Lu strain. Plants may then be selected for those that produce the and Ferl (1995) Plant Physiol. 109: 723); the YABBY family most desirable degree of over- or under-expression of target (Bowman et al. (1999) Development 126:2387-96); the PAZ genes of interest and coincident trait improvement. family (Bohmert et al. (1998) EMBO J. 17: 170-80); a 0085. The sequences of the present invention may be family of miscellaneous (MISC) transcription factors from any species, particularly plant species, in a naturally including the DPBF family (Kim et al. (1997) Plant J. 11: occurring form or from any source whether natural, Syn 1237-1251) and the SPF1 family (Ishiguro and Nakamura thetic, semi-synthetic or recombinant. The sequences of the (1994) Mol. Gen. Genet. 244: 563-571): the GARP family invention may also include fragments of the present amino (Hall et al. (1998) Plant Cell 10: 925-936), the TUBBY acid sequences. In this context, a “fragment” refers to a family (Boggin et al (1999) Science 286: 21 19-2125), the fragment of a polypeptide sequence which is at least 5 to heat shock family (Wu (1995) Annu. Rev. Cell Dev. Biol. 11: about 15 amino acids in length, most preferably at least 14 441-469), the ENBP family (Christiansen et al. (1996) Plant amino acids, and which retain some biological activity of a Mol. Biol. 32:809-821), the RING-zinc family (Jensen et al. transcription factor. Where “amino acid sequence' is recited (1998) FEBS Letters 436: 283-287), the PDBP family (Janik to refer to an amino acid sequence of a naturally occurring et al. (1989) Virology 168: 320-329), the PCF family (Cubas protein molecule, "amino acid sequence' and like terms are et al. Plant J. (1999) 18: 215-22), the SRS (SHI-related) not meant to limit the amino acid sequence to the complete family (Fridborg et al. (1999) Plant Cell 11: 1019-1032), the native amino acid sequence associated with the recited CPP (cysteine-rich polycomb-like) family (Cvitanich et al. protein molecule. (2000) Proc. Natl. Acad. Sci. 97: 8.163-8168), the ARF (auxin response factor) family (Ulmasov et al. (1999) Proc. 0086). As one of ordinary skill in the art recognizes, Natl. Acad. Sci. 96: 5844-5849), the SWI/SNF family (Col transcription factors can be identified by the presence of a lingwood et al. (1999).J. Mol. Endocrinol. 23:255-275), the region or domain of structural similarity or identity to a ACBF family (Seguin et al. (1997) Plant Mol. Biol. 35: specific consensus sequence or the presence of a specific 281-291), PCGL (CG-1 like) family (da Costa e Silva et al. consensus DNA-binding site or DNA-binding site motif (1994) Plant Mol. Biol. 25: 921-924) the ARID family (see, for example, Riechmann et al. (2000a) supra). The (Vazquez et al. (1999) Development 126: 733-742), the plant transcription factors may belong to one of the follow Jumonji family (Balciunas et al. (2000), Trends Biochem. ing transcription factor families: the AP2 (APETALA2) Sci. 25: 274-276), the bZIP-NIN family (Schauser et al. domain transcription factor family (Riechmann and Mey (1999) Nature 402: 191-195), the E2F family (Kaelin et al. erowitz (1998) Biol. Chem. 379: 633-646); the MYB tran (1992) Cell 70:351-364) and the GRF-like family (Knaap et scription factor family (ENBib; Martin and Paz-Ares (1997) al. (2000) Plant Physiol. 122: 695-704). As indicated by any Trends Genet. 13: 67-73); the MADS domain transcription part of the list above and as known in the art, transcription factor family (Riechmann and Meyerowitz (1997) Biol. factors have been sometimes categorized by class, family, Chem. 378: 1079-1101); the WRKY protein family (Ishiguro and Sub-family according to their structural content and and Nakamura (1994) Mol. Gen. Genet. 244: 563-571): the consensus DNA-binding site motif, for example. Many of ankyrin-repeat protein family (Zhang et al. (1992) Plant Cell the classes and many of the families and Sub-families are 4: 1575-1588); the Zinc finger protein (Z) family (Klug and listed here. However, the inclusion of one sub-family and Schwabe (1995) FASEB J. 9: 597-604): Takatsuji (1998) not another, or the inclusion of one family and not another, Cell. Mol. Life Sci. 54: 582-596); the homeobox (HB) does not mean that the invention does not encompass protein family (Buerglin (1994) in Guidebook to the polynucleotides or polypeptides of a certain family or Sub Homeobox Genes, Duboule (ed.) Oxford University Press); family. The list provided here is merely an example of the the CAAT-element binding proteins (Forsburg and Guarente types of transcription factors and the knowledge available (1989) Genes Dev. 3: 1166-1178); the squamosa promoter concerning the consensus sequences and consensus DNA binding proteins (SPB) (Klein et al. (1996) Mol. Gen. Genet. binding site motifs that help define them as known to those 1996 250: 7-16); the NAM protein family (Souer et al. of skill in the art (each of the references noted above are (1996) Cell 85: 159-170); the IAA/AUX proteins (Abeletal. specifically incorporated herein by reference). A transcrip (1995).J. Mol. Biol. 251: 533-549): the HLH/MYC protein tion factor may include, but is not limited to, any polypep family (Littlewood et al. (1994) Prot. Profile 1: 639-709); tide that can activate or repress transcription of a single gene the DNA-binding protein (DBP) family (Tucker et al. (1994) or a number of genes. This polypeptide group includes, but EMBO.J. 13: 2994-3002); the bZIP family of transcription is not limited to, DNA-binding proteins, DNA-binding pro factors (Foster et al. (1994) FASEB J 8: 192-200); the Box tein binding proteins, protein kinases, protein phosphatases, P-binding protein (the BPF-1) family (da Costa e Silva et al. protein methyltransferases, GTP-binding proteins, and (1993) Plant J. 4: 125-135); the high mobility group (HMG) receptors, and the like. family (Bustin and Reeves (1996) Prog. Nucl. Acids Res. Mol. Biol. 54: 35-100); the scarecrow (SCR) family (Di 0087. In addition to methods for modifying a plant phe Laurenzio et al. (1996) Cell 86: 423433); the GF14 family notype by employing one or more polynucleotides and US 2006/0195944 A1 Aug. 31, 2006

polypeptides of the invention described herein, the poly Nature 369: 684-685 and the references cited therein, in nucleotides and polypeptides of the invention have a variety which PCR amplicons of up to 40 kb are generated. One of of additional uses. These uses include their use in the skill will appreciate that essentially any RNA can be con recombinant production (i.e., expression) of proteins; as verted into a double stranded DNA suitable for restriction regulators of plant gene expression, as diagnostic probes for digestion, PCR expansion and sequencing using reverse the presence of complementary or partially complementary transcriptase and a polymerase. See, e.g., Ausubel, Sam nucleic acids (including for detection of natural coding brook and Berger, all Supra. nucleic acids); as Substrates for further reactions, e.g., muta 0091 Alternatively, polynucleotides and oligonucle tion reactions, PCR reactions, or the like; as substrates for otides of the invention can be assembled from fragments cloning e.g., including digestion or ligation reactions; and produced by Solid-phase synthesis methods. Typically, frag for identifying exogenous or endogenous modulators of the ments of up to approximately 100 bases are individually transcription factors. synthesized and then enzymatically or chemically ligated to Producing Polypeptides produce a desired sequence, e.g., a polynucleotide encoding all or part of a transcription factor. For example, chemical 0088. The polynucleotides of the invention include synthesis using the phosphoramidite method is described, sequences that encode transcription factors and transcription e.g., by Beaucage et al. (1981) Tetrahedron Letters 22: factor homologue polypeptides and sequences complemen 1859-1869; and Matthes et al. (1984) EMBO.J. 3: 801-805. tary thereto, as well as unique fragments of coding sequence, According to Such methods, oligonucleotides are synthe or sequence complementary thereto. Such polynucleotides sized, purified, annealed to their complementary strand, can be, e.g., DNA or RNA, e.g., mRNA, cRNA, synthetic ligated and then optionally cloned into Suitable vectors. And RNA genomic DNA, cDNA synthetic DNA, oligonucle if so desired, the polynucleotides and polypeptides of the otides, etc. The polynucleotides are either double-stranded invention can be custom ordered from any of a number of or single-stranded, and include either, or both sense (i.e., commercial Suppliers. coding) sequences and antisense (i.e., non-coding, comple mentary) sequences. The polynucleotides include the coding Homologous Sequences sequence of a transcription factor, or transcription factor 0092 Sequences homologous, i.e., that share significant homologue polypeptide, in isolation, in combination with sequence identity or similarity, to those provided in the additional coding sequences (e.g., a purification tag, a local Sequence Listing, derived from Arabidopsis thaliana or ization signal, as a fusion-protein, as a pre-protein, or the from other plants of choice are also an aspect of the like), in combination with non-coding sequences (e.g., invention. Homologous sequences can be derived from any introns or inteins, regulatory elements such as promoters, plant including monocots and dicots and in particular agri enhancers, terminators, and the like), and/or in a vector or culturally important plant species, including but not limited host environment in which the polynucleotide encoding a to, crops such as soybean, wheat, corn, potato, cotton, rice, transcription factor or transcription factor homologue rape, oilseed rape (including canola), Sunflower, alfalfa, polypeptide is an endogenous or exogenous gene. Sugarcane and turf; or fruits and vegetables, such as banana, 0089. A variety of methods exist for producing the poly blackberry, blueberry, strawberry, and raspberry, cantaloupe, nucleotides of the invention. Procedures for identifying and carrot, cauliflower, coffee, cucumber, eggplant, grapes, hon isolating DNA clones are well known to those of skill in the eydew, lettuce, mango, melon, onion, papaya, peas, peppers, art, and are described in, e.g., Berger and Kimmel, Guide to pineapple, pumpkin, spinach, squash, Sweet corn, tobacco, Molecular Cloning Techniques, Methods in Enzymology tomato, watermelon, rosaceous fruits (such as apple, peach, volume 152 Academic Press, Inc., San Diego, Calif. pear, cherry and plum) and vegetable brassicas (such as (“Berger); Sambrook et al. supra, and Current Protocols in broccoli, cabbage, cauliflower, Brussels sprouts, and kohl Molecular Biology, F. M. Ausubel et al., eds., Current rabi). Other crops, fruits and vegetables whose phenotype Protocols, a joint venture between Greene Publishing Asso can be changed include barley, rye, millet, Sorghum, currant, ciates, Inc. and John Wiley & Sons, Inc., (Supplemented avocado, citrus fruits such as oranges, lemons, grapefruit through 2000) ("Ausubel'). and tangerines, artichoke, cherries, nuts such as the walnut and peanut, endive, leek, roots, such as arrowroot, beet, 0090 Alternatively, polynucleotides of the invention, can cassaya, turnip, radish, yam, and Sweet potato, and beans. be produced by a variety of in vitro amplification methods The homologous sequences may also be derived from adapted to the present invention by appropriate selection of woody species, such pine, poplar and eucalyptus, or mint or specific or degenerate primers. Examples of protocols Suf other labiates. ficient to direct persons of skill through in vitro amplifica tion methods, including the polymerase chain reaction Orthologs and Paralogs (PCR) the ligase chain reaction (LCR), QB-replicase ampli 0093. Several different methods are known by those of fication and other RNA polymerase mediated techniques skill in the art for identifying and defining these functionally (e.g., NASBA), e.g., for the production of the homologous homologous sequences. Three general methods for defining nucleic acids of the invention are found in Berger (Supra), paralogs and orthologs are described; a paralog or ortholog Sambrook (supra), and Ausubel (supra), as well as Mullis et or homolog may be identified by one or more of the methods al., (1987) PCR Protocols A Guide to Methods and Appli described below. cations (Innis et al. eds) Academic Press Inc. San Diego, Calif. (1990) (Innis). Improved methods for cloning in vitro 0094 Orthologs and paralogs are evolutionarily related amplified nucleic acids are described in Wallace et al., U.S. genes that have similar sequence and similar functions. Pat. No. 5,426,039. Improved methods for amplifying large Orthologs are structurally related genes in different species nucleic acids by PCR are summarized in Cheng et al. (1994) that are derived from a speciation event. Paralogs are US 2006/0195944 A1 Aug. 31, 2006

structurally related genes within a single species that are such as CLUSTAL (Thompson et al. (1994) Nucleic Acids derived by a duplication event. Res. 22: 4673-4680; Higgins et al. (1996) Methods Enzymol. 0.095 Within a single plant species, gene duplication may 266: 383-402), potential orthologous sequences can be cause two copies of a particular gene, giving rise to two or placed into the phylogenetic tree and its relationship to more genes with similar sequence and similar function genes from the species of interest can be determined. Once known as paralogs. A paralog is therefore a similar gene with the ortholog pair has been identified, the function of the test a similar function within the same species. Paralogs typi ortholog can be determined by determining the function of cally cluster together or in the same clade (a group of similar the reference ortholog. It is then a matter of routine to align genes) when a gene family phylogeny is analyzed using sequences that are most closely related by virtue of their programs such as CLUSTAL (Thompson et al. (1994) presence in a related clade (e.g., a group of sequences Nucleic Acids Res. 22: 4673-4680; Higgins et al. (1996) descending from a strong node of a phylogenetic tree Methods Enzymol. 266: 383402). Groups of similar genes representing a common ancestral sequence) using BLAST or can also be identified with pair-wise BLAST analysis (Feng similar analysis, or compare similarity or identity of the and Doolittle (1987) J. Mol. Evol. 25: 351-360). For amino acid residues of these sequences and/or their con example, a clade of very similar MADS domain transcrip served domains or motifs that confer and correlate with tion factors from Arabidopsis all share a common function in flowering time (Ratcliffe et al. (2001) Plant Physiol. 126: conserved function. 122-132), and a group of very similar AP2 domain tran 0097 Transcription factors that are homologous to the Scription factors from Arabidopsis are involved in tolerance listed sequences will typically share at least about 30% of plants to freezing (Gilmour et al. (1998) Plant J. 16: amino acid sequence identity, or at least about 30% amino 433-442). Analysis of groups of similar genes with similar acid sequence identity outside of a known consensus function that fall within one clade can yield Sub-sequences sequence or consensus DNA-binding site. More closely that are particular to the clade. These Sub-sequences, known related transcription factors can share at least about 50%, as consensus sequences, can not only be used to define the about 60%, about 65%, about 70%, about 75% or about 80% sequences within each clade, but define the functions of or about 90% or about 95% or about 98% or more sequence these genes; genes within a clade may contain paralogous or identity with the listed sequences, or with the listed orthologous sequences that share the same function. (See sequences but excluding or outside a known consensus also, for example, Mount, D. W. (2001) Bioinformatics. sequence or consensus DNA-binding site, or with the listed Sequence and Genome Analysis, Cold Spring Harbor Labo sequences excluding one or all conserved domain. Factors ratory Press, Cold Spring Harbor, N.Y. page 543.) that are most closely related to the listed sequences share, 0.096 Speciation, the production of new species from a e.g., at least about 85%, about 90% or about 95% or more parental species, can also give rise to two or more genes with % sequence identity to the listed sequences, or to the listed similar sequence and similar function. These genes, termed sequences but excluding or outside a known consensus orthologs, often have an identical function within their host sequence or consensus DNA-binding site or outside one or plants and are often interchangeable between species with all conserved domain. At the nucleotide level, the sequences out losing function. Because plants have common ancestors, will typically share at least about 40% nucleotide sequence many genes in any plant species will have a corresponding identity, preferably at least about 50%, about 60%, about orthologous gene in another plant species. Transcription 70% or about 80% sequence identity, and more preferably factor gene sequences are thus conserved across diverse about 85%, about 90%, about 95% or about 97% or more eukaryotic species lines (Goodrich et al. (1993) Cell 75: sequence identity to one or more of the listed sequences, or 519-530; Lin et al. (1991) Nature 353: 569-571; Sadowski to a listed sequence but excluding or outside a known et al. (1988) Nature 335: 563-564). Plants are no exception consensus sequence or consensus DNA-binding site, or to this observation; diverse plant species possess transcrip outside one or all conserved domain. The degeneracy of the tion factors that have similar sequences and functions. It is genetic code enables major variations in the nucleotide well known in the art that protein function can be classified sequence of a polynucleotide while maintaining the amino using phylogenetic analysis of gene trees combined with the acid sequence of the encoded protein. Conserved domains corresponding species. Functional predictions can be greatly (for example, a DNA binding domain) within a transcription improved by focusing on how the genes became similar in factor family may exhibit a high degree of sequence homol sequence (i.e., evolution) rather than on the sequence simi ogy, such as at least about at least about 65%, or at least larity itself (Eisen, (1998) Genome Res. 8: 163-167): “the about 69%, or at least about 70%, or at least about 73%, or first step in making functional predictions is the generation at least about 76%, or at least about 78%, or at least about of a phylogenetic tree representing the evolutionary history 80%, or at least about 82%, or at least about 85%, or at least of the gene of interest and its homologs. Such trees are about 87%, or at least about 89%, or at least about 95%, distinct from clusters and other means of characterizing amino acid residue sequence identity, to a conserved domain sequence similarity because they are inferred by techniques of a transcription factor polypeptide of the invention listed that help convert patterns of similarity into evolutionary in the Sequence Listing. Transcription factors that are relationships. . . . After the gene tree is inferred, biologically homologous to the listed sequences should share at least determined functions of the various homologs are overlaid 30%, or at least about 60%, or at least about 75%, or at least onto the tree. Finally, the structure of the tree and the relative about 80%, or at least about 90%, or at least about 95% phylogenetic positions of genes of different functions are amino acid sequence identity over the entire length of the used to trace the history of functional changes, which is then polypeptide or the homolog. In addition, transcription fac used to predict functions of as yet uncharacterized genes' tors that are homologous to the listed sequences should share (Eisen, Supra). Thus, once a phylogenic tree for a gene at least 30%, or at least about 60%, or at least about 75%, family of one species has been constructed using a program or at least about 80%, or at least about 90%, or at least about US 2006/0195944 A1 Aug. 31, 2006

95% amino acid sequence similarity over the entire length of and a query is made against the sequence database using the the polypeptide or the homolog. relevant sequences herein and associated plant phenotypes 0.098 Percent identity can be determined electronically, or gene functions. e.g., by using the MEGALIGN program (DNASTAR, Inc. 0102) In addition, one or more polynucleotide sequences Madison, Wis.). The MEGALIGN program can create align or one or more polypeptides encoded by the polynucleotide ments between two or more sequences according to different sequences may be used to search against a BLOCKS (Bai methods, e.g., the clustal method. (See, e.g., Higgins and rochet al. (1997) Nucleic Acids Res. 25: 217-221), PFAM, Sharp (1988) Gene 73: 237-244.) The clustal algorithm and other databases which contain previously identified and groups sequences into clusters by examining the distances annotated motifs, sequences and gene functions. Methods between all pairs. The clusters are aligned pairwise and then that search for primary sequence patterns with secondary in groups. Other alignment algorithms or programs may be structure gap penalties (Smith et al. (1992) Protein Engi used, including FASTA, BLAST, or ENTREZ, FASTA and neering 5:35-51) as well as algorithms such as Basic Local BLAST. These are available as a part of the GCG sequence Alignment Search Tool (BLAST: Altschul (1993) J. Mol. analysis package (University of Wisconsin, Madison, Wis.), Evol. 36: 290-300; Altschul et al. (1990) supra), BLOCKS and can be used with or without default settings. ENTREZ (Henikoff and Henikoff (1991) Nucl. Acids Res. 19: 6565 is available through the National Center for Biotechnology 6572), Hidden Markov Models (HMM; Eddy (1996) Curr: Information. In one embodiment, the percent identity of two Opin. Str. Biol. 6: 361-365: Sonnhammer et al. (1997) sequences can be determined by the GCG program with a Proteins 28: 405-420), and the like, can be used to manipu gap weight of 1, e.g., each amino acid gap is weighted as if late and analyze polynucleotide and polypeptide sequences it were a single amino acid or nucleotide mismatch between encoded by polynucleotides. These databases, algorithms the two sequences (see U.S. Pat. No. 6,262.333). and other methods are well known in the art and are described in Ausubel et al. (1997) Short Protocols in 0099. Other techniques for alignment are described in Molecular Biology, John Wiley & Sons, New York N.Y., unit Methods in Enzymology, Vol. 266. Computer Methods for 7.7) and in Meyers, R. A. (1995) Molecular Biology and Macromolecular Sequence Analysis (1996), ed. Doolittle, Biotechnology, Wiley VCH, New York N.Y., p856-853). Academic Press, Inc., San Diego, Calif., USA. Preferably, an alignment program that permits gaps in the sequence is 0103). Furthermore, methods using manual alignment of utilized to align the sequences. The Smith-Waterman is one sequences similar or homologous to one or more polynucle type of algorithm that permits gaps in sequence alignments otide sequences or one or more polypeptides encoded by the (Shpaer (1997) Methods Mol. Biol. 70: 173-187). Also, the polynucleotide sequences may be used to identify regions of GAP program using the Needleman and Wunsch alignment similarity and conserved domains. Such manual methods are method can be utilized to align sequences. An alternative well-known of those of skill in the art and can include, for search strategy uses MPSRCH software, which runs on a example, comparisons of tertiary structure between a MASPAR computer. MPSRCH uses a Smith-Waterman polypeptide sequence encoded by a polynucleotide which algorithm to score sequences on a massively parallel com comprises a known function, with a polypeptide sequence puter. This approach improves ability to pick up distantly encoded by a polynucleotide sequence which has a function related matches, and is especially tolerant of Small gaps and not yet determined. Such examples of tertiary structure may nucleotide sequence errors. Nucleic acid-encoded amino comprise predicted a helices, B-sheets, amphipathic helices, acid sequences can be used to search both protein and DNA leucine Zipper motifs, Zinc finger motifs, proline-rich databases. regions, cysteine repeat motifs, and the like. 0100. The percentage similarity between two polypeptide VI. Identifying Polynucleotides or Nucleic Acids by Hybrid sequences, e.g., sequence A and sequence B, is calculated by ization dividing the length of sequence A, minus the number of gap residues in sequence A, minus the number of gap residues in 0.104 Polynucleotides homologous to the sequences sequence B, into the Sum of the residue matches between illustrated in the Sequence Listing and tables can be iden sequence A and sequence B, times one hundred. Gaps of low tified, e.g., by hybridization to each other under stringent or or of no similarity between the two amino acid sequences are under highly stringent conditions. Single stranded poly not included in determining percentage similarity. Percent nucleotides hybridize when they associate based on a variety identity between polynucleotide sequences can also be of well characterized physical-chemical forces. Such as counted or calculated by other methods known in the art, hydrogen bonding, solvent exclusion, base stacking and the e.g., the Jotun Hein method. (See, e.g., Hein (1990) Methods like. The stringency of a hybridization reflects the degree of Enzymol. 183: 626-645.) Identity between sequences can sequence identity of the nucleic acids involved, such that the also be determined by other methods known in the art, e.g., higher the stringency, the more similar are the two poly by varying hybridization conditions (see US Patent Appli nucleotide Strands. Stringency is influenced by a variety of factors, including temperature, salt concentration and com cation No. 20010010913). position, organic and non-organic additives, solvents, etc. 0101 Thus, the invention provides methods for identify present in both the hybridization and wash solutions and ing a sequence similar or paralogous or orthologous or incubations (and number thereof), as described in more homologous to one or more polynucleotides as noted herein, detail in the references cited above. Encompassed by the or one or more target polypeptides encoded by the poly invention are polynucleotide sequences that are capable of nucleotides, or otherwise noted herein and may include hybridizing to the polynucleotide sequences, listed in the linking or associating a given plant phenotype or gene Sequence Listing; and fragments, thereof under various function with a sequence. In the methods, a sequence conditions of stringency. (See, e.g., Wahl and Berger (1987) database is provided (locally or across an inter or intra net) Methods Enzymol. 152: 3994.07; Kimmel, A. R. (1987) US 2006/0195944 A1 Aug. 31, 2006

Methods Enzymol. 152: 507-511.) Estimates of homology 500 mM. NaCl, 50 mM trisodium citrate, 1% SDS, 35% are provided by either DNA-DNA or DNA-RNA hybridiza formamide, and 100 lug/ml denatured salmon sperm DNA tion under conditions of stringency as is well understood by (ssDNA). In a most preferred embodiment, hybridization those skilled in the art (Hames and Higgins, eds. (1985) will occur at 42° C. in 250 mM. NaCl, 25 mM trisodium Nucleic Acid Hybridisation, IRL Press, Oxford, U.K.). Strin citrate, 1% SDS, 50% formamide, and 200 g/ml ssDNA. gency conditions can be adjusted to Screen for moderately Useful variations on these conditions will be readily appar similar fragments, such as homologous sequences from ent to those skilled in the art. distantly related organisms, to highly similar fragments, 0108. The washing steps that follow hybridization can Such as genes that duplicate functional enzymes from also vary in Stringency. Wash Stringency conditions can be closely related organisms. Post-hybridization washes deter defined by Salt concentration and by temperature. As above, mine Stringency conditions. wash stringency can be increased by decreasing salt con 0105. In addition to the nucleotide sequences listed in centration or by increasing temperature. For example, Strin Table 4, full length cDNA, orthologs, paralogs and gent salt concentration for the wash steps will preferably be homologs of the present nucleotide sequences may be iden less than about 30 mM NaCl and 3 mM trisodium citrate, tified and isolated using well known methods. The cDNA and most preferably less than about 15 mM NaCl and 1.5 libraries orthologs, paralogs and homologs of the present mM trisodium citrate. Stringent temperature conditions for nucleotide sequences may be screened using hybridization the wash steps will ordinarily include temperature of at least methods to determine their utility as hybridization target or about 25° C., more preferably of at least about 42° C. amplification probes. Another preferred set of highly stringent conditions uses two 0106 An example of stringent hybridization conditions final washes in 0.1xSSC, 0.1% SDS at 65° C. The most for hybridization of complementary nucleic acids which preferred high stringency washes are of at least about 68°C. have more than 100 complementary residues on a filter in a For example, in a preferred embodiment, wash steps will Southern or northern blot is about 5° C. to 20° C. lower than occur at 25°C. in 30 mMNaCl, 3 mM trisodium citrate, and the thermal melting point (T) for the specific sequence at 0.1% SDS. In a more preferred embodiment, wash steps will a defined ionic strength and pH. The T is the temperature occur at 42°C. in 15 mM NaCl, 1.5 mM trisodium citrate, (under defined ionic strength and pH) at which 50% of the and 0.1% SDS. In a most preferred embodiment, the wash target sequence hybridizes to a perfectly matched probe. steps will occur at 68°C. in 15 mM NaCl, 1.5 mM trisodium Nucleic acid molecules that hybridize under stringent con citrate, and 0.1% SDS. Additional variations on these con ditions will typically hybridize to a probe based on either the ditions will be readily apparent to those skilled in the art (see entire cDNA or selected portions, e.g., to a unique Subse U.S. Patent Application No. 20010010913). quence, of the cDNA under wash conditions of 0.2xSSC to 0.109 As another example, stringent conditions can be 2.0xSSC, 0.1% SDS at 50-65° C. For example, high strin selected Such that an oligonucleotide that is perfectly gency is about 0.2xSSC, 0.1% SDS at 65° C. Ultra-high complementary to the coding oligonucleotide hybridizes to stringency will be the same conditions except the wash the coding oligonucleotide with at least about a 5-10x higher temperature is raised about 3 to about 5°C., and ultra-ultra signal to noise ratio than the ratio for hybridization of the high Stringency will be the same conditions except the wash perfectly complementary oligonucleotide to a nucleic acid temperature is raised about 6 to about 9° C. For identifica encoding a transcription factor known as of the filing date of tion of less closely related homologues washes can be the application. Conditions can be selected Such that a higher performed at a lower temperature, e.g., 50° C. In general, signal to noise ratio is observed in the particular assay which stringency is increased by raising the wash temperature is used, e.g., about 15x. 25x, 35x. 50x or more. Accordingly, and/or decreasing the concentration of SSC, as known in the the Subject nucleic acid hybridizes to the unique coding art. oligonucleotide with at least a 2x higher signal to noise ratio as compared to hybridization of the coding oligonucleotide 0107. In another example, stringent salt concentration to a nucleic acid encoding known polypeptide. Again, higher will ordinarily be less than about 750 mM. NaCl and 75 mM trisodium citrate, preferably less than about 500 mM NaCl signal to noise ratios can be selected, e.g., about 5x, 10x. and 50 mM trisodium citrate, and most preferably less than 25x, 35x. 50x or more. The particular signal will depend on about 250 mM NaCl and 25 mM trisodium citrate. Low the label used in the relevant assay, e.g., a fluorescent label, stringency hybridization can be obtained in the absence of a colorimetric label, a radioactive label, or the like. organic Solvent, e.g., formamide, while high Stringency 0110. Alternatively, transcription factor homolog hybridization can be obtained in the presence of at least polypeptides can be obtained by Screening an expression about 35% formamide, and most preferably at least about library using antibodies specific for one or more transcrip 50% formamide. Stringent temperature conditions will ordi tion factors. With the provision herein of the disclosed narily include temperatures of at least about 30° C., more transcription factor, and transcription factor homologue preferably of at least about 37°C., and most preferably of at nucleic acid sequences, the encoded polypeptide(s) can be least about 42° C. Varying additional parameters. Such as expressed and purified in a heterologous expression system hybridization time, the concentration of detergent, e.g., (e.g., E. coli) and used to raise antibodies (monoclonal or sodium dodecyl sulfate (SDS), and the inclusion or exclu polyclonal) specific for the polypeptide(s) in question. Anti sion of carrier DNA, are well known to those skilled in the bodies can also be raised against synthetic peptides derived art. Various levels of stringency are accomplished by com from transcription factor, or transcription factor homologue, bining these various conditions as needed. In a preferred amino acid sequences. Methods of raising antibodies are embodiment, hybridization will occur at 30° C. in 750 mM well known in the art and are described in Harlow and Lane NaCl, 75 mM trisodium citrate, and 1% SDS. In a more (1988) Antibodies: A Laboratory Manual, Cold Spring Har preferred embodiment, hybridization will occur at 37° C. in bor Laboratory, New York. Such antibodies can then be used US 2006/0195944 A1 Aug. 31, 2006 to screen an expression library produced from the plant from the transcription factor are included within the scope of the which it is desired to clone additional transcription factor present invention, as are polypeptides encoded by Such homologues, using the methods described above. The cDNAs and mRNAs. Allelic variants and splice variants of selected cDNAS can be confirmed by sequencing and enzy these sequences can be cloned by probing cDNA or genomic matic activity. libraries from different individual organisms or tissues Sequence Variations according to standard procedures known in the art (see U.S. 0111. It will readily be appreciated by those of skill in the Pat. No. 6,388,064). art, that any of a variety of polynucleotide sequences are capable of encoding the transcription factors and transcrip 0115 For example, Table 1 illustrates, e.g., that the tion factor homologue polypeptides of the invention. Due to codons AGC, AGT, TCA, TCC, TCG, and TCT all encode the degeneracy of the genetic code, many different poly the same amino acid: Serine. Accordingly, at each position in nucleotides can encode identical and/or Substantially similar the sequence where there is a codon encoding serine, any of polypeptides in addition to those sequences illustrated in the the above trinucleotide sequences can be used without Sequence Listing. Nucleic acids having a sequence that altering the encoded polypeptide. differs from the sequences shown in the Sequence Listing, or complementary sequences, that encode functionally equiva TABLE 1. lent peptides (i.e., peptides having some degree of equiva lent or similar biological activity) but differ in sequence Amino acid Possible Codons from the sequence shown in the sequence listing due to Alanine Ala A GCA GCC GCG GCU degeneracy in the genetic code, are also within the scope of Cysteine Cys C TGC TGT the invention. 0112 Altered polynucleotide sequences encoding Aspartic acid Asp D GAC GAT polypeptides include those sequences with deletions, inser Glutamic acid Glu E GAA GAG tions, or Substitutions of different nucleotides, resulting in a polynucleotide encoding a polypeptide with at least one Phenylalanine Phe F TTC TTT functional characteristic of the instant polypeptides. Glycine Gly G GGA GGC GGG GGT Included within this definition are polymorphisms which may or may not be readily detectable using a particular Histidine His H CAC CAT oligonucleotide probe of the polynucleotide encoding the instant polypeptides, and improper or unexpected hybrid Isoleucine Ile I ATA ATC ATT ization to allelic variants, with a locus other than the normal Lysine Lys K AAA AAG chromosomal locus for the polynucleotide sequence encod ing the instant polypeptides. Leucine Leu L A TTG CTA CTC CTG CTT 0113 Allelic variant refers to any of two or more alter Methionine Met M ATG native forms of a gene occupying the same chromosomal locus. Allelic variation arises naturally through mutation, Asparagine Asn. N AAC AAT and may result in phenotypic polymorphism within popu Proline Pro P CCA CCC CCG CCT lations. Gene mutations can be silent (i.e., no change in the encoded polypeptide) or may encode polypeptides having Glutamine Glin Q CAA CAG altered amino acid sequence. The term allelic variant is also Arginine Arg R AGA AGG CGA CGC CGG CGT used herein to denote a protein encoded by an allelic variant of a gene. Splice variant refers to alternative forms of RNA Serine Ser S AGC AGT TCA TCC TCG TCT transcribed from a gene. Splice variation arises naturally through use of alternative splicing sites within a transcribed Threonine Thir T ACA ACC ACG ACT RNA molecule, or less commonly between separately tran Waline Wal W GTA GTG. GTG GTT scribed RNA molecules, and may result in several mRNAs transcribed from the same gene. Splice variants may encode Tryptophan Trp W TGG polypeptides having altered amino acid sequence. The term Tyrosine Tyr Y TAC TAT splice variant is also used herein to denote a protein encoded by a splice variant of an mRNA transcribed from a gene. 0114 Those skilled in the art would recognize that G47, 0116 Sequence alterations that do not change the amino SEQ ID NO: 66, represents a single transcription factor; acid sequence encoded by the polynucleotide are termed allelic variation and alternative splicing may be expected to “silent variations. With the exception of the codons ATG occur. Allelic variants of SEQ ID NO: 65 can be cloned by and TGG, encoding methionine and tryptophan, respec probing cDNA or genomic libraries from different individual tively, any of the possible codons for the same amino acid organisms according to standard procedures. Allelic variants can be substituted by a variety of techniques, e.g., site of the DNA sequence shown in SEQ ID NO: 65, including directed mutagenesis, available in the art. Accordingly, any those containing silent mutations and those in which muta and all Such variations of a sequence selected from the above tions result in amino acid sequence changes, are within the table are a feature of the invention. Scope of the present invention, as are proteins which are allelic variants of SEQ ID NO: 66. cDNAs generated from 0.117) In addition to silent variations, other conservative alternatively spliced mRNAs, which retain the properties of variations that alter one, or a few amino acids in the encoded US 2006/0195944 A1 Aug. 31, 2006 polypeptide, can be made without altering the function of the polypeptide, these conservative variants are, likewise, a TABLE 3 feature of the invention. Residue Similar Substitutions 0118 For example, substitutions, deletions and insertions Ala Ser: Thr; Gly; Val; Leu: Ile introduced into the sequences provided in the Sequence Arg Lys; His; Gly Listing are also envisioned by the invention. Such sequence ASn Gln: His: Gly: Ser: Thr modifications can be engineered into a sequence by site Asp Glu, Ser: Thr Gln ASn; Ala directed mutagenesis (Wu (ed.) Meth. Enzymol. (1993) vol. Cys Ser: Gly 217, Academic Press) or the other methods noted below. Glu Asp Amino acid Substitutions are typically of single residues; Gly Pro; Arg insertions usually will be on the order of about from 1 to 10 His ASn; Gln: Tyr; Phe: Lys; Arg Ile Ala; Leu; Val; Gly; Met amino acid residues; and deletions will range about from 1 Leu Ala: Ile: Val; Gly; Met to 30 residues. In preferred embodiments, deletions or Lys Arg; His; Glin; Gly; Pro insertions are made in adjacent pairs, e.g., a deletion of two Met Leu: Ile; Phe residues or insertion of two residues. Substitutions, dele Phe Met; Leu: Tyr; Trp: His; Val; Ala Ser Thr; Gly: Asp; Ala; Val: Ile: His tions, insertions or any combination thereof can be com Thr Ser; Val; Ala; Gly bined to arrive at a sequence. The mutations that are made Trp Tyr; Phe: His in the polynucleotide encoding the transcription factor Tyr Trp; Phe: His should not place the sequence out of reading frame and Wall Ala: Ile: Leu; Gly; Thr; Ser; Glu should not create complementary regions that could produce secondary mRNA structure. Preferably, the polypeptide encoded by the DNA performs the desired function. 0121 Substitutions that are less conservative than those in Table 2 can be selected by picking residues that differ 0119) Conservative substitutions are those in which at more significantly in their effect on maintaining (a) the least one residue in the amino acid sequence has been structure of the polypeptide backbone in the area of the removed and a different residue inserted in its place. Such Substitution, for example, as a sheet or helical conformation, Substitutions generally are made in accordance with the (b) the charge or hydrophobicity of the molecule at the target Table 2 when it is desired to maintain the activity of the site, or (c) the bulk of the side chain. The substitutions which protein. Table 2 shows amino acids which can be substituted in general are expected to produce the greatest changes in for an amino acid in a protein and which are typically protein properties will be those in which (a) a hydrophilic regarded as conservative Substitutions. residue, e.g., Seryl or threonyl, is Substituted for (or by) a hydrophobic residue, e.g., leucyl, isoleucyl, phenylalanyl. TABLE 2 valyl or alanyl; (b) a cysteine or proline is substituted for (or Conservative by) any other residue; (c) a residue having an electropositive Residue Substitutions side chain, e.g., lysyl, arginyl, or histidyl, is Substituted for (or by) an electronegative residue, e.g., glutamyl or aspartyl: Ala Ser Arg Lys or (d) a residue having a bulky side chain, e.g., phenylala ASn Gln: His nine, is Substituted for (or by) one not having a side chain, Asp Glu e.g., glycine. Gln ASn Cys Ser Further Modifying Sequences of the Invention—Mutation/ Glu Asp Forced Evolution Gly Pro His ASn; Glin Ile Leu, Val 0122) In addition to generating silent or conservative Leu Ile: Val Substitutions as noted, above, the present invention option Lys Arg: Glin ally includes methods of modifying the sequences of the Met Leu: Ile Sequence Listing. In the methods, nucleic acid or protein Phe Met; Leu: Tyr Ser Thr; Gly modification methods are used to alter the given sequences Thr Ser; Val to produce new sequences and/or to chemically or enzy Trp Tyr matically modify given sequences to change the properties Tyr Trp; Phe of the nucleic acids or proteins. Wall Ile: Leu 0123 Thus, in one embodiment, given nucleic acid sequences are modified, e.g., according to standard 0120 Similar substitutions are those in which at least one mutagenesis or artificial evolution methods to produce residue in the amino acid sequence has been removed and a modified sequences. The modified sequences may be created different residue inserted in its place. Such substitutions using purified natural polynucleotides isolated from any generally are made in accordance with the Table 3 when it organism or may be synthesized from purified compositions is desired to maintain the activity of the protein. Table 3 and chemicals using chemical means well know to those of shows amino acids which can be substituted for an amino skill in the art. For example, Ausubel, Supra, provides acid in a protein and which are typically regarded as additional details on mutagenesis methods. Artificial forced structural and functional Substitutions. For example, a resi evolution methods are described, for example, by Stemmer due in column 1 of Table 3 may be substituted with residue (1994) Nature 370: 389-391, Stemmer (1994) Proc. Natl. in column 2; in addition, a residue in column 2 of Table 3 Acad. Sci. USA 91: 10747-10751, and U.S. Pat. Nos. 5,811, may be substituted with the residue of column 1. 238, 5,837,500, and 6.242.568. Methods for engineering US 2006/0195944 A1 Aug. 31, 2006 synthetic transcription factors and other polypeptides are Expression and Modification of Polypeptides described, for example, by Zhanget al. (2000).J. Biol. Chem. 275: 33850-33860, Liu et al. (2001) J. Biol. Chem. 276: 0129. Typically, polynucleotide sequences of the inven 11323-11334, and Isalanet al. (2001) Nature Biotechnol. 19: tion are incorporated into recombinant DNA (or RNA) 656-660. Many other mutation and evolution methods are molecules that direct expression of polypeptides of the also available and expected to be within the skill of the invention in appropriate host cells, transgenic plants, in vitro practitioner. translation systems, or the like. Due to the inherent degen eracy of the genetic code, nucleic acid sequences which 0124 Similarly, chemical or enzymatic alteration of encode Substantially the same or a functionally equivalent expressed nucleic acids and polypeptides can be performed amino acid sequence can be substituted for any listed by standard methods. For example, sequence can be modi sequence to provide for cloning and expressing the relevant fied by addition of lipids, Sugars, peptides, organic or homologue. inorganic compounds, by the inclusion of modified nucle otides or amino acids, or the like. For example, protein Vectors, Promoters, and Expression Systems modification techniques are illustrated in Ausubel, Supra. 0.130. The present invention includes recombinant con Further details on chemical and enzymatic modifications can structs comprising one or more of the nucleic acid sequences be found herein. These modification methods can be used to herein. The constructs typically comprise a vector, such as a modify any given sequence, or to modify any sequence plasmid, a cosmid, a phage, a virus (e.g., a plant Virus), a produced by the various mutation and artificial evolution bacterial artificial chromosome (BAC), a yeast artificial modification methods noted herein. chromosome (YAC), or the like, into which a nucleic acid 0125. Accordingly, the invention provides for modifica sequence of the invention has been inserted, in a forward or tion of any given nucleic acid by mutation, evolution, reverse orientation. In a preferred aspect of this embodi chemical or enzymatic modification, or other available ment, the construct further comprises regulatory sequences, methods, as well as for the products produced by practicing including, for example, a promoter, operably linked to the Such methods, e.g., using the sequences herein as a starting sequence. Large numbers of Suitable vectors and promoters Substrate for the various modification approaches. are known to those of skill in the art, and are commercially 0126 For example, optimized coding sequence contain available. ing codons preferred by a particular prokaryotic or eukary 0131 General texts that describe molecular biological otic host can be used e.g., to increase the rate of translation techniques useful herein, including the use and production or to produce recombinant RNA transcripts having desirable of vectors, promoters and many other relevant topics, properties, such as a longer half-life, as compared with include Berger, Sambrook and Ausubel, supra. Any of the transcripts produced using a non-optimized sequence. Trans identified sequences can be incorporated into a cassette or lation stop codons can also be modified to reflect host vector, e.g., for expression in plants. A number of expression preference. For example, preferred stop codons for Saccha vectors suitable for stable transformation of plant cells or for romyces cerevisiae and mammals are TAA and TGA, respec the establishment of transgenic plants have been described tively. The preferred stop codon for monocotyledonous including those described in Weissbach and Weissbach, plants is TGA, whereas insects and E. coli prefer to use TAA (1989) Methods for Plant Molecular Biology, Academic as the stop codon. Press, and Gelvin et al., (1990) Plant Molecular Biology 0127. The polynucleotide sequences of the present inven Manual, Kluwer Academic Publishers. Specific examples tion can also be engineered in order to alter a coding include those derived from a Ti plasmid of Agrobacterium sequence for a variety of reasons, including but not limited tumefaciens, as well as those disclosed by Herrera-Estrella to, alterations which modify the sequence to facilitate clon et al. (1983) Nature 303: 209, Bevan (1984) Nucl Acid Res. ing, processing and/or expression of the gene product. For 12: 8711-8721, Klee (1985) Bio/Technology 3: 637-642, for example, alterations are optionally introduced using tech dicotyledonous plants. niques which are well known in the art, e.g., site-directed mutagenesis, to insert new restriction sites, to alter glyco 0.132. Alternatively, non-Ti vectors can be used to trans Sylation patterns, to change codon preference, to introduce fer the DNA into monocotyledonous plants and cells by splice sites, etc. using free DNA delivery techniques. Such methods can involve, for example, the use of liposomes, electroporation, 0128. Furthermore, a fragment or domain derived from microprojectile bombardment, silicon carbide whiskers, and any of the polypeptides of the invention can be combined viruses. By using these methods transgenic plants such as with domains derived from other transcription factors or wheat, rice (Christou (1991) Bio/Technology 9: 957-962) synthetic domains to modify the biological activity of a and corn (Gordon-Kamm (1990) Plant Cell 2: 603-618) can transcription factor. For instance, a DNA-binding domain be produced. An immature embryo can also be a good target derived from a transcription factor of the invention can be tissue for monocots for direct DNA delivery techniques by combined with the activation domain of another transcrip using the particle gun (Weeks et al. (1993) Plant Physiol. tion factor or with a synthetic activation domain. A tran 102: 1077-1084; Vasil (1993) Bio/Technology 10: 667-674; Scription activation domain assists in initiating transcription Wan and Lemeaux (1994) Plant Physiol. 104:3748, and for from a DNA-binding site. Examples include the transcrip Agrobacterium-mediated DNA transfer (Ishida et al. (1996) tion activation region of VP16 or GAL4 (Moore et al. (1998) Proc. Natl. Acad. Sci. USA 95: 376-381; and Aoyama et al. Nature Biotech. 14: 745-750). (1995) Plant Cell 7: 1773-1785), peptides derived from 0.133 Typically, plant transformation vectors include one bacterial sequences (Ma and Ptashne (1987) Cell 51; 113 or more cloned plant coding sequence (genomic or cDNA) 119) and synthetic peptides (Giniger and Ptashne, (1987) under the transcriptional control of 5' and 3' regulatory Nature 330: 670-672). sequences and a dominant selectable marker. Such plant US 2006/0195944 A1 Aug. 31, 2006

transformation vectors typically also contain a promoter 270: 1986-1988); or late seed development (Odell et al. (e.g., a regulatory region controlling inducible or constitu (1994) Plant Physiol. 106: 447-458). tive, environmentally- or developmentally-regulated, or 0.136 Plant expression vectors can also include RNA cell- or tissue-specific expression), a transcription initiation processing signals that can be positioned within, upstream or start site, an RNA processing signal (such as intron splice downstream of the coding sequence. In addition, the expres sites), a transcription termination site, and/or a polyadeny sion vectors can include additional regulatory sequences lation signal. from the 3'-untranslated region of plant genes, e.g., a 3' 0134 Examples of constitutive plant promoters which terminator region to increase mRNA stability of the mRNA, can be useful for expressing the TF sequence include: the such as the PI-II terminator region of potato or the octopine cauliflower mosaic virus (CaMV) 35S promoter, which or nopaline synthase 3' terminator regions. confers constitutive, high-level expression in most plant tissues (see, e.g., Odell et al. (1985) Nature 313: 810-812); 0.137 The following represent specific examples of the nopaline synthase promoter (An et al. (1988) Plant expression constructs used to overexpress sequences of the Physiol. 88: 547-552); and the octopine synthase promoter invention. The choice of promoters may include, for (Fromm et al. (1989) Plant Cell 1: 977-984). example, the constitutive CaMV 35S promoter, the STM 0135) A variety of plant gene promoters that regulate shoot apical meristem-specific promoter, the CUT1 epider gene expression in response to environmental, hormonal, mal-specific promoter, the LTP1 epidermal-specific pro chemical, developmental signals, and in a tissue-active moter, the SUC2 vascular-specific promoter, the RBCS3 manner can be used for expression of a TF sequence in leaf-specific promoter, the ARSK1 root-specific promoter, plants. Choice of a promoter is based largely on the pheno the RD29A stress inducible promoter, the AP1 floral mer type of interest and is determined by Such factors as tissue istem-specific promoter (SEQ ID NO: 209-217, respec (e.g., seed, fruit, root, pollen, vascular tissue, flower, carpel, tively). Many of these examples have been used to produce etc.), inducibility (e.g., in response to wounding, heat, cold, transgenic plants. These or other inducible or tissue-specific drought, light, pathogens, etc.), timing, developmental stage, promoters may be incorporated into an expression vector and the like. Numerous known promoters have been char comprising a transcription factor polynucleotide of the acterized and can favorably be employed to promote expres invention, where the promoter is operably linked to the sion of a polynucleotide of the invention in a transgenic transcription factor polynucleotide, can be envisioned and plant or cell of interest. For example, tissue specific pro produced. moters include: seed-specific promoters (such as the napin, 0.138 P894 (SEQ ID NO: 218) contained a 35S:G47 phaseolin or DC3 promoter described in U.S. Pat. No. direct fusion and carries KanR. The construct contains a G47 5,773.697), fruit-specific promoters that are active during cDNA clone. fruit ripening (such as the dru 1 promoter (U.S. Pat. No. 5,783,393), or the 2A11 promoter (U.S. Pat. No. 4,943,674) 0.139. An alternative means of overexpressing G47 and the tomato polygalacturonase promoter (Bird et al. makes use of the two constructs P6506 (SEQ ID NO. 233; (1988) Plant Mol. Biol. 11: 651), root-specific promoters, 35S:LexA-GAL4TA) and P3853 (SEQ ID NO: 224; such as those disclosed in U.S. Pat. Nos. 5,618,988, 5,837, opLeXA::G47), which together constituted a two-component 848 and 5,905,186, pollen-active promoters such as PTA29, system for expression of G47 from the 35S promoter. A PTA26 and PTA13 (U.S. Pat. No. 5,792,929), promoters kanamycin resistant transgenic line containing P6506 was active in vascular tissue (Ringli and Keller (1998) Plant established, and this was then supertransformed with the Mol. Biol. 37: 977-988), flower-specific (Kaiser et al. (1995) P3853 construct containing a cDNA clone of G47 and a Plant Mol. Biol. 28: 231-243), pollen (Baerson et al. (1994) Sulfonamide resistance marker. Plant Mol. Biol. 126: 1947-1959), carpels (Ohlet al. (1990) 0140) P1572 (SEQID NO: 219) comprised a 35S::G2133 Plant Cell 2: 837-848), pollen and ovules (Baerson et al. direct promoter fusion and carries KanR. The construct (1993) Plant Mol. Biol. 22:255-267), auxin-inducible pro contains a cDNA clone of G2133 moters (such as that described in van der Kop et al. (1999) Plant Mol. Biol. 39:979-990 or Baumann et al. (1999) Plant 0141 P23456 (SEQ ID NO: 220) contained a Cell 11: 323-334), cytokinin-inducible promoter (Guevara 35S::G3649 direct promoter fusion and carries KanR. The Garcia (1998) Plant Mol. Biol. 38: 743-753), promoters construct contains a cDNA clone of G3649. responsive to gibberellin (Shi et al. (1998) Plant Mol. Biol. 0142 P23455 (SEQ ID NO: 221) contained a 38: 1053-1060, Willmott et al. (1998) Plant Mol. Biol. 38: 35S::G3644 direct promoter fusion and carries KanR. The 817-825) and the like. Additional promoters are those that construct contains a cDNA clone of G3644. elicit expression in response to heat (Ainley et al. (1993) Plant Mol. Biol. 22: 13-23), light (e.g., the pea rbcS-3A 0143 P23465 (SEQ ID NO: 222) contained a promoter, Kuhlemeier et al. (1989) Plant Cell 1: 471, and the 35S::G3643 direct fusion and carries KanR. The construct maize rbcS promoter, Schaffner and Sheen (1991) Plant Cell harbors a cDNA clone of G3643. 3: 997); wounding (e.g., wunI, Siebertz et al. (1989) Plant 0144) P25402 (SEQ ID NO: 223) contained a Cell 1: 961); pathogens (such as the PR-1 promoter 35S::G3650 direct fusion and carries KanR. The construct described in Buchel et al. (1999) Plant Mol. Biol. 40: contains a cDNA clone. 387-396, and the PDF1.2 promoter described in Manners et al. (1998) Plant Mol. Biol. 38: 1071-80), and chemicals such 0145 The two constructs P5318 (SEQ ID NO: 225; as methyl jasmonate or salicylic acid (Gatz et al. (1997) STM::LexA-GAL4TA) and P3853 (SEQ ID NO: 224; Plant Mol. Biol. 48: 89-108). In addition, the timing of the opLeXA::G47) together constitute a two-component system expression can be controlled by using promoters such as for expression of G47 from the STM promoter. Kanamycin those acting at Senescence (An and Amazon (1995) Science resistant transgenic lines containing P5318 were established US 2006/0195944 A1 Aug. 31, 2006

(lines #5 and #10), and these were then supertransformed marker. SEQ ID NO: 242 is the predicted polypeptide that with the P3853 construct containing a cDNA clone of G47 results expression of the vector comprising SEQ ID NO: and a Sulfonamide resistance marker. 239. 0146) The two constructs P5288 (SEQ ID NO: 226: 0154 Similar to constructs made with G47, other vectors CUT1:LexA-GAL4TA) and P3853 (SEQ ID NO: 224; may be produced that incorporate a promoter and other opLeXA::G47) together constitute a two-component system transcription factor polynucleotide combination. For for expression of G47 from the CUT1 promoter. A kanamy example, the two constructs P9002 (SEQ ID NO: 237; cin resistant transgenic line containing P5288 was estab RD29A::LexA-GAL4TA) and P4361 (SEQ ID NO: 227; lished, and this was then supertransformed with the P3853 opLeXA::G2133) together constitute a two-component sys construct containing a cDNA clone of G47 and a sulfona tem for expression of G2133 from the RD29A promoter. A mide resistance marker. kanamycin resistant transgenic line containing P9002 was established, and this was then supertransformed with the 0147 The two constructs P5284 (SEQ ID NO: 235; P4361 construct containing a cDNA clone of G2133 and a RBCS3:LexA-GAL4TA) and P3853 (SEQ ID NO: 224; Sulfonamide resistance marker. opLeXA::G47) together constituted a two-component sys tem for expression of G47 from the RBCS3 promoter. A Additional Expression Elements kanamycin resistant transgenic line containing P5284 was 0.155 Specific initiation signals can aid in efficient trans established, and this was then supertransformed with the lation of coding sequences. These signals can include, e.g., P3853 construct containing a cDNA clone of G47 and a the ATG initiation codon and adjacent sequences. In cases Sulfonamide resistance marker. where a coding sequence, its initiation codon and upstream 0148. The two constructs P5290 (SEQ ID NO. 234: sequences are inserted into the appropriate expression vec SUC2:LexA-GAL4TA) and P3853 (SEQ ID NO: 224; tor, no additional translational control signals may be opLeXA::G47) together constitute a two-component system needed. However, in cases where only coding sequence for expression of G47 from the SUC2 promoter. A kanamy (e.g., a mature protein coding sequence), or a portion cin resistant transgenic line containing P5290 was estab thereof, is inserted, exogenous transcriptional control sig lished, and this was then supertransformed with the P3853 nals including the ATG initiation codon can be separately construct containing a cDNA clone of G47 and a sulfona provided. The initiation codon is provided in the correct mide resistance marker. reading frame to facilitate transcription. Exogenous tran scriptional elements and initiation codons can be of various 0149) The two constructs P5311 (SEQ ID NO: 236; origins, both natural and synthetic. The efficiency of expres ARSK1:LexA-GAL4TA) and P3853 (SEQ ID NO: 224; sion can be enhanced by the inclusion of enhancers appro opLeXA::G47) together constitute a two-component system priate to the cell system in use. for expression of G47 from the ARSK1 promoter. A kana mycin resistant transgenic line containing P5311 was estab Expression Hosts lished, and this was then supertransformed with the P3853 0156 The present invention also relates to host cells construct containing a cDNA clone of G47 and a sulfona which are transduced with vectors of the invention, and the mide resistance marker. production of polypeptides of the invention (including frag ments thereof) by recombinant techniques. Host cells are 0150. The two constructs P9002 (SEQ ID NO. 237; genetically engineered (i.e., nucleic acids are introduced, RD29A::LexA-GAL4TA) and P3853 (SEQ ID NO: 224; e.g., transduced, transformed or transfected) with the vectors opLeXA::G47) together constitute a two-component system of this invention, which may be, for example, a cloning for expression of G47 from the RD29A promoter. A kana vector or an expression vector comprising the relevant mycin resistant transgenic line (#5) containing P9002 was nucleic acids herein. The vector is optionally a plasmid, a established, and this was then supertransformed with the viral particle, a phage, a naked nucleic acid, etc. The P3853 construct containing a cDNA clone of G47 and a engineered host cells can be cultured in conventional nutri Sulfonamide resistance marker. ent media modified as appropriate for activating promoters, 0151. The two constructs P5326 (SEQ ID NO. 238; selecting transformants, or amplifying the relevant gene. AP1:LexA-GAL4TA) and P3853 (SEQ ID NO: 224; The culture conditions, such as temperature, pH and the like, opLeXA::G47) together constitute a two-component system are those previously used with the host cell selected for for expression of G47 from the AP1 promoter. A kanamycin expression, and will be apparent to those skilled in the art resistant transgenic line containing P5326 was established, and in the references cited herein, including, Sambrook and and this was then supertransformed with the P3853 construct Ausubel. containing a cDNA clone of G47 and a sulfonamide resis 0157 The host cell can be a eukaryotic cell, such as a tance marker. yeast cell, or a plant cell, or the host cell can be a prokaryotic 0152 P25186 (SEQID NO: 239) contains a 35S::GAL4 cell. Such as a bacterial cell. Plant protoplasts are also G47 fusion and carries KanR (addition to the G47 protein of suitable for some applications. For example, the DNA a strong transcription activation domain from the yeast fragments are introduced into plant tissues, cultured plant GAL4 gene). SEQ ID NO: 240 is the predicted polypeptide cells or plant protoplasts by Standard methods including that results expression of the vector comprising SEQID NO: electroporation (Fromm et al. (1985) Proc. Natl. Acad. Sci. 239. USA 82: 5824-5828, infection by viral vectors such as cauliflower mosaic virus (CaMV) (Hohn et al. (1982) 0153 P25279 (SEQ ID NO: 241) carries a 35S:G47 Molecular Biology of Plant Tumors, (Academic Press, New GFP fusion directly fused to the 35S promoter and a KanR York) pp. 549-560; U.S. Pat. No. 4,407,956), high velocity US 2006/0195944 A1 Aug. 31, 2006 20 ballistic penetration by small particles with the nucleic acid (Small or large molecules) and/or inorganic compounds that either within the matrix of small beads or particles, or on the affect expression of (i.e., regulate) a particular transcription surface (Klein et al., (1987) Nature 327: 70-73), use of factor. Alternatively, Such molecules include endogenous pollen as vector (WO 85/01856), or use of Agrobacterium molecules that are acted upon either at a transcriptional level tumefaciens or A. rhizogenes carrying a T-DNA plasmid in by a transcription factor of the invention to modify a which DNA fragments are cloned. The T-DNA plasmid is phenotype as desired. For example, the transcription factors transmitted to plant cells upon infection by Agrobacterium can be employed to identify one or more downstream gene tumefaciens, and a portion is stably integrated into the plant with which is subject to a regulatory effect of the transcrip genome (Horsch et al. (1984) Science 233: 496-498; Fraley tion factor. In one approach, a transcription factor or tran et al. (1983) Proc. Natl. Acad. Sci. USA 80: 4803-4807). Scription factor homologue of the invention is expressed in a host cell, e.g., a transgenic plant cell, tissue or explant, and 0158. The cell can include a nucleic acid of the invention expression products, either RNA or protein, of likely or which encodes a polypeptide, wherein the cells expresses a random targets are monitored, e.g., by hybridization to a polypeptide of the invention. The cell can also include microarray of nucleic acid probes corresponding to genes vector sequences, or the like. Furthermore, cells and trans expressed in a tissue or cell type of interest, by two genic plants that include any polypeptide or nucleic acid dimensional gel electrophoresis of protein products, or by above or throughout this specification, e.g., produced by any other method known in the art for assessing expression transduction of a vector of the invention, are an additional of gene products at the level of RNA or protein. Alterna feature of the invention. tively, a transcription factor of the invention can be used to 0159 For long-term, high-yield production of recombi identify promoter sequences (i.e., binding sites) involved in nant proteins, stable expression can be used. Host cells the regulation of a downstream target. After identifying a transformed with a nucleotide sequence encoding a polypep promoter sequence, interactions between the transcription tide of the invention are optionally cultured under conditions factor and the promoter sequence can be modified by suitable for the expression and recovery of the encoded changing specific nucleotides in the promoter sequence or protein from cell culture. The protein or fragment thereof specific amino acids in the transcription factor that interact produced by a recombinant cell may be secreted, membrane with the promoter sequence to alter a plant trait. Typically, bound, or contained intracellularly, depending on the transcription factor DNA-binding sites are identified by gel sequence and/or the vector used. As will be understood by shift assays. After identifying the promoter regions, the those of skill in the art, expression vectors containing promoter region sequences can be employed in double polynucleotides encoding mature proteins of the invention stranded DNA arrays to identify molecules that affect the can be designed with signal sequences which direct secre interactions of the transcription factors with their promoters tion of the mature polypeptides through a prokaryotic or (Bulyket al. (1999) Nature Biotechnol. 17: 573-577). eukaryotic cell membrane. 0164. The identified transcription factors are also useful Modified Amino Acid Residues to identify proteins that modify the activity of the transcrip tion factor. Such modification can occur by covalent modi 0160 Polypeptides of the invention may contain one or fication, such as by phosphorylation, or by protein-protein more modified amino acid residues. The presence of modi (homo or -heteropolymer) interactions. Any method suitable fied amino acids may be advantageous in, for example, for detecting protein-protein interactions can be employed. increasing polypeptide half-life, reducing polypeptide anti Among the methods that can be employed are co-immuno genicity or toxicity, increasing polypeptide storage stability, precipitation, cross-linking and co-purification through gra or the like. Amino acid residue(s) are modified, for example, dients or chromatographic columns, and the two-hybrid co-translationally or post-translationally during recombinant yeast system. production or modified by Synthetic or chemical means. 0.165. The two-hybrid system detects protein interactions 0161 Non-limiting examples of a modified amino acid in vivo and is described in Chien et al. (1991), Proc. Natl. residue include incorporation or other use of acetylated Acad. Sci. USA 88: 9578-9582) and is commercially avail amino acids, glycosylated amino acids, Sulfated amino able from Clontech (Palo Alto, Calif.). In such a system, acids, prenylated (e.g., farnesylated, geranylgeranylated) plasmids are constructed that encode two hybrid proteins: amino acids, PEG modified (e.g., “PEGylated) amino one consists of the DNA-binding domain of a transcription acids, biotinylated amino acids, carboxylated amino acids, activator protein fused to the TF polypeptide and the other phosphorylated amino acids, etc. References adequate to consists of the transcription activator protein’s activation guide one of skill in the modification of amino acid residues domain fused to an unknown protein that is encoded by a are replete throughout the literature. cDNA that has been recombined into the plasmid as part of a cDNA library. The DNA-binding domain fusion plasmid 0162 The modified amino acid residues may prevent or and the cDNA library are transformed into a strain of the increase affinity of the polypeptide for another molecule, yeast Saccharomyces cerevisiae that contains a reporter gene including, but not limited to, polynucleotide, proteins, car (e.g., lacZ) whose regulatory region contains the transcrip bohydrates, lipids and lipid derivatives, and other organic or tion activator's binding site. Either hybrid protein alone synthetic compounds. cannot activate transcription of the reporter gene. Interaction of the two hybrid proteins reconstitutes the functional acti Identification of Additional Factors vator protein and results in expression of the reporter gene, 0163 A transcription factor provided by the present which is detected by an assay for the reporter gene product. invention can also be used to identify additional endogenous Then, the library plasmids responsible for reporter gene or exogenous molecules that can affect a phenotype or trait expression are isolated and sequenced to identify the pro of interest. On the one hand, such molecules include organic teins encoded by the library plasmids. After identifying US 2006/0195944 A1 Aug. 31, 2006

proteins that interact with the transcription factors, assays e.g., U.S. Pat. No. 5,539,083), and small organic molecule for compounds that interfere with the TF protein-protein libraries (see, e.g., benzodiazepines, Baum Chem. Eng. interactions can be preformed. News January 18, page 33 (1993); isoprenoids, U.S. Pat. No. 5,569,588; thiazolidinones and metathiazanones, U.S. Pat. Identification of Modulators No. 5,549,974; pyrrolidines, U.S. Pat. Nos. 5,525,735 and 0166 In addition to the intracellular molecules described 5,519,134; morpholino compounds, U.S. Pat. No. 5,506, above, extracellular molecules that alter activity or expres 337) and the like. sion of a transcription factor, either directly or indirectly, can be identified. For example, the methods can entail first 0170 Preparation and screening of combinatorial or placing a candidate molecule in contact with a plant or plant other libraries is well known to those of skill in the art. Such cell. The molecule can be introduced by topical administra combinatorial chemical libraries include, but are not limited tion, such as spraying or soaking of a plant, and then the to, peptide libraries (see, e.g., U.S. Pat. No. 5,010, 175: Furka molecule's effect on the expression or activity of the TF (1991) Int. J. Pept. Prot. Res. 37: 487-493; and Houghton et polypeptide or the expression of the polynucleotide moni al. (1991) Nature 354: 84-88). Other chemistries for gener tored. Changes in the expression of the TF polypeptide can ating chemical diversity libraries can also be used. be monitored by use of polyclonal or monoclonal antibodies, 0171 In addition, as noted, compound screening equip gel electrophoresis or the like. Changes in the expression of ment for high-throughput Screening is generally available, the corresponding polynucleotide sequence can be detected e.g., using any of a number of well known robotic systems by use of microarrays, Northerns, quantitative PCR, or any that have also been developed for solution phase chemistries other technique for monitoring changes in mRNA expres useful in assay systems. These systems include automated Sion. These techniques are exemplified in Ausubel et al. workstations including an automated synthesis apparatus (eds) Current Protocols in Molecular Biology, John Wiley & and robotic systems utilizing robotic arms. Any of the above Sons (1998, and supplements through 2001). Such changes devices are suitable for use with the present invention, e.g., in the expression levels can be correlated with modified for high-throughput screening of potential modulators. The plant traits and thus identified molecules can be useful for nature and implementation of modifications to these devices soaking or spraying on fruit, vegetable and grain crops to (if any) so that they can operate as discussed herein will be modify traits in plants. apparent to persons skilled in the relevant art. 0167 Essentially any available composition can be tested 0172 Indeed, entire high throughput screening systems for modulatory activity of expression or activity of any are commercially available. These systems typically auto nucleic acid or polypeptide herein. Thus, available libraries mate entire procedures including all sample and reagent of compounds such as chemicals, polypeptides, nucleic pipetting, liquid dispensing, timed incubations, and final acids and the like can be tested for modulatory activity. readings of the microplate in detector(s) appropriate for the Often, potential modulator compounds can be dissolved in assay. These configurable systems provide high throughput aqueous or organic (e.g., DMSO-based) solutions for easy and rapid start up as well as a high degree of flexibility and delivery to the cell or plant of interest in which the activity customization. Similarly, microfluidic implementations of of the modulator is to be tested. Optionally, the assays are screening are also commercially available. designed to Screen large modulator composition libraries by 0173 The manufacturers of such systems provide automating the assay steps and providing compounds from detailed protocols the various high throughput. Thus, for any convenient Source to assays, which are typically run in example, Zymark Corp. provides technical bulletins describ parallel (e.g., in microtiter formats on microtiter plates in ing screening systems for detecting the modulation of gene robotic assays). transcription, ligand binding, and the like. The integrated 0168 In one embodiment, high throughput screening systems herein, in addition to providing for sequence align methods involve providing a combinatorial library contain ment and, optionally, synthesis of relevant nucleic acids, can ing a large number of potential compounds (potential modu include Such screening apparatus to identify modulators that lator compounds). Such “combinatorial chemical libraries' have an effect on one or more polynucleotides or polypep are then screened in one or more assays, as described herein, tides according to the present invention. to identify those library members (particular chemical spe 0.174. In some assays it is desirable to have positive cies or Subclasses) that display a desired characteristic controls to ensure that the components of the assays are activity. The compounds thus identified can serve as target working properly. At least two types of positive controls are compounds. appropriate. That is, known transcriptional activators or 0169. A combinatorial chemical library can be, e.g., a inhibitors can be incubated with cells/plants/etc. in one collection of diverse chemical compounds generated by sample of the assay, and the resulting increase/decrease in chemical synthesis or biological synthesis. For example, a transcription can be detected by measuring the resulting combinatorial chemical library Such as a polypeptide library increase in RNA/protein expression, etc., according to the is formed by combining a set of chemical building blocks methods herein. It will be appreciated that modulators can (e.g., in one example, amino acids) in every possible way for also be combined with transcriptional activators or inhibitors a given compound length (i.e., the number of amino acids in to find modulators that inhibit transcriptional activation or a polypeptide compound of a set length). Exemplary librar transcriptional repression. Either expression of the nucleic ies include peptide libraries, nucleic acid libraries, antibody acids and proteins herein or any additional nucleic acids or libraries (see, e.g., Vaughn et al. (1996) Nature Biotechnol., proteins activated by the nucleic acids or proteins herein, or 14:309-314 and PCT/US96/10287), carbohydrate libraries both, can be monitored. (see, e.g., Liang et al. Science (1996) 274: 1520-1522 and 0.175. In an embodiment, the invention provides a method U.S. Pat. No. 5,593,853), peptide nucleic acid libraries (see, for identifying compositions that modulate the activity or US 2006/0195944 A1 Aug. 31, 2006 22 expression of a polynucleotide or polypeptide of the inven a polypeptide fragment can comprise a recognizable struc tion. For example, a test compound, whether a small or large tural motif or functional domain such as a DNA binding molecule, is placed in contact with a cell, plant (or plant domain that binds to a specific DNA promoter region, an tissue or explant), or composition comprising the polynucle activation domain or a domain for protein-protein interac otide or polypeptide of interest and a resulting effect on the tions. cell, plant, (or tissue or explant) or composition is evaluated by monitoring, either directly or indirectly, one or more of Production of Transgenic Plants expression level of the polynucleotide or polypeptide, activ Modification of Traits ity (or modulation of the activity) of the polynucleotide or 0180. The polynucleotides of the invention are favorably polypeptide. In some cases, an alteration in a plant pheno employed to produce transgenic plants with various traits, or type can be detected following contact of a plant (or plant characteristics, that have been modified in a desirable man cell, or tissue or explant) with the putative modulator, e.g., ner, e.g., to improve the seed characteristics of a plant. For by modulation of expression or activity of a polynucleotide example, alteration of expression levels or patterns (e.g., or polypeptide of the invention. Modulation of expression or spatial or temporal expression patterns) of one or more of the activity of a polynucleotide or polypeptide of the invention transcription factors (or transcription factor homologues) of may also be caused by molecular elements in a signal the invention, as compared with the levels of the same transduction second messenger pathway and Such modula protein found in a wild type plant, can be used to modify a tion can affect similar elements in the same or another signal plant's traits. An illustrative example of trait modification, transduction second messenger pathway. improved characteristics, by altering expression levels of a Subsequences particular transcription factor is described further in the 0176). Also contemplated are uses of polynucleotides, Examples and the Sequence Listing. also referred to herein as oligonucleotides, typically having Arabidopsis as a Model System at least 12 bases, preferably at least 15, more preferably at 0181 Arabidopsis thaliana is the object of rapidly grow least 20, 30, or 50 bases, which hybridize under at least ing attention as a model for genetics and metabolism in highly stringent (or ultra-high Stringent or ultra-ultra-high plants. Arabidopsis has a small genome, and well docu stringent conditions) conditions to a polynucleotide mented Studies are available. It is easy to grow in large sequence described above. The polynucleotides may be used numbers and mutants defining important genetically con as probes, primers, sense and antisense agents, and the like, trolled mechanisms are either available, or can readily be according to methods as noted Supra. obtained. Various methods to introduce and express isolated 0177 Subsequences of the polynucleotides of the inven homologous genes are available (see Koncz, et al., eds. tion, including polynucleotide fragments and oligonucle Methods in Arabidopsis Research. et al. (1992), World otides are useful as nucleic acid probes and primers. An Scientific, New Jersey, New Jersey, in “Preface”). Because oligonucleotide Suitable for use as a probe or primer is at of its Small size, short life cycle, obligate autogamy and high least about 15 nucleotides in length, more often at least fertility, Arabidopsis is also a choice organism for the about 18 nucleotides, often at least about 21 nucleotides, isolation of mutants and studies in morphogenetic and frequently at least about 30 nucleotides, or about 40 nucle development pathways, and control of these pathways by otides, or more in length. A nucleic acid probe is useful in transcription factors (Koncz, Supra, p. 72). A number of hybridization protocols, e.g., to identify additional polypep studies introducing transcription factors into A. thaliana tide homologues of the invention, including protocols for have demonstrated the utility of this plant for understanding microarray experiments. Primers can be annealed to a the mechanisms of gene regulation and trait alteration in complementary target DNA strand by nucleic acid hybrid plants. See, for example, Koncz, supra, and U.S. Pat. No. ization to form a hybrid between the primer and the target 6,417,428). DNA strand, and then extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for Arabidopsis Genes in Transgenic Plants. amplification of a nucleic acid sequence, e.g., by the poly 0182 Expression of genes which encode transcription merase chain reaction (PCR) or other nucleic-acid amplifi factors modify expression of endogenous genes, polynucle cation methods. See Sambrook and Ausubel, Supra. otides, and proteins are well known in the art. In addition, transgenic plants comprising isolated polynucleotides 0178. In addition, the invention includes an isolated or encoding transcription factors may also modify expression recombinant polypeptide including a Subsequence of at least of endogenous genes, polynucleotides, and proteins. about 15 contiguous amino acids encoded by the recombi Examples include Peng et al. (1997) Genes Develop. 11: nant or isolated polynucleotides of the invention. For 3194-3205) and Peng et al. (1999) Nature 400: 256-261). In example, Such polypeptides, or domains or fragments addition, many others have demonstrated that an Arabidop thereof, can be used as immunogens, e.g., to produce anti sis transcription factor expressed in an exogenous plant bodies specific for the polypeptide sequence, or as probes for species elicits the same or very similar phenotypic response. detecting a sequence of interest. A Subsequence can range in See, for example, Fu et al. (2001) Plant Cell 13: 1791-1802); size from about 15 amino acids in length up to and including Nandi et al. (2000) Curr. Biol. 10: 215-218); Coupland the full length of the polypeptide. (1995) Nature 377: 482-483); and Weigel and Nilsson 0179 To be encompassed by the present invention, an (1995, Nature 377: 482-500). expressed polypeptide which comprises such a polypeptide Subsequence performs at least one biological function of the Homologous Genes Introduced into Transgenic Plants. intact polypeptide in Substantially the same manner, or to a 0183 Homologous genes that may be derived from any similar extent, as does the intact polypeptide. For example, plant, or from any source whether natural, synthetic, semi US 2006/0195944 A1 Aug. 31, 2006 synthetic or recombinant, and that share significant sequence 0186 The first column of Table 4 shows the polynucle identity or similarity to those provided by the present otide SEQ ID NO; the second column shows the Mendel invention, may be introduced into plants, for example, crop Gene ID No., GID; the third column shows the transcription plants, to confer desirable or improved traits. Consequently, factor family to which the polynucleotide belongs; the fourth transgenic plants may be produced that comprise a recom column shows the category of the trait; the fifth column binant expression vector or cassette with a promoter oper shows the trait(s) resulting from the knock out or overex ably linked to one or more sequences homologous to pres pression of the polynucleotide in the transgenic plant; the ently disclosed sequences. The promoter may be, for sixth column (“Comment'), includes specific effects and example, a plant or viral promoter. utilities conferred by the polynucleotide of the first column; 0184 The invention thus provides for methods for pre the seventh column shows the SEQ ID NO of the polypep paring transgenic plants, and for modifying plant traits. tide encoded by the polynucleotide; and the eighth column These methods include introducing into a plant a recombi shows the amino acid residue positions of the conserved nant expression vector or cassette comprising a functional domain in amino acid (AA) co-ordinates. promoter operably linked to one or more sequences homolo 0187. The first column (Col. 1) of Table 4 lists the SEQ gous to presently disclosed sequences. Plants and kits for ID NO: of presently disclosed polynucleotide sequences. producing these plants that result from the application of The second column lists the corresponding GID number. these methods are also encompassed by the present inven The third column shows the transcription factor family in tion. which each of the respective sequences is found. The fourth column lists the conserved domains in amino acid coordi Traits of Interest nates of the respective encoded polypeptide sequences. The 0185. Examples of some of the traits that may be desir fifth and sixth columns list the trait category and specific able in plants, and that may be provided by transforming the traits observed for plants overexpressing the respective plants with the presently disclosed sequences, are listed in sequences (except where noted as “KO in Col. 2 for plants Tables 4 and 6. in which the respective sequence was knocked out).

TABLE 4 Sequences of the invention and the traits they confer in plants

Col. 1 Col. 4 SEQ ID Col. 2 Col. 3 Conserved Col. 5 Col. 6 NO: GID No. Family domains Trait Category Observed trait(s) 1 G1272 PAZ 800-837 Seed glucosinolates Decrease in seed glucosinolate M39497 3 G1506 GATAZn 7 33 Seed glucosinolates Increase in glucosinolates M395.02 and M39498 5 G1897 Z-DOf 34 -62 Seed glucosinolates Increase in seed glucosinolates M39491 and M394.93 7 G1946 HS 37 -128 Seed glucosinolates Increase in seed glucosinolate M39501 Increased tolerance to phosphate-free media 9 G2113 AP2 SS-122 Seed glucosinolates Decrease in seed glucosinolate M39497, increase of glucosinolates M39501, M394.94 and M39478 11 G2117 bZIP 46-106 Seed glucosinolates Decrease in M394.96 13 G2155 AT-hook 18-38 Seed glucosinolates increase in M39497 Plant size Large plant size 1S G2290 WRKY 147 205 Seed glucosinolates increase in M39496 14 120 Seed glucosinolates Altered glucosinolate profile 41–61, 84-104 Seed glucosinolates increase in M394.94 23 G484 CAAT 11-104 Seed glucosinolates Altered glucosinolate (KO) profile 20-120 Seed glucosinolates increase in M395.01

27 G1052 bZIP 201-261 Seed prenyl lipids Decrease in lutein and increase in Xanthophyll 1 14-119 Seed prenyl lipids Decreased seed lutein

31 G1930 AP2 59-124, Seed prenyl lipids increased chlorophyll a 179. 273 and b content C/N sensing increased tolerance to ow nitrogen conditions in C/N sensing assay US 2006/0195944 A1 Aug. 31, 2006 24

TABLE 4-continued Sequences of the invention and the traits they confer in plants Col. 1 Col. 4 SEQ ID Col. 2 Col. 3 Conserved Col. 5 Col. 6 NO: GID No. Family domains Trait Category Observed trait(s) 33 G214 MYB 25 71 Seed prenyl lipids; increased seed lutein; related leaf fatty acids; increased leaf fatty acids; prenyl lipids increased chlorophyll, carotenoids Plant size Larger biomass (increased leaf number and size Prenyl lipids Darker green in vegetative and reproductive tissues due o a higher chlorophyll content in the later stages of development 35 G2509 AP2 89-156 Seed prenyl lipids increase in C-tocopherol 37 G252O HLHAMYC 139-197 Seed prenyl lipids; increase in seed 8 leaf glucosinolates ocopherol and decrease in seed Y-tocopherol.: increase in M39478 C/N sensing increased tolerance to ow nitrogen conditions in C/N sensing assay 39 G259 HS 40-131 Seed prenyl lipids increase in C-tocopherol 41 G490 CAAT 48-143 Seed prenyl lipids increase in seed 8 ocopherol 43 G652 Z-CLDSH 28 49, 137-151, Seed prenyl lipids; increase in C-tocopherol; 182-196 leaf glucosinolates increase in M394.80 45 G748 112-140 Seed prenyl lipids increased lutein content 47 G883 245-302 Seed prenyl lipids Decreased seed lutein 49 G20 68-144 Seed sterols increase in campesterol 51 G974 80-147 Seed oil content Altered seed oil content S3 G2343 14-116 Seed oil content increased seed oil content

55 G1777 124 247 Seed oil and protein increased seed oil content content and decreased seed protein 57 G229 14 120 Biochemistry: other Up-regulation of genes involved in secondary metabolism; Genes coding for enzymes involved in alkaloid biosynthesis including indole-3-glycerol phosphatase and strictosidine synthase were induced; genes for enzymes involved in aromatic amino acid biosynthesis were also up-regulated including tryptophan synthase and tyrosine transaminase. Phenylalanine ammonia yase, chalcone synthase and trans-cinnamate mono-oxygenase, involved in phenylpropenoid biosynthesis, were also induced S9 G663 9-111 Biochemistry: other increased anthocyanins in eaf, root, seed 61 G362 62-82 Biochemistry: other increased trichome density and trichome products; increased anthocyanins in various issues 63 G21 OS TH 100-153 Biochemistry: other increased trichome density and trichome products US 2006/0195944 A1 Aug. 31, 2006 25

TABLE 4-continued Sequences of the invention and the traits they confer in plants Col. 1 Col. 4 SEQ ID Col. 2 Col. 3 Conserved Col. 5 Col. 6 NO: GID No. Family domains Trait Category Observed trait(s) 6S G47 AP2 11–80 Flowering Time Increased lignin content Biochemistry: other Increased cold tolerance Abiotic stress Increased drought tolerance tolerance Increased desiccation tolerance Increased salt tolerance Late flowering Dark green Increased leaf size, larger rosettes and/or increased amount of vegetative tissue 67 G2123 GF14 99-109 Biochemistry: other Putative 14-3-3 protein 69 G1266 AP2 79-147 Leaf fatty acids, Changes in leaf fatty insoluble Sugars; acids, insoluble Sugars, decreased sensitivity to ABA C/N sensing increased tolerance to ow nitrogen conditions in C/N sensing assay 71 G1337 Z-CO-like Leaf fatty acids increase in the amount of oleic acid Sugar sensing Decreased tolerance to SUCOS 73 G1399 AT-hook 86-93 Leaf fatty acids increase of the percentage of the 16:0 fatty acid 7S G146S NAC 242-306 Leaf fatty acids increases in the percentages of 16:0, 16:1, 8:0 and 18:2 and decreases in 16:3 and 8:3 fatty acids 77 G1512 RINGi C3HC4 39-93 Leaf fatty acids increase in 18:2 fatty acids 79 G1537 HB 14 74 Leaf fatty acids Altered leaf fatty acid composition 81 G2136 MADS 43-1OO Leaf fatty acids Decrease in 18:3 fatty acid 83 G2147 HLHAMYC 163-220 Leaf fatty acids increase in 16:0, increase in 18:2 fatty acids 85 G377 RINGi C3H2C3 85-128 Leaf fatty acids increased 18:2 and decreased 18:3 leaf fatty acids 87 G962 NAC 53-175 Leaf fatty acids increased 16:0 and decreased 18:3 leaf fatty acids 89 G975 AP2 4 71 Leaf fatty acids increased wax in leaves increased C29, C31, and C33 alkanes increased up o 10-fold compared to control plants; More drought tolerant han controls C/N sensing increased tolerance to ow nitrogen conditions in C/N sensing assay 91 G987 SCR 395–462, Leaf fatty acids; leaf Reduction in 16:3 fatty 525-613, prenyl lipids acids; altered chlorophyll, 1027-1102, ocopherol, carotenoid 1162-1255 93 G1069 AT-hook 67 74 Leaf and seed Altered leaf glucosinolate glucosinolates composition increased seed glucosinolate M39497 increased 16:0 fatty acid, decreased 18:2 fatty acids, decreased sensitivity to ABA US 2006/0195944 A1 Aug. 31, 2006 26

TABLE 4-continued Sequences of the invention and the traits they confer in plants Col. 1 Col. 4 SEQ ID Col. 2 Col. 3 Conserved Col. 5 Col. 6 NO: GID No. Family domains Trait Category Observed trait(s) C/N sensing increased tolerance to ow nitrogen conditions in C/N sensing assay 95 G1198. 173-223 Leaf glucosinolates increase in M39481 97 G1322 26-130 Leaf glucosinolates increase in M394.80 C/N sensing increased tolerance to ow nitrogen conditions in C/N sensing assay 99 G1421 AP2 74-151 Leaf glucosinolates increased leaf content of glucosinolate M39482 101 G1794 AP2 182-249 Leaf glucosinolates increased leaf content of glucosinolate M3948.0 103 G2144 HLHAMYC 2O7 265 Leaf glucosinolates increased leaf content of glucosinolate M3948.0 C/N sensing increased tolerance to ow nitrogen conditions in C/N sensing assay OS G2S12 AP2 79-147 Leaf glucosinolates increased leaf content of glucosinolate M39481 C/N sensing increased tolerance to ow nitrogen conditions in C/N sensing assay O7 G2552 HLHAMYC 124 181 Leaf glucosinolates increased leaf content of glucosinolate M3948.0 09 G264 HS 23-114 Leaf glucosinolates increased leaf content of glucosinolate M39481 11 G681 MYB 14 120 Leaf glucosinolates increased leaf content of (R1)R2R3 glucosinolate M3948.0 13 G1012 WRKY Leaf insoluble Sugars Decreased rhamnose 1S G1309 MYB Leaf insoluble Sugars Ce3S(O 8OS (R1)R2R3 17 G158 MADS Leaf inso e Sugars Ce3St. rhamnose 19 G1641 MYB Leaf insoluble Sugars Ce3St. rhamnose related 21 G1865 GRF-like Leaf insoluble Sugars increased galactose, decreased xylose 23 G2094 43-68 Leaf inso l b e Sugars increase in arabinose 2S G211 24-137 Leaf inso l b e Sugars increase in xylose

27 G242 6-105 Leaf insoluble Sugars increased arabinose

29 G2S89 1 57 Leaf inso l b e Sugars increase in arabinose 31 G274 94-600 Leaf inso l b e Sugars increased leafarabinose 33 G598 2OS-263 Leaf inso l b e Sugars Altered insoluble Sugars; (increased galactose evels) 3S G1543 HB 135-195 Leaf prenyl lipids increase in chlorophyll a and b increased biomass 37 G280 AT-hook 97-104, Leaf prenyl lipids increased 8- and Y 130–137-155-162, ocopherol 185-192 39 G2131 AP2 50–121, Leaf sterols increase in campesterol 146 217 C/N sensing increased tolerance to ow nitrogen conditions in C/N sensing assay 41 G2424 107 219 Leaf sterols increase in stigmaStanol

43 G2583 4 71 Le3 W8X Glossy leaves, increased Flowering time epicuticular wax content or altered composition Late developing, late flowering time 147 G977 AP2 5 72 Leaf wax Altered epiculticular wax content or composition 151 G2133 AP2 11–82 Flowering Time increased cold tolerance Biochemistry: other increased drought Abiotic stress olerance olerance increased desiccation olerance US 2006/0195944 A1 Aug. 31, 2006 27

TABLE 4-continued Sequences of the invention and the traits they confer in plants Col. 1 Col. 4 SEQ ID Col. 2 Col. 3 Conserved Col. 5 Col. 6 NO: GID No. Family domains Trait Category Observed trait(s) increased salt tolerance Late flowering Dark green increased leaf size and/or arger rosette increased seed size 157 G3643 AP2 14 79 Flowering Time increased cold tolerance Biochemistry: other increased drought Abiotic stress olerance tolerance increased desiccation olerance increased heat tolerance Late flowering Dark green Larger plants 1SS G3644 AP2 SS-102 Flowering Time increased salt tolerance Biochemistry: other Late flowering Abiotic stress Dark green tolerance Large seedlings Large rosettes with long, broad leaves 153 G3649 AP2 18-61 Flowering Time increased cold tolerance Biochemistry: other increased drought Abiotic stress olerance tolerance increased desiccation olerance Decreased heat tolerance Late flowering Dark green Larger rosettes Large cauline leaves 145 G1387 AP2 4-68 Few lines of overexpressors have been produced or examined 149 G4294 AP2 5 72 Overexpressors not yet produced or examined

Abbreviations: KO—knockout

0188 Table 5 lists a summary of orthologous and NCBI GenBank database associated with NCBI taxonomic homologous sequences identified using BLAST (tblastX ID 33090 (Viridiplantae; all plants) and excluding entries program). The first column shows the polynucleotide associated with taxonomic ID 3701 (Arabidopsis thaliana). sequence identifier (SEQID NO), the second column shows These sequences are compared to those listed in the the corresponding cDNA identifier (Gene ID), the third Sequence Listing, using the Washington University column shows the orthologous or homologous polynucle TBLASTX algorithm (version 2.0a19 MP) at the default otide GenBank Accession Number (Test Sequence ID), the settings using gapped alignments with the filter "off. For fourth column shows the calculated probability value that each sequence listed in the Sequence Listing, individual the sequence identity is due to chance (Smallest Sum comparisons were ordered by probability score (P-value), Probability), the fifth column shows the plant species from where the score reflects the probability that a particular which the test sequence was isolated (Test Sequence Spe alignment occurred by chance. For example, a score of cies), and the sixth column shows the orthologous or 3.6e-40 is 3.6x10'. In addition to P-values, comparisons homologous test sequence GenBank annotation (Test were also scored by percentage identity. Percentage identity Sequence GenBank Annotation). reflects the degree to which two segments of DNA or protein 0189 Of the identified sequences homologous to the are identical over a particular length. The identified homolo Arabidopsis sequences provided in Table 5, the percent gous polynucleotide and polypeptide sequences and sequence identity among these sequences can be as low as homologs of the Arabidopsis polynucleotides and polypep 47%, or even lower sequence identity. The entire NCBI tides may be orthologs of the Arabidopsis polynucleotides GenBank database was filtered for sequences from all plants and polypeptides and/or closely, phylogenetically-related except Arabidopsis thaliana by selecting all entries in the Sequences. US 2006/0195944 A1 Aug. 31, 2006 28

TABLE 5 Sequences phylogenetically related to Arabidopsis sequences shown to confer useful traits in plants

Col. 2 GID or Related Col. 3 Col. 4 Col. 1 Sequence Related to Smallest Col. 5 Col. 6 SEQ Identifier GID/SEQ ID Sum Species from which Test Sequence ID NO (Accession No.) NO Probability Sequence is Derived GenBank Annotation 19 G671 G2340, 7 1.OE-103 Arabidopsis thaliana BG269414 G2340, 1.60 E-45 Mesembryanthemum L0-3478T3 Ice plant crystallinum Lambda Un BG448527 G2340, 5.30 E-41 Medicago truncatula NFO36FO4RT1F1032 Developing root Medica AIf30649 G2340, 1.1OE Gossypium hirsutium BNLGHTS95 Six day Cotton fiber Gossypiu AW70 6006 G2340, 1.2OE Glycine max skó4f05.y1 Gm c1016 Glycine max cDNA clone GENO PHMYBPH31 G2340, Petunia X hybrida P hybrida myb.Ph3 gene encoding protein A491024 G2340, 4.1OE Lycopersicon EST241733 tomato escientiin shoot, Cornell Lyc AMMIXTA G2340, Antirrhinum maius A. maints mixta mRNA. OSMYB1355 G2340, Oryza sativa O. Saiiva mRNA for myb factor, 1355 bp. BE4953OO G2340, Secale cereale WHE1268 FO2 KO4ZS Secale cereale anther cDNA BG300704 G2340, Hordeum vulgare HVSMEbOO18BO3f Hordeum vulgare Seedling sho gi2605617 G2340, 1...SO Oryza sativa OSMYB1. gi2O563 G2340, 7.30 Petunia X hybrida protein 1. gi485867 G2340, 4.OO Antirrhinum maius mixta. gi437327 G2340, 2.OO Gossypium hirsutium MYBA; putative. gi19051 G2340, 3.10 Hordeum vulgare MybHv1. gi227030 G2340, 3.10 Hordeum vulgare myb-related gene var. distichum Hv1. gi1101770 G2340, 7 3 8 Picea mariana MYB-like transcriptional factor MBF1. gi1430846 G2340, 6.30 Lycopersicon myb-related escientiin transcription factor. gi5139814 G2340, 2.50 Glycine max GMYB29B2. gió651292 G2340, 1.70 Pimpinella myb-related brachycarpa transcription factor. 145 G1387 G2583, 43 6.OO Arabidopsis thaliana 89 G975 G2583, 43 3.00 Arabidopsis thaliana 149 G4294 G2583, 43 2.OO Oryza sativa AW92846S G2583, 43 140 43 Lycopersicon EST337253 tomato escientiin flower buds 8 mm tSm80e10.y1 BEO23297 G2583, 43 42 Glycine max Gm c1015 Glycine max cDNA clone GENO APOO3615 G2583, 43 Oryza sativa chromosome 6 clone P0486H12, *** SEQUENCING IN AUO88998 G2583, 43 2.90E Lotus japonicus AUO88998 Lois japonicus flower bud cDNA Lo ATOO1828 G2583, 43 Brassica rapa Subsp. AT001828 Flower pekinensis bud cDNA Br BG415973 G2583, 43 Hordeum vulgare HVSMEkOOO9EO6f Hordeum vulgare testaperica BF647090 G2583, 43 Medicago truncatula NFOO7A06EC1F1038 Elicited cell culture BG560598 G2583, 43 2.90E Sorghum. RHIZ2 59 D07.b1 AO03 propinquilin Rhizome2 (RHIZ2). So US 2006/0195944 A1 Aug. 31, 2006 29

TABLE 5-continued Sequences phylogenetically related to Arabidopsis sequences shown to confer useful traits in plants

Col. 2 GID or Related Col. 3 Col. 4 Col. 1 Sequence Related to Smallest Col. 5 Col. 6 SEQ Identifier GID/SEQ ID Sum Species from which Test Sequence ID NO (Accession No.) NO Probability Sequence is Derived GenBank Annotation AWO11200 G2583, 43 6.60 -16 Pinus taeda ST17HO8 Pine TriplEx shoot tip library Pinus ta BF479478 G2583, 43 1.60 E 5 Mesembryanthemum L48-3155T3 Ice plant crystallinum Lambda U gi19507 G2583, 43 140 E 6 Lupinus polyphyllus put. pPLZ2 product (AA 1-164). gi10798644 G2583, 43 1.OO E 2 Nicotiana tabacum AP2 domain containing transcription fac gi8571476 G2583, 43 4.70 E 2 Atriplex hortensis apetala2 domain containing protein. gi2213783 G2583, 43 840 E 2 Lycopersicon Pt.S. escientiin gi8809573 G2583, 43 5.30 E 1 Nicotiana Sylvestris ethylene-responsive element binding gi4099914 G2583, 43 840 E 1 Stylosanthes hamata ethylene-responsive element binding p gió478845 G2583, 43 8.90 E 1 Matricaria ethylene-responsive chamomilia element binding gi15290041 G2583, 43 9.40 Oryza sativa hypothetical protein. gi12225.884 G2583, 43 1.70 Zea mays unnamed protein product. gi3264767 G2583, 43 340 E O Prints armeniaca AP2 domain containing protein. 242 G361 G362.6 7.OEe-17 Arabidopsis thaliana 244 G2826 G362.6 S.OE-14 Arabidopsis thaliana 246 G2838 G362.6 2.OE-12 Arabidopsis thaliana 248 G1995 G362.6 S.OE-10 Arabidopsis thaliana 250 G370 G362.6 S.OE-10 Arabidopsis thaliana BGS8113S G362.6 1.7OE-19 Medicago truncatula EST482865. GVN Medicago truncatula cDNA BI2O6903 G362.6 1 8 Lycopersicon ESTS24943 cTOS escientiin Lycopersicon esclien G362.6 7.3OE-17 Glycine max saa71c12.y1 Gm c1060 Glycine max cDNA clone GEN APOO3214 G362.6 3.OOE-12 Oryza sativa chromosome 1 clone OSJNBa0083M16, *** SEQUENCI BE366047 G362.6 6.4OE-12 Sorghum bicolor PI1 30 G05.b2 AO02 Pathogen induced 1 (PI1) BF616974 G362.6 1.90E-OS Hordeum vulgare HVSMEcOO14CO8f Hordeum vulgare Seedling sho BG444243 G362.6 Gossypium arboretin GA Ea?)023L22f Gossypium arboretin 7-10 d BESOO26S G362.6 O.OOO15 Titictim aestivitin WHEO981 F11 L2OZS Wheat pre-anthesis spik ABOO6604 G362.6 O.OOO23 Petunia X hybrida mRNA for ZPT2-9, complete cols. AI163O84 G362.6 OOOO4 Populus tremula X AO31pé5u Hybrid Populus tremuloides aspen gi15528588 G362.6 4.2OE-15 Oryza sativa hypothetical protein. gi2346984 G362.6 3.8OE-08 Petunia X hybrida ZPT2-9. gi7228329 G362.6 O.O12 Medicago Saiiva putative TFIIIA (or kruppel)-like Zinc fi gi1763063 G362.6 Glycine max SCOF-1 gi485814 G362.6 Titictim aestivitin WZF1. gi4666360 G362.6 Datisca glomerata zinc-finger protein 1. gi2O58504 G362.6 Brassica rapa zinc-finger protein-1. gi861091 G362.6 Pistin Saivitin putative Zinc finger protein. US 2006/0195944 A1 Aug. 31, 2006 30

TABLE 5-continued Sequences phylogenetically related to Arabidopsis sequences shown to confer useful traits in plants

Col. 2 GID or Related Col. 3 Col. 4 Col. 1 Sequence Related to Smallest Col. 5 Col. 6 SEQ Identifier GID/SEQ ID Sum Species from which Test Sequence ID NO (Accession No.) NO Probability Sequence is Derived GenBank Annotation gi2981169 G362.61 Nicotiana tabacum osmotic stress induced zinc-finger prot BM110736 OS, 63 3.7OE-45 Soianum tuberosum EST558272 potato roots Soianum tuberosum BF646615 OS, 63 Medicago truncatula NFO66CO8EC1F1065 Elicited cell culture ABOS2729 OS, 63 9.SOE-30 Pistin Saivitin mRNA for DNA binding protein DF1, complete cod OSNOOO22 OS, 63 1.1OE-26 Oryza sativa chromosome 4 clone OSJNBa0011L07, *** SEQUENC AIf 77.252 OS, 63 4.2OE-25 Lycopersicon EST 258217 tomato escientiin resistant, Cornell BMSOOO43 OS, 63 6.7OE-24 Zea mays 952036C09.y1 952 BMS tissue from Walbot Lab (red APOO4839 OS, 63 1.90E-23 Oryza sativa () chromosome 2 clo (japonica cultivar group) AW596787 OS, 63 Glycine max s16f10.y1 Gm-c1032 Glycine max cDNA clone GENO AV410715 OS, 63 Lotus japonicus AV410715 Lotus japonicits young plants (two BM357046 OS, 63 3.1OE-14 Triphysaria 16I-G5 Triphysaria versicolor versicolor root-t gi13646986 OS, 63 7...SOE-32 Pistin Saivitin DNA-binding protein DF1. gi20249 OS, 63 1.30 Oryza sativa gt-2. gi18182311 OS, 63 8.20 Glycine max GT-2 factor. gi8096269 OS, 63 O.24 Nicotiana tabacum KED. 167 G3645 G47.65 9.OE-64 Brassica rapa subsp. Pekinensis 151 G2133 G47.65 1.OE-47 Arabidopsis thaliana 16S G3646 G47.65 2.OE-46 Brassica oleracea 163 G3647 G47.65 2.OE-33 Zinnia elegans 157 G3643 G47.65 1.OE-29 Glycine max 155 G3644 G47.65 9.OEe-26 Oryza sativa (japonica cultivar group) 159 G47.65 1.OE-23 Zea mays 153 G47.65 1.OE-23 Oryza sativa (japonica cultivar group) 161 G3651 G47.65 9.OE-21 Oryza sativa (japonica cultivar group) BE32O193 G47.65 S.90E-23 Medicago truncatula NFO24BO4RT1F1029 Developing root Medica APOO3379 G47.65 Oryza sativa chromosome 1 clone P0408G07, *** SEQUENCING IN G47.65 7.90E-16 Lycopersicon EST3O2937 tomato escientiin root during after B434553 G47.65 8.90E-16 Soianum tuberosum EST537314 P. infestans-challenged leaf So BF6101.98 G47.65 1.3OE-15 Pinus taeda NXSI O55 HO4 F NXSI (Nsf Xylem Side wood Inclin US 2006/0195944 A1 Aug. 31, 2006 31

TABLE 5-continued Sequences phylogenetically related to Arabidopsis sequences shown to confer useful traits in plants

Col. 2 GID or Related Col. 3 Co ... 4 Col. 1 Sequence Related to Smallest Col. 5 Col. 6 SEQ Identifier GID/SEQ ID Sum Species from which Test Sequence ID NO (Accession No.) NO Probability Sequence is Derived GenBank Annotation BE659994 G47.65 2.50 E -15 Glycine max 4-G2 GmaxSC Glycine max cDNA, mRNA sequence. BG446456 G47.65 E 5 Gossypium arboretin GA Eb0034.M18f Gossypium arboretin 7-10 d BG321374 G47.65 1.10 E 4 Deschirainia Sophia DSO1 06dO8 R DSO1. AAFC ECORC cold stress G47.65 240 E 4 Gossypium hirsutium BNLGH11133 Six day Cotton fiber Gossypi gi14140155 G47.65 2.90 E 6 Oryza sativa putative AP2 domain transcription factor. gi5616086 G47.65 7.90 E 4 Brassica naptis dehydration responsive element binding pro gi12225916 G47.65 8.70 Zea mays unnamed protein product. gi8571476 G47.65 1.30 Atriplex hortensis apetala2 domain containing protein. gi898.0313 G47.65 EEE 334 Catharanthus roseus AP2-domain DNA binding protein. gió478845 G47.65 E 2 Matricaria ethylene-responsive chamomilla element binding gi1208498 G47.65 6.40 Nicotiana tabacum EREBP-2. gi8809573 G47.65 2.2O Nicotiana Sylvestris ethylene-responsive element binding gi7528276 G47.65 340 E 1 Mesembryanthemum AP2-related crystallinum transcription f gi3342211 G47.65 4...SO E 1 Lycopersicon Pti4. escientiin 149 G4294 G975/89 2.OE-65 Oryza sativa 143 G2S83 G975/89 3.OE-56 Arabidopsis thaliana 145 G1387 G975/89 5.OE-54 Arabidopsis thaliana APOO3615 G975/89 1.10 E-51 Oryza sativa chromosome 6 clone P0486H12, *** SEQUENCING IN BG642554 G975/89 1.1OE Lycopersicon EST356O31 tomato escientiin flower buds, anthe AW 705973 G975/89 3.2OE Glycine max skó4c02.y1 Gm c1016 Glycine max cDNA clone GENO ATOO1828 G975/89 Brassica rapa Subsp. AT001828 Flower pekinensis bud cDNA Br BG415973 G975/89 Hordeum vulgare HVSMEkOOO9EO6f Hordeum vulgare testaperica AUO88998 G975/89 2.1OE Lotus japonicus AUO88998 Lois japonicus flower bud cDNA Lo AL377839 G975/89 Medicago truncatula MtBB34CO4F1 MtBB Medicago truncatula cD BF479478 G975/89 2.2OE Mesembryanthemum L48-3155T3 Ice plant crystallinum Lambda U BG560598 G975/89 Sorghum. RHIZ2 59 D07.b1 AO03 propinquilin Rhizome2 (RHIZ2). So L46408 G975/89 Brassica rapa BNAF1258 Mustard flower buds Brassica rapa cD gi19507 G975/89 2.1OE Lupinus polyphyllus put. pPLZ2 product (AA 1-164). gi2213783 G975/89 Lycopersicon Pt.S. escientiin gi8571476 G975/89 Atriplex hortensis apetala2 domain containing protein. US 2006/0195944 A1 Aug. 31, 2006 32

TABLE 5-continued Sequences phylogenetically related to Arabidopsis sequences shown to confer useful traits in plants

Col. 2 GID or Related Col. 3 Col. 4 Col. 1 Sequence Related to Smallest Col. 5 Col. 6 SEQ Identifier GID/SEQ ID Sum Species from which Test Sequence ID NO (Accession No.) NO Probability Sequence is Derived GenBank Annotation gi4099914 G975/89 7.90E-14 Stylosanthes hamata ethylene-responsive element binding p gió478845 G975/89 3.4OE-13 Matricaria ethylene-responsive chamomilia element binding gi12225.884 G975/89 Zea mays unnamed protein product. gi8809573 G975/89 7.OOE-13 Nicotiana Sylvestris ethylene-responsive element binding gi15290041 G975/89 1.2O Oryza sativa hypothetical protein. gi898.0313 G975/89 1.2O Catharanthus roseus AP2-domain DNA binding protein. gi7528276 G975/89 1.3OE-12 Mesembryanthemum AP2-related crystallinum transcription f 252 4,33 1.OE-116 Arabidopsis thaliana 4,33 Lycopersicon EST310415 tomato escientiin root deficiency, C BG156656 4,33 18OE-33 Glycine max sab31d11.y1 Gm c1026 Glycine max cDNA clone GEN BES97638 4,33 5.4OE-28 Sorghum bicolor PI1 72 CO5.b1 AO02 Pathogen induced 1 (PI1) BI272895 4,33 Medicago truncatula NFO91A11FL1F1084 Developing flower Medi BE129981 4,33 3.90E-23 Zea mays 945O34COSX1945 - Mixed adult tissues rom Walbot BF889434 4,33 7...SOE-14 Oryza sativa EST003 Magnaporthe grisea infected 16 day-old gi15528628 4,33 7.4OE-14 Oryza sativa hypothetical protein-similar to Oryza sativa gi7677132 4,33 O41 Secale cereale c-myb-like transcription factor. gi13676413 4,33 O43 Glycine max hypothetical protein. gi12406993 4,33 0.57 Hordeum vulgare MCB1 protein. gi940288 4,33 O.85 Pistin Saivitin protein localized in he nucleoli of peanu gi1279563 4,33 O.92 Medicago Saiiva nuM1. gi12005328 4,33 O.98 Hevea brasiiensis unknown. gi7688744 4,33 O.99 Lycopersicon aSc1. escientiin gi1070004 4,33 O.99 Brassica naptis Biotin carboxyl carrier protein. gi5326994 4,33 Daiiciis Caroia DNA topoisomerase I. 254 1.OE-76 Arabidopsis thaliana B421315 Lycopersicon EST53 1981 tomato escientiin callus, TAMU Lycop Glycine max sc38e09.y1 Gm c1014 Glycine max cDNA clone GENO AF274.033 1.7OE-43 Atriplex hortensis apetala2 domain containing protein mRNA, BG5929.17 Soianum tuberosum EST491595 CSTS Soianum tuberosum cDNA clo AI166481 Populus balsamifera xylem.est.309 Poplar Subsp. trichocarpa AW776927 2.10E-41 Medicago truncatula EST335992 DSIL Medicago truncatula cDNA US 2006/0195944 A1 Aug. 31, 2006 33

TABLE 5-continued Sequences phylogenetically related to Arabidopsis sequences shown to confer useful traits in plants

Col. 2 GID or Related Col. 3 Col. 4 Col. 1 Sequence Related to Smallest Col. 5 Col. 6 SEQ Identifier GID/SEQ ID Sum Species from which Test Sequence ID NO (Accession No.) NO Probability Sequence is Derived GenBank Annotation APOO4119 G974,S 2.7OE-41 Oryza sativa chromosome 2 clone OJ1288 G09, *** SEQUENCING BE918036 G974,S 6.6OE-38 Sorghum bicolor OV1 1 B03.b1 AO02 Ovary 1 (OV1) Sorghum bic gi8571476 G974,S 7.OOE-45 Atriplex hortensis apetala2 domain containing protein. gi14140155 G974,S 4.4OE-2O Oryza sativa putative AP2 domain transcription factor. gi3342211 G974,S 9.1OE-2O Lycopersicon Pti4. escientiin gi1208498 G974,S 15OE-19 Nicotiana tabacum EREBP-2. gi12225.884 G974,S 15OE-19 Zea mays unnamed protein product. gi7528276 G974,S 3.90E-19 Mesembryanthemum AP2-related crystallinum transcription f gi8809571 G974,S 3.90E-19 Nicotiana Sylvestris ethylene-responsive element binding gi1688233 G974,S 3.SOE-18 Soianum tuberosum DNA binding protein homolog. gi3264767 G974,S 9.4OE-18 Prints armeniaca AP2 domain containing protein. gi6478845 G974.5 2.OOE-17 Matricaria ethylene-responsive chamomilia element binding BI3111.37 G2343,53 4.OOE-45 Medicago truncatula ESTS312887 GESD Medicago truncatula cDN BG130765 G2343,53 S.1OE-45 Lycopersicon EST463657 tomato escientiin crown gall Lycoper G2343,53 2.3OE-44 Sorghum bicolor LG1 354 G05.b1 AO02 Light Grown 1 (LG 1) Sor AV421932 G2343,53 2.7OE-42 Lotus japonicus AV421932 Lois japonicits young plants (two BE611938 G2343,53 9.1OE-42 Glycine max Sr01 ho4.y1 Gm c1049 Glycine max cDNA clone GENO BF484214 G2343,53 1.90E-37 Titictim aestivitin WHE2309 F07 K13ZS Wheat pre-anthesis spik BG3.01022 G2343,53 4.3OE-3S Hordeum vulgare HVSMEbOO19E16f Hordeum vulgare Seedling sho APOO3O18 G2343,53 3.2OE-34 Oryza sativa genomic DNA, chromosome 1, BAC clone: OSJNBa000 BE4953OO G2343,53 3.3OE-34 Secale cereale WHE1268 FO2 KO4ZS Secale cereale anther cDNA AI657290 G2343,53 3.SOE-34 Zea mays 486093A08.y1 486 - leaf primordia cDNA library fro gi1167486 G2343,53 9.SOE-53 Lycopersicon transcription factor. escientiin gi13366181 G2343,53 1.3OE-48 Oryza sativa putative transcription factor. gi2130045 G2343,53 1. SOE-37 Hordeum vulgare MybHv33 protein - barley. gi82310 G2343,53 1.6OE-34 Antirrhinum maius myb protein 330 - garden Snapdragon. gi1732247 G2343,53 4.2OE-34 Nicotiana tabacum transcription factor Myb1. US 2006/0195944 A1 Aug. 31, 2006 34

TABLE 5-continued Sequences phylogenetically related to Arabidopsis sequences shown to confer useful traits in plants

Col. 2 GID or Related Co l. 3 Col. 4 Col. 1 Sequence Related to Smallest Col. 5 Col. 6 SEQ Identifier GID/SEQ ID Sum Species from which Test Sequence ID NO (Accession No.) NO Probability Sequence is Derived GenBank Annotation gi1841475 343,53 7.8O Pistin Saivitin Myb26. gi5139814 343,53 2.8O Glycine max GMYB29B2. gi13346178 343,53 4.90 Gossypium hirsutium BNLGH233. gió651292 343,53 2.70 Pimpinella myb-related brachycarpa transcription factor. gi8247759 343,53 1.10 Titictim aestivitin GAMyb protein. AF272573 23,67 1.30 Populus alba X Popitius clone INRA717-1-B4 tremia 4-3-3 pr BGS81482 23,67 Medicago truncatula EST483216 GVN Medicago truncatula cDNA 23,67 Soianum tuberosum O9A12 Mature tuber iambda ZAP Soianum

LETFTT 23,67 1.2OE Lycopersicon mRNA for 14-3-3 escientiin protein, TFT7. AF228SO1 23,67 4. SOE Glycine max 4-3-3-like protein mRNA, complete cols. BE643058 23,67 Ceratopteris Cri. 7 M14 SP6 richardi Ceratopteris Spore Li AF2228OS 23,67 7.OOE Euphorbia estia 4-3-3-like protein mRNA, complete cols. PSA238682 23.67 1.3OE Pistin Saiivum mRNA for 14-3-3- ike protein, sequence 2. 23,67 Gossypium arboretin GA Ea?)020A13f Gossypium arboretin 7-10 d AIT 27536 23,67 Gossypium hirsutium BNLGH8338 Six day Cotton fiber Gossypiu gi8515890 23,67 18O E 5 2 Populus albax Popitius 14-3-3 protein. tremula gi8099061 23,67 3.70 5 2 Populus x canescens 14-3-3 protein. gi7576887 23,67 1.OO Glycine max 14-3-3-like protein. gi3925703 23,67 8.90 Lycopersicon 14-3-3 protein. escientiin gió752903 23,67 8.90 Euphorbia estia 14-3-3-like protein. gi913214 23,67 2.10 i. g Nicotiana tabacum T14-3-3. gi11138322 23,67 340 Vicia faba wf14-3-3d protein. gi2879818 23,67 8.50 4 6 Soianum tuberosum 14-3-3 protein. gi10.15462 23,67 8.90 E-EE44 67 Chlamydomonas 14-3-3 protein. reinhardtii gi2921512 23,67 1.10 45 agrestis GF14 protein. ACO91246 777/55 3.SO 9 6 Oryza sativa chromosome 3 clone OSJNBa0002IO3, *** SEQUENCI BG136684 777/55 1.1OE Lycopersicon EST477126 wild pennelli tomato pollen Lycoper AW703793 777/55 2. SOE Glycine max sk12fO8.y1 Gm c1023 Glycine max cDNA clone GENO BEOS 1040 777/55 Zea mays Za71g01.b50 Maize Glume cDNAs Library Zea mayS cDN AW933922 777/55 2.90E Lycopersicon EST359765 tonato escientiin fruit mature green BG600834 777/55 Soianum tuberosum EST505729 cSTS Soianum tuberosum cDNA clo BF440069 777/55 3.2OE Theilungiella Sc0136 Theilungiella Salistiginea Saisuginea ZA US 2006/0195944 A1 Aug. 31, 2006

TABLE 5-continued Sequences phylogenetically related to Arabidopsis sequences shown to confer useful traits in plants

Col. 2 GID or Related Col. 3 Col. 4 Col. 1 Sequence Related to Smallest Col. 5 Col. 6 SEQ Identifier GID/SEQ ID Sum Species from which Test Sequence ID NO (Accession No.) NO Probability Sequence is Derived GenBank Annotation BFS87440 G1777/55 4.20E-25 Sorghum FM1 36 D07.b1 AO03 propinquum Floral-Induced Merist BI267961 G1777/55 2.10E-23 Medicago truncatula NF118E09IN1F1071 Insect herbivory Medic BE415217 G1777/55 2.5 OE-22 Titictim aestivitin MWLO2S.FO2FOOO2O8 ITEC MWL Wheat Root Lib gi1666171 G1777/55 7.5 OE-24 Nicotiana KOW. plumbaginifolia gió43082 G1777/55 1 Fragaria X ananassa KOW. AW928317 G252O?37 4.6OE-27 Lycopersicon EST307OSO tomato escientiin flower buds 8 mm t BF271147 G252O?37 2.60E-26 Gossypium arboreum GA Eb001OK15f Gossypium arboretin 7-10 d BE3296.54 G252O?37 2.60E-26 Glycine max so67c05.y1 Gm c1040 Glycine max cDNA clone GENO BG103O16 G252O?37 4.40E-23 Sorghum RHIZ2 36 A10.b1 AO03 propinquilin Rhizome2 (RHIZ2). So BE60698O G2520,37 1.00E-22 Titicin aestivitin WHEO914 F04 KO8ZS Wheat S-15 DAP spike cD BGO48756 G252O?37 1.60E-22 Sorghum bicolor OV1 22 FO5.b1 AO02 Ovary 1 (OV1) Sorghum bi AI162779 G252O?37 2.10E-22 Populus tremula X Populus AO23P62U Hybrid tremioides aspen BI27.0049 G252O?37 2.90E-22 Medicago truncatula NFOO4DO4FL1F1042 Developing flower Medi BE921054 G252O?37 3.90E-22 Soianum tuberosum EST424823 potato leaves and petioles Sola BF200249 G252O?37 9.1OE-22 Titicip WHE2254 F11 L22ZE iii.626COCGilii Trictim iii.626COCGilii S gi118.62964 G252O?37 4.50E-16 Oryza sativa hypothetical protein. gi5923912 G252O?37 6.30E-16 Tulipa gesneriana bHLH transcription factor GBOF-1. gió166283 G252O?37 O.69 Pinus taeda helix-loop-helix protein 1A. gi1086538 G252O?37 1 Oryza rufipogon transcriptional activator Rb homolog.

0190. For many of the traits listed in Table 6 that may be ing a gene that causes a plant to be more sensitive to cold conferred to plants by ectopically expressing transcription may improve a plants tolerance of cold. factors of the invention, a single transcription factor gene may be used to increase or decrease, advance or delay, or 0191 The first and second columns of Table 6 provide the improve or prove deleterious to a given trait. For example, Trait category and specific trait were generally observed in overexpression of a transcription factor gene that naturally plants overexpressing the listed transcription factor occurs in a plant may cause early flowering relative to sequence of the invention, or, where noted, in plants in non-transformed or wild-type plants. By knocking out the which a specific transcription factor has been knocked out gene, or Suppressing the gene (with, for example, antisense (KO). The third column lists the sequences for which a Suppression) the plant may experience delayed flowering. specific trait was observed when the expression of the Similarly, overexpressing or Suppressing one or more genes sequence was altered, and the last column provides the can impart significant differences in production of plant utility and specific observations, relative to controls, for products, such as different fatty acid ratios. Thus, Suppress each of the sequences. US 2006/0195944 A1 Aug. 31, 2006 36

TABLE 6 Genes, traits and utilities that affect plant characteristics Transcription factor genes Utility/ Trait Category Traits that impact traits Observations Environmental Increased osmotic stress G353, G1069, G1930 Enhanced germination rate, stress resistance tolerance survivability, yield and tolerance G47 (in a root growth assay on PEG-containing media, G47 overexpressing seedlings were larger and had more root growth compared to the wild-type) G353 (on PEG containing media, overexpressing seedlings were larger and greener than the wild-type) G1069 (overexpressing lines showed more tolerance to osmotic stress on high Sucrose media) G1930 (with more seedling vigor on high Sucrose than wild-type control plants) Altered C/N sensing and G975, G1069, G1266, Improved yield, less fertilizer tolerance to low G1322, G1930, G2131, required, improved stress nitrogen conditions G2144, G2512, G2520 tolerance and quality G975 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls) G1069 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls) G1266 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls) G1322 (accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls) G1930 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls) G2131 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls) G2144 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls) G2512 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls) G2520 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls) Increased tolerance to G1946 Improved yield, less fertilizer phosphate-limitation required, improved stress tolerance and quality G1946 (more secondary root growth on phosphate US 2006/0195944 A1 Aug. 31, 2006 37

TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor genes Utility/ Trait Category Traits that impact traits Observations free media than wild-type controls) Increased salt tolerance G47, G1930, G3644 G1930 (with more seedling vigor on high Salt media than wild-type control plants) G47 and G3644 (homologs; more seedling vigor on high salt media than wild-type control plants) Increased cold stress G47, G1322, G1930, Enhanced germination, resistance and/or G2133, G3643, G3649 growth, earlier planting improved germination in G1322 (at 8° C., cold conditions overexpressor seedlings were slightly larger and had onger roots than wild type) G1930 (increased tolerance o 8°C. in a germination assay) G47 (with leaf RBCS3 or shoot apical meristem promoters) and closely related homologs G2133, G3643 and G3649 (35S promoter) conferred increased tolerance to 8° C. in a germination assay relative to controls) Increased drought or G47, G353, G975, G1069, mproved survivability, yield, desiccation tolerance G2133, G3643, G3644, extended range G3649 G353 (overexpressors had greater tolerance to drought han wild type in a soil based assay) G975 (overexpressors had greater tolerance to desiccation in plate-based assays, and greater tolerance to drought than wild type in a soil-based assay) G1069 (overexpressors had greater tolerance to drought than wild type in a soil based assay) G47 and homologs G2133, G3643 and G3649 conferred increased water deprivation when overexpressed compared to controls (another homolog, G3644, was not tested in drought assays) Altered light response G377, G1069, G1322, Enhanced germination, and shade tolerance G1794, G2144, G2520 growth, development, flowering time, greater planting density and improved yield G377 (overexpressors had altered leaf orientation) G1322 (overexpressors exhibited constitutive photomorphogenesis) G1069 (overexpressors exhibited altered leaf orientation) G1794 (overexpressors exhibited constitutive photomorphogenesis) G2144 overexpressors exhibited long hypocotyls US 2006/0195944 A1 Aug. 31, 2006 38

TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor genes Utility/ Trait Category Traits that impact traits Observations G2520 (overexpressors had long hypocotyls) Sugar sensing Altered plant response G1337 Photosynthetic rate, to Sugars carbohydrate accumulation, biomass production, source sink relationships, senescence G1337 (G1337 overexpressors germinated poorly on high glucose compared to controls, thus G1337 may be involved in Sugar sensing, transport, or metabolism) Hormonal Altered hormone G47, G1069, G1266 Seed dormancy, drought sensitivity tolerance; plant form, fruit ripening G47 (overexpressors had decreased sensitivity to ABA) G1069 (overexpressors had decreased sensitivity to ABA) G1266 (overexpressors had decreased sensitivity to ABA) Development, Altered overall plant Altered vascular tissues, morphology architecture increased lignin content; altered cell wall content; and/or appearance G47 (increased lignin content, stems were wider with a much greater number of xylem vessels than wild type) G353 (overexpressors had short pedicels, downward pointing siliques, leaves had short petioles, were rather flat, rounded, and sometimes showed changes in coloration) G1543 (some G1543 overexpressors exhibited contorted, stunted carpels; 35S:G1543 plants also exhibited altered branching pattern, and apical dominance was reduced) G1794 (overexpressors exhibited decreased apical dominance) G2509 (overexpressors exhibited decreased apical dominance) Increased size, stature G47: G377, G1052, G1543, Improved yield and/or biomass G2133, G2155, G3643, G47 (stem sections were of G3644, G3649 wider diameter and vascular bundles were larger, sometimes multiple cauline leaves were present at each node; overexpression of G47 and its homologs G2133, G3643, G3644 and G3649, resulted in some lines that produced larger plants than controls with larger rosettes, seedlings and/or seeds) G377 (some lines had broader, fuller rosette leaves than wild type) US 2006/0195944 A1 Aug. 31, 2006 39

TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor genes Utility/ Trait Category Traits that impact traits Observations G214 (larger biomass, increased leaf number and size compared to controls) G1052 (larger leaves and were generally more sturdy than wild type) G1543 (some overexpressors exhibited increased biomass, including tomato plants overexpressing this sequence) G2155 (late in development, 35S:G2155 plants became very large relative to controls) Size: reduced stature or G280; G353; G362; G652; Ornamental utility (creation dwarfism G674: G962; G977; of dwarf varieties); Small G1198; G1266; G1309; stature also provides wind G1322; G1421; G1537; resistance G1641; G1794; G2094; G2144: G2147 Flower structure, G47, G259, G353, G1543 Ornamental horticulture: inflorescence production of saffron or other edible flowers G47 (thick and fleshy inflorescences) G259 (rosette leaves were longer, narrower, darker green than controls, sepals were longer, narrower, and often fused at the tips) G353 (35S:G353 plants had a reduction in flower pedicellength and downward pointing siliques) G1543 (some lines showed contorted, stunted carpels) Number and G362, G1930, G2105 Improved resistance to pests development of and desiccation; essential oil trichomes production G362 (increased trichome density) G1930 (decreased trichome density) G2105 (adaxial leaf Surfaces had a somewhat lumpy appearance caused by trichomes being raised up on Small mounds of epidermal cells) Seed size, color, and G652; G2105 mproved yield number G652 (seeds produced by knockouts of G652 plants were somewhat wrinkled and misshapen) G2105 (pale, larger seeds han controls) Leafshape, color, G377; G674: G977; Appealing shape or shiny modifications G1198; G2094; G2105; eaves for ornamental G2113; G2117: G2144: agriculture, increased G2155, G2583 biomass or photosynthesis G377 (during later rosette stage, leaves were rounder, darker green, and shorter han wild type. After lowering, 35S::G377 eaves had a greater blade area than wild-type) G674 (rounded, dark green US 2006/0195944 A1 Aug. 31, 2006 40

TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor genes Utility/ Trait Category Traits that impact traits Observations leaves that sometimes pointed upward) G977 (dark green leaves that were generally wrinkled or curled) G1198 (smaller, narrower leaves) G2094 (leaves pf overexpressors were short, wide, and slightly yellowed compared to wild type. occasionally the leaves also showed mild Serrations on their margins) G2105 (uneven leaf Surface) G2113 (long petioles, vertical leaf orientation, eaves appeared narrow and were downward curling at he margins compared to controls) G2155 (slightly small, rounded, leaves that became dark green, very arge and senesced later han wild type late in development) G2144 (pale, narrow, flat eaves that had long petioles and sometimes positioned in a vertical orientation) G2583 (narrow, curled eaves) Altered stem G47, G748 Ornamental; digestibility morphology G47 (stems of wider diameter with large irregular vascular bundles containing greater number of xylem vessels than wild type; some xylem vessels within the bundles appeared narrow and more lignified) G748 (thicker and more vascular bundles in stems han controls) Pigment Production of G214; G259; G362, G490; Antioxidant activity, vitamin E anthocyanin and prenyl G652, G748; G883; G977, G214 (darker green in lipids G1052; G1328; G1930; vegetative and reproductive G2509, G2520 issues due to a higher chlorophyll content in the alter stages of development; increased seed lutein) G259 (increase in seed C ocopherol) G362 (increased pigment production compared to controls, seeds developed patches of dark purple pigmentation, increased anthocyanin in seedling eaves; late flowering lines also became darkly pigmented.) G490 (increased seed 8 ocopherol) G652 (increase in seed C ocopherol) G748 overexpressors consistently produced US 2006/0195944 A1 Aug. 31, 2006 41

TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor genes Utility/ Trait Category Traits that impact traits Observations greater root content than controls) G883 (decreased seed lutein) G1328 (decreased seed lutein) G977 (darker green leaves than controls) G1052 (overexpressors had decreased lutein and increased Xanthophyll 1 relative to controls) G1930 (increased chlorophyll content) G2509 (increase in C to copherol) G2520 (increase in seed 8 to copherol and a decrease in seed Y-tocopherol) Seed biochemistry Production of seed G20 Precursors for human steroid sterols hormones; cholesterol modulators G20 (increased campesterol) Production of seed G353; G484; G674; Defense against insects; glucosinolates G1069; G1272 (KO); putative anticancer activity; G1506; G 1897: G1946; undesirable in animal feeds G2113; G2117: G2155; G353 (increased M39494) G2290, G2340 G484 (altered glucosinolate profile) G674 (increased M395.01) G1069 (increased M39497) G1272 (decreased M39497) G1506 (increased M39502 and M39498) G1897 (increased M39491 and M39493) G1946 (increased M395O1) G2113 (decreased M39497, increased M395.01 and M39494) G2117 (increased M39497, decreased M39496) G2155 (increased M39497) G2290 (increased M39496) G2340 (extreme alteration in seed glucosinolate profile) Modified seed oil G229, G652, G663, G974: Vegetable oil production; content G1198; G1543; G1777; increased caloric value for G1946; G2117, G2123; animal feeds; lutein content G2343 G.229 (increased seed oil) G652 (decreased seed oil) G663 (decreased seed oil) G1198 (increased seed oil) G1543 (decreased seed oil observed in Arabidopsis overexpressors, increased seed oil observed in soy) G1777 (increased seed oil) G1946 (increased seed oil) G2117 (decreased seed oil) G2123 (increased seed oil) Modified seed protein G229, G663, G1641; Reduced caloric value for content G1777: G1946; G2117; humans G2SO9 G.229 (decreased seed protein) G663 (increased seed protein) G1641 (increased seed protein) US 2006/0195944 A1 Aug. 31, 2006 42

TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor genes Utility/ Trait Category Traits that impact traits Observations G1777 (decreased seed protein) G1946 (decreased seed protein) G2117 (increased seed protein) G2509 (increased seed protein) Modified seed fatty acid G1069, G1421 Altered nutritional value: content increase in waxes for disease resistance G1069 (increased 16:0 fatty acids and decreased 18:2 fatty acids) G1421 (increased 18:1 and decreased 18:3 seed fatty acids) Leaf biochemistry Production of leaf G264: G353; G652; G681; Defense against insects; glucosinolates G1069; G1198; G1322; putative anticancer activity; G1421: G1794; G2113, undesirable in animal feeds G2144: G2512; G2520; G264 (increased M39481) G2552 G353 (increased M39494) G652 (increased M394.80) G681 (increased M394.80) G1069 ( ) G1198 (increased M3948) G1322 (increased M39480) G1421 (increased M39482) G1794 (increased M39480) G2113 (increased M39478) G2144 (increased M39480) G2512 (increased M39481) G2520 (increased M39478) G2552 (increased M39480) Production of leaf G2131; G2424 Precursors for human steroid phytosterols, inc. hormones; cholesterol Stigmastanol, modulators campesterol G2131 (Increase in leaf campesterol) G2424 (increase in StigmaStanol) Leaf fatty acid G214; G377: G962: G975; Altered nutritional value: composition G987 (KO); G1266; increase in waxes for disease G1337; G1399, G1465; resistance G1512; G2136; G2147, G214 (increased leaf fatty G2583 acids) G377 (increase in leaf 18:2 fatty acids and decrease in leaf 18:3 fatty acids) G962 (increase in 16:0 leaf fatty acids, decrease in 18:3 leaf fatty acids) G987 KO (reduction in 16:3 fatty acids relative to controls) G975 (increased leaf fatty acids, glossy leaves) G1337 (increased leaf oleic acids) G1399 (increased leaf 16:0 fatty acid) G1465 (increased in 16:0, 16:1, 18:0 and 18:2 and decreased 16:3 and 18:3 leaf fatty acids) G1512 (increased 18:2 leaf fatty acids) G2136 (decreased 18:3 leaf fatty acids) US 2006/0195944 A1 Aug. 31, 2006 43

TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor genes Utility/ Trait Category Traits that impact traits Observations G2147 increased 16:0 and 18:23 leaf fatty acids) G2583 (glossy leaves) Production of prenyl G214; G259; G280; G362, Antioxidant activity, vitamin E lipids, including G652; G987 (KO), G1543; G214 (increased leaf tocopherol G1930, G2509; G2520 chlorophyll and carotenoids) G259 (increased seed C to copherol) G280 (increased leaf 8 and Y tocopherol) G362 (increased anthocyanin levels in various tissues at different stages of growth.: seedlings showed high levels of pigment in first true leaves, late flowering lines became darkly pigmented., seeds from developed patches of dark purple pigmentation) G652 (increased seed C to copherol) G987 (overexpressors had two Xanthophylls not present in wild-type leaves, Y-tocopherol (which normally accumulate in seed tissue), and reduced levels of chlorophyll a and chlorophyll b in leaves) G1543 (dark green color, increased levels of carotenoids and chlorophylls a and b in leaves) G1930 (increased levels of chlorophyll a and chlorophyll b in seeds compared to controls) G2509 (increased seed C to cophero G2520 (increase in seed 8 to copherol and a decrease in seed Y-tocopherol) Sugar, starch, G158; G211; G242; G274; Improved food digestibility, hemicellulose G1012; G1266; G1309; increased hemicellulose & composition, G1641; G1865; G2094; pectin content; increased G2589 fiber content; increased plant tensile strength, wood quality, pathogen resistance, pulp production and/or tuber starch content G158 (increased leaf rhamnose) G211 (increased leaf xylose) G242 (increased leaf arabinose) G274 (increased leaf arabinose) G1012 (decreased leaf rhamnose) G1266 (alterations in rhamnose, arabinose, xylose, and mannose, and galactose) G1309 (increased leaf mannose) G1641 (increased leaf rhamnose) US 2006/0195944 A1 Aug. 31, 2006 44

TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor genes Utility/ Trait Category Traits that impact traits Observations G1865 (increased galactose, decreased xylose) G2094 (increased leaf arabinose) G2589 (increased leaf insoluble Sugars - increased arabinose) Growth, Plant growth rate and G1543 Faster growth, increased Reproduction development biomass or yield, improved appearance; delay in bolting G1543 (faster growth of seedlings) Senescence: cell death G652, G1897, G2155, Altered yield, appearance; G2340 response to pathogens (potential protective response without the potentially detrimental consequences of a constitutive systemic acquired resistance) G652 (premature senescence of rosette leaves) G1897 (later senescence than controls G2155 (senesced much later than controls) G2340 (overexpressors showed necrosis of blades of rosette and cauline leaves, necrotic lesions) Modified fertility G652: G962; G977; Prevents or minimizes escape G1266; G1421; G2094; of the pollen of genetically G2113; G2147 modified plants G652 (poor fertility) G962 (poor fertility) G977 (poor fertility) G1266 (poor fertility) G1421 (poor fertility) G2094 (poor fertility) G2113 (poor fertility) G2094 (poor fertility) G2147 (poor fertility) Early flowering Faster generation time; synchrony of flowering: potential for introducing new traits to single variety Delayed flowering G47; G214; G362; G748; Delayed time to pollen G1052; G1865; G1930, production of GMO plants; G2155, G2133, G3643, synchrony of flowering: G3644, G3649 increased yield Flower and leaf G259; G353; G377; G652; Ornamental applications; development G1865; G 1897: G2094 decreased fertility G259 (rosette leaves were longer and narrow, dark green and curled compared to control plants, sepals were long, narrow, and often fused at the tips) G353 (reduction in flower pedicellength and downward pointing siliques) G377 (inflorescence stems were shorter than wild-type, during late rosette stage, leaves were rounder, darker green, and slightly shorter than those of wild type) G652 (reduced number of stamens: 4–5 of these organs rather than 6) US 2006/0195944 A1 Aug. 31, 2006 45

TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor genes Utility/ Trait Category Traits that impact traits Observations G1865 (short, thick, inflorescence stems, greatly increased number of leaves; visible flower buds up to a month after wild type, continuous light conditions, by which time rosette leaves had become rather large and contorted) G1897 (narrow, dark-green rosette and cauline leaves, inflorescences had short internodes with various abnormalities, perianth organs were typically rather long and narrow..., stamens were short, silique formation was poor) G2094 (inflorescence stems were often thin and carried short flowers, mild serrations on leaf margins) Flower abscission G1897 Ornamental: longer retention of flowers G1897 (delayed abscission of floral organs) * When co-expressed with G669 and G663

Significance of Modified Plant Traits Transcription factor genes that confer better survivability in 0192 The sequences of the Sequence Listing, those in cooler climates allow a grower to move up planting time in Tables 4-6, or those disclosed here can be used to prepare the spring and extend the growing season further into transgenic plants and plants with altered traits. The specific autumn for higher crop yields. transgenic plants listed below are produced from the 0.196 Tolerance to freezing. The presently disclosed tran sequences of the Sequence Listing, as noted Tables 4-6 Scription factor genes that impart tolerance to freezing provides exemplary polynucleotide and polypeptide conditions are useful for enhancing the Survivability and sequences of the invention. appearance of plants conditions or conditions that would 0193 Salt stress resistance. Soil salinity is one of the otherwise cause extensive cellular damage. Thus, germina more important variables that determines where a plant may tion of seeds and Survival may take place at temperatures thrive. Salinity is especially important for the successful significantly below that of the mean temperature required for cultivation of crop plants, particular in many parts of the germination of seeds and Survival of non-transformed world that have naturally high soil salt concentrations, or plants. As with salt tolerance, this has the added benefit of where the soil has been over-utilized. Thus, presently dis increasing the potential range of a crop plant into regions in closed transcription factor genes that provide increased salt tolerance during germination, the seedling stage, and which it would otherwise succumb. Cold tolerant trans throughout a plants life cycle would find particular value formed plants may also be planted earlier in the spring or for imparting Survivability and yield in areas where a later in autumn, with greater Success than with non-trans particular crop would not normally prosper. formed plants. 0194 Osmotic stress resistance. Presently disclosed tran 0.197 Heat stress tolerance. The germination of many Scription factor genes that confer resistance to osmotic stress crops is also sensitive to high temperatures. Presently dis may increase germination rate under adverse conditions, closed transcription factor genes that provide increased heat which could impact survivability and yield of seeds and tolerance are generally useful in producing plants that ger plants. minate and grow in hot conditions, may find particular use for crops that are planted late in the season, or extend the 0.195 Cold stress resistance. The potential utility of pres range of a plant by allowing growth in relatively hot ently disclosed transcription factor genes that increase tol climates. erance to cold is to confer better germination and growth in cold conditions. The germination of many crops is very 0198 Drought, low humidity tolerance. Strategies that sensitive to cold temperatures. Genes that would allow allow plants to Survive in low water conditions may include, germination and seedling vigor in the cold would have for example, reduced Surface area or Surface oil or wax highly significant utility in allowing seeds to be planted production. A number of presently disclosed transcription earlier in the season with a high rate of survivability. factor genes increase a plants tolerance to low water con US 2006/0195944 A1 Aug. 31, 2006 46 ditions and provide the benefits of improved survivability, duced into plants to confer an increase in heavy metal increased yield and an extended geographic and temporal uptake, which may benefit efforts to clean up contaminated planting range. soils. 0199 Radiation resistance. Presently disclosed transcrip 0204 Light response. Presently disclosed transcription tion factor genes have been shown to increase lutein pro factor genes that modify a plants response to light may be duction. Lutein, like other Xanthophylls such as Zeaxanthin useful for modifying a plant's growth or development, for and violaxanthin, are important in the protection of plants example, photomorphogenesis in poor light, or accelerating against the damaging effects of excessive light. Lutein flowering time in response to various light intensities, qual contributes, directly or indirectly, to the rapid rise of non ity or duration to which a non-transformed plant would not photochemical quenching in plants exposed to high light. similarly respond. Examples of Such responses that have Increased tolerance of field plants to visible and ultraviolet been demonstrated include leaf number and arrangement, light impacts Survivability and vigor, particularly for recent and early flower bud appearances. transplants. Also affected are the yield and appearance of harvested plants or plant parts. Crop plants engineered with 0205 Overall plant architecture. Several presently dis presently disclosed transcription factor genes that cause the closed transcription factor genes have been introduced into plant to produce higher levels of lutein therefore would have plants to alter numerous aspects of the plant's morphology. improved photoprotection, leading to less oxidative damage For example, it has been demonstrated that a number of and increase vigor, Survivability and higher yields under transcription factors may be used to manipulate branching, Such as the means to modify lateral branching, a possible high light and ultraviolet light conditions. application in the forestry industry. Transgenic plants have 0200 Decreased herbicide sensitivity. Presently dis also been produced that have altered cell wall content, lignin closed transcription factor genes that confer resistance or production, flower organ number, or overall shape of the tolerance to herbicides (e.g., glyphosate) may find use in plants. Presently disclosed transcription factor genes trans providing means to increase herbicide applications without formed into plants may be used to affect plant morphology detriment to desirable plants. This would allow for the by increasing or decreasing internode distance, both of increased use of a particular herbicide in a local environ which may be advantageous under different circumstances. ment, with the effect of increased detriment to undesirable For example, for fast growth of woody plants to provide species and less harm to transgenic, desirable cultivars. more biomass, or fewer knots, increased internode distances 0201 Increased herbicide sensitivity. Knockouts of a are generally desirable. For improved wind screening of number of the presently disclosed transcription factor genes shrubs or trees, or harvesting characteristics of for example, have been shown to be lethal to developing embryos. Thus, members of the Gramineae family, decreased internode these genes are potentially useful as herbicide targets. distance may be advantageous. These modifications would also prove useful in the ornamental horticulture industry for 0202). Oxidative stress. In plants, as in all living things, the creation of unique phenotypic characteristics of orna abiotic and biotic stresses induce the formation of oxygen mental plants. radicals, including Superoxide and peroxide radicals. This has the effect of accelerating senescence, particularly in 0206 Increased stature. For some ornamental plants, the leaves, with the resulting loss of yield and adverse effect on ability to provide larger varieties may be highly desirable. appearance. Generally, plants that have the highest level of For many plants, including t fruit-bearing trees or trees and defense mechanisms, such as, for example, polyunsaturated shrubs that serve as view or wind Screens, increased stature moieties of membrane lipids, are most likely to thrive under provides obvious benefits. Crop species may also produce conditions that introduce oxidative stress (e.g., high light, higher yields on larger cultivars oZone, water deficit, particularly in combination). Introduc 0207 Reduced stature or dwarfism. Presently disclosed tion of the presently disclosed transcription factor genes that transcription factor genes that decrease plant stature can be increase the level of oxidative stress defense mechanisms used to produce plants that are more resistant to damage by would provide beneficial effects on the yield and appearance wind and rain, or more resistant to heat or low humidity or of plants. One specific oxidizing agent, oZone, has been water deficit. Dwarf plants are also of significant interest to shown to cause significant foliar injury, which impacts yield the ornamental horticulture industry, and particularly for and appearance of crop and ornamental plants. In addition to home garden applications for which space availability may reduced foliar injury that would be found in ozone resistant be limited. plant created by transforming plants with some of the presently disclosed transcription factor genes, the latter have 0208 Fruit size and number. Introduction of presently also been shown to have increased chlorophyll fluorescence disclosed transcription factor genes that affect fruit size will (Yu-Sen Chang et al. Bot. Bull. Acad. Sin. (2001) 42: have desirable impacts on fruit size and number, which may 265-272). comprise increases in yield for fruit crops, or reduced fruit yield, Such as when vegetative growth is preferred (e.g., with 0203 Heavy metal tolerance. Heavy metals such as lead, mercury, arsenic, chromium and others may have a signifi bushy omamentals, or where fruit is undesirable, as with cant adverse impact on plant respiration. Plants that have ornamental olive trees). been transformed with presently disclosed transcription fac 0209 Flower structure, inflorescence, and development. tor genes that confer improved resistance to heavy metals, Presently disclosed transgenic transcription factors have through, for example, sequestering or reduced uptake of the been used to create plants with larger flowers or arrange metals will show improved vigor and yield in soils with ments of flowers that are distinct from wild-type or non relatively high concentrations of these elements. Con transformed cultivars. This would likely have the most value versely, transgenic transcription factors may also be intro for the ornamental horticulture industry, where larger flow US 2006/0195944 A1 Aug. 31, 2006 47 ers or interesting presentations generally are preferred and 0214) Modifications to root hairs. Presently disclosed command the highest prices. Flower structure may have transcription factor genes that increase root hair length or advantageous effects on fertility, and could be used, for number potentially could be used to increase root growth or example, to decrease fertility by the absence, reduction or vigor, which might in turn allow better plant growth under screening of reproductive components. One interesting adverse conditions such as limited nutrient or water avail application for manipulation of flower structure, for ability. example, by introduced transcription factors could be in the 0215 Apical dominance. The modified expression of increased production of edible flowers or flower parts, presently disclosed transcription factors that control apical including saffron, which is derived from the Stigmas of dominance could be used in ornamental horticulture, for Crocus sativus. example, to modify plant architecture. 0210 Number and development of trichomes. Several 0216 Branching patterns. Several presently disclosed presently disclosed transcription factor genes have been transcription factor genes have been used to manipulate used to modify trichome number and amount of trichome branching, which could provide benefits in the forestry products in plants. Trichome glands on the Surface of many industry. For example, reduction in the formation of lateral higher plants produce and secrete exudates that give pro branches could reduce knot formation. Conversely, increas tection from the elements and pests Such as insects, microbes ing the number of lateral branches could provide utility and herbivores. These exudates may physically immobilize when a plant is used as a windscreen, or may also provide insects and spores, may be insecticidal or ant-microbial or ornamental advantages. they may act as allergens or irritants to protect against herbivores. Trichomes have also been suggested to decrease 0217 Leaf shape, color and modifications. It has been transpiration by decreasing leaf Surface air flow, and by demonstrated in laboratory experiments that overexpression exuding chemicals that protect the leaf from the Sun. of Some of the presently disclosed transcription factors produced marked effects on leaf development. At early 0211) Another potential utilities for sequences that stages of growth, these transgenic seedlings developed nar increase trichome number is to increase the density of cotton row, upward pointing leaves with long petioles, possibly fibers in cotton bolls. Cotton fibers are modified unicellular indicating a disruption in circadian-clock controlled pro trichomes that are produced from the ovule epidermis. cesses or nyctinastic movements. Other transcription factor However, typically only 30% of the epidermal cells take on genes can be used to increase plant biomass; large size a trichome fate (Basra and Malik, 1984). Thus, cotton yields would be useful in crops where the vegetative portion of the might be increased by inducing a greater proportion of the plant is the marketable portion. ovule epidermal cells to become fibers. 0218 Siliques. Genes that later silique conformation in 0212 Seed size, color and number. The introduction of brassicates may be used to modify fruit ripening processes presently disclosed transcription factor genes into plants that in brassicates and other plants, which may positively affect alter the size or number of seeds may have a significant seed or fruit quality. impact on yield, both when the product is the seed itself, or when biomass of the vegetative portion of the plant is 0219 Stem morphology and shoot modifications. Labo increased by reducing seed production. In the case of fruit ratory studies have demonstrated that introducing several of products, it is often advantageous to modify a plant to have the presently disclosed transcription factor genes into plants reduced size or number of seeds relative to non-transformed can cause stem bifurcations in shoots, in which the shoot plants to provide seedless or varieties with reduced numbers meristems split to form two or three separate shoots. This or smaller seeds. Presently disclosed transcription factor unique appearance would be desirable in ornamental appli genes have also been shown to affect seed size, including the cations. development of larger seeds. Seed size, in addition to seed 0220 Diseases, pathogens and pests. A number of the coat integrity, thickness and permeability, seed water content presently disclosed transcription factor genes have been and by a number of other components including antioxidants shown to or are likely to confer resistance to various plant and oligosaccharides, may affect seed longevity in storage. diseases, pathogens and pests. The offending organisms This would be an important utility when the seed of a plant include fungal pathogens Fusarium oxysporum, Botrytis is the harvested crops, as with, for example, peas, beans, cinerea, Sclerotinia Sclerotiorum, and Erysiphe Orontii. Bac nuts, etc. Presently disclosed transcription factor genes have terial pathogens to which resistance may be conferred also been used to modify seed color, which could provide include Pseudomonas Syringae. Other problem organisms added appeal to a seed product. may potentially include nematodes, mollicutes, parasites, or herbivorous arthropods. In each case, one or more trans 0213 Root development, modifications. By modifying formed transcription factor genes may provide Some benefit the structure or development of roots by transforming into a to the plant to help prevent or overcome infestation. The plant one or more of the presently disclosed transcription mechanisms by which the transcription factors work could factor genes, plants may be produced that have the capacity include increasing Surface waxes or oils, Surface thickness, to thrive in otherwise unproductive soils. For example, grape local senescence, or the activation of signal transduction roots that extend further into rocky soils, or that remain pathways that regulate plant defense in response to attacks viable in waterlogged soils, would increase the effective planting range of the crop. It may be advantageous to by herbivorous pests (including, for example, protease manipulate a plant to produce short roots, as when a soil in inhibitors). which the plant will be growing is occasionally flooded, or 0221) Increased tolerance of plants to nutrient-limited when pathogenic fungi or disease-causing nematodes are soils. Presently disclosed transcription factor genes intro prevalent. duced into plants may provide the means to improve uptake US 2006/0195944 A1 Aug. 31, 2006 48 of essential nutrients, including nitrogenous compounds, be of interest from a nutraceutical standpoint. (3) Glucosi phosphates, potassium, and trace minerals. The effect of nolates form part of a plants natural defense against insects. these modifications is to increase the seedling germination Modification of glucosinolate composition or quantity could and range of ornamental and crop plants. The utilities of therefore afford increased protection from predators. Fur presently disclosed transcription factor genes conferring thermore, in edible crops, tissue specific promoters might be tolerance to conditions of low nutrients also include cost used to ensure that these compounds accumulate specifically savings to the grower by reducing the amounts of fertilizer in tissues, such as the epidermis, which are not taken for needed, environmental benefits of reduced fertilizer runoff consumption. and improved yield and stress tolerance. In addition, this 0227 Modified seed oil content. The composition of gene could be used to alter seed protein amounts and/or seeds, particularly with respect to seed oil amounts and/or composition that could impact yield as well as the nutritional composition, is very important for the nutritional value and value and production of various food products. production of various food and feed products. Several of the 0222 Hormone sensitivity. One or more of the presently presently disclosed transcription factor genes in seed lipid disclosed transcription factor genes have been shown to saturation that alter seed oil content could be used to affect plant abscisic acid (ABA) sensitivity. This plant improve the heat stability of oils or to improve the nutri hormone is likely the most important hormone in mediating tional quality of seed oil, by, for example, reducing the the adaptation of a plant to stress. For example, ABA number of calories in seed, increasing the number of calories mediates conversion of apical meristems into dormant buds. in animal feeds, or altering the ratio of Saturated to unsat In response to increasingly cold conditions, the newly devel urated lipids comprising the oils. oping leaves growing above the meristem become converted 0228 Seed and leaf fatty acid composition. A number of into stiff bud scales that closely wrap the meristem and the presently disclosed transcription factor genes have been protect it from mechanical damage during winter. ABA in shown to alter the fatty acid composition in plants, and seeds the bud also enforces dormancy; during premature warm in particular. This modification may find particular value for spells, the buds are inhibited from sprouting. Bud dormancy improving the nutritional value of for example, seeds or is eliminated after either a prolonged cold period of cold or whole plants. Dietary fatty acids ratios have been shown to a significant number of lengthening days. Thus, by affecting have an effect on, for example, bone integrity and remod ABA sensitivity, introduced transcription factor genes may eling (see, for example, Weiler Pediatr. Res. (2000) 47: 5 affect cold sensitivity and survivability. ABA is also impor 692-697). The ratio of dietary fatty acids may alter the tant in protecting plants from drought tolerance. precursor pools of long-chain polyunsaturated fatty acids 0223 Several other of the present transcription factor that serve as precursors for prostaglandin synthesis. In genes have been used to manipulate ethylene signal trans mammalian connective tissue, prostaglandins serve as duction and response pathways. These genes can thus be important signals regulating the balance between resorption used to manipulate the processes influenced by ethylene, and formation in bone and cartilage. Thus dietary fatty acid Such as seed germination or fruit ripening, and to improve ratios altered in seeds may affect the etiology and outcome seed or fruit quality. of bone loss. 0229) Modified seed protein content. As with seed oils, 0224 Production of seed and leaf prenyl lipids, including the composition of seeds, particularly with respect to protein tocopherol. Prenyl lipids play a role in anchoring proteins in amounts and/or composition, is very important for the membranes or membranous organelles. Thus, modifying the nutritional value and production of various food and feed prenyl lipid content of seeds and leaves could affect mem products. A number of the presently disclosed transcription brane integrity and function. A number of presently dis factor genes modify the protein concentrations in seeds closed transcription factor genes have been shown to modify would provide nutritional benefits, and may be used to the tocopherol composition of plants. Tocopherols have both prolong storage, increase seed pest or disease resistance, or anti-oxidant and vitamin E activity. modify germination rates. 0225 Production of seed and leaf phytosterols: Presently 0230 Production of flavonoids in leaves and other plant disclosed transcription factor genes that modify levels of parts. Expression of presently disclosed transcription factor phytosterols in plants may have at least two utilities. First, genes that increase flavonoid production in plants, including phytosterols are an important source of precursors for the anthocyanins and condensed tannins, may be used to alter in manufacture of human steroid hormones. Thus, regulation of pigment production for horticultural purposes, and possibly transcription factor expression or activity could lead to increasing stress resistance. Flavonoids have antimicrobial elevated levels of important human steroid precursors for activity and could be used to engineer pathogen resistance. steroid semi-synthesis. For example, transcription factors Several flavonoid compounds have health promoting effects that cause elevated levels of campesterol in leaves, or Such as the inhibition of tumor growth and cancer, preven sitosterols and Stigmasterols in seed crops, would be useful tion of bone loss and the prevention of the oxidation of for this purpose. Phytosterols and their hydrogenated deriva lipids. Increasing levels of condensed tannins, whose bio tives phytostanols also have proven cholesterol-lowering synthetic pathway is shared with anthocyanin biosynthesis, properties, and transcription factor genes that modify the in forage legumes is an important agronomic trait because expression of these compounds in plants would thus provide they prevent pasture bloat by collapsing protein foams health benefits. within the rumen. For a review on the utilities of flavonoids 0226 Production of seed and leaf glucosinolates. Some and their derivatives, refer to Dixon et al. (1999) Trends glucosinolates have anti-cancer activity; thus, increasing the Plant Sci. 4: 394-400. levels or composition of these compounds by introducing 0231 Production of diterpenes in leaves and other plant several of the presently disclosed transcription factors might parts. Depending on the plant species, varying amounts of US 2006/0195944 A1 Aug. 31, 2006 49 diverse secondary biochemicals (often lipophilic terpenes) structural component of the plant cell, Sugars are central are produced and exuded or volatilized by trichomes. These regulatory molecules that control several aspects of plant exotic secondary biochemicals, which are relatively easy to physiology, metabolism and development. It is thought that extract because they are on the surface of the leaf, have been this control is achieved by regulating gene expression and, widely used in Such products as flavors and aromas, drugs, in higher plants, Sugars have been shown to repress or pesticides and cosmetics. Thus, the overexpression of genes activate plant genes involved in many essential processes that are used to produce diterpenes in plants may be accom Such as photosynthesis, glyoxylate metabolism, respiration, plished by introducing transcription factor genes that induce starch and Sucrose synthesis and degradation, pathogen said overexpression. One class of secondary metabolites, the response, wounding response, cell cycle regulation, pigmen diterpenes, can effect several biological systems such as tation, flowering and senescence. The mechanisms by which tumor progression, prostaglandin synthesis and tissue Sugars control gene expression are not understood. inflammation. In addition, diterpenes can act as insect phero mones, termite allomones, and can exhibit neurotoxic, cyto 0237 Because Sugars are important signaling molecules, toxic and antimitotic activities. As a result of this functional the ability to control either the concentration of a signaling diversity, diterpenes have been the target of research several Sugar or how the plant perceives or responds to a signaling pharmaceutical ventures. In most cases where the metabolic Sugar could be used to control plant development, physiol pathways are impossible to engineer, increasing trichome ogy or metabolism. For example, the flux of Sucrose (a density or size on leaves may be the only way to increase disaccharide Sugar used for systemically transporting carbon plant productivity. and energy in most plants) has been shown to affect gene expression and alter storage compound accumulation in 0232 Production of anthocyanin in leaves and other plant seeds. Manipulation of the Sucrose signaling pathway in parts. Several presently disclosed transcription factor genes seeds may therefore cause seeds to have more protein, oil or can be used to alter anthocyanin production in numerous carbohydrate, depending on the type of manipulation. Simi plant species. The potential utilities of these genes include larly, in tubers. Sucrose is converted to starch which is used alterations in pigment production for horticultural purposes, as an energy store. It is thought that Sugar signaling path and possibly increasing stress resistance in combination with ways may partially determine the levels of starch synthe another transcription factor. sized in the tubers. The manipulation of Sugar signaling in 0233 Production of miscellaneous secondary metabo tubers could lead to tubers with a higher starch content. lites. Microarray data Suggests that flux through the aromatic amino acid biosynthetic pathways and primary and second 0238. Thus, the presently disclosed transcription factor ary metabolite biosynthetic pathways are up-regulated. Pres genes that manipulate the Sugar signal transduction pathway ently disclosed transcription factors have been shown to be may lead to altered gene expression to produce plants with involved in regulating alkaloid biosynthesis, in part by desirable traits. In particular, manipulation of Sugar signal up-regulating the enzymes indole-3-glycerol phosphatase transduction pathways could be used to alter Source-sink and strictosidine synthase. Phenylalanine ammonia lyase, relationships in seeds, tubers, roots and other storage organs chalcone synthase and trans-cinnamate mono-oxygenase are leading to increase in yield. also induced, and are involved in phenylpropenoid biosyn 0239 Plant growth rate and development. A number of thesis. the presently disclosed transcription factor genes have been 0234 Sugar, starch, hemicellulose composition. Overex shown to have significant effects on plant growth rate and pression of the presently disclosed transcription factors that development. These observations have included, for affect sugar content resulted in plants with altered leaf example, more rapid or delayed growth and development of insoluble Sugar content. Transcription factors that alter plant reproductive organs. This would provide utility for regions cell wall composition have several potential applications with short or long growing seasons, respectively. Acceler including altering food digestibility, plant tensile strength, ating plant growth would also improve early yield or wood quality, pathogen resistance and in pulp production. increase biomass at an earlier stage, when Such is desirable The potential utilities of a gene involved in glucose-specific (for example, in producing forestry products). Sugar sensing are to alter energy balance, photosynthetic 0240 Embryo development. Presently disclosed tran rate, carbohydrate accumulation, biomass production, Scription factor genes that alter embryo development has Source-sink relationships, and senescence. been used to alter seed protein and oil amounts and/or 0235 Hemicellulose is not desirable in paper pulps composition which is very important for the nutritional because of its lack of strength compared with cellulose. Thus value and production of various food products. Seed shape modulating the amounts of cellulose vs. hemicellulose in the and seed coat may also be altered by these genes, which may plant cell wall is desirable for the paper/lumber industry. provide for improved storage stability. Increasing the insoluble carbohydrate content in various fruits, vegetables, and other edible consumer products will 0241 Seed germination rate. A number of the presently result in enhanced fiber content. Increased fiber content disclosed transcription factor genes have been shown to would not only provide health benefits in food products, but modify seed germination rate, including when the seeds are might also increase digestibility of forage crops. In addition, in conditions normally unfavorable for germination (e.g., the hemicellulose and pectin content of fruits and berries cold, heat or salt stress, or in the presence of ABA), and may affects the quality of jam and catsup made from them. thus be used to modify and improve germination rates under Changes in hemicellulose and pectin content could result in adverse conditions. a Superior consumer product. 0242 Plant, seedling vigor. Seedlings transformed with 0236 Plant response to sugars and sugar composition. In presently disclosed transcription factors have been shown to addition to their important role as an energy source and possess larger cotyledons and appeared somewhat more US 2006/0195944 A1 Aug. 31, 2006 50 advanced than control plants. This indicates that the seed tion factor genes could thus bring about large increases in lings developed more rapidly that the control plants. Rapid yields. Prevention of flowering might help maximize veg seedling development is likely to reduce loss due to diseases etative yields and prevent escape of genetically modified particularly prevalent at the seedling stage (e.g., damping organism (GMO) pollen. off) and is thus important for survivability of plants germi 0248 Extended flowering phase. Presently disclosed nating in the field or in controlled environments. transcription factors that extend flowering time have utility 0243 Senescence, cell death. Presently disclosed tran in engineering plants with longer-lasting flowers for the Scription factor genes may be used to alter senescence horticulture industry, and for extending the time in which the responses in plants. Although leaf senescence is thought to plant is fertile. be an evolutionary adaptation to recycle nutrients, the ability to control senescence in an agricultural setting has signifi 0249 Flower and leaf development. Presently disclosed cant value. For example, a delay in leaf Senescence in some transcription factor genes have been used to modify the maize hybrids is associated with a significant increase in development of flowers and leaves. This could be advanta yields and a delay of a few days in the senescence of soybean geous in the development of new ornamental cultivars that plants can have a large impact on yield. Delayed flower present unique configurations. In addition, some of these senescence may also generate plants that retain their blos genes have been shown to reduce a plants fertility, which is soms longer and this may be of potential interest to the also useful for helping to prevent development of pollen of ornamental horticulture industry. GMOS 0250 Flower abscission. Presently disclosed transcrip 0244) Modified fertility. Plants that overexpress a number tion factor genes introduced into plants have been used to of the presently disclosed transcription factor genes have retain flowers for longer periods. This would provide a been shown to possess reduced fertility. This could be a significant benefit to the ornamental industry, for both cut desirable trait, as it could be exploited to prevent or mini flowers and woody plant varieties (of for example, maize), mize the escape of the pollen of genetically modified organ as well as have the potential to lengthen the fertile period of isms (GMOs) into the environment. a plant, which could positively impact yield and breeding 0245 Early and delayed flowering. Presently disclosed programs. transcription factor genes that accelerate flowering could have valuable applications in Such programs since they 0251 A listing of specific effects and utilities that the allow much faster generation times. In a number of species, presently disclosed transcription factor genes have on plants, for example, broccoli, cauliflower, where the reproductive as determined by direct observation and assay analysis, is parts of the plants constitute the crop and the vegetative provided in Tables 4 and 6. tissues are discarded, it would be advantageous to accelerate Antisense and Co-Suppression time to flowering. Accelerating flowering could shorten crop 0252) In addition to expression of the nucleic acids of the and tree breeding programs. Additionally, in Some instances, invention as gene replacement or plant phenotype modifi a faster generation time might allow additional harvests of a cation nucleic acids, the nucleic acids are also useful for crop to be made within a given growing season. A number sense and anti-sense Suppression of expression, e.g., to of Arabidopsis genes have already been shown to accelerate down-regulate expression of a nucleic acid of the invention, flowering when constitutively expressed. These include e.g., as a further mechanism for modulating plant phenotype. LEAFY, APETALA1 and CONSTANS (Mandel et al. That is, the nucleic acids of the invention, or Subsequences (1995) Nature 377: 522-524; Weigel and Nilsson (1995) or anti-sense sequences thereof, can be used to block expres Nature 377: 495-500; and Simon et al. (1996) Nature 384: sion of naturally occurring homologous nucleic acids. A 59-62). variety of sense and anti-sense technologies are known in 0246 By regulating the expression of potential flowering the art, e.g., as set forth in Lichtenstein and Nellen (1997) using inducible promoters, flowering could be triggered by Antisense Technology: A Practical Approach IRL Press at application of an inducer chemical. This would allow flow Oxford University Press, Oxford, U.K. In general, sense or ering to be synchronized across a crop and facilitate more anti-sense sequences are introduced into a cell, where they efficient harvesting. Such inducible systems could also be are optionally amplified, e.g., by transcription. Such used to tune the flowering of crop varieties to different sequences include both simple oligonucleotide sequences latitudes. At present, species such as Soybean and cotton are and catalytic sequences Such as ribozymes. available as a series of maturity groups that are suitable for 0253 For example, a reduction or elimination of expres different latitudes on the basis of their flowering time (which sion (i.e., a “knock-out”) of a transcription factor or tran is governed by day-length). A system in which flowering Scription factor homologue polypeptide in a transgenic could be chemically controlled would allow a single high plant, e.g., to modify a plant trait, can be obtained by yielding northern maturity group to be grown at any latitude. introducing an antisense construct corresponding to the In Southern regions such plants could be grown for longer, polypeptide of interest as a cDNA. For antisense Suppres thereby increasing yields, before flowering was induced. In Sion, the transcription factor or homologue cDNA is more northern areas, the induction would be used to ensure arranged in reverse orientation (with respect to the coding that the crop flowers prior to the first winter frosts. sequence) relative to the promoter sequence in the expres 0247. In a sizeable number of species, for example, root sion vector. The introduced sequence need not be the full crops, where the vegetative parts of the plants constitute the length cDNA or gene, and need not be identical to the cDNA crop and the reproductive tissues are discarded, it would be or gene found in the plant type to be transformed. Typically, advantageous to delay or prevent flowering. Extending the antisense sequence need only be capable of hybridizing vegetative development with presently disclosed transcrip to the target gene or RNA of interest. Thus, where the US 2006/0195944 A1 Aug. 31, 2006 introduced sequence is of shorter length, a higher degree of sequence similarity between the introduced sequence and homology to the endogenous transcription factor sequence the endogenous transcription factor gene is increased. will be needed for effective antisense suppression. While antisense sequences of various lengths can be utilized, 0257 Vectors expressing an untranslatable form of the preferably, the introduced antisense sequence in the vector transcription factor mRNA, e.g., sequences comprising one will be at least 30 nucleotides in length, and improved or more stop codon, or nonsense mutation) can also be used antisense Suppression will typically be observed as the to Suppress expression of an endogenous transcription fac length of the antisense sequence increases. Preferably, the tor, thereby reducing or eliminating its activity and modi length of the antisense sequence in the vector will be greater fying one or more traits. Methods for producing Such than 100 nucleotides. Transcription of an antisense construct constructs are described in U.S. Pat. No. 5,583,021. Prefer as described results in the production of RNA molecules that ably, such constructs are made by introducing a premature are the reverse complement of mRNA molecules transcribed stop codon into the transcription factor gene. Alternatively, from the endogenous transcription factor gene in the plant a plant trait can be modified by gene silencing using double strand RNA (Sharp (1999) Genes Devel. 13: 139-141). cell. Another method for abolishing the expression of a gene is by 0254 Suppression of endogenous transcription factor insertion mutagenesis using the T-DNA of Agrobacterium gene expression can also be achieved using a ribozyme. tumefaciens. After generating the insertion mutants, the Ribozymes are RNA molecules that possess highly specific mutants can be screened to identify those containing the endoribonuclease activity. The production and use of insertion in a transcription factor or transcription factor ribozymes are disclosed in U.S. Pat. No. 4,987,071 and U.S. homologue gene. Plants containing a single transgene inser Pat. No. 5,543.508. Synthetic ribozyme sequences including tion event at the desired gene can be crossed to generate antisense RNAs can be used to confer RNA cleaving activity homozygous plants for the mutation. Such methods are well on the antisense RNA, such that endogenous mRNA mol known to those of skill in the art. (See for example Koncz ecules that hybridize to the antisense RNA are cleaved, et al. (1992) Methods in Arabidopsis Research, World Sci which in turn leads to an enhanced antisense inhibition of entific.) endogenous gene expression. 0258 Alternatively, a plant phenotype can be altered by 0255 Suppression of endogenous transcription factor eliminating an endogenous gene. Such as a transcription gene expression can also be achieved using RNA interfer factor or transcription factor homologue, e.g., by homolo ence, or RNAi. RNAi is a post-transcriptional, targeted gous recombination (Kempin et al. (1997) Nature 389: gene-silencing technique that uses double-stranded RNA 802-803). (dsRNA) to incite degradation of messenger RNA (mRNA) containing the same sequence as the dsRNA (Constans, 0259. A plant trait can also be modified by using the (2002) The Scientist 16:36). Small interfering RNAs, or Cre-lox system (for example, as described in U.S. Pat. No. siRNAS are produced in at least two steps: an endogenous 5,658,772). A plant genome can be modified to include first ribonuclease cleaves longer dsRNA into shorter, 21-23 and second lox sites that are then contacted with a Cre nucleotide-long RNAs. The siRNA segments then mediate recombinase. If the loX sites are in the same orientation, the the degradation of the target mRNA (Zamore, (2001) Nature intervening DNA sequence between the two sites is excised. Struct. Biol. 8: 746–50). RNAi has been used for gene If the loX sites are in the opposite orientation, the intervening function determination in a manner similar to antisense sequence is inverted. oligonucleotides (Constans, (2002) The Scientist 16:36). 0260 The polynucleotides and polypeptides of this Expression vectors that continually express siRNAs in tran invention can also be expressed in a plant in the absence of siently and stably transfected have been engineered to an expression cassette by manipulating the activity or express small hairpin RNAs (shRNAs), which get processed expression level of the endogenous gene by other means. For in vivo into siRNAs-like molecules capable of carrying out example, by ectopically expressing a gene by T-DNA acti gene-specific silencing (Brummelkamp et al. (2002) Science vation tagging (Ichikawa et al. (1997) Nature 390 698-701: 296: 550-553, and Paddison et al. (2002) Genes & Dev. 16: Kakimoto et al. (1996) Science 274: 982-985). This method 948-958). Post-transcriptional gene silencing by double entails transforming a plant with a gene tag containing stranded RNA is discussed in further detail by Hammond et multiple transcriptional enhancers and once the tag has al. (2001) Nature Rev. Gen 2: 110-119, Fire et al. (1998) inserted into the genome, expression of a flanking gene Nature 391: 806-811 and Timmons and Fire (1998) Nature coding sequence becomes deregulated. In another example, 395: 854. the transcriptional machinery in a plant can be modified so 0256 Vectors in which RNA encoded by a transcription as to increase transcription levels of a polynucleotide of the factor or transcription factor homologue cDNA is over invention (See, e.g., PCT Publications WO 96/06166 and expressed can also be used to obtain co-suppression of a WO 98.753057 which describe the modification of the DNA corresponding endogenous gene, e.g., in the manner binding specificity of Zinc finger proteins by changing described in U.S. Pat. No. 5,231,020 to Jorgensen. Such particular amino acids in the DNA-binding motif). co-Suppression (also termed sense Suppression) does not 0261) The transgenic plant can also include the machin require that the entire transcription factor cDNA be intro ery necessary for expressing or altering the activity of a duced into the plant cells, nor does it require that the polypeptide encoded by an endogenous gene, for example introduced sequence be exactly identical to the endogenous by altering the phosphorylation state of the polypeptide to transcription factor gene of interest. However, as with anti sense Suppression, the Suppressive efficiency will be maintain it in an activated State. enhanced as specificity of hybridization is increased, e.g., as 0262 Transgenic plants (or plant cells, or plant explants, the introduced sequence is lengthened, and/or as the or plant tissues) incorporating the polynucleotides of the US 2006/0195944 A1 Aug. 31, 2006 52 invention and/or expressing the polypeptides of the inven Integrated Systems—Sequence Identity tion can be produced by a variety of well established 0268 Additionally, the present invention may be an inte techniques as described above. Following construction of a grated system, computer or computer readable medium that vector, most typically an expression cassette, including a comprises an instruction set for determining the identity of polynucleotide, e.g., encoding a transcription factor or tran one or more sequences in a database. In addition, the Scription factor homologue, of the invention, standard tech instruction set can be used to generate or identify sequences niques can be used to introduce the polynucleotide into a that meet any specified criteria. Furthermore, the instruction plant, a plant cell, a plant explant or a plant tissue of interest. set may be used to associate or link certain functional Optionally, the plant cell, explant or tissue can be regener benefits, such improved characteristics, with one or more ated to produce a transgenic plant. identified sequence. 0263. The plant can be any higher plant, including gym nosperms, monocotyledonous and dicotyledonous plants. 0269. For example, the instruction set can include, e.g., a Suitable protocols are available for Leguminosae (alfalfa, sequence comparison or other alignment program, e.g., an Soybean, clover, etc.), Umbelliferae (carrot, celery, parsnip), available program Such as, for example, the Wisconsin Cruciferae (cabbage, radish, rapeseed, broccoli, etc.), Cur Package Version 10.0, such as BLAST, FASTA, PILEUP. curbitaceae (melons and cucumber), Gramineae (wheat, FINDPATTERNS or the like (GCG, Madison, Wis.). Public corn, rice, barley, millet, etc.), Solanaceae (potato, tomato, sequence databases such as GenBank, EMBL, Swiss-Prot tobacco, peppers, etc.), and various other crops. See proto and PIR or private sequence databases such as PHYTOSEQ cols described in Ammirato et al. (1984) Handbook of Plant sequence database (Incyte Genomics, Palo Alto, Calif.) can Cell Culture-Crop Species, Macmillan Publ. Co. Shimamoto be searched. et al. (1989) Nature 338: 274–276; Fromm et al. (1990) 0270 Alignment of sequences for comparison can be Bio/Technology 8: 833-839; and Vasil et al. (1990) Bio/ conducted by the local homology algorithm of Smith and Technology 8: 429-434. Waterman (1981) Adv. Appl. Math. 2: 482-489, by the 0264 Transformation and regeneration of both mono homology alignment algorithm of Needleman and Wunsch cotyledonous and dicotyledonous plant cells is now routine, (1970).J. Mol. Biol. 48: 443-453, by the search for similarity and the selection of the most appropriate transformation method of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. technique will be determined by the practitioner. The choice U.S.A. 85: 2444-2448, by computerized implementations of of method will vary with the type of plant to be transformed: these algorithms. After alignment, sequence comparisons those skilled in the art will recognize the suitability of between two (or more) polynucleotides or polypeptides are particular methods for given plant types. Suitable methods typically performed by comparing sequences of the two can include, but are not limited to: electroporation of plant sequences over a comparison window to identify and com protoplasts; liposome-mediated transformation; polyethyl pare local regions of sequence similarity. The comparison ene glycol (PEG) mediated transformation; transformation window can be a segment of at least about 20 contiguous using viruses; micro-injection of plant cells; micro-projec positions, usually about 50 to about 200, more usually about tile bombardment of plant cells; vacuum infiltration; and 100 to about 150 contiguous positions. A description of the Agrobacterium tumefaciens mediated transformation. Trans method is provided in Ausubel et al., Supra. formation means introducing a nucleotide sequence into a 0271 A variety of methods for determining sequence plant in a manner to cause stable or transient expression of relationships can be used, including manual alignment and the sequence. computer assisted sequence alignment and analysis. This 0265 Successful examples of the modification of plant later approach is a preferred approach in the present inven characteristics by transformation with cloned sequences tion, due to the increased throughput afforded by computer which serve to illustrate the current knowledge in this field assisted methods. As noted above, a variety of computer of technology, and which are herein incorporated by refer programs for performing sequence alignment are available, ence, include: U.S. Pat. Nos. 5,571,706; 5,677,175: 5,510, or can be produced by one of skill. 471; 5,750,386; 5,597,945; 5,589,615; 5,750,871; 5,268, 526; 5,780,708: 5,538,880; 5,773,269; 5,736,369 and 5,610, 0272. One example algorithm that is suitable for deter O42. mining percent sequence identity and sequence similarity is the BLAST algorithm, which is described in Altschul et al. 0266 Following transformation, plants are preferably J. Mol. Biol. 215: 403-410 (1990). Software for performing selected using a dominant selectable marker incorporated BLAST analyses is publicly available, e.g., through the into the transformation vector. Typically, such a marker will National Center for Biotechnology Information (see internet confer antibiotic or herbicide resistance on the transformed website at ncbi.nlm.nih.gov). This algorithm involves first plants, and selection of transformants can be accomplished identifying high scoring sequence pairs (HSPs) by identify by exposing the plants to appropriate concentrations of the ing short words of length W in the query sequence, which antibiotic or herbicide. either match or satisfy some positive-valued threshold score 0267. After transformed plants are selected and grown to Twhen aligned with a word of the same length in a database maturity, those plants showing a modified trait are identified. sequence. T is referred to as the neighborhood word score The modified trait can be any of those traits described above. threshold (Altschulet al., supra). These initial neighborhood Additionally, to confirm that the modified trait is due to word hits act as seeds for initiating searches to find longer changes in expression levels or activity of the polypeptide or HSPs containing them. The word hits are then extended in polynucleotide of the invention can be determined by ana both directions along each sequence for as far as the cumu lyzing mRNA expression using Northern blots, RT-PCR or lative alignment score can be increased. Cumulative scores microarrays, or protein expression using immunoblots or are calculated using, for nucleotide sequences, the param Western blots or gel shift assays. eters M (reward score for a pair of matching residues; US 2006/0195944 A1 Aug. 31, 2006 always >0) and N (penalty score for mismatching residues; 0276 Thus, the invention provides methods for identify always <0). For amino acid sequences, a scoring matrix is ing a sequence similar or homologous to one or more used to calculate the cumulative score. Extension of the polynucleotides as noted herein, or one or more target word hits in each direction are halted when: the cumulative polypeptides encoded by the polynucleotides, or otherwise alignment score falls off by the quantity X from its maxi noted herein and may include linking or associating a given mum achieved value; the cumulative score goes to Zero or plant phenotype or gene function with a sequence. In the below, due to the accumulation of one or more negative methods, a sequence database is provided (locally or across scoring residue alignments; or the end of either sequence is an inter or intra net) and a query is made against the reached. The BLAST algorithm parameters W. T. and X sequence database using the relevant sequences herein and determine the sensitivity and speed of the alignment. The associated plant phenotypes or gene functions. BLASTN program (for nucleotide sequences) uses as 0277 Any sequence herein can be entered into the data defaults a wordlength (W) of 11, an expectation (E) of 10, base, before or after querying the database. This provides for a cutoff of 100, M=5, N=4, and a comparison of both strands. both expansion of the database and, if done before the For amino acid sequences, the BLASTP program uses as querying step, for insertion of control sequences into the defaults a wordlength (W) of 3, an expectation (E) of 10, and database. The control sequences can be detected by the the BLOSUM62 scoring matrix (see Henikoff & Henikoff query to ensure the general integrity of both the database and (1989) Proc. Natl. Acad. Sci. USA 89: 10915). Unless the query. As noted, the query can be performed using a web otherwise indicated, “sequence identity” here refers to the '% browser based interface. For example, the database can be a sequence identity generated from a thiastx using the NCBI centralized public database Such as those noted herein, and version of the algorithm at the default settings using gapped the querying can be done from a remote terminal or com alignments with the filter “off” (see, for example, internet puter across an internet or intranet. website at ncbi.nlm.nih.gov). 0273. In addition to calculating percent sequence identity, EXAMPLES the BLAST algorithm also performs a statistical analysis of 0278. The following examples are intended to illustrate the similarity between two sequences (see, e.g., Karlin & but not limit the present invention. The complete descrip Altschul (1993) Proc. Natl. Acad. Sci. USA 90: 5873-5787). tions of the traits associated with each polynucleotide of the One measure of similarity provided by the BLAST algo invention is fully disclosed in Table 4 and Table 6. rithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between Example I two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to Full Length Gene Identification and Cloning a reference sequence (and, therefore, in this context, homologous) if the Smallest Sum probability in a comparison 0279 Putative transcription factor sequences (genomic or of the test nucleic acid to the reference nucleic acid is less ESTs) related to known transcription factors were identified than about 0.1, or less than about 0.01, and or even less than in the Arabidopsis thaliana GenBank database using the about 0.001. An additional example of a useful sequence thlastin sequence analysis program using default parameters alignment algorithm is PILEUP, PILEUP creates a multiple and a P-value cutoff threshold of -4 or -5 or lower, sequence alignment from a group of related sequences using depending on the length of the query sequence. Putative progressive, pairwise alignments. The program can align, transcription factor sequence hits were then screened to e.g., up to 300 sequences of a maximum length of 5,000 identify those containing particular sequence strings. If the letters. sequence hits contained such sequence Strings, the 0274 The integrated system, or computer typically sequences were confirmed as transcription factors. includes a user input interface allowing a user to selectively 0280 Alternatively, Arabidopsis thaliana cDNA libraries view one or more sequence records corresponding to the one derived from different tissues or treatments, or genomic or more character strings, as well as an instruction set which libraries were screened to identify novel members of a aligns the one or more character Strings with each other or transcription family using a low stringency hybridization with an additional character string to identify one or more approach. Probes were synthesized using gene specific prim region of sequence similarity. The system may include a link ers in a standard PCR reaction (annealing temperature 60° of one or more character strings with a particular phenotype C.) and labeled with PdCTP using the High Prime DNA or gene function. Typically, the system includes a user Labeling Kit (Boehringer Mannheim). Purified radiolabelled readable output element that displays an alignment produced probes were added to filters immersed in Church hybridiza by the alignment instruction set. tion medium (0.5 MNaPO pH 7.0, 7% SDS, 1% w/v bovine serum albumin) and hybridized overnight at 60° C. with 0275. The methods of this invention can be implemented shaking. Filters were washed two times for 45 to 60 minutes in a localized or distributed computing environment. In a distributed environment, the methods may be implemented with 1xSCC, 1% SDS at 60° C. on a single computer comprising multiple processors or on 0281 To identify additional sequence 5' or 3' of a partial a multiplicity of computers. The computers can be linked, cDNA sequence in a cDNA library, 5' and 3' rapid amplifi e.g. through a common bus, but more preferably the com cation of cDNA ends (RACE) was performed using the puter(s) are nodes on a network. The network can be a MarathonTM cDNA amplification kit (Clontech, Palo Alto, generalized or a dedicated local or wide-area network and, Calif.). Generally, the method entailed first isolating poly(A) in certain preferred embodiments, the computers may be mRNA, performing first and second strand cDNA synthesis components of an intra-net or an internet. to generate double stranded cDNA, blunting cDNA ends, US 2006/0195944 A1 Aug. 31, 2006 54 followed by ligation of the MarathonTM Adaptor to the 0285) Agrobacterium cells were transformed with plas cDNA to form a library of adaptor-ligated ds cDNA. mids prepared as described above following the protocol described by Nagel et al. For each DNA construct to be 0282 Gene-specific primers were designed to be used transformed, 50-100 ng DNA (generally resuspended in 10 along with adaptor specific primers for both 5' and 3' RACE mM Tris-HCl, 1 mM EDTA, pH 8.0) was mixed with 40 ul reactions. Nested primers, rather than single primers, were of Agrobacterium cells. The DNA/cell mixture was then used to increase PCR specificity. Using 5' and 3' RACE transferred to a chilled cuvette with a 2 mm electrode gap reactions, 5' and 3' RACE fragments were obtained, and subject to a 2.5 kV charge dissipated at 25 uF and 200 sequenced and cloned. The process can be repeated until 5' uF using a Gene Pulser II apparatus (Bio-Rad, Hercules, and 3' ends of the full-length gene were identified. Then the Calif.). After electroporation, cells were immediately resus full-length cDNA was generated by PCR using primers pended in 1.0 ml LB and allowed to recover without specific to 5' and 3' ends of the gene by end-to-end PCR. antibiotic selection for 2-4 hours at 28°C. in a shaking incubator. After recovery, cells were plated onto selective Example II medium of LB broth containing 100 g/ml spectinomycin Construction of Expression Vectors (Sigma) and incubated for 24-48 hours at 28° C. Single colonies were then picked and inoculated in fresh medium. 0283 The sequence was amplified from a genomic or The presence of the plasmid construct was verified by PCR cDNA library using primers specific to sequences upstream amplification and sequence analysis. and downstream of the coding region. The expression vector was pMEN20 or pMEN65, which are both derived from Example IV pMON316 (Sanders et al. (1987) Nucleic Acids Res. 15: 1543-1558) and contain the CaMV 35S promoter to express Transformation of Arabidopsis Plants with transgenes. To clone the sequence into the vector, both Agrobacterium tumefaciens with Expression Vector pMEN20 and the amplified DNA fragment were digested 0286. After transformation of Agrobacterium tumefa separately with SalI and NotI restriction enzymes at 37°C. ciens with plasmid vectors containing the gene, single for 2 hours. The digestion products were subject to electro Agrobacterium colonies were identified, propagated, and phoresis in a 0.8% agarose gel and visualized by ethidium used to transform Arabidopsis plants. Briefly, 500 ml cul bromide staining. The DNA fragments containing the tures of LB medium containing 50 mg/l kanamycin were sequence and the linearized plasmid were excised and inoculated with the colonies and grown at 28° C. with purified by using a Qiaquick gel extraction kit (Qiagen, shaking for 2 days until an optical absorbance at 600 nm Valencia Calif.). The fragments of interest were ligated at a wavelength over 1 cm (A) of >2.0 is reached. Cells were ratio of 3:1 (vector to insert). Ligation reactions using T4 then harvested by centrifugation at 4,000xg for 10 min, and DNA ligase (New England Biolabs, Beverly Mass.) were resuspended in infiltration medium (%x Murashige and carried out at 16° C. for 16 hours. The ligated DNAs were Skoog salts (Sigma), 1X Gamborg's B-5 vitamins (Sigma), transformed into competent cells of the E. coli strain DH5C. 5.0% (w/v) sucrose (Sigma), 0.044 uMbenzylamino purine by using the heat shock method. The transformations were (Sigma), 200 ul/l Silwet L-77 (Lehle Seeds) until an Asoo of plated on LB plates containing 50 mg/l kanamycin (Sigma, 0.8 was reached. St. Louis, Mo.). Individual colonies were grown overnight in five milliliters of LB broth containing 50 mg/l kanamycin at 0287 Prior to transformation, Arabidopsis thaliana seeds 37° C. Plasmid DNA was purified by using Qiaquick Mini (ecotype Columbia) were sown at a density of ~10 plants per Prep kits (Qiagen). 4" pot onto Pro-Mix BX potting medium (Hummert Inter national) covered with fiberglass mesh (18 mmx 16 mm). Example III Plants were grown under continuous illumination (50-75 uE/m/sec) at 22-23° C. with 65-70% relative humidity. Transformation of Agrobacterium with the After about 4 weeks, primary inflorescence stems (bolts) are Expression Vector cut off to encourage growth of multiple secondary bolts. After flowering of the mature secondary bolts, plants were 0284. After the plasmid vector containing the gene was prepared for transformation by removal of all siliques and constructed, the vector was used to transform Agrobacte opened flowers. rium tumefaciens cells expressing the gene products. The stock of Agrobacterium tumefaciens cells for transformation 0288 The pots were then immersed upside down in the were made as described by Nagel et al. (1990) FEMS mixture of Agrobacterium infiltration medium as described Microbiol Letts, 67: 325-328. Agrobacterium strain ABI was above for 30 sec, and placed on their sides to allow draining grown in 250 ml LB medium (Sigma) overnight at 28°C. into a 1'x2' flat surface covered with plastic wrap. After 24 with shaking until an absorbance (Agoo) of 0.5-1.0 was h, the plastic wrap was removed and pots are turned upright. reached. Cells were harvested by centrifugation at 4,000xg The immersion procedure was repeated one week later, for for 15 min at 4° C. Cells were then resuspended in 250 ul a total of two immersions per pot. Seeds were then collected chilled buffer (1 mM HEPES, pH adjusted to 7.0 with from each transformation pot and analyzed following the KOH). Cells were centrifuged again as described above and protocol described below. resuspended in 125 ul chilled buffer. Cells were then cen trifuged and resuspended two more times in the same Example V HEPES buffer as described above at a volume of 100 uland 750 ul, respectively. Resuspended cells were then distributed Identification of Arabidopsis Primary Transformants into 40 ul aliquots, quickly frozen in liquid nitrogen, and 0289. Seeds collected from the transformation pots were Stored at -80° C. sterilized essentially as follows. Seeds were dispersed into in US 2006/0195944 A1 Aug. 31, 2006

a solution containing 0.1% (v/v) Triton X-100 (Sigma) and lines were examined. However, in cases where a phenotype sterile H2O and washed by shaking the suspension for 20 required confirmation or detailed characterization, plants min. The wash solution was then drained and replaced with from Subsequent generations were also analyzed. fresh wash solution to wash the seeds for 20 min with shaking. After removal of the second wash Solution, a 0293 Primary transformants were selected on MS solution containing 0.1% (v/v) Triton X-100 and 70% etha medium with 0.3% sucrose and 50 mg/l kanamycin. T2 and nol (Equistar) was added to the seeds and the Suspension was later generation plants were selected in the same manner, shaken for 5 min. After removal of the ethanol/detergent except that kanamycin was used at 35 mg/l. In cases where solution, a solution containing 0.1% (v/v) Triton X-100 and lines carry a Sulfonamide marker (as in all lines generated by 30% (v/v) bleach (Clorox) was added to the seeds, and the super-transformation), seeds were selected on MS medium suspension was shaken for 10 min. After removal of the with 0.3% sucrose and 1.5 mg/l sulfonamide. KO lines were bleach/detergent solution, seeds were then washed five times usually germinated on plates without a selection. Seeds were in sterile distilled HO. The seeds were stored in the last cold-treated (stratified) on plates for 3 days in the dark (in wash water at 4°C. for 2 days in the dark before being plated order to increase germination efficiency) prior to transfer to onto antibiotic selection medium (1x Murashige and Skoog growth cabinets. Initially, plates were incubated at 22° C. salts (pH adjusted to 5.7 with 1M KOH), 1x Gamborg's B-5 vitamins, 0.9% phytagar (Life Technologies), and 50 mg/1 under a light intensity of approximately 100 microEinsteins kanamycin). Seeds were germinated under continuous illu for 7 days. At this stage, transformants were green, pos mination (50-75 uE/m/sec) at 22-23°C. After 7-10 days of sessed the first two true leaves, and were easily distinguished growth under these conditions, kanamycin resistant primary from bleached kanamycin or Sulfonamide-susceptible seed transformants (T generation) were visible and obtained. lings. Resistant seedlings were then transferred onto Soil These seedlings were transferred first to fresh selection (Sunshine potting mix). Following transfer to soil, trays of plates where the seedlings continued to grow for 3-5 more seedlings were covered with plastic lids for 2-3 days to days, and then to soil (Pro-Mix BX potting medium). maintain humidity while they became established. Plants were grown on soil under fluorescent light at an intensity of 0290 Primary transformants were crossed and progeny 70-95 microEinsteins and a temperature of 18-23° C. Light seeds (T) collected; kanamycin resistant seedlings were selected and analyzed. The expression levels of the recom conditions consisted of a 24-hour photoperiod unless other binant polynucleotides in the transformants varies from wise stated. In instances where alterations in flowering time about a 5% expression level increase to a least a 100% was apparent, flowering was re-examined under both expression level increase. Similar observations are made 12-hour and 24-hour light to assess whether the phenotype with respect to polypeptide level expression. was photoperiod dependent. Under 24-hour light growth conditions, the typical generation time (seed to seed) was Example VI approximately 14 weeks. 0294 Because many aspects of Arabidopsis development Identification of Arabidopsis Plants with are dependent on localized environmental conditions, in all Transcription Factor Gene Knockouts cases plants were evaluated in comparison to controls in the 0291. The screening of insertion mutagenized Arabidop same flat. Controls for transgenic lines were generally sis collections for null mutants in a known target gene was wild-type plants or, where specifically indicated, transgenic essentially as described in Krysan et al (1999) Plant Cell 11: plants harboring an empty transformation vector selected on 2283-2290. Briefly, gene-specific primers, nested by 5-250 kanamycin or Sulfonamide. Careful examination was made base pairs to each other, were designed from the 5' and 3' at the following stages: seedling (I week), rosette (2-3 regions of a known target gene. Similarly, nested sets of weeks), flowering (4-7 weeks), and late seed set (8-12 primers were also created specific to each of the T-DNA or weeks). Seed was also inspected. Seedling morphology was transposon ends (the “right' and “left” borders). All possible assessed on selection plates. At all other stages, plants were combinations of gene specific and T-DNA/transposon prim macroscopically evaluated while growing on soil. All sig ers were used to detect by PCR an insertion event within or nificant differences (including alterations in growth rate, close to the target gene. The amplified DNA fragments were size, leaf and flower morphology, coloration and flowering then sequenced which allows the precise determination of time) were recorded, but routine measurements were not be the T-DNA/transposon insertion point relative to the target taken if no differences were apparent. In certain cases, stem gene. Insertion events within the coding or intervening sections were stained to reveal lignin distribution. In these sequence of the genes were deconvoluted from a pool instances, hand-sectioned stems were mounted in phloro comprising a plurality of insertion events to a single unique glucinol saturated 2M HCl (which stains lignin pink) and mutant plant for functional characterization. The method is viewed immediately under a dissection microscope. described in more detail in Yu and Adam, U.S. application 0295 Flowering time was measured by the number of Ser. No. 09/177,733 filed Oct. 23, 1998. rosette leaves present when a visible inflorescence of Example VII approximately 3 cm is apparent Rosette and total leaf number on the progeny stem are tightly correlated with the Morphological Analysis timing of flowering (Koornneefetal (1991) Mol. Gen. Genet 229: 57-66. The vernalization response was measured. For 0292 Morphological analysis was performed to deter Vernalization treatments, seeds were sown to MSagar plates, mine whether changes in transcription factor levels affect sealed with micropore tape, and placed in a 4°C. cold room plant growth and development. This was primarily carried with low light levels for 6-8 weeks. The plates were then out on the T1 generation, when at least 10-20 independent transferred to the growth rooms alongside plates containing US 2006/0195944 A1 Aug. 31, 2006 56 freshly sown non-vernalized controls. Rosette leaves were extracted 2% methylene chloride in hexane and, after cen counted when a visible inflorescence of approximately 3 cm trifugation, the two upper phases were combined and evapo was apparent. rated. 2% methylene chloride in hexane was added to the tubes and the samples were then extracted with one ml of Example VIII water. The upper phase was removed, dried, and resus pended in 400 ul of 2% methylene chloride in hexane and Biochemical Analysis analyzed by gas chromatography using a 50 m DB-5 ms 0296 Experiments were also performed to identify those (0.25 mm ID, 0.25 um phase, J&W Scientific). transformants or knockouts that exhibited modified bio 0301 Insoluble sugar levels were measured by the chemical characteristics. Among the biochemicals that were method essentially described by Reiter et al., (1999) Plant J. assayed were insoluble Sugars, such as arabinose, fucose, 12: 335-345. This method analyzes the neutral sugar com galactose, mannose, rhamnose or Xylose or the like; prenyl position of cell wall polymers found in Arabidopsis leaves. lipids, Such as lutein, B-carotene, Xanthophyl-1, Xantho Soluble Sugars were separated from Sugar polymers by phyll-2, chlorophylls A or B, or C-, 6- or Y-tocopherol or the extracting leaves with hot 70% ethanol. The remaining like; fatty acids, such as 16:0 (palmitic acid), 16:1 (palmi residue containing the insoluble polysaccharides was then toleic acid), 18:0 (stearic acid), 18:1 (oleic acid), 18:2 acid hydrolyzed with allose added as an internal standard. (linoleic acid), 20:0, 18:3 (linolenic acid), 20:1 (eicosenoic Sugar monomers generated by the hydrolysis were then acid), 20:2, 22:1 (erucic acid) or the like; waxes, such as by reduced to the corresponding alditols by treatment with altering the levels of C29, C31, or C33 alkanes; sterols, such NaBH4, then were acetylated to generate the volatile alditol as brassicasterol, campesterol, Stigmasterol, Sitosterol or acetates which were then analyzed by GC-FID. Identity of Stigmastanol or the like, glucosinolates, protein or oil levels. the peaks was determined by comparing the retention times of known Sugars converted to the corresponding alditol 0297 Fatty acids were measured using two methods acetates with the retention times of peaks from wild-type depending on whether the tissue was from leaves or seeds. plant extracts. Alditol acetates were analyzed on a Supelco For leaves, lipids were extracted and esterified with hot SP-2330 capillary column (30 mx250 umx0.2 um) using a methanolic HSO and partitioned into hexane from metha temperature program beginning at 180° C. for 2 minutes nolic brine. For seed fatty acids, seeds were pulverized and extracted in methanol: heptane:toluene:2.2- followed by an increase to 220° C. in 4 minutes. After dimethoxypropane:HSO, (39:34:20:5:2) for 90 minutes at holding at 220° C. for 10 minutes, the oven temperature is 80° C. After cooling to room temperature the upper phase, increased to 240° C. in 2 minutes and held at this tempera containing the seed fatty acid esters, was Subjected to GC ture for 10 minutes and brought back to room temperature. analysis. Fatty acid esters from both seed and leaf tissues 0302) To identify plants with alterations in total seed oil were analyzed with a Supelco SP-2330 column. or protein content, 150 mg of seeds from T2 progeny plants were subjected to analysis by Near Infrared Reflectance 0298 Glucosinolates were purified from seeds or leaves Spectroscopy (NIRS) using a Foss NirSystems Model 6500 by first heating the tissue at 95°C. for 10 minutes. Preheated with a spinning cup transport system. NIRS is a non ethanol: water (50:50) is and after heating at 95° C. for a destructive analytical method used to determine seed oil and further 10 minutes, the extraction solvent is applied to a protein composition. Infrared is the region of the electro DEAE Sephadex column which had been previously equili magnetic spectrum located after the visible region in the brated with 0.5 M pyridine acetate. Desulfoglucosinolates direction of longer wavelengths. Near infrared owns its were eluted with 300 ul water and analyzed by reverse phase name for being the infrared region near to the visible region HPLC monitoring at 226 mm. of the electromagnetic spectrum. For practical purposes, 0299 For wax alkanes, samples were extracted using an near infrared comprises wavelengths between 800 and 2500 identical method as fatty acids and extracts were analyzed nm. NIRS is applied to organic compounds rich in O H on a HP 5890 GC coupled with a 5973 MSD. Samples were bonds (such as moisture, carbohydrates, and fats), C-H chromatographically isolated on a J&W DB35 mass spec bonds (such as organic compounds and petroleum deriva trometer (J&W Scientific). tives), and N—H bonds (such as proteins and amino acids). 0300. To measure prenyl lipids levels, seeds or leaves The NIRS analytical instruments operate by statistically were pulverized with 1 to 2% pyrogallol as an antioxidant. correlating NIRS signals at several wavelengths with the For seeds, extracted samples were filtered and a portion characteristic or property intended to be measured. All removed for tocopherol and carotenoid/chlorophyll analysis biological Substances contain thousands of C-H. O—H. by HPLC. The remaining material was saponified for sterol and N—H bonds. Therefore, the exposure to near infrared determination. For leaves, an aliquot was removed and radiation of a biological sample, Such as a seed, results in a diluted with methanol and chlorophyll A, chlorophyll B, and complex spectrum which contains qualitative and quantita total carotenoids measured by spectrophotometry by deter tive information about the physical and chemical composi mining optical absorbance at 665.2 nm, 652.5 nm, and 470 tion of that sample. mm. An aliquot was removed for tocopherol and carotenoid/ 0303. The numerical value of a specific analyte in the chlorophyll composition by HPLC using a Waters uBonda sample, such as protein content or oil content, is mediated by pak C18 column (4.6 mmx 150 mm). The remaining metha a calibration approach known as chemometrics. Chemomet nolic solution was saponified with 10% KOH at 80° C. for rics applies statistical methods such as multiple linear one hour. The samples were cooled and diluted with a regression (MLR), partial least squares (PLS), and principle mixture of methanol and water. A solution of 2% methylene component analysis (PCA) to the spectral data and correlates chloride in hexane was mixed in and the samples were them with a physical property or other factor, that property centrifuged. The aqueous methanol phase was again re or factor is directly determined rather than the analyte US 2006/0195944 A1 Aug. 31, 2006 57 concentration itself. The method first provides “wet chem duction cascades that regulate the transcription of N-assimi istry” data of the samples required to develop the calibration. latory genes. To determine whether these mechanisms are altered, we exploit the observation that wild-type plants 0304 Calibration for Arabidopsis seed oil composition grown on media containing high levels of Sucrose (3%) was performed using accelerated Solvent extraction using 1 without a nitrogen Source accumulate high levels of antho g seed sample size and was validated against certified canola cyanins. This Sucrose induced anthocyanin accumulation seed. A similar wet chemistry approach was performed for can be relieved by the addition of either inorganic or organic seed protein composition calibration. nitrogen. We use glutamine as a nitrogen source since it also 0305 Data obtained from NIRS analysis was analyzed serves as a compound used to transport N in plants. statistically using a nearest-neighbor (N-N) analysis. The N N analysis allows removal of within-block spatial vari 0308 Germination assays. NaCl (150 mM), mannitol ability in a fairly flexible fashion which does not require (300 mM), sucrose (9.4%), ABA (0.3 uM), Heat (32° C.), prior knowledge of the pattern of variability in the chamber. Cold (8° C.), -N is basal media minus nitrogen plus 3% Ideally, all hybrids are grown under identical experimental Sucrose and -N/+Gln is basal media minus nitrogen plus 3% conditions within a block (rep). In reality, even in many Sucrose and 1 mM glutamine. block designs, significant within-block variability exists. 0309 Growth assays. Severe dehydration (drought), heat Nearest-neighbor procedures are based on assumption that (32° C. for 5 days followed by recovery at 22°C.), chilling environmental effect of a plot is closely related to that of its (8° C.), root development (visual assessment of lateral and neighbors. Nearest-neighbor methods use information from primary roots, root hairs and overall growth). For the adjacent plots to adjust for within-block heterogeneity and nitrogen limitation assay, all components of MS medium so provide more precise estimates of treatment means and remain constant except N is reduced to 20 mg/L of NHNO. differences. If there is within-plot heterogeneity on a spatial Note that 80% MS has 1.32 g/L NHNO, and 1.52 g/L scale that is larger than a single plot and Smaller than the KNO. entire block, then yields from adjacent plots will be posi 0310. Unless otherwise stated, all experiments are per tively correlated. Information from neighboring plots can be formed with the Arabidopsis thaliana ecotype Columbia used to reduce or remove the unwanted effect of the spatial (col-0). Assays are usually performed on non-selected seg heterogeneity, and hence improve the estimate of the treat regating T2 populations (in order to avoid the extra stress of ment effect. Data from neighboring plots can also be used to selection). Control plants for assays on lines containing reduce the influence of competition between adjacent plots. direct promoter-fusion constructs are Col-O plants trans The Papadakis N N analysis can be used with designs to formed an empty transformation vector (pMEN65). Controls remove within-block variability that would not be removed for 2-component lines (generated by Supertransformation) with the standard split plot analysis (Papadakis, 1973, Inst. are the background promoter-driver lines (i.e. promoter d'Amelior. Plantes Thessaloniki (Greece) Bull. Scientif., ::LexA-GAL4TA lines), into which the supertransforma No. 23; Papadakis, 1984, Proc. Acad. Athens, 59, 326-342). tions were initially performed. Example IX 0311 All assays are performed in tissue culture. Growing the plants under controlled temperature and humidity on Plate-Based Physiology Experimental Methods sterile medium produces uniform plant material that has not been exposed to additional stresses (such as water stress) 0306 Plate Assays. Twelve different plate-based physi which could cause variability in the results obtained. All ological assays (shown below), representing a variety of assays were designed to detect plants that are more tolerant drought-stress related conditions, are used as a pre-screen to or less tolerant to the particular stress condition and were identify top performing lines from each project (i.e. lines developed with reference to the following publications: Jang from transformation with a particular construct), that will be et al. (1997) Plant Cell 9: 5-19; Smeekens (1998) Curr: tested in Subsequent soil based assays. Typically, ten lines Opin. Plant Biol. 1: 230-234; Liu and Zhu (1997) Proc. Natl. are subjected to plate assays, from which the best three lines Acad. Sci. U.S. A. 94: 14960-14964; Saleki et al. (1993) are selected for Subsequent soil based assays. However, in Plant Physiol. 101: 839-845; Wu et al. (1996) Plant Cell 8: projects where significant stress tolerance is not obtained in 617-627; Zhu et al. (1998) Plant Cell 10: 1181-1191: Alia et plate based assays, lines are not Submitted for soil assays. al. (1998) Plant J. 16: 155-161; Xin and Browse, (1998) 0307 In addition, some projects are subjected to nutrient Proc. Natl. Acad. Sci. U.S.A. 95: 7799-7804; Leon-Kloost limitation studies. A nutrient limitation assay is intended to erziel et al. (1996) Plant Physiol. 110: 233-240. Where find genes that allow more plant growth upon deprivation of possible, assay conditions were originally tested in a blind nitrogen. Nitrogen is a major nutrient affecting plant growth experiment with controls that had phenotypes related to the and development that ultimately impacts yield and stress condition tested. tolerance. These assays monitor primarily root but also Procedures rosette growth on nitrogen deficient media. In all higher plants, inorganic nitrogen is first assimilated into glutamate, 0312 Prior to plating, seed for all experiments are surface glutamine, aspartate and asparagine, the four amino acids sterilized in the following manner: (1) 5 minute incubation used to transport assimilated nitrogen from sources (e.g. with mixing in 70% ethanol, (2) 20 minute incubation with leaves) to sinks (e.g. developing seeds). This process is mixing in 30% bleach, 0.01% triton-X 100, (3) 5x rinses regulated by light, as well as by C/N metabolic status of the with sterile water, (4) Seeds are re-suspended in 0.1% sterile plant. We use a C/N sensing assay to look for alterations in agarose and stratified at 4°C. for 3-4 days. the mechanisms plants use to sense internal levels of carbon 0313 All germination assays follow modifications of the and nitrogen metabolites which could activate signal trans same basic protocol. Sterile seeds are sown on the condi US 2006/0195944 A1 Aug. 31, 2006

tional media that has a basal composition of 80% MS+Vi and the leaves having a “crispy” texture. At the end of the tamins. Plates are incubated at 22°C. under 24-hour light drought period, pots are re-watered and scored after 5-6 (120-130 E mi s') in a growth chamber. Evaluation of days; the number of Surviving plants in each pot is counted, germination and seedling vigor is done 5 days after planting. and the proportion of the total plants in the pot that survived For assessment of root development, seedlings germinated is calculated. on 80% MS+Vitamins+1% sucrose are transferred to square 0318 Slit-pot method. A variation of the above method plates at 7 days. Evaluation is done 5 days after transfer was sometimes used, whereby plants for a given transgenic following growth in a vertical position. Qualitative differ line were compared to wild-type controls in the same pot. ences are recorded including lateral and primary root length, For those studies, 7 wild-type seedlings were transplanted root hair number and length, and overall growth. into one half of a 3.5 inch pot and 7 seedlings of the line 0314 For chilling (8° C.) and heat sensitivity (32° C.) being tested were transplanted into the other half of the pot. growth assays, seeds are germinated and grown for 7 days 0319 Analysis of results. In a given experiment, we on MS+Vitamins+1% sucrose at 22° C. and then are trans typically compare 6 or more pots of a transgenic line with 6 ferred to chilling or heat stress conditions. Heat stress is or more pots of the appropriate control. (In the split pot applied for 5 days, after which the plants are transferred method, 12 or more pots are used.) The mean drought score back to 22°C. for recovery and evaluated after a further 5 and mean proportion of plants Surviving (Survival rate) are days. Plants are subjected to chilling conditions (8° C.) and calculated for both the transgenic line and the wild-type evaluated at 10 days and 17 days. pots. In each case a p-value is calculated, which indicates 0315 For severe dehydration (drought) assays, seedlings the significance of the difference between the two mean are grown for 14 days on MS+Vitamins+1% Sucrose at 22 values. The results for each transgenic line across each C. Plates are opened in the sterile hood for 3 hr for hardening planting for a particular project are then presented in a and then seedlings are removed from the media and let dry results table. Results where the lines show a significantly for 2 h in the hood. After this time they are transferred back better or worse performance versus the control are high to plates and incubated at 22° C. for recovery. Plants are lighted. evaluated after 5 days. 0320 Calculation of p-values. For the assays where con 0316 Experiments were also performed to identify those trol and experimental plants are in separate pots, Survival is transformants or knockouts that exhibited modified Sugar analyzed with a logistic regression to account for the fact sensing. For such studies, seeds from transformants were that the random variable is a proportion between 0 and 1. germinated on media containing 5% glucose or 9.4% The reported p-value is the significance of the experimental Sucrose which normally partially restrict hypocotyl elonga proportion contrasted to the control, based upon regressing tion. Plants with altered Sugar sensing may have either the logit-transformed data. longer or shorter hypocotyls than normal plants when grown 0321) Drought score, being an ordered factor with no real on this media. Additionally, other plant traits may be varied numeric meaning, is analyzed with a non-parametric test Such as root mass. between the experimental and control groups. The p-value is calculated with a Mann-Whitney rank-sum test. Example X 0322 For the split-pot assays, matched control and Soil Drought Experimental Methods experimental measurements are available for both variables. In lieu of a direct transformed regression technique for this 0317. The soil drought assay (performed in clay pots) is data, the logit-transformed proportions are analyzed by based on that described by Haake et al. (2002). In the current parametric methods. The p-value is derived from a paired procedure, seedlings were first germinated on selection t-test on the transformed data. For the paired score data, the plates containing either kanamycin or Sulfonamide. Seeds p-value from a Wilcoxon test is reported. were sterilized by a 2 minute ethanol treatment followed by 20 minutes in 30% bleach/0.01% Tween and five washes in 0323 Measurement of Photosynthesis. Photosynthesis distilled water. Seeds are sown to MS agar in 0.1% agarose was measured using a LICOR LI-6400. The LI-6400 uses and stratified for 3 days at 4°C., before transfer to growth infrared gas analyzers to measure carbon dioxide to generate cabinets with a temperature of 22°C. After 7 days of growth a photosynthesis measurement. It is based upon the differ on selection plates, seedlings are transplanted to 3.5 inch ence of the CO reference (the amount put into the chamber) diameter clay pots containing 80 g of a 50:50 mix of and the CO, sample (the amount that leaves the chamber). vermiculite:perlite topped with 80 g of ProMix. Typically, Since photosynthesis is the process of converting CO to each pot contains 14 seedlings, and plants of the transgenic carbohydrates, we expect to see a decrease in the amount of line being tested are in separate pots to the wild-type CO. Sample. From this difference, a photosynthesis rate can controls. Pots containing the transgenic line versus control be generated. In some cases, respiration may occur and an pots were interspersed in the growth room, maintained under increase in CO detected. To perform measurements, the 24-hour light conditions (18-23°C., and 90-100 uEm’s') LI-6400 is set-up and calibrated as per LI-6400 standard and watered for a period of 14 days. Water was then directions. Photosynthesis is measured in the youngest most withheld and pots were placed on absorbent diaper paper for fully expanded leaf at 300 and 1000 ppm CO using a metal a period of 8-10 days to apply a drought treatment. After this halide light source. This light source provides about 700 LE period, a visual qualitative “drought score' from 0-6 is m2 s. assigned to record the extent of visible drought stress 0324 Fluorescence was measured in dark and light symptoms. A score of “6” corresponds to no visible symp adapted leaves using either a LI-6400 (LICOR) with a leaf toms whereas a score of “0” corresponds to extreme wilting chamber fluorometer attachment or an OS-1 (Opti-Sciences) US 2006/0195944 A1 Aug. 31, 2006 59 as described in the manufacturer's literature. When the plants in the T1 generation. The necrotic lesions observed on LI-6400 was used, all manipulations were performed under the T1 plants grown in soil were not observed on the plants a dark shade cloth. Plants were dark adapted by placing in grown in culture leaving uncertainty as to whether the a box under this shade cloth until used. The OS-30 utilized necrotic lesion phenotype is a classic lesion mimic pheno Small clips to create dark adapted leaves. type that would suggest that G2340 is involved in cell death responses or if the G2340 overexpressor plants are simply 0325 Chlorophyll/carotenoid determination. For some hyper-sensitive to stresses. One class of lesion mimic forms experiments, chlorophyll was estimated in methanolic progressive lesions following an inductive stress. Lesion extracts using the method of Porra et al. (1989) Biochim. et formation may be induced in G2340 overexpressing plants Biophys. Acta 975: 384-394. Carotenoids were estimated in grown in culture. In addition to the morphological changes, the same extract at 450 nm using an A(1%) of 2500. We overexpression of G2340 resulted in an extreme alteration in currently are measuring chlorophyll using a SPAD-502 seed glucosinolate profile. This phenotype was observed in (Minolata). When the SPAD-502 is being used to measure one line, line 1, in seed from two independent plantings. chlorophyll, both carotenoid and chlorophyll content and According to RT-PCR analysis, G2340 was expressed pri amount can also be determined via HPLC. Pigments are marily in roots and was slightly induced in leaf tissue in extracted from leave tissue by homogenizing leaves in response to auxin and heat treatments. G2340 can be used to acetone:ethyl acetate (3:2). Water was added, the mixture engineer plants with an inducible cell death response. A gene centrifuged, and the upper phase removed for HPLC analy that regulates cell death in plants can be used to induce a sis. Samples are analyzed using a Zorbax C18 (non-end pathogen protective hyper-response (HR) in plants without capped) column (250x4.6) with a gradient of acetonitrile the potentially detrimental consequences of a constitutive :water (85:15) to acetonitrile:methanol (85:15) in 12.5 systemic acquired resistance (SAR). Other potential utilities minutes. After holding at these conditions for two minutes, include the creation of novel abscission Zones or inducing Solvent conditions were changed to methanol:ethyl acetate death in reproductive organs to prevent the spread of pollen, (68:32) in two minutes. Carotenoids and chlorophylls are transgenic or otherwise. In the case of necrotrophic patho quantified using peak areas and response factors calculated gens that rely on dead plant tissue as a source of nutrients, using lutein and B-carotene as standards. prevention of cell death could confer tolerance to these diseases. Overexpression of G2340 in Arabidopsis also Example XI resulted in an extreme alteration in seed glucosinolate pro file. Therefore, the gene can be used to alter glucosinolate Experimental Results composition in plants. Increases or decreases in specific G2340; (SEQ ID NOs. 17 and 18) glucosinolates or total glucosinolate content are desirable depending upon the particular application. For example: (1) 0326 G2340 was analyzed using transgenic plants in Glucosinolates are undesirable components of the oilseeds which the gene was expressed under the control of the 35S used in animal feed, since they produce toxic effects. Low promoter. Overexpression of G2340 produced a spectrum of glucosinolate varieties of canola have been developed to deleterious effects on Arabidopsis growth and development. combat this problem. (2) Some glucosinolates have anti 35S::G2340 primary transformants were generally smaller cancer activity; thus, increasing the levels or composition of than controls, and at early stages some displayed leaves that these compounds might be of interest from a nutraceutical were held in a vertical orientation. The most severely standpoint. (3) Glucosinolates form part of a plants natural affected lines died at early stages. Others survived, but defense against insects. Modification of glucosinolate com displayed necrosis of the blades in later rosette leaves and position or quantity can therefore afford increased protection cauline leaves. Inflorescence development was also highly from predators. Furthermore, in edible crops, tissue specific abnormal; stems were typically shorter than wild type, often promoters can be used to ensure that these compounds kinked at nodes, and the tissue had a rather fleshy succulent appearance. Flower buds were frequently poorly formed, accumulate specifically in tissues, such as the epidermis, failed to open and withered away without siliques develop which are not taken for consumption. ing. Additionally, secondary shoot growth frequently failed G2583: (SEQ ID NOs. 143 and 144) the tips of Such structures sometimes Senesced. Due to these 0327 G2583 was studied using transgenic plants in abnormalities, many of the primary transformants were which the gene was expressed under the control of the 35S completely infertile. Three T1 lines (#1.5.20) with a rela promoter. Most notably, 35S:G2583 plants exhibited tively weak phenotype, which did set Some seed, were extremely glossy leaves. At early stages, 35S::G2583 seed selected for further study. Plants from the T2-20 population lings appeared normal, but by about two weeks after Sowing, displayed a strong phenotype, and died early in develop the plants exhibited very striking shiny leaves, which were ment. The other two T2 populations were slightly small, but apparent until very late in development. In addition to this the effects were much weaker than those seen in the parental phenotype, it should be noted that many lines displayed a plants, suggesting that activity of the transgene might have variety of other effects such as a reduction in overall size, become reduced between the generations. It should be noted narrow curled leaves, or various non-specific floral abnor that G2340 and G671 (SEQID NO: 19) are part of the same malities, which reduced fertility. These effects on leaf clade and that they had very similar morphological pheno appearance were observed in 18/20 primary transformants, types and a similar expression pattern. These two genes may and in all the plants from 4/6 of the T2 lines (#2.49 and 15) have overlapping or redundant phenotypes in the plant. examined. The glossy nature of the leaves from 35S::G2583 Small, pale seedlings with strap-like leaves that held a plants can be a consequence of changes in epiculticular wax vertical orientation were found in the mixed line populations content or composition. G2583 belongs to a small clade of 35S::G2340 transgenic seedlings when grown under within the large AP2/EREBP Arabidopsis family that also sterile conditions, similar to those observed in soil grown contains G975 (SEQID NO: 89), G1387 (SEQID NO: 145), US 2006/0195944 A1 Aug. 31, 2006 60 and G977 (SEQ ID NO: 147). Overexpression of G975 utilities in molecular farming practices (for example, the use (SEQ ID NO: 89) caused a substantial increase in leaf wax of trichomes as a manufacturing system for complex sec components, as well as morphological phenotypes resem ondary metabolites), and in producing insect resistant and bling those observed in 35S:G2583 plants. G2583 was herbivore resistant plants. In addition, G362 can be used to ubiquitously expressed (at higher levels in root, flower, alter a plant's time to flowering. embryo, and silique tissues). G2583 can be used to modify plant appearance (shiny leaves). In addition, it can be used G2105: (SEQ ID NOS. 63 and 64) to manipulate wax composition, amount, or distribution, 0329. The ORF boundary of G2105 was determined and which in turn can modify plant tolerance to drought and/or G2105 was analyzed using transgenic plants in which low humidity or resistance to insects. G2105 was expressed under the control of the 35S promoter. G362: (SEQ ID NOs. 61 and 62) Two of four T2 lines examined appeared dark green and 0328 G362 was analyzed using transgenic plants in were Smaller than wild type at all stages of development. which G362 was expressed under the control of the 35S Additionally, the adaxial leaf surfaces from these plants had promoter. 35S::G362 had a number of developmental effects a somewhat lumpy appearance caused by trichomes being with the most prominent result being an increase in trichome raised-up on Small mounds of epidermal cells. Two lines of number as well as the ectopic formation of trichomes. G2105 overexpressing plants had larger seed. G2105 Overexpression of G362 also increased anthocyanin levels expression was root specific and induced in leaves by auxin, in various tissues at different stages of growth. Seedlings abscisic acid, high temperature, salt and osmotic stress Sometimes showed high levels of pigment in the first true treatments. On the basis of the analyses, G2105 can be used leaves. Late flowering lines also became darkly pigmented. to manipulate some aspect of plant growth or development, Seeds from a number of lines were observed to develop particularly in trichome development. In addition, G2105 patches of dark purple pigmentation. Inflorescences from can be used to modify seed size and/or morphology, which 35S::G362 plants were thin, and flowers sometimes dis can have an impact on yield. The promoter of G2105 can played poorly developed organs. The seed yield from many have some utility as a root specific promoter. lines was somewhat poor. As determined by RT-PCR, G362 is expressed in roots, and is expressed at significantly lower G47 (SEQ ID NOS. 65 and 66) levels in siliques, seedlings and shoots. No expression of 0330 G47 was studied using transgenic plants in which G362 was detected in the other tissues tested. G362 expres the gene was expressed under the control of the 35S pro sion was induced in rosette leaves by heat stress. G362 can moter. Overexpression of G47 resulted in a variety of be used to alter trichome number and distribution in plants. morphological and physiological phenotypic alterations. Trichome glands on the Surface of many higher plants 35S::G47 plants showed enhanced tolerance to osmotic produce and secrete exudates which give protection from the stress. In a root growth assay on PEG-containing media, elements and pests such as insects, microbes and herbivores. G47 overexpressing transgenic seedlings were larger and These exudates may physically immobilize insects and had more root growth compared with wild-type controls. spores, may be insecticidal or ant-microbial or they may G47 expression levels may be altered by environmental allergens or irritants to protect against herbivores. Tri conditions, in particular reduced by salt and osmotic chomes have also been suggested to decrease transpiration stresses. In addition to the phenotype observed in the by decreasing leaf Surface air flow, and by exuding chemi osmotic stress assay, germination efficiency for the seeds cals that protect the leaf from the sun. Another use for G362 from G47 overexpressor plants was low. Overexpression of is to increase the density of cotton fibers in cotton bolls. G47 also produced a substantial delay in flowering time and Cotton fibers are modified unicellular trichomes that are caused a marked change in shoot architecture. 35S::G47 produced from the ovule epidermis. However, typically only transformants were Small at early stages and Switched to 30% of the epidermal cells take on a trichome fate (Basra flowering more than a week later than wild-type controls and Malik (1984) Int. Rev. Cytol. 89: 65-113). Thus, cotton (continuous light conditions). The inflorescences from these yields can be increased by inducing a greater proportion of plants appeared thick and fleshy, had reduced apical domi the ovule epidermal cells to become fibers. Depending on nance, and exhibited reduced internode elongation leading the plant species, varying amounts of diverse secondary to a short compact stature. The branching pattern of the biochemicals (often lipophilic terpenes) are produced and stems also appeared abnormal, with the primary shoot exuded or volatilized by trichomes. These exotic secondary becoming kinked at each coflorescence node. Additionally, biochemicals, which are relatively easy to extract because the plants showed slightly reduced fertility and formed they are on the surface of the leaf, have been widely used in rather Small siliques that were borne on short pedicels and Such products as flavors and aromas, drugs, pesticides and held vertically, close against the stem. Additional alterations cosmetics. One class of secondary metabolites, the diterpe were detected in the inflorescence stems of 35S::G47 plants. nes, can effect several biological systems such as tumor Stem sections from T2-21 and T2-24 plants were of wider progression, prostaglandin synthesis and tissue inflamma diameter, and had large irregular vascular bundles contain tion. In addition, diterpenes can act as insect pheromones, ing a much greater number of xylem vessels than wild type. termite allomones, and can exhibit neurotoxic, cytotoxic and Furthermore, some of the xylem vessels within the bundles antimitotic activities. As a result of this functional diversity, appeared narrow and were possibly more lignified than were diterpenes have been the target of research several pharma those of controls. G47 was expressed at higher levels in ceutical ventures. In most cases where the metabolic path rosette leaves, and transcripts were detected in other tissues ways are impossible to engineer, increasing trichome density (flower, embryo, silique, and germinating seedling). G47 or size on leaves may be the only way to increase plant can be used to manipulate flowering time, to modify plant productivity. Thus, the use of G362 and its homologs to architecture and stem structure (including development of increase trichome density, size or type can have profound vascular tissues and lignin content) and to improve plant US 2006/0195944 A1 Aug. 31, 2006

performance under osmotic stress. The use of G47 or of G47 cDNA clone F1258. The similarity between G975 and the orthologs from tree species can be used to modulate lignin Brassica rapa gene represented by EST LA6408 extends content of a plant. This allows the quality of wood used for beyond the conserved AP2 domain that characterizes the furniture or construction to be improved. Lignin is energy AP2/EREBP family. This Brassica rapa gene appeared to be rich; increasing lignin composition could therefore be valu more closely related to G975 than Arabidopsis G1387, able in raising the energy content of wood used for fuel. indicating that EST LA6408 may represent a true G975 Conversely, the pulp and paper industries seek wood with a ortholog. The similarity between G975 and Arabidopsis reduced lignin content. Currently, lignin must be removed in G1387 (SEQ ID NO: 145) also extends beyond the con a costly process that involves the use of many polluting served AP2 domain. chemicals. Consequently, lignin is a serious barrier to effi 0334 G2583 (SEQ ID NO: 143 and 144), a closely cient pulp and paper production. In addition to forest bio related homolog of G975 determined by BLAST, alignment technology applications, changing lignin content might and phylogeneitc analysis, has been shown to confer a increase the palatability of various fruits and vegetables. A transcriptional regulatory activity of G975 in that when the wide variety of applications exist for systems that either polypeptide sequences were overexpressed in plants and lengthen or shorten the time to flowering. produced some lines that were later in their development and 0331 Closely-related homologs of G47, determined by flowering, and produced shiny leaves, indicating more wax BLAST, alignment and phylogeneitc analysis, include production, similar to plants overexpressing G975 (Table 4), G2133 (SEQ ID NO: 152), G3643 (SEQ ID NO: 158), as compared to controls. Other closely related sequences G3644 (SEQID NO: 156), and G3649 (SEQ ID NO: 154). include G1387 (SEQ ID NO: 145 and 146), and G4294 Each of these sequences has conferred a transcriptional (SEQID NO: 149 and 150). Although detailed analyses with regulatory activity of G47 in that when any of these plants overexpressing these sequence have not yet been sequences were overexpressed in plants, they have each performed, plants overexpressing these related sequences produced some lines that were larger, later in their devel are likely to confer Some similar transcriptional regulatory opment and flowering, and more tolerant to water-depriva activity and traits as G975. tion, cold or salt, similar to plants overexpressing G47 (Table 4), as compared to controls. G214: (SEQ ID NOs. 33 and 34) 0335 G214 overexpressing lines were late bolting, G975: (SEQ ID NOS. 89 and 90) showed larger biomass (increased leaf number and size), and 0332 G975 was identified as a new member of the were darker green in vegetative and reproductive tissues due AP2/EREBP family (EREBP subfamily) of transcription to a higher chlorophyll content in the later stages of devel factors. G975 was expressed in flowers and, at lower levels, opment. In these later stages, the overexpressor plants also in shoots, leaves, and siliques. GC-FID and GC-MS analy had higher insoluble Sugar, leaf fatty acid, and carotenoid ses of leaves from G975 overexpressing plants showed that content per unit area. Line 11 also showed a significant, the levels of C29, C31, and C33 alkanes were substantially repeatable increase in lutein levels in seeds. Micro-array increased (up to 10-fold) compared with control plants. A data was consistent with the morphological and biochemical number of additional compounds of similar molecular data in that the genes that were highly induced included weight, presumably also wax components, also accumulated chloroplast localized enzymes, and light regulated genes to significantly higher levels in G975 overexpressing plants. Such as Rubisco, carbonic anhydrase, and the photosystem 1 C29 alkanes constituted close to 50% of the wax content in reaction center subunit precursor. A chlorophyll biosynthetic wild-type plants (Millar et al. (1998) Plant Cell 11: 1889 enzyme was also highly induced, consistent with the dark 1902), Suggesting that a major increase in total wax content green color of the adult leaves and perhaps a higher photo occurred in the G975 transgenic plants. However, the trans synthetic rate. A measurement of leaf fatty acid in the older genic plants had an almost normal phenotype (although overexpressors Suggested that the overall levels were higher Small morphological differences are detected in leaf appear than wild-type levels (except for the percent composition of ance), indicating that overexpression of G975 was not 16:3 in line 11). Percent composition of 16:1 and 16:3 (fatty deleterious to the plant. Overexpression of G975 did not acids found primarily in plastids) is similar to wild-type cause the dramatic alterations in plant morphology that had arguing against an increase in chloroplast number as an been reported for Arabidopsis plants in which the FATTY explanation for increase chlorophyll content in the leaves. ACID ELONGATION1 gene was overexpressed (Millar et G214 overexpressing lines 3, 11, and 15 were sensitive to al. (1998) supra). G975 may regulate the expression of some germination on high glucose showing less cotyledon expan of the genes involved in wax metabolism. One Arabidopsis sion and hypocotyl elongation Suggesting the late bolting AP2 sequence (G1387: SEQ ID NO: 145) that is signifi and dark green phenotype could be tied into carbon sensing cantly more closely related to G975 than the rest of the which has been shown to regulate phytochrome A signaling. members of the AP2/EREBP family is predicted to have a Sugars are key regulatory molecules that affect diverse function and a use related to that of G975. G975 can be used processes in higher plants including germination, growth, to manipulate wax composition, amount, or distribution, flowering, senescence, Sugar metabolism and photosynthe which in turn can modify plant tolerance to drought and/or sis. Glucose-specific hexose-sensing has also been described low humidity or resistance to insects, as well as plant in plants and implicated in cell division and the repression appearance (shiny leaves). G975 can also be used to spe of famine genes (photosynthetic or glyoxylate cycles). cifically alter wax composition, amount, or distribution in Potential utilities of G214 include increasing chlorophyll content allowing more growth and productivity in condi those plants and crops from which wax is a valuable product. tions of low light. With a potentially higher photosynthetic 0333) A non-Arabidopsis gene that is related to G975 is rate, fruits can have higher Sugar content. Increased caro LA6408 BNAF1258 Mustard flower buds Brassica rapa tenoid content can be used as a nutraceutical to produce US 2006/0195944 A1 Aug. 31, 2006 62 foods with greater antioxidant capability. Also G214 can be was induced by auxin and ABA treatment, and by heat stress. used to manipulate seed composition which is very impor G1777 has utility in manipulating seed oil and protein tant for the nutritional value and production of various food COntent. products. G2520; (SEQ ID NOs. 37 and 38) 0336 G214 is homologous to a tomato (Cornell Lyco 0343 G2520 was analyzed using transgenic plants in persicon esculentum) EST (cLER12A11) generated from a which G2520 was expressed under the control of the 35S Pseudomonas resistant line. promoter. At early stages, 35S::G2520 transformants dis G974: (SEQ ID NOs. 51 and 52) played abnormal curled cotyledons, long hypocotyls, and rather short roots. During the vegetative phase, these plants 0337 The complete sequence of G974 was obtained and formed somewhat small flat leaves. Following the switch to G974 was studied using transgenic plants in which G974 reproductive growth, 35S::G2520 inflorescences were typi was expressed under the control of the 35S promoter. cally very spindly, slightly pale colored, and stems often Constitutive expression of G974 produced deleterious split open at late stages. Flowers were frequently small with effects: the majority of 35S::G974 primary transformants narrow organs and showed poor pollen production. As a showed a reduction in overall size and developed rather result, the seed yield from 35S::G2520 plants was low slowly compared to wild type controls. These phenotypic compared to wild-type controls. These effects were observed alterations were not observed in the T2 generation, perhaps in the majority of primary transformants, and to varying indicating silencing of the transgene. The T2 plants were extents, in all three of the T2 populations. Overexpression of wild-type in the physiological and biochemical analyses G2520 also resulted in an increase in the leaf glucosinolate performed. G974 was ubiquitously expressed. 35S::G974 M39478 in lines 11 and 14. In addition, these lines showed had altered seed oil content an increase in seed C-tocopherol and a decrease in seed 0338. Several AP2 proteins from a variety of species Y-tocopherol. No altered phenotypes were detected in any of (Atriplex hortensis, Lycopersicon esculentum, Glycine max, the physiological assays. G2520 was expressed throughout Populus balsamifera, Medicago truncatula) exhibited some the plant and was induced by ABA, heat, salt, drought and sequence similarity with G974 outside of the signature AP2 osmotic stress. G2520 is useful for manipulating plant domain sequence, and bear nearly identical AP2 domains. development and altering leaf glucosinolate composition. These proteins may be related. Increases or decreases in specific glucosinolates or total glucosinolate content are be desirable depending upon the G2343: (SEQ ID NOS. 53 and 54) particular application. For example: (1) Glucosinolates are undesirable components of the oilseeds used in animal feed, 0339. The complete sequence of G2343 was determined since they produce toxic effects. Low-glucosinolate varieties and G2343 was analyzed using transgenic plants in which of canola have been developed to combat this problem. (2) G2343 was expressed under the control of the 35S promoter. Some glucosinolates have anti-cancer activity; thus, increas The phenotype of these transgenic plants was wild-type in ing the levels or composition of these compounds might be all assays performed. As determined by RT-PCR, G2343 is of interest from a nutraceutical standpoint. (3) Glucosino expressed in shoots, embryos and siliques. G2343 expres lates form part of a plant's natural defense against insects. sion is induced in rosette leaves by auxin, heat stress, and Modification of glucosinolate composition or quantity can infection by Fusarium oxysporum. 35S::G2343 had an therefore afford increased protection from predators. Fur altered seed oil content thermore, in edible crops, tissue specific promoters can be 0340 G2343 is a related tomato gene LETHM1 used to ensure that these compounds accumulate specifically (CAA64615). Similarity between G2343 and LETHM1 in tissues, such as the epidermis, which are not taken for extends beyond the signature motif of the family to a level consumption. G2520 can also be used to modify seed that would suggest the genes are orthologs. tocopherol composition. Tocopherols have anti-oxidant and vitamin E activity. G2123: (SEQ ID NOS. 67 and 68) 0341 G2123 was analyzed using transgenic plants in Example XII which G2123 was expressed under the control of the 35S promoter. The phenotype of these transgenic plants was Identification of Homologous Sequences wild-type in all assays performed. G2123 was expressed primarily in developing seeds and silique tissue in wild-type 0344) Homologous sequences from Arabidopsis and plants. G2123 corresponds to a predicted putative 14-3-3 plant species other than Arabidopsis were identified using protein in annotated BAC clone T11 I11 (AC012680), from database sequence search tools, such as the Basic Local chromosome 1 of Arabidopsis. Alignment Search Tool (BLAST) (Altschul et al. (1990).J. Mol. Biol. 215: 403-410; and Altschul et al. (1997) Nucl. G1777: (SEQ ID NOs. 55 and 56) Acid Res. 25: 3389-3402). The tblastx sequence analysis programs were employed using the BLOSUM-62 scoring 0342 G1777 (SEQ ID NO: 55) was analyzed using transgenic plants in which G1777 was expressed under the matrix (Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. control of the 35S promoter. Overexpression of G1777 in USA 89: 10915-10919). Arabidopsis resulted in an increase in seed oil content and a 0345 The polynucleotide and polypeptide sequences decrease in seed protein content in T2 lines 1 and 13. The derived from monocots (e.g., the rice or maize sequences) change in seed oil in line 1 was just below the significance may be used to transform both monocot and dicot plants, and cutoff, but the seed protein change was significant. G1777 those derived from dicots (e.g., the Arabidopsis and Soy was expressed in all examined tissue of Arabidopsis. G1777 sequences) may be used to transform either group, although US 2006/0195944 A1 Aug. 31, 2006 it is expected that Some of these sequences will function best into soybeans or tomatoes. One Such method is microprojec if the gene is transformed into a plant from the same class tile-mediated transformation, in which DNA on the surface as that from which the sequence is derived. of microprojectile particles is driven into plant tissues with a biolistic device (see, for example, Sanford et al. (1987) Example XIII Part. Sci. Technol. 5: 27-37; Christou et al. (1992) Plant. J. 2: 275-281; Sanford (1993) Methods Enzymol. 217: 483 Transformation of Dicots to Produce Improved 509; Klein et al. (1987) Nature 327: 70-73: U.S. Pat. No. Biochemical and Other Traits 5,015,580 (Christou et al), issued May 14, 1991; and U.S. 0346) Homologous sequences from Arabidopsis and Pat. No. 5,322,783 (Tomes et al.), issued Jun. 21, 1994). plant species other than Arabidopsis were identified using database sequence search tools, such as the Basic Local 0351 Alternatively, sonication methods (see, for Alignment Search Tool (BLAST) (Altschul et al. (1990) example, Zhang et al. (1991) Bio/Technology 9: 996-997); supra; and Altschuletal. (1997) supra). The tblastx sequence direct uptake of DNA into protoplasts using CaCl precipi analysis programs were employed using the BLOSUM-62 tation, polyvinyl alcohol or poly-L-ornithine (see, for example, Hain et al. (1985) Mol. Gen. Genet. 199: 161-168: scoring matrix (Henikoff and Henikoff (1992) supra). Draper et al. (1982) Plant Cell Physiol. 23: 451–458); 0347 Crop species including tomato and soybean plants liposome or spheroplast fusion (see, for example, Deshayes that overexpress any of a considerable number of the tran et al. (1985) EMBO.J. 4: 2731-2737; Christou et al. (1987) scription factor polypeptides of the invention have been Proc. Natl. Acad. Sci. USA 84: 3962-3966); and electropo shown experimentally to produce plants with increased ration of protoplasts and whole cells and tissues (see, for drought tolerance and/or biomass in field trials. For example, Donn et al. (1990) in Abstracts of VIIth Interna example, tomato plants overexpressing the G2153 polypep tional Congress on Plant Cell and Tissue Culture IAPTC, tide have been found to be larger than wild-type control A2-38:53: DHalluin et al. (1992); and Spencer et al. (1994) tomato plants. For example, soy plants overexpressing a Plant Mol. Biol. 24: 51-61) have been used to introduce number of G481, G682, G867 and G 1073 orthologs have foreign DNA and expression vectors into plants. been shown to be more drought tolerant than control plants. These observations indicate that these genes, when overex 0352. After a plant or plant cell is transformed (and the pressed, will result in larger yields than non-transformed latter regenerated into a plant), the transformed plant may be plants in both stressed and non-stressed conditions. crossed with itself or a plant from the same line, a non transformed or wild-type plant, or another transformed plant 0348 Thus, transcription factor polynucleotide from a different transgenic line of plants. Crossing provides sequences listed in the Sequence Listing recombined into, the advantages of producing new and often stable transgenic for example, one of the expression vectors of the invention, varieties. Genes and the traits they confer that have been or another suitable expression vector, may be transformed introduced into a tomato or soybean line may be moved into into a plant for the purpose of modifying plant traits for the distinct line of plants using traditional backcrossing tech purpose of improving yield and/or quality. The expression niques well known in the art. Transformation of tomato vector may contain a constitutive, tissue-specific or induc plants may be conducted using the protocols of Koornneef ible promoter operably linked to the transcription factor et al (1986) In Tomato Biotechnology: Alan R. Liss, Inc., polynucleotide. The cloning vector may be introduced into 169-178, and in U.S. Pat. No. 6,613,962, the latter method a variety of plants by means well known in the art such as, described in brief here. Eight day old cotyledon explants are for example, direct DNA transfer or Agrobacterium tume precultured for 24 hours in Petri dishes containing a feeder faciens-mediated transformation. It is now routine to pro layer of Petunia hybrida suspension cells plated on MS duce transgenic plants using most dicot plants (see Weiss medium with 2% (w/v) sucrose and 0.8% agar supplemented bach and Weissbach, (1989) supra; Gelvin et al. (1990) with 10 uM C.-naphthalene acetic acid and 4.4 uM 6-ben supra; Herrera-Estrella et al. (1983) supra; Bevan (1984) Zylaminopurine. The explants are then infected with a supra; and Klee (1985) supra). Methods for analysis of traits diluted overnight culture of Agrobacterium tumefaciens are routine in the art and examples are disclosed above. containing an expression vector comprising a polynucle 0349 Numerous protocols for the transformation of otide of the invention for 5-10 minutes, blotted dry on sterile tomato and soy plants have been previously described, and filter paper and cocultured for 48 hours on the original feeder are well known in the art. Gruber et al. (1993) in Methods layer plates. Culture conditions are as described above. in Plant Molecular Biology and Biotechnology, p. 89-119, Overnight cultures of Agrobacterium tumefaciens are and Glick and Thompson (1993) Methods in Plant Molecu diluted in liquid MS medium with 2% (w/v/) sucrose, pH lar Biology and Biotechnology, eds., CRC Press, Inc., Boca 5.7) to an ODoo of 0.8. Raton, describe several expression vectors and culture meth 0353. Following cocultivation, the cotyledon explants are ods that may be used for cell or tissue transformation and transferred to Petri dishes with selective medium comprising Subsequent regeneration. For soybean transformation, meth MS medium with 4.56 uM Zeatin, 67.3 uM Vancomycin, ods are described by Miki et al. (1993) in Methods in Plant 418.9 uM cefotaxime and 171.6 uM kanamycin sulfate, and Molecular Biology and Biotechnology, p. 67-88, Glick and cultured under the culture conditions described above. The Thompson, eds. CRC Press, Inc., Boca Raton; and U.S. Pat. explants are subcultured every three weeks onto fresh No. 5,563.055, (Townsend and Thomas), issued Oct. 8, medium. Emerging shoots are dissected from the underlying 1996. callus and transferred to glass jars with selective medium 0350. There are a substantial number of alternatives to without Zeatin to form roots. The formation of roots in a Agrobacterium-mediated transformation protocols, other kanamycin Sulfate-containing medium is a positive indica methods for the purpose of transferring exogenous genes tion of a successful transformation. US 2006/0195944 A1 Aug. 31, 2006 64

0354 Transformation of soybean plants may be con of U.S. Pat. No. 5,591,616, in which callus ducted using the methods found in, for example, U.S. Pat. is transformed by contacting dedifferentiating tissue with the No. 5,563,055 (Townsend et al., issued Oct. 8, 1996), Agrobacterium containing the cloning vector. described in brief here. In this method soybean seed is Surface sterilized by exposure to chlorine gas evolved in a 0359 The sample tissues are immersed in a suspension of glass bell jar. Seeds are germinated by plating on 1/10 3x10 cells of Agrobacterium containing the cloning vector strength agar Solidified medium without plant growth regu for 3-10 minutes. The callus material is cultured on solid lators and culturing at 28°C. with a 16 hour day length. medium at 25° C. in the dark for several days. The calli After three or four days, seed may be prepared for coculti grown on this medium are transferred to Regeneration Vation. The Seedcoat is removed and the elongating radicle medium. Transfers are continued every 2-3 weeks (2 or 3 times) until shoots develop. Shoots are then transferred to removed 34 mm below the cotyledons. Shoot-Elongation medium every 2-3 weeks. Healthy look 0355 Overnight cultures of Agrobacterium tumefaciens ing shoots are transferred to rooting medium and after roots harboring the expression vector comprising a polynucleotide have developed, the plants are placed into moist potting soil. of the invention are grown to log phase, pooled, and con centrated by centrifugation. Inoculations are conducted in 0360 The transformed plants are then analyzed for the batches such that each plate of seed was treated with a newly presence of the NPTII gene/kanamycin resistance by resuspended pellet of Agrobacterium. The pellets are resus ELISA, using the ELISA NPTII kit from 5Prime-3Prime pended in 20 ml inoculation medium. The inoculum is Inc. (Boulder, Colo.). poured into a Petri dish containing prepared seed and the 0361. It is also routine to use other methods to produce cotyledonary nodes are macerated with a Surgical blade. transgenic plants of most cereal crops (Vasil (1994) Plant After 30 minutes the explants are transferred to plates of the Mol. Biol. 25: 925-937) such as corn, wheat, rice, sorghum same medium that has been solidified. Explants are embed (Cassas et al. (1993) Proc. Natl. Acad. Sci. 90: 11212-1121), ded with the adaxial side up and level with the surface of the and barley (Wan and Lemeaux (1994) Plant Physiol. 104: medium and cultured at 22°C. for three days under white 37-48). DNA transfer methods such as the microprojectile fluorescent light. These plants may then be regenerated method can be used for corn (Fromm et al. (1990) supra; according to methods well established in the art, Such as by Gordon-Kamm et al. (1990) supra; Ishida (1990) Nature moving the explants after three days to a liquid counter Biotechnol. 14: 745-750), wheat (Vasil et al. (1992) Bio/ selection medium (see U.S. Pat. No. 5,563,055). Technol. 10: 667-674; Vasil et al. (1993a) Bio/Technology 0356. The explants may then be picked, embedded and 10: 667-674; Vasil et al. (1993b) Bio/Technol. 11: 1553 cultured in solidified selection medium. After one month on 1558: Weeks et al. (1993) supra), and rice (Christou (1991) selective media transformed tissue becomes visible as green Bio/Technology 9: 957-962: Hiei et al. (1994) Plant J. 6: sectors of regenerating tissue against a background of 271-282; Aldemita and Hodges (1996) Planta 199: 612-617; bleached, less healthy tissue. Explants with green sectors are and Hiei et al. (1997) Plant Mol. Biol. 35: 205-218). For transferred to an elongation medium. Culture is continued most cereal plants, embryogenic cells derived from imma on this medium with transfers to fresh plates every two ture scutellum tissues are the preferred cellular targets for weeks. When shoots are 0.5 cm in length they may be transformation (Hiei et al. (1997) supra; Vasil (1994) supra). excised at the base and placed in a rooting medium. For transforming corn embryogenic cells derived from immature Scutellar tissue using microprojectile bombard Example XIV ment, the A188XB73 genotype is the preferred genotype (Fromm et al. (1990) supra; Gordon-Kamm et al. (1990) Transformation of Cereal Plants with an Expression Supra). After microprojectile bombardment the tissues are Vector selected on phosphinothricin to identify the transgenic embryogenic cells (Gordon-Kamm (1990) Plant Cell 2: 0357 Cereal plants such as, but not limited to, corn, 603-618). Transgenic plants are regenerated by standard wheat, rice, Sorghum, or barley, may be transformed with the corn regeneration techniques (Fromm et al. (1990) Supra; present polynucleotide sequences, including monocot or Gordon-Kamm et al. (1990) supra). dicot-derived sequences Such as those presented in Tables 4-6, cloned into a vector Such as pGA643 and containing a Example XV kanamycin-resistance marker, and expressed constitutively under, for example, the CaMV 35S or COR15 promoters, or Transcription Factor Expression and Analysis of with tissue-specific or inducible promoters. The expression Improved Traits vectors may be one found in the Sequence Listing, or any other suitable expression vector may be similarly used. For 0362 Biochemical assays such as those disclosed above example, pMENO20 may be modified to replace the NptlI may be used to identify improved characteristics in any of coding region with the BAR gene of Streptomyces hygro the transgenic or knock plants produced with sequences of scopicus that confers resistance to phosphinothricin. The the invention, such as polynucleotides SEQ ID NO: 2n-1, KpnI and BglII sites of the Bar gene are removed by wherein n=1-84, or SEQ ID NO: 2n, wherein n=121-127. site-directed mutagenesis with silent codon changes. 0363) Northern blot analysis, RT-PCR or microarray 0358. The cloning vector may be introduced into a vari analysis of the regenerated, transformed plants may also be ety of cereal plants by means well known in the art including used to show expression of a transcription factor polypeptide direct DNA transfer or Agrobacterium tumefaciens-medi or the invention and related genes that are capable of ated transformation. The latter approach may be accom inducing improved biochemical characteristics, abiotic plished by a variety of means, including, for example, that stress tolerance, and/or larger size. US 2006/0195944 A1 Aug. 31, 2006

0364 To verify the ability to confer stress resistance, 0366 These experiments would demonstrate that tran mature plants overexpressing a transcription factor of the Scription factor polypeptides of the invention can be iden invention, or alternatively, seedling progeny of these plants, tified and shown to confer improved biochemical character may be challenged by a stress such as drought, heat, cold, istics, larger size, greater yield, and/or abiotic stress high salt, or desiccation. Alternatively, these plants may tolerance in dicots or monocots, including multiple challenged in a hyperosmotic stress condition that may also improved biochemical characteristics and/or tolerance to measure altered Sugar sensing. Such as a high Sugar condi multiple stresses. tion. By comparing control plants (for example, wild type) 0367. It is expected that the same methods may be and transgenic plants similarly treated, the transgenic plants applied to identify other useful and valuable sequences of may be shown to have greater tolerance to the particular the present transcription factor clades, and the sequences StreSS. may be derived from a diverse range of species. 0365. After a dicot plant, monocot plant or plant cell has 0368 All references, publications, patent documents, been transformed (and the latter regenerated into a plant) web pages, and other documents cited or mentioned herein and shown to have improved biochemical characteristics, are hereby incorporated by reference in their entirety for all greater size or tolerance to abiotic stress, or produce greater purposes. Although the invention has been described with yield relative to a control plant under the stress conditions, reference to specific embodiments and examples, it should the transformed monocot plant may be crossed with itself or be understood that one of ordinary skill can make various a plant from the same line, a non-transformed or wild-type modifications without departing from the spirit of the inven monocot plant, or another transformed monocot plant from tion. The scope of the invention is not limited to the specific a different transgenic line of plants. embodiments and examples provided.

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 256 <210> SEQ ID NO 1 <211& LENGTH 2793 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &22O > FEATURE &223> OTHER INFORMATION G1272

<400 SEQUENCE: 1 atggattcaa caaatggtaa cqgagctgat cittgaatcag caaatggggc aaacgg gagt 60 ggggttact g aggcattacc accitcct coa coagttatac citccaaatgt ggalaccagtt 120 cgtgttaaaa citgaacttgc tagaagaag gggc.cagttc gagttccitat ggctic gaaaa 18O ggatttggaa caaggggcca aaagatc.ccc ttgttaacaa at catttcaa agtc gatgtg 240 gctaatcttic agggit cattt ctitccactac agtgtggctic tattotatoga to atggtogt 3OO cctgttgaac aaaagggtgt toggaagaaaa atcc ttgaca aggtgcatca gacittaccat 360 totgatctgg atggtaaaga gtttgctitat gacggtoaga agacgttgtt tacatatgga 420 gctittgccta gtaacaagat ggatttittct gtggtgcttg aggaagtatic toctacaagt 480 aaggattittg tdag cagggc taatggaaac ggaa.gc.ccca atgg gaatga aagttccaagt 540 gatggtgata ggaaaag act gcigtaggcct aaccggtoca aaaactittag agtggagatc 600 agctato.cgg ccaaaattcc tottcaagct cittgctaatg caatgcgggg acaagaatca 660 gagaatticcic aggaggcaat acgggttctt gatatoatat tdaggcaa.ca toctoctaga 720

caaggttgct togcttgttcg acagtc.ttitt titccacaatig atccalaccala citgttgaacca 78O

gttggtggta acatcttagg atgtagggga tittcacticca gtttcagaac aacgcagggit 840

ggcatgtcac ttaatatgga tottaca acc accatgatca toaa.gc.ctogg to cagtggitt 9 OO

gattitccitaa ttgctaacca aaatgctagg gacccittatt cqattgact g g totaaggct 96.O

aaacgaacco ttaagaacct aagggtaaag gtcago.ccct caggccaaga attcaagata 1020

accggattga gtgacaagcc ttgcagg gaa caaacgtttgaattgaagaa aaggaaccoa 1080 US 2006/0195944 A1 Aug. 31, 2006 66

-continued aatgaaaatg gagagttcga aac tactgaa gttacagttg citgacitactt cog.cgataca 14 O agg catattg atttgcaata ttctg.cggat ttgccittgca totaatgttgg gaagccaaag 200 cg acco actt acattcct ct c gagctotgc gcgttggttc. cactitcagag gtacacaaaa 260 gcacttacca C gttccaaag atctg.cccitt gttgagaaat coagacagaa accocaagag 320 aggatgacitg ttctgtccaa agctotgaaa gttagcaact atgatgcgga accactcctg 38O cgatcc totg gcatttcg at cagotccaac titt acto agg toggagggtog togttctacca 4 40 gctoccaa.gc tigaaaatggg atgtggat.ct gaalaccitttc ccagaaatgg togctggaac 5 OO ttcaacaa.ca aggaatttgttgagcccacc aaaattcaac gatgg gttgttgttcaattitc 560 totgctic got gtaatgtacg toaagttgtt gatgatctga taaaaattgg aggatcaaaa 62O ggaattgaaa ttgcttct co citttcaagtg tittgaggagg gtaatcaatt cogcc.gtgct 680 ccitccitatga titcgtgttga galacatgttt aagga catcc aatcgaaact coctogtgtc. 740 ccacaattica tactatotgt gct coctoac aaaaagaaca gtgat citcta togg to catgg 800 aagaaaaaaa acttaact ga atttggcatt gttacitcaat gcatcgctoc alacgcggcaa. 860 cctaatgatc agitatcttac taact tactt citgaagatta atgcaaagct toggaggcctg 920 aact caatgt taagtgtaga gcgtacacct gcgttcactg to atttctaa gottcca acc 98O attatccttg g gatggatgt titcacatgga totcc togac agtctgatgt cocgtc.catc 20 40 gctgctgtgg tagttctag ggagtggcca citgat atcca aatatagagc atctgttcgg 2100 acacagoctt citaaggctga gatgattgag toccittgtca agaaaaatgg aactgaagac 216 O gatggcatta toaaggagtt gctggtag at ttctacacca gctic gaataa gagaaaacca 2220 gag catatoa taattittcag g gatggtgtg agtgaatcto aatticaatca ggttctgaat 228O attgaactitg atcagatcat cq aggcttgc aagct cittag acgcaaattig galacc caaag 234. O titccttttgttggtggctca aaagaatcat cataccaagt tottccagoc aacgtotcct 24 OO gaaaatgttc citccagggiac aatcattgac aacaaaatat gtcacccaaa gaacaatgat 2460 ttctacct cit gtgctoacgc tiggaatgatt galactaccc gcc caactca citaccacgtc 252O citgitat gatg agattggttt ttcagotgac gaactitcagg aacttgtc.ca citc.gctotcc 258O tatgtotacic aaagaa.gcac cagtgccatt totgttgttg cqc.cgatctg. citatgct cac 264 O ttgg cagotg citcagottgg gacgttcatg aagtttgaag atcagtctga gacatcatca 27 OO agcc atggtg gtatcacago to caggacca atctotgttg cacagotcc c aagacitcaaa 276 O. gacaacgtcg ccaacticcat gttcttctgt taa 2793

<210> SEQ ID NO 2 &2 11s LENGTH 930 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G1272 polypeptide <400 SEQUENCE: 2 Met Asp Ser Thr Asn Gly Asn Gly Ala Asp Leu Glu Ser Ala Asn Gly 1 5 10 15 Ala Asn Gly Ser Gly Val Thr Glu Ala Leu Pro Pro Pro Pro Pro Val 2O 25 30 Ile Pro Pro Asn Val Glu Pro Val Arg Val Lys Thr Glu Leu Ala Glu US 2006/0195944 A1 Aug. 31, 2006 67

-continued

35 40 45 Lys Lys Gly Pro Val Arg Val Pro Met Ala Arg Lys Gly Phe Gly. Thr 50 55 60 Arg Gly Glin Lys Ile Pro Leu Lieu. Thir Asn His Phe Lys Val Asp Val 65 70 75 8O Ala Asn Leu Gln Gly His Phe Phe His Tyr Ser Val Ala Leu Phe Tyr 85 90 95 Asp Asp Gly Arg Pro Val Glu Gln Lys Gly Val Gly Arg Lys Ile Leu 100 105 110 Asp Llys Wal His Glin Thr Tyr His Ser Asp Lieu. Asp Gly Lys Glu Phe 115 120 125 Ala Tyr Asp Gly Glu Lys Thr Leu Phe Thr Tyr Gly Ala Leu Pro Ser 130 135 1 4 0 Asn Lys Met Asp Phe Ser Val Val Leu Glu Glu Val Ser Ala Thr Ser 145 15 O 155 160 Lys Asp Phe Val Ser Arg Ala Asn Gly Asn Gly Ser Pro Asn Gly Asn 1.65 170 175 Glu Ser Pro Ser Asp Gly Asp Arg Lys Arg Lieu Arg Arg Pro Asn Arg 18O 185 19 O Ser Lys Asn. Phe Arg Val Glu Ile Ser Tyr Ala Ala Lys Ile Pro Leu 195 200 2O5 Gln Ala Leu Ala Asn Ala Met Arg Gly Glin Glu Ser Glu Asn Ser Glin 210 215 220 Glu Ala Ile Arg Val Lieu. Asp Ile Ile Leu Arg Glin His Ala Ala Arg 225 230 235 240 Gln Gly Cys Leu Leu Val Arg Glin Ser Phe Phe His Asn Asp Pro Thr 245 250 255 Asn. Cys Glu Pro Val Gly Gly Asn. Ile Leu Gly Cys Arg Gly Phe His 260 265 27 O Ser Ser Phe Arg Thr Thr Glin Gly Gly Met Ser Leu Asn Met Asp Val 275 280 285 Thir Thr Thr Met Ile Ile Lys Pro Gly Pro Val Val Asp Phe Leu Ile 29 O 295 3OO Ala Asn Glin Asn Ala Arg Asp Pro Tyr Ser Ile Asp Trp Ser Lys Ala 305 310 315 320 Lys Arg Thr Lieu Lys Asn Lieu Arg Val Lys Val Ser Pro Ser Gly Glin 325 330 335 Glu Phe Lys Ile Thr Gly Lieu Ser Asp Llys Pro Cys Arg Glu Glin Thr 340 345 35 O Phe Glu Lieu Lys Lys Arg Asn Pro Asn. Glu Asn Gly Glu Phe Glu Thr 355 360 365 Thr Glu Val Thr Val Ala Asp Tyr Phe Arg Asp Thr Arg His Ile Asp 370 375 38O Leu Glin Tyr Ser Ala Asp Leu Pro Cys Ile Asn Val Gly Lys Pro Lys 385 390 395 400 Arg Pro Thr Tyr Ile Pro Leu Glu Leu Cys Ala Leu Val Pro Leu Gln 405 410 415 Arg Tyr Thr Lys Ala Leu Thir Thr Phe Glin Arg Ser Ala Leu Val Glu 420 425 43 O Lys Ser Arg Glin Lys Pro Glin Glu Arg Met Thr Val Lieu Ser Lys Ala 435 4 40 4 45 US 2006/0195944 A1 Aug. 31, 2006 68

-continued

Leu Lys Val Ser Asn Tyr Asp Ala Glu Pro Leu Lleu Arg Ser Cys Gly 450 455 460 Ile Ser Ile Ser Ser Asn Phe Thr Glin Val Glu Gly Arg Val Leu Pro 465 470 475 480 Ala Pro Lys Lieu Lys Met Gly Cys Gly Ser Glu Thr Phe Pro Arg Asn 485 490 495 Gly Arg Trp Asin Phe Asn. Asn Lys Glu Phe Val Glu Pro Thr Lys Ile 5 OO 505 51O. Glin Arg Trp Val Val Val Asn. Phe Ser Ala Arg Cys Asn Val Arg Glin 515 52O 525 Val Val Asp Asp Lieu. Ile Lys Ile Gly Gly Ser Lys Gly Ile Glu Ile 530 535 540 Ala Ser Pro Phe Glin Val Phe Glu Glu Gly Asn Glin Phe Arg Arg Ala 545 550 555 560 Pro Pro Met Ile Arg Val Glu Asn Met Phe Lys Asp Ile Glin Ser Lys 565 570 575 Leu Pro Gly Val Pro Glin Phe Ile Lieu. Cys Val Lieu Pro Asp Llys Lys 58O 585 59 O Asn Ser Asp Leu Tyr Gly Pro Trp Llys Lys Lys Asn Lieu. Thr Glu Phe 595 600 605 Gly Ile Val Thr Gln Cys Met Ala Pro Thr Arg Gln Pro Asn Asp Gln 610 615 62O Tyr Lieu. Thir Asn Lieu Lleu Lleu Lys Ile Asn Ala Lys Lieu Gly Gly Lieu 625 630 635 640 Asn Ser Met Leu Ser Val Glu Arg Thr Pro Ala Phe Thr Val Ile Ser 645 650 655 Lys Val Pro Thr Ile Ile Leu Gly Met Asp Val Ser His Gly Ser Pro 660 665 67 O Gly Glin Ser Asp Val Pro Ser Ile Ala Ala Val Val Ser Ser Arg Glu 675 680 685 Trp Pro Leu Ile Ser Lys Tyr Arg Ala Ser Val Arg Thr Glin Pro Ser 69 O. 695 7 OO Lys Ala Glu Met Ile Glu Ser Lieu Val Lys Lys Asn Gly Thr Glu Asp 705 710 715 720 Asp Gly Ile Ile Lys Glu Lieu Lieu Val Asp Phe Tyr Thr Ser Ser Asn 725 730 735 Lys Arg Llys Pro Glu His Ile Ile Ile Phe Arg Asp Gly Val Ser Glu 740 745 750 Ser Glin Phe Asin Glin Val Lieu. Asn. Ile Glu Lieu. Asp Glin Ile Ile Glu 755 760 765 Ala Cys Lys Lieu Lleu. Asp Ala Asn Trp Asin Pro Llys Phe Lieu Lleu Lieu 770 775 78O Val Ala Gln Lys Asn His His Thr Lys Phe Phe Gln Pro Thr Ser Pro 785 790 795 8OO Glu Asin Val Pro Pro Gly. Thir Ile Ile Asp Asn Lys Ile Cys His Pro 805 810 815 Lys Asn. Asn Asp Phe Tyr Lieu. Cys Ala His Ala Gly Met Ile Gly Thr 820 825 83O Thr Arg Pro Thr His Tyr His Val Leu Tyr Asp Glu Ile Gly Phe Ser 835 840 845 US 2006/0195944 A1 Aug. 31, 2006 69

-continued Ala Asp Glu Lieu Glin Glu Lieu Val His Ser Lieu Ser Tyr Val Tyr Glin 85 O 855 860 Arg Ser Thr Ser Ala Ile Ser Val Val Ala Pro Ile Cys Tyr Ala His 865 870 875 88O Leu Ala Ala Ala Glin Leu Gly Thr Phe Met Lys Phe Glu Asp Glin Ser 885 890 895 Glu Thir Ser Ser Ser His Gly Gly Ile Thr Ala Pro Gly Pro Ile Ser 9 OO 905 910 Val Ala Glin Leu Pro Arg Lieu Lys Asp Asn. Wall Ala Asn. Ser Met Phe 915 920 925 Phe Cys 930

<210> SEQ ID NO 3 <211& LENGTH 1413 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G1506

<400 SEQUENCE: 3 atggggaagc aaggtoctitg citatcactgt ggagttacaa gtacaccitct atggagaaac 60 gggccaccag agaa.gc.cggt gttgttgcaat gcgtgtggitt C gaggtggag alactaaagga 120 to attagtaa act acacacc tottcatgct C gtgctgaag gtgatgagac tagagattgag 18O gatcatagaa citcaaacggit gatgattalag g gaatgtc.tt togaacaaaaa gattoccaag 240 aggaalaccat atcaagaaaa citt cacagtg aaaagagcta acttggaatt coataccggit 3OO ttcaagagga aggctotgga tigaagaagct agcaatagat cqagttcagg atcggttgta 360 toaaactc.cg agagctgtgc acaatctaat gcgtggg act c gacitttitcc ttgtaagaga 420 agga catgtg toggacgtcc aaagg cagct tcttctgttgaaaagct cac aaaggat citt 480 tatact attc tacaagaaca gcaatcttct totcitcticto gtactitcaga ggaagatttg 540 citttittgaga atgaaacacc aatgctgtta gga catggta gtgttctitat gagagatcct 600 cacticaggtg citc.gagaaga ggaatctgaa gottagct cac tottagttga gag cago aag 660 tott catcag titcattctgt taaatttggt ggaaaag caa togaag cagga gcaagtgaag 720 aggagcaaat citcaagtc.tt aggaagacat agttcactac totgtag cat agatttgaag 78O gatgttitt.ca actittgatga gttcatagaa aattt cacag aggaagaac a gcaaaaactg 840 atgaaattac titccitcaagt to actctgtt gatcgtocto atagoctorag aag catgttt 9 OO gag agttcto aattcaaaga gaacttatcc ttgtttcago aacttgttggc agatggtgtt 96.O tittgagacaa attcgt.citta togcaaaactt galaga catta agacacttgc aaagcttgct 1020 titat cagatc ctaacaaatc ccatttgttg gaaagctatt acatgcticaa gagaaga gag 1080 attgaag act gtgttactac alacatcaagg gtc.tcaagct to agticcatc gaataataat 1140 agtc.ttgtaa ccattgaaag accittgttgaa agcttaalacc aaaacttct c agaga caaga 1200 ggtgttgatga gaa.gc.ccgaa agaagtgatg aagattagat caaag cacac cqaagagaat 1260 ttagagaata gtgitat ctitc ctittaaacct gtgagctgtg gtgg accitct ggtgtttagc 1320 tatgaagata atgatatttctgatcaggat cittcttcttg atgtgcc.gtc. galacggcto a 1380 titcc citcaag cagagcttct aaa.catgata toga 1413 US 2006/0195944 A1 Aug. 31, 2006 70

-continued

<210> SEQ ID NO 4 &2 11s LENGTH 470 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G1506 polypeptide <400 SEQUENCE: 4 Met Gly Lys Glin Gly Pro Cys Tyr His Cys Gly Val Thr Ser Thr Pro 1 5 10 15 Leu Trp Arg Asn Gly Pro Pro Glu Lys Pro Val Lieu. Cys Asn Ala Cys 2O 25 30 Gly Ser Arg Trp Arg Thr Lys Gly Ser Leu Val Asn Tyr Thr Pro Leu 35 40 45 His Ala Arg Ala Glu Gly Asp Glu Thr Glu Ile Glu Asp His Arg Thr 50 55 60 Glin Thr Val Met Ile Lys Gly Met Ser Lieu. Asn Lys Lys Ile Pro Lys 65 70 75 8O Arg Llys Pro Tyr Glin Glu Asn. Phe Thr Val Lys Arg Ala Asn Lieu Glu 85 90 95 Phe His Thr Gly Phe Lys Arg Lys Ala Lieu. Asp Glu Glu Ala Ser Asn 100 105 110 Arg Ser Ser Ser Gly Ser Val Val Ser Asn Ser Glu Ser Cys Ala Glin 115 120 125 Ser Asn Ala Trp Asp Ser Thr Phe Pro Cys Lys Arg Arg Thr Cys Val 130 135 1 4 0 Gly Arg Pro Lys Ala Ala Ser Ser Val Glu Lys Lieu. Thir Lys Asp Lieu 145 15 O 155 160 Tyr Thr Ile Leu Gln Glu Gln Glin Ser Ser Cys Leu Ser Gly Thr Ser 1.65 170 175 Glu Glu Asp Leu Lleu Phe Glu Asn. Glu Thr Pro Met Lieu Lieu Gly. His 18O 185 19 O Gly Ser Val Lieu Met Arg Asp Pro His Ser Gly Ala Arg Glu Glu Glu 195 200 2O5 Ser Glu Ala Ser Ser Leu Leu Val Glu Ser Ser Lys Ser Ser Ser Val 210 215 220 His Ser Wall Lys Phe Gly Gly Lys Ala Met Lys Glin Glu Glin Val Lys 225 230 235 240 Arg Ser Lys Ser Glin Val Lieu Gly Arg His Ser Ser Lieu Lieu. Cys Ser 245 250 255 Ile Asp Leu Lys Asp Val Phe Asin Phe Asp Glu Phe Ile Glu Asn. Phe 260 265 27 O Thr Glu Glu Glu Glin Gln Lys Lieu Met Lys Lieu Lleu Pro Glin Val Asp 275 280 285 Ser Val Asp Arg Pro Asp Ser Leu Arg Ser Met Phe Glu Ser Ser Glin 29 O 295 3OO Phe Lys Glu Asn Lieu Ser Leu Phe Glin Gln Leu Val Ala Asp Gly Val 305 310 315 320 Phe Glu Thir Asn. Ser Ser Tyr Ala Lys Lieu Glu Asp Ile Lys Thr Lieu 325 330 335 Ala Lys Lieu Ala Leu Ser Asp Pro Asn Lys Ser His Leu Lieu Glu Ser 340 345 35 O US 2006/0195944 A1 Aug. 31, 2006 71

-continued Tyr Tyr Met Leu Lys Arg Arg Glu Ile Glu Asp Cys Val Thr Thr Thr 355 360 365 Ser Arg Val Ser Ser Leu Ser Pro Ser Asn Asn Asn Ser Leu Val Thr 370 375 38O Ile Glu Arg Pro Cys Glu Ser Lieu. Asn Glin Asn. Phe Ser Glu Thir Arg 385 390 395 400 Gly Wal Met Arg Ser Pro Lys Glu Val Met Lys Ile Arg Ser Lys His 405 410 415 Thr Glu Glu Asn Leu Glu Asn Ser Val Ser Ser Phe Lys Pro Val Ser 420 425 43 O Cys Gly Gly Pro Leu Val Phe Ser Tyr Glu Asp Asn Asp Ile Ser Asp 435 4 40 4 45 Glin Asp Lieu Lleu Lleu. Asp Val Pro Ser Asn Gly Ser Phe Pro Glin Ala 450 455 460

Glu Lieu Lieu. Asn Met Ile 465 470

<210 SEQ ID NO 5 &2 11s LENGTH 678 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G1897

<400> SEQUENCE: 5 atgc cittctgaattcagtga atctogtogg gttcc talaga titc.cccacgg cca aggagga 60 totgttgcga titc.cgacgga toaacaagag cagotttctt gtc.citcgct g togaatcaacc 120 aacaccaagt totgttacta caacaactac aacttctoac aacctcgtoa tittctgcaag 18O tottgtc.gcc gttactggac to atggaggit act citcc.gtg acattcc.cgt cqgtggtgtt 240 toccgtaaaa gotcaaaacg titccc.gg act tattoctotg cc.gctaccac citcc.gttgtc 3OO ggaag.ccgga acttitcccitt acaagctacg cct gttcttt toccitcagtc gitottccaac 360 ggcggitatica cqacgg.cgaa goggaagtgct tcgtogttct atggcggttt cagotctittg 420 atcaactaca acgcc.gc.cgt gag cagaaat gggcc togto gogggtttaa toggccagat 480 gcttittgg to ttgggcttgg to acgggtcg tattato agg acgtoagata toggcaagga 540 ataacggtot goccgttittcaagtggcgct act gatgctd caactactac aagccacatt 600 gctoaaatac cc.gc.cacgtg goagtttgaa gqtcaa.gaga gcaaagttcgg gttcgtgtct 660 ggagacitacg tag.cgtga 678

<210> SEQ ID NO 6 &2 11s LENGTH 225 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G1897 polypeptide <400 SEQUENCE: 6 Met Pro Ser Glu Phe Ser Glu Ser Arg Arg Val Pro Lys Ile Pro His 1 5 10 15 Gly Glin Gly Gly Ser Val Ala Ile Pro Thr Asp Glin Glin Glu Gln Leu 2O 25 30 Ser Cys Pro Arg Cys Glu Ser Thr Asn Thr Lys Phe Cys Tyr Tyr Asn 35 40 45 US 2006/0195944 A1 Aug. 31, 2006 72

-continued

Asn Tyr Asn. Phe Ser Glin Pro Arg His Phe Cys Lys Ser Cys Arg Arg 50 55 60 Tyr Trp Thr His Gly Gly. Thir Leu Arg Asp Ile Pro Val Gly Gly Val 65 70 75 8O Ser Arg Lys Ser Ser Lys Arg Ser Arg Thr Tyr Ser Ser Ala Ala Thr 85 90 95 Thr Ser Val Val Gly Ser Arg Asin Phe Pro Leu Glin Ala Thr Pro Val 100 105 110 Leu Phe Pro Glin Ser Ser Ser Asn Gly Gly Ile Thr Thr Ala Lys Gly 115 120 125 Ser Ala Ser Ser Phe Tyr Gly Gly Phe Ser Ser Leu Ile Asn Tyr Asn 130 135 1 4 0 Ala Ala Wal Ser Arg Asn Gly Pro Gly Gly Gly Phe Asin Gly Pro Asp 145 15 O 155 160 Ala Phe Gly Lieu Gly Lieu Gly His Gly Ser Tyr Tyr Glu Asp Val Arg 1.65 170 175 Tyr Gly Glin Gly Ile Thr Val Trp Pro Phe Ser Ser Gly Ala Thr Asp 18O 185 19 O Ala Ala Thr Thr Thr Ser His Ile Ala Glin Ile Pro Ala Thr Trp Gln 195 200 2O5 Phe Glu Gly Glin Glu Ser Lys Val Gly Phe Val Ser Gly Asp Tyr Val 210 215 220

Ala 225

<210 SEQ ID NO 7 &2 11s LENGTH 1605 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G1946

<400 SEQUENCE: 7 totcaccitat tigtaaaaatc accagtttcg tatataaaac cotaattittc. tcaaaatticc 60 caaatattga cittggaatca aaaatcc.gaa to gatgtgag caaagta acc acaag.cgacg 120 gc ggaggaga ttcaatggag actaagcc at citcct calacc toagcct gc g g c gattotaa 18O gttcaaacgc gccitcc to cq tttctgagca agaccitatga tatggttgat gat cacaata 240 cagatticg at tdtctottgg agtgctaata acaac agttt tatcgtttgg aalaccaccgg 3OO agttc.gctcg cg atcttctt cataagaact ttaag cataa taatttctoc agctitcgitta 360 gacagottaa tacctatogt titcaggaagg ttgac coaga tagatgggaa tittgc gaatg 420 aaggttttitt aagaggtoag aag cacttgc tacaatcaat alactagg.cga aaacctg.ccc 480 atggacaggg acagggacat cagogatcto agc acto gaa togacagaac tocatctgtta 540 gc gcatgtgt togaagttggc aaatttgg to tcgaagaaga agttgaaagg cittaaaagag 600 ataagaacgt cottatgcaa gaacticgtca gattalagaca gcagoaa.ca.g. tcc actdata 660 acca acttica aac gatggitt cagogtc.tcc agggcatgga gaatcggcaa caacaattaa 720 tgtcattcct tdcaaaggca gtacaaag.cc citcattttct atctoaattic ttacago agc 78O agaatcagoa aaacga gagt aatagg.cgca toagtgatac cagtaagaag cqgagatto a 840 agc.gagacgg cattgtc.cgt aataatgatt citgctactcc tdatggacag atagtgaagt 9 OO US 2006/0195944 A1 Aug. 31, 2006 73

-continued atcaac citcc aatgcacgag caa.gc.caaag caatgtttaa acagottatg aagatggaac 96.O cittacaaaac cqgcgatgat ggtttcct to taggtaatgg tacgtctact accgagggaa O20 cagagatgga gacittcatca aaccalagtat cqggtataac totta aggaa atgcc tacag O8O cittctgagat acagtcatca totaccalattgaaacaactcc tdaaaatgtt toggcagoat 14 O cagaa.gcaac cq agaactgt attccttcac citgatgatct aactcittccc g acttcactic 200 atatgctacc ggaaaataat to agagaagc citccagagag titt catggaa ccaaacct gg 260 gaggttctag to cattacta gatccagatc tdttgatcga tigattctttgtcc titcgaca 320 ttgacgacitt to caatggat totgatatag accotgttga ttacggttta citcgaacgct 38O tact catgtc. aagcc.cggitt coagataata toggattcaac accagtggac aatgaaa.ca.g 4 40 agcaggaaca aaatggatgg gacaaaacta agcatatgga taatctgact caacagatgg 5 OO gtotccitcto tcctgaaacc ttagatctot caaggcaaaa toctitgattt tdggagttitt 560 taaagtc.ttt taggtaa.ca cagtc.cct ga gag cagdata ttcat 605

<210 SEQ ID NO 8 &2 11s LENGTH 485 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G1946 polypeptide <400 SEQUENCE: 8 Met Asp Val Ser Lys Val Thir Thr Ser Asp Gly Gly Gly Asp Ser Met 1 5 10 15 Glu Thr Lys Pro Ser Pro Gln Pro Gln Pro Ala Ala Ile Leu Ser Ser 2O 25 30 Asn Ala Pro Pro Pro Phe Leu Ser Lys Thr Tyr Asp Met Val Asp Asp 35 40 45 His Asn Thr Asp Ser Ile Val Ser Trp Ser Ala Asn Asn Asn Ser Phe 50 55 60 Ile Val Trp Llys Pro Pro Glu Phe Ala Arg Asp Leu Lleu Pro Lys Asn 65 70 75 8O Phe Lys His Asn Asn Phe Ser Ser Phe Val Arg Gln Leu Asn Thr Tyr 85 90 95 Gly Phe Arg Lys Val Asp Pro Asp Arg Trp Glu Phe Ala Asn. Glu Gly 100 105 110 Phe Leu Arg Gly Glin Lys His Lieu Lieu Glin Ser Ile Thr Arg Arg Lys 115 120 125 Pro Ala His Gly Glin Gly Glin Gly His Glin Arg Ser Gln His Ser Asn 130 135 1 4 0 Gly Glin Asn Ser Ser Val Ser Ala Cys Val Glu Val Gly Lys Phe Gly 145 15 O 155 160 Leu Glu Glu Glu Val Glu Arg Lieu Lys Arg Asp Lys Asn. Wall Leu Met 1.65 170 175 Glin Glu Lieu Val Arg Lieu Arg Glin Glin Glin Glin Ser Thr Asp Asn Glin 18O 185 19 O Leu Glin Thr Met Val Glin Arg Lieu Glin Gly Met Glu Asn Arg Glin Glin 195 200 2O5 Gln Leu Met Ser Phe Leu Ala Lys Ala Val Glin Ser Pro His Phe Leu 210 215 220 US 2006/0195944 A1 Aug. 31, 2006 74

-continued

Ser Glin Phe Leu Glin Glin Glin Asn. Glin Glin Asn. Glu Ser Asn Arg Arg 225 230 235 240 Ile Ser Asp Thir Ser Lys Lys Arg Arg Phe Lys Arg Asp Gly Ile Val 245 250 255 Arg Asn. Asn Asp Ser Ala Thr Pro Asp Gly Glin Ile Wall Lys Tyr Glin 260 265 27 O Pro Pro Met His Glu Glin Ala Lys Ala Met Phe Lys Gln Leu Met Lys 275 280 285 Met Glu Pro Tyr Lys Thr Gly Asp Asp Gly Phe Lieu Lleu Gly Asn Gly 29 O 295 3OO Thr Ser Thr Thr Glu Gly. Thr Glu Met Glu Thir Ser Ser Asn Glin Val 305 310 315 320 Ser Gly Ile Thr Leu Lys Glu Met Pro Thr Ala Ser Glu Ile Glin Ser 325 330 335

Ser Ser Pro Ile Glu Thir Thr Pro Glu Asn. Wal Ser Ala Ala Ser Glu 340 345 35 O Ala Thr Glu Asn. Cys Ile Pro Ser Pro Asp Asp Lieu. Thir Lieu Pro Asp 355 360 365 Phe Thr His Met Leu Pro Glu Asn Asn Ser Glu Lys Pro Pro Glu Ser 370 375 38O Phe Met Glu Pro Asn Lieu Gly Gly Ser Ser Pro Leu Lieu. Asp Pro Asp 385 390 395 400 Leu Lieu. Ile Asp Asp Ser Leu Ser Phe Asp Ile Asp Asp Phe Pro Met 405 410 415 Asp Ser Asp Ile Asp Pro Val Asp Tyr Gly Lieu Lieu Glu Arg Lieu Lieu 420 425 43 O Met Ser Ser Pro Val Pro Asp Asn Met Asp Ser Thr Pro Val Asp Asn 435 4 40 4 45 Glu Thr Glu Glin Glu Glin Asn Gly Trp Asp Llys Thr Lys His Met Asp 450 455 460 Asn Lieu. Thr Glin Gln Met Gly Lieu Lleu Ser Pro Glu Thir Lieu. Asp Lieu 465 470 475 480 Ser Arg Glin Asn Pro 485

<210 SEQ ID NO 9 &2 11s LENGTH 640 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G2113

<400 SEQUENCE: 9 ataacaaact catcaaactt cotcagogtt totttittctt acataaacaa tttittcttac 60 ataaacaaat cittgttgttt gttgttgtca togcaccgac agittaaaacg goggcc.gtoa 120 aalaccalacga aggtaacgga gtc.cgttaca gaggagtgag galaga gacca tagggiac gtt 18O acgcago.cga gatcagagat cotttcaaga agt cacgtgt citggctoggt actitt.cgaca 240 citcc to aaga agcc.gctcgt gccitacgaca aacgtgctat tdagtttcgt ggagctaaag 3OO ccaaaaccala citt.cccittgt tacaa.catca acgcc cactg. cittgagtttg acacagagcc 360 tgagccagag cago accgtg gaatcatcgt titcctaatct caaccitcgga totgacitctg 420 US 2006/0195944 A1 Aug. 31, 2006 75

-continued ttagttcgag atticcottitt cotaagatto aggttaaggc tigg gatgat g g togttcgatg 480 aaaggagtga atcggattct tcgtoggtgg tatggatgt cqttagatat galaggacgac 540 gtgtggttitt go acttggat cittaattitcc citcct coacc tdagaactga ttaagattta 600 attatgatta ttagatataa ttaaatgttt citgaattgag 640

<210> SEQ ID NO 10 &2 11s LENGTH 166 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G2113 polypeptide <400 SEQUENCE: 10 Met Ala Pro Thr Val Lys Thr Ala Ala Val Lys Thr Asn Glu Gly Asn 1 5 10 15 Gly Val Arg Tyr Arg Gly Val Arg Lys Arg Pro Trp Gly Arg Tyr Ala 2O 25 30 Ala Glu Ile Arg Asp Pro Phe Lys Lys Ser Arg Val Trp Leu Gly Thr 35 40 45 Phe Asp Thr Pro Glu Glu Ala Ala Arg Ala Tyr Asp Lys Arg Ala Ile 50 55 60 Glu Phe Arg Gly Ala Lys Ala Lys Thr Asn. Phe Pro Cys Tyr Asn. Ile 65 70 75 8O Asn Ala His Cys Leu Ser Leu Thr Glin Ser Leu Ser Glin Ser Ser Thr 85 90 95 Val Glu Ser Ser Phe Pro Asn Leu Asn Leu Gly Ser Asp Ser Val Ser 100 105 110 Ser Arg Phe Pro Phe Pro Lys Ile Glin Val Lys Ala Gly Met Met Val 115 120 125 Phe Asp Glu Arg Ser Glu Ser Asp Ser Ser Ser Val Val Met Asp Val 130 135 1 4 0 Val Arg Tyr Glu Gly Arg Arg Val Val Lieu. Asp Lieu. Asp Lieu. Asn. Phe 145 15 O 155 160

Pro Pro Pro Pro Glu Asn 1.65

<210> SEQ ID NO 11 &2 11s LENGTH 506 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G2117

<400 SEQUENCE: 11 atacttgtca acaaaaattt tottaaagaa cigcataactg tttittittcat ggctggttct 60 gtotataacc titccaagttca aaaccotaat coacagtctt tattocaaat citttgttgat 120 cgagtaccac tittcaaactt gcc toccacg tdagacg act citagc.cggac toc agaagat 18O aatgagagga agcggagaag galaggtatcg aaccg.cgagt cagct cqgag atcgcgtatg 240 cggaaa.ca.gc gtcacatgga agaactgtgg to catgcttg ttcaact cat caataagaac 3OO aaatctotag to gatgagct aagccaagcc agg gaatgtt acgagaaggit tatagaagag 360 aacatgaaac titc gagagga aaacticcaag to gaggaaga tigattggtga gatcgggctt 420 aataggitt to ttagcgtaga gg.ccgatcag atctggacct tctaatcgtc. tcgtaag citt 480 US 2006/0195944 A1 Aug. 31, 2006 76

-continued gttggitttitt tattgtttat ttaaag 506

<210> SEQ ID NO 12 &2 11s LENGTH 138 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G2117 polypeptide <400 SEQUENCE: 12 Met Ala Gly Ser Val Tyr Asn Leu Pro Ser Glin Asn Pro Asn Pro Glin 1 5 10 15 Ser Leu Phe Glin Ile Phe Val Asp Arg Val Pro Leu Ser Asn Leu Pro 2O 25 30 Ala Thr Ser Asp Asp Ser Ser Arg Thr Ala Glu Asp Asn. Glu Arg Lys 35 40 45 Arg Arg Arg Lys Val Ser Asn Arg Glu Ser Ala Arg Arg Ser Arg Met 50 55 60 Arg Lys Glin Arg His Met Glu Glu Lieu Trp Ser Met Leu Val Glin Lieu 65 70 75 8O Ile Asn Lys Asn Lys Ser Lieu Val Asp Glu Lieu Ser Glin Ala Arg Glu 85 90 95 Cys Tyr Glu Lys Val Ile Glu Glu Asn Met Lys Lieu Arg Glu Glu Asn 100 105 110 Ser Lys Ser Arg Lys Met Ile Gly Glu Ile Gly Lieu. Asn Arg Phe Lieu 115 120 125 Ser Val Glu Ala Asp Glin Ile Trp Thr Phe 130 135

<210> SEQ ID NO 13 &2 11s LENGTH 1050 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G2155

<400 SEQUENCE: 13 citcatatata coaaccaaac citctotctg.c atctittatta acacaaaatt coaaaagatt 60 aaatgttgtc gaagctccct acacagogac acttgcacct citctoccitcc totcc citcca 120 tgga aaccgt cqggcgtoca C gtgg cagac citc gaggttc caaaaacaaa cctaaagctic 18O caatctttgt caccattgac cotccitatga gtccttacat cotcgaagtig ccatccggaa 240 acgatgtc.gt togaag.cccita aaccqtttct gcc.gcggtaa agc catcggc titttgcgtoc 3OO toagtggcto aggcto cqtt gct gatgtca citttgcgtoa gcc.ittcticc g g cagotcctg 360 gctcaaccat tacttitccac ggaaagttcg atcttctdtc tdtctocqcc acttitccitcc 420 citccitctacc toctacctcc ttgtc.cccitc cc.gtotccaa tittcttcacc gitotictotcg 480 cc.gg acctica ggggaaagttc atcggtggat togtogctgg to citcto gtt gcc.gc.cggaa 540 citgtttactt cqtc.gc.cact agtttcaaga accottccta to accggitta cct gotacgg 600 aggalaga.gca aagaalacticg gcggalagggg aagaggaggg acaatcgc.cg cc.ggtotctg 660 gaggtggtgg agagtc gatg tacgtgggtg gctctgatgt catttgggat cocaacgc.ca 720 aagctocatc gcc.gtactga ccacaaatcc atctogttca aactagg gtt tottcttctt 78O US 2006/0195944 A1 Aug. 31, 2006 77

-continued tagatcatca agaatcaa.ca aaaagattgc atttittagat totttgtaat atcataattg 840 actcactcitt taatctotct atcacttctt ctittagcttt ttctgcagtig toaaactitca 9 OO catatttgta gtttgatttg actatoccca agttttgtat tittatcatac aaatttittgc 96.O citgtctotaa taggttgttitt titcgtttgta taatctitatg cattgttitat tiggagctcca 1020 gagattgaat gtataatata atggtttaat 105 O

<210> SEQ ID NO 14 &2 11s LENGTH 225 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G2155 polypeptide <400 SEQUENCE: 14 Met Leu Ser Lys Leu Pro Thr Glin Arg His Leu. His Leu Ser Pro Ser 1 5 10 15 Ser Pro Ser Met Glu Thr Val Gly Arg Pro Arg Gly Arg Pro Arg Gly 2O 25 30 Ser Lys Asn Lys Pro Lys Ala Pro Ile Phe Val Thr Ile Asp Pro Pro 35 40 45 Met Ser Pro Tyr Ile Leu Glu Val Pro Ser Gly Asn Asp Val Val Glu 50 55 60 Ala Leu Asn Arg Phe Cys Arg Gly Lys Ala Ile Gly Phe Cys Val Leu 65 70 75 8O Ser Gly Ser Gly Ser Val Ala Asp Val Thr Leu Arg Gln Pro Ser Pro 85 90 95 Ala Ala Pro Gly Ser Thr Ile Thr Phe His Gly Lys Phe Asp Leu Leu 100 105 110

Ser Wal Ser Ala Thr Phe Leu Pro Pro Leu Pro Pro Thir Ser Leu Ser 115 120 125 Pro Pro Val Ser Asn Phe Phe Thr Val Ser Leu Ala Gly Pro Glin Gly 130 135 1 4 0 Lys Val Ile Gly Gly Phe Val Ala Gly Pro Leu Val Ala Ala Gly. Thr 145 15 O 155 160 Val Tyr Phe Val Ala Thr Ser Phe Lys Asn Pro Ser Tyr His Arg Leu 1.65 170 175 Pro Ala Thr Glu Glu Glu Glin Arg Asn. Ser Ala Glu Gly Glu Glu Glu 18O 185 19 O Gly Glin Ser Pro Pro Val Ser Gly Gly Gly Gly Glu Ser Met Tyr Val 195 200 2O5 Gly Gly Ser Asp Val Ile Trp Asp Pro Asn Ala Lys Ala Pro Ser Pro 210 215 220 Tyr 225

<210 SEQ ID NO 15 <211& LENGTH 1312 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G2290

<400 SEQUENCE: 15 ttctittctitt citttctittct cittccaatca agaacaaacc ctagotcctc. tctttittcto 60 US 2006/0195944 A1 Aug. 31, 2006 78

-continued totctacctd totttctota tottctotta toactactitc. tctogcc gat caatcatcat 120 gaac gatcct gataatcc.cg atctgagcaa cqacgacitct gcttggagag aactcacact 18O cacagotcaa gattctgact tctitcg accg agacactitcc aatatoctot citg actitcgg 240 ttggaaccitc. caccactcct cogatcatcc toacagtctd agattcg act cog atttaac 3OO acaaaccacc ggagtcaaac citaccaccgt cacttcttct tattoctoat cogcc.gc.cgt 360 titcc gttgcc gttacctota citaataataa toccitcagot accitcaagtt caagttgaaga 420 to cqgcc.gag aactcaa.ccg cct cogcc.ga gaaaacacca ccaccggaga caccagtgaa 480 ggagaagaag aaggctcaaa agc gaatticg gcaac caaga titcgcattca toaccala gag 540 tgatgtggat aatcttgaag atggatat cq atggc gtaaa tatggacaaa aagcc.gtcaa 600 gaatagocca titcc.caagga gctactatag atgcacaaac agcagatgca C ggtgaagaa 660 gagagtagaa cqttcatcag atgatccatc gatagtgatc acaa.catacg aaggacaa.ca 720 ttgc catcaa accattggat toccitcgtgg toggaatccitc actgcacacg accolacatag 78O cittcacttct catcatcatc. tcc citcctcc attaccaaat cottattatt accaagaact 840 ccttcatcaa cittcacagag acaataatgc ticcittcaccg cggittacccc gacctactac 9 OO tgaagataca cct gcc.gtgt citacticcatc agaggaaggc titacttggtg atattgtacc 96.O toaaactato C goaac cott gaggtaagct togtacgtag caatagota a ggaggtgcta 1020 actcattata tatagaagat attgcagacc agaatatgcg cagggagggit ataacaatat 1080 ggcgttgtaa caatggat.ct atatattacc to attgttga totaatagdac accaccggta 1140 cgtttgcaat ttcttcatgt atatttcttg ttatatatgt agittatatat coaggtataa 1200 ttittgatgta acacaacatt aatcttaatc gtggatccat cocacatttg atgcatgitat 1260 gtgcacttaa gaaaaagaac atggaggaaa taacgttatt ttittattatt ct 1312

<210> SEQ ID NO 16 &2 11s LENGTH 2.87 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G2290 polypeptide <400 SEQUENCE: 16 Met Asn Asp Pro Asp Asn Pro Asp Leu Ser Asn Asp Asp Ser Ala Trp 1 5 10 15 Arg Glu Lieu. Thir Lieu. Thir Ala Glin Asp Ser Asp Phe Phe Asp Arg Asp 2O 25 30 Thr Ser Asn Ile Leu Ser Asp Phe Gly Trp Asn Lieu. His His Ser Ser 35 40 45 Asp His Pro His Ser Leu Arg Phe Asp Ser Asp Leu Thr Glin Thr Thr 50 55 60 Gly Val Lys Pro Thr Thr Val Thr Ser Ser Cys Ser Ser Ser Ala Ala 65 70 75 8O

Wal Ser Wall Ala Wall Thir Ser Thr Asn Asn. Asn Pro Ser Ala Thr Ser 85 90 95 Ser Ser Ser Glu Asp Pro Ala Glu Asn. Ser Thr Ala Ser Ala Glu Lys 100 105 110 Thr Pro Pro Pro Glu Thr Pro Val Lys Glu Lys Lys Lys Ala Glin Lys 115 120 125 US 2006/0195944 A1 Aug. 31, 2006 79

-continued

Arg Ile Arg Glin Pro Arg Phe Ala Phe Met Thr Lys Ser Asp Wall Asp 130 135 1 4 0 Asn Lieu Glu Asp Gly Tyr Arg Trp Arg Lys Tyr Gly Glin Lys Ala Val 145 15 O 155 160 Lys Asn Ser Pro Phe Pro Arg Ser Tyr Tyr Arg Cys Thr Asn Ser Arg 1.65 170 175 Cys Thr Val Lys Lys Arg Val Glu Arg Ser Ser Asp Asp Pro Ser Ile 18O 185 19 O Val Ile Thr Thr Tyr Glu Gly Gln His Cys His Gln Thr Ile Gly Phe 195 200 2O5 Pro Arg Gly Gly Ile Leu Thir Ala His Asp Pro His Ser Phe Thr Ser 210 215 220 His His His Leu Pro Pro Pro Leu Pro Asn Pro Tyr Tyr Tyr Glin Glu 225 230 235 240 Leu Lieu. His Glin Lieu. His Arg Asp Asn. Asn Ala Pro Ser Pro Arg Lieu 245 250 255 Pro Arg Pro Thr Thr Glu Asp Thr Pro Ala Val Ser Thr Pro Ser Glu 260 265 27 O Glu Gly Lieu Lieu Gly Asp Ile Val Pro Glin Thr Met Arg Asn Pro 275 280 285

<210> SEQ ID NO 17 &2 11s LENGTH 1406 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G2340

<400 SEQUENCE: 17 atacaaaact coctottctd tatcttctitc atcttaaaga aaaaataaga gatattogta 60 aagagagaac acaaaattitc agtttacgaa aagctagoaa agtc.gagtat cqaggaataa 120 cagaataaga cigtatctato cittgccttaa tagttcttacc aaaag atcta gtoctittctt 18O tgitatgatcg atccatcaca ag.cccacaac aacaacaact acatctottt citctatotct 240 agcttctatt tittaatacat tdaagaatca agaatgg tac ggacgcc.gtg ttgtagagca 3OO gaagggttga agaaaggagc atggacticaa gaagaag acc aaaagctitat cqccitatgtt 360 caacga catg gtgaaggcgg ttggc galacc citt.ccggaca aagctgg act caaaagatgt 420 ggcaaaagct gcagattgag atggg.cga at tacttalagac citgacattaa acgtggagag 480 tittagccaag acgaggaaga titc catcatc aacct coacg ccatt catgg caacaaatgg 540 toggccatag citcgtaaaat accaagaaga acagacaatg agatcaagaa ccattggaac 600 actcacatca agaaatgtct g g to aagaaa gqtattgatc cqttgaccca caaatcc citt 660 citcgatggag ccggtaaatc atctgaccat toc gogcatc cc.gagaaaag cagogttcat 720 gacgacaaag atgatcagaa ttcaaataac aaaaagttgt caggatcatc atcagctic gg 78O tttittgaaca gagtag caaa cagattcggit catagaatca accacaatgttctgtctgat 840 attattggaa gtaatggcct acttacitagt cacactactc. caactacaag tdtttcagaa 9 OO ggtgagaggit caacgagttc titcct coaca catacct citt cqaatctocc catcaa.ccgt. 96.O agcataaccq ttgatgcaac atctotatoc to atccacgt totctgactic ccc.cg accog 1020 tgtttatacg aggaaatagt cqgtgacatt galagatatga cigagattittc atcaagatgt 1080 US 2006/0195944 A1 Aug. 31, 2006 80

-continued ttgagt catg ttittatctoa togalagattta ttgatgtc.cg ttgagtc.ttg tittggagaat 1140 actt cattca tagggaaat tacaatgatc tittcaag agg ataaaatcga gacgacgtog 1200 tittaatgata gctacgtgac goc gatcaat gaagttgatg acticcitgttga agggattgac 1260 aattattittg gatgagttat attgatgatg atgaaaattit gcatttggca totaaatcaa 1320 ttagagtttg atttgctato gtgtttittag tttgttgttgtg tagtgttgttt cq accgtcaa 1380 aaaaaaaaaa aaaaaaaaaa aaaaaa. 1406

<210> SEQ ID NO 18 &2 11s LENGTH 333 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G2340 polypeptide <400 SEQUENCE: 18 Met Val Arg Thr Pro Cys Cys Arg Ala Glu Gly Lieu Lys Lys Gly Ala 1 5 10 15 Trp Thr Glin Glu Glu Asp Gln Lys Lieu. Ile Ala Tyr Val Glin Arg His 2O 25 30 Gly Glu Gly Gly Trp Arg Thr Lieu Pro Asp Lys Ala Gly Lieu Lys Arg 35 40 45 Cys Gly Lys Ser Cys Arg Leu Arg Trp Ala Asn Tyr Lieu. Arg Pro Asp 50 55 60 Ile Lys Arg Gly Glu Phe Ser Glin Asp Glu Glu Asp Ser Ile Ile Asn 65 70 75 8O Lieu. His Ala Ile His Gly Asn Lys Trp Ser Ala Ile Ala Arg Lys Ile 85 90 95 Pro Arg Arg Thr Asp Asn. Glu Ile Lys Asn His Trp Asn. Thir His Ile 100 105 110 Lys Lys Cys Lieu Val Lys Lys Gly Ile Asp Pro Leu Thir His Lys Ser 115 120 125 Leu Lieu. Asp Gly Ala Gly Lys Ser Ser Asp His Ser Ala His Pro Glu 130 135 1 4 0 Lys Ser Ser Val His Asp Asp Lys Asp Asp Glin Asn. Ser Asn. Asn Lys 145 15 O 155 160 Lys Lieu Ser Gly Ser Ser Ser Ala Arg Phe Lieu. Asn Arg Val Ala Asn 1.65 170 175 Arg Phe Gly His Arg Ile Asn His Asn Val Lieu Ser Asp Ile Ile Gly 18O 185 19 O Ser Asn Gly Leu Leu Thir Ser His Thr Thr Pro Thr Thr Ser Val Ser 195 200 2O5 Glu Gly Glu Arg Ser Thr Ser Ser Ser Ser Thr His Thr Ser Ser Asn 210 215 220 Leu Pro Ile Asin Arg Ser Ile Thr Val Asp Ala Thr Ser Leu Ser Ser 225 230 235 240 Ser Thr Phe Ser Asp Ser Pro Asp Pro Cys Leu Tyr Glu Glu Ile Val 245 250 255 Gly Asp Ile Glu Asp Met Thr Arg Phe Ser Ser Arg Cys Lieu Ser His 260 265 27 O Val Lieu Ser His Glu Asp Leu Lleu Met Ser Val Glu Ser Cys Lieu Glu 275 280 285 US 2006/0195944 A1 Aug. 31, 2006 81

-continued

Asn Thr Ser Phe Met Arg Glu Ile Thr Met Ile Phe Glin Glu Asp Lys 29 O 295 3OO Ile Glu Thir Thr Ser Phe Asn Asp Ser Tyr Val Thr Pro Ile Asin Glu 305 310 315 320 Val Asp Asp Ser Cys Glu Gly Ile Asp Asn Tyr Phe Gly 325 330

<210 SEQ ID NO 19 &2 11s LENGTH 1384 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G671

<400 SEQUENCE: 19 ttcacttgag aacaa.ccc.cc tittgaacticg atcaagaaag citaagtttga agaatcaaga 60 atggtgcgga caccgtgttg caaag.ccgaa citagggittaa agaaaggagc titggacticcic 120 gaggaagatc agaagcttct citcttacctt aaccqccacg gtgaaggtgg atggc galact 18O citcc cc galaa aagctgg act caa.gagatgc ggcaaaagct gcagact gag atgggccaat 240 tatcittagac citgacatcaa aagaggagag titcactgaag acgaagaacg ttcaatcatc totcitt cacg cccittcacgg caacaaatgg totgctatag citcgtgg act accaggaaga 360 accqataacg agatcaagaa citactggaac acticatatoa aaaaacgttt gatcaagaaa 420 gg tattgatc cagttacaca caagggcata accitcc.ggta cc.gacaaatc agaaaacctic 480 cc.ggagaaac aaaatgttaa totgacaact agtgaccato atcttgataa toaca aggcg 540 aagaagaa.ca acaagaattt toggattatca toggctagtt tottgaacaa agtagctaat 600 aggttcggaa agagaatcaa totagagtgtt citgtctgaga ttatcggaag toggaggcc.ca 660 cittgcttcta citagtcacac tactaatact acaactacaa gtgtttcc.gt tdactctgaa 720 to agittaagt caacgagttc titcctitcgca cca acct cqa atcttctict g c catggg acc gttgcaacaa caccagtttc atc galactitt gacgttgatg gtaac gttaa totgacgtgt 840 tottcgtoca cqttctotga titcctcc.gtt aacaatcctd taatgtact g c gataattitc 9 OO gttggtaata acaacgttga tigatgaggat act atcgggit totccacatt totgaatgat 96.O gaagatttica tatgttgga ggagtc.ttgt gttgaaaa.ca citgc gttcat gaaagaactt 1020 acgaggtttc titcacgagga tigaaaacgac gtcgttgatg tdacgc.cggit citatgaacgt. 1080 caag acttgt ttgacgaaat tdata act at tittggatgag togaaact cat aatcgatgaa 1140 toccacgtga ccatgtcaat atgatgtcta toggatatgtt accittgatga tigttgatggit 1200 aataataata aataatagat ggtgatgatg accatgcatc. aatcatgaat gtagttcgtg 1260 ttgtcacata tacttgttgtt tttgttgttitt tttitttittgg totgaagtgt gttgtttcgt 1320 tgtaaatgga ttataaatgg togatgtaata attataatgt taaaaaaaaa aaaaaaaaaa 1380 aaaa. 1384

<210> SEQ ID NO 20 &2 11s LENGTH 352 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G671 polypeptide US 2006/0195944 A1 Aug. 31, 2006 82

-continued <400 SEQUENCE: 20 Met Val Arg Thr Pro Cys Cys Lys Ala Glu Lieu Gly Lieu Lys Lys Gly 1 5 10 15 Ala Trp Thr Pro Glu Glu Asp Gln Lys Lieu Lleu Ser Tyr Lieu. Asn Arg 2O 25 30 His Gly Glu Gly Gly Trp Arg Thr Lieu Pro Glu Lys Ala Gly Lieu Lys 35 40 45 Arg Cys Gly Lys Ser Cys Arg Lieu Arg Trp Ala Asn Tyr Lieu Arg Pro 50 55 60 Asp Ile Lys Arg Gly Glu Phe Thr Glu Asp Glu Glu Arg Ser Ile Ile 65 70 75 8O Ser Lieu. His Ala Lieu. His Gly Asn Lys Trp Ser Ala Ile Ala Arg Gly 85 90 95 Leu Pro Gly Arg Thr Asp Asn. Glu Ile Lys Asn Tyr Trp Asn. Thir His 100 105 110 Ile Lys Lys Arg Lieu. Ile Lys Lys Gly Ile Asp Pro Val Thr His Lys 115 120 125 Gly Ile Thir Ser Gly Thr Asp Llys Ser Glu Asn Lieu Pro Glu Lys Glin 130 135 1 4 0 Asn Val Asn Lieu. Thir Thr Ser Asp His Asp Lieu. Asp Asn Asp Lys Ala 145 15 O 155 160 Lys Lys Asn Asn Lys Asn Phe Gly Leu Ser Ser Ala Ser Phe Leu Asn 1.65 170 175 Lys Wall Ala Asn Arg Phe Gly Lys Arg Ile Asn. Glin Ser Val Lieu Ser 18O 185 19 O Glu Ile Ile Gly Ser Gly Gly Pro Leu Ala Ser Thr Ser His Thr Thr 195 200 2O5 Asn Thr Thr Thr Thr Ser Val Ser Val Asp Ser Glu Ser Val Lys Ser 210 215 220 Thr Ser Ser Ser Phe Ala Pro Thr Ser Asn Leu Leu Cys His Gly. Thr 225 230 235 240 Val Ala Thr Thr Pro Val Ser Ser Asn Phe Asp Val Asp Gly Asn Val 245 250 255 Asn Leu Thr Cys Ser Ser Ser Thr Phe Ser Asp Ser Ser Val Asn Asn 260 265 27 O Pro Leu Met Tyr Cys Asp Asn. Phe Val Gly Asn. Asn. Asn. Wall Asp Asp 275 280 285 Glu Asp Thr Ile Gly Phe Ser Thr Phe Leu Asn Asp Glu Asp Phe Met 29 O 295 3OO Met Leu Glu Glu Ser Cys Val Glu Asn Thr Ala Phe Met Lys Glu Leu 305 310 315 320 Thr Arg Phe Leu. His Glu Asp Glu Asn Asp Val Val Asp Val Thr Pro 325 330 335 Val Tyr Glu Arg Glin Asp Leu Phe Asp Glu Ile Asp Asn Tyr Phe Gly 340 345 35 O

<210> SEQ ID NO 21 &2 11s LENGTH 727 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G353 US 2006/0195944 A1 Aug. 31, 2006 83

-continued <400 SEQUENCE: 21 accaaactica aaaaacacaa accacaagag gatcatttca ttttittattg tittcgttitta 60 atcatcatca totagaagaaa aatggttgcg atato ggaga totalagtogac ggtggatgtc. 120 acgg.cggcga attgtttgat gcttittatct agagttggac aagaaaacgt tdacggtggc 18O gatcaaaaac gogttittcac atgtaaaacg tdtttgaagc agttt cattc gttccaa.gc.c 240 ttaggagg to accgtgcgag to acaagaag cctaacaacg acgctttgtc. gtctggattg 3OO atgaagaagg togaaaacg to gtc.gcatcct totcc catat gtggagtgga gtttc.cgatg 360 ggaCaagctt togggaggaca Catgaggaga Cacaggalacg agagtggggc tigctggtggc 420 gc gttggitta cacgc.gctitt gttgc.cggag cccacggtga citacgttgaa gaaatctago 480 agtgggaaga gagtggcttg tittggatctg agtictaggga tiggtggacaa tittgaatcto 540 aagttggagc titggaagaac agtttattgattittattitat tittccittaaa titttctgaat 600 atatttgttt citctdattct ttgaatttitt cittaatatto tag attatac atacatcc.gc 660 agatttagga aactitt cata gagtgtaatc titttctttct gtaaaaatat attittacttg 720 tagcaaa 727

<210> SEQ ID NO 22 <211& LENGTH 162 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G353 polypeptide <400 SEQUENCE: 22 Met Val Ala Ile Ser Glu Ile Lys Ser Thr Val Asp Val Thr Ala Ala 1 5 10 15 Asn. Cys Lieu Met Leu Lleu Ser Arg Val Gly Glin Glu Asn Val Asp Gly 2O 25 30 Gly Asp Gln Lys Arg Val Phe Thr Cys Lys Thr Cys Lieu Lys Glin Phe 35 40 45 His Ser Phe Glin Ala Lieu Gly Gly His Arg Ala Ser His Lys Llys Pro 50 55 60 Asn Asn Asp Ala Leu Ser Ser Gly Lieu Met Lys Lys Wall Lys Thir Ser 65 70 75 8O Ser His Pro Cys Pro Ile Cys Gly Val Glu Phe Pro Met Gly Glin Ala 85 90 95 Leu Gly Gly His Met Arg Arg His Arg Asn. Glu Ser Gly Ala Ala Gly 100 105 110 Gly Ala Leu Val Thr Arg Ala Leu Leu Pro Glu Pro Thr Val Thr Thr 115 120 125 Leu Lys Lys Ser Ser Ser Gly Lys Arg Val Ala Cys Lieu. Asp Leu Ser 130 135 1 4 0 Leu Gly Met Val Asp Asn Lieu. Asn Lieu Lys Lieu Glu Lieu Gly Arg Thr 145 15 O 155 160 Val Tyr

<210> SEQ ID NO 23 &2 11s LENGTH 922 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE US 2006/0195944 A1 Aug. 31, 2006 84

-continued

&223> OTHER INFORMATION: G484

<400 SEQUENCE: 23 attatattoc gtacaatc.cg atcgattitcc cqgc.gc.caga totcaccg.cg acticgtotac 60 tittc.cgattt gttcgtgtt gacticagtta cqattaaact atggatccaa toggatatagt 120 cggcaaatcc aaggaagacg cittctictitcc aaaagctacg atgactaaaa ttataaagga 18O gatgttacca ccagatgttc gtgttgcaag agatgcticaa gatcttctoa ttgaatgttg 240 tgtagagttt ataaatcttg tat cittcaga atctaatgat gtttgtaiaca aagaggataa 3OO acggacgatt gcticcitgagc atgttctoraa gqcattacag gttcttggitt ttggagaata 360 cattgaagaa gtctatocto C gtatgagca acatalagtat gaaacaatgc agg acacaca 420 gaggagcgtg aaatggalacc Ctggagctica aatgact gag gaggaag cag Cagctgagca 480 acaacg tatgtttgcagaag cacgtgcaag aatgaatgga ggtgtttcgg titcct calacc 540 tgaacatcca gaaact gacc agagaagtcc gcaaagctaa citgaaac cqt aagggtaagt 600 gttaggcaag aaaaaacaac atccttittaa cattcc cittg taagttgcaa atgcgitatgt 660 totctgttta tatgctcitta gitatgatata tottagttag tatttcacga totaaaaaca 720 cittgttgatto agatgtaatt agtaag catt cottgttittg tatttactitt gtgtcttgac 78O taag catggt gggtoagg to tacacaaag.c atctgattog atgacittaca ggaatcttaa 840 tgtttgtaga ttggataaat ttggtgattg gtgtaattgt tttitcCataa acacaatgca 9 OO atcattgttt agtgttgtta ac 922

<210> SEQ ID NO 24 &2 11s LENGTH 159 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G484 polypeptide <400 SEQUENCE: 24 Met Asp Pro Met Asp Ile Val Gly Lys Ser Lys Glu Asp Ala Ser Lieu 1 5 10 15 Pro Lys Ala Thr Met Thr Lys Ile Ile Lys Glu Met Leu Pro Pro Asp 2O 25 30 Val Arg Val Ala Arg Asp Ala Glin Asp Leu Lieu. Ile Glu Cys Cys Val 35 40 45 Glu Phe Ile Asn Lieu Val Ser Ser Glu Ser Asn Asp Val Cys Asn Lys 50 55 60 Glu Asp Lys Arg Thr Ile Ala Pro Glu His Val Lieu Lys Ala Leu Glin 65 70 75 8O Val Leu Gly Phe Gly Glu Tyr Ile Glu Glu Val Tyr Ala Ala Tyr Glu 85 90 95 Gln His Lys Tyr Glu Thr Met Glin Asp Thr Glin Arg Ser Val Lys Trp 100 105 110 Asn Pro Gly Ala Gln Met Thr Glu Glu Glu Ala Ala Ala Glu Glin Glin 115 120 125 Arg Met Phe Ala Glu Ala Arg Ala Arg Met Asn Gly Gly Val Ser Val 130 135 1 4 0 Pro Glin Pro Glu His Pro Glu Thr Asp Glin Arg Ser Pro Glin Ser 145 15 O 155 US 2006/0195944 A1 Aug. 31, 2006 85

-continued

<210> SEQ ID NO 25 &2 11s LENGTH 786 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G674

<400 SEQUENCE: 25 atggtgttta aatcagaaaa atcaaaccgg gaaatgaaat caaaggaga a gcaaaggaag 60 ggattatggit caccc.gagga agatgagaag cittaggag to atgtc.citcaa atatggc cat 120 ggatgctgga gtactattoc tottcaagct ggattgcaga ggaatgggaa gagttgtaga 18O ttalaggtggg ttaattattt aag acctgga cittaagaagt ctittattoac taaacaa gag 240 gaaactatac ttctitt cact tcattccatg ttgggtaa.ca aatggtotca gat atcgaaa 3OO ttcttaccag gaagaaccga caacgagatc aaaaactatt goc attctaa totaaagaag 360 ggtgta actt togaaacaa.ca toga aaccaca aaaaaa.catc aaa.caccittt aatcacaaac 420 to acttgagg ccttgcagag ttcaactgaa agatcttctt catctatocaa totcggagaa 480 acgtctaatg citcaaaccitc aagcttitt.cg ccaaatctog togttctogga atggittagat 540 catagitttgc titatggatca gtcaccitcaa aagttctagot atgttcaaaa tottgttitta 600 cc.ggaagaga gaggattcat to gaccatgt gg.cccitcgtt atttgggaaa cqactictittg 660 Cctgattitcg togccaaattc agaattitttgttggatgatg agatato atc tdagatcgag 720 ttctgtactt catttitcaga caacttitttgttctgatgg to tcatcaacga gctacgacca 78O atgtaa 786

<210> SEQ ID NO 26 <211& LENGTH 261 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G674 polypeptide <400 SEQUENCE: 26 Met Val Phe Lys Ser Glu Lys Ser Asn Arg Glu Met Lys Ser Lys Glu 1 5 10 15 Lys Glin Arg Lys Gly Lieu Trp Ser Pro Glu Glu Asp Glu Lys Lieu Arg 2O 25 30 Ser His Val Leu Lys Tyr Gly His Gly Cys Trp Ser Thr Ile Pro Leu 35 40 45 Glin Ala Gly Lieu Glin Arg Asn Gly Lys Ser Cys Arg Lieu Arg Trp Val 50 55 60 Asn Tyr Lieu Arg Pro Gly Lieu Lys Lys Ser Lieu Phe Thr Lys Glin Glu 65 70 75 8O Glu Thir Ile Leu Lleu Ser Lieu. His Ser Met Leu Gly Asn Lys Trp Ser 85 90 95 Glin Ile Ser Lys Phe Leu Pro Gly Arg Thr Asp Asn. Glu Ile Lys Asn 100 105 110 Tyr Trp His Ser Asn Lieu Lys Lys Gly Val Thr Lieu Lys Glin His Glu 115 120 125 Thir Thr Lys Lys His Glin Thr Pro Leu Ile Thr Asn Ser Leu Glu Ala 130 135 1 4 0 Leu Glin Ser Ser Thr Glu Arg Ser Ser Ser Ser Ile Asn Val Gly Glu US 2006/0195944 A1 Aug. 31, 2006 86

-continued

145 15 O 155 160

Thir Ser Asn Ala Glin Thir Ser Ser Phe Ser Pro Asn Lieu Wall Phe Ser 1.65 170 175 Glu Trp Lieu. Asp His Ser Lieu Lleu Met Asp Glin Ser Pro Glin Lys Ser 18O 185 19 O Ser Tyr Val Glin Asn Leu Val Leu Pro Glu Glu Arg Gly Phe Ile Gly 195 200 2O5 Pro Cys Gly Pro Arg Tyr Lieu Gly Asn Asp Ser Leu Pro Asp Phe Val 210 215 220 Pro Asn. Ser Glu Phe Lieu Lleu. Asp Asp Glu Ile Ser Ser Glu Ile Glu 225 230 235 240 Phe Cys Thr Ser Phe Ser Asp Asin Phe Leu Phe Asp Gly Leu Ile Asn 245 250 255 Glu Leu Arg Pro Met 260

<210 SEQ ID NO 27 &2 11s LENGTH 1304 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G1052

<400 SEQUENCE: 27 tgatcatcta aaactittcaa tittctdtctt gatccitcact tdaattittitt gttgtttcto 60 tdaaatctitt gatcctttcc tttgtttittc atttgacctd ttacaaaaaa atctggtgttg 120 cc attaaatc titt attaatg goaca actitc citc.cgaaaat cocaiaccatg acgacgc.cala 18O attggcct ga cittcto citcc cagaaacticc ctitccatago C goaacggc g g cago.cgcag 240 caa.ccgctgg accitcaacaa caaaaccott catggatgga tigagtttct c gacittct cag 3OO cgacitc.gc.cg toggacitcac cqtcgttcta taag.cgacto cattgctitt.c cittgaac cac 360 cittcct cogg cqtcggaaac caccactitcg ataggtttga cqac gag caa ttcatgtc.ca 420 tgttcaacga C gacgtacac aacaataa.cc acaatcatca toatcatcac agcatcaacg. 480 gcaatgtggg toccacgc.gt to atccitcca acaccitccac gocgtcc gat cataatagoc 540 ttagcgacga C gacaacaac aaagaag cac caccgtocga totatoatcat cacatggaca 600 ataatgtagc caatcaaaac aacgcc.gc.cg gtaacaatta caacgaatca gac gaggtoc 660 aaagcc agtg caagacggag coacalagatg gtc.cgtoggc gaatcaaaac tocc ggtggaa 720 gcto cqgtaa togtattoac gaccotaaaa gqgtaaaaag aattittagca aataggcaat 78O cago acagag atcaagggtg aggaaattgc aatacatato agagcttgaa aggagcgtta 840 cittcattgca gactgaagtg toagtgttat cqccaag agt togc gtttittg gatcatcago 9 OO gattgcttct caacg.tc.gac aatagtgcta toaa.gcaacg aatc.gcagot ttagcacaag 96.O ataagattitt caaagacgct catcaagaag cattgaagag agaaatagag agactitcgac 1020 aagtatatoa toaacaaagc citcaagaaga tiggagaataa totcitcc gat caatcto cqg 1080 cc.gatatoaa accgtocgtt gagaaggaac agotcct caa totctaaagc tigttc gttca 1140 ctaagatctt tottttcatg gcgaaaag at tcttgacitat aaaacctcitt tatgtcaaga 1200 aattaattta toaaagaaga taggcctttitt tatttgatct aatcacattt ttittaagttg 1260 tgatgaattit gcttittgatg tatctgttitt tttittttittt ttitt 1304 US 2006/0195944 A1 Aug. 31, 2006 87

-continued

<210> SEQ ID NO 28 &2 11s LENGTH 329 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G1052 polypeptide <400 SEQUENCE: 28 Met Ala Gln Leu Pro Pro Lys Ile Pro Thr Met Thr Thr Pro Asn Trp 1 5 10 15 Pro Asp Phe Ser Ser Glin Lys Lieu Pro Ser Ile Ala Ala Thr Ala Ala 2O 25 30 Ala Ala Ala Thr Ala Gly Pro Glin Glin Glin Asn Pro Ser Trp Met Asp 35 40 45 Glu Phe Lieu. Asp Phe Ser Ala Thr Arg Arg Gly Thr His Arg Arg Ser 50 55 60 Ile Ser Asp Ser Ile Ala Phe Leu Glu Pro Pro Ser Ser Gly Val Gly 65 70 75 8O Asn His His Phe Asp Arg Phe Asp Asp Glu Gln Phe Met Ser Met Phe 85 90 95 Asn Asp Asp Wal His Asn. Asn. Asn His Asn His His His His His Ser 100 105 110 Ile Asin Gly Asn Val Gly Pro Thr Arg Ser Ser Ser Asn Thr Ser Thr 115 120 125 Pro Ser Asp His Asn. Ser Leu Ser Asp Asp Asp Asn. Asn Lys Glu Ala 130 135 1 4 0 Pro Pro Ser Asp His Asp His His Met Asp Asn. Asn. Wall Ala Asn Glin 145 15 O 155 160 Asn Asn Ala Ala Gly Asn. Asn Tyr Asn. Glu Ser Asp Glu Val Glin Ser 1.65 170 175 Glin Cys Lys Thr Glu Pro Glin Asp Gly Pro Ser Ala Asn Glin Asn. Ser 18O 185 19 O Gly Gly Ser Ser Gly Asn Arg Ile His Asp Pro Lys Arg Val Lys Arg 195 200 2O5 Ile Leu Ala Asn Arg Glin Ser Ala Glin Arg Ser Arg Val Arg Lys Lieu 210 215 220 Gln Tyr Ile Ser Glu Leu Glu Arg Ser Val Thir Ser Leu Glin Thr Glu 225 230 235 240 Val Ser Val Lieu Ser Pro Arg Val Ala Phe Lieu. Asp His Glin Arg Lieu 245 250 255 Leu Lieu. Asn Val Asp Asn. Ser Ala Ile Lys Glin Arg Ile Ala Ala Lieu 260 265 27 O Ala Glin Asp Lys Ile Phe Lys Asp Ala His Glin Glu Ala Lieu Lys Arg 275 280 285 Glu Ile Glu Arg Lieu Arg Glin Val Tyr His Glin Glin Ser Lieu Lys Lys 29 O 295 3OO Met Glu Asn. Asn. Wal Ser Asp Glin Ser Pro Ala Asp Ile Llys Pro Ser 305 310 315 320 Val Glu Lys Glu Gln Lieu Lleu. Asn. Wal 325

<210 SEQ ID NO 29 US 2006/0195944 A1 Aug. 31, 2006 88

-continued

<211& LENGTH 1161. &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G1328

<400 SEQUENCE: 29 aattcaatca citatatttitt ttaaaaac at ttgacittcat cqatcggitta acaattaatc 60 aaaaagatgg gacgatcacc atgttgtgag aagaagaatg gtc.to aagaa agg accatogg 120 actcct gagg aggatcaaaa got cattgat tatat caata tacatggitta toggaaattgg 18O agaact ctitc ccaagaatgc tigg gttacaa agatgttggta agagttgtc.g. tct cogg togg 240 accalactato tcc gaccaga tattaag.cgt ggaagattct cittittgaaga agaagaalacc 3OO attattoaac titcacago at catgggaaac aagtggtotg cqattgcggc ticgtttgcct 360 ggaagaacag acaacgagat caaaaact at tdgaac acto a catcagaaa aag acttcta 420 aagatgggaa to gaccc.ggit tacacacact coacgtottg atcttcticga tat citcc to c 480 attctoagct catctatota caactictitcg catcatcatc atcatcatca toaacaacat 540 atgaac atgt cqaggctcat gatgagtgat ggtaatcatc aaccattggit talacc cc gag 600 atactcaaac togcaaccitc. tctottttca aaccaaaacc accocaacaa cacacac gag 660 aacaac acgg ttalaccaaac cqaagtaaac caataccalaa ccggittacaa catgcctggit 720 aatgaagaat tacaatcttg gttcCctato atggatcaat tcacgaattt CCaag acctc 78O atgccaatga agacgacggit coaaaattica ttgtcatacg atgatgattig titcgaagttcc 840 aattittgtat tagaacctta ttactcc.gac tittgcttcag tottgaccac accittcttca 9 OO agc.ccgacitc cqttaaactc aagttcctica acttacatca atagtag cac ttgcago acc 96.O gaggatgaaa aagagagitta ttacagtgat aatat cacta attatto gtt to atgttaat 1020 ggittittcticc aatticcaata aacaaaacgc cattggaata gagittatgta alacatgcaat 1080 cattgtattt gttatataga titttgttaca tatccaaaat coaaaatact atagittittaa 1140 aataaaaaaa aaaaaaaaaa a 1161

<210 SEQ ID NO 30 <211& LENGTH 324 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G1328 polypeptide <400 SEQUENCE: 30 Met Gly Arg Ser Pro Cys Cys Glu Lys Lys Asn Gly Lieu Lys Lys Gly 1 5 10 15 Pro Trp Thr Pro Glu Glu Asp Glin Lys Lieu. Ile Asp Tyr Ile Asn. Ile 2O 25 30 His Gly Tyr Gly Asn Trp Arg Thr Lieu Pro Lys Asn Ala Gly Lieu Glin 35 40 45 Arg Cys Gly Lys Ser Cys Arg Lieu Arg Trp Thr Asn Tyr Lieu Arg Pro 50 55 60 Asp Ile Lys Arg Gly Arg Phe Ser Phe Glu Glu Glu Glu Thir Ile Ile 65 70 75 8O Glin Lieu. His Ser Ile Met Gly Asn Lys Trp Ser Ala Ile Ala Ala Arg 85 90 95 US 2006/0195944 A1 Aug. 31, 2006 89

-continued Leu Pro Gly Arg Thr Asp Asn. Glu Ile Lys Asn Tyr Trp Asn. Thir His 100 105 110 Ile Arg Lys Arg Lieu Lleu Lys Met Gly Ile Asp Pro Val Thr His Thr 115 120 125 Pro Arg Lieu. Asp Leu Lieu. Asp Ile Ser Ser Ile Leu Ser Ser Ser Ile 130 135 1 4 0 Tyr Asn Ser Ser His His His His His His His Glin Gln His Met Asn 145 15 O 155 160 Met Ser Arg Leu Met Met Ser Asp Gly Asn His Gln Pro Leu Val Asn 1.65 170 175 Pro Glu Ile Leu Lys Lieu Ala Thr Ser Lieu Phe Ser Asn Glin Asn His 18O 185 19 O

Pro Asn Asn. Thir His Glu Asn Asn. Thir Wall Asn Glin Thr Glu Wall Asn 195 200 2O5 Gln Tyr Glin Thr Gly Tyr Asn Met Pro Gly Asn Glu Glu Leu Glin Ser 210 215 220 Trp Phe Pro Ile Met Asp Glin Phe Thr Asn Phe Glin Asp Leu Met Pro 225 230 235 240 Met Lys Thir Thr Val Glin Asn. Ser Lieu Ser Tyr Asp Asp Asp Cys Ser 245 250 255 Lys Ser Asn Phe Val Leu Glu Pro Tyr Tyr Ser Asp Phe Ala Ser Val 260 265 27 O

Leu. Thir Thr Pro Ser Ser Ser Pro Thr Pro Leu Asn. Ser Ser Ser Ser 275 280 285 Thr Tyr Ile Asin Ser Ser Thr Cys Ser Thr Glu Asp Glu Lys Glu Ser 29 O 295 3OO Tyr Tyr Ser Asp Asn Ile Thr Asn Tyr Ser Phe Asp Val Asin Gly Phe 305 310 315 320

Leu Glin Phe Glin

<210> SEQ ID NO 31 &2 11s LENGTH 1155 &212> TYPE DNA <213> ORGANISM: Arabidopsis thaliana &220s FEATURE &223> OTHER INFORMATION: G1930

<400 SEQUENCE: 31 attcacatta citaatctotc aagattitcac aattittcttg tdattittctd toagtttctt 60 attitcgtttc ataa.catgga tigc catgagt agcgtag acg agagctotac alactacagat 120 to cattcc.gg cq agaaagttc atcgtc.to cq gogagtttac tatatagaat gggaag.cgga 18O acaag.cgtgg tacttgatto agagaacggt gtcgaagttcg aagttcgaagc cqaatcaaga 240 aagctt.ccitt cittcaagatt caaaggtgtt gttcc to aac caaatggaag atggggagct 3OO cagatttacg agaaac atca acgc.gtgtgg cittgg tactt toaac gagga agacgaagca 360 gctogt gott acgacgtc.gc ggcto accgt titc.cgtggcc gcgatgcc.gt tactaattitc 420 aaag acacga C gttcgaaga agaggttgag ttcttaaacg cqcattc gala atcagagatc 480 gtagatatgttgagaaaa.ca cacttacaaa gaagagittag accalaaggaa acgta accqt 540 gacggtaacg gaaaagagac gacgg.cgttt gotttggctt cqatggtggit tatgacgggg 600 tittaaaacgg cqgagttact gtttgagaaa acggtaacgc caagtgacgt cqggaaacta 660 US 2006/0195944 A1 Aug. 31, 2006 90

-continued aaccgtttag titataccalaa acaccaag.cg gagaalacatt titcc.gttacc gttaggtaat 720 aataacgt.ct cogittaaagg tatgctgttgaattitcgaag acgittaacgg gaaagtgtgg 78O aggttc.cgtt acticittattg gaatagtagt caaagttato tdttgaccala aggttggagt 840 agatto gtta aagaga agag actttgttgct ggtgatttga toagttittaa aagatccaac 9 OO gatcaagatc aaaaattctt tatcgggtgg aaatcgaaat cogggttgga totagaga.cg 96.O ggtogggitta toagattgtt toggggttgat atttctittaa acgc.cgt.cgt totagtgaag 1020 gaaacaacgg aggtgttaat gtcgtcgitta aggtgtaaga agcaa.cgagt tttgtaataa 1080 caatttaaca acttgggaaa gaaaaaaaag citttittgatt ttaatttctd ttcaacgitta 1140 atcttgct ga gatta 1155

<210> SEQ ID NO 32 &2 11s LENGTH 333 &212> TYPE PRT <213> ORGANISM: Arabidopsis thaliana &220s FEATURE <223> OTHER INFORMATION: G1930 polypeptide <400 SEQUENCE: 32 Met Asp Ala Met Ser Ser Val Asp Glu Ser Ser Thr Thr Thr Asp Ser 1 5 10 15 Ile Pro Ala Arg Lys Ser Ser Ser Pro Ala Ser Lieu Lleu Tyr Arg Met 2O 25 30 Gly Ser Gly Thr Ser Val Val Leu Asp Ser Glu Asn Gly Val Glu Val 35 40 45 Glu Val Glu Ala Glu Ser Arg Lys Lieu Pro Ser Ser Arg Phe Lys Gly 50 55 60 Val Val Pro Gln Pro Asn Gly Arg Trp Gly Ala Glin Ile Tyr Glu Lys 65 70 75 8O His Glin Arg Val Trp Lieu Gly Thr Phe Asn. Glu Glu Asp Glu Ala Ala 85 90 95 Arg Ala Tyr Asp Wall Ala Ala His Arg Phe Arg Gly Arg Asp Ala Val 100 105 110 Thr Asin Phe Lys Asp Thr Thr Phe Glu Glu Glu Val Glu Phe Leu Asn 115 120 125 Ala His Ser Lys Ser Glu Ile Val Asp Met Leu Arg Lys His Thr Tyr 130 135 1 4 0 Lys Glu Glu Lieu. Asp Glin Arg Lys Arg Asn Arg Asp Gly Asn Gly Lys 145 15 O 155 160 Glu Thir Thr Ala Phe Ala Leu Ala Ser Met Val Val Met Thr Gly Phe 1.65 170 175 Lys Thr Ala Glu Leu Leu Phe Glu Lys Thr Val Thr Pro Ser Asp Val 18O 185 19 O Gly Lys Lieu. Asn Arg Lieu Val Ile Pro Llys His Glin Ala Glu Lys His 195 200 2O5 Phe Pro Leu Pro Leu Gly Asn. Asn. Asn. Wal Ser Wall Lys Gly Met Lieu 210 215 220 Lieu. Asn. Phe Glu Asp Wall Asn Gly Lys Val Trp Arg Phe Arg Tyr Ser 225 230 235 240 Tyr Trp Asin Ser Ser Glin Ser Tyr Val Leu Thir Lys Gly Trp Ser Arg 245 250 255