(19) United States
(12) Patent Application Publication, JIANG et al.
(10) Pub. No.: US 2011/0078806 A1
(43) Pub. Date: Mar. 31, 2011

(54) POLYNUCLEOTIDES AND POLYPEPTIDES IN PLANTS

(75) Inventors: CAI-ZHONG JIANG, FREMONT, CA (US); JACQUELINE E. HEARD, STONINGTON, CT (US); OLIVER RATCLIFFE, OAKLAND, CA (US); ROBERT A. CREELMAN, CASTRO VALLEY, CA (US); LUC ADAM, HAYWARD, CA (US); T. LYNNE REUBER, SAN MATEO, CA (US); JOSE LUIS RIECHMANN, PASADENA, CA (US); VOLKER HAAKE, BERLIN (DE); ARNOLD N. DUBELL, SAN LORENZO, CA (US); JAMES S. KEDDIE, SAN MATEO, CA (US); BRADLEY K. SHERMAN, BERKELEY, CA (US)

(73) Assignee: MENDEL BIOTECHNOLOGY, HAYWARD, CA (US)

(21) Appl. No.: 12/917,303

(22) Filed: Nov. 1, 2010

Related U.S. Application Data

(60) Continuation-in-part of application No. 11/642,814, filed on Dec. 20, 2006, now Pat. No. 7,825,296, which is a division of application No. 10/666,642, filed on Sep. 18, 2003, now Pat. No. 7,196,245. Continuation-in-part of application No. 12/077,535, filed on Mar. 17, 2008.

(60) Provisional application No. 60/434,166, filed on Dec. 17, 2002, provisional application No. 60/411,837, filed on Sep. 18, 2002, provisional application No. 60/465,809, filed on Apr. 24, 2003.

Publication Classification

(51) Int. Cl.
A01H 1/00 (2006.01)
C12N 15/63 (2006.01)
C12N 15/87 (2006.01)
A01H 5/00 (2006.01)
A01H 5/02 (2006.01)
A01H 5/04 (2006.01)
A01H 5/08 (2006.01)
A01H 5/12 (2006.01)
A01H 5/06 (2006.01)
A01H 5/10 (2006.01)

(52) U.S. Cl. 800/260; 435/320.1; 800/289; 800/290; 800/298; 800/301; 800/305; 800/306; 800/310; 800/312; 800/313; 800/314; 800/316; 800/317; 800/317.1; 800/317.2; 800/317.3; 800/317.4; 800/320; 800/320.1; 800/320.2; 800/320.3; 800/322

(57) ABSTRACT

The invention relates to transcription factor polypeptides, polynucleotides that encode them, homologs from a variety of plant species, and methods of using the polynucleotides and polypeptides to produce transgenic plants having advantageous properties compared to a reference plant. Sequence information related to these polynucleotides and polypeptides can also be used in bioinformatic search methods and is also disclosed.
[FIGURE 1 (Sheet 1 of 10): phylogenetic tree of the orders of flowering plants. Order labels: Fagales, Cucurbitales, Rosales, Oxalidales, Malpighiales, Sapindales, Malvales, Brassicales, Myrtales, Geraniales, Dipsacales, Apiales, Aquifoliales, Solanales, Lamiales, Gentianales, Garryales, Ericales, Cornales, Saxifragales, Santalales, Caryophyllales, Proteales, Ranunculales, Zingiberales, Commelinales, Poales, Arecales, Pandanales, Liliales, Dioscoreales, Asparagales, Alismatales, Acorales, Piperales, Magnoliales, Laurales, Ceratophyllales.]

[Sheets 2 to 10 of 10: drawings not reproduced. Recoverable labels: FIGURE 2, Figure 3B, Figure 10A.]

POLYNUCLEOTIDES AND POLYPEPTIDES IN PLANTS

RELATIONSHIP TO COPENDING APPLICATIONS

[0001] This application is a continuation in part application of U.S. application Ser. No. 11/642,814 (pending), filed Dec. 20, 2006, which is a divisional application of U.S. application Ser. No. 10/666,642 (issued as U.S. Pat. No. 7,196,245 on 27 Mar. 2007), which claims the benefit of copending U.S. Provisional Application No. 60/411,837, filed Sep. 18, 2002, U.S. Provisional Application No. 60/434,166, filed Dec. 17, 2002, and U.S. Provisional Application No. 60/465,809, filed Apr. 24, 2003. This application is a continuation in part of application Ser. No. 12/077,535 (pending), filed Mar. 17, 2008. The contents of all applications herein are incorporated by reference in their entirety.

TECHNICAL FIELD

[0002] This invention relates to the field of plant biology, and to compositions and methods for modifying the phenotype of a plant.

BACKGROUND OF THE INVENTION

[0003] A plant's traits, such as its biochemical, developmental, or phenotypic characteristics, may be controlled through a number of cellular processes. One important way to manipulate that control is through transcription factors, proteins that influence the expression of a particular gene or sets of genes. Transformed and transgenic plants comprise cells having altered levels of at least one selected transcription factor, and may possess advantageous or desirable traits. Strategies for manipulating traits by altering a plant cell's transcription factor content can therefore result in plants and crops with new and/or improved commercially valuable properties.

[0004] Transcription factors can modulate gene expression, either increasing or decreasing (inducing or repressing) the rate of transcription. This modulation results in differential levels of gene expression at various developmental stages, in different tissues and cell types, and in response to different exogenous (e.g., environmental) and endogenous stimuli throughout the life cycle of the organism.

[0005] Because transcription factors are key controlling elements of biological pathways, altering the expression levels of one or more transcription factors can change entire biological pathways in an organism. For example, manipulation of the levels of selected transcription factors may result in increased expression of economically useful proteins or biomolecules in plants or improvement in other agriculturally relevant characteristics. Conversely, blocked or reduced expression of a transcription factor may reduce biosynthesis of unwanted compounds or remove an undesirable trait. Therefore, manipulating transcription factor levels in a plant offers tremendous potential in agricultural biotechnology for modifying a plant's traits.

[0006] We have identified polynucleotides encoding transcription factors, developed numerous transgenic plants using these polynucleotides, and have analyzed the plants for a variety of important traits. In so doing, we have identified important polynucleotide and polypeptide sequences for producing commercially valuable plants and crops as well as the methods for making them and using them. Other aspects and embodiments of the invention are described below and can be derived from the teachings of this disclosure as a whole.

SUMMARY OF THE INVENTION

[0007] The present invention is directed to novel recombinant polynucleotides, transgenic plants comprising the polynucleotides, and methods for producing the transgenic plants.

[0008] The recombinant polynucleotides may include any of the following sequences:

[0009] (a) the nucleotide sequences found in the sequence listing;

[0010] (b) nucleotide sequences encoding polypeptides found in the sequence listing;

[0011] (c) sequence variants that are at least 70% sequence identical to any of the nucleotide sequences of (a) or (b);

[0012] (d) orthologous and paralogous nucleotide sequences that are at least 70% identical to any of the nucleotide sequences of (a) or (b);

[0013] (e) nucleotide sequences that hybridize to any of the nucleotide sequences of (a) or (b) under stringent conditions, which may include, for example, hybridization with wash steps of 6×SSC and 65° C. for ten to thirty minutes per step; and

[0014] (f) nucleotide sequences encoding a polypeptide having a conserved domain required for the function of regulating transcription and altering a trait in a transgenic plant, the conserved domain being at least 70% identical with a conserved domain of a polypeptide of the invention (i.e., a polypeptide listed in the sequence listing, or encoded by any of the above nucleotide sequences).

[0015] The invention also pertains to transgenic plants that may be produced by transforming plants with any recombinant polynucleotide of the invention. Due to the function of these polynucleotides, the transgenic plant will become altered phenotypically when compared with a wild-type plant. The traits that may be altered by transforming a plant with one of the present polynucleotides are numerous and varied, and may include, for example:

[0016] increased tolerance to various abiotic stresses, including cold, heat, freezing, low nitrogen and phosphorus conditions, osmotic stresses such as drought, and high salt concentrations;

[0017] increased tolerance to disease, including fungal disease, and particularly Erysiphe, Fusarium, and Botrytis; the present polynucleotides may be used to confer increased tolerance to multiple pathogens in transformed plants;

[0018] altered sensitivity or resistance to treatments that include glyphosate, ABA, and ACC;

[0019] altered carbon/nitrogen (C/N) sensing;

[0020] advanced or delayed flowering time;

[0021] altered floral characteristics such as flower structure, loss of flower determinacy, or reduced fertility;

[0022] altered shoot meristem development, altered stem morphology and vascular tissue structure, and altered branching patterns;

[0023] reduced apical dominance;

[0024] altered trichome density, development, or structure;

[0025] altered root development, including root mass, branching and root hairs;

[0026] altered shade avoidance;

[0027] altered seed characteristics such as size, oil content, protein content, development, ripening, germination, or prenyl lipid content;

[0028] altered leaf characteristics, including size, mass, shape, color, glossiness, prenyl lipid content and other chemical modifications;

[0029] slower or faster growth than wild-type;

[0030] altered cell differentiation, proliferation, and expansion;

[0031] altered phase change;

[0032] altered senescence, programmed cell death and necrosis;

[0033] increased plant size and/or biomass, including larger seedlings than controls; dwarfed plants; and

[0034] altered pigment, including anthocyanin, levels, in various plant tissues.

[0035] Methods for producing transgenic plants having altered traits are also encompassed by the invention. These method steps include first providing an expression vector having a recombinant polynucleotide of the invention, and at least one regulatory element flanking the polynucleotide sequence. Generally, the regulatory element(s) control expression of the recombinant polynucleotide in a target plant. The expression vector is then introduced into plant cells. The plant cells are grown into plants, which are allowed to overexpress a polypeptide encoded by the recombinant polynucleotide. This overexpression results in the trait alteration in the plant. Those plants that have altered traits are identified and selected on the basis of the desirability and degree of the altered trait.

BRIEF DESCRIPTION OF THE SEQUENCE LISTING AND FIGURES

[0036] The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawing(s) will be provided by the Patent and Trademark Office upon request and payment of the necessary fee.

[0037] The Sequence Listing provides exemplary polynucleotide and polypeptide sequences of the invention. The traits associated with the use of the sequences are included in the Examples. The sequence listing was created on Nov. 1, 2010 and is 4,293,651 bytes (4.09 MB) as measured in windows MS-DOS. The entire content of the sequence listing is hereby incorporated by reference.

[0038] FIG. 1 shows a conservative estimate of phylogenetic relationships among the orders of flowering plants (modified from Angiosperm Phylogeny Group (1998) Ann. Missouri Bot. Gard. 84: 1-49). Those plants with a single cotyledon (monocots) are a monophyletic clade nested within at least two major lineages of dicots; the eudicots are further divided into rosids and asterids. Arabidopsis is a rosid eudicot classified within the order Brassicales; rice is a member of the monocot order Poales. FIG. 1 was adapted from Daly et al. ((2001) Plant Physiol. 127: 1328-1333).

[0039] FIG. 2 shows a phylogenic dendrogram depicting phylogenetic relationships of higher plant taxa, including clades containing tomato and Arabidopsis; adapted from Ku et al. (2000) Proc. Natl. Acad. Sci. 97: 9121-9126; and Chase et al. (1993) Ann. Missouri Bot. Gard. 80: 528-580.

[0040] FIG. 3A illustrates an example of an osmotic stress assay. The medium used in this root growth assay contained polyethylene glycol (PEG). After germination, the seedlings of a 35S::G47 overexpressing line (the eight seedlings on left labeled "OE.G47. 22") appeared larger and had more root growth than the four wild-type seedlings on the right. As would be predicted by the osmotic stress assay, G47 plants showed enhanced survival and drought tolerance in a soil-based drought assay, as did G2133, a paralog of G47 (see FIGS. 10A and 10B). FIG. 3B also demonstrates an interesting effect of G47 overexpression; the 35S::G47 plants on the left and in the center of this photograph had short, thick, fleshy inflorescences with reduced apical dominance.

[0041] FIG. 4 demonstrates an example of the effects of an altered response to light. In a germination assay conducted on MS medium in darkness, overexpression of G354 resulted in more open and greenish cotyledons and thick hypocotyls compared to wild type (G354 overexpressors are labeled "G354-29" and wild-type "WT" in this figure). G354 overexpressors also had a drought-tolerance phenotype, as indicated in Example VIII, below. Closely related paralogs of this gene, G353 and G2839, showed an osmotic stress tolerance phenotype in a germination assay on media containing high sucrose. One line of 35S::G353 seedlings and several lines of 35S::G2839 were greener and had higher germination rates than controls. This suggests that G354 and its paralogs G353 and G2839 influence osmotic stress responses.

[0042] FIG. 5A is a photograph of Arabidopsis 35S::G1274 seedlings grown on low nitrogen media supplemented with sucrose plus glutamine. Seedlings of two overexpressing lines are present on this plate (not distinguished), and both lines contained less anthocyanin than the wild-type seedlings seen in FIG. 5B. The lack of anthocyanin production indicated that these lines were less stressed than control seedlings under the same conditions, a fact later confirmed in soil-based drought assays showing enhanced drought tolerance by G1274 overexpressing lines. G1274 overexpression (FIG. 5C) and wild-type (FIG. 5D) germination was also compared in a cold germination assay, in which the overexpressors were found to be larger and greener than the controls.

[0043] FIGS. 6A-6D compare soil-based drought assays for G1274 overexpressors and wild-type control plants, which confirms the results predicted after the performance of the plate-based osmotic stress assays. 35S::G1274 lines fared much better after a period of water deprivation (FIG. 6A) than control plants (FIG. 6B). This distinction was particularly evident in the overexpressor plants after being ministered with water, said plants recovering to a healthy and vigorous state, as shown in FIG. 6C. Conversely, none of the wild-type plants seen in FIG. 6D recovered after rewatering.

[0044] FIGS. 7A and 7B compare G1792 overexpressing Arabidopsis seedling growth on a single plate (two sectors of the same plate) with medium containing 3% sucrose medium lacking nitrogen, five days after planting. The 35S::G1792 lines seen in FIG. 7A generally showed greater cotyledon expansion and root growth than the wild-type seedlings in FIG. 7B. FIG. 7C is a photograph of a single plate showing a G1792 overexpressing line (labeled G1792-12; on left) and wild-type plants (on right) five days after inoculation with Botrytis cinerea, showing the chlorosis and hyphal growth in the latter control plants but not in the former overexpressors. Similar results were obtained five days after inoculation with Erysiphe orontii (not shown) and with Fusarium oxysporum, as seen in FIG. 7D, with control plants on the right showing chlorosis, and G1792 overexpressors on the left appearing to be free of the adverse effects of infection.

[0045] FIG. 8A illustrates the results of root growth assays with G2999 overexpressing seedlings and controls in a high sodium chloride medium. The eight 35S::G2999 Arabidopsis seedlings on the left were larger, greener, and had more root growth than the four control seedlings on the right. Another member of the G2999 clade, G2998, also showed a salt tolerance phenotype and performed similarly in the plate-based salt stress assay seen in FIG. 8B. In the latter assay 35S::G2998 seedlings appeared large and green, whereas wild-type seedlings in the control assay plate shown in FIG. 8C were small and had not yet expanded their cotyledons. As is noted below, high sodium chloride growth assays often are used to indicate osmotic stress tolerance such as drought tolerance, which was subsequently confirmed with soil-based assays conducted with G2999-overexpressing plants.

[0046] FIG. 9A shows the effects of a heat assay on Arabidopsis wild-type and G3086-overexpressing plants. Generally, the overexpressors on the left were larger, paler, and bolted earlier than the wild-type plants seen on the right in this plate. The same G3086 overexpressing lines, as exemplified by the eight seedlings on the left of FIG. 9B, were also found to be larger, greener, and had more root growth in a high salt root growth assay than control plants, including the four on the right in FIG. 9B.

[0047] FIGS. 10A and 10B compare the recovery from a drought treatment in two lines of G2133 overexpressing Arabidopsis plants and wild-type controls. FIG. 10A shows plants of 35S::G2133 line 5 (left) and control plants (right). FIG. 10B shows plants of 35S::G2133 line 3 (left) and control plants (right). Each pot contained several plants grown under 24 hours light. All were deprived of water for eight days, and are shown after re-watering. All of the plants of the G2133 overexpressor lines recovered, and all of the control plants were either dead or severely and adversely affected by the drought treatment.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

[0048] In an important aspect, the present invention relates to polynucleotides and polypeptides, for example, for modifying phenotypes of plants. Throughout this disclosure, various information sources are referred to and/or are specifically incorporated. The information sources include scientific journal articles, patent documents, textbooks, and World Wide Web browser-inactive page addresses, for example. While the reference to these information sources clearly indicates that they can be used by one of skill in the art, each and every one of the information sources cited herein are specifically incorporated in their entirety, whether or not a specific mention of "incorporation by reference" is noted. The contents and teachings of each and every one of the information sources can be relied on and used to make and use embodiments of the invention.

[0049] It must be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a plant" includes a plurality of such plants, and a reference to "a stress" is a reference to one or more stresses and equivalents thereof known to those skilled in the art, and so forth.

[0050] The polynucleotide sequences of the invention encode polypeptides that are members of well-known transcription factor families, including plant transcription factor families, as disclosed in Tables 4-9. Generally, the transcription factors encoded by the present sequences are involved in cell differentiation and proliferation and the regulation of growth. Accordingly, one skilled in the art would recognize that by expressing the present sequences in a plant, one may change the expression of autologous genes or induce the expression of introduced genes. By affecting the expression of similar autologous sequences in a plant that have the biological activity of the present sequences, or by introducing the present sequences into a plant, one may alter a plant's phenotype to one with improved traits. The sequences of the invention may also be used to transform a plant and introduce desirable traits not found in the wild-type cultivar or strain. Plants may then be selected for those that produce the most desirable degree of over- or under-expression of target genes of interest and coincident trait improvement.

[0051] The sequences of the present invention may be from any species, particularly plant species, in a naturally occurring form or from any source whether natural, synthetic, semi-synthetic or recombinant. The sequences of the invention may also include fragments of the present amino acid sequences. In this context, a "fragment" refers to a fragment of a polypeptide sequence which is at least 5 to about 15 amino acids in length, most preferably at least 14 amino acids, and which retain some biological activity of a transcription factor. Where "amino acid sequence" is recited to refer to an amino acid sequence of a naturally occurring protein molecule, "amino acid sequence" and like terms are not meant to limit the amino acid sequence to the complete native amino acid sequence associated with the recited protein molecule.

[0052] As one of ordinary skill in the art recognizes, transcription factors can be identified by the presence of a region or domain of structural similarity or identity to a specific consensus sequence or the presence of a specific consensus DNA-binding site or DNA-binding site motif (see, for example, Riechmann et al. (2000) Science 290: 2105-2110). The plant transcription factors may belong to one of the following transcription factor families: the AP2 (APETALA2) domain transcription factor family (Riechmann and Meyerowitz (1998) Biol. Chem. 379: 633-646); the MYB transcription factor family (ENBib; Martin and Paz-Ares (1997) Trends Genet. 13: 67-73); the MADS domain transcription factor family (Riechmann and Meyerowitz (1997) Biol. Chem. 378: 1079-1101); the WRKY protein family (Ishiguro and Nakamura (1994) Mol. Gen. Genet. 244: 563-571); the ankyrin-repeat protein family (Zhang et al. (1992) Plant Cell 4: 1575-1588); the zinc finger protein (Z) family (Klug and Schwabe (1995) FASEB J. 9: 597-604; Takatsuji (1998) Cell. Mol. Life Sci. 54: 582-596); the homeobox (HB) protein family (Buerglin (1994) in Guidebook to the Homeobox Genes, Duboule (ed.) Oxford University Press); the CAAT-element binding proteins (Forsburg and Guarente (1989) Genes Dev. 3: 1166-1178); the squamosa promoter binding proteins (SPB) (Klein et al. (1996) Mol. Gen. Genet. 1996 250: 7-16); the NAM protein family (Souer et al. (1996) Cell 85: 159-170); the IAA/AUX proteins (Abel et al. (1995) J. Mol. Biol. 251: 533-549); the HLH/MYC protein family (Littlewood et al. (1994) Prot. Profile 1: 639-709); the DNA-binding protein (DBP) family (Tucker et al. (1994) EMBO J. 13: 2994-3002); the bZIP family of transcription factors (Foster et al. (1994) FASEB J. 8: 192-200); the Box P-binding protein (the BPF-1) family (da Costa e Silva et al. (1993) Plant J. 4: 125-135); the high mobility group (HMG) family (Bustin and Reeves (1996) Prog. Nucl. Acids Res. Mol. Biol. 54: 35-100); the scarecrow (SCR) family (DiLaurenzio et al. (1996) Cell 86: 423-433); the GF14 family (Wu et al. (1997) Plant Physiol. 114: 1421-1431); the polycomb (PCOMB) family (Goodrich et al. (1997) Nature

386: 44-51); the teosinte branched (TEO) family (Luo et al. (1996) Nature 383: 794-799); the ABI3 family (Giraudat et al. (1992) Plant Cell 4: 1251-1261); the triple helix (TH) family (Dehesh et al. (1990) Science 250: 1397-1399); the EIL family (Chao et al. (1997) Cell 89: 1133-44); the AT-HOOK family (Reeves and Nissen (1990) J. Biol. Chem. 265: 8573-8582); the S1FA family (Zhou et al. (1995) Nucleic Acids Res. 23: 1165-1169); the bZIPT2 family (Lu and Ferl (1995) Plant Physiol. 109: 723); the YABBY family (Bowman et al. (1999) Development 126: 2387-96); the PAZ family (Bohmert et al. (1998) EMBO J. 17: 170-80); a family of miscellaneous (MISC) transcription factors including the DPBF family (Kim et al. (1997) Plant J. 11: 1237-1251) and the SPF1 family (Ishiguro and Nakamura (1994) Mol. Gen. Genet. 244: 563-571); the GARP family (Hall et al. (1998) Plant Cell 10: 925-936), the TUBBY family (Boggin et al. (1999) Science 286: 2119-2125), the heat shock family (Wu (1995) Annu. Rev. Cell Dev. Biol. 11: 441-469), the ENBP family (Christiansen et al. (1996) Plant Mol. Biol. 32: 809-821), the RING zinc family (Jensen et al. (1998) FEBS Letters 436: 283-287), the PDBP family (Janik et al. (1989) Virology 168: 320-329), the PCF family (Cubas et al. Plant J. (1999) 18: 215-22), the SRS (SHI-related) family (Fridborg et al. (1999) Plant Cell 11: 1019-1032), the CPP (cysteine-rich polycomb-like) family (Cvitanich et al. (2000) Proc. Natl. Acad. Sci. 97: 8163-8168), the ARF (auxin response factor) family (Ulmasov et al. (1999) Proc. Natl. Acad. Sci. 96: 5844-5849), the SWI/SNF family (Collingwood et al. (1999) J. Mol. Endocrinol. 23: 255-275), the ACBF family (Seguin et al. (1997) Plant Mol. Biol. 35: 281-291), the PCGL (CG-1 like) family (da Costa e Silva et al. (1994) Plant Mol. Biol. 25: 921-924), the ARID family (Vazquez et al. (1999) Development 126: 733-742), the Jumonji family (Balciunas et al. (2000) Trends Biochem. Sci. 25: 274-276), the bZIP-NIN family (Schauser et al. (1999) Nature 402: 191-195), the E2F family (Kaelin et al. (1992) Cell 70: 351-364) and the GRF-like family (Knaap et al. (2000) Plant Physiol. 122: 695-704). As indicated by any part of the list above and as known in the art, transcription factors have been sometimes categorized by class, family, and sub-family according to their structural content and consensus DNA-binding site motif, for example. Many of the classes and many of the families and sub-families are listed here. However, the inclusion of one sub-family and not another, or the inclusion of one family and not another, does not mean that the invention does not encompass polynucleotides or polypeptides of a certain family or sub-family. The list provided here is merely an example of the types of transcription factors and the knowledge available concerning the consensus sequences and consensus DNA-binding site motifs that help define them as known to those of skill in the art (each of the references noted above are specifically incorporated herein by reference). A transcription factor may include, but is not limited to, any polypeptide that can activate or repress transcription of a single gene or a number of genes. This polypeptide group includes, but is not limited to, DNA-binding proteins, DNA-binding protein binding proteins, protein kinases, protein phosphatases, protein methyltransferases, GTP-binding proteins, and receptors, and the like.

[0053] In addition to methods for modifying a plant phenotype by employing one or more polynucleotides and polypeptides of the invention described herein, the polynucleotides and polypeptides of the invention have a variety of additional uses. These uses include their use in the recombinant production (i.e., expression) of proteins; as regulators of plant gene expression, as diagnostic probes for the presence of complementary or partially complementary nucleic acids (including for detection of natural coding nucleic acids); as substrates for further reactions, e.g., mutation reactions, PCR reactions, or the like; as substrates for cloning e.g., including digestion or ligation reactions; and for identifying exogenous or endogenous modulators of the transcription factors.

DEFINITIONS

[0054] "Nucleic acid molecule" refers to an oligonucleotide, polynucleotide or any fragment thereof. It may be DNA or RNA of genomic or synthetic origin, double-stranded or single-stranded, and combined with carbohydrate, lipids, protein, or other materials to perform a particular activity such as transformation or form a useful composition such as a peptide nucleic acid (PNA).

[0055] "Polynucleotide" is a nucleic acid molecule comprising a plurality of polymerized nucleotides, e.g., at least about 15 consecutive polymerized nucleotides, optionally at least about 30 consecutive nucleotides, at least about 50 consecutive nucleotides. A polynucleotide may be a nucleic acid, oligonucleotide, nucleotide, or any fragment thereof. In many instances, a polynucleotide comprises a nucleotide sequence encoding a polypeptide (or protein) or a domain or fragment thereof. Additionally, the polynucleotide may comprise a promoter, an intron, an enhancer region, a polyadenylation site, a translation initiation site, 5' or 3' untranslated regions, a reporter gene, a selectable marker, or the like. The polynucleotide can be single stranded or double stranded DNA or RNA. The polynucleotide optionally comprises modified bases or a modified backbone. The polynucleotide can be, e.g., genomic DNA or RNA, a transcript (such as an mRNA), a cDNA, a PCR product, a cloned DNA, a synthetic DNA or RNA, or the like. The polynucleotide can be combined with carbohydrate, lipids, protein, or other materials to perform a particular activity such as transformation or form a useful composition such as a peptide nucleic acid (PNA). The polynucleotide can comprise a sequence in either sense or antisense orientations. "Oligonucleotide" is substantially equivalent to the terms amplimer, primer, oligomer, element, target, and probe and is preferably single stranded.

[0056] "Gene" or "gene sequence" refers to the partial or complete coding sequence of a gene, its complement, and its 5' or 3' untranslated regions. A gene is also a functional unit of inheritance, and in physical terms is a particular segment or sequence of nucleotides along a molecule of DNA (or RNA, in the case of RNA viruses) involved in producing a polypeptide chain. The latter may be subjected to subsequent processing such as splicing and folding to obtain a functional protein or polypeptide. A gene may be isolated, partially isolated, or be found with an organism's genome. By way of example, a transcription factor gene encodes a transcription factor polypeptide, which may be functional or require processing to function as an initiator of transcription.

[0057] Operationally, genes may be defined by the cis-trans test, a genetic test that determines whether two mutations occur in the same gene and which may be used to determine the limits of the genetically active unit (Rieger et al. (1976) Glossary of Genetics and Cytogenetics: Classical and Molecular, 4th ed., Springer Verlag, Berlin). A gene generally includes regions preceding ("leaders"; upstream) and following ("trailers"; downstream) of the coding region. A gene may also include intervening, non-coding sequences, referred to

as “introns”, located between individual coding segments, referred to as “exons”. Most genes have an associated promoter region, a regulatory sequence 5' of the transcription initiation codon (there are some genes that do not have an identifiable promoter). The function of a gene may also be regulated by enhancers, operators, and other regulatory elements.

[0058] A “recombinant polynucleotide” is a polynucleotide that is not in its native state, e.g., the polynucleotide comprises a nucleotide sequence not found in nature, or the polynucleotide is in a context other than that in which it is naturally found, e.g., separated from nucleotide sequences with which it typically is in proximity in nature, or adjacent (or contiguous with) nucleotide sequences with which it typically is not in proximity. For example, the sequence at issue can be cloned into a vector, or otherwise recombined with one or more additional nucleic acid.

[0059] An “isolated polynucleotide” is a polynucleotide, whether naturally occurring or recombinant, that is present outside the cell in which it is typically found in nature, whether purified or not. Optionally, an isolated polynucleotide is subject to one or more enrichment or purification procedures, e.g., cell lysis, extraction, centrifugation, precipitation, or the like.

[0060] A “polypeptide” is an amino acid sequence comprising a plurality of consecutive polymerized amino acid residues, e.g., at least about 15 consecutive polymerized amino acid residues, optionally at least about 30 consecutive polymerized amino acid residues, at least about 50 consecutive polymerized amino acid residues. In many instances, a polypeptide comprises a polymerized amino acid residue sequence that is a transcription factor or a domain or portion or fragment thereof. Additionally, the polypeptide may comprise 1) a localization domain, 2) an activation domain, 3) a repression domain, 4) an oligomerization domain, or 5) a DNA-binding domain, or the like. The polypeptide optionally comprises modified amino acid residues, naturally occurring amino acid residues not encoded by a codon, non-naturally occurring amino acid residues.

[0061] “Protein” refers to an amino acid sequence, oligopeptide, peptide, polypeptide or portions thereof whether naturally occurring or synthetic.

[0062] “Portion”, as used herein, refers to any part of a protein used for any purpose, but especially for the screening of a library of molecules which specifically bind to that portion or for the production of antibodies.

[0063] A “recombinant polypeptide” is a polypeptide produced by translation of a recombinant polynucleotide. A “synthetic polypeptide” is a polypeptide created by consecutive polymerization of isolated amino acid residues using methods well known in the art. An “isolated polypeptide,” whether a naturally occurring or a recombinant polypeptide, is more enriched in (or out of) a cell than the polypeptide in its natural state in a wild-type cell, e.g., more than about 5% enriched, more than about 10% enriched, or more than about 20%, or more than about 50%, or more, enriched, i.e., alternatively denoted: 105%, 110%, 120%, 150% or more, enriched relative to wild type standardized at 100%. Such an enrichment is not the result of a natural response of a wild-type plant. Alternatively, or additionally, the isolated polypeptide is separated from other cellular components with which it is typically associated, e.g., by any of the various protein purification methods herein.

[0064] “Homology” refers to sequence similarity between a reference sequence and at least a fragment of a newly sequenced clone insert or its encoded amino acid sequence.

[0065] “Hybridization complex” refers to a complex between two nucleic acid molecules by virtue of the formation of hydrogen bonds between purines and pyrimidines.

[0066] “Identity” or “similarity” refers to sequence similarity between two polynucleotide sequences or between two polypeptide sequences, with identity being a more strict comparison. The phrases “percent identity” and “% identity” refer to the percentage of sequence similarity found in a comparison of two or more polynucleotide sequences or two or more polypeptide sequences. “Sequence similarity” refers to the percent similarity in base pair sequence (as determined by any suitable method) between two or more polynucleotide sequences. Two or more sequences can be anywhere from 0-100% similar, or any integer value therebetween. Identity or similarity can be determined by comparing a position in each sequence that may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same nucleotide base or amino acid, then the molecules are identical at that position. A degree of similarity or identity between polynucleotide sequences is a function of the number of identical or matching nucleotides at positions shared by the polynucleotide sequences. A degree of identity of polypeptide sequences is a function of the number of identical amino acids at positions shared by the polypeptide sequences. A degree of homology or similarity of polypeptide sequences is a function of the number of amino acids at positions shared by the polypeptide sequences.

[0067] The term “amino acid consensus motif” refers to the portion or subsequence of a polypeptide sequence that is substantially conserved among the polypeptide transcription factors listed in the Sequence Listing.

[0068] “Alignment” refers to a number of DNA or amino acid sequences aligned by lengthwise comparison so that components in common (i.e., nucleotide bases or amino acid residues) may be readily and graphically identified. The number of components in common is related to the homology or identity between the sequences. Alignments may be used to identify “conserved domains” and relatedness within these domains. An alignment may suitably be determined by means of computer programs known in the art, such as MacVector (1999) (Accelrys, Inc., San Diego, Calif.).

[0069] A “conserved domain” or “conserved region” as used herein refers to a region in heterologous polynucleotide or polypeptide sequences where there is a relatively high degree of sequence identity between the distinct sequences.

[0070] With respect to polynucleotides encoding presently disclosed transcription factors, a conserved region is preferably at least 10 base pairs (bp) in length.

[0071] A “conserved domain”, with respect to presently disclosed polypeptides, refers to a domain within a transcription factor family that exhibits a higher degree of sequence homology, such as at least 26% sequence similarity, at least 16% sequence identity, preferably at least 40% sequence identity, preferably at least 65% sequence identity including conservative substitutions, and more preferably at least 80% sequence identity, and even more preferably at least 85%, or at least about 86%, or at least about 87%, or at least about 88%, or at least about 90%, or at least about 95%, or at least about 98% amino acid residue sequence identity of a polypeptide of consecutive amino acid residues. A fragment or domain can be referred to as outside a conserved domain,

outside a consensus sequence, or outside a consensus DNA-binding site that is known to exist or that exists for a particular transcription factor class, family, or sub-family. In this case, the fragment or domain will not include the exact amino acids of a consensus sequence or consensus DNA-binding site of a transcription factor class, family or sub-family, or the exact amino acids of a particular transcription factor consensus sequence or consensus DNA-binding site. Furthermore, a particular fragment, region, or domain of a polypeptide, or a polynucleotide encoding a polypeptide, can be “outside a conserved domain” if all the amino acids of the fragment, region, or domain fall outside of a defined conserved domain(s) for a polypeptide or protein. Sequences having lesser degrees of identity but comparable biological activity are considered to be equivalents.

[0072] As one of ordinary skill in the art recognizes, conserved domains may be identified as regions or domains of identity to a specific consensus sequence (see, for example, Riechmann et al. (2000) supra). Thus, by using alignment methods well known in the art, the conserved domains of the plant transcription factors for each of the following may be determined: the AP2 (APETALA2) domain transcription factor family (Riechmann and Meyerowitz (1998) supra); the MYB transcription factor family (ENBib; Martin and Paz-Ares (1997) supra); the MADS domain transcription factor family (Riechmann and Meyerowitz (1997) supra; Immink et al. (2003) supra); the WRKY protein family (Ishiguro and Nakamura (1994) supra); the ankyrin-repeat protein family (Zhang et al. (1992) supra); the zinc finger protein (Z) family (Klug and Schwabe (1995) supra; Takatsuji (1998) supra); the homeobox (HB) protein family (Buerglin (1994) supra); the CAAT-element binding proteins (Forsburg and Guarente (1989) supra); the squamosa promoter binding proteins (SPB) (Klein et al. (1996) supra); the NAM protein family (Souer et al. (1996) supra); the IAA/AUX proteins (Abel et al. (1995) supra); the HLH/MYC protein family (Littlewood et al. (1994) supra); the DNA-binding protein (DBP) family (Tucker et al. (1994) supra); the bZIP family of transcription factors (Foster et al. (1994) supra); the Box P-binding protein (the BPF-1) family (da Costa e Silva et al. (1993) supra); the high mobility group (HMG) family (Bustin and Reeves (1996) supra); the scarecrow (SCR) family (Di Laurenzio et al. (1996) supra); the GF14 family (Wu et al. (1997) supra); the polycomb (PCOMB) family (Goodrich et al. (1997) supra); the teosinte branched (TEO) family (Luo et al. (1996) supra); the ABI3 family (Giraudat et al. (1992) supra); the triple helix (TH) family (Dehesh et al. (1990) supra); the EIL family (Chao et al. (1997) Cell supra); the AT-HOOK family (Reeves and Nissen (1990) supra); the S1FA family (Zhou et al. (1995) supra); the bZIPT2 family (Lu and Ferl (1995) supra); the YABBY family (Bowman et al. (1999) supra); the PAZ family (Bohmert et al. (1998) supra); a family of miscellaneous (MISC) transcription factors including the DPBF family (Kim et al. (1997) supra) and the SPF1 family (Ishiguro and Nakamura (1994) supra); the GARP family (Hall et al. (1998) supra), the TUBBY family (Boggin et al. (1999) supra), the heat shock family (Wu (1995) supra), the ENBP family (Christiansen et al. (1996) supra), the RING-zinc family (Jensen et al. (1998) supra), the PDBP family (Janik et al. (1989) supra), the PCF family (Cubas et al. (1999) supra), the SRS (SHI-related) family (Fridborg et al. (1999) supra), the CPP (cysteine-rich polycomb-like) family (Cvitanich et al. (2000) supra), the ARF (auxin response factor) family (Ulmasov et al. (1999) supra), the SWI/SNF family (Collingwood et al. (1999) supra), the ACBF family (Seguin et al. (1997) supra), the PCGL (CG-1 like) family (da Costa e Silva et al. (1994) supra), the ARID family (Vazquez et al. (1999) supra), the Jumonji family (Balciunas et al. (2000) supra), the bZIP-NIN family (Schauser et al. (1999) supra), the E2F family (Kaelin et al. (1992) supra) and the GRF-like family (Knaap et al. (2000) supra).

[0073] The conserved domains for each of polypeptides of SEQ ID NO: 2N, wherein N=1-335 (that is, odd SEQ ID NO: 1, 3, 5, 7 . . . 759) are listed in Table 5. Also, many of the polypeptides of Table 5 have conserved domains specifically indicated by start and stop sites. A comparison of the regions of the polypeptides in SEQ ID NO: 2N, wherein N=1-335 (that is, even SEQ ID NOs: 2, 4, 6, 8 . . . 760), or of those in Table 5, allows one of skill in the art to identify conserved domain(s) for any of the polypeptides listed or referred to in this disclosure, including those in Tables 4-9.

[0074] “Complementary” refers to the natural hydrogen bonding by base pairing between purines and pyrimidines. For example, the sequence A-C-G-T (5'->3') forms hydrogen bonds with its complements A-C-G-T (5'->3') or A-C-G-U (5'->3'). Two single-stranded molecules may be considered partially complementary, if only some of the nucleotides bond, or “completely complementary” if all of the nucleotides bond. The degree of complementarity between nucleic acid strands affects the efficiency and strength of the hybridization and amplification reactions. “Fully complementary” refers to the case where bonding occurs between every base pair and its complement in a pair of sequences, and the two sequences have the same number of nucleotides.

[0075] The terms “highly stringent” or “highly stringent condition” refer to conditions that permit hybridization of DNA strands whose sequences are highly complementary, wherein these same conditions exclude hybridization of significantly mismatched DNAs. Polynucleotide sequences capable of hybridizing under stringent conditions with the polynucleotides of the present invention may be, for example, variants of the disclosed polynucleotide sequences, including allelic or splice variants, or sequences that encode orthologs or paralogs of presently disclosed polypeptides. Nucleic acid hybridization methods are disclosed in detail by Kashima et al. (1985) Nature 313:402-404, and Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (“Sambrook”); and by Haymes et al., “Nucleic Acid Hybridization: A Practical Approach”, IRL Press, Washington, D.C. (1985), which references are incorporated herein by reference.

[0076] In general, stringency is determined by the temperature, ionic strength, and concentration of denaturing agents (e.g., formamide) used in a hybridization and washing procedure (for a more detailed description of establishing and determining stringency, see below). The degree to which two nucleic acids hybridize under various conditions of stringency is correlated with the extent of their similarity. Thus, similar nucleic acid sequences from a variety of sources, such as within a plant's genome (as in the case of paralogs) or from another plant (as in the case of orthologs) that may perform similar functions can be isolated on the basis of their ability to hybridize with known transcription factor sequences. Numerous variations are possible in the conditions and means by which nucleic acid hybridization can be performed to isolate transcription factor sequences having similarity to transcription factor sequences known in the art and are not limited to those explicitly disclosed herein. Such an approach may be used to isolate polynucleotide sequences having various degrees of similarity with disclosed transcription factor sequences, such as, for example, transcription factors having 60% identity, or more preferably greater than about 70% identity, most preferably 72% or greater identity with disclosed transcription factors.

[0077] The term “equivalog” describes members of a set of homologous proteins that are conserved with respect to function since their last common ancestor (Haft et al., 2003). Related proteins are grouped into equivalog families, and otherwise into protein families with other hierarchically defined homology types.

[0078] The term “variant”, as used herein, may refer to polynucleotides or polypeptides, that differ from the presently disclosed polynucleotides or polypeptides, respectively, in sequence from each other, and as set forth below.

[0079] With regard to polynucleotide variants, differences between presently disclosed polynucleotides and polynucleotide variants are limited so that the nucleotide sequences of the former and the latter are closely similar overall and, in many regions, identical. Due to the degeneracy of the genetic code, differences between the former and latter nucleotide sequences may be silent (i.e., the amino acids encoded by the polynucleotide are the same, and the variant polynucleotide sequence encodes the same amino acid sequence as the presently disclosed polynucleotide).

[0080] Variant nucleotide sequences may encode different amino acid sequences, in which case such nucleotide differences will result in amino acid substitutions, additions, deletions, insertions, truncations or fusions with respect to the similar disclosed polynucleotide sequences. These variations result in polynucleotide variants encoding polypeptides that share at least one functional characteristic. The degeneracy of the genetic code also dictates that many different variant polynucleotides can encode identical and/or substantially similar polypeptides in addition to those sequences illustrated in the Sequence Listing.

[0081] Also within the scope of the invention is a variant of a transcription factor nucleic acid listed in the Sequence Listing, that is, one having a sequence that differs from one of the polynucleotide sequences in the Sequence Listing, or a complementary sequence, that encodes a functionally equivalent polypeptide (i.e., a polypeptide having some degree of equivalent or similar biological activity) but differs in sequence from the sequence in the Sequence Listing, due to degeneracy in the genetic code. Included within this definition are polymorphisms that may or may not be readily detectable using a particular oligonucleotide probe of the polynucleotide encoding polypeptide, and improper or unexpected hybridization to allelic variants, with a locus other than the normal chromosomal locus for the polynucleotide sequence encoding polypeptide.

[0082] “Allelic variant” or “polynucleotide allelic variant” refers to any of two or more alternative forms of a gene occupying the same chromosomal locus. Allelic variation arises naturally through mutation, and may result in phenotypic polymorphism within populations. Gene mutations may be “silent” or may encode polypeptides having altered amino acid sequence. “Allelic variant” and “polypeptide allelic variant” may also be used with respect to polypeptides, and in this case the terms refer to a polypeptide encoded by an allelic variant of a gene.

[0083] “Splice variant” or “polynucleotide splice variant” as used herein refers to alternative forms of RNA transcribed from a gene. Splice variation naturally occurs as a result of alternative sites being spliced within a single transcribed RNA molecule or between separately transcribed RNA molecules, and may result in several different forms of mRNA transcribed from the same gene. Thus, splice variants may encode polypeptides having different amino acid sequences, which may or may not have similar functions in the organism. “Splice variant” or “polypeptide splice variant” may also refer to a polypeptide encoded by a splice variant of a transcribed mRNA.

[0084] As used herein, “polynucleotide variants” may also refer to polynucleotide sequences that encode paralogs and orthologs of the presently disclosed polypeptide sequences. “Polypeptide variants” may refer to polypeptide sequences that are paralogs and orthologs of the presently disclosed polypeptide sequences.

[0085] Differences between presently disclosed polypeptides and polypeptide variants are limited so that the sequences of the former and the latter are closely similar overall and, in many regions, identical. Presently disclosed polypeptide sequences and similar polypeptide variants may differ in amino acid sequence by one or more substitutions, additions, deletions, fusions and truncations, which may be present in any combination. These differences may produce silent changes and result in a functionally equivalent transcription factor. Thus, it will be readily appreciated by those of skill in the art, that any of a variety of polynucleotide sequences is capable of encoding the transcription factors and transcription factor homolog polypeptides of the invention. A polypeptide sequence variant may have “conservative” changes, wherein a substituted amino acid has similar structural or chemical properties. Deliberate amino acid substitutions may thus be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues, as long as the functional or biological activity of the transcription factor is retained. For example, negatively charged amino acids may include aspartic acid and glutamic acid, positively charged amino acids may include lysine and arginine, and amino acids with uncharged polar head groups having similar hydrophilicity values may include leucine, isoleucine, and valine; glycine and alanine; asparagine and glutamine; serine and threonine; and phenylalanine and tyrosine (for more detail on conservative substitutions, see Table 2). More rarely, a variant may have “non-conservative” changes, e.g., replacement of a glycine with a tryptophan. Similar minor variations may also include amino acid deletions or insertions, or both. Related polypeptides may comprise, for example, additions and/or deletions of one or more N-linked or O-linked glycosylation sites, or an addition and/or a deletion of one or more cysteine residues. Guidance in determining which and how many amino acid residues may be substituted, inserted or deleted without abolishing functional or biological activity may be found using computer programs well known in the art, for example, DNASTAR software (see U.S. Pat. No. 5,840,544).

[0086] “Ligand” refers to any molecule, agent, or compound that will bind specifically to a complementary site on a nucleic acid molecule or protein. Such ligands stabilize or modulate the activity of nucleic acid molecules or proteins of the invention and may be composed of at least one of the following: inorganic and organic substances including nucleic acids, proteins, carbohydrates, fats, and lipids.
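
As a rough illustration of the “conservative” changes described in paragraph [0085], the sketch below groups residues exactly as that paragraph lists them and tests whether a substitution stays within one group. The group table, the function name `is_conservative`, and the use of one-letter residue codes are assumptions made for illustration; Table 2 of the specification, not this sketch, is the fuller reference for conservative substitutions.

```python
# Residue groups as listed in paragraph [0085], written with one-letter codes.
# Illustrative sketch only: these names are hypothetical, not from the specification.
CONSERVATIVE_GROUPS = [
    {"D", "E"},       # aspartic acid, glutamic acid (negatively charged)
    {"K", "R"},       # lysine, arginine (positively charged)
    {"L", "I", "V"},  # leucine, isoleucine, valine
    {"G", "A"},       # glycine, alanine
    {"N", "Q"},       # asparagine, glutamine
    {"S", "T"},       # serine, threonine
    {"F", "Y"},       # phenylalanine, tyrosine
]


def is_conservative(original: str, substitute: str) -> bool:
    """Return True if the two residues differ but share one of the listed groups."""
    a, b = original.upper(), substitute.upper()
    if a == b:
        return False  # identical residues: no substitution took place
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("D", "E"))  # aspartic acid replaced by glutamic acid
    print(is_conservative("G", "W"))  # glycine replaced by tryptophan
```

The glycine-to-tryptophan case is the non-conservative example given in the same paragraph, so it is a convenient negative check.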

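
The silent differences described in paragraphs [0079] and [0080] can be checked mechanically: two coding sequences are silent variants of one another if they differ in nucleotide sequence yet translate to the same amino acid sequence. The following is a minimal sketch, assuming the standard genetic code and in-frame DNA sequences whose length is a multiple of three. `translate` and `is_silent_variant` are hypothetical helper names, and the short sequences are made up for illustration rather than taken from the Sequence Listing.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_silent_variant(reference: str, variant: str) -> bool:
    """True if the sequences differ but encode the same amino acid sequence."""
    return (reference.upper() != variant.upper()
            and translate(reference) == translate(variant))


if __name__ == "__main__":
    reference = "ATGGCTCTGAAA"  # hypothetical coding sequence
    variant = "ATGGCCTTAAAG"    # synonymous codons at three positions
    print(translate(reference), translate(variant))
    print(is_silent_variant(reference, variant))
```

A variant that changes a codon to a non-synonymous one (for example GCT to GAT) would fail this check and fall under the amino acid substitutions of paragraph [0080] instead.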
[0087] “Modulates” refers to a change in activity (biological, chemical, or immunological) or lifespan resulting from specific binding between a molecule and either a nucleic acid molecule or a protein.

[0088] The term “plant” includes whole plants, shoot vegetative organs/structures (e.g., leaves, stems and tubers), roots, flowers and floral organs/structures (e.g., bracts, sepals, petals, stamens, carpels, anthers and ovules), seed (including embryo, endosperm, and seed coat) and fruit (the mature ovary), plant tissue (e.g., vascular tissue, ground tissue, and the like) and cells (e.g., guard cells, egg cells, and the like), and progeny of same. The class of plants that can be used in the method of the invention is generally as broad as the class of higher and lower plants amenable to transformation techniques, including angiosperms (monocotyledonous and dicotyledonous plants), gymnosperms, ferns, horsetails, psilophytes, lycophytes, bryophytes, and multicellular algae. (See for example, FIG. 1, adapted from Daly et al. (2001) Plant Physiol. 127: 1328-1333; FIG. 2, adapted from Ku et al. (2000) Proc. Natl. Acad. Sci. 97: 9121-9126; and see also Tudge in The Variety of Life, Oxford University Press, New York, N.Y. (2000) pp. 547-606).

[0089] A “transgenic plant” refers to a plant that contains genetic material not found in a wild-type plant of the same species, variety or cultivar. The genetic material may include a transgene, an insertional mutagenesis event (such as by transposon or T-DNA insertional mutagenesis), an activation tagging sequence, a mutated sequence, a homologous recombination event or a sequence modified by chimeraplasty. Typically, the foreign genetic material has been introduced into the plant by human manipulation, but any method can be used as one of skill in the art recognizes.

[0090] A transgenic plant may contain an expression vector or cassette. The expression cassette typically comprises a polypeptide-encoding sequence operably linked (i.e., under regulatory control of) to appropriate inducible or constitutive regulatory sequences that allow for the expression of polypeptide. The expression cassette can be introduced into a plant by transformation or by breeding after transformation of a parent plant. A plant refers to a whole plant as well as to a plant part, such as seed, fruit, leaf, or root, plant tissue, plant cells or any other plant material, e.g., a plant explant, as well as to progeny thereof, and to in vitro systems that mimic biochemical or cellular components or processes in a cell.

[0091] “Control plant” refers to a plant that serves as a standard of comparison for testing the results of a treatment or genetic alteration, or the degree of altered expression of a gene or gene product. Examples of control plants include plants that are untreated, or genetically unaltered (i.e., wild-type).

[0092] “Wild type”, as used herein, refers to a cell, tissue or plant that has not been genetically modified to knock out or overexpress one or more of the presently disclosed transcription factors. Wild-type cells, tissue or plants may be used as controls to compare levels of expression and the extent and nature of trait modification with cells, tissue or plants in which transcription factor expression is altered or ectopically expressed, e.g., in that it has been knocked out or overexpressed.

[0093] “Fragment”, with respect to a polynucleotide, refers to a clone or any part of a polynucleotide molecule that retains a usable, functional characteristic. Useful fragments include oligonucleotides and polynucleotides that may be used in hybridization or amplification technologies or in the regulation of replication, transcription or translation. A “polynucleotide fragment” refers to any subsequence of a polynucleotide, typically, of at least about 9 consecutive nucleotides, preferably at least about 30 nucleotides, more preferably at least about 50 nucleotides, of any of the sequences provided herein. Exemplary polynucleotide fragments are the first sixty consecutive nucleotides of the transcription factor polynucleotides listed in the Sequence Listing. Exemplary fragments also include fragments that comprise a region that encodes a conserved domain of a transcription factor.

[0094] Fragments may also include subsequences of polypeptides and protein molecules, or a subsequence of the polypeptide. Fragments may have uses in that they may have antigenic potential. In some cases, the fragment or domain is a subsequence of the polypeptide which performs at least one biological function of the intact polypeptide in substantially the same manner, or to a similar extent, as does the intact polypeptide. For example, a polypeptide fragment can comprise a recognizable structural motif or functional domain such as a DNA-binding site or domain that binds to a DNA promoter region, an activation domain, or a domain for protein-protein interactions, and may initiate transcription. Fragments can vary in size from as few as 3 amino acids to the full length of the intact polypeptide, but are preferably at least about 30 amino acids in length and more preferably at least about 60 amino acids in length. Exemplary polypeptide fragments are the first twenty consecutive amino acids of the transcription factor polypeptides listed in the Sequence Listing. Exemplary fragments also include fragments that comprise a conserved domain of a transcription factor, for example, amino acid residues 11-80 of G47 (SEQ ID NO: 12), as noted in Table 5.

[0095] The invention also encompasses production of DNA sequences that encode transcription factors and transcription factor derivatives, or fragments thereof, entirely by synthetic chemistry. After production, the synthetic sequence may be inserted into any of the many available expression vectors and cell systems using reagents well known in the art. Moreover, synthetic chemistry may be used to introduce mutations into a sequence encoding transcription factors or any fragment thereof.

[0096] “Derivative” refers to the chemical modification of a nucleic acid molecule or amino acid sequence. Chemical modifications can include replacement of hydrogen by an alkyl, acyl, or amino group or glycosylation, pegylation, or any similar process that retains or enhances biological activity or lifespan of the molecule or sequence.

[0097] A “trait” refers to a physiological, morphological, biochemical, or physical characteristic of a plant or particular plant material or cell. In some instances, this characteristic is visible to the human eye, such as seed or plant size, or can be measured by biochemical techniques, such as detecting the protein, starch, or oil content of seed or leaves, or by observation of a metabolic or physiological process, e.g., by measuring uptake of carbon dioxide, or by the observation of the expression level of a gene or genes, e.g., by employing Northern analysis, RT-PCR, microarray gene expression assays, or reporter gene expression systems, or by agricultural observations such as stress tolerance, yield, or pathogen tolerance. Any technique can be used to measure the amount of, comparative level of, or difference in any selected chemical compound or macromolecule in the transgenic plants, however.
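
The fragment examples in paragraphs [0093] and [0094] (the first sixty nucleotides of a polynucleotide, the first twenty amino acids of a polypeptide, or a residue range such as 11-80) amount to simple slicing once coordinates are read as 1-based and inclusive. The sketch below assumes that coordinate convention; the helper name `fragment` and the placeholder sequences are hypothetical and do not reproduce G47 or any other entry of the Sequence Listing.

```python
def fragment(sequence: str, start: int, end: int) -> str:
    """Return residues start..end of a sequence, 1-based and inclusive."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


if __name__ == "__main__":
    # Placeholder sequences, made up for illustration only.
    polynucleotide = "ATGC" * 30                 # 120 nucleotides
    polypeptide = "ACDEFGHIKLMNPQRSTVWY" * 5     # 100 amino acids

    first_sixty_nt = fragment(polynucleotide, 1, 60)
    first_twenty_aa = fragment(polypeptide, 1, 20)
    domain_like = fragment(polypeptide, 11, 80)  # same coordinates as the 11-80 example

    print(len(first_sixty_nt), len(first_twenty_aa), len(domain_like))
```

Because the range is inclusive at both ends, residues 11-80 span 70 positions, not 69.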

[0098] “Trait modification” refers to a detectable difference in a characteristic in a plant ectopically expressing a polynucleotide or polypeptide of the present invention relative to a plant not doing so, such as a wild-type plant. In some cases, the trait modification can be evaluated quantitatively. For example, the trait modification can entail at least about a 2% increase or decrease in an observed trait (difference), at least a 5% difference, at least about a 10% difference, at least about a 20% difference, at least about a 30%, at least about a 50%, at least about a 70%, or at least about a 100%, or an even greater difference compared with a wild-type plant. It is known that there can be a natural variation in the modified trait. Therefore, the trait modification observed entails a change of the normal distribution of the trait in the plants compared with the distribution observed in wild-type plants.

[0099] The term “transcript profile” refers to the expression levels of a set of genes in a cell in a particular state, particularly by comparison with the expression levels of that same set of genes in a cell of the same type in a reference state. For example, the transcript profile of a particular transcription factor in a suspension cell is the expression levels of a set of genes in a cell overexpressing that transcription factor compared with the expression levels of that same set of genes in a suspension cell that has normal levels of that transcription factor. The transcript profile can be presented as a list of those genes whose expression level is significantly different between the two treatments, and the difference ratios. Differences and similarities between expression levels may also be evaluated and calculated using statistical and clustering methods.

[0100] “Ectopic expression or altered expression” in reference to a polynucleotide indicates that the pattern of expression in, e.g., a transgenic plant or plant tissue, is different from the expression pattern in a wild-type plant or a reference plant of the same species. The pattern of expression may also be compared with a reference expression pattern in a wild-type plant of the same species. For example, the polynucleotide or polypeptide is expressed in a cell or tissue type other than a cell or tissue type in which the sequence is expressed in the wild-type plant, or by expression at a time other than at the time the sequence is expressed in the wild-type plant, or by a response to different inducible agents, such as hormones or environmental signals, or at different expression levels (either higher or lower) compared with those found in a wild-type plant. The term also refers to altered expression patterns that are produced by lowering the levels of expression to below the detection level or completely abolishing expression. The resulting expression pattern can be transient or stable, constitutive or inducible. In reference to a polypeptide, the term “ectopic expression or altered expression” further may relate to altered activity levels resulting from the interactions of the polypeptides with exogenous or endogenous modulators or from interactions with factors or as a result of the chemical modification of the polypeptides.

[0101] The term “overexpression” as used herein refers to a greater expression level of a gene in a plant, plant cell or plant tissue, compared to expression in a wild-type plant, cell or tissue, at any developmental or temporal stage for the gene. Overexpression can occur when, for example, the genes encoding one or more transcription factors are under the control of a strong expression signal, such as one of the promoters described herein (e.g., the cauliflower mosaic virus 35S transcription initiation region). Overexpression may occur throughout a plant or in specific tissues of the plant, depending on the promoter used, as described below.

[0102] Overexpression may take place in plant cells normally lacking expression of polypeptides functionally equivalent or identical to the present transcription factors. Overexpression may also occur in plant cells where endogenous expression of the present transcription factors or functionally equivalent molecules normally occurs, but such normal expression is at a lower level. Overexpression thus results in a greater than normal production, or “overproduction” of the transcription factor in the plant, cell or tissue.

[0103] The term “phase change” refers to a plant's progression from embryo to adult, and, by some definitions, the transition wherein flowering plants gain reproductive competency. It is believed that phase change occurs either after a certain number of cell divisions in the shoot apex of a developing plant, or when the shoot apex achieves a particular distance from the roots. Thus, altering the timing of phase changes may affect a plant's size, which, in turn, may affect yield and biomass.

[0104] “Tolerance” results from specific, heritable characteristics of a host plant that allow a pathogen to develop and multiply in the host while the host, either by lacking receptor sites for, or by inactivating or compensating for the irritant secretions of the pathogen, still manages to thrive or, in the case of crop plants, produce a good crop. Tolerant plants are susceptible to the pathogen but are not killed by it and generally show little damage from the pathogen (Agrios (1988) Plant Pathology, 3rd ed. Academic Press, N.Y., p. 129).

[0105] “Resistance”, also referred to as “true resistance”, results when a plant contains one or more genes that make the plant and a potential pathogen more or less incompatible with each other, either because of a lack of chemical recognition between the host and the pathogen, or because the host plant can defend itself against the pathogen by defense mechanisms already present or activated in response to infection (Agrios (1988) Plant Pathology, 3rd ed. Academic Press, N.Y., p. 125).

[0106] A “sample” with respect to a material containing nucleic acid molecules may comprise a bodily fluid; an extract from a cell, chromosome, organelle, or membrane isolated from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; a cell; a tissue; a tissue print; a forensic sample; and the like. In this context “substrate” refers to any rigid or semi-rigid support to which nucleic acid molecules or proteins are bound and includes membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, capillaries or other tubing, plates, polymers, and microparticles with a variety of surface forms including wells, trenches, pins, channels and pores. A substrate may also refer to a reactant in a chemical or biological reaction, or a substance acted upon (e.g., by an enzyme).

[0107] “Substantially purified” refers to nucleic acid molecules or proteins that are removed from their natural environment and are isolated or separated, and are at least about 60% free, preferably about 75% free, and most preferably about 90% free, from other components with which they are naturally associated.

Traits that May be Modified in Overexpressing or Knock-Out Plants

[0108] Trait modifications of particular interest include those to seed (such as embryo or endosperm), fruit, root, flower, leaf, stem, shoot, seedling or the like, including: enhanced tolerance to environmental conditions including

freezing, chilling, heat, drought, water saturation, radiation and ozone; improved tolerance to microbial, fungal or viral diseases; improved tolerance to pest infestations, including insects, nematodes, mollicutes, parasitic higher plants or the like; decreased herbicide sensitivity; improved tolerance of heavy metals or enhanced ability to take up heavy metals; improved growth under poor photoconditions (e.g., low light and/or short day length), or changes in expression levels of genes of interest. Other phenotypes that can be modified relate to the production of plant metabolites, such as variations in the production of taxol, tocopherol, tocotrienol, sterols, phytosterols, vitamins, wax monomers, anti-oxidants, amino acids, lignins, cellulose, tannins, prenyllipids (such as chlorophylls and carotenoids), glucosinolates, and terpenoids, enhanced or compositionally altered protein or oil production (especially in seeds), or modified sugar (insoluble or soluble) and/or starch composition. Physical plant characteristics that can be modified include cell development (such as the number of trichomes), fruit and seed size and number, yields of plant parts such as stems, leaves, inflorescences, and roots, the stability of the seeds during storage, characteristics of the seed pod (e.g., susceptibility to shattering), root hair length and quantity, internode distances, or the quality of seed coat. Plant growth characteristics that can be modified include growth rate, germination rate of seeds, vigor of plants and seedlings, leaf and flower senescence, male sterility, apomixis, flowering time, flower abscission, rate of nitrogen uptake, osmotic sensitivity to soluble sugar concentrations, biomass or transpiration characteristics, as well as plant architecture characteristics such as apical dominance, branching patterns, number of organs, organ identity, organ shape or size.

Transcription Factors Modify Expression of Endogenous Genes

[0109] Expression of genes that encode transcription factors that modify expression of endogenous genes, polynucleotides, and proteins are well known in the art. In addition, transgenic plants comprising isolated polynucleotides encoding transcription factors may also modify expression of endogenous genes, polynucleotides, and proteins. Examples include Peng et al. (1997, Genes Development 11: 3194-3205) and Peng et al. (1999, Nature 400: 256-261). In addition, many others have demonstrated that an Arabidopsis transcription factor expressed in an exogenous plant species elicits the same or very similar phenotypic response. See, for example, Fu et al. (2001, Plant Cell 13: 1791-1802); Nandi et al. (2000, Curr. Biol. 10: 215-218); Coupland (1995, Nature 377: 482-483); and Weigel and Nilsson (1995, Nature 377: 482-500).

[0110] In another example, Mandel et al. (1992, Cell 71: 133-143) and Suzuki et al. (2001, Plant J. 28: 409-418) teach that a transcription factor expressed in another plant species elicits the same or very similar phenotypic response of the endogenous sequence, as often predicted in earlier studies of Arabidopsis transcription factors in Arabidopsis (see Mandel et al. 1992, supra; Suzuki et al. 2001, supra).

[0111] Other examples include Müller et al. (2001, Plant J. 28: 169-179); Kim et al. (2001, Plant J. 25: 247-259); Kyozuka and Shimamoto (2002, Plant Cell Physiol. 43: 130-135); Boss and Thomas (2002, Nature 416: 847-850); He et al. (2000, Transgenic Res. 9: 223-227); and Robson et al. (2001, Plant J. 28: 619-631).

[0112] In yet another example, Gilmour et al. (1998, Plant J. 16: 433-442) teach an Arabidopsis AP2 transcription factor, CBF1 (SEQ ID NO: 2239), which, when overexpressed in transgenic plants, increases plant freezing tolerance. Jaglo et al. (2001, Plant Physiol. 127: 910-917) further identified sequences in Brassica napus which encode CBF-like genes and that transcripts for these genes accumulated rapidly in response to low temperature. Transcripts encoding CBF-like proteins were also found to accumulate rapidly in response to low temperature in wheat, as well as in tomato. An alignment of the CBF proteins from Arabidopsis, B. napus, wheat, rye, and tomato revealed the presence of conserved consecutive amino acid residues, PKK/RPAGRXKFXETRHP and DSAWR, that bracket the AP2/EREBP DNA binding domains of the proteins and distinguish them from other members of the AP2/EREBP protein family. (See Jaglo et al., supra.)

[0113] Transcription factors mediate cellular responses and control traits through altered expression of genes containing cis-acting nucleotide sequences that are targets of the introduced transcription factor. It is well appreciated in the art that the effect of a transcription factor on cellular responses or a cellular trait is determined by the particular genes whose expression is either directly or indirectly (e.g., by a cascade of transcription factor binding events and transcriptional changes) altered by transcription factor binding. In a global analysis of transcription comparing a standard condition with one in which a transcription factor is overexpressed, the resulting transcript profile associated with transcription factor overexpression is related to the trait or cellular process controlled by that transcription factor. For example, the PAP2 gene (and other genes in the MYB family) have been shown to control anthocyanin biosynthesis through regulation of the expression of genes known to be involved in the anthocyanin biosynthetic pathway (Bruce et al. (2000) Plant Cell 12: 65-79; and Borevitz et al. (2000) Plant Cell 12: 2383-2393). Further, global transcript profiles have been used successfully as diagnostic tools for specific cellular states (e.g., cancerous vs. non-cancerous; Bhattacharjee et al. (2001) Proc. Natl. Acad. Sci. USA 98: 13790-13795; and Xu et al. (2001) Proc. Natl. Acad. Sci. USA 98: 15089-15094). Consequently, it is evident to one skilled in the art that similarity of transcript profile upon overexpression of different transcription factors would indicate similarity of transcription factor function.

Polypeptides and Polynucleotides of the Invention

[0114] The present invention provides, among other things, transcription factors (TFs), and transcription factor homolog polypeptides, and isolated or recombinant polynucleotides encoding the polypeptides, or novel sequence variant polypeptides or polynucleotides encoding novel variants of transcription factors derived from the specific sequences provided here. These polypeptides and polynucleotides may be employed to modify a plant's characteristics.

[0115] Exemplary polynucleotides encoding the polypeptides of the invention were identified in the Arabidopsis thaliana GenBank database using publicly available sequence analysis programs and parameters. Sequences initially identified were then further characterized to identify sequences comprising specified sequence strings corresponding to sequence motifs present in families of known transcription factors. In addition, further exemplary polynucleotides encoding the polypeptides of the invention were identified in the plant GenBank database using publicly available sequence analysis programs and parameters. Sequences initially identified were then further characterized to identify sequences comprising specified sequence strings corresponding to sequence motifs present in families of known transcription factors. Polynucleotide sequences meeting such criteria were confirmed as transcription factors.

[0116] Additional polynucleotides of the invention were identified by screening Arabidopsis thaliana and/or other plant cDNA libraries with probes corresponding to known transcription factors under low stringency hybridization conditions. Additional sequences, including full length coding sequences, were subsequently recovered by the rapid amplification of cDNA ends (RACE) procedure, using a commercially available kit according to the manufacturer's instructions. Where necessary, multiple rounds of RACE are performed to isolate 5' and 3' ends. The full-length cDNA was then recovered by a routine end-to-end polymerase chain reaction (PCR) using primers specific to the isolated 5' and 3' ends. Exemplary sequences are provided in the Sequence Listing.

[0117] The polynucleotides of the invention can be or were ectopically expressed in overexpressor or knockout plants and the changes in the characteristic(s) or trait(s) of the plants observed. Therefore, the polynucleotides and polypeptides can be employed to improve the characteristics of plants.

[0118] The polynucleotides of the invention can be or were ectopically expressed in overexpressor plant cells and the changes in the expression levels of a number of genes, polynucleotides, and/or proteins of the plant cells observed. Therefore, the polynucleotides and polypeptides can be employed to change expression levels of genes, polynucleotides, and/or proteins of plants.

Producing Polypeptides

[0119] The polynucleotides of the invention include sequences that encode transcription factors and transcription factor homolog polypeptides and sequences complementary thereto, as well as unique fragments of coding sequence, or sequence complementary thereto. Such polynucleotides can be, e.g., DNA or RNA, e.g., mRNA, cRNA, synthetic RNA, genomic DNA, cDNA, synthetic DNA, oligonucleotides, etc. The polynucleotides are either double-stranded or single-stranded, and include either, or both sense (i.e., coding) sequences and antisense (i.e., non-coding, complementary) sequences. The polynucleotides include the coding sequence of a transcription factor, or transcription factor homolog polypeptide, in isolation, in combination with additional coding sequences (e.g., a purification tag, a localization signal, as a fusion-protein, as a pre-protein, or the like), in combination with non-coding sequences (e.g., introns or inteins, regulatory elements such as promoters, enhancers, terminators, and the like), and/or in a vector or host environment in which the polynucleotide encoding a transcription factor or transcription factor homolog polypeptide is an endogenous or exogenous gene.

[0120] A variety of methods exist for producing the polynucleotides of the invention. Procedures for identifying and isolating DNA clones are well known to those of skill in the art, and are described in, e.g., Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, Vol. 152, Academic Press, Inc., San Diego, Calif. (“Berger”); Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989 (“Sambrook”) and Current Protocols in Molecular Biology, Ausubel et al. eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (supplemented through 2000) (“Ausubel”).

[0121] Alternatively, polynucleotides of the invention can be produced by a variety of in vitro amplification methods adapted to the present invention by appropriate selection of specific or degenerate primers. Examples of protocols sufficient to direct persons of skill through in vitro amplification methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qbeta-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA), e.g., for the production of the homologous nucleic acids of the invention are found in Berger (supra), Sambrook (supra), and Ausubel (supra), as well as Mullis et al. (1987) PCR Protocols: A Guide to Methods and Applications (Innis et al. eds) Academic Press Inc. San Diego, Calif. (1990) (Innis). Improved methods for cloning in vitro amplified nucleic acids are described in Wallace et al. U.S. Pat. No. 5,426,039. Improved methods for amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369: 684-685 and the references cited therein, in which PCR amplicons of up to 40 kb are generated. One of skill will appreciate that essentially any RNA can be converted into a double stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, e.g., Ausubel, Sambrook and Berger, all supra.

[0122] Alternatively, polynucleotides and oligonucleotides of the invention can be assembled from fragments produced by solid-phase synthesis methods. Typically, fragments of up to approximately 100 bases are individually synthesized and then enzymatically or chemically ligated to produce a desired sequence, e.g., a polynucleotide encoding all or part of a transcription factor. For example, chemical synthesis using the phosphoramidite method is described, e.g., by Beaucage et al. (1981) Tetrahedron Letters 22: 1859-1869; and Matthes et al. (1984) EMBO J. 3: 801-805. According to such methods, oligonucleotides are synthesized, purified, annealed to their complementary strand, ligated and then optionally cloned into suitable vectors. And if so desired, the polynucleotides and polypeptides of the invention can be custom ordered from any of a number of commercial suppliers.

Homologous Sequences

[0123] Sequences homologous, i.e., that share significant sequence identity or similarity, to those provided in the Sequence Listing, derived from Arabidopsis thaliana or from other plants of choice, are also an aspect of the invention. Homologous sequences can be derived from any plant including monocots and dicots and in particular agriculturally important plant species, including but not limited to, crops such as soybean, wheat, corn (maize), potato, cotton, rice, rape, oilseed rape (including canola), sunflower, alfalfa, clover, sugarcane, and turf; or fruits and vegetables, such as banana, blackberry, blueberry, strawberry, and raspberry, cantaloupe, carrot, cauliflower, coffee, cucumber, eggplant, grapes, honeydew, lettuce, mango, melon, onion, papaya, peas, peppers, pineapple, pumpkin, spinach, squash, sweet corn, tobacco, tomato, tomatillo, watermelon, rosaceous fruits (such as apple, peach, pear, cherry and plum) and vegetable brassicas (such as broccoli, cabbage, cauliflower, Brussels sprouts, and kohlrabi). Other crops, including fruits and vegetables, whose phenotype can be changed and which comprise homologous sequences include barley; rye; millet; sorghum; currant; avocado; citrus fruits such as oranges, lemons, grapefruit and tangerines; artichoke; cherries; nuts such as the walnut and peanut; endive; leek; roots such as arrowroot, beet, cassava, turnip, radish, yam, and sweet potato; and . The homologous sequences may also be derived from woody species, such as pine, poplar and eucalyptus, or mint or other labiates. In addition, homologous sequences may be derived from plants that are evolutionarily-related to crop plants, but which may not have yet been used as crop plants. Examples include deadly nightshade (Atropa belladonna), related to tomato; jimson weed (Datura stramonium), related to peyote, and teosinte (Zea species), related to corn (maize).

Orthologs and Paralogs

[0124] Homologous sequences as described above can comprise orthologous or paralogous sequences. Several different methods are known by those of skill in the art for identifying and defining these functionally homologous sequences. Three general methods for defining orthologs and paralogs are described; an ortholog, paralog or homolog may be identified by one or more of the methods described below.

[0125] Orthologs and paralogs are evolutionarily related genes that have similar sequence and similar functions. Orthologs are structurally related genes in different species that are derived by a speciation event. Paralogs are structurally related genes within a single species that are derived by a duplication event.

[0126] Within a single plant species, gene duplication may cause two copies of a particular gene, giving rise to two or more genes with similar sequence and often similar function known as paralogs. A paralog is therefore a similar gene formed by duplication within the same species. Paralogs typically cluster together or in the same clade (a group of similar genes) when a gene family phylogeny is analyzed using programs such as CLUSTAL (Thompson et al. (1994) Nucleic Acids Res. 22: 4673-4680; Higgins et al. (1996) Methods Enzymol. 266: 383-402). Groups of similar genes can also be identified with pair-wise BLAST analysis (Feng and Doolittle (1987) J. Mol. Evol. 25: 351-360). For example, a clade of very similar MADS domain transcription factors from Arabidopsis all share a common function in flowering time (Ratcliffe et al. (2001) Plant Physiol. 126: 122-132), and a group of very similar AP2 domain transcription factors from Arabidopsis are involved in tolerance of plants to freezing (Gilmour et al. (1998) Plant J. 16: 433-442). Analysis of groups of similar genes with similar function that fall within one clade can yield sub-sequences that are particular to the clade. These sub-sequences, known as consensus sequences, can not only be used to define the sequences within each clade, but define the functions of these genes; genes within a clade may contain paralogous sequences, or orthologous sequences that share the same function (see also, for example, Mount (2001), in Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., page 543).

[0127] Speciation, the production of new species from a parental species, can also give rise to two or more genes with similar sequence and similar function. These genes, termed orthologs, often have an identical function within their host plants and are often interchangeable between species without losing function. Because plants have common ancestors, many genes in any plant species will have a corresponding orthologous gene in another plant species. Once a phylogenic tree for a gene family of one species has been constructed using a program such as CLUSTAL (Thompson et al. (1994) Nucleic Acids Res. 22: 4673-4680; Higgins et al. (1996) supra), potential orthologous sequences can be placed into the phylogenetic tree and their relationship to genes from the species of interest can be determined. Orthologous sequences can also be identified by a reciprocal BLAST strategy. Once an orthologous sequence has been identified, the function of the ortholog can be deduced from the identified function of the reference sequence.

[0128] Transcription factor gene sequences are conserved across diverse eukaryotic species lines (Goodrich et al. (1993) Cell 75: 519-530; Lin et al. (1991) Nature 353: 569-571; Sadowski et al. (1988) Nature 335: 563-564). Plants are no exception to this observation; diverse plant species possess transcription factors that have similar sequences and functions.

[0129] Orthologous genes from different organisms have highly conserved functions, and very often essentially identical functions (Lee et al. (2002) Genome Res. 12: 493-502; Remm et al. (2001) J. Mol. Biol. 314: 1041-1052). Paralogous genes, which have diverged through gene duplication, may retain similar functions of the encoded proteins. In such cases, paralogs can be used interchangeably with respect to certain embodiments of the instant invention (for example, transgenic expression of a coding sequence). An example of such highly related paralogs is the CBF family, with three well-defined members in Arabidopsis and at least one ortholog in Brassica napus (SEQ ID NOS: 2238, 2240, 2242, and 2244, respectively), all of which control pathways involved in both freezing and drought stress (Gilmour et al. (1998) Plant J. 16: 433-442; Jaglo et al. (2001) Plant Physiol. 127: 910-917).

[0130] The following references represent a small sampling of the many studies that demonstrate that conserved transcription factor genes from diverse species are likely to function similarly (i.e., regulate similar target sequences and control the same traits), and that transcription factors may be transformed into diverse species to confer or improve traits.

[0131] (1) The Arabidopsis NPR1 gene regulates systemic acquired resistance (SAR); over-expression of NPR1 leads to enhanced resistance in Arabidopsis. When either Arabidopsis NPR1 or the rice NPR1 ortholog was overexpressed in rice (which, as a monocot, is diverse from Arabidopsis), challenge with the rice bacterial blight pathogen Xanthomonas oryzae pv. oryzae, the transgenic plants displayed enhanced resistance (Chern et al. (2001) Plant J. 27: 101-113). NPR1 acts through activation of expression of transcription factor genes, such as TGA2 (Fan and Dong (2002) Plant Cell 14: 1377-1389).

[0132] (2) E2F genes are involved in transcription of plant genes for proliferating cell nuclear antigen (PCNA). Plant E2Fs share a high degree of similarity in amino acid sequence between monocots and dicots, and are even similar to the conserved domains of the animal E2Fs. Such conservation indicates a functional similarity between plant and animal E2Fs. E2F transcription factors that regulate meristem development act through common cis-elements, and regulate related (PCNA) genes (Kosugi and Ohashi (2002) Plant J. 29: 45-59).

[0133] (3) The ABI5 gene (ABA insensitive 5) encodes a basic leucine zipper factor required for ABA response in the seed and vegetative tissues. Co-transformation experiments with ABI5 cDNA constructs in rice protoplasts resulted in specific transactivation of the ABA-inducible wheat, Arabidopsis, , and barley promoters. These results demonstrate that sequentially similar ABI5 transcription factors are key targets of a conserved ABA signaling pathway in diverse plants (Gampala et al. (2001) J. Biol. Chem. 277: 1689-1694).

[0134] (4) Sequences of three Arabidopsis GAMYB-like genes were obtained on the basis of sequence similarity to GAMYB genes from barley, rice, and L. temulentum. These three Arabidopsis genes were determined to encode transcription factors (AtMYB33, AtMYB65, and AtMYB101) and could substitute for a barley GAMYB and control alpha-amylase expression (Gocal et al. (2001) Plant Physiol. 127: 1682-1693).

[0135] (5) The floral control gene LEAFY from Arabidopsis can dramatically accelerate flowering in numerous dicotyledonous plants. Constitutive expression of Arabidopsis LEAFY also caused early flowering in transgenic rice (a monocot), with a heading date that was 26-34 days earlier than that of wild-type plants. These observations indicate that floral regulatory genes from Arabidopsis are useful tools for heading date improvement in cereal crops (He et al. (2000) Transgenic Res. 9: 223-227).

[0136] (6) Bioactive gibberellins (GAs) are essential endogenous regulators of plant growth. GA signaling tends to be conserved across the plant kingdom. GA signaling is mediated via GAI, a nuclear member of the GRAS family of plant transcription factors. Arabidopsis GAI has been shown to function in rice to inhibit gibberellin response pathways (Fu et al. (2001) Plant Cell 13: 1791-1802).

[0137] (7) The Arabidopsis gene SUPERMAN (SUP) encodes a putative transcription factor that maintains the boundary between stamens and carpels. By over-expressing Arabidopsis SUP in rice, the effect of the gene's presence on whorl boundaries was shown to be conserved. This demonstrated that SUP is a conserved regulator of floral whorl boundaries and affects cell proliferation (Nandi et al. (2000) Curr. Biol. 10: 215-218).

[0138] (8) Maize, petunia and Arabidopsis myb transcription factors that regulate flavonoid biosynthesis are very genetically similar and affect the same trait in their native species; therefore sequence and function of these myb transcription factors correlate with each other in these diverse species (Borevitz et al. (2000) Plant Cell 12: 2383-2394).

[0139] (9) Wheat reduced height-1 (Rht-B1/Rht-D1) and maize dwarf-8 (d8) genes are orthologs of the Arabidopsis gibberellin insensitive (GAI) gene. Both of these genes have been used to produce dwarf grain varieties that have improved grain yield. These genes encode proteins that resemble nuclear transcription factors and contain an SH2-like domain, indicating that phosphotyrosine may participate in gibberellin signaling. Transgenic rice plants containing a mutant GAI allele from Arabidopsis have been shown to produce reduced responses to gibberellin and are dwarfed, indicating that mutant GAI orthologs could be used to increase yield in a wide range of crop species (Peng et al. (1999) Nature 400: 256-261).

[0140] Transcription factors that are homologous to the listed sequences will typically share, in at least one conserved domain, at least about 70% amino acid sequence identity, and with regard to zinc finger transcription factors, at least about 50% amino acid sequence identity. More closely related transcription factors can share at least about 70%, or about 75% or about 80% or about 90% or about 95% or about 98% or more sequence identity with the listed sequences, or with the listed sequences but excluding or outside a known consensus sequence or consensus DNA-binding site, or with the listed sequences excluding one or all conserved domain. Factors that are most closely related to the listed sequences share, e.g., at least about 85%, about 90% or about 95% or more sequence identity to the listed sequences, or to the listed sequences but excluding or outside a known consensus sequence or consensus DNA-binding site or outside one or all conserved domain. At the nucleotide level, the sequences will typically share at least about 40% nucleotide sequence identity, preferably at least about 50%, about 60%, about 70% or about 80% sequence identity, and more preferably about 85%, about 90%, about 95% or about 97% or more sequence identity to one or more of the listed sequences, or to a listed sequence but excluding or outside a known consensus sequence or consensus DNA-binding site, or outside one or all conserved domain. The degeneracy of the genetic code enables major variations in the nucleotide sequence of a polynucleotide while maintaining the amino acid sequence of the encoded protein. Conserved domains within a transcription factor family may exhibit a higher degree of sequence homology, such as at least 65% amino acid sequence identity including conservative substitutions, and preferably at least 80% sequence identity, and more preferably at least 85%, or at least 86%, or at least 87%, or at least 88%, or at least 90%, or at least 95%, or at least 98% sequence identity. Transcription factors that are homologous to the listed sequences should share at least 30%, or at least 60%, or at least 75%, or at least 76%, or at least 77%, or at least 78%, or at least 79%, or at least 80%, or at least 85%, or at least 90%, or at least 95% amino acid sequence identity over the entire length of the polypeptide or the homolog.

[0141] Percent identity can be determined electronically, e.g., by using the MEGALIGN program (DNASTAR, Inc. Madison, Wis.). The MEGALIGN program can create alignments between two or more sequences according to different methods, for example, the clustal method. (See, for example, Higgins and Sharp (1988) Gene 73: 237-244.) The clustal algorithm groups sequences into clusters by examining the distances between all pairs. The clusters are aligned pairwise and then in groups. Other alignment algorithms or programs may be used, including FASTA, BLAST, or ENTREZ, FASTA and BLAST, and which may be used to calculate percent similarity. These are available as a part of the GCG sequence analysis package (University of Wisconsin, Madison, Wis.), and can be used with or without default settings. ENTREZ is available through the National Center for Biotechnology Information. In one embodiment, the percent identity of two sequences can be determined by the GCG program with a gap weight of 1, e.g., each amino acid gap is weighted as if it were a single amino acid or nucleotide mismatch between the two sequences (see U.S. Pat. No. 6,262,333).

[0142] Other techniques for alignment are described in Methods in Enzymology, Vol. 266, Computer Methods for Macromolecular Sequence Analysis (1996), ed. Doolittle, Academic Press, Inc., San Diego, Calif., USA. Preferably, an alignment program that permits gaps in the sequence is utilized to align the sequences. The Smith-Waterman is one type of algorithm that permits gaps in sequence alignments (see Shpaer (1997) Methods Mol. Biol. 70: 173-187). Also, the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences. An alternative search strategy uses MPSRCH software, which runs on a

MASPAR computer. MPSRCH uses a Smith-Waterman algorithm to score sequences on a massively parallel computer. This approach improves ability to pick up distantly related matches, and is especially tolerant of small gaps and nucleotide sequence errors. Nucleic acid-encoded amino acid sequences can be used to search both protein and DNA databases.

[0143] The percentage similarity between two polypeptide sequences, e.g., sequence A and sequence B, is calculated by dividing the length of sequence A, minus the number of gap residues in sequence A, minus the number of gap residues in sequence B, into the sum of the residue matches between sequence A and sequence B, times one hundred. Gaps of low or of no similarity between the two amino acid sequences are not included in determining percentage similarity. Percent identity between polynucleotide sequences can also be counted or calculated by other methods known in the art, e.g., the Jotun Hein method. (See, e.g., Hein (1990) Methods Enzymol. 183: 626-645.) Identity between sequences can also be determined by other methods known in the art, e.g., by varying hybridization conditions (see US Patent Application No. 20010010913).

[0144] The percent identity between two conserved domains of a transcription factor DNA-binding domain consensus polypeptide sequence can be as low as 16%, as exemplified in the case of GATA1 family of eukaryotic Cys2/Cys2-type zinc finger transcription factors. The DNA-binding domain consensus polypeptide sequence of the GATA1 family is CX2CX17CX2C, where X is any amino acid residue. (See, for example, Takatsuji, supra.) Other examples of such conserved consensus polypeptide sequences with low overall percent sequence identity are well known to those of skill in the art.

[0145] Thus, the invention provides methods for identifying a sequence similar or paralogous or orthologous or homologous to one or more polynucleotides as noted herein, or one or more target polypeptides encoded by the polynucleotides, or otherwise noted herein and may include linking or associating a given plant phenotype or gene function with a sequence. In the methods, a sequence database is provided (locally or across an internet or intranet) and a query is made against the sequence database using the relevant sequences herein and associated plant phenotypes or gene functions.

[0146]-[0147] In addition, one or more polynucleotide sequences or one or more polypeptides encoded by the polynucleotide sequences may be used to search against a BLOCKS (Bairoch et al. (1997) Nucleic Acids Res. 25: 217-221), PFAM, and other databases which contain previously identified and annotated motifs, sequences and gene functions. Methods that search for primary sequence patterns with secondary structure gap penalties (Smith et al. (1992) Protein Engineering 5: 35-51) as well as algorithms such as Basic Local Alignment Search Tool (BLAST; Altschul (1993) J. Mol. Evol. 36: 290-300; Altschul et al. (1990) supra), BLOCKS (Henikoff and Henikoff (1991) Nucleic Acids Res. 19: 6565-6572), Hidden Markov Models (HMM; Eddy (1996) Curr. Opin. Str. Biol. 6: 361-365; Sonnhammer et al. (1997) Proteins 28: 405-420), and the like, can be used to manipulate and analyze polynucleotide and polypeptide sequences encoded by polynucleotides. These databases, algorithms and other methods are well known in the art and are described in Ausubel et al. (1997; Short Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y., unit 7.7) and in Meyers (1995; Molecular Biology and Biotechnology, Wiley VCH, New York, N.Y., p. 856-853).

[0148] A further method for identifying or confirming that specific homologous sequences control the same function is by comparison of the transcript profile(s) obtained upon overexpression or knockout of two or more related transcription factors. Since transcript profiles are diagnostic for specific cellular states, one skilled in the art will appreciate that genes that have a highly similar transcript profile (e.g., with greater than 50% regulated transcripts in common, more preferably with greater than 70% regulated transcripts in common, most preferably with greater than 90% regulated transcripts in common) will have highly similar functions. Fowler et al. (2002; Plant Cell 14: 1675-79) have shown that three paralogous AP2 family genes (CBF1, CBF2 and CBF3), each of which is induced upon cold treatment, and each of which can condition improved freezing tolerance, have highly similar transcript profiles. Once a transcription factor has been shown to provide a specific function, its transcript profile becomes a diagnostic tool to determine whether putative paralogs or orthologs have the same function.

[0149] Furthermore, methods using manual alignment of sequences similar or homologous to one or more polynucleotide sequences or one or more polypeptides encoded by the polynucleotide sequences may be used to identify regions of similarity and conserved domains. Such manual methods are well known to those of skill in the art and can include, for example, comparisons of tertiary structure between a polypeptide sequence encoded by a polynucleotide which comprises a known function with a polypeptide sequence encoded by a polynucleotide sequence which has a function not yet determined. Such examples of tertiary structure may comprise predicted alpha helices, beta-sheets, amphipathic helices, leucine zipper motifs, zinc finger motifs, proline-rich regions, cysteine repeat motifs, and the like.

[0150] Orthologs and paralogs of presently disclosed transcription factors may be cloned using compositions provided by the present invention according to methods well known in the art. cDNAs can be cloned using mRNA from a plant cell or tissue that expresses one of the present transcription factors. Appropriate mRNA sources may be identified by interrogating Northern blots with probes designed from the present transcription factor sequences, after which a library is prepared from the mRNA obtained from a positive cell or tissue. Transcription factor-encoding cDNA is then isolated using, for example, PCR, using primers designed from a presently disclosed transcription factor gene sequence, or by probing with a partial or complete cDNA or with one or more sets of degenerate probes based on the disclosed sequences. The cDNA library may be used to transform plant cells. Expression of the cDNAs of interest is detected using, for example, methods disclosed herein such as microarrays, Northern blots, quantitative PCR, or any other technique for monitoring changes in expression. Genomic clones may be isolated using similar techniques to those.

Identifying Polynucleotides or Nucleic Acids by Hybridization

[0151] Polynucleotides homologous to the sequences illustrated in the Sequence Listing and tables can be identified, e.g., by hybridization to each other under stringent or under highly stringent conditions. Single stranded polynucleotides hybridize when they associate based on a variety of well characterized physical-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like. The stringency of a hybridization reflects the degree of sequence identity of the nucleic acids involved, such that the higher the stringency, the more similar are the two polynucleotide strands. Stringency is influenced by a variety of factors, including temperature, salt concentration and composition, organic and non-organic additives, solvents, etc. present in both the hybridization and wash solutions and incubations (and number thereof), as described in more detail in the references cited above.

[0152] Encompassed by the invention are polynucleotide sequences that are capable of hybridizing to the claimed polynucleotide sequences, including any of the transcription factor polynucleotides within the Sequence Listing, and fragments thereof under various conditions of stringency (See, for example, Wahl and Berger (1987) Methods Enzymol. 152: 399-407; and Kimmel (1987) Methods Enzymol. 152: 507-511). In addition to the nucleotide sequences listed in Tables 4-9, full length cDNA, orthologs, and paralogs of the present nucleotide sequences may be identified and isolated using well-known methods. The cDNA libraries, orthologs, and paralogs of the present nucleotide sequences may be screened using hybridization methods to determine their utility as hybridization target or amplification probes.

[0153] With regard to hybridization, conditions that are highly stringent, and means for achieving them, are well known in the art. See, for example, Sambrook et al. (1989) "Molecular Cloning: A Laboratory Manual" (2nd ed., Cold Spring Harbor Laboratory); Berger and Kimmel, eds., (1987) "Guide to Molecular Cloning Techniques", In Methods in Enzymology: 152: 467-469; and Anderson and Young (1985) "Quantitative Filter Hybridisation." In: Hames and Higgins, ed., Nucleic Acid Hybridisation, A Practical Approach. Oxford, IRL Press, 73-111.

[0154] Stability of DNA duplexes is affected by such factors as base composition, length, and degree of base pair mismatch. Hybridization conditions may be adjusted to allow DNAs of different sequence relatedness to hybridize. The melting temperature (Tm) is defined as the temperature when 50% of the duplex molecules have dissociated into their constituent single strands. The melting temperature of a perfectly matched duplex, where the hybridization buffer contains formamide as a denaturing agent, may be estimated by the following equations:

(I) DNA-DNA:

(II) DNA-RNA:

(III) RNA-RNA:

[0155] where L is the length of the duplex formed, [Na+] is the molar concentration of the sodium ion in the hybridization or washing solution, and % G+C is the percentage of (guanine+cytosine) bases in the hybrid. For imperfectly matched hybrids, the melting temperature is reduced by approximately 1° C. for each 1% mismatch.

[0156] Hybridization experiments are generally conducted in a buffer of pH between 6.8 and 7.4, although the rate of hybridization is nearly independent of pH at ionic strengths likely to be used in the hybridization buffer (Anderson et al. (1985) supra). In addition, one or more of the following may be used to reduce non-specific hybridization: sonicated salmon sperm DNA or another non-complementary DNA, bovine serum albumin, sodium pyrophosphate, sodium dodecylsulfate (SDS), polyvinyl-pyrrolidone, ficoll and Denhardt's solution. Dextran sulfate and polyethylene glycol 6000 act to exclude DNA from solution, thus raising the effective probe DNA concentration and the hybridization signal within a given unit of time. In some instances, conditions of even greater stringency may be desirable or required to reduce non-specific and/or background hybridization. These conditions may be created with the use of higher temperature, lower ionic strength and higher concentration of a denaturing agent such as formamide.

[0157] Stringency conditions can be adjusted to screen for moderately similar fragments such as homologous sequences from distantly related organisms, or to highly similar fragments such as genes that duplicate functional enzymes from closely related organisms. The stringency can be adjusted either during the hybridization step or in the post-hybridization washes. Salt concentration, formamide concentration, hybridization temperature and probe lengths are variables that can be used to alter stringency (as described by the formula above). As a general guideline, high stringency is typically performed at Tm-5° C. to Tm-20° C., moderate stringency at Tm-20° C. to Tm-35° C. and low stringency at Tm-35° C. to Tm-50° C. for duplex >150 base pairs. Hybridization may be performed at low to moderate stringency (25-50° C. below Tm), followed by post-hybridization washes at increasing stringencies. Maximum rates of hybridization in solution are determined empirically to occur at Tm-25° C. for DNA-DNA duplex and Tm-15° C. for RNA-DNA duplex. Optionally, the degree of dissociation may be assessed after each wash step to determine the need for subsequent, higher stringency wash steps.

[0158] High stringency conditions may be used to select for nucleic acid sequences with high degrees of identity to the disclosed sequences. An example of stringent hybridization conditions obtained in a filter-based method such as a Southern or northern blot for hybridization of complementary nucleic acids that have more than 100 complementary residues is about 5° C. to 20° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. Conditions used for hybridization may include about 0.02 M to about 0.15 M sodium chloride, about 0.5% to about 5% casein, about 0.02% SDS or about 0.1% N-laurylsarcosine, about 0.001 M to about 0.03 M sodium citrate, at hybridization temperatures between about 50° C. and about 70° C. More preferably, high stringency conditions are about 0.02 M sodium chloride, about 0.5% casein, about 0.02% SDS, about 0.001 M sodium citrate, at a temperature of about 50° C. Nucleic acid molecules that hybridize under stringent conditions will typically hybridize to a probe based on either the entire DNA molecule or selected portions, e.g., to a unique subsequence, of the DNA.

[0159] Stringent salt concentration will ordinarily be less than about 750 mM NaCl and 75 mM trisodium citrate. Increasingly stringent conditions may be obtained with less than about 500 mM NaCl and 50 mM trisodium citrate, to even greater stringency with less than about 250 mM NaCl
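Two of the calculations described in this part of the specification, the percentage similarity of paragraph [0143] and the melting temperature guidance of paragraphs [0154]-[0157], can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The function names are hypothetical. The DNA-DNA coefficients (81.5, 16.6, 0.41, 0.62, 500) are a commonly cited empirical form from the general hybridization literature and are assumed here, since the equations themselves are not legible in this text. The temperature windows are the ones quoted for duplexes longer than 150 base pairs.

```python
import math


def percent_similarity(matches: int, len_a: int, gaps_a: int, gaps_b: int) -> float:
    """Percentage similarity of sequences A and B as worded in [0143].

    Residue matches are divided by the length of A minus the gap residues
    in A and in B, then multiplied by one hundred.
    """
    denominator = len_a - gaps_a - gaps_b
    if denominator <= 0:
        raise ValueError("length of A minus gap residues must be positive")
    return 100.0 * matches / denominator


def tm_dna_dna(na_molar: float, gc_percent: float, formamide_percent: float,
               length_bp: int, mismatch_percent: float = 0.0) -> float:
    """Estimated melting temperature (deg C) of a DNA-DNA duplex.

    ASSUMPTION: the coefficients are a commonly cited empirical form,
    not values recovered from this document.
    """
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("sodium concentration and duplex length must be positive")
    perfect_match_tm = (81.5
                        + 16.6 * math.log10(na_molar)
                        + 0.41 * gc_percent
                        - 0.62 * formamide_percent
                        - 500.0 / length_bp)
    # [0155]: roughly 1 deg C lower for each 1% mismatch.
    return perfect_match_tm - 1.0 * mismatch_percent


def stringency_windows(tm: float) -> dict:
    """Hybridization temperature ranges (coolest, warmest) from [0157]."""
    return {
        "high": (tm - 20.0, tm - 5.0),
        "moderate": (tm - 35.0, tm - 20.0),
        "low": (tm - 50.0, tm - 35.0),
    }


if __name__ == "__main__":
    tm = tm_dna_dna(na_molar=0.3, gc_percent=45.0,
                    formamide_percent=50.0, length_bp=600)
    print("similarity:", percent_similarity(80, 110, 5, 5))
    print("Tm:", round(tm, 1), "high-stringency window:", stringency_windows(tm)["high"])
```

The mismatch correction is kept as a separate argument so that the same function can be used both for a perfectly matched probe and for an expected level of divergence between homologs.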

and 25 mM trisodium citrate. Low stringency hybridization can be obtained in the absence of organic solvent, e.g., formamide, whereas high stringency hybridization may be obtained in the presence of at least about 35% formamide, and more preferably at least about 50% formamide. Stringent temperature conditions will ordinarily include temperatures of at least about 30° C., more preferably of at least about 37° C., and most preferably of at least about 42° C. with formamide present. Varying additional parameters, such as hybridization time, the concentration of detergent, e.g., sodium dodecyl sulfate (SDS) and ionic strength, are well known to those skilled in the art. Various levels of stringency are accomplished by combining these various conditions as needed.

[0160] The washing steps that follow hybridization may also vary in stringency; the post-hybridization wash steps primarily determine hybridization specificity, with the most critical factors being temperature and the ionic strength of the final wash solution. Wash stringency can be increased by decreasing salt concentration or by increasing temperature. Stringent salt concentration for the wash steps will preferably be less than about 30 mM NaCl and 3 mM trisodium citrate, and most preferably less than about 15 mM NaCl and 1.5 mM trisodium citrate.

[0161] Thus, hybridization and wash conditions that may be used to bind and remove polynucleotides with less than the desired homology to the nucleic acid sequences or their complements that encode the present transcription factors include, for example:

[0162] 6×SSC at 65° C.;

[0163] 50% formamide, 4×SSC at 42° C.; or

[0164] 0.5×SSC, 0.1% SDS at 65° C.;

[0165] with, for example, two wash steps of 10-30 minutes each. Useful variations on these conditions will be readily apparent to those skilled in the art.

[0166] A person of skill in the art would not expect substantial variation among polynucleotide species encompassed within the scope of the present invention because the highly stringent conditions set forth in the above formulae yield structurally similar polynucleotides.

[0167] If desired, one may employ wash steps of even greater stringency, including about 0.2×SSC, 0.1% SDS at 65° C. and washing twice, each wash step being about 30 min, or about 0.1×SSC, 0.1% SDS at 65° C. and washing twice for 30 min. The temperature for the wash solutions will ordinarily be at least about 25° C., and for greater stringency at least about 42° C. Hybridization stringency may be increased further by using the same conditions as in the hybridization steps, with the wash temperature raised about 3° C. to about 5° C., and stringency may be increased even further by using the same conditions except the wash temperature is raised about 6° C. to about 9° C. For identification of less closely related homologs, wash steps may be performed at a lower temperature, e.g., 50° C.

[0168] An example of a low stringency wash step employs a solution and conditions of at least 25° C. in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS over 30 min. Greater stringency may be obtained at 42° C. in 15 mM NaCl, with 1.5 mM trisodium citrate, and 0.1% SDS over 30 min. Even higher stringency wash conditions are obtained at 65° C.-68° C. in a solution of 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. Wash procedures will generally employ at least two final wash steps. Additional variations on these conditions will be readily apparent to those skilled in the art (see, for example, US Patent Application No. 20010010913).

[0169] Stringency conditions can be selected such that an oligonucleotide that is perfectly complementary to the coding oligonucleotide hybridizes to the coding oligonucleotide with at least about a 5-10x higher signal to noise ratio than the ratio for hybridization of the perfectly complementary oligonucleotide to a nucleic acid encoding a transcription factor known as of the filing date of the application. It may be desirable to select conditions for a particular assay such that a higher signal to noise ratio, that is, about 15x or more, is obtained. Accordingly, a subject nucleic acid will hybridize to a unique coding oligonucleotide with at least a 2x or greater signal to noise ratio as compared to hybridization of the coding oligonucleotide to a nucleic acid encoding known polypeptide. The particular signal will depend on the label used in the relevant assay, e.g., a fluorescent label, a colorimetric label, a radioactive label, or the like. Labeled hybridization or PCR probes for detecting related polynucleotide sequences may be produced by oligolabeling, nick translation, end-labeling, or PCR amplification using a labeled nucleotide.

[0170] Encompassed by the invention are polynucleotide sequences encoding polypeptides capable of regulating transcription, said polynucleotide sequences being capable of hybridizing to the claimed polynucleotide sequences, including those listed in the Sequence Listing, or polynucleotides that encode the polypeptides listed in the Sequence Listing, and specifically SEQ ID NOs: 1-2237, and fragments thereof under various conditions of stringency. (See, e.g., Wahl and Berger (1987) Methods Enzymol. 152: 399-407; Kimmel (1987) Methods Enzymol. 152: 507-511.) Estimates of homology are provided by either DNA-DNA or DNA-RNA hybridization under conditions of stringency as is well understood by those skilled in the art (Hames and Higgins, Eds. (1985) Nucleic Acid Hybridisation, IRL Press, Oxford, U.K.). Stringency conditions can be adjusted to screen for moderately similar fragments, such as homologous sequences from distantly related organisms, to highly similar fragments, such as genes that duplicate functional enzymes from closely related organisms. Post-hybridization washes determine stringency conditions.

Identifying Polynucleotides or Nucleic Acids with Expression Libraries

[0171] In addition to hybridization methods, transcription factor homolog polypeptides can be obtained by screening an expression library using antibodies specific for one or more transcription factors. With the provision herein of the disclosed transcription factor, and transcription factor homolog nucleic acid sequences, the encoded polypeptide(s) can be expressed and purified in a heterologous expression system (e.g., E. coli) and used to raise antibodies (monoclonal or polyclonal) specific for the polypeptide(s) in question. Antibodies can also be raised against synthetic peptides derived from transcription factor, or transcription factor homolog, amino acid sequences. Methods of raising antibodies are well known in the art and are described in Harlow and Lane (1988), Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, New York. Such antibodies can then be used to screen an expression library produced from the plant from which it is desired to clone additional transcription factor homologs, using the methods described above. The selected cDNAs can be confirmed by sequencing and enzymatic activity.

Sequence Variations

[0172] It will readily be appreciated by those of skill in the art, that any of a variety of polynucleotide sequences are
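The wash conditions in paragraphs [0159]-[0168] are given sometimes as multiples of SSC and sometimes as millimolar salt concentrations. The sketch below shows one way to convert between the two and to compare washes using the rule in [0160] that stringency rises as salt falls or temperature rises. It assumes the conventional laboratory definition of 1×SSC as 150 mM NaCl with 15 mM trisodium citrate, which the text does not state but which agrees with its pairing of 0.2×SSC with 30 mM NaCl and 3 mM trisodium citrate. The names `ssc_to_mm`, `EXAMPLE_WASHES` and `at_least_as_stringent` are hypothetical.

```python
# ASSUMPTION: conventional 1x SSC composition (not stated in the text).
SSC_1X_NACL_MM = 150.0
SSC_1X_CITRATE_MM = 15.0


def ssc_to_mm(fold: float) -> tuple:
    """Return (mM NaCl, mM trisodium citrate) for a given SSC strength."""
    if fold < 0:
        raise ValueError("SSC strength cannot be negative")
    return (round(fold * SSC_1X_NACL_MM, 6), round(fold * SSC_1X_CITRATE_MM, 6))


# (label, SSC strength, minimum wash temperature in deg C), following [0168].
EXAMPLE_WASHES = [
    ("low", 0.2, 25.0),
    ("greater", 0.1, 42.0),
    ("even higher", 0.1, 65.0),
]


def at_least_as_stringent(wash_a: tuple, wash_b: tuple) -> bool:
    """True if wash_a uses no more salt and is no cooler than wash_b ([0160])."""
    _, ssc_a, temp_a = wash_a
    _, ssc_b, temp_b = wash_b
    return ssc_a <= ssc_b and temp_a >= temp_b


if __name__ == "__main__":
    for label, fold, temp in EXAMPLE_WASHES:
        nacl, citrate = ssc_to_mm(fold)
        print(f"{label}: {fold}x SSC = {nacl} mM NaCl, {citrate} mM citrate at {temp} C")
```

Under the same assumption, the 750 mM NaCl and 75 mM trisodium citrate ceiling in [0159] corresponds to 5×SSC.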

capable of encoding the transcription factors and transcription factor homolog polypeptides of the invention. Due to the degeneracy of the genetic code, many different polynucleotides can encode identical and/or substantially similar polypeptides in addition to those sequences illustrated in the Sequence Listing. Nucleic acids having a sequence that differs from the sequences shown in the Sequence Listing, or complementary sequences, that encode functionally equivalent peptides (i.e., peptides having some degree of equivalent or similar biological activity) but differ in sequence from the sequence shown in the Sequence Listing due to degeneracy in the genetic code, are also within the scope of the invention.

[0173] Altered polynucleotide sequences encoding polypeptides include those sequences with deletions, insertions, or substitutions of different nucleotides, resulting in a polynucleotide encoding a polypeptide with at least one functional characteristic of the instant polypeptides. Included within this definition are polymorphisms which may or may not be readily detectable using a particular oligonucleotide probe of the polynucleotide encoding the instant polypeptides, and improper or unexpected hybridization to allelic variants, with a locus other than the normal chromosomal locus for the polynucleotide sequence encoding the instant polypeptides.

[0174] Allelic variant refers to any of two or more alternative forms of a gene occupying the same chromosomal locus. Allelic variation arises naturally through mutation, and may result in phenotypic polymorphism within populations. Gene mutations can be silent (i.e., no change in the encoded polypeptide) or may encode polypeptides having altered amino acid sequence. The term allelic variant is also used herein to denote a protein encoded by an allelic variant of a gene. Splice variant refers to alternative forms of RNA transcribed from a gene. Splice variation arises naturally through use of alternative splicing sites within a transcribed RNA molecule, or less commonly between separately transcribed RNA molecules, and may result in several mRNAs transcribed from the same gene. Splice variants may encode polypeptides having altered amino acid sequence. The term splice variant is also used herein to denote a protein encoded by a splice variant of an mRNA transcribed from a gene.

[0175] Those skilled in the art would recognize that, for example, G47, SEQ ID NO: 12, represents a single transcription factor; allelic variation and alternative splicing may be expected to occur. Allelic variants of SEQ ID NO: 11 can be cloned by probing cDNA or genomic libraries from different individual organisms according to standard procedures. Allelic variants of the DNA sequence shown in SEQ ID NO: 11, including those containing silent mutations and those in which mutations result in amino acid sequence changes, are within the scope of the present invention, as are proteins which are allelic variants of SEQ ID NO: 12. cDNAs generated from alternatively spliced mRNAs, which retain the properties of the transcription factor are included within the scope of the present invention, as are polypeptides encoded by such cDNAs and mRNAs. Allelic variants and splice variants of these sequences can be cloned by probing cDNA or genomic libraries from different individual organisms or tissues according to standard procedures known in the art (see U.S. Pat. No. 6,388,064).

[0176] Thus, in addition to the sequences set forth in the Sequence Listing, the invention also encompasses related nucleic acid molecules that include allelic or splice variants of SEQ ID NO: 2N-1, where N=1-335, sequences that are orthologous to SEQ ID NOs: 761-1348, 1557-2101, and 2124-2237, sequences that are orthologous to paralogous to SEQ ID NOs: 1349-1556, variant sequences that have been shown to confer an altered trait listed in Table 4 (SEQ ID NOs: 2102-2123) listed in the Sequence Listing, and sequences that are complementary to any of the above nucleotide sequences. Related nucleic acid molecules also include nucleotide sequences encoding a polypeptide comprising or consisting essentially of a substitution, modification, addition and/or deletion of one or more amino acid residues compared to the polypeptides as set forth in the Sequence Listing. Such related polypeptides may comprise, for example, additions and/or deletions of one or more N-linked or O-linked glycosylation sites, or an addition and/or a deletion of one or more cysteine residues.

[0177] For example, Table 1 illustrates, e.g., that the codons AGC, AGT, TCA, TCC, TCG, and TCT all encode the same amino acid: serine. Accordingly, at each position in the sequence where there is a codon encoding serine, any of the above trinucleotide sequences can be used without altering the encoded polypeptide.

TABLE 1

Amino acid                  Possible Codons
Alanine         Ala   A     GCA GCC GCG GCU
Cysteine        Cys   C     TGC TGT
Aspartic acid   Asp   D     GAC GAT
Glutamic acid   Glu   E     GAA GAG
Phenylalanine   Phe   F     TTC TTT
Glycine         Gly   G     GGA GGC GGG GGT
Histidine       His   H     CAC CAT
Isoleucine      Ile   I     ATA ATC ATT
Lysine          Lys   K     AAA AAG
Leucine         Leu   L     TTA TTG CTA CTC CTG CTT
Methionine      Met   M     ATG
Asparagine      Asn   N     AAC AAT
Proline         Pro   P     CCA CCC CCG CCT
Glutamine       Gln   Q     CAA CAG
Arginine        Arg   R     AGA AGG CGA CGC CGG CGT
Serine          Ser   S     AGC AGT TCA TCC TCG TCT
Threonine       Thr   T     ACA ACC ACG ACT
Valine          Val   V     GTA GTC GTG GTT
Tryptophan      Trp   W     TGG
Tyrosine        Tyr   Y     TAC TAT

[0178] Sequence alterations that do not change the amino acid sequence encoded by the polynucleotide are termed "silent" variations. With the exception of the codons ATG and TGG, encoding methionine and tryptophan, respectively, any of the possible codons for the same amino acid can be substituted by a variety of techniques, e.g., site-directed
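The codon degeneracy summarized in Table 1 and the "silent" variations of paragraph [0178] can be illustrated with a short Python sketch. It encodes the table as a dictionary, translates a coding sequence, and builds a variant in which every codon that has a synonym is swapped for another codon for the same amino acid. The function names are hypothetical, and the alanine codon printed as GCU in the table is written here in DNA form (GCT) so that all entries use the same alphabet; that normalization is an assumption of the sketch.

```python
# Table 1 as a mapping from one-letter amino acid code to its DNA codons.
# ASSUMPTION: "GCU" in the printed table is represented as "GCT".
CODONS = {
    "A": ["GCA", "GCC", "GCG", "GCT"],
    "C": ["TGC", "TGT"],
    "D": ["GAC", "GAT"],
    "E": ["GAA", "GAG"],
    "F": ["TTC", "TTT"],
    "G": ["GGA", "GGC", "GGG", "GGT"],
    "H": ["CAC", "CAT"],
    "I": ["ATA", "ATC", "ATT"],
    "K": ["AAA", "AAG"],
    "L": ["TTA", "TTG", "CTA", "CTC", "CTG", "CTT"],
    "M": ["ATG"],
    "N": ["AAC", "AAT"],
    "P": ["CCA", "CCC", "CCG", "CCT"],
    "Q": ["CAA", "CAG"],
    "R": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGT"],
    "S": ["AGC", "AGT", "TCA", "TCC", "TCG", "TCT"],
    "T": ["ACA", "ACC", "ACG", "ACT"],
    "V": ["GTA", "GTC", "GTG", "GTT"],
    "W": ["TGG"],
    "Y": ["TAC", "TAT"],
}
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}


def translate(dna: str) -> str:
    """Translate a coding sequence without stop codons (KeyError on a stop)."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(codon: str) -> list:
    """Other codons from Table 1 that encode the same amino acid."""
    return [c for c in CODONS[CODON_TO_AA[codon]] if c != codon]


def make_silent_variant(dna: str) -> str:
    """Swap each codon for the next synonymous codon in table order.

    ATG and TGG have no synonyms, so they are returned unchanged ([0178]).
    """
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        options = CODONS[CODON_TO_AA[codon]]
        out.append(options[(options.index(codon) + 1) % len(options)])
    return "".join(out)


if __name__ == "__main__":
    original = "ATGAGCTGG"
    variant = make_silent_variant(original)
    print(original, translate(original))
    print(variant, translate(variant))
```

Rotating through the table order is only one arbitrary way to choose a synonym; a real design would normally weight the choice by host codon usage.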

mutagenesis, available in the art. Accordingly, any and all such variations of a sequence selected from the above table are a feature of the invention.

[0179] In addition to silent variations, other conservative variations that alter one, or a few amino acids in the encoded polypeptide, can be made without altering the function of the polypeptide, these conservative variants are, likewise, a feature of the invention.

[0180] For example, substitutions, deletions and insertions introduced into the sequences provided in the Sequence Listing, are also envisioned by the invention. Such sequence modifications can be engineered into a sequence by site-directed mutagenesis (Wu (ed.) Methods Enzymol. (1993) vol. 217, Academic Press) or the other methods noted below. Amino acid substitutions are typically of single residues; insertions usually will be on the order of about from 1 to 10 amino acid residues; and deletions will range about from 1 to 30 residues. In preferred embodiments, deletions or insertions are made in adjacent pairs, e.g., a deletion of two residues or insertion of two residues. Substitutions, deletions, insertions or any combination thereof can be combined to arrive at a sequence. The mutations that are made in the polynucleotide encoding the transcription factor should not place the sequence out of reading frame and should not create complementary regions that could produce secondary mRNA structure. Preferably, the polypeptide encoded by the DNA performs the desired function.

[0181] Conservative substitutions are those in which at least one residue in the amino acid sequence has been removed and a different residue inserted in its place. Such substitutions generally are made in accordance with the Table 2 when it is desired to maintain the activity of the protein. Table 2 shows amino acids which can be substituted for an amino acid in a protein and which are typically regarded as conservative substitutions.

TABLE 2

Residue    Conservative Substitutions
Ala        Ser
Arg        Lys
Asn        Gln; His
Asp        Glu
Gln        Asn
Cys        Ser
Glu        Asp
Gly        Pro
His        Asn; Gln
Ile        Leu; Val
Leu        Ile; Val
Lys        Arg; Gln
Met        Leu; Ile
Phe        Met; Leu; Tyr
Ser        Thr; Gly
Thr        Ser; Val
Trp        Tyr
Tyr        Trp; Phe
Val        Ile; Leu

[0182] Similar substitutions are those in which at least one residue in the amino acid sequence has been removed and a different residue inserted in its place. Such substitutions generally are made in accordance with the Table 3 when it is desired to maintain the activity of the protein. Table 3 shows amino acids which can be substituted for an amino acid in a protein and which are typically regarded as structural and functional substitutions. For example, a residue in column 1 of Table 3 may be substituted with a residue in column 2; in addition, a residue in column 2 of Table 3 may be substituted with the residue of column 1.

TABLE 3

Residue    Similar Substitutions
Ala        Ser; Thr; Gly; Val; Leu; Ile
Arg        Lys; His; Gly
Asn        Gln; His; Gly; Ser; Thr
Asp        Glu; Ser; Thr
Gln        Asn; Ala
Cys        Ser; Gly
Glu        Asp
Gly        Pro; Arg
His        Asn; Gln; Tyr; Phe; Lys; Arg
Ile        Ala; Leu; Val; Gly; Met
Leu        Ala; Ile; Val; Gly; Met
Lys        Arg; His; Gln; Gly; Pro
Met        Leu; Ile; Phe
Phe        Met; Leu; Tyr; Trp; His; Val; Ala
Ser        Thr; Gly; Asp; Ala; Val; Ile; His
Thr        Ser; Val; Ala; Gly
Trp        Tyr; Phe; His
Tyr        Trp; Phe; His
Val        Ala; Ile; Leu; Gly; Thr; Ser; Glu

[0183] Substitutions that are less conservative than those in Table 2 can be selected by picking residues that differ more significantly in their effect on maintaining (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a sheet or helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain. The substitutions which in general are expected to produce the greatest changes in protein properties will be those in which (a) a hydrophilic residue, e.g., seryl or threonyl, is substituted for (or by) a hydrophobic residue, e.g., leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g., glycine.

Further Modifying Sequences of the Invention: Mutation/Forced Evolution

[0184] In addition to generating silent or conservative substitutions as noted, above, the present invention optionally includes methods of modifying the sequences of the Sequence Listing. In the methods, nucleic acid or protein modification methods are used to alter the given sequences to produce new sequences and/or to chemically or enzymatically modify given sequences to change the properties of the nucleic acids or proteins.

[0185] Thus, in one embodiment, given nucleic acid sequences are modified, e.g., according to standard mutagenesis or artificial evolution methods to produce modified sequences. The modified sequences may be created using purified natural polynucleotides isolated from any organism or may be synthesized from purified compositions and chemicals using chemical means well known to those of skill in the art. For example, Ausubel, supra, provides additional details on mutagenesis methods. Artificial forced evolution methods are described, for example, by Stemmer (1994) Nature 370: 389-391, Stemmer (1994) Proc. Natl. Acad. Sci. 91: 10747-
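The conservative substitution table (Table 2) lends itself to a simple lookup. The following Python sketch, offered only as an illustration, classifies each difference between two aligned residue lists as conservative or not according to Table 2. The function names are hypothetical. The lookup is treated as directional (column 1 replaced by column 2), which is an assumption: the text states the two-way reading explicitly only for Table 3.

```python
# Table 2: residue -> substitutions typically regarded as conservative.
TABLE_2 = {
    "Ala": {"Ser"},
    "Arg": {"Lys"},
    "Asn": {"Gln", "His"},
    "Asp": {"Glu"},
    "Gln": {"Asn"},
    "Cys": {"Ser"},
    "Glu": {"Asp"},
    "Gly": {"Pro"},
    "His": {"Asn", "Gln"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln"},
    "Met": {"Leu", "Ile"},
    "Phe": {"Met", "Leu", "Tyr"},
    "Ser": {"Thr", "Gly"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Tyr"},
    "Tyr": {"Trp", "Phe"},
    "Val": {"Ile", "Leu"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if Table 2 lists `replacement` as a substitution for `original`.

    ASSUMPTION: the table is read in one direction only.
    """
    return replacement in TABLE_2.get(original, set())


def classify_differences(reference: list, variant: list) -> list:
    """List (position, original, replacement, label) for each changed residue.

    Both inputs are equal-length lists of three-letter residue codes;
    positions are 1-based.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    report = []
    for position, (ref, var) in enumerate(zip(reference, variant), start=1):
        if ref != var:
            label = "conservative" if is_conservative(ref, var) else "other"
            report.append((position, ref, var, label))
    return report


if __name__ == "__main__":
    for row in classify_differences(["Met", "Ala", "Lys"], ["Met", "Ser", "Glu"]):
        print(row)
```

The same structure could hold Table 3 by swapping in its entries and, following [0182], checking both directions.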

10751, and U.S. Pat. Nos. 5,811,238, 5,837,500, and 6,242,568. Methods for engineering synthetic transcription factors and other polypeptides are described, for example, by Zhang et al. (2000) J. Biol. Chem. 275: 33850-33860, Liu et al. (2001) J. Biol. Chem. 276: 11323-11334, and Isalan et al. (2001) Nature Biotechnol. 19: 656-660. Many other mutation and evolution methods are also available and expected to be within the skill of the practitioner.

[0186] Similarly, chemical or enzymatic alteration of expressed nucleic acids and polypeptides can be performed by standard methods. For example, sequence can be modified by addition of lipids, sugars, peptides, organic or inorganic compounds, by the inclusion of modified nucleotides or amino acids, or the like. For example, protein modification techniques are illustrated in Ausubel, supra. Further details on chemical and enzymatic modifications can be found herein. These modification methods can be used to modify any given sequence, or to modify any sequence produced by the various mutation and artificial evolution modification methods noted herein.

[0187] Accordingly, the invention provides for modification of any given nucleic acid by mutation, evolution, chemical or enzymatic modification, or other available methods, as well as for the products produced by practicing such methods, e.g., using the sequences herein as a starting substrate for the various modification approaches.

[0188] For example, optimized coding sequence containing codons preferred by a particular prokaryotic or eukaryotic host can be used e.g., to increase the rate of translation or to produce recombinant RNA transcripts having desirable properties, such as a longer half-life, as compared with transcripts produced using a non-optimized sequence. Translation stop codons can also be modified to reflect host preference. For example, preferred stop codons for Saccharomyces cerevisiae and mammals are TAA and TGA, respectively. The preferred stop codon for monocotyledonous plants is TGA, whereas insects and E. coli prefer to use TAA as the stop codon.

[0189] The polynucleotide sequences of the present invention can also be engineered in order to alter a coding sequence for a variety of reasons, including but not limited to, alterations which modify the sequence to facilitate cloning, processing and/or expression of the gene product. For example, alterations are optionally introduced using techniques which are well known in the art, e.g., site-directed mutagenesis, to insert new restriction sites, to alter glycosylation patterns, to change codon preference, to introduce splice sites, etc.

[0190] Furthermore, a fragment or domain derived from any of the polypeptides of the invention can be combined with domains derived from other transcription factors or synthetic domains to modify the biological activity of a transcription factor.

ecules that direct expression of polypeptides of the invention in appropriate host cells, transgenic plants, in vitro translation systems, or the like. Due to the inherent degeneracy of the genetic code, nucleic acid sequences which encode substantially the same or a functionally equivalent amino acid sequence can be substituted for any listed sequence to provide for cloning and expressing the relevant homolog.

[0192] The transgenic plants of the present invention comprising recombinant polynucleotide sequences are generally derived from parental plants, which may themselves be non-transformed (or non-transgenic) plants. These transgenic plants may either have a transcription factor gene "knocked out" (for example, with a genomic insertion by homologous recombination, an antisense or ribozyme construct) or expressed to a normal or wild-type extent. The transgenic plants of the present invention include, for example, a plant in which a transcription factor gene encoding a transcription factor polypeptide has been eliminated by homologous recombination, said transcription factor polypeptide comprising a HLH/MYC conserved domain that is at least 85% identical to the conserved HLH/MYC domain of SEQ ID NO: 594 (amino acid coordinates 65-137). Overexpressing transgenic "progeny" plants will exhibit greater mRNA levels, wherein the mRNA encodes a transcription factor, that is, a DNA-binding protein that is capable of binding to a DNA regulatory sequence and inducing transcription, and preferably, expression of a plant trait gene. Preferably, the mRNA expression level will be at least three-fold greater than that of the parental plant, or more preferably at least ten-fold greater mRNA levels compared to said parental plant, and most preferably at least fifty-fold greater compared to said parental plant.

Vectors, Promoters, and Expression Systems

[0193] The present invention includes recombinant constructs comprising one or more of the nucleic acid sequences herein. The constructs typically comprise a vector, such as a plasmid, a cosmid, a phage, a virus (e.g., a plant virus), a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), or the like, into which a nucleic acid sequence of the invention has been inserted, in a forward or reverse orientation. In a preferred aspect of this embodiment, the construct further comprises regulatory sequences, including, for example, a promoter, operably linked to the sequence. Large numbers of suitable vectors and promoters are known to those of skill in the art, and are commercially available.

[0194] General texts that describe molecular biological techniques useful herein, including the use and production of
For instance, a DNA-binding domain derived from a vectors, promoters and many other relevant topics, include transcription factor of the invention can be combined with the Berger, Sambrook, Supra and Ausubel, Supra. Any of the activation domain of another transcription factor or with a identified sequences can be incorporated into a cassette or synthetic activation domain. A transcription activation vector, e.g., for expression in plants. A number of expression domain assists in initiating transcription from a DNA-binding vectors suitable for stable transformation of plant cells or for site. Examples include the transcription activation region of the establishment of transgenic plants have been described VP16 or GAL4 (Moore et al. (1998) Proc. Natl. Acad. Sci.95: including those described in Weissbach and Weissbach 376-381: Aoyama et al. (1995) Plant Cell 7: 1773-1785), (1989) Methods for Plant Molecular Biology, Academic peptides derived from bacterial sequences (Ma and Ptashne Press, and Gelvin et al. (1990) Plant Molecular Biology (1987) Cell 51: 113-119) and synthetic peptides (Giniger and Manual, Kluwer Academic Publishers. Specific examples Ptashne (1987) Nature 330: 670-672). include those derived from a Ti plasmid of Agrobacterium tumefaciens, as well as those disclosed by Herrera-Estrella et Expression and Modification of Polypeptides al. (1983) Nature 303: 209, Bevan (1984) Nucleic Acids Res. 0191 Typically, polynucleotide sequences of the inven 12: 8711-8721, Klee (1985) Bio/Technology 3: 637-642, for tion are incorporated into recombinant DNA (or RNA) mol dicotyledonous plants. US 2011/007880.6 A1 Mar. 31, 2011 20
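
By way of a non-limiting illustration of the host-specific stop codon preferences noted above, the following minimal Python sketch replaces the terminal stop codon of a coding sequence with the codon preferred by a chosen host. The host labels, the dictionary name PREFERRED_STOP and the function name set_preferred_stop are hypothetical names assumed for this sketch and do not appear in the specification; only the codon preferences themselves are taken from the text.

```python
# Preferred stop codons as stated in the text above.
# The host keys are hypothetical labels chosen for this sketch.
PREFERRED_STOP = {
    "s_cerevisiae": "TAA",
    "mammal": "TGA",
    "monocot": "TGA",
    "insect": "TAA",
    "e_coli": "TAA",
}

# The three stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def set_preferred_stop(cds: str, host: str) -> str:
    """Return the coding sequence with its terminal stop codon
    replaced by the stop codon preferred by the given host."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence does not end in a stop codon")
    # Only the final codon is changed; the encoded polypeptide is unaffected.
    return cds[:-3] + PREFERRED_STOP[host]
```

For instance, calling set_preferred_stop("ATGGCCTAG", "monocot") swaps the terminal TAG for TGA and leaves the rest of the sequence unchanged. Optimizing the internal codons for a host would additionally require a host codon usage table, which is not given in the text and is therefore left out of this sketch.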

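The preferred mRNA expression levels stated above for overexpressing plants (at least three-fold, ten-fold, or fifty-fold greater than the parental plant) can be illustrated, without limitation, by the following minimal Python sketch. The function name overexpression_tier, its parameter names and the returned labels are hypothetical and assumed only for this sketch; the measurement method and units are likewise not specified here, so both values are simply taken to be on the same scale.

```python
def overexpression_tier(transgenic_mrna: float, parental_mrna: float) -> str:
    """Classify the fold increase in mRNA level of a transgenic plant
    relative to its parental plant, using the thresholds in the text."""
    if parental_mrna <= 0:
        raise ValueError("parental mRNA level must be positive")
    fold = transgenic_mrna / parental_mrna
    if fold >= 50:
        return "most preferred"   # at least fifty-fold greater
    if fold >= 10:
        return "more preferred"   # at least ten-fold greater
    if fold >= 3:
        return "preferred"        # at least three-fold greater
    return "below threshold"
```

The thresholds are checked from highest to lowest so that each measurement is assigned the most stringent tier it satisfies.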
[0195] Alternatively, non-Ti vectors can be used to transfer the DNA into monocotyledonous plants and cells by using free DNA delivery techniques. Such methods can involve, for example, the use of liposomes, electroporation, microprojectile bombardment, silicon carbide whiskers, and viruses. By using these methods transgenic plants such as wheat, rice (Christou (1991) Bio/Technology 9: 957-962) and corn (Gordon-Kamm (1990) Plant Cell 2: 603-618) can be produced. An immature embryo can also be a good target tissue for monocots for direct DNA delivery techniques by using the particle gun (Weeks et al. (1993) Plant Physiol. 102: 1077-1084; Vasil (1993) Bio/Technology 10: 667-674; Wan and Lemeaux (1994) Plant Physiol. 104: 37-48), and for Agrobacterium-mediated DNA transfer (Ishida et al. (1996) Nature Biotechnol. 14: 745-750).

[0196] Typically, plant transformation vectors include one or more cloned plant coding sequences (genomic or cDNA) under the transcriptional control of 5' and 3' regulatory sequences and a dominant selectable marker. Such plant transformation vectors typically also contain a promoter (e.g., a regulatory region controlling inducible or constitutive, environmentally- or developmentally-regulated, or cell- or tissue-specific expression), a transcription initiation start site, an RNA processing signal (such as intron splice sites), a transcription termination site, and/or a polyadenylation signal.

[0197] A potential utility for the transcription factor polynucleotides disclosed herein is the isolation of promoter elements from these genes that can be used to program expression in plants of any genes. Each transcription factor gene disclosed herein is expressed in a unique fashion, as determined by promoter elements located upstream of the start of translation, and additionally within an intron of the transcription factor gene or downstream of the termination codon of the gene. As is well known in the art, for a significant portion of genes, the promoter sequences are located entirely in the region directly upstream of the start of translation. In such cases, typically the promoter sequences are located within 2.0 kb of the start of translation, or within 1.5 kb of the start of translation, frequently within 1.0 kb of the start of translation, and sometimes within 0.5 kb of the start of translation.

[0198] The promoter sequences can be isolated according to methods known to one skilled in the art.

[0199] Examples of constitutive plant promoters which can be useful for expressing the TF sequence include: the cauliflower mosaic virus (CaMV) 35S promoter, which confers constitutive, high-level expression in most plant tissues (see, e.g., Odell et al. (1985) Nature 313: 810-812); the nopaline synthase promoter (An et al. (1988) Plant Physiol. 88: 547-552); and the octopine synthase promoter (Fromm et al. (1989) Plant Cell 1: 977-984).

[0200] A variety of plant gene promoters that regulate gene expression in response to environmental, hormonal, chemical, developmental signals, and in a tissue-active manner can be used for expression of a TF sequence in plants. Choice of a promoter is based largely on the phenotype of interest and is determined by such factors as tissue (e.g., seed, fruit, root, pollen, vascular tissue, flower, carpel, etc.), inducibility (e.g., in response to wounding, heat, cold, drought, light, pathogens, etc.), timing, developmental stage, and the like. Numerous known promoters have been characterized and can favorably be employed to promote expression of a polynucleotide of the invention in a transgenic plant or cell of interest. For example, tissue specific promoters include: seed-specific promoters (such as the napin, phaseolin or DC3 promoter described in U.S. Pat. No. 5,773,697), fruit-specific promoters that are active during fruit ripening (such as the dru1 promoter (U.S. Pat. No. 5,783,393), or the 2A11 promoter (U.S. Pat. No. 4,943,674) and the tomato polygalacturonase promoter (Bird et al. (1988) Plant Mol. Biol. 11: 651-662)), root-specific promoters, such as those disclosed in U.S. Pat. Nos. 5,618,988, 5,837,848 and 5,905,186, pollen-active promoters such as PTA29, PTA26 and PTA13 (U.S. Pat. No. 5,792,929), promoters active in vascular tissue (Ringli and Keller (1998) Plant Mol. Biol. 37: 977-988), flower-specific (Kaiser et al. (1995) Plant Mol. Biol. 28: 231-243), pollen (Baerson et al. (1994) Plant Mol. Biol. 26: 1947-1959), carpels (Ohl et al. (1990) Plant Cell 2: 837-848), pollen and ovules (Baerson et al. (1993) Plant Mol. Biol. 22: 255-267), auxin-inducible promoters (such as that described in Van der Kop et al. (1999) Plant Mol. Biol. 39: 979-990 or Baumann et al. (1999) Plant Cell 11: 323-334), cytokinin-inducible promoter (Guevara-Garcia (1998) Plant Mol. Biol. 38: 743-753), promoters responsive to gibberellin (Shi et al. (1998) Plant Mol. Biol. 38: 1053-1060, Willmott et al. (1998) Plant Molec. Biol. 38: 817-825) and the like. Additional promoters are those that elicit expression in response to heat (Ainley et al. (1993) Plant Mol. Biol. 22: 13-23), light (e.g., the pea rbcS-3A promoter, Kuhlemeier et al. (1989) Plant Cell 1: 471-478, and the maize rbcS promoter, Schaffner and Sheen (1991) Plant Cell 3: 997-1012); wounding (e.g., wun1, Siebertz et al. (1989) Plant Cell 1: 961-968); pathogens (such as the PR-1 promoter described in Buchel et al. (1999) Plant Mol. Biol. 40: 387-396, and the PDF1.2 promoter described in Manners et al. (1998) Plant Mol. Biol. 38: 1071-1080), and chemicals such as methyl jasmonate or salicylic acid (Gatz (1997) Annu. Rev. Plant Physiol. Plant Mol. Biol. 48: 89-108). In addition, the timing of the expression can be controlled by using promoters such as those acting at senescence (Gan and Amasino (1995) Science 270: 1986-1988); or late seed development (Odell et al. (1994) Plant Physiol. 106: 447-458).

[0201] Plant expression vectors can also include RNA processing signals that can be positioned within, upstream or downstream of the coding sequence. In addition, the expression vectors can include additional regulatory sequences from the 3'-untranslated region of plant genes, e.g., a 3' terminator region to increase stability of the mRNA, such as the PI-II terminator region of potato or the octopine or nopaline synthase 3' terminator regions.

Additional Expression Elements

[0202] Specific initiation signals can aid in efficient translation of coding sequences. These signals can include, e.g., the ATG initiation codon and adjacent sequences. In cases where a coding sequence, its initiation codon and upstream sequences are inserted into the appropriate expression vector, no additional translational control signals may be needed. However, in cases where only coding sequence (e.g., a mature protein coding sequence), or a portion thereof, is inserted, exogenous transcriptional control signals including the ATG initiation codon can be separately provided. The initiation codon is provided in the correct reading frame to facilitate transcription. Exogenous transcriptional elements and initiation codons can be of various origins, both natural and synthetic. The efficiency of expression can be enhanced by the inclusion of enhancers appropriate to the cell system in use.

Expression Hosts

[0203] The present invention also relates to host cells which are transduced with vectors of the invention, and the production of polypeptides of the invention (including fragments thereof) by recombinant techniques. Host cells are genetically engineered (i.e., nucleic acids are introduced, e.g., transduced, transformed or transfected) with the vectors of this invention, which may be, for example, a cloning vector or an expression vector comprising the relevant nucleic acids herein. The vector is optionally a plasmid, a viral particle, a phage, a naked nucleic acid, etc. The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants, or amplifying the relevant gene. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to those skilled in the art and in the references cited herein, including Sambrook, supra and Ausubel, supra.

[0204] The host cell can be a eukaryotic cell, such as a yeast cell, or a plant cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Plant protoplasts are also suitable for some applications. For example, the DNA fragments are introduced into plant tissues, cultured plant cells or plant protoplasts by standard methods including electroporation (Fromm et al. (1985) Proc. Natl. Acad. Sci. 82: 5824-5828), infection by viral vectors such as cauliflower mosaic virus (CaMV) (Hohn et al. (1982) Molecular Biology of Plant Tumors, Academic Press, New York, N.Y., pp. 549-560; U.S. Pat. No. 4,407,956), high velocity ballistic penetration by small particles with the nucleic acid either within the matrix of small beads or particles, or on the surface (Klein et al. (1987) Nature 327: 70-73), use of pollen as vector (WO 85/01856), or use of Agrobacterium tumefaciens or A. rhizogenes carrying a T-DNA plasmid in which DNA fragments are cloned. The T-DNA plasmid is transmitted to plant cells upon infection by Agrobacterium tumefaciens, and a portion is stably integrated into the plant genome (Horsch et al. (1984) Science 233: 496-498; Fraley et al. (1983) Proc. Natl. Acad. Sci. 80: 4803-4807).

[0205] The cell can include a nucleic acid of the invention that encodes a polypeptide, wherein the cell expresses a polypeptide of the invention. The cell can also include vector sequences, or the like. Furthermore, cells and transgenic plants that include any polypeptide or nucleic acid above or throughout this specification, e.g., produced by transduction of a vector of the invention, are an additional feature of the invention.

[0206] For long-term, high-yield production of recombinant proteins, stable expression can be used. Host cells transformed with a nucleotide sequence encoding a polypeptide of the invention are optionally cultured under conditions suitable for the expression and recovery of the encoded protein from cell culture. The protein or fragment thereof produced by a recombinant cell may be secreted, membrane-bound, or contained intracellularly, depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing polynucleotides encoding mature proteins of the invention can be designed with signal sequences which direct secretion of the mature polypeptides through a prokaryotic or eukaryotic cell membrane.

Modified Amino Acid Residues

[0207] Polypeptides of the invention may contain one or more modified amino acid residues. The presence of modified amino acids may be advantageous in, for example, increasing polypeptide half-life, reducing polypeptide antigenicity or toxicity, increasing polypeptide storage stability, or the like. Amino acid residue(s) are modified, for example, co-translationally or post-translationally during recombinant production or modified by synthetic or chemical means.

[0208] Non-limiting examples of a modified amino acid residue include incorporation or other use of acetylated amino acids, glycosylated amino acids, sulfated amino acids, prenylated (e.g., farnesylated, geranylgeranylated) amino acids, PEG modified (e.g., "PEGylated") amino acids, biotinylated amino acids, carboxylated amino acids, phosphorylated amino acids, etc. References adequate to guide one of skill in the modification of amino acid residues are replete throughout the literature.

[0209] The modified amino acid residues may prevent or increase affinity of the polypeptide for another molecule, including, but not limited to, polynucleotide, proteins, carbohydrates, lipids and lipid derivatives, and other organic or synthetic compounds.

Identification of Additional Factors

[0210] A transcription factor provided by the present invention can also be used to identify additional endogenous or exogenous molecules that can affect a phenotype or trait of interest. On the one hand, such molecules include organic (small or large molecules) and/or inorganic compounds that affect expression of (i.e., regulate) a particular transcription factor. Alternatively, such molecules include endogenous molecules that are acted upon either at a transcriptional level by a transcription factor of the invention to modify a phenotype as desired. For example, the transcription factors can be employed to identify one or more downstream genes that are subject to a regulatory effect of the transcription factor. In one approach, a transcription factor or transcription factor homolog of the invention is expressed in a host cell, e.g., a transgenic plant cell, tissue or explant, and expression products, either RNA or protein, of likely or random targets are monitored, e.g., by hybridization to a microarray of nucleic acid probes corresponding to genes expressed in a tissue or cell type of interest, by two-dimensional gel electrophoresis of protein products, or by any other method known in the art for assessing expression of gene products at the level of RNA or protein. Alternatively, a transcription factor of the invention can be used to identify promoter sequences (such as binding sites on DNA sequences) involved in the regulation of a downstream target. After identifying a promoter sequence, interactions between the transcription factor and the promoter sequence can be modified by changing specific nucleotides in the promoter sequence or specific amino acids in the transcription factor that interact with the promoter sequence to alter a plant trait. Typically, transcription factor DNA-binding sites are identified by gel shift assays. After identifying the promoter regions, the promoter region sequences can be employed in double-stranded DNA arrays to identify molecules that affect the interactions of the transcription factors with their promoters (Bulyk et al. (1999) Nature Biotechnol. 17: 573-577).

[0211] The identified transcription factors are also useful to identify proteins that modify the activity of the transcription factor. Such modification can occur by covalent modification, such as by phosphorylation, or by protein-protein (homo- or heteropolymer) interactions. Any method suitable for detecting protein-protein interactions can be employed. Among the methods that can be employed are co-immunoprecipitation, cross-linking and co-purification through gradients or chromatographic columns, and the two-hybrid yeast system.

[0212] The two-hybrid system detects protein interactions in vivo and is described in Chien et al. (1991) Proc. Natl. Acad. Sci. 88: 9578-9582 and is commercially available from Clontech (Palo Alto, Calif.). In such a system, plasmids are constructed that encode two hybrid proteins: one consists of the DNA-binding domain of a transcription activator protein fused to the TF polypeptide and the other consists of the transcription activator protein's activation domain fused to an unknown protein that is encoded by a cDNA that has been recombined into the plasmid as part of a cDNA library. The DNA-binding domain fusion plasmid and the cDNA library are transformed into a strain of the yeast Saccharomyces cerevisiae that contains a reporter gene (e.g., lacZ) whose regulatory region contains the transcription activator's binding site. Neither hybrid protein alone can activate transcription of the reporter gene. Interaction of the two hybrid proteins reconstitutes the functional activator protein and results in expression of the reporter gene, which is detected by an assay for the reporter gene product. Then, the library plasmids responsible for reporter gene expression are isolated and sequenced to identify the proteins encoded by the library plasmids. After identifying proteins that interact with the transcription factors, assays for compounds that interfere with the TF protein-protein interactions can be performed.

Identification of Modulators

[0213] In addition to the intracellular molecules described above, extracellular molecules that alter activity or expression of a transcription factor, either directly or indirectly, can be identified. For example, the methods can entail first placing a candidate molecule in contact with a plant or plant cell. The molecule can be introduced by topical administration, such as spraying or soaking of a plant, or incubating a plant in a solution containing the molecule, and then the molecule's effect on the expression or activity of the TF polypeptide or the expression of the polynucleotide is monitored. Changes in the expression of the TF polypeptide can be monitored by use of polyclonal or monoclonal antibodies, gel electrophoresis or the like. Changes in the expression of the corresponding polynucleotide sequence can be detected by use of microarrays, Northerns, quantitative PCR, or any other technique for monitoring changes in mRNA expression. These techniques are exemplified in Ausubel et al. (eds.) Current Protocols in Molecular Biology, John Wiley & Sons (1998, and supplements through 2001). Changes in the activity of the transcription factor can be monitored, directly or indirectly, by assaying the function of the transcription factor, for example, by measuring the expression of promoters known to be controlled by the transcription factor (using promoter-reporter constructs), measuring the levels of transcripts using microarrays, Northern blots, quantitative PCR, etc. Such changes in the expression levels can be correlated with modified plant traits and thus identified molecules can be useful for soaking or spraying on fruit, vegetable and grain crops to modify traits in plants.

[0214] Essentially any available composition can be tested for modulatory activity of expression or activity of any nucleic acid or polypeptide herein. Thus, available libraries of compounds such as chemicals, polypeptides, nucleic acids and the like can be tested for modulatory activity. Often, potential modulator compounds can be dissolved in aqueous or organic (e.g., DMSO-based) solutions for easy delivery to the cell or plant of interest in which the activity of the modulator is to be tested. Optionally, the assays are designed to screen large modulator composition libraries by automating the assay steps and providing compounds from any convenient source to assays, which are typically run in parallel (e.g., in microtiter formats on microtiter plates in robotic assays).

[0215] In one embodiment, high throughput screening methods involve providing a combinatorial library containing a large number of potential compounds (potential modulator compounds). Such "combinatorial chemical libraries" are then screened in one or more assays, as described herein, to identify those library members (particular chemical species or subclasses) that display a desired characteristic activity. The compounds thus identified can serve as target compounds.

[0216] A combinatorial chemical library can be, e.g., a collection of diverse chemical compounds generated by chemical synthesis or biological synthesis. For example, a combinatorial chemical library such as a polypeptide library is formed by combining a set of chemical building blocks (e.g., in one example, amino acids) in every possible way for a given compound length (i.e., the number of amino acids in a polypeptide compound of a set length). Exemplary libraries include peptide libraries, nucleic acid libraries, antibody libraries (see, e.g., Vaughn et al. (1996) Nature Biotechnol. 14: 309-314 and PCT/US96/10287), carbohydrate libraries (see, e.g., Liang et al. Science (1996) 274: 1520-1522 and U.S. Pat. No. 5,593,853), peptide nucleic acid libraries (see, e.g., U.S. Pat. No. 5,539,083), and small organic molecule libraries (see, e.g., benzodiazepines, in Baum Chem. & Engineering News Jan. 18, 1993, page 33; isoprenoids, U.S. Pat. No. 5,569,588; thiazolidinones and metathiazanones, U.S. Pat. No. 5,549,974; pyrrolidines, U.S. Pat. Nos. 5,525,735 and 5,519,134; morpholino compounds, U.S. Pat. No. 5,506,337) and the like.

[0217] Preparation and screening of combinatorial or other libraries is well known to those of skill in the art. Such combinatorial chemical libraries include, but are not limited to, peptide libraries (see, e.g., U.S. Pat. No. 5,010,175; Furka (1991) Int. J. Pept. Prot. Res. 37: 487-493; and Houghton et al. (1991) Nature 354: 84-88). Other chemistries for generating chemical diversity libraries can also be used.

[0218] In addition, as noted, compound screening equipment for high-throughput screening is generally available, e.g., using any of a number of well known robotic systems that have also been developed for solution phase chemistries useful in assay systems. These systems include automated workstations including an automated synthesis apparatus and robotic systems utilizing robotic arms. Any of the above devices are suitable for use with the present invention, e.g., for high-throughput screening of potential modulators. The nature and implementation of modifications to these devices (if any) so that they can operate as discussed herein will be apparent to persons skilled in the relevant art.

[0219] Indeed, entire high-throughput screening systems are commercially available. These systems typically automate entire procedures including all sample and reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate in detector(s) appropriate for the assay. These configurable systems provide high throughput and rapid start up as well as a high degree of flexibility and customization. Similarly, microfluidic implementations of screening are also commercially available.
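
The statement above that a polypeptide library is formed by combining a set of building blocks in every possible way for a given compound length can be illustrated, without limitation, by the following minimal Python sketch. The names AMINO_ACIDS, library_size and enumerate_library are hypothetical and assumed only for this sketch, and the use of the twenty standard amino acids in one-letter code is likewise an assumption rather than a requirement of the text.

```python
from itertools import product
from typing import Iterator

# Assumed building blocks: the 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def library_size(n_blocks: int, length: int) -> int:
    """Number of distinct compounds when every position may hold any block."""
    return n_blocks ** length


def enumerate_library(building_blocks: str, length: int) -> Iterator[str]:
    """Lazily yield every sequence of the given length over the blocks."""
    for combo in product(building_blocks, repeat=length):
        yield "".join(combo)
```

Under these assumptions a library of tripeptides has 20 to the third power, i.e., 8,000 members, and the count grows by a factor of twenty with each added residue, which is one reason the enumeration is written as a lazy generator rather than a list and why the automated, parallel screening formats described above are of practical interest.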

[0220] The manufacturers of such systems provide detailed protocols for the various high-throughput systems. Thus, for example, Zymark Corp. provides technical bulletins describing screening systems for detecting the modulation of gene transcription, ligand binding, and the like. The integrated systems herein, in addition to providing for sequence alignment and, optionally, synthesis of relevant nucleic acids, can include such screening apparatus to identify modulators that have an effect on one or more polynucleotides or polypeptides according to the present invention.

[0221] In some assays it is desirable to have positive controls to ensure that the components of the assays are working properly. At least two types of positive controls are appropriate. That is, known transcriptional activators or inhibitors can be incubated with cells or plants, for example, in one sample of the assay, and the resulting increase/decrease in transcription can be detected by measuring the resulting increase in RNA levels and/or protein expression, for example, according to the methods herein. It will be appreciated that modulators can also be combined with transcriptional activators or inhibitors to find modulators that inhibit transcriptional activation or transcriptional repression. Either expression of the nucleic acids and proteins herein or any additional nucleic acids or proteins activated by the nucleic acids or proteins herein, or both, can be monitored.

[0222] In an embodiment, the invention provides a method for identifying compositions that modulate the activity or expression of a polynucleotide or polypeptide of the invention. For example, a test compound, whether a small or large molecule, is placed in contact with a cell, plant (or plant tissue or explant), or composition comprising the polynucleotide or polypeptide of interest and a resulting effect on the cell, plant (or tissue or explant) or composition is evaluated by monitoring, either directly or indirectly, one or more of expression level of the polynucleotide or polypeptide, activity (or modulation of the activity) of the polynucleotide or polypeptide. In some cases, an alteration in a plant phenotype can be detected following contact of a plant (or plant cell, or tissue or explant) with the putative modulator, e.g., by modulation of expression or activity of a polynucleotide or polypeptide of the invention. Modulation of expression or activity of a polynucleotide or polypeptide of the invention may also be caused by molecular elements in a signal transduction second messenger pathway and such modulation can affect similar elements in the same or another signal transduction second messenger pathway.

Subsequences

[0223] Also contemplated are uses of polynucleotides, also referred to herein as oligonucleotides, typically having at least 12 bases, preferably at least 15, more preferably at least 20, 30, or 50 bases, which hybridize under at least highly stringent (or ultra-high stringent or ultra-ultra-high stringent) conditions to a polynucleotide sequence described above. The polynucleotides may be used as probes, primers, sense and antisense agents, and the like, according to methods as noted supra.

[0224] Subsequences of the polynucleotides of the invention, including polynucleotide fragments and oligonucleotides, are useful as nucleic acid probes and primers. An oligonucleotide suitable for use as a probe or primer is at least about 15 nucleotides in length, more often at least about 18 nucleotides, often at least about 21 nucleotides, frequently at least about 30 nucleotides, or about 40 nucleotides, or more in length. A nucleic acid probe is useful in hybridization protocols, e.g., to identify additional polypeptide homologs of the invention, including protocols for microarray experiments. Primers can be annealed to a complementary target DNA strand by nucleic acid hybridization to form a hybrid between the primer and the target DNA strand, and then extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR) or other nucleic-acid amplification methods. See Sambrook, supra, and Ausubel, supra.

[0225] In addition, the invention includes an isolated or recombinant polypeptide including a subsequence of at least about 15 contiguous amino acids encoded by the recombinant or isolated polynucleotides of the invention. For example, such polypeptides, or domains or fragments thereof, can be used as immunogens, e.g., to produce antibodies specific for the polypeptide sequence, or as probes for detecting a sequence of interest. A subsequence can range in size from about 15 amino acids in length up to and including the full length of the polypeptide.

[0226] To be encompassed by the present invention, an expressed polypeptide which comprises such a polypeptide subsequence performs at least one biological function of the intact polypeptide in substantially the same manner, or to a similar extent, as does the intact polypeptide. For example, a polypeptide fragment can comprise a recognizable structural motif or functional domain such as a DNA binding domain that activates transcription, e.g., by binding to a specific DNA promoter region, an activation domain, or a domain for protein-protein interactions.

Production of Transgenic Plants

[0227] Modification of Traits

[0228] The polynucleotides of the invention are favorably employed to produce transgenic plants with various traits, or characteristics, that have been modified in a desirable manner, e.g., to improve the seed characteristics of a plant. For example, alteration of expression levels or patterns (e.g., spatial or temporal expression patterns) of one or more of the transcription factors (or transcription factor homologs) of the invention, as compared with the levels of the same protein found in a wild-type plant, can be used to modify a plant's traits. An illustrative example of trait modification, improved characteristics, by altering expression levels of a particular transcription factor is described further in the Examples and the Sequence Listing.

[0229] Arabidopsis as a Model System

[0230] Arabidopsis thaliana is the object of rapidly growing attention as a model for genetics and metabolism in plants. Arabidopsis has a small genome, and well-documented studies are available. It is easy to grow in large numbers and mutants defining important genetically controlled mechanisms are either available, or can readily be obtained. Various methods to introduce and express isolated homologous genes are available (see Koncz et al., eds., Methods in Arabidopsis Research (1992) World Scientific, New Jersey, N.J., in "Preface"). Because of its small size, short life cycle, obligate autogamy and high fertility, Arabidopsis is also a choice organism for the isolation of mutants and studies in morphogenetic and development pathways, and control of these pathways by transcription factors (Koncz, supra, p. 72). A number of studies introducing transcription factors into A. thaliana have demonstrated the utility of this plant for understanding the mechanisms of gene regulation and trait alteration in plants. (See, for example, Koncz, supra, and U.S. Pat. No. 6,417,428).

[0231] Homologous Genes Introduced into Transgenic Plants.

[0232] Homologous genes that may be derived from any plant, or from any source whether natural, synthetic, semi-synthetic or recombinant, and that share significant sequence identity or similarity to those provided by the present invention, may be introduced into plants, for example, crop plants, to confer desirable or improved traits. Consequently, transgenic plants may be produced that comprise a recombinant expression vector or cassette with a promoter operably linked to one or more sequences homologous to presently disclosed sequences. The promoter may be, for example, a plant or viral promoter.

[0233] The invention thus provides for methods for preparing transgenic plants, and for modifying plant traits.

[0236] For many of the specific effects, traits and utilities listed in Table 4 and Table 6 that may be conferred to plants, one or more transcription factor genes may be used to increase or decrease, advance or delay, or improve or prove deleterious to a given trait. For example, overexpression of a transcription factor gene that naturally occurs in a plant may cause early flowering relative to non-transformed or wild-type plants. By knocking out the gene, or suppressing the gene (with, for example, antisense suppression) the plant may experience delayed flowering. Similarly, overexpressing or suppressing one or more genes can impart significant differences in production of plant products, such as different fatty acid ratios. Thus, suppressing a gene that causes a plant to be more sensitive to cold may improve a plant's tolerance of cold. More than one transcription factor gene may be introduced into a plant, either by transforming the plant with one
These or more vectors comprising two or more transcription factors, methods include introducing into a plant a recombinant or by selective breeding of plants to yield hybrid crosses that expression vector or cassette comprising a functional pro comprise more than one introduced transcription factor. moter operably linked to one or more sequences homologous 0237. A listing of specific effects and utilities that the to presently disclosed sequences. Plants and kits for produc presently disclosed transcription factor genes have on plants, ing these plants that result from the application of these meth as determined by direct observation and assay analysis, is ods are also encompassed by the present invention. provided in Table 4. Table 4 shows the polynucleotides iden 0234 Transcription Factors of Interest for the Modifica tified by SEQ ID NO; Gene ID No. (GID); and if the poly tion of Plant Traits nucleotide was tested in a transgenic assay. The first column 0235 Currently, the existence of a series of maturity shows the polynucleotide SEQ ID NO; the second column groups for different latitudes represents a major barrier to the shows the GID; the third column shows whether the gene was introduction of new valuable traits. Any trait (e.g. disease overexpressed (OE) or knocked out (KO) in plant studies; the resistance) has to be bred into each of the different maturity fourth column shows the category of modified trait resulting groups separately, a laborious and costly exercise. The avail from the knockout or overexpression of the polynucleotide in ability of single strain, which could be grown at any latitude, the transgenic plant; and the fifth column (“Experimental would therefore greatly increase the potential for introducing Observations'), includes specific observations made with new traits to crop species such as soybean and cotton. respect to the polynucleotide of the respective first column.

TABLE 4 Traits, trait categories, and effects and utilities that transcription factor genes have on plants.

SEQ ID NO: | GID | OE/KO | Category | Experimental Observations
7 | G30 | OE | Leaf: altered shape | Long cotyledons, petioles and hypocotyls, dark green, glossy leaves; shade avoidance
 | | | Leaf: dark green leaves |
 | | | Leaf: glossy leaves |
 | | | Light response; Long petioles |
 | | | Light response; Long hypocotyls |
 | | | Light response; Long cotyledons |
11 | G47 | OE | Flowering time | Late flowering
 | | OE | Abiotic stress; osmotic stress | Better root growth under osmotic stress
 | | OE | Dev and morph; Architecture | Altered architecture and inflorescence development
 | | OE | Dev and morph; stem | Altered structure of vascular tissues
 | | OE | Abiotic stress; drought | Increased tolerance to drought in a soil-based assay
33 | G142 | OE | Flowering time | Early flowering
39 | G148 | OE | Flowering time | Early flowering
 | | | Inflorescence; terminal flowers | Terminal flower
43 | G153 | OE | Flowering time | Early flowering
 | | OE | Abiotic stress: Nutrient uptake | Altered C/N sensing
65 | G287 | OE | Dev and morph; Size | Increased biomass
105 | G485 | KO | Flowering time | Late flowering
 | | OE | Flowering time | Early flowering

TABLE 4-continued Traits, trait categories, and effects and utilities that transcription factor genes have on plants.

SEQ ID NO: | GID | OE/KO | Category | Experimental Observations
121 | G627 | OE | Flowering time | Early flowering
161 | G975 | OE | Leaf biochemistry; Leaf fatty acids | Increased wax in leaves
 | | | Abiotic stress; Drought | Increased tolerance to drought in a soil-based assay
163 | G1011 | OE | Morphology; altered flowers | Floral organ abscission was delayed, with stamens, petals, and sepals persisting following pollination; increased trichome density on sepals and ectopic trichomes on carpels; rounded leaves; early flowering
 | | | Leaf: altered shape |
 | | | Flowering time |
 | | | Morphology; increased trichome density |
177 | G1108 | | Altered sugar sensing | Less sensitive to glucose
193 | G1274 | | Abiotic stress; Cold | More tolerant to cold in a germination assay
 | | | Abiotic stress; Chilling | More tolerant to chilling in a seedling growth assay
 | | | Abiotic stress; Drought | Increased tolerance to drought in a soil-based assay
 | | | Dev and morph: Inflorescence | Altered inflorescence architecture
 | | OE | Abiotic stress: Nutrient uptake | Increased tolerance to low nitrogen
 | | OE | Abiotic stress: Nutrient uptake | Altered C/N sensing
 | | | Dev and morph; Leaf | Large leaves
 | G1357 | | Flowering time | Late flowering
 | | | Hormone sensitivity | Insensitive to ABA
 | | | Abiotic stress; Chilling | More tolerant to chilling stress in a growth assay
 | | | Dev and morph; Leaf | Altered leaf shape and dark green leaves
 | | | Abiotic stress; Drought | Increased tolerance to drought in a soil-based assay
225 | G1452 | | Flowering time | Late flowering
 | | | Dev and morph; Leaf | Altered leaf shape, dark green leaves
 | | | Abiotic stress: Osmotic | Better germination on sucrose and salt
 | | | Abiotic stress; drought | Increased tolerance to drought in a soil-based assay
 | | | Hormone sensitivity | Reduced sensitivity to ABA
 | | | Dev and morph; Trichome | Reduced trichome density
233 | G1482 | | Increased pigment | Increased anthocyanins in leaf
237 | G1493 | | Altered sugar sensing | Seedling vigor on high glucose
 | | | Leaf: altered shape | Altered leaf shape
 | | | Flowering time | Late flowering
241 | G1510 | | Leaf: dark green leaves | Dark green leaves
 | | | Altered light response | Long hypocotyls
263 | G1660 | | Abiotic stress; sodium chloride tolerance | More root growth and seedling vigor in high salt
267 | G1730 | | Abiotic stress; osmotic stress tolerance | Large and green seedlings on mannitol and glucose
275 | G1779 | OE | Abiotic stress; chilling | Mature plants have enhanced tolerance to chilling stress for a long time period
277 | G1792 | | Disease; Erysiphe | Increased resistance to Erysiphe
 | | | Disease: Fusarium | Increased resistance to Fusarium
 | | | Disease; Botrytis | Increased resistance to Botrytis
 | | | Dev and morph; Leaf | Dark green, shiny leaves
 | | | Nutrient uptake; tolerance to low N | Increased tolerance to low nitrogen
281 | G1797 | | Abiotic stress; Drought | Increased tolerance to drought in a soil-based assay
 | | | Flowering time | Early flowering
 | | | Dev and morph; Flower | Flower organs persisted following fertilization
283 | G1798 | | Flowering time | Early flowering
 | | | Dev and morph; Inflorescence | Multiple inflorescence defects
287 | G1816 | OE | Trichome; glabrous leaves | Glabrous leaves
 | | | Abiotic stress; osmotic stress tolerance | Increased tolerance to high glucose
 | | | Root; increased hairs | Increased root hairs
 | | | Altered sugar sensing | Increased tolerance to high glucose
 | | | Altered C/N sensing | C/N sensing: improved tolerance to low nitrogen
303 | G1863 | KO | Abiotic stress; sodium chloride | Decreased germination under salt stress
 | | OE | Leaf: altered shape and coloration | Altered leaf shape and coloration
 | | | Flowering time | Late flowering
317 | G1945 | OE | Leaf: altered shape | Altered leaf shape
 | | | Flowering time | Late flowering
327 | G1988 | OE | Nutrient: Tolerance to low N | Better growth on low nitrogen plus glutamine, better growth on low phosphate; long hypocotyl, long petiole, early flowering
 | | | Nutrient: Tolerance to low PO4 |
 | | | Flowering time |
 | | | Light response; Long petiole |
 | | | Light response; Long hypocotyl |
341 and 2110 | G2041 | OE | Abiotic stress: Sodium chloride tolerance | Increased tolerance to sodium chloride
1495 | G2133 | OE | Abiotic stress; drought | Increased tolerance to drought in a soil-based assay
365 | G2142 | OE | Abiotic stress; tolerance to low PO4 | More tolerant to phosphate deprivation in a root growth assay
 | | OE | Flowering time | Accelerated flowering time
371 | | OE | Hormone sensitivity; altered ABA response | Increased tolerance to osmotic stress under high salt or sucrose and less sensitive to ABA in germination assays; late flowering; narrow dark green leaves
 | | | Abiotic stress; sodium chloride tolerance |
 | | | Abiotic stress; osmotic tolerance |
 | | | Flowering time |
 | | | Leaf: altered shape |
 | | | Leaf: dark green leaves |
393 | G2334 | | Dev and morph; Size | Increased biomass
 | | | Flowering time | Late flowering
 | | | Dev and morph; Leaf | Dark green leaves and altered leaf shape
403 | G2394 | | Abiotic stress; sodium chloride tolerance | Enhanced germination on high sodium chloride
505 | G2717 | | Abiotic stress: Osmotic stress | Increased tolerance to osmotic stress (salt and sucrose)
 | | | Abiotic stress; sodium chloride tolerance |
 | | | Abiotic stress; drought tolerance | Increased tolerance to drought in a soil-based assay
 | | | Hormone sensitivity; altered ABA response | Insensitive to ABA in germination assays
 | | | Size: increased plant size | Larger seedlings
507 | G2718 | | Dev and morph; Root | Increased root hair density
 | | | Dev and morph; Trichome | Reduced trichome density
 | | | Abiotic stress: Nutrient uptake | Increased tolerance to low nitrogen
 | | OE | Biochem: misc; Biochemistry: other | Reduced pigment production
511 | G2741 | | Flowering time | Late flowering
 | | | Dev and morph; Size | Increased biomass
523 | G2765 | | Slow growth | Retarded growth at early stages
557 | G2839 | | Abiotic stress; osmotic stress tolerance | Better germination on high sucrose; increased resistance to osmotic stress; small, contorted leaves that are upcurled at margins, short petioles; poorly developed flowers with downward-pointing short pedicels
 | | | Leaf: altered shape |
 | | | Growth regulator: altered sugar sensing |
 | | | Inflorescence; Architectural change |
585 | G2898 | | Sugar sensing | Better germination on high glucose media
593 | G2933 | | Seed: Large seed | Big seeds; larger plants; more tolerant to chilling stress in growth assays
 | | | Abiotic stress; chilling tolerance |
607 | G2979 | | Flowering time | Late flowering
 | | | Dev and morph; Size | Increased biomass
 | | | Dev and morph; Flower | Increased flower organ size and number
609 | G2981 | | Nutrient: Tolerance to low N | Greener, larger seedlings on low nitrogen medium supplemented with glutamine
611 | G2982 | OE | Abiotic stress; drought tolerance | Plants transformed with this gene displayed increased tolerance to dehydration stress in a soil-based assay
615 | G2990 | OE | Nutrient; tolerance to low N | Altered response to nitrogen deprivation, including more root growth and more anthocyanin production in some lines, more bleaching in others when grown on low nitrogen, indicating this gene is involved in the response to nutrient limitation
623 | G2998 | OE | Abiotic stress; sodium chloride tolerance | Better germination in high NaCl; late flowering
625 | G2999 | OE | Abiotic stress; sodium chloride tolerance | Increased tolerance to high sodium chloride
 | | OE | Abiotic stress: drought tolerance | Increased tolerance to drought in a soil-based assay
 | | OE | Flowering time |
655 | G3076 | OE | Abiotic stress; Drought | Increased tolerance to drought
657 | G3083 | OE | Abiotic stress: sodium chloride tolerance | Higher germination in high salt
661 | G3086 | OE | Flowering time | Early flowering
 | | | Abiotic stress: heat tolerance | Increased tolerance to heat
 | | | Abiotic stress; sodium chloride tolerance | Increased tolerance to high sodium chloride
 | | | Abiotic stress: drought tolerance | Increased tolerance to drought in a soil-based assay
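The OE/KO column of Table 4 lends itself to simple queries, such as the contrast drawn in paragraph [0236] between overexpressing and knocking out the same gene. The following is a minimal Python sketch, assuming only a handful of rows transcribed from Table 4 (it is not the full table). The `TraitRecord` class and the helpers `genes_with` and `opposite_effects` are hypothetical names introduced here for illustration and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraitRecord:
    """One Table 4 row (hypothetical structure, for illustration only)."""
    gid: str          # Gene ID (GID), second column of Table 4
    mode: str         # "OE" (overexpressed) or "KO" (knocked out)
    category: str     # trait category, fourth column
    observation: str  # experimental observation, fifth column


# A few rows transcribed from Table 4; deliberately not exhaustive.
RECORDS = [
    TraitRecord("G47", "OE", "Flowering time", "Late flowering"),
    TraitRecord("G142", "OE", "Flowering time", "Early flowering"),
    TraitRecord("G287", "OE", "Dev and morph; Size", "Increased biomass"),
    TraitRecord("G485", "KO", "Flowering time", "Late flowering"),
    TraitRecord("G485", "OE", "Flowering time", "Early flowering"),
]


def genes_with(records, category, observation, mode=None):
    """Return sorted GIDs whose rows match the category and observation.

    If mode is given ("OE" or "KO"), only rows of that type are considered.
    """
    return sorted({
        r.gid for r in records
        if r.category == category
        and r.observation == observation
        and (mode is None or r.mode == mode)
    })


def opposite_effects(records, category):
    """Return sorted GIDs with both OE and KO rows in one category
    whose observations differ (e.g. early vs. late flowering)."""
    by_gene = {}
    for r in records:
        if r.category == category:
            by_gene.setdefault(r.gid, {})[r.mode] = r.observation
    return sorted(
        gid for gid, modes in by_gene.items()
        if "OE" in modes and "KO" in modes and modes["OE"] != modes["KO"]
    )


if __name__ == "__main__":
    print(genes_with(RECORDS, "Flowering time", "Early flowering", mode="OE"))
    print(opposite_effects(RECORDS, "Flowering time"))
```

With these sample rows, G485 is the only gene for which both an overexpression row and a knockout row appear in the "Flowering time" category with differing observations.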

[0238] Table 5 shows the polypeptides identified by SEQ ID NO; Gene ID (GID) No; the transcription factor family to which the polypeptide belongs, and conserved domains of the polypeptide. The first column shows the polypeptide SEQ ID NO; the third column shows the transcription factor family to which the polynucleotide belongs; and the fourth column shows the amino acid residue positions of the conserved domain in amino acid (AA) coordinates.

TABLE 5 Gene families and conserved domains

Polypeptide SEQ ID NO: | GID No. | Family | Conserved Domains in Amino Acid Coordinates
8 | G30 | AP2 | 17-35
12 | G47 | AP2 | 11-80
34 | G142 | MADS | 2-57
40 | G148 | MADS | 1-57
44 | G153 | MADS | 1-57
66 | G287 | MISC | 293-354
106 | G485 | CAAT | 21-116
122 | G627 | MADS | 1-57
162 | G975 | AP2 | 4-71
164 | G1011 | MADS | 2-57
178 | G1108 | RING/C3H2C3 | 363-403
194 | G1274 | WRKY | 111-164
208 | G1357 | NAC | 16-153
226 | G1452 | NAC | 30-177
234 | G1482 | Z-CO-like | 5-63
238 | G1493 | GARP | 242-289
242 | G1510 | GATA/Zn | 230-263
264 | G1660 | DBP | 362-476
268 | G1730 | RING/C3H2C3 | 103-144
276 | G1779 | GATA/Zn | 190-239
278 | G1792 | AP2 | 17-85
282 | G1797 | MADS | 1-57
284 | G1798 | MADS | 1-57
288 | G1816 | MYB-related | 31-81
304 | G1863 | GRF-like | 77-186
318 | G1945 | AT-hook | 49-71
328 | G1988 | Z-CO-like | 5-50
342 | G2041 | SWI/SNF | 670-906, 1090-1175
1496 | G2133 | AP2 | 11-83
366 | G2142 | HLH/MYC | 43-120
372 | G2207 | bZIP-NIN | 180-227, 546-627
394 | G2334 | GRF-like | 82-118, 150-194
404 | G2394 | RING/C3H2C3 | 355-395
506 | G2717 | MYB-related | 5-58
508 | G2718 | MYB-related | 21-76
512 | G2741 | GARP | 140-205
524 | G2765 | HLH/MYC | 124-190
586 | G2898 | HMG | 62-133
594 | G2933 | HLH/MYC | 65-137
608 | G2979 | E2F | 192-211
610 | G2981 | E2F | 155-173
612 | G2982 | E2F | 107-124
616 | G2990 | ZF-EB | 54-109, 203-263
656 | G3076 | bZIP-ZW2 | 70-100, 182-209
658 | G3083 | bZIP-ZW2 | 75-105, 188-215

[0239] Examples of some of the utilities that may be desirable in plants, and that may be provided by transforming the plants with the presently disclosed sequences, are listed in Table 6. Many of the transcription factors listed in Table 6 may be operably linked with a specific promoter that causes the transcription factor to be expressed in response to environmental, tissue-specific or temporal signals. For example, G370 induces ectopic trichomes on flowers but also produces small plants. The former may be desirable to produce insect or herbivore resistance, or increased cotton yield, but the latter may be undesirable with respect to yield in that it may reduce biomass. However, by operably linking G370 with a flower-specific promoter, one may achieve the desirable benefits of the gene without affecting overall biomass to a significant degree. For examples of flower-specific promoters, see Kaiser et al. (supra). For examples of other tissue-specific, temporal-specific or inducible promoters, see the above discussion under the heading “Vectors, Promoters, and Expression Systems.”

TABLE 6 Genes, traits and utilities that affect plant characteristics Table 6: Transcription factor Trait Category Phenotypic alteration(s) genes that impact traits Utility Abiotic stress Effect of chilling on plants Improved growth rate, earlier increased tolerance G1274: G1357. G1779; planting, yield G2933 Germination in cold Temperature stress response increased tolerance G1274 manipulation Earlier planting; improved Survival, yield Drought Improved survival, vigor, increased tolerance G47: G975; G1274; appearance, yield, range G1357; G1452: G1792; G2133; G2717: G2982; G3076; Freezing G2982 improved survival, vigor, appearance, yield Osmotic stress Abiotic stress response increased sensitivity 863 manipulation increased tolerance 47; G1452: G1730; improved germination rate, 816; G2207 Survival, yield Salt tolerance improved germination rate, Altered response (one line more 2394 Survival, yield; extended olerant, one line more sensitive) growth range increased tolerance 660; G2041: G2207; 2394; G2717: G3.083; Nitrogen stress improved yield and nutrient Sensitivity to N limitation 2718; G2990; stress tolerance, decreased Less sensitive to N limitation 53; G1274: G1792; ertilizer usage 816; G1988: G2718; improved yield and nutrient 2981 stress tolerance, decreased Phosphate stress 988: G2142 ertilizer usage Less sensitive to PO limitation Altered expression induced by ABA 482 Modification of seed development, seed dormancy, cold and dehydration tolerance Altered by auxin G153; G1274: G1482 Regulation of cell division, growth and maturation, particularly at shoot tips Induced by Salicylic acid G1274 Resilience to heator physiological conditions that result in high levels of Salicylic acid After challenge with Erysiphe 274 Yield, appearance, Survival, extended range After challenge with Fusarium 53 Yield, appearance, Survival, extended range Induced by heat 53; G1482 Germination, growth rate, later planting Cold 274: G1730 Improved growth rate, earlier planting, 
yield Osmotic stress 274: G1482; G1730 Abiotic stress response manipulation Herbicide Glyphosate resistance G2133 Generation of glyphosate resistant plants, and increasing plant resistance to oxidative StreSS Hormone sensitivity Abscisic acid (ABA) sensitivity Modification of seed Reduced sensitivity or insensitive G1357; G1452: G2207; development, improved seed to ABA G2717 dormancy, cold and dehydration tolerance Disease Botrytis Improved yield, appearance, Increased resistance or tolerance G1792 Survival, extended range Fusarium Improved yield, appearance, Increased resistance or tolerance G1792 Survival, extended range Erysiphe Improved yield, appearance, Increased resistance or tolerance G1792 Survival, extended range Growth regulator Altered Sugar sensing Alteration of energy balance, Decreased tolerance to Sugars G155; G344; G478; photosynthetic rate, Increased tolerance to Sugars G1420; G2111; G2763 carbohydrate accumulation, G224: G905; G916: biomass production, Source G1033; G1108; G1493; sink relationships, senescence; G1535; G1753; G1816; alteration of storage


TABLE 6-continued Genes, traits and utilities that affect plant characteristics Table 6: Transcription factor Trait Category Phenotypic alteration(s) genes that impact traits Utility G1420; G1549; G1798: G1825; G1995; G.2226; G2457; G2455; G2515; G2575; G2616; G2639; G2640; G2649; G2694; G2743; G2826; G2838; G2859; G2884; G3094 Changes in organ identity G129; G133; G134; G140 Enlarged floral organs G15; G2979 Increase in flower organ number G2768: G2979 Terminal flowers G1798; G2515 Flower organs persisting G1011: G1797 following fertilization Siliques G15; G2579; G2884 Broad, large rosettes G1274 Loss of flower determinacy G131; G135; G2768 Reduced fertility G15; G549; G651: G846; G1100; G1798; G2372; G2579; G2616: G2639; G2640; G2649; G2768: G2884 Gamete lethal G846 Altered shoot meristem G438 (KO): G916: Ornamental modification of development G1585; G1957: G2636; plant architecture, G2650: manipulation of growth and development, increase leaf numbers, modulation of branching patterns to provide improved yield or biomass Inflorescence architectural Ornamental modification of change flower architecture; timing of Altered inflorescence branching G47: G446; G2571; flowering; altered plant habit pattern G2146; G2571; G2694; for yield or harvestability G2784: G2859 benefit; reduction in pollen Short internodes/bushy G47; G253; G1274; production of genetically inflorescences G1474: G1593; G1743; modified plants; manipulation G1753; G1796; G2146; of seasonality and annual or G.2226; G2550; G2251; perennial habit; manipulation G2575; G2616; G2639; of determinate vs. 
G2640; G2649; G2958; indeterminate growth G3021 Terminal flowers G131; G135; G137; G145; G148; G155; G549; G1798; G2372; G2515 Altered inflorescence G131; G135; G549; determinacy G2372; G2515 Aerial rosette development G1985; G1995; G2826; G2838 Downward pedicels G2839 Homeotic transformation G129, G133, G134: G140 Multiple inflorescence alterations G446; G549; G1798: G2616; G2694; G2784: G2839; G3059 Altered branching pattern G47: G438 (KO) Ornamental modification of plant architecture, improved lodging resistance Stem morphology and altered Modulation of lignin content; vascular tissue structure improvement of wood, palatability of fruits and vegetables Apical dominance Ornamental modification of Reduced apical dominance plant architecture, improved lodging resistance Altered trichome density; Ornamental modification of development, or structure plant architecture, increased Ectopic trichomes G370; G2826 plant product (e.g., diterpenes, Altered trichome development G1539; G2983 cotton) productivity, insect Increased trichome number or G370; G1995; G2085; and herbivore resistance US 2011/007880.6 A1 Mar. 31, 2011 31

TABLE 6-continued Genes, traits and utilities that affect plant characteristics Table 6: Transcription factor Trait Category Phenotypic alteration(s) genes that impact traits Utility density G2826; G2838 Reduced or no trichomes G1452: G1816; G2718 Root development Decreased root growth or G651; G730; G2655; Modification of root Secondary root development G2747; G2992; G2993 architecture and mass Decreased root branching G651; G2993 Influence uptake of water and nutrients Increased root branching G2747; G2992 Improved anchorage Abnormal gravitropic response G2983 Manipulation of root development Increased root hairs G1816; G2718, G2983 Improved yield, stress tolerance; anchorage Altered cotyledon shape G916; G1420; G 1893; Ornamental applications G2432; G2636; G2859; G3059 Altered hypocotyl shape, color, G807: G916; G 1510; Ornamental applications; development G1988: G2771; G2859; altered light response (see G2884; G2993 “Light Response, below) Altered seed development, G961 Modification of seed ripening and germination germination properties and performance Slow growth G652; G 1013; G1100; Ornamental applications G1468; G1535; G1549; G1779; G1938; G2765; G2784: G2826; G2834; G2851; G3091: G3095 Fast growth G807; G1476; G2617 Appearance, biomass, yield Cell differentiation and cell G1539; G1585; G1591; Increase in carpel or fruit proliferation G2983 development; improve regeneration of shoots from callus in transformation or micro-propagation systems Cell expansion GS21 Control of cell elongation Phase change and floral reversion G370; G1985; G1995; Improved yield, biomass, G2826; G2838 manipulation of seasonality and annual or perennial habit, developmental plasticity in response to environmental StreSS Senescence Accelerated or premature G652; G1033; G1772; Improvement in response to SeleSCGCC G2467; G2574; G2783; disease, fruit ripening G2907: G3059; G3111 Reduced or delayed senescence G571; G652 (KO); G2S36 Abnormal embryo development G2884 Embryo lethal when 
knocked out G374 Herbicide target Gamete lethal G846 Potential to prevent escape of GMO pollen Altered programmed cell death G12 Lethality when overexpressed G366; G1384: G1556; Herbicide target; ablation of G1832; G1850; G1957: specific tissues or organs such G1990; G2213; G2298; as stamen to prevent pollen G2505; G2570; G2587; escape G2869; G2887 Necrosis, formation of necrotic G12; G1840 Disease resistance lesions Plant size Increased plantsize or biomass G46; G268; G287: Improved yield, biomass, G314: G319; G324; appearance G438: G624; G852; G1113; G1150; G1451; G1468; G2334; G2536; G2650; G2741; G2979 Large seedlings G1313; G2679; G2694; Increased survival and vigor G2838 of seedlings, yield Dwarfed or more compact plants G131; G136; G253; Dwarfism, lodging resistance, G309; G370; G386; manipulation of gibberellin G549; G550; G600; responses G651; G652; G707; G738; G811; G1011: G1100; G1247; G1289;


TABLE 6-continued Genes, traits and utilities that affect plant characteristics Table 6: Transcription factor Trait Category Phenotypic alteration(s) genes that impact traits Utility G2866; G2888; G2958; G2992: G3021; G3044: G3059; G3O84: G3091; G3094; G3095; G3111 Increased leaf size and mass G268: G324; G438: Increased yield, ornamental G852; G1113; G1274; applications G1451; G2536; G2699; G2768: G3008 Lightgreen or gray leaves G351; G600; G651: Ornamental applications G1468; G1718; G2565; G2604: G2779; G2859; G3044: G3070 Glossy leaves G30; G370 (KO); Ornamental applications, G975; G1792; G2640; manipulation of wax G2649 composition, amount, or distribution Altered abaxial adaxial polarity G730 Modification of plant growth and form Seed morphology Altered seed coloration G581; G961; G2085; Appearance G2371 Seed size and shape Large seed G151; G581; G2085; G2585; G2933 Leaf biochemistry Increased leaf wax G975 Insect, pathogen resistance Leaf fatty acids Increase in leaf fatty acids G975 Light Altered cotyledon G30 Increased planting densities response shade Altered hypocotyl G30; G1510; G1988 and yield enhancement avoidance Altered petiole G478; G807; G1988: G2650; G2694; G2754 Shade avoidance G30 Pigment Increased anthocyanin levels G1482 Enhanced health benefits, Decreased anthocyanin levels G2718 improved ornamental appearance, increased stress resistance, attraction of pollinating and seed dispersing animals

Abbreviations: N = nitrogen; P = phosphate; ABA = abscisic acid; C/N = carbon/nitrogen balance

Detailed Description of Genes, Traits and Utilities that Affect Plant Characteristics

[0240] The following descriptions of traits and utilities associated with the present transcription factors offer a more comprehensive description than that provided in Table 6.

[0241] Abiotic Stress, General Considerations

[0242] Plant transcription factors can modulate gene expression, and, in turn, be modulated by the environmental experience of a plant. Significant alterations in a plant's environment invariably result in a change in the plant's transcription factor gene expression pattern. Altered transcription factor expression patterns generally result in phenotypic changes in the plant. Transcription factor gene product(s) in transgenic plants then differ(s) in amounts or proportions from that found in wild-type or non-transformed plants, and those transcription factors likely represent polypeptides that are used to alter the response to the environmental change. By way of example, it is well accepted in the art that analytical methods based on altered expression patterns may be used to screen for phenotypic changes in a plant far more effectively than can be achieved using traditional methods.

[0243] Abiotic stress: adult stage chilling. Enhanced chilling tolerance produced by modifying expression levels of transcription factors such as G1274, G1357, G1779, G1928, G2063, G2567, G2579, G2650, G2771, G2930, or G2933, for example, in plants may extend the effective growth range of chilling sensitive crop species by allowing earlier planting or later harvest during a growing season. Improved chilling tolerance may be conferred by increased expression of glycerol-3-phosphate acetyltransferase in chloroplasts (see, for example, Wolter et al. (1992) EMBO J. 4685-4692, and Murata et al. (1992) Nature 356: 710-713).

[0244] Chilling tolerance could also serve as a model for understanding how plants adapt to water deficit. Both chilling and water stress share similar signal transduction pathways and tolerance/adaptation mechanisms. For example, acclimation to chilling temperatures can be induced by water stress or treatment with abscisic acid. Genes induced by low temperature include dehydrins (or LEA proteins). Dehydrins are also induced by salinity, abscisic acid, water stress and during the late stages of embryogenesis.

[0245] Another large impact of chilling occurs during post-harvest storage. For example, some fruits and vegetables do not store well at low temperatures (for example, bananas, avocados, melons, and tomatoes). The normal ripening process of the tomato is impaired if it is exposed to cool temperatures. Genes conferring resistance to chilling temperatures may enhance tolerance during post-harvest storage.

[0246] Abiotic stress: cold germination. The potential utility of presently disclosed transcription factor genes that increase tolerance to cold is to confer better germination and growth in cold conditions. Plants with modified expression levels of G224, G728, G807, G1274, G1837, G2051, G2317, G2603, or G2784 show less sensitivity to germination in cold conditions, indicating a role in regulation of cold responses. These genes might be engineered to manipulate the response to low temperature stress. Genes that would allow germination and seedling vigor in the cold would have highly significant utility in allowing seeds to be planted earlier in the season with a high rate of survival. Transcription factor genes that confer better survival in cooler climates allow a grower to move up planting time in the spring and extend the growing season further into autumn for higher crop yields. Germination of seeds and survival at temperatures significantly below that of the mean temperature required for germination of seeds and survival of non-transformed plants would increase the potential range of a crop plant into regions in which it would otherwise fail to thrive.

[0247] Abiotic Stress: Salt and Drought Tolerance

[0248] Plants are subject to a range of environmental challenges. Several of these, including salt stress, general osmotic stress, drought stress and freezing stress, have the ability to impact whole plant and cellular water availability. Not surprisingly, then, plant responses to this collection of stresses are related. In a recent review, Zhu notes that (Zhu (2002) Ann. Rev. Plant Biol. 53: 247-273) “most studies on water stress signaling have focused on salt stress primarily because plant responses to salt and drought are closely related and the mechanisms overlap.” Many examples of similar responses (i.e., genetic pathways) to this set of stresses have been documented. For example, the CBF transcription factors have been shown to condition resistance to salt, freezing and drought (Kasuga et al. (1999) Nature Biotech. 17: 287-291). The Arabidopsis rd29B gene is induced in response to both salt and dehydration stress, a process that is mediated largely through an ABA signal transduction process (Uno et al. (2000) Proc. Natl. Acad. Sci. USA 97: 11632-11637), resulting in altered activity of transcription factors that bind to an upstream element within the rd29B promoter. In Mesembryanthemum crystallinum (ice plant), Patharker and Cushman have shown that a calcium-dependent protein kinase (McCDPK1) is induced by exposure to both drought and salt stresses (Patharker and Cushman (2000) Plant J. 24: 679-691). The stress-induced kinase was also shown to phosphorylate a transcription factor, presumably altering its activity, although transcript levels of the target transcription factor are not altered in response to salt or drought stress. Similarly, Saijo et al. demonstrated that a rice salt/drought-induced calmodulin-dependent protein kinase (OsCDPK7) conferred increased salt and drought tolerance to rice when overexpressed (Saijo et al. (2000) Plant J. 23: 319-327).

[0249] Exposure to dehydration invokes similar survival strategies in plants as does freezing stress (see, for example, Yelenosky (1989) Plant Physiol 89: 444-451) and drought [...] cold-acclimation proteins, strategies that allow plants to survive in low water conditions may include, for example, reduced surface area, or surface oil or wax production.

[0250] Consequently, one skilled in the art would expect that some pathways involved in resistance to one of these stresses, and hence regulated by an individual transcription factor, will also be involved in resistance to another of these stresses, regulated by the same or homologous transcription factors. Of course, the overall resistance pathways are related, not identical, and therefore not all transcription factors controlling resistance to one stress will control resistance to the other stresses. Nonetheless, if a transcription factor conditions resistance to one of these stresses, it would be apparent to one skilled in the art to test for resistance to these related stresses. Modifying the expression of a number of presently disclosed transcription factor genes shown to confer increased tolerance to drought, e.g., G46, G47, G926, G975, G1206, G1274, G1357, G1452, G1792, G2133, G2505, G2717, G2982, G2999, G3076, and G3086, and increased tolerance to salt, e.g., G355, G624, G1017, G1037, G1538, G1557, G1660, G1837, G2035, G2041, G2060, G2207, G2317, G2319, G2404, G2453, G2457, G2691, G2717, G2992, G2998, G2999, G3083, and G3086, during germination, the seedling stage, and throughout a plant's life cycle, may thus be used to increase a plant's tolerance to low water conditions and provide the benefits of improved survival, increased yield and an extended geographic and temporal planting range.

[0251] Abiotic stress: freezing tolerance and osmotic stress. Modification of the expression of a number of presently disclosed transcription factor genes, G47, G916, G926, G1033, G1206, G1412, G1452, G1730, G1753, G1816, G2207, G2661, G2717, G2776, G2839, G2854, G2969, or G2982, for example, may be used to increase germination rate or growth under adverse osmotic conditions, which could impact survival and yield of seeds and plants. Osmotic stresses may be regulated by specific molecular control mechanisms that include genes controlling water and ion movements, functional and structural stress-induced proteins, signal perception and transduction, and free radical scavenging, and many others (Wang et al. (2001) Acta Hort. (ISHS) 560: 285-292). Instigators of osmotic stress include freezing, drought and high salinity, each of which is discussed in more detail below.

[0252] In many ways, freezing, high salt and drought have similar effects on plants, not the least of which is the induction of common polypeptides that respond to these different stresses. For example, freezing is similar to water deficit in that freezing reduces the amount of water available to a plant. Exposure to freezing temperatures may lead to cellular dehydration as water leaves cells and forms ice crystals in intercellular spaces (Buchanan, supra). As with high salt concentration and freezing, the problems for plants caused by low water availability include mechanical stresses caused by the withdrawal of cellular water. Thus, the incorporation of transcription factors that modify a plant's response to osmotic stress into, for example, a crop or ornamental plant, may be useful in reducing damage or loss. Specific effects caused by freezing, high salt and drought are addressed below.

[0253] Abiotic stress: heat stress tolerance. The germination of many crops is also sensitive to high temperatures.
stress induces freezing tolerance (see, for example, Simino Presently disclosed transcription factor genes, including, for vitch et al. (1982) Plant Physiol 69: 250-255; and Guy et al. example, G3086, that provide increased heat tolerance, are (1992) Planta 188: 265-270). In addition to the induction of generally useful in producing plants that germinate and grow US 2011/007880.6 A1 Mar. 31, 2011

in hot conditions, may find particular use for crops that are planted late in the season, or extend the range of a plant by allowing growth in relatively hot climates.

[0254] Nutrient uptake and utilization: nitrogen and phosphorus. Presently disclosed transcription factor genes introduced into plants provide a means to improve uptake of essential nutrients, including nitrogenous compounds, phosphates, potassium, and trace minerals. The enhanced performance of, for example, G153, G200, G581, G839, G916, G1013, G1150, G1274, G1792, G1816, G1988, G2239, G2604, G2718, G2830, G2913, and G2981, and other overexpressing lines under low nitrogen conditions or G355, G624, G1988, G2142, and G2972 under low phosphorus conditions indicate that these genes and their homologs could be used to engineer crops that could thrive under conditions of reduced nutrient availability. Phosphorus, in particular, tends to be a limiting nutrient in soils and is generally added as a component in fertilizers. Young plants have a rapid intake of phosphate and sufficient phosphate is important for yield of root crops such as carrot, potato and parsnip.

[0255] The effect of these modifications is to increase the seedling germination and range of ornamental and crop plants. The utilities of presently disclosed transcription factor genes conferring tolerance to conditions of low nutrients also include cost savings to the grower by reducing the amounts of fertilizer needed, environmental benefits of reduced fertilizer runoff into watersheds; and improved yield and stress tolerance. In addition, by providing improved nitrogen uptake capability, these genes can be used to alter seed protein amounts and/or composition in such a way that could impact yield as well as the nutritional value and production of various food products.

[0256] Decreased herbicide sensitivity. Presently disclosed transcription factor genes, including G2133 and its equivalogs that confer resistance or tolerance to herbicides (e.g., glyphosate) will find use in providing means to increase herbicide applications without detriment to desirable plants. This would allow for the increased use of a particular herbicide in a local environment, with the effect of increased detriment to undesirable species and less harm to transgenic, desirable cultivars.

[0257] Knockouts of a number of the presently disclosed transcription factor genes have been shown to be lethal to developing embryos. Thus, these genes are potentially useful as herbicide targets.

[0258] Altered expression and hormone sensitivity: abscisic acid and auxin. Altering the expression levels of a number of the presently disclosed transcription factor genes, including G12, G224, G244, G355, G571, G926, G1037, G1357, G1412, G1452, G1482, G1507, G1893, G2070, G2085, G2109, G2146, G2207, G2382, G2617, G2717, G2854, G2865, G2969, G2992, G3054, G3055, or G3067, may be used to reduce a plant's sensitivity to ABA or render a plant insensitive to ABA exposure. ABA plays regulatory roles in a host of physiological processes in all higher as well as in lower plants (Davies et al. (1991) Abscisic Acid. Physiology and Biochemistry. Bios Scientific Publishers, Oxford, UK; Zeevaart et al. (1988) Ann. Rev. Plant Physiol. Plant Mol. Biol. 49: 439-473; Shimizu-Sato et al. (2001) Plant Physiol 127: 1405-1413). ABA mediates stress tolerance responses in higher plants, is a key signal compound that regulates stomatal aperture and, in concert with other plant signaling compounds, is implicated in mediating responses to pathogens and wounding or oxidative damage (for example, see Larkindale et al. (2002) Plant Physiol. 128: 682-695). In seeds, ABA promotes seed development, embryo maturation, synthesis of storage products (proteins and lipids), desiccation tolerance, and is involved in maintenance of dormancy (inhibition of germination), and apoptosis (Zeevaart et al. (1988) Ann. Rev. Plant Physiol. Plant Mol. Biol. 49: 439-473; Davies (1991), supra; Thomas (1993) Plant Cell 5: 1401-1410; and Bethke et al. (1999) Plant Cell 11: 1033-1046). ABA also affects plant architecture, including root growth and morphology and root-to-shoot ratios. ABA action and metabolism is modulated not only by environmental signals but also by endogenous signals generated by metabolic feedback, transport, hormonal cross-talk and developmental stage. Manipulation of ABA levels, and hence by extension the sensitivity to ABA, has been described as a very promising means to improve productivity, performance and architecture in plants (Zeevaart (1999) in: Biochemistry and Molecular Biology of Plant Hormones, Hooykaas et al. eds. Elsevier Science pp 189-207; and Cutler et al. (1999) Trends Plant Sci. 4: 472-478).

[0259] A number of genes have been shown to be induced by cold acclimation in higher plants, including, for example, G171, G224, G1274, G1730, G2085, and G2597, and the proteins encoded by these genes are thought to play a role in protecting plant cells from injury, including freezing (Nagao et al. (2002) Plant Cell Physiol. 43: S168-S168). Since ABA mediates conversion of apical meristems into dormant buds, altered expression to ABA may increase protection of the buds from mechanical damage during winter. A plant's response to ABA also affects sprouting inhibition during premature warm spells. ABA is also important in protecting plants from drought. Thus, by affecting ABA sensitivity, introduced transcription factor genes may affect cold sensitivity, yield and survival, and plants with G12 knocked out or plants overexpressing G926, G1357, G1412, G1452, G1893, G2109, G2146, G2207, G2382, G2617, G2717, G2854, G2865, G2969, G2992, G3054, G3055, and G3067, may have modified ABA responses that influence seed development and dormancy, as well as cold and dehydration tolerance, and survival.

[0260] Auxin refers to a class of plant hormones, including indoleacetic acid (IAA), having a variety of effects, such as phototropic response through the stimulation of cell elongation, stimulation of secondary growth, and the development of leaf traces and fruit. Specifically, auxin is involved in the regulation of cell division, particularly at shoot tips. Transcription factor genes that regulate a plant's response to auxin thus provide a means for controlling shoot tip development and secondary growth, which in turn can be used to manipulate plant growth and development.

[0261] Disease resistance or tolerance: Erysiphe, Fusarium, Botrytis, and other pathogens. A number of the presently disclosed transcription factor genes have been induced to be expressed (e.g., G140, G171, G224, G434, G571, G1100, G1274, G1384, G1507, G1538, G1923, and G2085), or have been shown to provide resistance or tolerance (e.g., G1792) after challenge with more than one pathogen, including fungal pathogens Fusarium oxysporum, Botrytis cinerea and Erysiphe orontii. Modification of the expression levels of one or more transcription factor genes may provide some benefit to the plant to help prevent or overcome infestation. The mechanisms by which the transcription factors work could include changing surface characteristics such as waxes, oils, or cell wall composition and thickness, or by the activation of signal transduction pathways that regulate plant defenses in response to attacks by pathogens (including, for example, reactive oxygen species, anti-fungal proteins, defensins, thionins, glucanases, and chitinases). Another means to combat fungal and other pathogens is by accelerating local cell death or senescence, mechanisms used to impair the spread of pathogenic microorganisms throughout a plant. For instance, the best known example of accelerated cell death is the resistance gene-mediated hypersensitive response, which causes localized cell death at an infection site and initiates a systemic defense response. Because many defenses, signaling molecules, and signal transduction pathways are common to defense against different pathogens and pests, such as fungal, bacterial, oomycete, nematode, and insect, transcription factors that are implicated in defense responses against the fungal pathogens tested may also function in defense against other pathogens and pests.

[0262] Growth Regulator: Sugar Sensing.

[0263] In addition to their important role as an energy source and structural component of the plant cell, sugars are central regulatory molecules that control several aspects of plant physiology, metabolism and development (Hsieh et al. (1998) Proc. Natl. Acad. Sci. 95: 13965-13970). It is thought that this control is achieved by regulating gene expression and, in higher plants, sugars have been shown to repress or activate plant genes involved in many essential processes such as photosynthesis, glyoxylate metabolism, respiration, starch and sucrose synthesis and degradation, pathogen response, wounding response, cell cycle regulation, pigmentation, flowering and senescence. The mechanisms by which sugars control gene expression are not understood.

[0264] Several sugar-sensing mutants have turned out to be allelic to abscisic acid (ABA) and ethylene mutants. ABA is found in all photosynthetic organisms and acts as a key regulator of transpiration, stress responses, embryogenesis, and seed germination. Most ABA effects are related to the compound acting as a signal of decreased water availability, whereby it triggers a reduction in water loss, slows growth, and mediates adaptive responses. However, ABA also influences plant growth and development via interactions with other phytohormones. Physiological and molecular studies indicate that maize and Arabidopsis have almost identical pathways with regard to ABA biosynthesis and signal transduction. For further review, see Finkelstein and Rock ((2002) Abscisic acid biosynthesis and response, in The Arabidopsis Book, Editors: Somerville and Meyerowitz (American Society of Plant Biologists, Rockville, Md.)).

[0265] This potentially implicates G155, G224, G344, G478, G905, G916, G1033, G1108, G1420, G1493, G1535, G1753, G1816, G2111, G2661, G2763, G2776, G2839, G2854, G2898 and related transcription factors in hormone signaling based on the sucrose sugar sensing phenotype of transgenic lines overexpressing these polypeptides. On the other hand, the sucrose treatment used in these experiments (9.5% w/v) could also be an osmotic stress. Therefore, one could interpret these data as an indication that these transgenic overexpressing lines are more tolerant to osmotic stress. However, it is well known that plant responses to ABA, osmotic and other stress may be linked, and these different treatments may even act in a synergistic manner to increase the degree of a response. For example, Xiong, Ishitani, and Zhu ((1999) Plant Physiol. 119: 205-212) have shown that genetic and molecular studies may be used to show extensive interaction between osmotic stress, temperature stress, and ABA responses in plants. These investigators analyzed the expression of RD29A-LUC in response to various treatment regimes in Arabidopsis. The RD29A promoter contains both the ABA-responsive and the dehydration-responsive element (also termed the C-repeat) and can be activated by osmotic stress, low temperature, or ABA treatment; transcription of the RD29A gene in response to osmotic and cold stresses is mediated by both ABA-dependent and ABA-independent pathways (Xiong, Ishitani, and Zhu (1999) supra). LUC refers to the firefly luciferase coding sequence, which, in this case, was driven by the stress responsive RD29A promoter. The results revealed both positive and negative interactions, depending on the nature and duration of the treatments. Low temperature stress was found to impair osmotic signaling but moderate heat stress strongly enhanced osmotic stress induction, thus acting synergistically with osmotic signaling pathways. In this study, the authors reported that osmotic stress and ABA could act synergistically by showing that the treatments simultaneously induced transgene and endogenous gene expression. Similar results were reported by Bostock and Quatrano ((1992) Plant Physiol. 98: 1356-1363), who found that osmotic stress and ABA act synergistically and induce maize Em gene expression. Ishitani et al. ((1997) Plant Cell 9: 1935-1949) isolated a group of Arabidopsis single gene mutations that confer enhanced responses to both osmotic stress and ABA. The nature of the recovery of these mutants from osmotic stress and ABA treatment indicated that although separate signaling pathways exist for osmotic stress and ABA, the pathways share a number of components; these common components may mediate synergistic interactions between osmotic stress and ABA. Thus, contrary to the previously held belief that ABA-dependent and ABA-independent stress signaling pathways act in a parallel manner, our data reveal that these pathways cross talk and converge to activate stress gene expression.

[0266] Because sugars are important signaling molecules, the ability to control either the concentration of a signaling sugar or how the plant perceives or responds to a signaling sugar could be used to control plant development, physiology or metabolism. For example, the flux of sucrose (a disaccharide sugar used for systemically transporting carbon and energy in most plants) has been shown to affect gene expression and alter storage compound accumulation in seeds. Manipulation of the sucrose-signaling pathway in seeds may therefore cause seeds to have more protein, oil or carbohydrate, depending on the type of manipulation. Similarly, in tubers, sucrose is converted to starch which is used as an energy store. It is thought that sugar signaling pathways may partially determine the levels of starch synthesized in the tubers. The manipulation of sugar signaling in tubers could lead to tubers with a higher starch content.

[0267] Thus, the presently disclosed transcription factor genes that manipulate the sugar signal transduction pathway may lead to altered gene expression to produce plants with desirable traits. In particular, manipulation of sugar signal transduction pathways could be used to alter source-sink relationships in seeds, tubers, roots and other storage organs leading to increase in yield.

[0268] Growth regulator: carbon and nitrogen balance. A number of the transcription factor-overexpressing lines, including G153, G200, G581, G707, G916, G1013, G1150, G1274, G1483, G1535, G1816, G1988, G2239, G2604, G2830, G2913, and G2981, may be used to produce plants with altered C/N sensing. These plants may, for example, make less anthocyanin on high sucrose plus glutamine, indicating that these genes can be used to modify carbon and nitrogen status, and hence assimilate partitioning (assimilate partitioning refers to the manner in which an essential element, such as nitrogen, is distributed among different pools inside a plant, generally in a reduced form, for the purpose of transport to various tissues).

[0269] Flowering time: early and late flowering. Presently disclosed transcription factor genes that accelerate flowering, which include G129, G131, G135, G136, G137, G138, G140, G142, G145, G146, G148, G153, G155, G172, G200, G246, G416, G485, G549, G600, G627, G1011, G1037, G1142, G1538, G1797, G1798, G1823, G1825, G1988, G2071, G2129, G2142, G2184, G2311, G2372, G2443, G2515, G2628, G2633, G2639, G2650, G2754, G2777, G2779, G2802, G2805, G2832, G2967, G2992, G3002, G3032, G3044, G3060, G3061, and G3086, could have valuable applications in such programs, since they allow much faster generation times. In a number of species, for example, broccoli, cauliflower, where the reproductive parts of the plants constitute the crop and the vegetative tissues are discarded, it would be advantageous to accelerate time to flowering. Accelerating flowering could shorten crop and tree breeding programs. Additionally, in some instances, a faster generation time would allow additional harvests of a crop to be made within a given growing season. A number of Arabidopsis genes have already been shown to accelerate flowering when constitutively expressed. These include LEAFY, APETALA1 and CONSTANS (Mandel et al. (1995) Nature 377: 522-524; Weigel and Nilsson (1995) Nature 377: 495-500; Simon et al. (1996) Nature 384: 59-62).

[0270] By regulating the expression of potential flowering using inducible promoters, flowering could be triggered by application of an inducer chemical. This would allow flowering to be synchronized across a crop and facilitate more efficient harvesting. Such inducible systems could also be used to tune the flowering of crop varieties to different latitudes. At present, species such as soybean and cotton are available as a series of maturity groups that are suitable for different latitudes on the basis of their flowering time (which is governed by day-length). A system in which flowering could be chemically controlled would allow a single high-yielding northern maturity group to be grown at any latitude. In southern regions such plants could be grown for longer periods before flowering was induced, thereby increasing yields. In more northern areas, the induction would be used to ensure that the crop flowers prior to the first winter frosts.

[0271] In a sizeable number of species, for example, root crops, where the vegetative parts of the plants constitute the crop and the reproductive tissues are discarded, it is advantageous to identify and incorporate transcription factor genes that delay or prevent flowering in order to prevent resources being diverted into reproductive development. For example, G2, G15, G47, G173, G309, G319, G324, G372, G380, G434, G485, G571, G581, G624, G707, G738, G744, G752, G839, G852, G905, G1113, G1136, G1142, G1150, G1276, G1357, G1361, G1446, G1451, G1452, G1468, G1474, G1493, G1549, G1554, G1863, G1945, G1983, G1998, G1999, G2106, G2146, G2207, G2251, G2269, G2319, G2334, G2432, G2559, G2604, G2694, G2723, G2741, G2743, G2763, G2771, G2802, G2838, G2846, G2964, G2979, G2993, G2998, G3003, G3021, G3060, and G3111 have been shown to delay flowering time in plants. Extending vegetative development with presently disclosed transcription factor genes could thus bring about large increases in yields. Prevention of flowering can help maximize vegetative yields and prevent escape of genetically modified organism (GMO) pollen.

[0272] Presently disclosed transcription factors that extend flowering time have utility in engineering plants with longer lasting flowers for the horticulture industry, and for extending the time in which the plant is fertile.

[0273] Altered flower structure and inflorescence: aerial rosettes, architecture, branching, short internodes, terminal flowers and phase change. Presently disclosed transgenic transcription factors such as G15, G129, G131, G133, G134, G135, G140, G446, G549, G550, G651, G730, G846, G1011, G1013, G1100, G1274, G1420, G1539, G1549, G1591, G1796, G1797, G1798, G1825, G1995, G2226, G2372, G2455, G2457, G2515, G2575, G2579, G2616, G2617, G2639, G2640, G2649, G2694, G2743, G2768, G2826, G2838, G2839, G2859, G2884, G2979, G2983, and G3094 have been used to create plants with larger flowers or arrangements of flowers that are distinct from wild-type or non-transformed cultivars. This would likely have the most value for the ornamental horticulture industry, where larger flowers or interesting floral configurations are generally preferred and command the highest prices.

[0274] Flower structure may have advantageous or deleterious effects on fertility, and could be used, for example, to decrease fertility by the absence, reduction or screening of reproductive components. In fact, plants that overexpress a sizable number of the presently disclosed transcription factor genes, including G15, G549, G651, G846, G1100, G1798, G2372, G2579, G2616, G2639, G2640, G2649, G2768, and G2884, have been shown to possess reduced fertility compared with control plants. These could be desirable traits, as low fertility could be exploited to prevent or minimize the escape of the pollen of genetically modified organisms (GMOs) into the environment.

[0275] The alterations in shoot architecture seen in the lines in which the expression of G47, G446, G2571, G2146, G2571, G2694, G2784, or G2859, for example, was modified indicates that these genes can be used to manipulate inflorescence branching patterns. This could influence yield and offer the potential for more effective harvesting techniques. For example, a “self pruning” mutation of tomato results in a determinate growth pattern and facilitates mechanical harvesting (Pnueli et al. (2001) Plant Cell 13(12): 2687-702).

[0276] Although the fertility of plants overexpressing some of the lines in which the present transcription factors (e.g., G2579) expression levels were poor, siliques of these plants appeared to grow out fairly extensively in many instances, indicating that these genes may be producing parthenocarpic effects (fruit development in the absence of seed set), and may have utility in producing seedless fruit.

[0277] One interesting application for manipulation of flower structure, for example, by introduced transcription factors could be in the increased production of edible flowers or flower parts, including saffron, which is derived from the stigmas of Crocus sativus.

[0278] A number of the presently disclosed transcription factors may affect the timing of phase changes in plants (e.g., G370, G1985, G1995, G2826, and G2838). Since the timing of phase changes generally affects a plant's eventual size, these genes may prove beneficial by providing means for improving yield and biomass.

[0279] General development and morphology: shoot meristem and branching patterns. Presently disclosed transcription factor genes, when introduced into plants, may be used to modify branching patterns (e.g., by knocking-out G438, and overexpression of G916, G1585, G1957, G2636, and G2650), for example, by causing stem bifurcations in developing shoots in which the shoot meristems split to form two or three separate shoots. These transcription factors and their functional equivalogs may thus be used to manipulate branching. This would provide a unique appearance, which may be desirable in ornamental applications, and may be used to modify lateral branching for use in the forestry industry. A reduction in the formation of lateral branches could reduce knot formation. Conversely, increasing the number of lateral branches could provide utility when a plant is used as a view or windscreen. Transcription factors that cause primary shoots to become linked at each coflorescence node (e.g., G47) may be used to manipulate plant structure and provide for a unique ornamental appearance.

[0280] General development and morphology: apical dominance: The modified expression of presently disclosed transcription factors (e.g., G47, and its equivalogs) that reduce apical dominance could be used in ornamental horticulture, for example, to modify plant architecture, for example, to produce a shorter, more bushy stature than wild type. The latter form would have ornamental utility as well as provide increased resistance to lodging.

[0281] Development and morphology: trichomes. Several of the presently disclosed transcription factor genes have been used to modify trichome number, density, trichome cell fate or amount of trichome products produced by plants. These include G370, G1452, G1539, G1816, G1995, G2085, G2718, G2826, G2838, and G2983. In most cases where the metabolic pathways are impossible to engineer, increasing trichome density or size on leaves may be the only way to increase plant productivity. Thus, by increasing trichome density, size or type, trichome-affecting genes and their homologs would have profound utilities in molecular farming practices and increasing the yield of cotton fibers.

[0282] If the effects on trichome patterning reflect a general change in heterochronic processes, trichome-affecting transcription factors or their homologs can be used to modify the way meristems and/or cells develop during different phases of the plant life cycle. In particular, altering the timing of phase changes could afford positive effects on yield and biomass production.

[0283] General development and morphology: stem morphology and altered vascular tissue structure. Plants in which expression of transcription factor genes that modify stem morphology or lignin content is modified may be used to affect overall plant architecture and the distribution of lignified fiber cells within the stem.

[0284] Modulating lignin content might allow the quality of wood used for furniture or construction to be improved. Lignin is energy rich; increasing lignin composition could therefore be valuable in raising the energy content of wood used for fuel. Conversely, the pulp and paper industries seek wood with a reduced lignin content. Currently, lignin must be removed in a costly process that involves the use of many polluting chemicals. Consequently, lignin is a serious barrier to efficient pulp and paper production (Tzfira et al. (1998) TIBTECH 16: 439-446; Robinson (1999) Nature Biotechnology 17: 27-30). In addition to forest biotechnology applications, changing lignin content by selectively expressing or repressing transcription factors in fruits and vegetables might increase their palatability.

[0285] Transcription factors that modify stem structure, including G47 and its equivalogs, may also be used to achieve reduction of higher-order shoot development, resulting in significant plant architecture modification. Overexpression of the genes that encode these transcription factors in woody plants might result in trees that lack side branches, and have fewer knots in the wood. Altering branching patterns could also have applications amongst ornamental and agricultural crops. For example, applications might exist in any species where secondary shoots currently have to be removed manually, or where changes in branching pattern could increase yield or facilitate more efficient harvesting.

[0286] General development and morphology: altered root development. By modifying the structure or development of roots by modifying expression levels of one or more of the presently disclosed transcription factor genes, including G651, G730, G1816, G2655, G2718, G2747, G2983, G2992, G2993, and their equivalogs, plants may be produced that have the capacity to thrive in otherwise unproductive soils. For example, grape roots extending further into rocky soils would provide greater anchorage, greater coverage with increased branching, or would remain viable in waterlogged soils, thus increasing the effective planting range of the crop and/or increasing yield and survival. It may be advantageous to manipulate a plant to produce short roots, as when a soil in which the plant will be growing is occasionally flooded, or when pathogenic fungi or disease-causing nematodes are prevalent.

[0287] In addition, presently disclosed transcription factors including G1816, G2718, G2983 and their equivalogs, may be used to increase root hair density and thus increase tolerance to abiotic stresses, thereby improving yield and quality.

[0288] Development and morphology: cotyledon, hypocotyl. The morphological phenotypes shown by plants overexpressing several of the transcription factor genes in Table 6 indicate that these genes, including those that produce altered cotyledons (e.g., G916, G1420, G1893, G2432, G2636, G2859, and G3059) and hypocotyls (G807, G916, G1510, G1988, G2771, G2859, G2884, G2993), can be used to manipulate light responses such as shade avoidance. As these genes also alter plant architecture, they may find use in the ornamental horticulture industry.

[0289] Development and morphology: seed development, ripening and germination rate. A number of the presently disclosed transcription factor genes (e.g., G961) have been shown to modify seed development and germination rate, including when the seeds are in conditions normally unfavorable for germination (e.g., cold, heat or salt stress, or in the presence of ABA), and may, along with functional equivalogs, thus be used to modify and improve germination rates under adverse conditions.

[0290] Growth rate and development: fast growth. A number of the presently disclosed transcription factor genes, including G807, G1476, and G2617, could be used to accelerate seedling growth, and thereby allow a crop to become established faster. This would minimize exposure to stress conditions at early stages of growth when the plants are most sensitive. Additionally, it can allow a crop to grow faster than competing weed species.

[0291] A number of these transcription factors have also been shown to increase growth rate of mature plants to a significant extent, including more rapid growth and development of reproductive organs. This provides utility for regions with short growing seasons. Accelerating plant growth would also improve early yield or increase biomass at an earlier stage, when such is desirable (for example, in producing vegetable crops or forestry products).

[0292] General development and morphology: slow growth rate. A number of the presently disclosed transcription factor genes, including G652, G1013, G1100, G1468, G1535, G1549, G1779, G1938, G2765, G2784, G2826, G2834, G2851, G3091, and G3095, have been shown to have significant effects on retarding plant growth rate and development. These observations have included, for example, delayed growth and development of reproductive organs. Slow growing plants may be highly desirable to ornamental horticulturists, both for providing house plants that display little change in their appearance over time, or outdoor plants for which wild-type or rapid growth is undesirable (e.g., ornamental palm trees). Slow growth may also provide for a prolonged fruiting period, thus extending the harvesting season, particularly in regions with long growing seasons. Slow growth could also provide a prolonged period in which pollen is available for improved self- or cross-fertilization, or cross-fertilization of cultivars that normally flower over non-overlapping time periods. The latter aspect may be particularly useful to plants comprising two or more distinct grafted cultivars (e.g., fruit trees) with normally non-overlapping flowering periods.

[0293] General development and morphology: senescence. Presently disclosed transcription factor genes may be used to alter senescence responses in plants. Although leaf senescence is thought to be an evolutionary adaptation to recycle nutrients, the ability to control senescence in an agricultural setting has significant value. For example, a delay in leaf

regulated expression, a necrosis-inducing transcription factor can restrict the spread of a pathogen infection through a plant.

[0296] Plant Size: Large Plants and Increased Biomass.

[0297] Plants overexpressing G46, G268, G287, G314, G319, G324, G438, G624, G852, G1113, G1150, G1451, G1468, G2334, G2536, G2650, G2741, and G2979, for example, have been shown to be larger than controls. For some ornamental plants, the ability to provide larger varieties with these genes or their equivalogs may be highly desirable. More significantly, crop species overexpressing these genes from diverse species would also produce higher yields on larger cultivars, particularly those in which the vegetative portion of the plant is edible.

[0298] Overexpression of these genes can confer increased stress tolerance as well as increased biomass, and the increased biomass appears to be related to the particular mechanism of stress tolerance exhibited by these genes. The decision for a lateral organ to continue growth and expansion versus entering late development phases (growth cessation and senescence) is controlled genetically and hormonally, including regulation at an organ size checkpoint (e.g., Mizukami (2001) Curr Opinion Plant Biol. 4: 533-39; Mizukami and Fisher (2000) Proc. Natl. Acad. Sci. 97: 942-47; Hu et al. Plant Cell 15: 1591)). Organ size is controlled by the meristematic competence of organ cells, with increased meristematic competence leading to increased organ size (both leaves and stems). Plant hormones can impact plant organ size, with ethylene pathway overexpression leading to reduced organ size. There are also suggestions that auxin plays a determinative role in organ size. Stress responses can impact hormone levels in plant tissues, including ABA and ethylene levels. Thus, overexpression of G1073 appears to alter environmental (e.g., stress) inputs to the organ size checkpoint, thus enhancing organ size.

[0299] Plant size: large seedlings.
Presently disclosed tran senescence in Some maize hybrids is associated with a sig Scription factor genes, that produce large seedlings can be nificant increase in yields and a delay of a few days in the used to produce crops that become established faster. Large senescence of soybean plants can have a large impact on seedlings are generally hardier, less Vulnerable to stress, and yield. In an experimental setting, tobacco plants engineered better able to out-compete weed species. Seedlings in which to inhibit leaf Senescence had a longer photosynthetic expression of some of the presently disclosed transcription lifespan, and produced a 50% increase in dry weight and seed factors, including G1313, G2679, G2694, and G2838, for yield (Gan and Amasino (1995) Science 270: 1986-1988). example, was modified, have been shown to possess larger Delayed flower senescence caused by knocking out G652 or cotyledons and/or were more developmentally advanced than overexpressing G571, G2536, for example, may generate control plants. Rapid seedling development made possible by plants that retain their blossoms longer and this may be of manipulating expression of these genes or their equivalogs is potential interest to the ornamental horticulture industry, and likely to reduce loss due to diseases particularly prevalent at delayed foliar and fruit senescence could improve post-har the seedling stage (e.g., damping off) and is thus important for vest shelf-life of produce. survivability of plants germinating in the field or in controlled 0294 Premature senescence caused by, for example, environments. G652, G1033, G1772, G2467, G2574, G2783, G2907, 0300 Plant size: dwarfed plants. Presently disclosed tran G3059, G3111 and their equivalogs may be used to improve scription factor genes, including G131, G136, G253, G309, a plant's response to disease and hasten fruit ripening. 
G370, G386, G549, G550, G600, G651, G652, G707, G738, 0295 Growth rate and development: lethality and necro G811, G1011 G1100, G1247, G1289, G1340, G1423, sis. Overexpression of transcription factors, for example, G1474, G1483, G1549, G1554, G1593, G1753, G1772, G12, G366, G1384, G1556, G1840, G1832, G1840, G1850, G1779, G1798, G1938, G1983, G1993, G2085, G.2226, G1957, G1990, G2213, G2298, G2505, G2570, G2587, G.2227, G2251, G2372, G2375, G2453, G2456, G2459, G2869, G.2887 and their equivalogs that have a role in regu G2492, G2515, G2550, G2565, G2574, G2575, G2579, lating cell death may be used to induce lethality in specific G2616, G2628, G2640, G2649, G2682, G2702, G2757, tissues or necrosis in response to pathogen attack. For G2783, G2839, G2846, G2847, G2850, G2884, G2934, example, if a transcription factor gene inducing lethality or G2958, G2979, G2992, G3017, G3059, G3091, and G3111 necrosis was specifically active in gametes (e.g., (G846), and their equivalogs can be used to decrease plant stature and embryos (e.g., G374 knockouts) or reproductive organs, its may produce plants that are more resistant to damage by wind expression in these tissues would lead to ablation and Subse and rain, have improved lodging resistance, or more resistant quent male or female sterility. Alternatively, under pathogen to heat or low humidity or water deficit. Dwarf plants are also US 2011/007880.6 A1 Mar. 31, 2011 40 of significant interest to the ornamental horticulture industry, lutein is an important nutraceutical; lutein-rich diets have and particularly for home garden applications for which been shown to help preventage-related macular degeneration space availability may be limited. (ARMD), the leading cause of blindness in elderly people. 0301 Growth rate and development: Cell proliferation Consumption of dark green leafy vegetables has been shown and differentiation. Transcription factors may be used regu in clinical studies to reduce the risk of ARMD. 
late cell proliferation and/or differentiation in plants. Control 0304 Enhanced chlorophyll and carotenoid levels could of these processes could have valuable applications in plant also improve yield in crop plants. Lutein, like other Xantho transformation, cell culture or micro-propagation systems, as phylls such as Zeaxanthin and violaxanthin, is an essential well as in control of the proliferation of particular useful component in the protection of the plant against the damaging tissues or cell types. Transcription factors that induce the effects of excessive light. Specifically, lutein contributes, proliferation ofundifferentiated cells, such as G1539, G1585, directly or indirectly, to the rapid rise of non-photochemical G1591, and G2983, can be operably linked with an inducible quenching in plants exposed to highlight. Crop plants engi promoter to promote the formation of callus that can be used neered to contain higher levels of lutein could therefore have for transformation or production of cell Suspension cultures. improved photo-protection, leading to less oxidative damage Transcription factors that promote differentiation of shoots and better growth under highlight (e.g., during long Summer could be used in transformation or micro-propagation sys days, or at higher altitudes or lower latitudes than those at tems, where regeneration of shoots from callus is currently which a non-transformed plant would thrive). Additionally, problematic. In addition, transcription factors that regulate elevated chlorophyll levels increases photosynthetic capacity. the differentiation of specific tissues could be used to increase 0305 Leaf morphology: changes in leaf shape. Presently the proportion of these tissues in a plant. 
Transcription factors disclosed transcription factors produce marked and diverse may promote the differentiation of carpel tissue, and these effects on leaf development and shape, and include G30 and genes could be applied to commercial species to induce for many others (see Table 6, “Change in leaf shape”). At early mation of increased numbers of carpels or fruits. A particular stages of growth, transgenic Seedlings have developed nar application might exist in Saffron, one of the world's most row, upward pointing leaves with long petioles, possibly indi expensive spices. Saffron filaments, or threads, are actually cating a disruption in circadian-clock controlled processes or the dried Stigmas of the Saffron flower, Crocus sativus Lin nyctinastic movements. Other transcription factor genes can neaus. Each flower contains only three Stigmas, and more be used to alter leaf shape in a significant manner from wild than 75,000 of these flowers are needed to produce just one type, Some of which may find use in ornamental applications. pound of saffron filaments. An increase in carpel number (0306 Leaf morphology: altered leaf size. Large leaves, would increase the quantity of Stigmatic tissue and improve Such as those produced in plants overexpressing G268, G324. yield. G438, G852, G1113, G1274, G1451, G2536, G2699, G2768, 0302 Growth rate and development: cell expansion. Plant and G3008, generally increase plant biomass. This provides growth results from a combination of cell division and cell benefit for crops where the vegetative portion of the plant is expansion. Transcription factors may be useful in regulation the marketable portion. of cell expansion. Altered regulation of cell expansion (for 0307 Leaf morphology: light green and gray leaves. Tran example, by G521) could affect stem length, an important scription factor genes such as G351, G600, G651, G1468, agronomic characteristic. 
For instance, short cultivars of G1718, G2565, G2604, G2779, G2859, G3044, and G3070 wheat contributed to the Green Revolution, because plants that provide an altered appearance may positively affect a that put fewer resources into stem elongation allocate more plant's value to the ornamental horticulture industry. resources into developing seed and produce higher yield. 0308 Leaf morphology: glossy leaves. Transcription fac These plants are also less Vulnerable to wind and rain damage. tor genes such as G30, G370 (when knocked-out), G975, These cultivars were found to be altered in their sensitivity to G1792. G2640, G2649 and their equivalogs that induce the gibberellins, hormones that regulate stem elongation through formation of glossy leaves generally do so by elevating levels control of both cell expansion and cell division. Altered cell of epidermal wax. Thus, the genes could be used to engineer expansion in leaves could also produce novel and ornamental changes in the composition and amount of leaf Surface com plant forms. ponents, including waxes. The ability to manipulate wax 0303 Leaf morphology: dark leaves. Color-affecting composition, amount, or distribution could modify plant tol components in leaves include chlorophylls (generally green), erance to drought and low humidity, or resistance to insects or anthocyanins (generally red to blue) and carotenoids (gener pathogens. Additionally, wax may be a valuable commodity ally yellow to red). Transcription factor genes that increase in some species, and altering its accumulation and/or compo these pigments in leaves, including G30, G253, G309, G707, sition could enhance yield. G811, G957, G1100, G1327, G1341, G1357, G1389, G1420, 0309. Seed morphology: altered seed coloration. 
Pres g1423, G1452, G1482, G1510, G1535, G1549, G1554, ently disclosed transcription factor genes, including G581, G1593, G1743, G1792, G1796, G1846, G1863, G1932, G961, G2085, and G2371, have been used to modify seed G1938, G1983, G2085, G2146, G2207, G.2226, G2251, color, which, along with the equivalogs of these genes, could G2334, G2371, G2372, G2453, G2456, G2457, G2459, provide added appeal to seeds or seed products. G2550, G2640, G2649, G2661, G2690, G2694, G2771, 0310 Seed morphology: altered seed size and shape. The G2763, G2784, G2837, G2838, G2846, G2847, G2850, introduction of presently disclosed transcription factor genes, G2851, G2958, G2993, G3021, G3059, G3091, G3095, and including G151, G581, G2085, G2585, or G2933, into plants G3111, may positively affect a plant's value to the ornamental that increase the size of seeds may have a significant impact horticulture industry. Variegated varieties, in particular, on yield and appearance, particularly when the product is the would show improved contrast. Other uses that result from seed itself (e.g., in the case of grains, legumes, nuts, etc.). overexpression of transcription factor genes include improve Seed size, in addition to seed coat integrity, thickness and ments in the nutritional value of foodstuffs. For example, permeability, seed water content and a number of other com US 2011/007880.6 A1 Mar. 31, 2011

ponents including antioxidants and oligosaccharides, also Thus dietary fatty acid ratios altered in seeds may affect the affects affect seed longevity in storage, with larger seeds etiology and outcome of bone loss. often being more desirable for prolonged storage. 0316 Transcription factors that reduce leaf fatty acids, for 0311 Transcription factor genes that alter seed shape, example, 16:3 fatty acids, may be used to control thylakoid including G652, G916, G961 and their equivalogs may have membrane development, including proplastid to chloroplast both ornamental applications and improve or broaden the development. The genes that encode these transcription fac appeal of seed products. tors might thus be useful for controlling the transition from 0312 Leaf and seed biochemistry. Overexpression of tran proplastid to chromoplast in fruits and vegetables. It may also Scription factors genes, including G975 and its equivalogs, be desirable to change the expression of these genes to pre which results in increased leaf wax could be used to manipu vent cotyledon greening in Brassica napus or B. campestris to late wax composition, amount, or distribution. These tran Scription factors can improve yield in those plants and crops avoid green oil due to early frost. from which wax is a valuable product. The genes may also be 0317 Transcription factor genes that increase leaf fatty used to modify plant tolerance to drought and/or low humid acid production, including G975 and its equivalogs could ity or resistance to insects, as well as plant appearance (glossy potentially be used to manipulate seed composition, which is leaves). The effect of increased wax deposition on leaves of a very important for the nutritional value and production of plant like may improve water use efficiency. Manipulation of various food products. 
A number of transcription factor genes these genes may reduce the wax coating on Sunflower seeds; are involved in mediating an aspect of the regulatory response this wax fouls the oil extraction system during Sunflower seed to temperature. These genes may be used to alter the expres processing for oil. For the latter purpose or any other where sion of desaturases that lead to production of 18:3 and 16:3 wax reduction is valuable, antisense or co-suppression of the fatty acids, the balance of which affects membrane fluidity transcription factor genes in a tissue-specific manner would and mitigates damage to cell membranes and photosynthetic be valuable. structures at high and low temperatures. 0313 Prenyl lipids play a role in anchoring proteins in membranes or membranous organelles. Thus modifying the 0318. The G652 knockoutline had a reproducible increase prenyl lipid content of seeds and leaves could affect mem in the leaf glucosinolate M39480. It also showed a reproduc brane integrity and function. One important group of prenyl ible increase in seed alpha-tocopherol. A number of glucosi lipids, the tocopherols, have both anti-oxidant and vitamin E nolates have been shown to have anti-cancer activity; thus, activity. Transcription factor genes (e.g., a G652 knockout) increasing the levels or composition of these compounds by have been shown to modify the prenyl lipid content of leaves modifying the expression of transcription factors (e.g., in plants, and these genes and their equivalogs may thus be G652), can have a beneficial effect on human diet. used to alter prenyl lipid content of leaves. 0319 Glucosinolates are undesirable components of the 0314. Overexpression of transcription factors have oilseeds used in animal feed since they produce toxic effects. resulted in plants with altered leaf insoluble sugar content. 
Low-glucosinolate varieties of canola, for example, have These transcription factors and their equivalogs that alter been developed to combat this problem. Glucosinolates form plant cell wall composition have several potential applica part of a plant's natural defense against insects. Modification tions including altering food digestibility, plant tensile of glucosinolate composition or quantity by introducing tran strength, wood quality, pathogen resistance and in pulp pro scription factors that affect these characteristics can therefore duction. In particular, hemicellulose is not desirable in paper afford increased protection from herbivores. Furthermore, in pulps because of its lack of strength compared with cellulose. edible crops, tissue specific promoters can be used to ensure Thus modulating the amounts of cellulose vs. hemicellulose that these compounds accumulate specifically in tissues. Such in the plant cell wall is desirable for the paper/lumber indus as the epidermis, which are not taken for consumption. try. Increasing the insoluble carbohydrate content in various 0320 Presently disclosed transcription factor genes that fruits, vegetables, and other edible consumer products will modify levels of phytosterols in plants may have at least two result in enhanced fiber content. Increased fiber content utilities. First, phytosterols are an important Source of precur would not only provide health benefits in food products, but sors for the manufacture of human Steroid hormones. Thus, might also increase digestibility of forage crops. In addition, regulation of transcription factor expression or activity could the hemicellulose and pectin content of fruits and berries lead to elevated levels of important human steroid precursors affects the quality of jam and catsup made from them. for steroid semi-synthesis. 
For example, transcription factors Changes in hemicellulose and pectin content could result in a that cause elevated levels of campesterol in leaves, or sito Superior consumer product. sterols and stigmasterols in seed crops, would be useful for 0315. A number of the presently disclosed transcription this purpose. Phytosterols and their hydrogenated derivatives factor genes have been shown to alter the fatty acid compo phytostanols also have proven cholesterol-lowering proper sition in plants (e.g., G975), and seeds and leaves in particu ties, and transcription factor genes that modify the expression lar. This modification Suggests several utilities, including of these compounds in plants would thus provide health ben improving the nutritional value of seeds or whole plants. efits. Dietary fatty acids ratios have been shown to have an effect 0321. The composition of seeds, particularly with respect on, for example, bone integrity and remodeling (see, for to seed oil amounts and/or composition, is very important for example, Weiler (2000) Pediatr. Res.47:5692-697). The ratio the nutritional and caloric value and production of various of dietary fatty acids may alter the precursor pools of long food and feed products. Modifying the expression of tran chain polyunsaturated fatty acids that serve as precursors for Scription factor genes that alter seed oil content could be used prostaglandin synthesis. In mammalian connective tissue, to improve the heat stability of oils or to improve the nutri prostaglandins serve as important signals regulating the bal tional quality of seed oil, by, for example, reducing the num ance between resorption and formation in bone and cartilage. ber of calories in seed by decreasing oil or fatty acid content, US 2011/007880.6 A1 Mar. 31, 2011 42

or increasing the number of calories in animal feeds by increasing fatty acid or seed oil content (e.g., by knocking out G961, G1451, or G2830).

[0322] As with seed oils, the composition of seeds, particularly with respect to protein amounts and/or composition, is very important for the nutritional value and production of various food and feed products. Transcription factor genes may be used to modify protein concentrations in seeds, which would modify the caloric content of seeds or provide nutritional benefits, and may be used to prolong storage, increase seed pest or disease resistance, or modify germination rates.

[0323] Prenyl lipids play a role in anchoring proteins in membranes or membranous organelles. Thus, presently disclosed transcription factor genes, including G652 and equivalogs, that modify the prenyl lipid content of seeds and leaves (in the case of G652, when this gene is knocked out) could affect membrane integrity and function. Transcription factor genes have been shown to modify the tocopherol composition of plants. α-Tocopherol is better known as vitamin E. Tocopherols such as α- and γ-tocopherol both have anti-oxidant activity.

[0324] Light response/shade avoidance: altered cotyledon, hypocotyl, petiole development, altered leaf orientation, constitutive photomorphogenesis, photomorphogenesis in low light. Presently disclosed transcription factor genes, including G30, G246, G351, G478, G807, G916, G1013, G1082, G1510, G1988, G2432, G2650, G2694, G2754, G2771, G2859, G2884, G2993, G3032 and their equivalogs that can modify a plant's response to light may be useful for modifying plant growth or development, for example, photomorphogenesis in poor light, or accelerating flowering time in response to various light intensities, quality or duration to which a non-transformed plant would not similarly respond. Examples of such responses that have been demonstrated include leaf number and arrangement, and early flower bud appearances. Elimination of shading responses may lead to increased planting densities with subsequent yield enhancement. As these genes may also alter plant architecture, they may find use in the ornamental horticulture industry.

[0325] Pigment: Increased Anthocyanin Level in Various Plant Organs and Tissues.

[0326] G253, G386, G581, G707, G1482, G2453, G2456, G2459, G2604, G2718 and equivalogs can be used to alter anthocyanin levels in one or more tissues, depending on the organ in which these genes are expressed, and may be used to alter anthocyanin production in numerous plant species. Expression of presently disclosed transcription factor genes that increase flavonoid production in plants, including anthocyanins and condensed tannins, may be used to alter pigment production for horticultural purposes, and possibly to increase stress resistance. A number of flavonoids have been shown to have antimicrobial activity and could be used to engineer pathogen resistance. Several flavonoid compounds have health promoting effects such as inhibition of tumor growth, prevention of bone loss and prevention of the oxidation of lipids. Increased levels of condensed tannins in forage legumes would be an important agronomic trait because they prevent pasture bloat by collapsing protein foams within the rumen. For a review on the utilities of flavonoids and their derivatives, refer to Dixon et al. (1999) Trends Plant Sci. 4: 394-400.

Antisense and Co-Suppression

[0327] In addition to expression of the nucleic acids of the invention as gene replacement or plant phenotype modification nucleic acids, the nucleic acids are also useful for sense and anti-sense suppression of expression, e.g. to down-regulate expression of a nucleic acid of the invention, e.g. as a further mechanism for modulating plant phenotype. That is, the nucleic acids of the invention, or subsequences or anti-sense sequences thereof, can be used to block expression of naturally occurring homologous nucleic acids. A variety of sense and anti-sense technologies are known in the art, e.g. as set forth in Lichtenstein and Nellen (1997) Antisense Technology: A Practical Approach IRL Press at Oxford University Press, Oxford, U.K. Antisense regulation is also described in Crowley et al. (1985) Cell 43: 633-641; Rosenberg et al. (1985) Nature 313: 703-706; Preiss et al. (1985) Nature 313: 27-32; Melton (1985) Proc. Natl. Acad. Sci. 82: 144-148; Izant and Weintraub (1985) Science 229: 345-352; and Kim and Wold (1985) Cell 42: 129-138. Additional methods for antisense regulation are known in the art. Antisense regulation has been used to reduce or inhibit expression of plant genes in, for example, European Patent Publication No. 271988. Antisense RNA may be used to reduce gene expression to produce a visible or biochemical phenotypic change in a plant (Smith et al. (1988) Nature, 334: 724-726; Smith et al. (1990) Plant Mol. Biol. 14: 369-379). In general, sense or anti-sense sequences are introduced into a cell, where they are optionally amplified, e.g. by transcription. Such sequences include both simple oligonucleotide sequences and catalytic sequences such as ribozymes.

[0328] For example, a reduction or elimination of expression (i.e., a "knock-out") of a transcription factor or transcription factor homolog polypeptide in a transgenic plant, e.g., to modify a plant trait, can be obtained by introducing an antisense construct corresponding to the polypeptide of interest as a cDNA. For antisense suppression, the transcription factor or homolog cDNA is arranged in reverse orientation (with respect to the coding sequence) relative to the promoter sequence in the expression vector. The introduced sequence need not be the full length cDNA or gene, and need not be identical to the cDNA or gene found in the plant type to be transformed. Typically, the antisense sequence need only be capable of hybridizing to the target gene or RNA of interest. Thus, where the introduced sequence is of shorter length, a higher degree of homology to the endogenous transcription factor sequence will be needed for effective antisense suppression. While antisense sequences of various lengths can be utilized, preferably, the introduced antisense sequence in the vector will be at least 30 nucleotides in length, and improved antisense suppression will typically be observed as the length of the antisense sequence increases. Preferably, the length of the antisense sequence in the vector will be greater than 100 nucleotides. Transcription of an antisense construct as described results in the production of RNA molecules that are the reverse complement of mRNA molecules transcribed from the endogenous transcription factor gene in the plant cell.

[0329] Suppression of endogenous transcription factor gene expression can also be achieved using RNA interference, or RNAi. RNAi is a post-transcriptional, targeted gene silencing technique that uses double-stranded RNA (dsRNA) to incite degradation of messenger RNA (mRNA) containing the same sequence as the dsRNA (Constans, (2002) The Scientist 16:36). Small interfering RNAs, or siRNAs, are produced in at least two steps: an endogenous ribonuclease cleaves longer dsRNA into shorter, 21-23 nucleotide-long RNAs. The siRNA segments then mediate the degradation of

the target mRNA (Zamore, (2001) Nature Struct. Biol., 8: 746-50). RNAi has been used for gene function determination in a manner similar to antisense oligonucleotides (Constans, (2002) The Scientist 16:36). Expression vectors that continually express siRNAs in transiently and stably transfected cells have been engineered to express small hairpin RNAs (shRNAs), which get processed in vivo into siRNA-like molecules capable of carrying out gene-specific silencing (Brummelkamp et al., (2002) Science 296:550-553, and Paddison, et al. (2002) Genes & Dev. 16:948-958). Post-transcriptional gene silencing by double-stranded RNA is discussed in further detail by Hammond et al. (2001) Nature Rev Gen 2: 110-119, Fire et al. (1998) Nature 391:806-811 and Timmons and Fire (1998) Nature 395: 854. Vectors in which RNA encoded by a transcription factor or transcription factor homolog cDNA is over-expressed can also be used to obtain co-suppression of a corresponding endogenous gene, e.g., in the manner described in U.S. Pat. No. 5,231,020 to Jorgensen. Such co-suppression (also termed sense suppression) does not require that the entire transcription factor cDNA be introduced into the plant cells, nor does it require that the introduced sequence be exactly identical to the endogenous transcription factor gene of interest. However, as with antisense suppression, the suppressive efficiency will be enhanced as specificity of hybridization is increased, e.g., as the introduced sequence is lengthened, and/or as the sequence similarity between the introduced sequence and the endogenous transcription factor gene is increased.

[0330] Vectors expressing an untranslatable form of the transcription factor mRNA (e.g., sequences comprising one or more stop codons, or nonsense mutations) can also be used to suppress expression of an endogenous transcription factor, thereby reducing or eliminating its activity and modifying one or more traits. Methods for producing such constructs are described in U.S. Pat. No. 5,583,021. Preferably, such constructs are made by introducing a premature stop codon into the transcription factor gene. Alternatively, a plant trait can be modified by gene silencing using double-strand RNA (Sharp (1999) Genes and Development 13: 139-141). Another method for abolishing the expression of a gene is by insertion mutagenesis using the T-DNA of Agrobacterium tumefaciens. After generating the insertion mutants, the mutants can be screened to identify those containing the insertion in a transcription factor or transcription factor homolog gene. Plants containing a single transgene insertion event at the desired gene can be crossed to generate homozygous plants for the mutation. Such methods are well known to those of skill in the art (See for example Koncz et al. (1992) Methods in Arabidopsis Research, World Scientific Publishing Co. Pte. Ltd., River Edge, N.J.).

[0331] Alternatively, a plant phenotype can be altered by eliminating an endogenous gene, such as a transcription factor or transcription factor homolog, e.g., by homologous recombination (Kempin et al. (1997) Nature 389: 802-803).

[0332] A plant trait can also be modified by using the Cre-lox system (for example, as described in U.S. Pat. No. 5,658,772). A plant genome can be modified to include first and second lox sites that are then contacted with a Cre recombinase. If the lox sites are in the same orientation, the intervening DNA sequence between the two sites is excised. If the lox sites are in the opposite orientation, the intervening sequence is inverted.

[0333] The polynucleotides and polypeptides of this invention can also be expressed in a plant in the absence of an expression cassette by manipulating the activity or expression level of the endogenous gene by other means, such as, for example, by ectopically expressing a gene by T-DNA activation tagging (Ichikawa et al. (1997) Nature 390: 698-701; Kakimoto et al. (1996) Science 274: 982-985). This method entails transforming a plant with a gene tag containing multiple transcriptional enhancers and once the tag has inserted into the genome, expression of a flanking gene coding sequence becomes deregulated. In another example, the transcriptional machinery in a plant can be modified so as to increase transcription levels of a polynucleotide of the invention (See, e.g., PCT Publications WO 96/06166 and WO 98/53057 which describe the modification of the DNA-binding specificity of zinc finger proteins by changing particular amino acids in the DNA-binding motif).

[0334] The transgenic plant can also include the machinery necessary for expressing or altering the activity of a polypeptide encoded by an endogenous gene, for example, by altering the phosphorylation state of the polypeptide to maintain it in an activated state.

[0335] Transgenic plants (or plant cells, or plant explants, or plant tissues) incorporating the polynucleotides of the invention and/or expressing the polypeptides of the invention can be produced by a variety of well established techniques as described above. Following construction of a vector, most typically an expression cassette, including a polynucleotide, e.g., encoding a transcription factor or transcription factor homolog, of the invention, standard techniques can be used to introduce the polynucleotide into a plant, a plant cell, a plant explant or a plant tissue of interest. Optionally, the plant cell, explant or tissue can be regenerated to produce a transgenic plant.

[0336] The plant can be any higher plant, including gymnosperms, monocotyledonous and dicotyledonous plants. Suitable protocols are available for Leguminosae (alfalfa, soybean, clover, etc.), Umbelliferae (carrot, celery, parsnip), Cruciferae (cabbage, radish, rapeseed, broccoli, etc.), Cucurbitaceae (melons and cucumber), Gramineae (wheat, corn, rice, barley, millet, etc.), Solanaceae (potato, tomato, tobacco, peppers, etc.), and various other crops. See protocols described in Ammirato et al., eds., (1984) Handbook of Plant Cell Culture: Crop Species, Macmillan Publ. Co., New York, N.Y.; Shimamoto et al. (1989) Nature 338: 274-276; Fromm et al. (1990) Bio/Technol. 8: 833-839; and Vasil et al. (1990) Bio/Technol. 8: 429-434.

[0337] Transformation and regeneration of both monocotyledonous and dicotyledonous plant cells is now routine, and the selection of the most appropriate transformation technique will be determined by the practitioner. The choice of method will vary with the type of plant to be transformed; those skilled in the art will recognize the suitability of particular methods for given plant types. Suitable methods can include, but are not limited to: electroporation of plant protoplasts; liposome-mediated transformation; polyethylene glycol (PEG) mediated transformation; transformation using viruses; micro-injection of plant cells; micro-projectile bombardment of plant cells; vacuum infiltration; and Agrobacterium tumefaciens mediated transformation. Transformation means introducing a nucleotide sequence into a plant in a manner to cause stable or transient expression of the sequence.

[0338] Successful examples of the modification of plant characteristics by transformation with cloned sequences which serve to illustrate the current knowledge in this field of technology, and which are herein incorporated by reference, include: U.S. Pat. Nos.

[0345] One example algorithm that is suitable for determin
5,571,706; 5,677,175: 5,510,471; ing percent sequence identity and sequence similarity is the 5,750,386; 5,597,945; 5,589,615; 5,750,871; 5,268,526; BLAST algorithm, which is described in Altschul et al. 5,780,708: 5,538,880; 5,773,269; 5,736,369 and 5,610,042. (1990).J. Mol. Biol. 215: 403-410. Software for performing 0339. Following transformation, plants are preferably BLAST analyses is publicly available, e.g., through the selected using a dominant selectable marker incorporated National Library of Medicine's National Center for Biotech into the transformation vector. Typically, such a marker will nology Information (ncbi.nlm.nih; see at world wide web confer antibiotic or herbicide resistance on the transformed (www) National Institutes of Health US government (gov) plants, and selection of transformants can be accomplished by website). This algorithm involves first identifying high scor exposing the plants to appropriate concentrations of the anti ing sequence pairs (HSPs) by identifying short words of biotic or herbicide. length W in the query sequence, which either match or satisfy 0340. After transformed plants are selected and grown to some positive-valued threshold score T when aligned with a maturity, those plants showing a modified trait are identified. word of the same length in a database sequence. T is referred The modified trait can be any of those traits described above. to as the neighborhood word score threshold (Altschul et al. Additionally, to confirm that the modified trait is due to supra). These initial neighborhood word hits act as seeds for changes in expression levels or activity of the polypeptide or initiating searches to find longer HSPs containing them. 
The polynucleotide of the invention can be determined by analyz word hits are then extended in both directions along each ing mRNA expression using Northern blots, RT-PCR or sequence for as far as the cumulative alignment score can be microarrays, or protein expression using immunoblots or increased. Cumulative scores are calculated using, for nucle Western blots or gel shift assays. otide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mis Integrated Systems—Sequence Identity matching residues; always <0). For amino acid sequences, a 0341. Additionally, the present invention may be an inte scoring matrix is used to calculate the cumulative score. grated system, computer or computer readable medium that Extension of the word hits in each direction are halted when: comprises an instruction set for determining the identity of the cumulative alignment score falls off by the quantity X one or more sequences in a database. In addition, the instruc from its maximum achieved value; the cumulative score goes tion set can be used to generate or identify sequences that to zero or below, due to the accumulation of one or more meet any specified criteria. Furthermore, the instruction set negative-scoring residue alignments; or the end of either may be used to associate or link certain functional benefits, sequence is reached. The BLAST algorithm parameters W.T. such improved characteristics, with one or more identified and X determine the sensitivity and speed of the alignment. Sequence. The BLASTN program (for nucleotide sequences) uses as 0342 For example, the instruction set can include, e.g., a defaults a wordlength (W) of 11, an expectation (E) of 10, a sequence comparison or other alignment program, e.g., an cutoff of 100, M-5, N=-4, and a comparison of both strands. 
available program Such as, for example, the Wisconsin Pack For amino acid sequences, the BLASTP program uses as age Version 10.0, such as BLAST, FASTA, PILEUP. FIND defaults a wordlength (W) of 3, an expectation (E) of 10, and PATTERNS or the like (GCG, Madison, Wis.). Public the BLOSUM62 scoring matrix (see Henikoff and Henikoff sequence databases such as GenBank, EMBL, Swiss-Prot (1992) Proc. Natl. Acad. Sci. 89: 10915-10919). Unless oth and PIR or private sequence databases such as PHYTOSEQ erwise indicated, “sequence identity” here refers to the % sequence database (Incyte Genomics, Palo Alto, Calif.) can sequence identity generated from a thiastx using the NCBI be searched. version of the algorithm at the default settings using gapped 0343 Alignment of sequences for comparison can be con alignments with the filter “off” (see, for example, NIHNLM ducted by the local homology algorithm of Smith and Water NCBI website at ncbi.nlm.nih, supra). man (1981) Adv. Appl. Math. 2: 482-489, by the homology 0346. The percent identity between two polypeptide alignment algorithm of Needleman and Wunsch (1970) J. sequences can also be determined using Accelrys Gene V2.5 Mol. Biol. 48: 443-453, by the search for similarity method of (2006) with default parameters: Pairwise Matrix: GONNET: Pearson and Lipman (1988) Proc. Natl. Acad. Sci. 85: 2444 Align Speed: Slow; Open Gap Penalty: 10.000; Extended 2448, by computerized implementations of these algorithms. Gap Penalty: 0.100; Multiple Matrix: GONNET: Multiple After alignment, sequence comparisons between two (or Open Gap Penalty: 10.000; Multiple Extended Gap Penalty: more) polynucleotides or polypeptides are typically per 0.05; Delay Divergent: 30; Gap Separation Distance: 8: End formed by comparing sequences of the two sequences over a Gap Separation: false; Residue Specific Penalties: false; comparison window to identify and compare local regions of Hydrophilic Penalties: false; Hydrophilic Residues: G. P. S. 
sequence similarity. The comparison window can be a seg N. D. Q., E, K, and R. The default parameters for determining ment of at least about 20 contiguous positions, usually about percent identity between two polynucleotide sequences using 50 to about 200, more usually about 100 to about 150 con Accelrys Gene are: Align Speed: Slow; Open Gap Penalty: tiguous positions. A description of the method is provided in 10.000; Extended Gap Penalty: 5.000; Multiple Open Gap Ausubel et al. Supra. Penalty: 10.000; Multiple Extended Gap Penalty: 5.000; 0344) A variety of methods for determining sequence rela Delay Divergent: 40: Transition: Weighted. tionships can be used, including manual alignment and com 0347 In addition to calculating percent sequence identity, puter assisted sequence alignment and analysis. This later the BLAST algorithm also performs a statistical analysis of approach is a preferred approach in the present invention, due the similarity between two sequences (see, e.g. Karlin and to the increased throughput afforded by computer assisted Altschul (1993) Proc. Natl. Acad. Sci. 90: 5873-5787). One methods. As noted above, a variety of computer programs for measure of similarity provided by the BLAST algorithm is performing sequence alignment are available, or can be pro the smallest sum probability (P(N)), which provides an indi duced by one of skill. cation of the probability by which a match between two US 2011/007880.6 A1 Mar. 31, 2011

nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence (and, therefore, in this context, homologous) if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.1, or less than about 0.01, or even less than about 0.001. An additional example of a useful sequence alignment algorithm is PILEUP. PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. The program can align, e.g., up to 300 sequences of a maximum length of 5,000 letters.

[0348] The integrated system, or computer, typically includes a user input interface allowing a user to selectively view one or more sequence records corresponding to the one or more character strings, as well as an instruction set which aligns the one or more character strings with each other or with an additional character string to identify one or more regions of sequence similarity. The system may include a link of one or more character strings with a particular phenotype or gene function. Typically, the system includes a user readable output element that displays an alignment produced by the alignment instruction set.

[0349] The methods of this invention can be implemented in a localized or distributed computing environment. In a distributed environment, the methods may be implemented on a single computer comprising multiple processors or on a multiplicity of computers. The computers can be linked, e.g. through a common bus, but more preferably the computer(s) are nodes on a network. The network can be a generalized or a dedicated local or wide-area network and, in certain preferred embodiments, the computers may be components of an intra-net or an internet.

[0350] Thus, the invention provides methods for identifying a sequence similar or homologous to one or more polynucleotides as noted herein, or one or more target polypeptides encoded by the polynucleotides, or otherwise noted herein and may include linking or associating a given plant phenotype or gene function with a sequence. In the methods, a sequence database is provided (locally or across an inter or intra net) and a query is made against the sequence database using the relevant sequences herein and associated plant phenotypes or gene functions.

[0351] Any sequence herein can be entered into the database, before or after querying the database. This provides for both expansion of the database and, if done before the querying step, for insertion of control sequences into the database. The control sequences can be detected by the query to ensure the general integrity of both the database and the query. As noted, the query can be performed using a web browser based interface. For example, the database can be a centralized public database such as those noted herein, and the querying can be done from a remote terminal or computer across an internet or intranet.

[0352] Any sequence herein can be used to identify a similar, homologous, paralogous, or orthologous sequence in another plant. This provides means for identifying endogenous sequences in other plants that may be useful to alter a trait of progeny plants, which results from crossing two plants of different strain. For example, sequences that encode an ortholog of any of the sequences herein that naturally occur in a plant with a desired trait can be identified using the sequences disclosed herein. The plant is then crossed with a second plant of the same species but which does not have the desired trait to produce progeny which can then be used in further crossing experiments to produce the desired trait in the second plant. Therefore the resulting progeny plant contains no transgenes; expression of the endogenous sequence may also be regulated by treatment with a particular chemical or other means, such as EMR. Some examples of such compounds well known in the art include: ethylene; cytokinins; phenolic compounds, which stimulate the transcription of the genes needed for infection; specific monosaccharides and acidic environments which potentiate vir gene induction; acidic polysaccharides which induce one or more chromosomal genes; and opines; other mechanisms include light or dark treatment (for a review of examples of such treatments, see, Winans (1992) Microbiol. Rev. 56: 12-31; Eyal et al. (1992) Plant Mol. Biol. 19: 589-599; Chrispeels et al. (2000) Plant Mol. Biol. 42: 279-290; Piazza et al. (2002) Plant Physiol. 128: 1077-1086).

[0353] Table 7 lists sequences discovered to be orthologous to a number of representative transcription factors of the present invention. The column headings include the transcription factors listed by (a) the SEQ ID NO: of the Arabidopsis sequence that was used to discover the non-Arabidopsis orthologous sequence; (b) the GID sequence identifier of the Arabidopsis sequence; (c) the Sequence Identifier or GenBank Accession Number of the orthologous sequence; (d) the species from which the orthologous sequence is derived; (e) the SEQ ID NO: of the non-Arabidopsis orthologous sequence, and (f) the smallest sum probability pairwise comparison of each orthologous sequence to the similar Arabidopsis sequence determined by BLAST analysis.
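The word-hit extension step of the BLAST algorithm described in paragraph 0345 can be illustrated with a minimal sketch. It covers only ungapped extension of an exact nucleotide word hit and is not the NCBI implementation. The function names `_extend` and `extend_word_hit` and the default `x_drop=20` are assumptions made for illustration; the defaults W=11, M=5 and N=-4 are the BLASTN values quoted in the text.

```python
def _extend(query, subject, qi, si, step, start_score, match, mismatch, x_drop):
    """Walk in one direction (step = +1 or -1) starting at query[qi], subject[si].

    Returns (best_score, residues_kept), where residues_kept is the number of
    residues in this direction that belong to the best-scoring extension.
    """
    score = best = start_score
    kept = 0
    n = 0
    # Stop at the end of either sequence.
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += match if query[qi] == subject[si] else mismatch
        n += 1
        if score > best:
            best, kept = score, n
        elif best - score >= x_drop or score <= 0:
            # Score fell off by X from its maximum, or dropped to zero or below.
            break
        qi += step
        si += step
    return best, kept


def extend_word_hit(query, subject, q_pos, s_pos, word_len=11,
                    match=5, mismatch=-4, x_drop=20):
    """Extend an exact word hit (the seed) in both directions without gaps.

    Returns (score, q_start, q_end) with q_end exclusive.
    x_drop=20 is an illustrative assumption, not a documented default.
    """
    seed_score = word_len * match
    right_best, right_kept = _extend(query, subject, q_pos + word_len,
                                     s_pos + word_len, +1, seed_score,
                                     match, mismatch, x_drop)
    left_best, left_kept = _extend(query, subject, q_pos - 1, s_pos - 1, -1,
                                   right_best, match, mismatch, x_drop)
    return left_best, q_pos - left_kept, q_pos + word_len + right_kept


if __name__ == "__main__":
    q = "TTTACGTACGTACGAAA"
    s = "GGGACGTACGTACGCCC"
    # Seed: the 4-letter word "ACGT" found at position 3 in both sequences.
    print(extend_word_hit(q, s, 3, 3, word_len=4, x_drop=8))
```

A larger `x_drop` lets the extension cross longer runs of mismatches before giving up, which trades speed for sensitivity, as the text notes for the parameters W, T and X.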

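The smallest sum probability thresholds given in paragraph 0347 (about 0.1, 0.01 and 0.001) can be applied to ortholog records of the kind listed in Table 7. The sketch below is only an illustration: the function name `similarity_level` and the sample records are hypothetical and are not taken from the table.

```python
# Thresholds from the text, strictest first.
THRESHOLDS = (0.001, 0.01, 0.1)


def similarity_level(p_value):
    """Return the strictest threshold that p_value falls below, or None."""
    for threshold in THRESHOLDS:
        if p_value < threshold:
            return threshold
    return None


# Hypothetical records: (GID, species, smallest sum probability).
records = [
    ("G0001", "Species a", 5.0e-33),
    ("G0001", "Species b", 0.05),
    ("G0001", "Species c", 0.5),
]

# Keep only records that meet at least the loosest threshold (< 0.1).
homologous = [r for r in records if similarity_level(r[2]) is not None]

if __name__ == "__main__":
    for gid, species, p in homologous:
        print(gid, species, p, similarity_level(p))
```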
TABLE 7
Orthologs of Representative Arabidopsis Transcription Factor Genes

SEQ ID NO: of Arabidopsis Sequence | GID No. | Species from Which Ortholog is Derived | Sequence Identifier or Accession Number | SEQ ID NO: of Orthologous Sequence | Smallest Sum Probability to Ortholog, When Known
7 | G30 | Oryza sativa | G3381 | 2126 | 5.00E-33
7 | G30 | Glycine max | AW308784.1 | 685 |
7 | G30 | Glycine max | BG790680.1 | 686 |
7 | G30 | Glycine max | GLYMA-28NOV01-CLUSTER602185_1 | 687 |
7 | G30 | Glycine max | GLYMA-28NOV01-CLUSTER91218_1 | 688 |
7 | G30 | Glycine max | LIB5118-009-Q1-PF1-F2 | 689 |
7 | G30 | Oryza sativa | OSC20174.C1.p2.fg | 690 |
7 | G30 | Zea mays | LIB4756-134-A1-K1-G10 | 691 |

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Smallest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis GID Species from Which Sequence Identifier or Orthologous Ortholog, When Sequence No. Ortholog is Derived Accession Number Sequence Known G30 Oryza sativa Os S102414 1559 G30 Glycine max Gma S5001644 1633 G30 Zea mays Zm S11513768 1754 G30 Trictim aestivum Ta S274849 1834 G30 Brassica oleracea BHS17030 1.OO 3 7 G30 Lycopersicon AI776626 2.OO 3 5 escientain G30 Trictim aestivum BTOO9060 2.OO -33 G30 Sorghum bicolor BZ337899 1.OO -32 G30 Eucalyptus grandis CB967722 1.OO -31 G30 Zea mays CC34.96SS 1.OO -31 G30 Oryza sativa APOO4623 3.00 -31 (japonica cultivar group) G30 Oryza sativa (indica AAAAO1OOS323 3.00 E-31 cultivar-group) G30 Oryza sativa APOO3891 3.00 -31 G30 Glycine max BG79068O 4.OO -29 G30 Oryza sativa gi28071302 3.60 -32 (japonica cultivar group) G30 Lycopersicon gi2213783 7.90 -26 escientain G30 Catharanthus gi898.0313 4.70 -24 iOSetiS G30 Matricaria gi17385,636 1.10 -23 chamomilia G30 Oryza sativa gi12597874 18O -23 s G30 Mesembryanthemum gi324O1273 3.70 -23 crystallinum G30 Nicotiana tabacum gi1732406 S.2O -23 G30 Nicotiana Sylvestris gi8809571 8.70 -22 G30 Cicer arietintin gi24817250 1.10 -21 G30 Glycine max gi21304712 140 -21 G47 Glycine max G3643 2225 2.OO -29 G47 Oryza sativa G3644 2227 3.00 -25 G47 Brassica rapa G3645 2229 1.OO -63 G47 Brassica oleracea G3646 2231 2.OO -46 G47 Zinnia elegans G3647 2233 3.00 -33 G47 Oryza sativa G3649 2235 4.OO -23 G47 Oryza sativa G3651 2237 3.00 -20 G47 Glycine max GLYMA-28NOVO1 702 CLUSTER115749 1 Oryza sativa OSC21268.C1-p12.fg 703 Hordeum vulgare Hv S7318 1718 Brassica rapa BGS43936 2.OO -60 Subsp. 
pekinensis Brassica oleracea BH420519 4.OO -43 : Zinnia elegans AU2926O3 S.OO -30 Medicago BE32O193 2.OO -24 truncatula Oryza sativa (indica 2.OO -22 cultivar-group) Oryza sativa APOO3379 2.OO -22 : Oryza sativa AC124836 1.OO -20 (japonica cultivar group) Zea mays BZAO3609 2.OO -20 Soianum tuberosum BQ513932 7.OO -17 Pinus taeda BQ698717 1.OO -16 Oryza sativa gi2O161239 8.50 -24 (japonica cultivar group) 2 Oryza sativa gi14140155 8.30 -17 Lycopersicon gi25992102 2.8O -16 escientain Glycine max gi31324058 2.8O -16 : Zea mays gi21908.034 8.60 -15 Brassica naptis gi20303011 2.30 -14 US 2011/007880.6 A1 Mar. 31, 2011 47

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Smallest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis GID Species from Which Sequence Identifier or Orthologous Ortholog, When Sequence No. Ortholog is Derived Accession Number Sequence Known 12 Atriplex hortensis gi857.1476 3.70 -14 12 Catharanthus gi898.0313 2.60 -13 iOSetiS 12 Hordeum vulgare gi1907 1243 S.40 -13 12 Matricaria gi17385636 140 -12 chamomilia 34 42 Brassica oleracea BOLSO8409 1.OO -127 var. botrytis 34 42 Vitis vinifera 1.OO -88 34 42 Maius domestica MIDA763 3.00 -84 34 42 Petunia X hybrida ABO31O3S 2.OO -77 34 42 Agapanthus praecox ABO792.61 1.OO -76 34 42 Chrysanthemum X AY173062 8.OO -75 morifolium 34 42 Oryza sativa OSUT8782 6.OO -74 34 42 Oryza sativa AKO69103 6.OO -74 (japonica cultivar group) 34 42 Zea mays MZEMADSB 3.00 -73 34 42 Trictim aestivum ABOO7SOS 3.00 -72 34 42 Brassica oleracea gi23304710 6.50 -120 var. botrytis 34 42 Vitis vinifera gi20385586 3.30 -86 34 42 Maius domestica gi3646340 1.2O -81 34 42 Petunia X hybrida gi7544096 1.60 -75 34 42 Agapanthus praecox gi29467050 1...SO -74 34 42 Oryza sativa gi2286109 2.2O -73 34 42 Chrysanthemum X gi27804371 4...SO -73 morifolium 34 42 Zea mays gi7446515 1...SO -72 34 42 Lolium perenne gi28630959 840 -72 34 42 Trictim aestivum gi368859 2.2O -71 39 48 Glycine max LYMA-28NOVO1 704 LUSTER24877 1 39 48 Glycine max LYMA-28NOVO1 705 LUSTER99362. 1 39 48 Oryza sativa RYSA-22ANO2 706 LUSTER865. 
1 39 48 Oryza sativa 707 39 48 Zea mays 708 39 48 Zea mays EAMA-08NOVO1 709 TER914 1 39 48 Zea mays EA A-08NOVO1 710 TER914 14 39 48 Zea mays A-08NOVO1 711 STER914 2 39 48 Zea mays A-08NOVO1 712 LUSTER914 3 39 48 Oryza sativa Os S31752 1560 39 48 Oryza sativa OS S63871 1561 39 48 Oryza sativa Os S65486 1562 39 48 Zea mays Zm S11418374 1755 39 48 Zea mays Zm S11418375 1756 39 48 Trictim aestivum Ta S66204 1835 39 48 Lycopersicon SGN-UNIGENE-441.28 1943 escientain 39 48 Lycopersicon SGN-UNIGENE-SINGLET 1944 escientain 42436 40 48 Brassica oleracea BOLSO8409 3.OOE-74 var. botrytis 40 48 Maius domestica MIDA763 2.OOE-65 40 48 Vitis vinifera AF3736O2 3.OOE-64 40 48 Petunia X hybrida ABO31O3S 1.OOE-59 40 48 Chrysanthemum X AY173062 1.OOE-58 morifolium 40 48 Oryza sativa OSUT8782 3.OOE-57 40 48 Oryza sativa AKO69103 3.OOE-57 (japonica cultivar group) US 2011/007880.6 A1 Mar. 31, 2011 48

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Sma lest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis Species from Which Sequence Identifier or Orthologous Ortholog, When Sequence O. Ortholog is Derived Accession Number Sequence Known 40 48 Trictim aestivum ABOO7SOS 1.OO -56 40 48 Lolium perenne AY198329 1.OO -SS 40 48 Poa annia AF372840 S.OO -SS 40 48 Brassica oleracea gi23304710 1.70 -73 var. botrytis 40 48 Maius domestica gi3646340 6.50 -65 40 48 Vitis vinifera gi20385586 740 -64 40 48 Petunia X hybrida gi7544096 7.10 -59 40 48 Chrysanthemum X gi27804371 2.2O -57 morifolium 40 48 Trictim aestivum gi3688591 3.SO -57 40 48 Oryza sativa gi2286109 4...SO -57 40 48 Lolium perenne gi28630959 S.2O -56 40 48 Poa annia gi1395.8339 840 -56 40 48 Agapanthus praecox gi29467050 9.70 -SS 43 53 Oryza sativa G3479 21.89 2.OO -59 43 53 Glycine max G3484 2191 3.00 -77 43 53 Glycine max G3485 21.93 9.OO -63 43 53 Zea mays G3487 2195 S.OO -63 43 53 Zea mays G3488 2197 2.OO -61 43 53 Zea mays G3489 2199 6.OO -66 43 53 Glycine max GLYMA-28NOVO1 713 CLUSTER393266 1 43 53 Glycine max GLYMA-28NOVO1 71.4 CLUSTER84992 1 43 53 Oryza sativa OSC19180.C1-p14.fg 715 43 53 Zea mays ZEAMA-08NOVO1 716 CLUSTER124 1 43 53 Zea mays ZEAMA-08NOVO1 717 CLUSTER226078. 
2 43 53 Zea mays uC-ZmfMo17202h)1 718 43 53 Glycine max Gma SS139103 1634 43 53 Zea mays Zm S11418691 1757 43 53 Zea mays Zm S11433900 1758 43 53 Lycopersicon SGN-UNIGENE-SINGLET 1945 escientain 3629O3 43 53 Lycopersicon SGN-UNIGENE-SINGLET 1946 escientain 8562 44 53 Antirrhinum maius AMDEFEH12S 1.OO 44 53 Zea mays AF112149 8.OO 44 53 Oryza sativa AY177696 1.OO (japonica cultivar group) 44 53 Glycine max AW706936 S.OO -59 44 53 Medicago BQ164807 S.OO -59 truncatula 44 53 Lycopersicon AW21828O E-56 escientain 44 53 Soianum tuberosum BM405213 2.OO -SS 44 53 Medicago Saiiva MSU91964 6.OO -54 44 53 Trictim aestivum AX658813 3.00 -49 44 53 Mesembryanthemum BEO34403 3.00 crystallinum 44 53 Antirrhinum maius gi1816459 2.10 -66 44 53 Oryza sativa gi3O313677 2.90 -62 (japonica cultivar group) 44 Zea mays gi2961 1976 7.70 -62 44 Medicago Saiiva gi1928874 1.30 -52 44 Ipomoea batatas gi15081463 6.90 -45 44 Oryza sativa gi7592642 9.1O -43 44 Lolium perenne gi28630953 8.20 -42 44 Lolium tentienium gi4204232 1.70 -41 44 Trictim aestivum gi30721847 2.8O -41 44 Hordeum vulgare gi9367313 2.8O -41 66 Vicia faba VFPTF2 S.OO -99 66 Oryza sativa AKO69464 1.OO -80 (japonica cultivar group)


TABLE 7-continued
Orthologs of Representative Arabidopsis Transcription Factor Genes

SEQ ID NO: of Arabidopsis Sequence | GID No. | Species from Which Ortholog is Derived | Sequence Identifier or Accession Number | SEQ ID NO: of Orthologous Sequence | Smallest Sum Probability to Ortholog, When Known
105 | G485 | Zea mays | ZEAMA-08NOV01-CLUSTER719_1 | 814 |
105 | G485 | Zea mays | ZEAMA-08NOV01-CLUSTER719_10 | 815 |
105 | G485 | Zea mays | ZEAMA-08NOV01-CLUSTER719_2 | 816 |
105 | G485 | Zea mays | ZEAMA-08NOV01-CLUSTER719_3 | 817 |
105 | G485 | Zea mays | ZEAMA-08NOV01-CLUSTER719_4 | 818 |
105 | G485 | Zea mays | ZEAMA-08NOV01-CLUSTER719_5 | 819 |
105 | G485 | Zea mays | ZEAMA-08NOV01-CLUSTER90408_1 | 820 |
105 | G485 | Zea mays | ZEAMA-08NOV01-CLUSTER90408_2 | 821 |
105 | G485 | Glycine max | Gma_S4904793 | 1641 |
105 | G485 | Hordeum vulgare | Hv_S138973 | 1725 |
105 | G485 | Hordeum vulgare | Hv_S17617 | 1726 |
105 | G485 | Zea mays | Zm_S11418173 | 1776 |
105 | G485 | Zea mays | Zm_S11434692 | 1777 |
105 | G485 | Zea mays | Zm_S11509886 | 1778 |
105 | G485 | Triticum aestivum | Ta_S198814 | 1846 |
105 | G485 | Triticum aestivum | Ta_S45374 | 1847 |
105 | G485 | Triticum aestivum | Ta_S50443 | 1848 |
105 | G485 | Triticum aestivum | Ta_S93629 | 1849 |
105 | G485 | Lycopersicon esculentum | SGN-UNIGENE-46859 | 1980 |
105 | G485 | Lycopersicon esculentum | SGN-UNIGENE-47447 | 1981 |
106 | G485 | Poncirus trifoliata | CD574709 | | 9.00E-62
106 | G485 | Solanum tuberosum | BQ505706 | | 4.00E-60
106 | G485 | Lactuca sativa | BQ996905 | | 2.00E-58
106 | G485 | Oryza sativa (indica cultivar-group) | AAAA01003638 | | 3.00E-57
106 | G485 | Oryza sativa (japonica cultivar-group) | AP005193 | | 3.00E-57
106 | G485 | Beta vulgaris | BQ592365 | | 9.00E-57
106 | G485 | Zea mays | CD438068 | | 9.00E-57
106 | G485 | Physcomitrella patens | AX288144 | | 3.00E-56
106 | G485 | Populus balsamifera subsp. trichocarpa | BU880488 | | 1.00E-55
106 | G485 | Glycine max | AX584277 | | 6.00E-55
106 | G485 | Oryza sativa (japonica cultivar-group) | gi30409461 | | 4.60E-48
106 | G485 | Zea mays | gi115840 | | 9.50E-48
106 | G485 | Oryza sativa (indica cultivar-group) | gi30349365 | | 1.10E-39
106 | G485 | Oryza sativa | gi15408794 | | 1.60E-38
106 | G485 | Phaseolus coccineus | gi22536010 | | 2.90E-37
106 | G485 | Gossypium barbadense | gi28274147 | | 6.30E-35
106 | G485 | Vernonia galamensis | gi16902054 | | 2.70E-34
106 | G485 | Glycine max | gi16902050 | | 1.20E-33
106 | G485 | Argemone mexicana | gi16902056 | | 1.10E-32
106 | G485 | Triticum aestivum | gi16902058 | | 2.90E-30
121 | G627 | Glycine max | GLYMA-28NOV01-CLUSTER65192_1 | 822 |
121 | G627 | Glycine max | GLYMA-28NOV01-CLUSTER65192_2 | 823 |
121 | G627 | Oryza sativa | ORYSA-22JAN02-CLUSTER495_1 | 824 |
121 | G627 | Oryza sativa | Os_S65371 | 1575 |

TABLE 7-continued
Orthologs of Representative Arabidopsis Transcription Factor Genes

SEQ ID NO: of Arabidopsis Sequence | GID No. | Species from Which Ortholog is Derived | Sequence Identifier or Accession Number | SEQ ID NO: of Orthologous Sequence | Smallest Sum Probability to Ortholog, When Known
121 | G627 | Medicago truncatula | Mtr_S5455444 | 1695 |
121 | G627 | Hordeum vulgare | Hv_S12327 | 1727 |
121 | G627 | Triticum aestivum | Ta_S329524 | 1850 |
121 | G627 | Lycopersicon esculentum | SGN-UNIGENE-58075 | 1982 |
122 | G627 | Populus tremuloides | AF377868 | | 3.00E-60
122 | G627 | Eucalyptus globulus subsp. globulus | AF086642 | | 1.00E-59
122 | G627 | Petunia x hybrida | AF335239 | | 1.00E-58
122 | G627 | Pimpinella brachycarpa | AF082531 | | 1.00E-58
122 | G627 | Populus tremula x Populus tremuloides | BU896825 | | 3.00E-58
122 | G627 | Cardamine flexuosa | AY257542 | | 2.00E-57
122 | G627 | Nicotiana tabacum | NTTOB | | 3.00E-57
122 | G627 | Sinapis alba | SAU25696 | | 4.00E-57
122 | G627 | Brassica rapa subsp. pekinensis | AY257541 | | 7.00E-57
122 | G627 | Oryza sativa | AF141965 | | 3.00E-55
122 | G627 | Populus tremuloides | gi31295609 | | 1.00E-59
122 | G627 | Eucalyptus globulus subsp. globulus | gi4322475 | | 2.70E-59
122 | G627 | Pimpinella brachycarpa | gi3493647 | | 8.20E-58
122 | G627 | Petunia x hybrida | gi13384056 | | 1.00E-57
122 | G627 | Sinapis alba | gi1049022 | | 2.50E-56
122 | G627 | Nicotiana tabacum | gi1076646 | | 2.50E-56
122 | G627 | Cardamine flexuosa | gi30171309 | | 2.50E-56
122 | G627 | Brassica rapa subsp. pekinensis | gi30171307 | | 3.20E-56
122 | G627 | Elaeis guineensis | gi6635740 | | 2.00E-54
122 | G627 | Oryza sativa | gi5295990 | | 5.30E-54
161 | G975 | Glycine max | AW705973.1 | 902 |
161 | G975 | Glycine max | BE610471.1 | 903 |
161 | G975 | Glycine max | GLYMA-28NOV01-CLUSTER232634_1 | 904 |
161 | G975 | Glycine max | GLYMA-28NOV01-CLUSTER8245_1 | 905 |
161 | G975 | Glycine max | GLYMA-28NOV01-CLUSTER84865_1 | 906 |
161 | G975 | Oryza sativa | ORYSA-22JAN02-CLUSTER256875_1 | 907 |
161 | G975 | Oryza sativa | OSC33871.C1.p4.fg | 908 |
161 | G975 | Oryza sativa | rsicek_16488.y1.abd | 909 |
161 | G975 | Zea mays | BG874224.1 | 910 |
161 | G975 | Zea mays | ZEAMA-08NOV01-CLUSTER277338_1 | 911 |
161 | G975 | Hordeum vulgare | Hv_S31912 | 1733 |
161 | G975 | Lycopersicon esculentum | SGN-UNIGENE-52816 | 2003 |
161 | G975 | Lycopersicon esculentum | SGN-UNIGENE-SINGLET-14957 | 2004 |
161 | G975 | Lycopersicon esculentum | SGN-UNIGENE-SINGLET-330976 | 2005 |
161 | G975 | Lycopersicon esculentum | SGN-UNIGENE-SINGLET-335836 | 2006 |
162 | G975 | Brassica napus | CD838135 | | 2.00E-91
162 | G975 | Brassica oleracea | BH477624 | | 2.00E-69
162 | G975 | Triticum aestivum | CA486875 | | 4.00E-64
162 | G975 | Oryza sativa (japonica cultivar-group) | AK061163 | | 3.00E-62
162 | G975 | Oryza sativa | AX699685 | | 2.00E-61
162 | G975 | Rosa chinensis | BI978981 | | 3.00E-60
162 | G975 | Amborella trichopoda | CD484088 | | 3.00E-59
162 | G975 | Hordeum vulgare subsp. vulgare | BU978490 | | 2.00E-58

TABLE 7-continued
Orthologs of Representative Arabidopsis Transcription Factor Genes

SEQ ID NO: of Arabidopsis Sequence | GID No. | Species from Which Ortholog is Derived | Sequence Identifier or Accession Number | SEQ ID NO: of Orthologous Sequence | Smallest Sum Probability to Ortholog, When Known
162 | G975 | Vitis aestivalis | CB289393 | | 7.00E-58
162 | G975 | Lycopersicon esculentum | BG642554 | | 1.00E-56
162 | G975 | Oryza sativa (japonica cultivar-group) | gi32479658 | | 2.20E-30
162 | G975 | Lycopersicon esculentum | gi18650662 | | 2.20E-25
162 | G975 | Lupinus polyphyllus | gi131754 | | 2.60E-22
162 | G975 | Nicotiana tabacum | gi3065895 | | 1.10E-19
162 | G975 | Atriplex hortensis | gi8571476 | | 1.10E-19
162 | G975 | Zea mays | gi21908036 | | 1.00E-18
162 | G975 | Stylosanthes hamata | gi4099914 | | 1.30E-18
162 | G975 | Hordeum vulgare | gi27960757 | | 1.70E-18
162 | G975 | Oryza sativa | gi10567106 | | 2.00E-18
162 | G975 | Nicotiana sylvestris | gi8809573 | | 1.20E-17
163 | G10 | Glycine max | GLYMA-28NOV01-CLUSTER36089_1 | 912 |
163 | G10 | Glycine max | GLYMA-28NOV01-CLUSTER36089_2 | 913 |
163 | G10 | Glycine max | GLYMA-28NOV01-CLUSTER36089_3 | 914 |
163 | G10 | Glycine max | GLYMA-28NOV01-CLUSTER36089_4 | 915 |
163 | G10 | Glycine max | GLYMA-28NOV01-CLUSTER36089_6 | 916 |
163 | G10 | Glycine max | GLYMA-28NOV01-CLUSTER475715_2 | 917 |
163 | G10 | Oryza sativa | ORYSA-22JAN02-CLUSTER475_3 | 918 |
163 | G10 | Oryza sativa | OSC101782.C1.p2.fg | 919 |
163 | G10 | Zea mays | ZEAMA-08NOV01-CLUSTER48_1 | 920 |
163 | G10 | Zea mays | ZEAMA-08NOV01-CLUSTER48_2 | 921 |
163 | G10 | Zea mays | ZEAMA-08NOV01-CLUSTER48_4 | 922 |
163 | G10 | Zea mays | ZEAMA-08NOV01-CLUSTER48_5 | 923 |
163 | G10 | Zea mays | ZEAMA-08NOV01-CLUSTER8143_1 | 924 |
163 | G10 | Oryza sativa | Os_S60918 | 1581 |
163 | G10 | Glycine max | Gma_S5094568 | 1651 |
163 | G10 | Medicago truncatula | Mtr_S5357829 | 1696 |
163 | G10 | Zea mays | Zm_S11418746 | 1786 |
163 | G10 | Zea mays | Zm_S11527819 | 1787 |
163 | G10 | Triticum aestivum | Ta_S203038 | 1858 |
163 | G10 | Triticum aestivum | Ta_S304256 | 1859 |
163 | G10 | Triticum aestivum | Ta_S424724 | 1860 |
163 | G10 | Lycopersicon esculentum | Les_S5295933 | 1929 |
163 | G10 | Lycopersicon esculentum | SGN-UNIGENE-50586 | 2007 |
163 | G10 | Lycopersicon esculentum | SGN-UNIGENE-52410 | 2008 |
163 | G10 | Lycopersicon esculentum | SGN-UNIGENE-SINGLET-366830 | 2009 |
163 | G10 | Lycopersicon esculentum | SGN-UNIGENE-SINGLET-394847 | 2010 |
164 | G10 | Petunia x hybrida | AF335240 | | 1.00E-58
164 | G10 | Sinapis alba | SAU25696 | | 5.00E-58
164 | G10 | Brassica rapa subsp. pekinensis | AY257541 | | 1.00E-57
164 | G10 | Lycopersicon esculentum | AI486684 | | 1.00E-57
164 | G10 | Cardamine flexuosa | AY257542 | | 2.00E-57
164 | G10 | Vitis vinifera | CA808988 | | 3.00E-57
164 | G10 | Populus tremuloides | AF377868 | | 9.00E-57

TABLE 7-continued
Orthologs of Representative Arabidopsis Transcription Factor Genes

SEQ ID NO: of Arabidopsis Sequence | GID No. | Species from Which Ortholog is Derived | Sequence Identifier or Accession Number | SEQ ID NO: of Orthologous Sequence | Smallest Sum Probability to Ortholog, When Known
164 | G10 | Pimpinella brachycarpa | AF082531 | | 8.00E-56
164 | G10 | Eucalyptus grandis | AY263808 | | 2.00E-55
164 | G10 | Draba nemorosa var. hebecarpa | AY257543 | | 8.00E-55
164 | G10 | Petunia x hybrida | gi13384058 | | 3.90E-58
164 | G10 | Sinapis alba | gi1049022 | | 4.50E-57
164 | G10 | Brassica rapa subsp. pekinensis | gi30171307 | | 5.70E-57
164 | G10 | Cardamine flexuosa | gi30171309 | | 1.90E-56
164 | G10 | Populus tremuloides | gi31295609 | | 1.40E-55
164 | G10 | Pimpinella brachycarpa | gi3493647 | | 7.60E-55
164 | G10 | Nicotiana tabacum | gi1076646 | | 1.60E-54
164 | G10 | Eucalyptus grandis | gi30575600 | | 1.60E-54
164 | G10 | Draba nemorosa var. hebecarpa | gi30171311 | | 1.10E-53
164 | G10 | Eucalyptus occidentalis | gi30983946 | | 1.10E-53
178 | G1108 | Oryza sativa (japonica cultivar-group) | AK066424 | | 1.00E-113
178 | G1108 | Zea mays | BG837939 | | 1.00E-91
178 | G1108 | Brassica oleracea | BZ486328 | | 1.00E-89
178 | G1108 | Lactuca sativa | BQ852089 | | 3.00E-80
178 | G1108 | Triticum aestivum | BJ319065 | | 2.00E-78
178 | G1108 | Oryza sativa (indica cultivar-group) | CB634885 | | 5.00E-78
178 | G1108 | Lycopersicon esculentum | BI921710 | | 1.00E-75
178 | G1108 | Oryza sativa | AX699700 | | 1.00E-73
178 | G1108 | Hordeum vulgare subsp. vulgare | AL505242 | | 8.00E-71
178 | G1108 | Solanum tuberosum | BQ512426 | | 6.00E-69
178 | G1108 | Oryza sativa (japonica cultivar-group) | gi15289774 | | 6.00E-78
178 | G1108 | Phacelia tanacetifolia | gi5002214 | | 1.40E-28
178 | G1108 | Medicago sativa | gi23451086 | | 5.10E-12
178 | G1108 | Oryza sativa | gi14164470 | | 1.10E-11
178 | G1108 | Cicer arietinum | gi4651204 | | 2.60E-10
178 | G1108 | Nicotiana tabacum | gi12003386 | | 1.40E-09
178 | G1108 | Thellungiella halophila | gi20340241 | | 1.50E-09
178 | G1108 | Hordeum vulgare | gi2894379 | | 2.80E-09
178 | G1108 | Cucumis melo | gi17016985 | | 2.30E-08
178 | G1108 | Hordeum vulgare subsp. vulgare | gi20152976 | | 3.10E-08
193 | G1274 | Glycine max | GLYMA-28NOV01-CLUSTER16030_1 | 968 |
193 | G1274 | Glycine max | GLYMA-28NOV01-CLUSTER305171_1 | 969 |
193 | G1274 | Oryza sativa | OSC100386.C1.p11.fg | 970 |
193 | G1274 | Oryza sativa | OSC100526.C1.p1.fg | 971 |
193 | G1274 | Zea mays | ZEAMA-08NOV01-CLUSTER139642_1 | 972 |
193 | G1274 | Zea mays | ZEAMA-08NOV01-CLUSTER139642_2 | 973 |
193 | G1274 | Zea mays | ZEAMA-08NOV01-CLUSTER2967_14 | 974 |
193 | G1274 | Zea mays | ZEAMA-08NOV01-CLUSTER452657_1 | 975 |
193 | G1274 | Lycopersicon esculentum | SGN-UNIGENE-51404 | 2017 |
193 | G1274 | Lycopersicon esculentum | SGN-UNIGENE-57064 | 2018 |
194 | G1274 | Glycine max | BQ742659 | | 1.00E-33
194 | G1274 | Solanum tuberosum | BQ516647 | | 2.00E-32

TABLE 7-continued
Orthologs of Representative Arabidopsis Transcription Factor Genes

| SEQ ID NO: of Arabidopsis Sequence | GID No. | Species from Which Ortholog is Derived | Sequence Identifier or Accession Number | SEQ ID NO: of Orthologous Sequence | Smallest Sum Probability to Ortholog, When Known |
|---|---|---|---|---|---|
| 94 | G1274 | Lycopersicon esculentum | BI209002 | | 2.00E-32 |
| 94 | G1274 | Hordeum vulgare | BE216050 | | 4.00E-31 |
| 94 | G1274 | Capsicum annuum | CA524920 | | 2.00E-30 |
| 94 | G1274 | Stevia rebaudiana | BG525040 | | 3.00E-30 |
| 94 | G1274 | Sorghum bicolor | CD233113 | | 3.00E-29 |
| 94 | G1274 | Zea mays | BM334368 | | 2.00E-28 |
| 94 | G1274 | Hordeum vulgare subsp. spontaneum | BJ478103 | | 3.00E-28 |
| 94 | G1274 | Hordeum vulgare subsp. vulgare | B456908 | | 3.00E-28 |
| 94 | G1274 | Oryza sativa | gi9558431 | | 1.10E-28 |
| 94 | G1274 | Oryza sativa (japonica cultivar-group) | gi21104763 | | 4.90E-28 |
| 94 | G1274 | Nicotiana tabacum | gi29536791 | | 6.00E-23 |
| 94 | G1274 | Capsella rubella | gi32454266 | | 1.70E-22 |
| 94 | G1274 | Solanum tuberosum | gi24745606 | | 8.70E-22 |
| 94 | G1274 | Oryza sativa (indica cultivar-group) | gi23305051 | | 1.40E-21 |
| 94 | G1274 | Pimpinella brachycarpa | gi3420906 | | 1.70E-21 |
| 94 | G1274 | Lycopersicon esculentum | gi13620227 | | 3.90E-21 |
| 94 | G1274 | Cucumis sativus | gi7484759 | | 5.70E-21 |
| 94 | G1274 | Ipomoea batatas | gi1076685 | | 7.00E-21 |
| 207 | G1357 | Glycine max | GLYMA-28NOV01-CLUSTER80398_1 | 982 | |
| 207 | G1357 | Lycopersicon esculentum | SGN-UNIGENE-52387 | 2020 | |
| 208 | G1357 | Brassica oleracea | BH590226 | | 3.00E-94 |
| 208 | G1357 | Medicago truncatula | BF645605 | | 5.00E-59 |
| 208 | G1357 | Sorghum bicolor | BI140703 | | 8.00E-44 |
| 208 | G1357 | Hordeum vulgare subsp. spontaneum | BJ481205 | | 8.00E-44 |
| 208 | G1357 | Hordeum vulgare subsp. vulgare | BU967516 | | 8.00E-44 |
| 208 | G1357 | Hordeum vulgare | BQ469035 | | 8.00E-44 |
| 208 | G1357 | Petunia x hybrida | AF509874 | | 9.00E-42 |
| 208 | G1357 | Triticum aestivum | BJ257015 | | 9.00E-42 |
| 208 | G1357 | Oryza sativa | AX654515 | | 3.00E-41 |
| 208 | G1357 | Oryza sativa (japonica cultivar-group) | AK099540 | | 5.00E-41 |
| 208 | G1357 | Oryza sativa (japonica cultivar-group) | gi19225018 | | 1.50E-42 |
| 208 | G1357 | Petunia x hybrida | gi21105751 | | 2.40E-42 |
| 208 | G1357 | Medicago truncatula | gi7716952 | | 7.20E-42 |
| 208 | G1357 | Oryza sativa | gi6730946 | | 3.50E-41 |
| 208 | G1357 | Glycine max | gi22597158 | | 1.10E-37 |
| 208 | G1357 | Brassica napus | gi31322582 | | 4.30E-36 |
| 208 | G1357 | Phaseolus vulgaris | gi15148914 | | 7.00E-36 |
| 208 | G1357 | Lycopersicon esculentum | gi6175246 | | 2.20E-32 |
| 208 | G1357 | Triticum sp. | gi4218537 | | 2.80E-32 |
| 208 | G1357 | Triticum monococcum | gi6732160 | | 2.80E-32 |
| 225 | G1452 | Glycine max | GLYMA-28NOV01-CLUSTER80398_1 | 982 | |
| 225 | G1452 | Lycopersicon esculentum | SGN-UNIGENE-52387 | 2020 | |
| 226 | G1452 | Medicago truncatula | BF645605 | | 5.00E-65 |
| 226 | G1452 | Sorghum bicolor | BI140703 | | 7.00E-43 |
| 226 | G1452 | Hordeum vulgare | BQ469035 | | 1.00E-42 |

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Smallest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis Species from Which Sequence Identifier or Orthologous Ortholog, When Sequence O. Ortholog is Derived Accession Number Sequence Known 226 452 Hordeum vulgare BU967S16 1.OOE-42 Subsp. vulgare 226 452 Hordeum vulgare BJ481205 1.OOE-42 Subsp. spontanet in 226 452 Trictim aestivum BQ620568 3.00 E-42 226 452 Oryza sativa (indica CB630990 3.00 E-4 cultivar-group) 226 452 Oryza sativa AX654172 8.OO 226 452 Oryza sativa CB657109 1.OO (japonica cultivar group) 226 452 Lactica Saiiva BQ997138 4.OO 226 452 Oryza sativa gió730946 1.30 226 452 Petunia X hybrida gi21105746 1.2O 226 452 Oryza sativa gi27452910 S.10 : (japonica cultivar group) 226 452 Medicago gi7716952 4 1 truncatula 226 452 Glycine max gi22597158 5.30 226 452 Phaseolus vulgaris gi15148914 7.OO 226 452 Brassica naptis gi31322578 2.30 226 452 Triticum sp. gi4218537 3.90 226 452 Trictim gió732160 3.90 E-E-E-E-E- ficio COCGilii 226 452 Lycopersicon gi6175246 34 escientain 233 482 Glycine max LYMA-28NOVO1 O14 LUSTER228559 1 233 482 Glycine max LYMA-28NOVO1A. O15 LUSTER228559 2 233 482 Glycine max LYMA-28NOVO1A O16 LUSTER38097 1 233 482 Glycine max LYMA-28NOVO1A O17 LUSTER39971 1 233 482 Glycine max LYMA-28NOVO1A O18 LUSTER39971 2 233 482 Oryza sativa YSA-22ANO2 O19 LUSTER17570 1 233 482 Oryza sativa YSA-22ANO2 O2O USTER1757O 2 233 482 Oryza sativa YSA-22ANO2 O21 LUSTER687 1 233 482 Oryza sativa RYSA-22ANO2 O22 LUSTER99743 1 233 482 Oryza sativa SC101266.C1-p1.fg O23 233 482 Oryza sativa SC15654C1-p3.fg O24 233 482 Zea mays S631093 O25 233 482 Zea mays EAMA-08NOVO1 O26 LUSTER35072 1 233 482 Zea mays EAMA-08NOVO1 LUSTER35072 2 233 482 Zea mays EAMA-08NOVO1 LUSTER366705 1 233 482 Zea mays EAMA-08NOVO1 LUSTER439033. 
1 233 482 Zea mays EAMA-08NOVO1 LUSTER439033 2 233 482 Oryza sativa OS S60490 592 233 482 Medicago Mitr S10820905 703 truncatula 233 482 Zea mays Zm S11432778 802 233 482 Trictim aestivum Ta S288030 879 233 482 Lycopersicon SGN-UNIGENE-47593 2032 escientain 234 482 Soianum tuberosum 1.OOE-60 234 482 Medicago 2.OOE-57 truncatula US 2011/007880.6 A1 Mar. 31, 2011 56

TABLE 7-continued
Orthologs of Representative Arabidopsis Transcription Factor Genes

| SEQ ID NO: of Arabidopsis Sequence | GID No. | Species from Which Ortholog is Derived | Sequence Identifier or Accession Number | SEQ ID NO: of Orthologous Sequence | Smallest Sum Probability to Ortholog, When Known |
|---|---|---|---|---|---|
| 234 | G1482 | Robinia pseudoacacia | BI678186 | | 1.00E-52 |
| 234 | G1482 | Glycine max | [illegible] | | 6.00E-52 |
| 234 | G1482 | Lotus japonicus | [illegible] | | 1.00E-48 |
| 234 | G1482 | Zinnia elegans | AU288043 | | 2.00E-45 |
| 234 | G1482 | Populus tremula | BU892726 | | 2.00E-45 |
| 234 | G1482 | Lycopersicon esculentum | BM409788 | | 2.00E-44 |
| 234 | G1482 | Oryza sativa (japonica cultivar-group) | AK071507 | | 1.00E-43 |
| 234 | G1482 | Oryza sativa | AB001884 | | 5.00E-[illegible] |
| 234 | G1482 | Oryza sativa | gi3618312 | | 1.90E-45 |
| 234 | G1482 | Oryza sativa (japonica cultivar-group) | gi32488104 | | 2.00E-38 |
| 234 | G1482 | Brassica nigra | gi11037311 | | 4.90E-18 |
| 234 | G1482 | Raphanus sativus | gi3341723 | | 8.00E-17 |
| 234 | G1482 | Brassica napus | gi30984027 | | 2.70E-15 |
| 234 | G1482 | Malus x domestica | gi4091806 | | 7.40E-15 |
| 234 | G1482 | Ipomoea nil | gi10946337 | | 2.00E-14 |
| 234 | G1482 | Hordeum vulgare | gi21667485 | | 2.90E-13 |
| 234 | G1482 | Hordeum vulgare subsp. vulgare | gi21655154 | | 1.50E-11 |
| 234 | G1482 | Pinus radiata | gi4557093 | | 3.10E-10 |
| 238 | G1493 | Medicago truncatula | CB891281 | | 9.00E-98 |
| 238 | G1493 | Zea mays | AB060130 | | 5.00E-95 |
| 238 | G1493 | Brassica napus | CD825309 | | 7.00E-84 |
| 238 | G1493 | Vitis vinifera | CD800109 | | 9.00E-84 |
| 238 | G1493 | Oryza sativa (japonica cultivar-group) | AK100530 | | 7.00E-81 |
| 238 | G1493 | Oryza sativa (indica cultivar-group) | CB630542 | | 3.00E-77 |
| 238 | G1493 | Brassica oleracea | BH687265 | | 2.00E-74 |
| 238 | G1493 | Glycine max | AW596288 | | 4.00E-70 |
| 238 | G1493 | Poncirus trifoliata | CD574729 | | 6.00E-69 |
| 238 | G1493 | Lactuca sativa | BQ858556 | | 1.00E-66 |
| 238 | G1493 | Zea mays | gi13661174 | | 1.00E-84 |
| 238 | G1493 | Oryza sativa (japonica cultivar-group) | gi24308616 | | 9.20E-82 |
| 238 | G1493 | Oryza sativa (indica cultivar-group) | gi31338860 | | 2.20E-42 |
| 238 | G1493 | Oryza glaberrima | gi31338862 | | 2.20E-42 |
| 238 | G1493 | Oryza sativa | gi15289981 | | 9.60E-19 |
| 238 | G1493 | Solanum bulbocastanum | gi32470629 | | 1.00E-10 |
| 238 | G1493 | Chlamydomonas reinhardtii | gi5916207 | | 1.20E-09 |
| 238 | G1493 | Mesembryanthemum crystallinum | gi6942190 | | [illegible]E-09 |
| 238 | G1493 | Nicotiana tabacum | gi4519671 | | 2.50E-08 |
| 238 | G1493 | Dianthus caryophyllus | gi13173408 | | 1.40E-07 |
| 241 | G1510 | Oryza sativa | ORYSA-22JAN02-CLUSTER159728_1 | 1031 | |
| 241 | G1510 | Oryza sativa | OSC101036.C1.p2.fg | 1032 | |
| 241 | G1510 | Glycine max | Gma_S5061040 | 1662 | |
| 241 | G1510 | Triticum aestivum | Ta_S206702 | 1880 | |
| 241 | G1510 | Lycopersicon esculentum | Les_S5271097 | 1932 | |
| 241 | G1510 | Lycopersicon esculentum | SGN-UNIGENE-56179 | [illegible] | |
| 242 | G1510 | Brassica oleracea | BZ493938 | | 8.00E-58 |
| 242 | G1510 | Brassica napus | CB686317 | | 3.00E-31 |
| 242 | G1510 | Vitis vinifera | BM437179 | | 5.00E-23 |
| 242 | G1510 | Glycine max | BF425622 | | 5.00E-23 |

TABLE 7-continued
Orthologs of Representative Arabidopsis Transcription Factor Genes

| SEQ ID NO: of Arabidopsis Sequence | GID No. | Species from Which Ortholog is Derived | Sequence Identifier or Accession Number | SEQ ID NO: of Orthologous Sequence | Smallest Sum Probability to Ortholog, When Known |
|---|---|---|---|---|---|
| 242 | G1510 | Oryza sativa (japonica cultivar-group) | AK099607 | | 7.00E-23 |
| 242 | G1510 | Sorghum bicolor | CD213245 | | 9.00E-20 |
| 242 | G1510 | Medicago truncatula | BQ165696 | | 2.00E-18 |
| 242 | G1510 | Populus tremula x Populus tremuloides | BU863159 | | 5.00E-18 |
| 242 | G1510 | Triticum aestivum | AL816777 | | 4.00E-17 |
| 242 | G1510 | Oryza sativa | AC087597 | | 3.00E-15 |
| 242 | G1510 | Oryza sativa (japonica cultivar-group) | gi28372691 | | 7.00E-19 |
| 242 | G1510 | Oryza sativa | gi14165317 | | 5.10E-10 |
| 242 | G1510 | Nicotiana tabacum | gi12711287 | | 3.70E-07 |
| 242 | G1510 | Nicotiana plumbaginifolia | gi1076609 | | 4.20E-05 |
| 242 | G1510 | Fagopyrum sp. C97107 | gi31088153 | | 0.013 |
| 242 | G1510 | Fagopyrum rubifolium | gi31088139 | | 0.016 |
| 242 | G1510 | Fagopyrum gracilipes | gi31088119 | | 0.032 |
| 242 | G1510 | Fagopyrum sp. C97106 | gi31088151 | | 0.032 |
| 242 | G1510 | Fagopyrum capillatum | gi31088129 | | 0.032 |
| 242 | G1510 | Fagopyrum callianthum | gi31088131 | | 0.04 |
| 263 | G1660 | Glycine max | GLYMA-28NOV01-CLUSTER30666_1 | [illegible] | |
| 263 | G1660 | Glycine max | uC-gmfLIB3275P059b07b1 | [illegible] | |
| 263 | G1660 | Oryza sativa | ORYSA-22JAN02-CLUSTER6548_1 | [illegible] | |
| 263 | G1660 | Oryza sativa | ORYSA-22JAN02-CLUSTER93242_1 | [illegible] | |
| 263 | G1660 | Oryza sativa | OSC100113.C1.p9.fg | [illegible] | |
| 263 | G1660 | Oryza sativa | OSC101572.C1.p8.fg | [illegible] | |
| 263 | G1660 | Oryza sativa | OSC34319.C1.p4.fg | [illegible] | |
| 263 | G1660 | Zea mays | 700167489_FLI | [illegible] | |
| 263 | G1660 | Zea mays | LIB3279-010-H4_FLI | [illegible] | |
| 263 | G1660 | Zea mays | LIB4767-001-R1-M1-D1 | [illegible] | |
| 263 | G1660 | Zea mays | ZEAMA-08NOV01-CLUSTER43109_1 | [illegible] | |
| 263 | G1660 | Zea mays | ZEAMA-08NOV01-CLUSTER64649_1 | [illegible] | |
| 263 | G1660 | Oryza sativa | Os_S94670 | 1593 | |
| 263 | G1660 | Zea mays | Zm_S11454293 | 1803 | |
| 263 | G1660 | Zea mays | Zm_S11520265 | 1804 | |
| 263 | G1660 | Triticum aestivum | Ta_S142271 | 1881 | |
| 263 | G1660 | Lycopersicon esculentum | SGN-UNIGENE-SINGLET-35095 | 2034 | |
| 263 | G1660 | Lycopersicon esculentum | SGN-UNIGENE-SINGLET-53090 | 2035 | |
| 264 | G1660 | Oryza sativa (japonica cultivar-group) | AK102604 | | 1.00E-109 |
| 264 | G1660 | Brassica oleracea | BZ431607 | | 1.00E-108 |
| 264 | G1660 | Brassica napus | CD818917 | | 2.00E-95 |
| 264 | G1660 | Oryza sativa | BE040229 | | 2.00E-62 |
| 264 | G1660 | Ipomoea nil | BJ576287 | | 1.00E-54 |
| 264 | G1660 | Lycopersicon esculentum | AW443990 | | 7.00E-54 |
| 264 | G1660 | Oryza sativa (indica cultivar-group) | AAAA01001098 | | 2.00E-52 |
| 264 | G1660 | Zea mays | CB886289 | | 3.00E-50 |
| 264 | G1660 | Hordeum vulgare | BM377843 | | 3.00E-50 |
| 264 | G1660 | Triticum aestivum | BJ238027 | | 6.00E-47 |

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Smallest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis GID Species from Which Sequence Identifier or Orthologous Ortholog, When Sequence No. Ortholog is Derived Accession Number Sequence Known 264 660 Oryza sativa gi27452912 (japonica cultivar group) 264 660 Zea mays gi23928441 3.3OE-22 264 660 Solanum tuberosum gi1881585 1.6OE-17 264 660 Lycopersicon gi4731573 1.2OE-16 escientain 264 660 Nicotiana tabacum gi8096269 264 660 Cucurbita maxima gi17221648 264 660 Cicer arietintin gi7208779 264 660 Oryza sativa gi11875196 264 660 Plastid Oenothera gi13276714 eiata Subsp. hookeri 264 660 Oenothera elata gi23822375 subsp. hookeri 267 730 Zea mays LIBSO74-O1O-R1-XP1-A11 1048 268 730 Brassica oleracea BZA72679 6.OO 67 268 730 Medicago AC126787 1.OO truncatula 268 730 Brassica naptis CD8141.99 4.OO 2277 268 730 Zea mays BZ715596 4.OO 2 1 268 730 Oryza sativa AK108491 S.OO EE-E 2 1 (japonica cultivar group) 268 730 Oryza sativa (indica AAAA01009602 7.OO E-21 cultivar-group) 268 730 Oryza sativa AX653.298 1.OO -18 268 730 Cucumis meio AF499727 2.OO -18 268 730 Soianum tuberostin BG593372 S.OO -18 268 730 Lycopersicon AWO32769 2.OO -17 escientain 268 730 Cucumis meio gi28558782 6.70 -23 268 730 Oryza sativa gi12643047 1.90 -19 268 730 Oryza sativa gi31433649 1.90 -19 (japonica cultivar group) 268 730 Nicotiana tabacum gi12003386 S.10 -17 268 730 Zea mays gi21645888 140 -16 268 730 Medicago Saiiva gi23451086 1.30 -14 268 730 Hordeum vulgare gi20152976 5.70 -14 Subsp. vulgare 268 730 Hordeum vulgare gi2894379 1.10 -09 268 730 Oryza sativa (indica gi29164825 4.10 -09 cultivar-group) 268 730 Theilungiella gi20340241 1.10 -08 halophila 275 779 Glycine max GLYMA-28NOVO1 1051 CLUSTER185518. 1 275 779 Glycine max GLYMA-28NOVO1 1052 CLUSTER264928. 
1 275 779 Glycine max GLYMA-28NOVO1 1053 CLUSTER76652 1 275 779 Oryza sativa OSC21832.C1.p4.fg 1054 275 779 Zea mays ZEAMA-08NOVO1 1055 CLUSTER78309 1 275 779 Lycopersicon SGN-UNIGENE-SINGLET escientain 56.681 276 779 Brassica oleracea BHSS8232 3.OOE-36 276 779 Vitis vinifera BM437179 2.OOE-26 276 779 Glycine max BF42S622 1.OOE-24 276 779 Oryza sativa AKO99607 S.OOE-21 (japonica cultivar group) 276 779 Sorghum bicolor CD213245 3.OOE-2O 276 779 Medicago BQ165696 2.OOE-19 truncatula 276 779 Populus tremula x BU863159 2.OOE-18 Populus tremuloides 276 779 Brassica naptis CB686317 9.OOE-18 US 2011/007880.6 A1 Mar. 31, 2011 59

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Smallest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis Species from Which Sequence Identifier or Orthologous Ortholog, When Sequence O. Ortholog is Derived Accession Number Sequence Known 276 779 Poncirus trifoliata CDS76O18 3.OOE-17 276 779 Trictim aestivum AL816777 2.OOE-16 276 779 Oryza sativa gi28564714 1.2OE-2O (japonica cultivar group) 276 779 Oryza sativa gi5091599 2.8OE-08 276 779 Nicotiana tabacum gi12711287 2.90E-07 276 779 Nicotiana gi1076609 3.SOE-OS plumbaginifolia 276 779 Lycopersicon gi1418.988 O.36 escientain 276 779 Entirema wasabi gi23200602 0.55 276 779 Amicia glanditiosa gi3O313971 O.62 276 779 Ipomoea batatas gió04324 O.8 276 779 Trictim aestivum gi23451222 1 276 779 Gnetum gnemon gi31746346 1 277 792 Oryza sativa G3380 2124 S.OOE-29 277 792 Oryza sativa G3383 2128 3.OOE-33 277 792 Oryza sativa G3515 2209 7.OOE-30 277 792 Zea mays G3516 2211 2.OOE-31 277 792 Zea mays G3517 2213 9.OOE-33 277 792 Glycine max G3518 2215 9.OOE-3S 277 792 Glycine max G3519 2217 3.OOE-3S 277 792 Glycine max G3520 2219 3.OOE-36 277 792 Glycine max AW308784.1 685 277 792 Glycine max BG790680.1 686 277 792 Glycine max LYMA-28NOVO1 687 LUSTER602185 1 277 792 Glycine max LYMA-28NOVO1 688 LUSTER91218 1 277 792 Glycine max B5118-009-Q1-PF1-F2 689 277 792 Oryza sativa SC20174.C1.p2.fg 690 277 792 Zea mays B4756-134-A1-K1-G10 691 277 792 Glycine max ma. 
S5001644 1633 277 792 Zea mays m S11513768 1754 278 792 Lycopersicon 2 776626 7.OO -35 escientain 278 792 Soianum tuberosum BQ045702 1.OO -32 278 792 Glycine max BM178875 9.OO -32 278 792 Medicago BF649790 2.OO -31 truncatula 278 792 Eucalyptus grandis 1.OO -30 278 792 Brassica oleracea 1.OO -30 278 792 Oryza sativa (indica 4.OO -30 cultivar-group) 278 792 Oryza sativa AEO17099 4.OO E-30 (japonica cultivar group) 278 792 Oryza sativa ACO2S907 4.OO -30 278 792 Sorghum bicolor BZ337899 4.OO -30 278 792 Oryza sativa gi31432356 1.10 -30 (japonica cultivar group) 278 792 Lycopersicon gi23452024 4.90 -26 escientain 278 792 Nicotiana tabacum gi1732406 2.60 -25 278 792 Oryza sativa gi12597874 4...SO -25 278 792 Mesembryanthemum gi324O1273 9.40 -25 crystallinum 278 792 Catharanthus gi898.0313 2.2O E-23 iOSetiS 278 792 Nicotiana Sylvestris gi8809571 2.2O -23 278 792 Matricaria gi17385636 140 -21 chamomilia 278 792 Glycine max gi21304712 3.80 -21 278 792 Atriplex hortensis gi857.1476 1.30 -20 282 797 Petunia X hybrida AF33S240 S.OO -52 282 797 Lycopersicon AI486684 7.OO -49 escientain US 2011/007880.6 A1 Mar. 31, 2011 60

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Sma lest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis Species from Which Sequence Identifier or Orthologous Ortholog, When Sequence O. Ortholog is Derived Accession Number Sequence Known 282 797 Eucalyptus grandis AY2638O8 8.OO 282 797 Eucalyptus AY273872 7.OO occidentais 282 797 Populus tremuloides CA925124 8.OO 282 797 Brassica rapa AY257541 S.OO Subsp. pekinensis 282 797 Sinapis alba SAU2S696 S.OO 282 797 Pimpinella AFO82531 S.OO brachycarpa 282 797 Cardamine fiexitosa AY257542 2.OO -43 282 797 Nicotiana tabacum NTTOB S.OO -43 282 797 Petunia X hybrida gi13384058 4.40 -50 282 797 Eucalyptus grandis gi30575600 8.60 -47 282 797 Eucalyptus gi30983946 6.OO -46 occidentais 282 797 Brassica rapa gi30171307 4.90 E Subsp. pekinensis 282 797 Populus tremuloides gi31295609 4.90 282 797 Sinapis alba gi1049022 1.60 282 797 Pimpinella gi3493647 1.60 brachycarpa 282 797 Cardamine fiexitosa gi30171309 2.70 -43 282 797 Nicotiana tabacum gi1076646 1...SO -42 282 797 Draba memorosa gi30171311 1.OO -41 var. he becarpa 284 798 Petunia X hybrida AF33S240 S.OO -53 284 798 Lycopersicon AI486684 3.00 -52 escientain 284 798 Brassica rapa AY257541 3.00 Subsp. pekinensis 284 798 Sinapis alba SAU2S696 3.00 284 798 Cardamine fiexitosa AY257542 S.OO 284 798 Pimpinella AFO82531 S.OO brachycarpa 284 798 Populus tremuloides CA925124 284 798 Eucalyptus grandis AY263807 284 798 Nicotiana tabacum NTTOB 284 798 Oryza sativa AK104921 (japonica cultivar group) 284 798 Petunia X hybrida gi13384058 -52 284 798 Brassica rapa gi30171307 -48 Subsp. 
pekinensis 284 798 Sinapis alba gi1049022 -47 284 798 Cardamine fiexitosa gi30171309 -46 284 798 Pimpinella gi3493647 -46 brachycarpa 284 798 Populus tremuloides gi31295609 284 798 Oryza sativa gi5295990 284 798 Eucalyptus grandis gi30575598 284 798 Zea mays gi12002139 284 798 Nicotiana tabacum gi1076646 287 8 6 Oryza sativa G3392 2131 2.OO 287 Oryza sativa G3392 2133 2.OO 287 Zea mays G3431 2147 OO 287 Zea mays G3444 2157 OO 287 Glycine max G3445 21.59 S.OO 287 Glycine max G3446 2161 S.OO 287 Glycine max G3447 2163 S.OO 287 Glycine max G3448 216S OO 287 Glycine max G3449 2167 3.00 287 Glycine max G3450 2168 3.00 287 Glycine max GLYMA-28NOVO1 1057 CLUSTER31802 1 287 Glycine max GLYMA-28NOVO1 1058 CLUSTER586 102 287 Glycine max GLYMA-28NOVO1 1059 CLUSTER586 116 US 2011/007880.6 A1 Mar. 31, 2011 61

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Smallest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis Species from Which Sequence Identifier or Orthologous Ortholog, When Sequence O. Ortholog is Derived Accession Number Sequence Known 287 Glycine max LYMA-28NOVO1 LUSTER8724. 1 287 Glycine max LYMA-28NOVO1 LUSTER8724 2 287 Oryza sativa RYSA-22ANO2 LUSTER30974 2 287 Oryza sativa RYSA-22ANO2 LUSTER30974 3 287 Oryza sativa SC20053.C1.p5.fg 287 Oryza sativa SC20055.C1.p5.fg 287 Zea mays E AMA-08NOVO1 LUSTER69699 1 287 Zea mays AMA-08NOVO1 LUSTER69699 2 287 Glycine max ma. S4901946 663 287 Trictim aestivum Ta S45274 883 288 Vitis vinifera BM4373.13 8.OO -28 288 Populus BU872107 2.OO -27 balsamifera Subsp. trichocarpa 288 8 6 Populus tremitia X BU831849 2.OO E 27 Populus tremuloides 288 Vitis aestivais B289.238 7.OO 27 288 Glycine max 495284 7.OO 288 Brassica naptis D843377 6.OO 288 Nuphar advena D473522 1.OO 288 Pinus pinaster L75O151 3.00 E 288 Lactica Saiiva UO15255 S.OO 288 Brassica oleracea H961028 8.OO 288 Gossypioides kirkii gi23476295 4.90 288 Gossypium gi14269333 2.70 raimondii 288 8 6 Gossypium gi1426.9335 2.70 E 1 herbaceum 288 Gossypium hirsuttin gi14269337 2.70 288 Soianum tuberosum gi9954118 1...SO 288 Oryza sativa gi2605619 240 E 288 Ciclinissativits gi2O514371 3.10 288 Zea mays subsp. gi15042108 4.OO parvigitimis 288 8 6 Zea luxurians gi15042124 O 288 8 6 Anihurium gi29824962 andraeant in 304 863 Brassica oleracea BHS82941 S.OO -61 304 863 Oryza sativa AF2O1895 2.OO -34 304 863 Soianum tuberosum BM4O4872 3.00 -34 304 863 Medicago AW981431 1.OO -33 truncatula 304 863 Glycine max BIf86.182 1.OO -33 304 863 Oryza sativa AK103508 2.OO -33 (japonica cultivar group) 304 863 Lactica Saiiva BQ852906 4.OO -33 304 863 Lycopersicon AW442227 2.OO -32 escientain 304 863 Hordeum vulgare CAO29723 4.OO -32 Subsp. 
vulgare 304 863 Oryza sativa (indica AAAAO1OO4865 1.OO -31 cultivar-group) 304 863 Oryza sativa gi32492205 1.90E-43 (japonica cultivar group) 304 863 Oryza sativa gió573149 240 -39 304 863 Soianum gi32470630 3.90 -39 buibocasianum 304 863 Sorghum bicolor gi18390099 1...SO -37 304 863 Lycopersicon gi19171209 O.15 escientain 304 863 Pistin Saivain gi7008009 0.75 US 2011/007880.6 A1 Mar. 31, 2011 62

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Smallest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis Species from Which Sequence Identifier O Orthologous Ortholog, When Sequence O. Ortholog is Derived Accession Number Sequence Known 304 863 Zea mays gi1061308 O.85 304 863 Glycine max gi2129829 O.98 304 863 Oryza sativa (indica gi4680184 O.99 cultivar-group) 304 863 Brassica rapa gi12655.953 1 305 893 Glycine max AW278.047.1 1068 3.18 945 Brassica rapa BGS43096 2.OOE-85 Subsp. pekinensis 945 Pistin Saivain CD860359 9.OOE-69 945 Brassica oleracea BH48O897 1.OOE-66 945 Glycine max CD397129 4.OOE-66 945 Medicago BG647027 4.OOE-66 truncatula 945 Oryza sativa (indica AAAAO1OOO383 7.OOE-56 cultivar-group) 945 Oryza sativa APOO5755 9.OOE-56 (japonica cultivar group) 945 annuit is BUO23570 3.OOE-52 945 Zea mays BZA-12041 7.OOE-51 945 Oryza sativa APOO4O2O 2.OOE-48 945 Oryza sativa gi32489626 1.6OE-47 (japonica cultivar group) 945 Antirrhinum maius gi41651.83 1.2OE-21 945 Pistin Saivain gi2213534 2.2OE-14 945 Heianthus hirsutus gi27526446 O.09 945 Heianthus gi27526452 O.12 tuberostis 945 Heianthus niveus gi27526450 O.12 945 Heianthus ciliaris gi14588999 O.2 945 Helianthus praecox gi18073228 O.25 945 Heianthus debiis gi27526440 945 Lycopersicon gi1345538 O46 escientain 327 988 Glycine max LYMA-28NOVO 1098 LUSTER75453 1 327 988 Glycine max LYMA-28NOVO 1099 LUSTER75453 2 327 988 Oryza sativa RYSA-22ANO2 1100 LUSTER153439 2 327 988 Zea mays EAMA-08NOVO 1101 LUSTER10890 1 327 988 Zea mays EAMA-08NOVO 1102 LUSTER10890 3 327 988 Zea mays EAMA-08NOVO 1103 LUSTER2O1962 1 327 988 Zea mays EAMA-08NOVO 1104 LUSTER3040 3 327 988 Oryza sativa S. S91481 16O1 327 988 Lycopersicon GN-UNIGENE-S NGLET 2045 escientain SO90 328 988 Brassica oleracea BH4.78747 S.OOE-23 328 988 Populus BU873581 7.OOE-22 balsamifera Subsp. 
trichocarpa 328 988 Citrus unshiu 2.OOE-18 328 988 Lycopersicon 2.OOE-18 escientain 328 988 Oryza sativa (indica AAAAO1OOO340 1.OOE-17 cultivar-group) 328 988 Beta vulgaris BQ594583 1.OOE-16 328 988 Zea mays CC655765 2.OOE-15 328 988 Glycine max B469275 8.OOE-15 328 988 Prunus persica BUO46688 7.OOE-14 328 988 Vitis vinifera CD71994.1 2.OOE-13 328 988 Maius X domestica gi4091806 2.6OE-07 328 988 Brassica naptis gi30984.027 1.1OE-06 328 988 Brassica nigra gi2285.4920 1.1OE-06 US 2011/007880.6 A1 Mar. 31, 2011 63

TABLE 7-continued
Orthologs of Representative Arabidopsis Transcription Factor Genes

| SEQ ID NO: of Arabidopsis Sequence | GID No. | Species from Which Ortholog is Derived | Sequence Identifier or Accession Number | SEQ ID NO: of Orthologous Sequence | Smallest Sum Probability to Ortholog, When Known |
|---|---|---|---|---|---|
| 328 | G1988 | Raphanus sativus | gi3341723 | | 2.70E-06 |
| 328 | G1988 | Oryza sativa (japonica cultivar-group) | gi32488104 | | 4.80E-06 |
| 328 | G1988 | Ipomoea nil | gi10946337 | | 5.10E-06 |
| 328 | G1988 | Oryza sativa | gi11094211 | | 2.20E-05 |
| 328 | G1988 | Hordeum vulgare | gi21667475 | | 4.50E-05 |
| 328 | G1988 | Hordeum vulgare subsp. vulgare | gi21655168 | | 0.00018 |
| 328 | G1988 | Pinus radiata | gi4557093 | | 0.0016 |
| 341 | G204 | Glycine max | GLYMA-28NOV01-CLUSTER244491_1 | 1105 | |
| 341 | G204 | Glycine max | LIB4280-051-Q1-K1-E4 | 1106 | |
| 341 | G204 | Oryza sativa | rsicem_7360.y1.abd | 1107 | |
| 341 | G204 | Zea mays | Zm_S11428605 | 1810 | |
| 341 | G204 | Lycopersicon esculentum | SGN-UNIGENE-47127 | 2046 | |
| 341 | G204 | Lycopersicon esculentum | SGN-UNIGENE-SINGLET-389924 | 2047 | |
| 342 | G204 | Glycine max | AX196296 | | 1.0e-999 |
| 342 | G204 | Oryza sativa (indica cultivar-group) | AAAA01023044 | | 1.00E-161 |
| 342 | G204 | Oryza sativa (japonica cultivar-group) | AP004333 | | 1.00E-161 |
| 342 | G204 | Oryza sativa | AC107085 | | 8.00E-90 |
| 342 | G204 | Lotus corniculatus var. japonicus | AP006426 | | 7.00E-89 |
| 342 | G204 | Medicago truncatula | BZ286591 | | 9.00E-89 |
| 342 | G204 | Helianthus annuus | CD853758 | | 2.00E-88 |
| 342 | G204 | Lactuca sativa | BQ853515 | | 6.00E-87 |
| 342 | G204 | Capsicum annuum | BM067036 | | 3.00E-82 |
| 342 | G204 | Lycopersicon esculentum | BI925244 | | 8.00E-79 |
| 342 | G204 | Oryza sativa (japonica cultivar-group) | gi33146888 | | 1.50E-152 |
| 342 | G204 | Oryza sativa | gi14140291 | | 5.60E-34 |
| 342 | G204 | Zea mays | gi18463957 | | 1.50E-19 |
| 342 | G204 | Hordeum vulgare | gi23193481 | | 4.40E-08 |
| 342 | G204 | Hordeum vulgare subsp. vulgare | gi23193479 | | 1.40E-07 |
| 342 | G204 | Triticum monococcum | gi23193487 | | 2.60E-07 |
| 342 | G204 | Brassica napus | gi4106378 | | 0.12 |
| 342 | G204 | Medicago sativa | gi1279563 | | 1 |
| 342 | G204 | Nicotiana tabacum | gi8096269 | | 1 |
| 342 | G204 | Triticum aestivum | gi32400814 | | 1 |
| 365 | G2142 | Glycine max | GLYMA-28NOV01-CLUSTER10684_8 | 1116 | |
| 365 | G2142 | Glycine max | GLYMA-28NOV01-CLUSTER137024_1 | 1117 | |
| 365 | G2142 | Glycine max | GLYMA-28NOV01-CLUSTER49853_1 | 1118 | |
| 365 | G2142 | Glycine max | GLYMA-28NOV01-CLUSTER49853_4 | 1119 | |
| 365 | G2142 | Glycine max | LIB3242-451-P1-1-G8 | 1120 | |
| 365 | G2142 | Glycine max | jC-gmXLIB3563P042ag.07d1 | 1121 | |
| 365 | G2142 | Oryza sativa | ORYSA-22JAN02-CLUSTER54709_1 | 1122 | |
| 365 | G2142 | Oryza sativa | ORYSA-22JAN02-CLUSTER8097_1 | 1123 | |
| 365 | G2142 | Zea mays | 700164501H1 | 1124 | |
| 365 | G2142 | Glycine max | Gma_S4891278 | 1666 | |
| 365 | G2142 | Medicago truncatula | Mtr_S5397469 | 1708 | |
| 365 | G2142 | Zea mays | Zm_S11527973 | 1812 | |
| 365 | G2142 | Triticum aestivum | Ta_S115402 | 1899 | |

TABLE 7-continued
Orthologs of Representative Arabidopsis Transcription Factor Genes

| SEQ ID NO: of Arabidopsis Sequence | GID No. | Species from Which Ortholog is Derived | Sequence Identifier or Accession Number | SEQ ID NO: of Orthologous Sequence | Smallest Sum Probability to Ortholog, When Known |
|---|---|---|---|---|---|
| 365 | G2142 | Triticum aestivum | Ta_S146851 | 1900 | |
| 365 | G2142 | Triticum aestivum | Ta_S308126 | 1901 | |
| 365 | G2142 | Lycopersicon esculentum | SGN-UNIGENE-48174 | 2048 | |
| 365 | G2142 | Lycopersicon esculentum | SGN-UNIGENE-50424 | 2049 | |
| 365 | G2142 | Lycopersicon esculentum | SGN-UNIGENE-56397 | 2050 | |
| 365 | G2142 | Lycopersicon esculentum | SGN-UNIGENE-56608 | 2051 | |
| 366 | G2142 | Brassica napus | CD813318 | | 8.00E-90 |
| 366 | G2142 | Medicago truncatula | BF650735 | | 2.00E-59 |
| 366 | G2142 | Populus tremula x Populus tremuloides | BU837621 | | 4.00E-59 |
| 366 | G2142 | Glycine max | BU080678 | | 3.00E-58 |
| 366 | G2142 | Beta vulgaris | BQ594352 | | 4.00E-54 |
| 366 | G2142 | Solanum tuberosum | BF186943 | | 6.00E-53 |
| 366 | G2142 | Lycopersicon esculentum | AI490572 | | 1.00E-52 |
| 366 | G2142 | Oryza sativa (japonica cultivar-group) | AK101896 | | 7.00E-[illegible] |
| 366 | G2142 | Stevia rebaudiana | BG524015 | | 2.00E-[illegible] |
| 366 | G2142 | Hordeum vulgare subsp. vulgare | BU989763 | | 8.00E-[illegible] |
| 366 | G2142 | Pennisetum glaucum | gi527655 | | 3.10E-10 |
| 366 | G2142 | Sorghum bicolor | gi527665 | | 3.90E-08 |
| 366 | G2142 | Phyllostachys acuta | gi527661 | | 6.50E-08 |
| 366 | G2142 | Tripsacum australe | gi527663 | | 1.80E-07 |
| 366 | G2142 | Oryza sativa (japonica cultivar-group) | gi32488806 | | 3.20E-07 |
| 366 | G2142 | Oryza sativa | gi15451582 | | 3.50E-07 |
| 366 | G2142 | Oryza rufipogon | gi2130061 | | 6.40E-07 |
| 366 | G2142 | Oryza australiensis | gi1086526 | | 1.40E-06 |
| 366 | G2142 | Oryza officinalis | gi1086534 | | 2.90E-06 |
| 366 | G2142 | Oryza longistaminata | gi1086530 | | 3.80E-06 |
| 371 | [illegible] | Oryza sativa | Os_S17837 | 1605 | |
| 371 | [illegible] | Oryza sativa | Os_S6232 | 1606 | |
| 371 | [illegible] | Glycine max | Gma_S5129383 | 1667 | |
| 371 | [illegible] | Lycopersicon esculentum | SGN-UNIGENE-50991 | 2052 | |
| 371 | [illegible] | Lycopersicon esculentum | SGN-UNIGENE-SINGLET-399437 | 2053 | |
| 372 | [illegible] | Oryza sativa (japonica cultivar-group) | AK100046 | | 1.00E-172 |
| 372 | [illegible] | Oryza sativa | AX654056 | | 1.00E-168 |
| 372 | [illegible] | Lotus japonicus | LA239041 | | 1.00E-148 |
| 372 | [illegible] | Pisum sativum | PSA493066 | | 1.00E-130 |
| 372 | [illegible] | Brassica oleracea | BZ078380 | | 1.00E-123 |
| 372 | [illegible] | Oryza sativa (indica cultivar-group) | AAAA01002068 | | 2.00E-76 |
| 372 | [illegible] | Brassica nigra | AY061812 | | 7.00E-71 |
| 372 | [illegible] | Zea mays | CC644684 | | 6.00E-70 |
| 372 | [illegible] | Gossypium arboreum | BF269998 | | 4.00E-58 |
| 372 | [illegible] | Lycopersicon esculentum | BI931640 | | 7.00E-55 |
| 372 | [illegible] | Oryza sativa (japonica cultivar-group) | gi20503001 | | 1.00E-166 |
| 372 | [illegible] | Lotus japonicus | gi6448579 | | 3.90E-160 |
| 372 | [illegible] | Pisum sativum | gi23504759 | | 4.50E-124 |
| 372 | [illegible] | Oryza sativa | gi7339715 | | 1.00E-122 |

TABLE 7-continued
Orthologs of Representative Arabidopsis Transcription Factor Genes

| SEQ ID NO: of Arabidopsis Sequence | GID No. | Species from Which Ortholog is Derived | Sequence Identifier or Accession Number | SEQ ID NO: of Orthologous Sequence | Smallest Sum Probability to Ortholog, When Known |
|---|---|---|---|---|---|
| 372 | [illegible] | Chlamydomonas incerta | gi2190980 | | [illegible]E-06 |
| 372 | [illegible] | Chlamydomonas reinhardtii | gi1928929 | | 0.00049 |
| 372 | [illegible] | Bromheadia finlaysoniana | gi2108256 | | 0.55 |
| 372 | [illegible] | Lycopersicon esculentum | gi100214 | | 0.73 |
| 372 | [illegible] | Nicotiana tabacum | gi322758 | | 0.81 |
| 372 | [illegible] | Oryza sativa (indica cultivar-group) | gi2407271 | | 0.96 |
| 393 | G2334 | Lycopersicon esculentum | SGN-UNIGENE-57794 | 2055 | |
| 394 | G2334 | Brassica oleracea | BZ428330 | | 5.00E-61 |
| 394 | G2334 | Medicago truncatula | AW981431 | | 1.00E-30 |
| 394 | G2334 | Glycine max | BI786182 | | 3.00E-30 |
| 394 | G2334 | Solanum tuberosum | BE922572 | | 7.00E-30 |
| 394 | G2334 | Oryza sativa (japonica cultivar-group) | AK110934 | | 7.00E-30 |
| 394 | G2334 | Amborella trichopoda | CD483211 | | 3.00E-29 |
| 394 | G2334 | Lycopersicon esculentum | AW650563 | | 4.00E-29 |
| 394 | G2334 | Oryza sativa | AF201895 | | 6.00E-29 |
| 394 | G2334 | Hordeum vulgare subsp. vulgare | CA029723 | | 6.00E-29 |
| 394 | G2334 | Zea mays | CA828910 | | 2.00E-28 |
| 394 | G2334 | Oryza sativa | gi6573149 | | 6.20E-37 |
| 394 | G2334 | Oryza sativa (japonica cultivar-group) | gi24413958 | | 1.80E-35 |
| 394 | G2334 | Sorghum bicolor | gi18390099 | | 6.00E-33 |
| 394 | G2334 | Solanum bulbocastanum | gi32470646 | | 5.90E-32 |
| 394 | G2334 | Nicotiana alata | gi1087017 | | 0.79 |
| 394 | G2334 | Petunia x hybrida | gi14522848 | | 0.94 |
| 394 | G2334 | Picea abies | gi10764150 | | 0.98 |
| 394 | G2334 | Oryza sativa (indica cultivar-group) | gi4680183 | | [illegible] |
| 394 | G2334 | Lycopersicon esculentum | gi1418988 | | [illegible] |
| 394 | G2334 | Pyrus pyrifolia | gi8698889 | | [illegible] |
| 404 | G2394 | Oryza sativa (japonica cultivar-group) | AK071804 | | 1.00E-108 |
| 404 | G2394 | Zea mays | BG837939 | | 2.00E-85 |
| 404 | G2394 | Oryza sativa | AX699700 | | 3.00E-72 |
| 404 | G2394 | Triticum aestivum | BJ319065 | | 7.00E-72 |
| 404 | G2394 | Oryza sativa (indica cultivar-group) | CB634885 | | 3.00E-69 |
| 404 | G2394 | Lactuca sativa | BQ852089 | | 4.00E-69 |
| 404 | G2394 | Lycopersicon esculentum | BI921710 | | 1.00E-67 |
| 404 | G2394 | Hordeum vulgare subsp. vulgare | AL505242 | | 9.00E-64 |
| 404 | G2394 | Hordeum vulgare | BU991885 | | 3.00E-60 |
| 404 | G2394 | Solanum tuberosum | BQ512426 | | 3.00E-57 |
| 404 | G2394 | Oryza sativa (japonica cultivar-group) | gi15289774 | | 1.50E-74 |
| 404 | G2394 | Phacelia tanacetifolia | gi5002214 | | 1.50E-24 |
| 404 | G2394 | Oryza sativa | gi14164470 | | 1.40E-13 |
| 404 | G2394 | Hordeum vulgare subsp. vulgare | gi20152976 | | 1.80E-12 |
| 404 | G2394 | Cicer arietinum | gi10334499 | | 6.90E-12 |
| 404 | G2394 | Cucumis melo | gi17016985 | | 8.10E-12 |


TABLE 7-continued
Orthologs of Representative Arabidopsis Transcription Factor Genes

| SEQ ID NO: of Arabidopsis Sequence | GID No. | Species from Which Ortholog is Derived | Sequence Identifier or Accession Number | SEQ ID NO: of Orthologous Sequence | Smallest Sum Probability to Ortholog, When Known |
|---|---|---|---|---|---|
| 511 | G274 | Zea mays | Zm_S11434269 | 1819 | |
| 511 | G274 | Lycopersicon esculentum | SGN-UNIGENE-50878 | 2070 | |
| 511 | G274 | Lycopersicon esculentum | SGN-UNIGENE-SINGLET-356106 | 2071 | |
| 512 | G274 | Oryza sativa | AP003277 | | 2.00E-56 |
| 512 | G274 | Brassica oleracea | BZ506408 | | 6.00E-48 |
| 512 | G274 | Zea mays | BZ709707 | | 1.00E-47 |
| 512 | G274 | Glycine max | CA953428 | | 4.00E-45 |
| 512 | G274 | Lycopersicon esculentum | BE432293 | | 3.00E-39 |
| 512 | G274 | Oryza sativa (japonica cultivar-group) | AC130607 | | 7.00E-39 |
| 512 | G274 | Hordeum vulgare | BE559431 | | 2.00E-37 |
| 512 | G274 | Oryza minuta | CB210034 | | 2.00E-34 |
| 512 | G274 | Oryza sativa (indica cultivar-group) | AAAA01011300 | | 5.00E-34 |
| 512 | G274 | Lactuca sativa | BU000462 | | 1.00E-33 |
| 512 | G274 | Oryza sativa | gi15289981 | | 3.20E-57 |
| 512 | G274 | Oryza sativa (japonica cultivar-group) | gi20160613 | | 9.30E-29 |
| 512 | G274 | Zea mays | gi13661174 | | 3.00E-25 |
| 512 | G274 | Oryza glaberrima | gi31338862 | | 2.50E-13 |
| 512 | G274 | Oryza sativa (indica cultivar-group) | gi31338860 | | 7.60E-13 |
| 512 | G274 | Chlamydomonas reinhardtii | gi5916207 | | 3.20E-11 |
| 512 | G274 | Mesembryanthemum crystallinum | gi6942190 | | 7.90E-11 |
| 512 | G274 | Nicotiana tabacum | gi4519671 | | 1.20E-09 |
| 512 | G274 | Solanum bulbocastanum | gi32470629 | | 4.30E-09 |
| 512 | G274 | Pisum sativum | gi23504755 | | [illegible] |
| 524 | G2765 | Oryza sativa (japonica cultivar-group) | AK106649 | | 4.00E-61 |
| 524 | G2765 | Lycopersicon esculentum | AI488313 | | [illegible]E-60 |
| 524 | G2765 | Brassica oleracea | [illegible] | | 4.00E-51 |
| 524 | G2765 | Glycine max | [illegible] | | 2.00E-50 |
| 524 | G2765 | Oryza sativa subsp. japonica | [illegible] | | 4.00E-49 |
| 524 | G2765 | Populus tremula x Populus tremuloides | BU813371 | | 1.00E-37 |
| 524 | G2765 | Medicago truncatula | BF647687 | | 2.00E-37 |
| 524 | G2765 | Pinus pinaster | BX252556 | | 1.00E-32 |
| 524 | G2765 | Populus balsamifera subsp. trichocarpa | BU869748 | | 4.00E-32 |
| 524 | G2765 | Zea mays | BZ644709 | | 3.00E-31 |
| 524 | G2765 | Oryza sativa (japonica cultivar-group) | gi32129332 | | 2.30E-30 |
| 524 | G2765 | Oryza sativa | gi10800070 | | 3.80E-28 |
| 524 | G2765 | Pennisetum glaucum | gi527655 | | 8.40E-09 |
| 524 | G2765 | Perilla frutescens | gi28375728 | | 1.30E-08 |
| 524 | G2765 | Sorghum bicolor | gi527665 | | 1.40E-08 |
| 524 | G2765 | Oryza australiensis | gi1086526 | | 1.80E-08 |
| 524 | G2765 | Oryza rufipogon | gi1086536 | | 2.30E-08 |
| 524 | G2765 | Phyllostachys acuta | gi527661 | | 3.80E-08 |
| 524 | G2765 | Oryza longistaminata | gi1086530 | | 4.90E-08 |
| 524 | G2765 | Oryza officinalis | gi1086534 | | 1.00E-07 |
| 586 | G2898 | Medicago truncatula | AJ501279 | | 2.00E-[illegible] |

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Smallest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis GID Species from Which Sequence Identifier or Orthologous Ortholog, When Sequence No. Ortholog is Derived Accession Number Sequence Known S86 G2898 Glycine max BG651880 2.OO -41 S86 G2898 Soianum tuberosum BQ516260 3.00 -35 S86 G2898 Populus tremula BU816897 8.OO -32 S86 G2898 Zinnia elegans AU29282O 4.OO -30 S86 G2898 Oryza sativa AKO 64663 7.OO -30 (japonica cultivar group) S86 G2898 Zea mays CD999897 S.OO -29 S86 G2898 Trictim aestivum BM13S160 2.OO -28 S86 G2898 Gossypium BG446904 6.OO -21 arboreum S86 G2898 Nuphar advena CD475578 1.OO -19 S86 G2898 Vicia faba gi541981 1.60 -20 S86 G2898 Oryza sativa gi2O161572 3.90 -19 (japonica cultivar group) S86 G2898 Ipomoea nil gi1052956 6.30 -19 S86 G2898 Soianum tuberosum gi2894.109 1.OO -18 S86 G2898 Pistin Saivain gi436424 1.OO -18 S86 G2898 Nicotiana tabacum gi2196548 2.8O -16 S86 G2898 Glycine max gi123379 S.90 -16 S86 G2898 gladiata gi1813329 7.50 -16 S86 G2898 Narcissils gi18419623 2.50 -15 pseudonarcissils S86 G2898 Oryza sativa (indica gi23345287 2.50 -15 cultivar-group) 593 G2933 Glycine max GLYMA-28NOVO1 1314 CLUSTER243321 1 593 G2933 Oryza sativa OSC7496.C1-p10.fg 1315 593 G2933 Zea mays ZEAMA-08NOVO1 1316 CLUSTER88899 1 593 G2933 Oryza sativa OS S391.18 1624 593 G2933 Zea mays Zm S114455.25 1828 593 G2933 Lycopersicon SGN-UNIGENE-S3603 2090 escientain 594 G2933 Brassica oleracea B HS87081 6.OO -59 594 G2933 Populus tremitia X BU884102 8.OO -37 Populus tremuloides 594 G2933 Lycopersicon 205905 6.OO -29 escientain 594 G2933 Glycine max BQ611037 4.OO -28 594 G2933 Trictim aestivum CD872523 4.OO -24 594 G2933 Lupinus albits CA41O291 4.OO -23 594 G2933 Oryza sativa CB660906 S.OO -23 (japonica cultivar group) 594 G2933 Oryza sativa (indica CB624355 1.OO -22 cultivar-group) 594 G2933 Medicago AC125478 -21 truncatula 594 G2933 Zinnia elegans AU28891S 9.OO 
-20 594 G2933 Oryza sativa gi15528806 3.90 -26 594 G2933 Pennisettin gi527657 8.60 -07 glauctim 594 G2933 Phyllostachys acuta gi527661 4.10 594 G2933 Sorghum bicolor gi527667 S.6OE-OS 594 G2933 Tripsacum australe gi527663 O.OOO2 594 G2933 Mesembryanthemum gi4206118 OOOO48 crystallinum 594 G2933 Oryza sativa gi20521292 (japonica cultivar group) 594 G2933 Zea mays gi1854.2170 O.OO14 594 G2933 Oryza australiensis gi1086526 O.OO31 594 G2933 Oryza rufipogon gi1086.538 0.0055 607 G2979 Lycopersicon SGN-UNIGENE-4942S 2092 escientain 608 G2979 Zea mays AY107996 2.OOE-68 US 2011/007880.6 A1 Mar. 31, 2011 70

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Sma lest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis GID Species from Which Sequence Identifier or Orthologous Orth olog, When Sequence No. Ortholog is Derived Accession Number Sequence Known 608 G2979 Theilungiella BI698460 1.OO -60 Salistiginea 608 G2979 Vitis vinifera CB920900 4.OO -45 608 G2979 Helianthus annuit is CD8531.83 2.OO -41 608 G2979 Medicago BG4SOS49 3.00 -39 truncatula 608 G2979 Glycine max BMS24804 8.OO -38 608 G2979 Lycopersicon BI924306 8.OO -37 escientain 608 G2979 Soianum tuberosum BE92O312 7.OO -32 608 G2979 Eschschoizia CD478692 9.OO -32 Californica 608 G2979 Sorghum bicolor BG273641 S.OO -28 608 G2979 Nicotiana tabacum gió328415 4.40 -10 608 G2979 Physcomitrella gi26190147 1.OO -09 patens 608 G2979 Trictim gi13619655 3.90 -09 ficio COCGilii 608 G2979 Triticum sp. gi5763821 3.90 -09 608 G2979 Daiiciis Carota gi89778.33 5.8O -09 608 G2979 Oryza sativa gi12225,043 9.90 -09 608 G2979 Chenopodium gi11558.192 3.00 -08 rubrum 608 G2979 Populus alba gi27802536 3.10 -08 608 G2979 Oryza sativa gi32479738 1.10 -07 (japonica cultivar group) 608 G2979 Thlaspi gi22086272 2.90 -07 caerulescens 609 G298 Glycine max LYMA-28NOVO1 318 LUSTER28852 1 609 G298 Glycine max LYMA-28NOVO1 319 LUSTER28852 2 609 G298 Glycine max LYMA-28NOVO1 32O LUSTER28852 4 609 G298 Glycine max LYMA-28NOVO1 321 LUSTER28852 5 609 G298 Glycine max LYMA-28NOVO1 322 LUSTER28852 6 609 G298 Glycine max LYMA-28NOVO1 323 LUSTER28852 8 609 G298 Glycine max LYMA-28NOVO1 324 LUSTER28852 9 609 G298 Glycine max B3242-344-Q1-J1-G7 325 609 G298 Glycine max B4392-O29-R1-K1-C8 326 609 G298 Oryza sativa RYSA-22ANO2 327 LUSTER89637 1 609 G298 Oryza sativa OS S104685 626 609 G298 Glycine max Gma S4882455 683 609 G298 Zea mays Zm S11334447 829 609 G298 Zea mays Zm S11524241 830 609 G298 Lycopersicon SGN-UNIGENE-SO978 2093 escientain G298 Populus tremitia X AY307373 1.OO E-123 Populus tremuloides 
G298 Oryza sativa AY224,589 1.OO E-106 (japonica cultivar group) G298 Zea mays AY1O8383 1.OO 5 G298 Poncirus trifoliata CD573622 1.OO g G298 Glycine max BUS 79005 8.OO G298 Soianum tuberosum BM4O6319 6.OO 79 G298 Lycopersicon BG134590 2.OO 7 6 escientain 6 O G298 Pinus taeda BGO40894 4.OO G298 Marchantia C9629O 2.OO polymorpha G298 Lactica Saiiva BUO12590 4.OOE US 2011/007880.6 A1 Mar. 31, 2011 71

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Smallest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis GID Species from Which Sequence Identifier or Orthologous Ortholog, When Sequence No. Ortholog is Derived Accession Number Sequence Known 6 O G2981 Populus tremitia X gi32187097 8.2OE-119 Populus tremuloides 6 O G2981 Oryza sativa gi29371983 2.8OE-101 (japonica cultivar group) 6 O G2981 Triticum sp. gi11877791 4.1OE-47 6 O G2981 Trictim gi13619653 4.1OE-47 ficio COCGilii G2981 Populus alba gi27802536 O.OO64 G2981 Gnetum gnemon gi5019435 O.O37 G2981 Nicotiana tabacum gió328415 O.069 G2981 Oryza sativa gi12225,043 O.O71 G2981 Physcomitrella gi26190147 O.099 patens O G2981 Chenopodium gi11558.192 O.15 rubrum G2982 Glycine max GLYMA-28NOVO1- 1318 CLUSTER28852 1 G2982 Glycine max GLYMA-28NOVO1- 1319 CLUSTER28852 2 G2982 Glycine max GLYMA-28NOVO1- 1320 CLUSTER28852 4 G2982 Glycine max GLYMA-28NOVO1- 1321 CLUSTER28852 5 G2982 Glycine max GLYMA-28NOVO1- 1322 CLUSTER288526 G2982 Glycine max GLYMA-28NOVO1- 1323 CLUSTER28852 8 G2982 Glycine max LIB3242-344-Q1-J1-G7 1325 G2982 Glycine max LIB4392-O29-R1-K1-C8 1326 G2982 Oryza sativa ORYSA-22ANO2- 1327 CLUSTER89637 1 G2982 Lycopersicon SGN-UNIGENE-SO978 2093 escientain G2982 Brassica naptis CD813391 1.OOE-79 G2982 Populus tremitia X AY307373 2.OOE-59 Populus tremuloides 2 G2982 Zea mays AY1O8383 6.OOE-57 6 2 G2982 Oryza sativa AY224,551 2.OOE-54 (japonica cultivar group) G2982 Glycine max BUS 79005 8.OOE-52 G2982 Pinus taeda BGO40894 3.OOE-SO G2982 Soianum tuberosum BM4O6319 2.OOE-47 G2982 Marchantia C9629O 3.OOE-47 polymorpha G2982 Lycopersicon BM412584 1.OOE-42 escientain 2 G2982 Triticum sp. TSP271917 9.OOE-40 G2982 Populus tremitia X gi32187097 1.2OE-58 Populus tremuloides G2982 Oryza sativa gi29367654 6.8OE-54 (japonica cultivar group) G2982 Triticum sp. 
gi11877791 2.OOE-40 G2982 Trictim gi13619653 2.OOE-40 ficio COCGilii G2982 Daiiciis Carota gi89778.33 O.OO44 G2982 Nicotiana tabacum gió328415 0.057 G2982 Physcomitrella gi26190147 O.17 patens G2982 Thlaspi gi22086272 O.21 caerulescens G2982 Oryza sativa gi12225,043 O.24 G2982 Chenopodium gi11558.192 O.25 rubrum G2990 Oryza sativa OSC4898.C1.p6.fg 1334 G2990 Zea mays LIB3279-221-Q6-K6-B2 1335 US 2011/007880.6 A1 Mar. 31, 2011 72

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Table 7: SEQ Smallest Sum ID NO: of SEQ ID NO: of Probability to Arabidopsis GID Species from Which Sequence Identifier or Orthologous Ortholog, When Sequence No. Ortholog is Derived Accession Number Sequence Known 6 5 G2990 Zea mays ZEAMA-08NOVO1 1336 CLUSTER42733 1 6 5 G2990 Oryza sativa OS S56831 1628 6 5 G2990 Glycine max Gma S4897246 1685 G2990 Medicago Mitr S5341529 1715 truncatula G2990 Trictim aestivum Ta S171947 1921 G2990 Lycopersicon SGN-UNIGENE-4.9426 2095 escientain G2990 Lycopersicon SGN-UNIGENE-S2S25 2096 escientain 6 6 G2990 Brassica oleracea BH738007 1.OO -100 6 6 G2990 Medicago AC139600 3.00 84 truncatula G2990 Flaveria bidentis FBI18580 8.OO -81 G2990 Glycine max BFO695.75 4.OO -59 G2990 Soianum tuberosum BEA71989 7.OO -56 G2990 Flaveria trinervia FTR18577 3.00 -51 G2990 Populus AI166342 S.OO -45 balsamifera Subsp. trichocarpa 6 6 G2990 Vitis vinifera CB97O621 7.OO -45 G2990 Oryza sativa APOOS152 2.OO -43 (japonica cultivar group) G2990 Zea mays CC335993 3.00 6 6 G2990 Flaveria bidentis gi13277220 1.10 -76 6 6 G2990 Oryza sativa gi32480091 2.10 -38 (japonica cultivar group) G2990 Flaveria trinervia gi13277216 1.60 -29 G2990 Oryza sativa gi5091602 3.00 -28 G2990 Lactica Saiiva gi291 19890 9.OO -20 G2990 Bromheadia gi2108256 4.30 -06 finlaysoniana G2990 Lycopersicon gi100214 1.2O E-OS escientain G2990 Daiiciis Carota gi224.556 1.70 -OS 6 6 G2990 Nicotiana alata gi1247388 1.90 -OS G2990 Gossypium gi451544 3.80 -OS barbadense 655 G3076 Oryza sativa Os. 
S95874 1630 655 G3076 Lycopersicon SGN-UNIGENE-S2322 2100 escientain 656 G3076 Brassica oleracea 1.OO -59 656 G3076 Lycopersicon 3.00 -52 escientain 656 G3076 Theobroma cacao CA796492 -31 656 G3076 Nicotiana giatica X TOBTID3 3.00 -25 Nicotiana langsdorfii 656 G3076 Populus tremitia X BU866131 3.00 -21 Populus tremuloides 656 G3076 Medicago BQ123004 4.OO -20 truncatula 656 G3076 Zea mays CC633595 8.OO -18 656 G3076 Oryza sativa AK106334 1.OO -17 (japonica cultivar group) 656 G3076 Oryza sativa APOO3S67 4.OO -17 656 G3076 Oryza sativa (indica AAAAO1OO1312 4.OO -17 cultivar-group) 656 G3076 Oryza sativa gi15408613 1.10 -19 656 G3076 Oryza sativa gi21104797 1.10 -19 (japonica cultivar group) 656 G3076 Lycopersicon gi4959970 1 3 escientain 656 G3076 Trictim aestivum gi100809 US 2011/007880.6 A1 Mar. 31, 2011 73

TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes

Table 7:
SEQ ID NO: of Arabidopsis Sequence  GID No.  Species from Which Ortholog is Derived  Sequence Identifier or Accession Number  SEQ ID NO: of Orthologous Sequence  Smallest Sum Probability to Ortholog, When Known
656  G3076  Solanum tuberosum  gi13195751  6.90E-12
656  G3076  Zea mays  gi297020  8.80E-12
656  G3076  Nicotiana glauca x Nicotiana langsdorfii  gi688423  1.00E-11
656  G3076  Phaseolus vulgaris  gi15148924  1.40E-10
656  G3076  Nicotiana tabacum  gi12230709  1.10E-09
656  G3076  Glycine max  gi7488719  1.40E-08
657  G3083  Oryza sativa  LIB3434-065-P1-K1-B5  1346
657  G3083  Oryza sativa  OS S54214  1631
657  G3083  Glycine max  Gma S4880456  1687
657  G3083  Hordeum vulgare  Hv S60182  1753
657  G3083  Triticum aestivum  Ta S179586  1924
657  G3083  Lycopersicon esculentum  SGN-UNIGENE-SINGLET 306367  2101
658  G3083  Medicago truncatula  BQ123004  9.00E-65
658  G3083  Arachis hypogaea  CD038559  3.00E-58
658  G3083  Glycine max  BE657440  7.00E-51
658  G3083  Theobroma cacao  CA794948  2.00E-48
658  G3083  Phaseolus coccineus  CA899019  8.00E-47
658  G3083  Brassica oleracea  BZ028606  3.00E-42
658  G3083  Brassica napus  CD823868  3.00E-42
658  G3083  Populus tremula x Populus tremuloides  BU866131  5.00E-36
658  G3083  Oryza sativa (indica cultivar-group)  AAAA01006352  2.00E-32
658  G3083  Nicotiana glauca x Nicotiana langsdorfii  TOBTID3  1.00E-31
658  G3083  Nicotiana glauca x Nicotiana langsdorfii  gi688423  8.80E-36
658  G3083  Oryza sativa  gi8570052  1.30E-29
658  G3083  Lycopersicon esculentum  gi4959970  3.10E-17
658  G3083  Nicotiana tabacum  gi12230709  7.50E-16
658  G3083  Triticum aestivum  gi100809  1.60E-15
658  G3083  Solanum tuberosum  gi13195751  3.00E-14
658  G3083  Zea mays  gi297020  6.00E-14
658  G3083  Phaseolus vulgaris  gi15148926  1.90E-13
658  G3083  Nicotiana sp.  gi19680  7.30E-13
658  G3083  Glycine max  gi7488719  5.10E-11

[0354] Table 8 lists sequences discovered to be paralogous to a number of transcription factors of the present invention. The column headings include, from left to right, the Arabidopsis SEQ ID NO; corresponding Arabidopsis Gene ID (GID) numbers; the GID numbers of the paralogs discovered in a database search; and the SEQ ID NOs assigned to the paralogs.

TABLE 8 Arabidopsis Transcription Factor Genes and Paralogs

Table 8:
Arabidopsis Transcription Factor SEQ ID NO:  Arabidopsis TF GID No.  Paralog GID No.  Paralog Nucleotide SEQ ID NO:
7    G30    G1791  1461
            G1792  1463
            G1795  1465
11   G47    G2133  1495
39   G148   G142   33
43   G153   G152   1365
            G1760  1459
            G860   1419
105  G485   G1364  1439
            G2345  1501
            G481   1395
            G482   1397
121  G627   G149   1363
161  G975   G1387  1443
            G2583  1515

163  011    G154   367
207  357    G1452  451
            G512   401
225  452    G1357  437
            G512   401
233  482    G1888  477
277  792    G1791  461
            G1795  465
            G30    7
281  797    G1798  283
283  798    G1797  281
287  816    G225   375
            G226   377
            G2718  507
            G682   407
303  863    G2334  499
341         G2882  537
371         G2199  497
393  G2334  G1863  303
505  G2717  G204   1373
            G2709  1525
507  G2718  G1816  287
            G225   1375
            G226   1377
            G682   1407
511  G2741  G1435  1449
593  G2933  G2928  1539
            G2932  1541
607  G2979  G2980  1547
609  G2981  G2982  1551
611  G2982  G2981  1549
615  G2990  G2989  1553

[0355] Table 9 lists the gene identification number (GID) and relationships for homologous (found using analyses according to Example IX) and variant sequences for the sequences of the Sequence Listing.
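By way of illustration only, the Relationship entries of Table 9 follow a small number of textual patterns ("Orthologous to ...", "Paralogous to ...", optionally prefixed by "Predicted polypeptide sequence is"). The following is a minimal sketch, assuming those patterns, of how such an entry could be split into its kind and the referenced GIDs. The function name `parse_relationship` and the returned field names are hypothetical and are not part of the application.

```python
import re

# Hypothetical helper, not part of the application: matches the
# "orthologous to ..." / "paralogous to ..." tail of a Relationship entry.
_REL = re.compile(r"(orthologous|paralogous) to (.+)$", re.IGNORECASE)


def parse_relationship(text):
    """Split a Table 9 Relationship entry into kind, GIDs, and a
    flag telling whether it refers to a predicted polypeptide."""
    text = text.strip()
    match = _REL.search(text)
    if match is None:
        raise ValueError(f"unrecognized relationship: {text!r}")
    kind = match.group(1).lower()
    # Several GIDs may be listed, separated by commas.
    gids = [gid.strip() for gid in match.group(2).split(",")]
    predicted = text.lower().startswith("predicted polypeptide")
    return {"kind": kind, "gids": gids, "predicted": predicted}


if __name__ == "__main__":
    # Entries copied from Table 9.
    for entry in (
        "Predicted polypeptide sequence is orthologous to G30, G1792",
        "Orthologous to G47",
        "Paralogous to G1816, G2718",
    ):
        print(parse_relationship(entry))
```

An entry prefixed by "Predicted polypeptide sequence is" corresponds to a DNA row of Table 9, and an entry without that prefix corresponds to a protein (PRT) row, which is what the `predicted` flag records.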

TABLE 9 Similarity relationships found within the Sequence Listing

Table 9:
SEQ ID NO:  GID  DNA or Protein (PRT)  Species from which Sequence is Derived  Relationship
685  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G30, G1792
686  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G30, G1792
687  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G30, G1792
688  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G30, G1792
689  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G30, G1792
690  PRT  Oryza sativa  Orthologous to G30, G1792
691  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G30, G1792
702  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G47
703  PRT  Oryza sativa  Orthologous to G47
704  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G148
705  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G148
706  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G148
707  PRT  Oryza sativa  Orthologous to G148
708  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G148
709  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G148
710  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G148
711  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G148
712  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G148
713  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G153
714  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G153
715  PRT  Oryza sativa  Orthologous to G153
716  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G153
717  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G153
718  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G153
798  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G485
799  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G485
800  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G485
801  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G485
802  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G485
803  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G485
804  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G485
805  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G485
806  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G485


807  PRT  Oryza sativa  Orthologous to G485
808  PRT  Oryza sativa  Orthologous to G485
809  PRT  Oryza sativa  Orthologous to G485
810  PRT  Oryza sativa  Orthologous to G485
811  PRT  Oryza sativa  Orthologous to G485
812  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G485
813  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G485
814  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G485
815  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G485
816  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G485
817  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G485
818  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G485
819  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G485
820  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G485
821  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G485
822  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G627
823  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G627
824  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G627
902  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G975
903  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G975
904  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G975
905  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G975
906  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G975
907  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G975
908  PRT  Oryza sativa  Orthologous to G975
909  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G975
910  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G975
911  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G975
912  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1011
913  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1011
914  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1011
915  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1011
916  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1011
917  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1011
918  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1011
919  PRT  Oryza sativa  Orthologous to G1011
920  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1011
921  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1011
922  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1011
923  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1011
924  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1011
968  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1274
969  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1274
970  PRT  Oryza sativa  Orthologous to G1274
971  PRT  Oryza sativa  Orthologous to G1274
972  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1274
973  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1274
974  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1274
975  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1274
982  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1357, G1452
014  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1482
015  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1482
016  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1482
017  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1482
018  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1482
019  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1482
020  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1482
021  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1482
022  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1482
023  PRT  Oryza sativa  Orthologous to G1482
024  PRT  Oryza sativa  Orthologous to G1482
025  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1482
026  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1482
027  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1482
028  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1482
029  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1482
030  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1482
031  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1510
032  PRT  Oryza sativa  Orthologous to G1510



036  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1660
037  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1660
038  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1660
039  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1660
040  PRT  Oryza sativa  Orthologous to G1660
041  PRT  Oryza sativa  Orthologous to G1660
042  PRT  Oryza sativa  Orthologous to G1660
043  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1660
044  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1660
045  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1660
046  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1660
047  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1660
048  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1730
051  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1779
052  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1779
053  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1779
054  PRT  Oryza sativa  Orthologous to G1779
055  DNA  Zea mays  Predicted polypeptide sequence is orthologous to
057  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2718
058  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2718
059  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2718
060  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2718
061  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2718
DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to
DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G2718
PRT  Oryza sativa  Orthologous to G1816, G2718
PRT  Oryza sativa  Orthologous to G1816, G2718
DNA  Zea mays  Predicted polypeptide sequence is orthologous to G2718
DNA  Zea mays  Predicted polypeptide sequence is orthologous to G2718
098  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1988
099  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G1988
00  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1988
01  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1988
02  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1988
03  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1988
04  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G1988
05  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2041
06  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2041
07  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G2041
16  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2142
17  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2142
18  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2142
19  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2142
20  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2142
21  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2142
22  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G2142
23  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G2142
24  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G2142
209  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2717
210  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2717
211  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2717
212  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2717
213  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2717
214  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2717
215  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2717
216  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G2717
217  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G2717
218  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G2717
219  PRT  Oryza sativa  Orthologous to G2717
220  PRT  Oryza sativa  Orthologous to G2717
221  PRT  Oryza sativa  Orthologous to G2717
222  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G2717


223  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G2717
224  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G2717
225  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G2717
226  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G2717
227  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G2717
229  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G274
230  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G274
231  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G274
232  PRT  Oryza sativa  Orthologous to G2741
233  PRT  Oryza sativa  Orthologous to G2741
234  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G274
235  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G274
236  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G274
314  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2933
315  PRT  Oryza sativa  Orthologous to G2933
316  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G2933
318  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2981, G2982
319  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2981, G2982
320  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2981, G2982
321  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2981, G2982
322  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2981, G2982
323  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2981, G2982
324  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G298
325  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2981, G2982
326  DNA  Glycine max  Predicted polypeptide sequence is orthologous to G2981, G2982
327  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G2981, G2982
334  PRT  Oryza sativa  Orthologous to G2990
335  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G2990
336  DNA  Zea mays  Predicted polypeptide sequence is orthologous to G2990
346  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G3083
353  G30  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1792
354  G30  PRT  Arabidopsis thaliana  Paralogous to G1792
361  G142  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G148
362  G142  PRT  Arabidopsis thaliana  Paralogous to G148
363  G149  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G627
364  G149  PRT  Arabidopsis thaliana  Paralogous to G627
365  G152  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G153
366  G152  PRT  Arabidopsis thaliana  Paralogous to G153
367  G154  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1011
368  G154  PRT  Arabidopsis thaliana  Paralogous to G101
373  G204  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2717
374  G204  PRT  Arabidopsis thaliana  Paralogous to G2717
375  G225  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1816, G2718
376  G225  PRT  Arabidopsis thaliana  Paralogous to G1816, G2718
377  G226  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1816, G2718
378  G226  PRT  Arabidopsis thaliana  Paralogous to G1816, G2718
395  G481  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G485
396  G481  PRT  Arabidopsis thaliana  Paralogous to G485
397  G482  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G485
398  G482  PRT  Arabidopsis thaliana  Paralogous to G485
401  G512  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1357, G1452
402  G512  PRT  Arabidopsis thaliana  Paralogous to G1357, G1452
407  G682  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1816, G2718
408  G682  PRT  Arabidopsis thaliana  Paralogous to G1816, G2718
419  G860  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G153
420  G860  PRT  Arabidopsis thaliana  Paralogous to G153
437  G1357  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1452


438  357  PRT  Arabidopsis thaliana  Paralogous to G1452
439  364  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G485
440  364  PRT  Arabidopsis thaliana  Paralogous to G485
443  387  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G975
444  387  PRT  Arabidopsis thaliana  Paralogous to G975
449  435  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2741
450  435  PRT  Arabidopsis thaliana  Paralogous to G274
451  452  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1357
452  452  PRT  Arabidopsis thaliana  Paralogous to G1357
459  760  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G153
460  760  PRT  Arabidopsis thaliana  Paralogous to G153
461  791  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G30, G1792
462  791  PRT  Arabidopsis thaliana  Paralogous to G30, G1792
463  792  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G30
464  792  PRT  Arabidopsis thaliana  Paralogous to G30
465  795  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G30, G1792
466  Arabidopsis thaliana  Paralogous to G30, G1792
467  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1798
468  PRT  Arabidopsis thaliana  Paralogous to G1798
469  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1797
470  PRT  Arabidopsis thaliana  Paralogous to G1797
471  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2718
472  Arabidopsis thaliana  Paralogous to G2718
475  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2334
476  Arabidopsis thaliana  Paralogous to G2334
477  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1482
478  PRT  Arabidopsis thaliana  Paralogous to G1482
495  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G47
496  PRT  Arabidopsis thaliana  Paralogous to G47
497  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2207
498  Arabidopsis thaliana  Paralogous to G2207
499  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1863
500  Arabidopsis thaliana  Paralogous to G1863
501  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G485
502  Arabidopsis thaliana  Paralogous to G485
515  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G975
516  PRT  Arabidopsis thaliana  Paralogous to G975
525  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2717
526  PRT  Arabidopsis thaliana  Paralogous to G2717
527  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G1816
528  Arabidopsis thaliana  Paralogous to G1816
537  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2041
538  Arabidopsis thaliana  Paralogous to G204
539  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2933
540  PRT  Arabidopsis thaliana  Paralogous to G2933
541  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2933
542  PRT  Arabidopsis thaliana  Paralogous to G2933
547  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2979
548  Arabidopsis thaliana  Paralogous to G2979
549  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2982
550  Arabidopsis thaliana  Paralogous to G2982
551  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2981
552  Arabidopsis thaliana  Paralogous to G298
553  DNA  Arabidopsis thaliana  Predicted polypeptide sequence is paralogous to G2990
554  PRT  Arabidopsis thaliana  Paralogous to G2990
559  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G30
560  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G148
561  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G148
562  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G148
575  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G627
581  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1011
592  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1482
593  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1660
601  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G1988
605  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G2207
606  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G2207
614  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G2717
624  DNA  Oryza sativa  Predicted polypeptide sequence is orthologous to G2933

TABLE 9-continued Similarity relationships found within the Sequence Listing

Table 9 columns: SEQ ID NO; GID; DNA or Protein (PRT); Species from which Sequence is Derived; Relationship

626 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G2981
628 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G2990
630 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G3076
631 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G3083
633 DNA Glycine max Predicted polypeptide sequence is orthologous to G30, G1792
634 DNA Glycine max Predicted polypeptide sequence is orthologous to G153
641 DNA Glycine max Predicted polypeptide sequence is orthologous to G485
651 DNA Glycine max Predicted polypeptide sequence is orthologous to G1011
662 DNA Glycine max Predicted polypeptide sequence is orthologous to G1510
663 DNA Glycine max Predicted polypeptide sequence is orthologous to G1816, G2718
666 DNA Glycine max Predicted polypeptide sequence is orthologous to G2142
667 DNA Glycine max Predicted polypeptide sequence is orthologous to G2207
670 DNA Glycine max Predicted polypeptide sequence is orthologous to G2717
671 DNA Glycine max Predicted polypeptide sequence is orthologous to G2741
683 DNA Glycine max Predicted polypeptide sequence is orthologous to G2981
685 DNA Glycine max Predicted polypeptide sequence is orthologous to G2990
687 DNA Glycine max Predicted polypeptide sequence is orthologous to G3083
695 DNA Medicago truncatula Predicted polypeptide sequence is orthologous to G627
696 DNA Medicago truncatula Predicted polypeptide sequence is orthologous to G1011
703 DNA Medicago truncatula Predicted polypeptide sequence is orthologous to G1482
708 DNA Medicago truncatula Predicted polypeptide sequence is orthologous to G2142
715 DNA Medicago truncatula Predicted polypeptide sequence is orthologous to G2990
718 DNA Hordeum vulgare Predicted polypeptide sequence is orthologous to G47
725 DNA Hordeum vulgare Predicted polypeptide sequence is orthologous to G485
726 DNA Hordeum vulgare Predicted polypeptide sequence is orthologous to G485
727 DNA Hordeum vulgare Predicted polypeptide sequence is orthologous to G627
733 DNA Hordeum vulgare Predicted polypeptide sequence is orthologous to G975
744 DNA Hordeum vulgare Predicted polypeptide sequence is orthologous to G2717
745 DNA Hordeum vulgare Predicted polypeptide sequence is orthologous to G2741
753 DNA Hordeum vulgare Predicted polypeptide sequence is orthologous to G3083
754 DNA Zea mays Predicted polypeptide sequence is orthologous to G30, G1792
755 DNA Zea mays Predicted polypeptide sequence is orthologous to G148
756 DNA Zea mays Predicted polypeptide sequence is orthologous to G148
757 DNA Zea mays Predicted polypeptide sequence is orthologous to G153
758 DNA Zea mays Predicted polypeptide sequence is orthologous to G153
776 DNA Zea mays Predicted polypeptide sequence is orthologous to G485
777 DNA Zea mays Predicted polypeptide sequence is orthologous to G485
778 DNA Zea mays Predicted polypeptide sequence is orthologous to G485
786 DNA Zea mays Predicted polypeptide sequence is orthologous to G1011
787 DNA Zea mays Predicted polypeptide sequence is orthologous to G1011
802 DNA Zea mays Predicted polypeptide sequence is orthologous to G1482
803 DNA Zea mays Predicted polypeptide sequence is orthologous to G1660
804 DNA Zea mays Predicted polypeptide sequence is orthologous to G1660
810 DNA Zea mays Predicted polypeptide sequence is orthologous to G2041
812 DNA Zea mays Predicted polypeptide sequence is orthologous to G2142
818 DNA Zea mays Predicted polypeptide sequence is orthologous to G2717
819 DNA Zea mays Predicted polypeptide sequence is orthologous to G2741
828 DNA Zea mays Predicted polypeptide sequence is orthologous to G2933
829 DNA Zea mays Predicted polypeptide sequence is orthologous to G2981
830 DNA Zea mays Predicted polypeptide sequence is orthologous to G2981
834 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G30
835 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G148
846 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G485
847 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G485
848 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G485
849 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G485
850 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G627
858 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G1011
859 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G1011
860 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G1011
879 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G1482
880 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G1510
881 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G1660
883 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G1816, G2718
899 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G2142
900 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G2142


901 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G2142
907 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G2717
908 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G2717
909 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G2717
921 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G2990
924 DNA Triticum aestivum Predicted polypeptide sequence is orthologous to G3083
929 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1011
932 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1510
943 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G148
944 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G148
945 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G153
946 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G153
980 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G485
981 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G485
982 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G627
2003 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G975
2004 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G975
2005 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G975
2006 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G975
2007 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1011
2008 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1011
2009 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1011
2010 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1011
2017 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1274
2018 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1274
2020 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1357, G1452
2032 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1482
2033 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1510
2034 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1660
2035 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1660
2036 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1779
2045 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G1988
2046 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2041
2047 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2041
2048 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2142
2049 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2142
2050 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2142
2051 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2142
2052 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2207
2053 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2207
2055 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2334
2068 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2717
2069 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2717
2070 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2741
2071 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2741
2090 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2933
2092 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2979
2093 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2981, G2982
2095 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2990
2096 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G2990
2100 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G3076
2101 DNA Lycopersicon esculentum Predicted polypeptide sequence is orthologous to G3083
2110 G2041 1 DNA Arabidopsis thaliana Expression construct P13846 (sequence variant)
2124 G3380 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G1795; Member of G1792 clade
2125 G3380 PRT Oryza sativa Orthologous to G1795; Member of G1792 clade
2126 G3381 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G30; Member of G1792 clade
2127 G3381 PRT Oryza sativa Orthologous to G30; Member of G1792 clade
2128 G3383 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G1792; Member of G1792 clade
2129 G3383 PRT Oryza sativa Orthologous to G1792; Member of G1792 clade
2130 G3392 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G682; Member of G1816 and G2718 clade
2131 G3392 PRT Oryza sativa Orthologous to G682; Member of G1816 and G2718 clade
2132 G3393 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G682; Member of G1816 and G2718 clade
2133 G3393 PRT Oryza sativa Orthologous to G682; Member of G1816 and G2718 clade


2134 G3394 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2135 G3394 PRT Oryza sativa Orthologous to G485; Member of G485 clade
2136 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2137 PRT Oryza sativa Orthologous to G485; Member of G485 clade
2138 G3396 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2139 G3396 PRT Oryza sativa Orthologous to G485; Member of G485 clade
2140 G3397 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2141 G3397 PRT Oryza sativa Orthologous to G485; Member of G485 clade
2142 G3398 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2143 G3398 PRT Oryza sativa Orthologous to G485; Member of G485 clade
2144 G3429 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2145 G3429 PRT Oryza sativa Orthologous to G485; Member of G485 clade
2146 G3431 DNA Zea mays Predicted polypeptide sequence is orthologous to G682; Member of G1816 and G2718 clade
2147 G3431 PRT Zea mays Orthologous to G682; Member of G1816 and G2718 clade
2148 G3434 DNA Zea mays Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2149 G3434 PRT Zea mays Orthologous to G485; Member of G485 clade
2150 DNA Zea mays Predicted polypeptide sequence is orthologous to G482; Member of G485 clade
2151 PRT Zea mays Orthologous to G482; Member of G485 clade
2152 G3436 DNA Zea mays Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2153 G3436 PRT Zea mays Orthologous to G485; Member of G485 clade
2154 G3437 DNA Zea mays Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2155 G3437 PRT Zea mays Orthologous to G485; Member of G485 clade
2156 G3444 DNA Zea mays Predicted polypeptide sequence is orthologous to G682; Member of G1816 and G2718 clade
2157 G3444 PRT Zea mays Orthologous to G682; Member of G1816 and G2718 clade
2158 G3445 DNA Glycine max Predicted polypeptide sequence is orthologous to G225; Member of G1816 and G2718 clade
2159 G3445 PRT Glycine max Orthologous to G225; Member of G1816 and G2718 clade
2160 G3446 DNA Glycine max Predicted polypeptide sequence is orthologous to G225; Member of G1816 and G2718 clade
2161 G3446 PRT Glycine max Orthologous to G225; Member of G1816 and G2718 clade
2162 G3447 DNA Glycine max Predicted polypeptide sequence is orthologous to G225; Member of G1816 and G2718 clade
2163 G3447 PRT Glycine max Orthologous to G225; Member of G1816 and G2718 clade
2164 G3448 DNA Glycine max Predicted polypeptide sequence is orthologous to G225; Member of G1816 and G2718 clade
2165 G3448 PRT Glycine max Orthologous to G225; Member of G1816 and G2718 clade
2166 G3449 DNA Glycine max Predicted polypeptide sequence is orthologous to G225; Member of G1816 and G2718 clade
2167 G3449 PRT Glycine max Orthologous to G225; Member of G1816 and G2718 clade
2168 G3450 DNA Glycine max Predicted polypeptide sequence is orthologous to G682; Member of G1816 and G2718 clade
2169 G3450 PRT Glycine max Orthologous to G682; Member of G1816 and G2718 clade
2170 G3470 DNA Glycine max Predicted polypeptide sequence is orthologous to G482; Member of G485 clade
2171 G3470 PRT Glycine max Orthologous to G482; Member of G485 clade
2172 G3471 DNA Glycine max Predicted polypeptide sequence is orthologous to G482; Member of G485 clade
2173 G3471 PRT Glycine max Orthologous to G482; Member of G485 clade
2174 G3472 DNA Glycine max Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2175 G3472 PRT Glycine max Orthologous to G485; Member of G485 clade
2176 G3473 DNA Glycine max Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2177 G3473 PRT Glycine max Orthologous to G485; Member of G485 clade
2178 G3474 DNA Glycine max Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2179 G3474 PRT Glycine max Orthologous to G485; Member of G485 clade


2180 G3475 DNA Glycine max Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2181 G3475 PRT Glycine max Orthologous to G485; Member of G485 clade
2182 G3476 DNA Glycine max Predicted polypeptide sequence is orthologous to G482; Member of G485 clade
2183 G3476 PRT Glycine max Orthologous to G482; Member of G485 clade
2184 G3477 DNA Glycine max Predicted polypeptide sequence is orthologous to G482; Member of G485 clade
2185 G3477 PRT Glycine max Orthologous to G482; Member of G485 clade
2186 G3478 DNA Glycine max Predicted polypeptide sequence is orthologous to G485; Member of G485 clade
2187 G3478 PRT Glycine max Orthologous to G485; Member of G485 clade
2188 G3479 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G153; Member of G153 clade
2189 G3479 PRT Oryza sativa Orthologous to G153; Member of G153 clade
2190 G3484 DNA Glycine max Predicted polypeptide sequence is orthologous to G153; Member of G153 clade
2191 G3484 PRT Glycine max Orthologous to G153; Member of G153 clade
2192 G3485 DNA Glycine max Predicted polypeptide sequence is orthologous to G153; Member of G153 clade
2193 G3485 PRT Glycine max Orthologous to G153; Member of G153 clade
2194 G3487 DNA Zea mays Predicted polypeptide sequence is orthologous to G153; Member of G153 clade
2195 G3487 PRT Zea mays Orthologous to G153; Member of G153 clade
2196 G3488 DNA Zea mays Predicted polypeptide sequence is orthologous to G153; Member of G153 clade
2197 G3488 PRT Zea mays Orthologous to G153; Member of G153 clade
2198 G3489 DNA Zea mays Predicted polypeptide sequence is orthologous to G153; Member of G153 clade
2199 G3489 PRT Zea mays Orthologous to G153; Member of G153 clade
2208 G3515 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G30; Member of G1792 clade
2209 G3515 PRT Oryza sativa Orthologous to G30; Member of G1792 clade
2210 G3516 DNA Zea mays Predicted polypeptide sequence is orthologous to G1792; Member of G1792 clade
2211 G3516 PRT Zea mays Orthologous to G1792; Member of G1792 clade
2212 G3517 DNA Zea mays Predicted polypeptide sequence is orthologous to G1791; Member of G1792 clade
2213 G3517 PRT Zea mays Orthologous to G1791; Member of G1792 clade
2214 G3518 DNA Glycine max Predicted polypeptide sequence is orthologous to G1792; Member of G1792 clade
2215 G3518 PRT Glycine max Orthologous to G1792; Member of G1792 clade
2216 G3519 DNA Glycine max Predicted polypeptide sequence is orthologous to G1792; Member of G1792 clade
2217 G3519 PRT Glycine max Orthologous to G1792; Member of G1792 clade
2218 G3520 DNA Glycine max Predicted polypeptide sequence is orthologous to G1792; Member of G1792 clade
2219 G3520 PRT Glycine max Orthologous to G1792; Member of G1792 clade
2220 G3527 DNA Glycine max
2221 G3527 PRT Glycine max
2222 G3528 DNA Glycine max
2223 G3528 PRT Glycine max
2224 G3643 DNA Glycine max Predicted polypeptide sequence is orthologous to G47; Member of G47 and G2133 clade
2225 G3643 PRT Glycine max Orthologous to G47; Member of G47 and G2133 clade
2226 G3644 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G47; Member of G47 and G2133 clade
2227 G3644 PRT Oryza sativa Orthologous to G47; Member of G47 and G2133 clade
2228 G3645 DNA Brassica rapa Predicted polypeptide sequence is orthologous to G47; Member of G47 and G2133 clade
2229 G3645 PRT Brassica rapa Orthologous to G47; Member of G47 and G2133 clade
2230 G3646 DNA Brassica oleracea Predicted polypeptide sequence is orthologous to G2133; Member of G47 and G2133 clade
2231 G3646 PRT Brassica oleracea Orthologous to G2133; Member of G47 and G2133 clade
2232 G3647 DNA Zinnia elegans Predicted polypeptide sequence is orthologous to G47; Member of G47 and G2133 clade
2233 G3647 PRT Zinnia elegans Orthologous to G47; Member of G47 and G2133 clade
2234 G3649 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G47 and G2133; Member of G47 and G2133 clade


2235 G3649 PRT Oryza sativa Orthologous to G47 and G2133; Member of G47 and G2133 clade
2236 G3651 DNA Oryza sativa Predicted polypeptide sequence is orthologous to G2133; Member of G47 and G2133 clade
2237 G3651 PRT Oryza sativa Orthologous to G2133; Member of G47 and G2133 clade
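Because every entry of Table 9 follows the same field order (SEQ ID NO, an optional GID, DNA or PRT, a two-word species name, and a relationship), the entries lend themselves to machine parsing for bioinformatic searches. The following is only an illustrative sketch, not part of the disclosed methods: it assumes one table entry per line, and the names `Row`, `parse_row`, and `referenced_gids` are hypothetical helpers introduced here.

```python
import re
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    """One Table 9 entry (hypothetical record type for illustration)."""
    seq_id: int
    gid: Optional[str]
    molecule: str
    species: str
    relationship: str


# Assumed layout: SEQ ID NO, optional GID, DNA or PRT, "Genus species", free text.
ROW_RE = re.compile(
    r"^(?P<seq_id>\d+)\s+"
    r"(?:(?P<gid>G\d+)\s+)?"
    r"(?P<molecule>DNA|PRT)\s+"
    r"(?P<species>[A-Z][a-z]+ [a-z]+)\s*"
    r"(?P<relationship>.*)$"
)


def parse_row(line: str) -> Optional[Row]:
    """Return a Row, or None when the line does not match the assumed layout."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    return Row(int(m["seq_id"]), m["gid"], m["molecule"],
               m["species"], m["relationship"].strip())


def referenced_gids(row: Row) -> List[str]:
    """List every GID mentioned in the relationship text, in order."""
    return re.findall(r"G\d+", row.relationship)


example = parse_row(
    "2236 G3651 DNA Oryza sativa Predicted polypeptide sequence is "
    "orthologous to G2133; Member of G47 and G2133 clade"
)
```

Entries that lack a GID, such as SEQ ID NO: 559, parse with `gid` set to `None`; lines that do not fit the assumed layout return `None` rather than a partial record.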

EXAMPLES

[0356] The invention, now being generally described, will be more readily understood by reference to the following examples, which are included merely for purposes of illustration of certain aspects and embodiments of the present invention and are not intended to limit the invention. It will be recognized by one of skill in the art that a transcription factor that is associated with a particular first trait may also be associated with at least one other, unrelated and inherent second trait which was not predicted by the first trait.

[0357] The complete descriptions of the traits associated with each polynucleotide of the invention are fully disclosed in Table 4 and Table 6. The complete description of the transcription factor gene family and identified conserved domains of the polypeptide encoded by the polynucleotide is fully disclosed in Table 5.

Example I

Full Length Gene Identification and Cloning

[0358] Putative transcription factor sequences (genomic or ESTs) related to known transcription factors were identified in the Arabidopsis thaliana GenBank database using the tblastn sequence analysis program using default parameters and a P-value cutoff threshold of -4 or -5 or lower, depending on the length of the query sequence. Putative transcription factor sequence hits were then screened to identify those containing particular sequence strings. If the sequence hits contained such sequence strings, the sequences were confirmed as transcription factors.

[0359] Alternatively, Arabidopsis thaliana cDNA libraries derived from different tissues or treatments, or genomic libraries, were screened to identify novel members of a transcription family using a low stringency hybridization approach. Probes were synthesized using gene specific primers in a standard PCR reaction (annealing temperature 60° C.) and labeled with 32P-dCTP using the High Prime DNA Labeling Kit (Boehringer Mannheim Corp., now Roche Diagnostics Corp., Indianapolis, Ind.). Purified radiolabelled probes were added to filters immersed in Church hybridization medium (0.5 M NaPO4 pH 7.0, 7% SDS, 1% w/v bovine serum albumin) and hybridized overnight at 60° C. with shaking. Filters were washed two times for 45 to 60 minutes with 1x SSC, 1% SDS at 60° C.

[0360] To identify additional sequence 5' or 3' of a partial cDNA sequence in a cDNA library, 5' and 3' rapid amplification of cDNA ends (RACE) was performed using the MARATHON cDNA amplification kit (Clontech, Palo Alto, Calif.). Generally, the method entailed first isolating poly(A) mRNA, performing first and second strand cDNA synthesis to generate double stranded cDNA, blunting cDNA ends, followed by ligation of the MARATHON Adaptor to the cDNA to form a library of adaptor-ligated ds cDNA.

[0361] Gene-specific primers were designed to be used along with adaptor specific primers for both 5' and 3' RACE reactions. Nested primers, rather than single primers, were used to increase PCR specificity. Using 5' and 3' RACE reactions, 5' and 3' RACE fragments were obtained, sequenced and cloned. The process can be repeated until the 5' and 3' ends of the full-length gene are identified. Then the full-length cDNA was generated by end-to-end PCR using primers specific to the 5' and 3' ends of the gene.

Example II

Construction of Expression Vectors

[0362] The sequence was amplified from a genomic or cDNA library using primers specific to sequences upstream and downstream of the coding region. The expression vector was pMEN20 or pMEN65, which are both derived from pMON316 (Sanders et al. (1987) Nucleic Acids Res. 15:1543-1558) and contain the CaMV 35S promoter to express transgenes. To clone the sequence into the vector, both pMEN20 and the amplified DNA fragment were digested separately with SalI and NotI restriction enzymes at 37° C. for 2 hours. The digestion products were subject to electrophoresis in a 0.8% agarose gel and visualized by ethidium bromide staining. The DNA fragments containing the sequence and the linearized plasmid were excised and purified by using a QIAQUICK gel extraction kit (Qiagen, Valencia Calif.). The fragments of interest were ligated at a ratio of 3:1 (vector to insert). Ligation reactions using T4 DNA ligase (New England Biolabs, Beverly Mass.) were carried out at 16° C. for 16 hours. The ligated DNAs were transformed into competent cells of the E. coli strain DH5alpha by using the heat shock method. The transformations were plated on LB plates containing 50 mg/l kanamycin (Sigma Chemical Co. St. Louis Mo.). Individual colonies were grown overnight in five milliliters of LB broth containing 50 mg/l kanamycin at 37° C. Plasmid DNA was purified by using Qiaquick Mini Prep kits (Qiagen).

Example III

Transformation of Agrobacterium with the Expression Vector

[0363] After the plasmid vector containing the gene was constructed, the vector was used to transform Agrobacterium tumefaciens cells expressing the gene products. The stock of Agrobacterium tumefaciens cells for transformation was

made as described by Nagel et al. (1990) FEMS Microbiol Letts. 67: 325-328. Agrobacterium strain ABI was grown in 250 ml LB medium (Sigma) overnight at 28° C. with shaking until an absorbance over 1 cm at 600 nm (A600) of 0.5-1.0 was reached. Cells were harvested by centrifugation at 4,000xg for 15 min at 4° C. Cells were then resuspended in 250 µl chilled buffer (1 mM HEPES, pH adjusted to 7.0 with KOH). Cells were centrifuged again as described above and resuspended in 125 ml chilled buffer. Cells were then centrifuged and resuspended two more times in the same HEPES buffer as described above at a volume of 100 ml and 750 µl, respectively. Resuspended cells were then distributed into 40 µl aliquots, quickly frozen in liquid nitrogen, and stored at -80° C.

[0364] Agrobacterium cells were transformed with plasmids prepared as described above following the protocol described by Nagel et al. (supra). For each DNA construct to be transformed, 50-100 ng DNA (generally resuspended in 10 mM Tris-HCl, 1 mM EDTA, pH 8.0) was mixed with 40 µl of Agrobacterium cells. The DNA/cell mixture was then transferred to a chilled cuvette with a 2 mm electrode gap and subject to a 2.5 kV charge dissipated at 25 µF and 200 µF using a Gene Pulser II apparatus (Bio-Rad, Hercules, Calif.). After electroporation, cells were immediately resuspended in 1.0 ml LB and allowed to recover without antibiotic selection for 2-4 hours at 28° C. in a shaking incubator. After recovery, cells were plated onto selective medium of LB broth containing 100 µg/ml spectinomycin (Sigma) and incubated for 24-48 hours at 28° C. Single colonies were then picked and inoculated in fresh medium. The presence of the plasmid construct was verified by PCR amplification and sequence analysis.

Example IV

Transformation of Arabidopsis Plants with Agrobacterium tumefaciens with Expression Vector

[0365] After transformation of Agrobacterium tumefaciens with plasmid vectors containing the gene, single Agrobacterium colonies were identified, propagated, and used to transform Arabidopsis plants. Briefly, 500 ml cultures of LB medium containing 50 mg/l kanamycin were inoculated with the colonies and grown at 28° C. with shaking for 2 days until an optical absorbance at 600 nm wavelength over 1 cm (A600) of >2.0 was reached. Cells were then harvested by centrifugation at 4,000xg for 10 min, and resuspended in infiltration medium (1/2x Murashige and Skoog salts (Sigma), 1x Gamborg's B-5 vitamins (Sigma), 5.0% (w/v) sucrose (Sigma), 0.044 µM benzylamino purine (Sigma), 200 µl/l Silwet L-77 (Lehle Seeds)) until an A600 of 0.8 was reached.

[0366] Prior to transformation, Arabidopsis thaliana seeds (ecotype Columbia) were sown at a density of ~10 plants per 4" pot onto Pro-Mix BX potting medium (Hummert International) covered with fiberglass mesh (18 mm x 16 mm). Plants were grown under continuous illumination (50-75 µE/m2/sec) at 22-23° C. with 65-70% relative humidity. After about 4 weeks, primary inflorescence stems (bolts) were cut off to encourage growth of multiple secondary bolts. After flowering of the mature secondary bolts, plants were prepared for transformation by removal of all siliques and opened flowers.

[0367] The pots were then immersed upside down in the mixture of Agrobacterium infiltration medium as described above for 30 sec, and placed on their sides to allow draining into a 1'x2' flat surface covered with plastic wrap. After 24 h, the plastic wrap was removed and pots were turned upright. The immersion procedure was repeated one week later, for a total of two immersions per pot. Seeds were then collected from each transformation pot and analyzed following the protocol described below.

Example V

Identification of Arabidopsis Primary Transformants

[0368] Seeds collected from the transformation pots were sterilized essentially as follows. Seeds were dispersed into a solution containing 0.1% (v/v) Triton X-100 (Sigma) and sterile water and washed by shaking the suspension for 20 min. The wash solution was then drained and replaced with fresh wash solution to wash the seeds for 20 min with shaking. After removal of the ethanol/detergent solution, a solution containing 0.1% (v/v) Triton X-100 and 30% (v/v) bleach (CLOROX; Clorox Corp. Oakland Calif.) was added to the seeds, and the suspension was shaken for 10 min. After removal of the bleach/detergent solution, seeds were then washed five times in sterile distilled water. The seeds were stored in the last wash water at 4° C. for 2 days in the dark before being plated onto antibiotic selection medium (1x Murashige and Skoog salts (pH adjusted to 5.7 with 1M KOH), 1x Gamborg's B-5 vitamins, 0.9% phytagar (Life Technologies), and 50 mg/l kanamycin). Seeds were germinated under continuous illumination (50-75 µE/m2/sec) at 22-23° C. After 7-10 days of growth under these conditions, kanamycin resistant primary transformants (T1 generation) were visible and obtained. These seedlings were transferred first to fresh selection plates where the seedlings continued to grow for 3-5 more days, and then to soil (Pro-Mix BX potting medium).

[0369] Primary transformants were crossed and progeny seeds (T2) collected; kanamycin resistant seedlings were selected and analyzed. The expression levels of the recombinant polynucleotides in the transformants vary from about a 5% expression level increase to at least a 100% expression level increase. Similar observations are made with respect to polypeptide level expression.

Example VI

Identification of Arabidopsis Plants with Transcription Factor Gene Knockouts

[0370] The screening of insertion mutagenized Arabidopsis collections for null mutants in a known target gene was essentially as described in Krysan et al. (1999) Plant Cell 11: 2283-2290. Briefly, gene-specific primers, nested by 5-250 base pairs to each other, were designed from the 5' and 3' regions of a known target gene. Similarly, nested sets of primers were also created specific to each of the T-DNA or transposon ends (the "right" and "left" borders). All possible combinations of gene specific and T-DNA/transposon primers were used to detect by PCR an insertion event within or close to the target gene. The amplified DNA fragments were then sequenced, which allows the precise determination of the T-DNA/transposon insertion point relative to the target gene. Insertion events within the coding or intervening sequence of the genes were deconvoluted from a pool comprising a plurality of insertion events to a single unique mutant plant for

functional characterization. The method is described in more detail in Yu and Adam, U.S. application Ser. No. 09/177,733 filed Oct. 23, 1998.

Example VII

Identification of Modified Phenotypes in Overexpression or Gene Knockout Plants

[0371] Experiments were performed to identify those transformants or knockouts that exhibited modified biochemical characteristics. Among the biochemicals that were assayed were insoluble sugars, such as arabinose, fucose, galactose, mannose, rhamnose or xylose or the like; prenyl lipids, such as lutein, beta-carotene, xanthophyll-1, xanthophyll-2, chlorophylls A or B, or alpha-, delta- or gamma-tocopherol or the like; fatty acids, such as 16:0 (palmitic acid), 16:1 (palmitoleic acid), 18:0 (stearic acid), 18:1 (oleic acid), 18:2 (linoleic acid), 20:0, 18:3 (linolenic acid), 20:1 (eicosenoic acid), 20:2, 22:1 (erucic acid) or the like; waxes, such as by altering the levels of C29, C31, or C33 alkanes; sterols, such as brassicasterol, campesterol, stigmasterol, sitosterol or stigmastanol or the like; glucosinolates; and protein or oil levels.

[0372] Fatty acids were measured using two methods depending on whether the tissue was from leaves or seeds. For leaves, lipids were extracted and esterified with hot methanolic H2SO4 and partitioned into hexane from methanolic brine. For seed fatty acids, seeds were pulverized and extracted in methanol:heptane:toluene:2,2-dimethoxypropane:H2SO4 (39:34:20:5:2) for 90 minutes at 80° C. After cooling to room temperature the upper phase, containing the seed fatty acid esters, was subjected to GC analysis. Fatty acid esters from both seed and leaf tissues were analyzed with a SUPELCO SP-2330 column (Supelco, Bellefonte, Pa.).

[0373] Glucosinolates were purified from seeds or leaves by first heating the tissue at 95° C. for 10 minutes. Preheated ethanol:water (50:50) was added and, after heating at 95° C. for a further 10 minutes, the extraction solvent was applied to a DEAE SEPHADEX column (Pharmacia) which had been previously equilibrated with 0.5 M pyridine acetate. Desulfoglucosinolates were eluted with 300 µl water and analyzed by reverse phase HPLC monitoring at 226 nm.

[0374] For wax alkanes, samples were extracted using an identical method as fatty acids and extracts were analyzed on a HP 5890 GC coupled with a 5973 MSD. Samples were chromatographically isolated on a J&W DB35 mass spectrometer (J&W Scientific Agilent Technologies, Folsom, Calif.).

[0375] To measure prenyl lipid levels, seeds or leaves were pulverized with 1 to 2% pyrogallol as an antioxidant. For seeds, extracted samples were filtered and a portion removed

ride in hexane and, after centrifugation, the two upper phases were combined and evaporated. 2% methylene chloride in hexane was added to the tubes and the samples were then extracted with one ml of water. The upper phase was removed, dried, and resuspended in 400 µl of 2% methylene chloride in hexane and analyzed by gas chromatography using a 50 m DB-5 ms (0.25 mm ID, 0.25 µm phase, J&W Scientific).

[0376] Insoluble sugar levels were measured by the method essentially described by Reiter et al. (1999), Plant J. 12: 335-345. This method analyzes the neutral sugar composition of cell wall polymers found in Arabidopsis leaves. Soluble sugars were separated from sugar polymers by extracting leaves with hot 70% ethanol. The remaining residue containing the insoluble polysaccharides was then acid hydrolyzed with allose added as an internal standard. Sugar monomers generated by the hydrolysis were then reduced to the corresponding alditols by treatment with NaBH4, then were acetylated to generate the volatile alditol acetates, which were then analyzed by GC-FID. Identity of the peaks was determined by comparing the retention times of known sugars converted to the corresponding alditol acetates with the retention times of peaks from wild-type plant extracts. Alditol acetates were analyzed on a Supelco SP-2330 capillary column (30 m x 250 µm x 0.2 µm) using a temperature program beginning at 180° C. for 2 minutes followed by an increase to 220° C. in 4 minutes. After holding at 220° C. for 10 minutes, the oven temperature was increased to 240° C. in 2 minutes, held at this temperature for 10 minutes, and brought back to room temperature.

[0377] To identify plants with alterations in total seed oil or protein content, 150 mg of seeds from T2 progeny plants were subjected to analysis by Near Infrared Reflectance Spectroscopy (NIRS) using a Foss NirSystems Model 6500 with a spinning cup transport system. NIRS is a non-destructive analytical method used to determine seed oil and protein composition. Infrared is the region of the electromagnetic spectrum located after the visible region in the direction of longer wavelengths. Near infrared is so named because it is the infrared region nearest to the visible region of the electromagnetic spectrum. For practical purposes, near infrared comprises wavelengths between 800 and 2500 nm. NIRS is applied to organic compounds rich in O-H bonds (such as moisture, carbohydrates, and fats), C-H bonds (such as organic compounds and petroleum derivatives), and N-H bonds (such as proteins and amino acids). The NIRS analytical instruments operate by statistically correlating NIRS signals at several wavelengths with the characteristic or property intended to be measured. All biological substances contain thousands of C-H, O-H, and N-H bonds.
Therefore, the for tocopherol and carotenoid/chlorophyll analysis by HPLC. exposure to near infrared radiation of a biological sample, The remaining material was Saponified for Sterol determina Such as a seed, results in a complex spectrum which contains tion. For leaves, an aliquot was removed and diluted with qualitative and quantitative information about the physical methanol and chlorophyll A, chlorophyll B, and total caro and chemical composition of that sample. tenoids measured by spectrophotometry by determining opti 0378. The numerical value of a specific analyte in the cal absorbance at 665.2 nm, 652.5 nm, and 470 nm. An sample, such as protein content or oil content, is mediated by aliquot was removed for tocopherol and carotenoid/chloro a calibration approach known as chemometrics. Chemomet phyll composition by HPLC using a Waters uBondapak C18 rics applies statistical methods such as multiple linear regres column (4.6 mmx 150 mm). The remaining methanolic solu sion (MLR), partial least squares (PLS), and principle com tion was saponified with 10% KOH at 80° C. for one hour. The ponent analysis (PCA) to the spectral data and correlates samples were cooled and diluted with a mixture of methanol them with a physical property or other factor, that property or and water. A solution of 2% methylene chloride in hexane was factor is directly determined rather than the analyte concen mixed in and the samples were centrifuged. The aqueous tration itself. The method first provides “wet chemistry” data methanol phase was again re-extracted 2% methylene chlo of the samples required to develop the calibration. US 2011/007880.6 A1 Mar. 31, 2011
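The calibration idea in paragraph [0378] can be illustrated with a minimal multiple linear regression sketch. This is not the calibration software used with the instrument; the absorbance values, the two-wavelength design, and every function name below are hypothetical and chosen only to show how wet-chemistry reference values might be regressed on spectral signals and then checked against a separate validation set.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left unchanged.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(spectra: Sequence[Sequence[float]], reference: Sequence[float]) -> List[float]:
    """Return [intercept, b1, b2, ...] from the least-squares normal equations."""
    rows = [[1.0] + list(s) for s in spectra]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, reference)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs: Sequence[float], spectrum: Sequence[float]) -> float:
    """Apply the calibration equation to one spectrum."""
    return coeffs[0] + sum(c * a for c, a in zip(coeffs[1:], spectrum))


def rmse_of_prediction(coeffs, spectra, reference) -> float:
    """Root-mean-square difference between predicted and reference values."""
    residuals = [predict(coeffs, s) - y for s, y in zip(spectra, reference)]
    return (sum(r * r for r in residuals) / len(residuals)) ** 0.5


# Hypothetical calibration set: absorbance at two wavelengths per sample and
# the oil content (%) obtained by wet chemistry for the same samples.
calibration_spectra = [(0.30, 0.20), (0.40, 0.25), (0.50, 0.10), (0.60, 0.40), (0.35, 0.45)]
calibration_oil = [29.0, 35.0, 37.0, 48.0, 36.5]
coefficients = fit_mlr(calibration_spectra, calibration_oil)

# Hypothetical validation set, kept separate from the calibration samples.
validation_spectra = [(0.45, 0.30), (0.55, 0.15)]
validation_oil = [38.0, 41.0]
validation_error = rmse_of_prediction(coefficients, validation_spectra, validation_oil)
```

A real calibration would use many more wavelengths than samples can support with plain MLR, which is one reason PLS or PCA-based regression is commonly preferred for full spectra; the two-wavelength model here is only the simplest instance of the regression step.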

[0379] Calibration of NIRS response was performed using data obtained by wet chemical analysis of a population of Arabidopsis ecotypes that were expected to represent diversity of oil and protein levels.

[0380] The exact oil composition of each ecotype used in the calibration experiment was determined using gravimetric analysis of oils extracted from seed samples (0.5 g or 1.0 g) by the accelerated solvent extraction method (ASE; Dionex Corp., Sunnyvale, Calif.). The extraction method was validated against certified canola samples (Community Bureau of Reference, Belgium). Seed samples from each ecotype (0.5 g or 1 g) were subjected to accelerated solvent extraction and the resulting extracted oil weights compared to the weight of oil recovered from canola seed that has been certified for oil content (Community Bureau of Reference). The oil calibration equation was based on 57 samples with a range of oil contents from 27.0% to 50.8%. To check the validity of the calibration curve, an additional set of samples was extracted by ASE and predicted using the oil calibration equation. This validation set counted 46 samples, ranging from 27.9% to 47.5% oil, and had a predicted standard error of performance of 0.63%. The wet chemical method for protein was elemental analysis (% N × 6.0) using the average of 3 representative samples of 5 mg each validated against certified ground corn (NIST). The instrumentation was an Elementar Vario-EL III elemental analyzer operated in CNS operating mode (Elementar Analysensysteme GmbH, Hanau, Germany).

[0381] The protein calibration equation was based on a library of 63 samples with a range of protein contents from 17.4% to 31.2%. An additional set of samples was analyzed for protein by elemental analysis (n=57) and scanned by NIRS in order to validate the protein prediction equation. The protein range of the validation set was from 16.8% to 31.2% and the standard error of prediction was 0.468%.

[0382] NIRS analysis of Arabidopsis seed was carried out on between 40-300 mg experimental sample. The oil and protein contents were predicted using the respective calibration equations.

[0383] Data obtained from NIRS analysis was analyzed statistically using a nearest-neighbor (N-N) analysis. The N-N analysis allows removal of within-block spatial variability in a fairly flexible fashion, which does not require prior knowledge of the pattern of variability in the chamber. Ideally, all hybrids are grown under identical experimental conditions within a block (rep). In reality, even in many block designs, significant within-block variability exists. Nearest-neighbor procedures are based on the assumption that the environmental effect of a plot is closely related to that of its neighbors. Nearest-neighbor methods use information from adjacent plots to adjust for within-block heterogeneity and so provide more precise estimates of treatment means and differences. If there is within-plot heterogeneity on a spatial scale that is larger than a single plot and smaller than the entire block, then yields from adjacent plots will be positively correlated. Information from neighboring plots can be used to reduce or remove the unwanted effect of the spatial heterogeneity, and hence improve the estimate of the treatment effect. Data from neighboring plots can also be used to reduce the influence of competition between adjacent plots. The Papadakis N-N analysis can be used with designs to remove within-block variability that would not be removed with the standard split plot analysis (Papadakis (1973) Inst. d'Amelior. Plantes Thessaloniki (Greece) Bull. Scientif. No. 23; Papadakis (1984) Proc. Acad. Athens 59: 326-342).

[0384] Experiments were performed to identify those transformants or knockouts that exhibited modified sugar sensing. For such studies, seeds from transformants were germinated on media containing 5% glucose or 9.4% sucrose, which normally partially restrict hypocotyl elongation. Plants with altered sugar sensing may have either longer or shorter hypocotyls than normal plants when grown on this media. Additionally, other plant traits may be varied, such as root mass.

[0385] Experiments may be performed to identify those transformants or knockouts that exhibited an improved pathogen tolerance. For such studies, the transformants are exposed to biotrophic fungal pathogens, such as Erysiphe orontii, and necrotrophic fungal pathogens, such as Fusarium oxysporum. Fusarium oxysporum isolates cause vascular wilts and damping off of various annual vegetables, perennials and weeds (Mauch-Mani and Slusarenko (1994) Molec Plant-Microbe Interact. 7: 378-383). For Fusarium oxysporum experiments, plants are grown on Petri dishes and sprayed with a fresh spore suspension of F. oxysporum. The spore suspension is prepared as follows: A plug of fungal hyphae from a plate culture is placed on a fresh potato dextrose agar plate and allowed to spread for one week. Five ml sterile water is then added to the plate, swirled, and pipetted into 50 ml Armstrong Fusarium medium. Spores are grown overnight in Fusarium medium and then sprayed onto plants using a Preval paint sprayer. Plant tissue is harvested and frozen in liquid nitrogen 48 hours post-infection.

[0386] Erysiphe orontii is a causal agent of powdery mildew. For Erysiphe orontii experiments, plants are grown approximately 4 weeks in a greenhouse under 12 hour light (20° C., ~30% relative humidity (rh)). Individual leaves are infected with E. orontii spores from infected plants using a camel's hair brush, and the plants are transferred to a Percival growth chamber (20° C., 80% rh). Plant tissue is harvested and frozen in liquid nitrogen 7 days post-infection.

[0387] Botrytis cinerea is a necrotrophic pathogen. Botrytis cinerea is grown on potato dextrose agar under 12 hour light (20° C., ~30% relative humidity (rh)). A spore culture is made by spreading 10 ml of sterile water on the fungus plate, swirling and transferring spores to 10 ml of sterile water. The spore inoculum (approx. 10^5 spores/ml) is then used to spray 10 day-old seedlings grown under sterile conditions on MS (minus sucrose) media. Symptoms are evaluated every day up to approximately 1 week.

[0388] Sclerotinia sclerotiorum hyphal cultures are grown in potato dextrose broth. One gram of hyphae is ground, filtered, spun down and resuspended in sterile water. A 1:10 dilution is used to spray 10 day-old seedlings grown aseptically under a 12 hour light/dark regime on MS (minus sucrose) media. Symptoms are evaluated every day up to approximately 1 week.

[0389] Pseudomonas syringae pv maculicola (Psm) strain 4326 and pv maculicola strain 4326 was inoculated by hand at two doses. Two inoculation doses allow the differentiation between plants with enhanced susceptibility and plants with enhanced resistance to the pathogen. Plants are grown for 3 weeks in the greenhouse, then transferred to the growth chamber for the remainder of their growth. Psm ES4326 may be hand inoculated with a 1 ml syringe on 3 fully-expanded leaves per plant (4½ wk old), using at least 9 plants per overexpressing line at two inoculation doses, OD=0.005 and OD=0.0005. Disease scoring is performed at day 3 post inoculation with pictures of the plants and leaves taken in parallel.
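The nearest-neighbor adjustment described for the NIRS data can be sketched as a single-pass, one-dimensional Papadakis-style covariance adjustment. This is only an illustration under stated assumptions: plots are assumed to lie in one row, the covariate is the mean residual of the immediately adjacent plots, and the function name and example layout are hypothetical. It is not the analysis code used for the experiments.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def papadakis_adjusted_means(treatments: Sequence[str],
                             values: Sequence[float]) -> Dict[str, float]:
    """Adjust treatment means using residuals of neighboring plots in a row."""
    n = len(values)

    # Step 1: raw treatment means and the residual of every plot.
    groups: Dict[str, List[float]] = defaultdict(list)
    for t, v in zip(treatments, values):
        groups[t].append(v)
    raw_mean = {t: sum(v) / len(v) for t, v in groups.items()}
    residual = [v - raw_mean[t] for t, v in zip(treatments, values)]

    # Step 2: covariate = mean residual of the adjacent plots (one neighbor
    # at either end of the row, two neighbors elsewhere).
    covariate = []
    for i in range(n):
        neighbors = [residual[j] for j in (i - 1, i + 1) if 0 <= j < n]
        covariate.append(sum(neighbors) / len(neighbors))

    # Step 3: pooled within-treatment regression slope of value on covariate.
    cov_groups: Dict[str, List[float]] = defaultdict(list)
    for t, x in zip(treatments, covariate):
        cov_groups[t].append(x)
    cov_mean = {t: sum(x) / len(x) for t, x in cov_groups.items()}
    sxy = sum((x - cov_mean[t]) * (v - raw_mean[t])
              for t, x, v in zip(treatments, covariate, values))
    sxx = sum((x - cov_mean[t]) ** 2 for t, x in zip(treatments, covariate))
    slope = sxy / sxx if sxx > 0 else 0.0

    # Step 4: shift each treatment mean to the overall covariate mean.
    grand_cov = sum(covariate) / n
    return {t: raw_mean[t] - slope * (cov_mean[t] - grand_cov) for t in raw_mean}


# Hypothetical row of six plots with a linear spatial trend (0..5) added to
# true treatment effects A = 0 and B = 2.
layout = ["A", "B", "A", "B", "A", "B"]
observed = [0.0, 3.0, 2.0, 5.0, 4.0, 7.0]
adjusted = papadakis_adjusted_means(layout, observed)
```

In this toy layout the raw difference between B and A is 3 while the true difference is 2; the adjusted difference moves closer to the true value, which is the behavior the text attributes to using neighboring-plot information. Published Papadakis analyses may iterate the adjustment or use two-dimensional neighbors, which this sketch does not attempt.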

[0390] In some instances, expression patterns of the pathogen-induced genes (such as defense genes) may be monitored by microarray experiments. In these experiments, cDNAs are generated by PCR and resuspended at a final concentration of ~100 ng/µl in 3×SSC or 150 mM Na-phosphate (Eisen and Brown (1999) Methods Enzymol. 303: 179-205). The cDNAs are spotted on microscope glass slides coated with polylysine. The prepared cDNAs are aliquoted into 384 well plates and spotted on the slides using, for example, an x-y-z gantry (OmniGrid) which may be purchased from GeneMachines (Menlo Park, Calif.) outfitted with quill type pins which may be purchased from Telechem International (Sunnyvale, Calif.). After spotting, the arrays are cured for a minimum of one week at room temperature, rehydrated and blocked following the protocol recommended by Eisen and Brown (1999; supra).

[0391] Total RNA samples (10 µg) are labeled using fluorescent Cy3 and Cy5 dyes. Labeled samples are resuspended in 4×SSC/0.03% SDS/4 µg salmon sperm DNA/2 µg tRNA/50 mM Na-pyrophosphate, heated at 95° C. for 2.5 minutes, spun down and placed on the array. The array is then covered with a glass coverslip and placed in a sealed chamber. The chamber is then kept in a water bath at 62° C. overnight. The arrays are washed as described in Eisen and Brown (1999, supra) and scanned on a General Scanning 3000 laser scanner. The resulting files are subsequently quantified using IMAGENE software (BioDiscovery, Los Angeles, Calif.).

[0392] RT-PCR experiments may be performed to identify those genes induced after exposure to biotrophic fungal pathogens, such as Erysiphe orontii, necrotrophic fungal pathogens, such as Fusarium oxysporum, bacteria, viruses and salicylic acid, the latter being involved in a nonspecific resistance response in Arabidopsis thaliana. Generally, the gene expression patterns from ground plant leaf tissue are examined.

[0393] Reverse transcriptase PCR was conducted using gene specific primers within the coding region for each sequence identified. The primers were designed near the 3' region of each DNA binding sequence initially identified.

[0394] Total RNA from these ground leaf tissues was isolated using the CTAB extraction protocol. Once extracted, total RNA was normalized in concentration across all the tissue types to ensure that the PCR reaction for each tissue received the same amount of cDNA template using the 28S band as reference. Poly(A+) RNA was purified using a modified protocol from the Qiagen OLIGOTEX purification kit batch protocol. cDNA was synthesized using standard protocols. After the first strand cDNA synthesis, primers for Actin 2 were used to normalize the concentration of cDNA across the tissue types. Actin 2 is found to be constitutively expressed in fairly equal levels across the tissue types we are investigating.

[0395] For RT PCR, cDNA template was mixed with corresponding primers and Taq DNA polymerase. Each reaction consisted of 0.2 µl cDNA template, 2 µl 10x Tricine buffer, 2 µl 10x Tricine buffer and 16.8 µl water, 0.05 ml Primer 1, 0.05 µl Primer 2, 0.3 µl Taq DNA polymerase and 8.6 µl water.

[0396] The 96 well plate is covered with microfilm and set in the thermocycler to start the reaction cycle. By way of illustration, the reaction cycle may comprise the following steps:

[0397] Step 1: 93° C. for 3 min;

[0398] Step 2: 93° C. for 30 sec;

[0399] Step 3: 65° C. for 1 min;

[0400] Step 4: 72° C. for 2 min;

[0401] Steps 2, 3 and 4 are repeated for 28 cycles;

[0402] Step 5: 72° C. for 5 min; and

[0403] Step 6: 4° C.

[0404] To amplify more products, for example, to identify genes that have very low expression, additional steps may be performed. The following illustrates a method that may be used in this regard. The PCR plate is placed back in the thermocycler for 8 more cycles of steps 2-4.

[0405] Step 2: 93° C. for 30 sec;

[0406] Step 3: 65° C. for 1 min;

[0407] Step 4: 72° C. for 2 min, repeated for 8 cycles; and

[0408] Step 5: 4° C.

[0409] Eight microliters of PCR product and 1.5 µl of loading dye are loaded on a 1.2% agarose gel for analysis after 28 cycles and 36 cycles. Expression levels of specific transcripts are considered low if they were only detectable after 36 cycles of PCR. Expression levels are considered medium or high depending on the levels of transcript compared with observed transcript levels for an internal control such as actin2. Transcript levels are determined in repeat experiments and compared to transcript levels in control (e.g., non-transformed) plants.

[0410] Experiments were performed to identify those transformants or knockouts that exhibited an improved environmental stress tolerance. For such studies, the transformants were exposed to a variety of environmental stresses. Plants were exposed to chilling stress (6 hour exposure to 4-8° C.), heat stress (6 hour exposure to 32-37° C.), high salt stress (6 hour exposure to 200 mM NaCl), drought stress (168 hours after removing water from trays), osmotic stress (6 hour exposure to 3 M mannitol), or nutrient limitation (nitrogen, phosphate, and potassium) (nitrogen: all components of MS medium remained constant except N was reduced to 20 mg/l of NH4NO3; phosphate: all components of MS medium except KH2PO4, which was replaced by K2SO4; potassium: all components of MS medium except removal of KNO3 and KH2PO4, which were replaced by NaHPO).

[0411] Experiments were performed to identify those transformants or knockouts that exhibited modified structure and development characteristics. For such studies, the transformants were observed by eye to identify novel structural or developmental characteristics associated with the ectopic expression of the polynucleotides or polypeptides of the invention.

[0412] Flowering time was measured by the number of rosette leaves present when a visible inflorescence of approximately 3 cm is apparent. Rosette and total leaf number on the progeny stem are tightly correlated with the timing of flowering (Koornneef et al. (1991) Mol. Gen. Genet. 229: 57-66). The vernalization response was also measured. For vernalization treatments, seeds were sown to MS agar plates, sealed with micropore tape, and placed in a 4° C. cold room with low light levels for 6-8 weeks. The plates were then transferred to the growth rooms alongside plates containing freshly sown non-vernalized controls. Rosette leaves were counted when a visible inflorescence of approximately 3 cm was apparent.

[0413] Modified phenotypes observed for particular overexpressor or knockout plants are provided in Table 4. For a particular overexpressor that shows a less beneficial characteristic, it may be more useful to select a plant with a decreased expression of the particular transcription factor. For a particular knockout that shows a less beneficial characteristic, it may be more useful to select a plant with an increased expression of the particular transcription factor.
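The illustrative RT-PCR cycling program in paragraphs [0396] to [0408] and the 28-cycle versus 36-cycle expression call in paragraph [0409] can be written down as a small sketch. The data structure, function names, and the "not detected" label are assumptions added here for illustration; the sketch counts programmed hold times only and ignores ramp times and the final 4° C. hold, which has no stated duration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    temperature_c: float
    seconds: int


INITIAL_DENATURATION = Step(93, 180)        # Step 1: 93 C for 3 min
CYCLE: List[Step] = [
    Step(93, 30),                           # Step 2: 93 C for 30 sec
    Step(65, 60),                           # Step 3: 65 C for 1 min
    Step(72, 120),                          # Step 4: 72 C for 2 min
]
FINAL_EXTENSION = Step(72, 300)             # Step 5: 72 C for 5 min


def program_seconds(cycles: int,
                    include_initial: bool = True,
                    include_final: bool = True) -> int:
    """Sum of programmed hold times, excluding ramps and the 4 C hold."""
    total = cycles * sum(step.seconds for step in CYCLE)
    if include_initial:
        total += INITIAL_DENATURATION.seconds
    if include_final:
        total += FINAL_EXTENSION.seconds
    return total


def classify_expression(detected_after_28: bool, detected_after_36: bool) -> str:
    """Coarse call following the gel-readout rule described in the text.

    'medium or high' still has to be resolved by comparison with an internal
    control such as actin2; 'not detected' is a label assumed here for the
    case the text does not name.
    """
    if detected_after_28:
        return "medium or high"
    if detected_after_36:
        return "low"
    return "not detected"


first_run_minutes = program_seconds(28) / 60
extra_run_minutes = program_seconds(8, include_initial=False, include_final=False) / 60
```

Under these assumptions the first run holds for 106 minutes and the additional 8 cycles add 28 minutes, which gives a rough lower bound for scheduling the two gel time points.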

[0414] The sequences of the Sequence Listing or those in Tables 4-9, or those disclosed here, can be used to prepare transgenic plants and plants with altered traits. The specific transgenic plants listed below are produced from the sequences of the Sequence Listing, as noted. Tables 4 and 6 provide exemplary polynucleotide and polypeptide sequences of the invention.

Example VIII

Examples of Genes that Confer Significant Improvements to Plants

[0415] Examples of genes and homologs that confer significant improvements to knockout or overexpressing plants are noted below. Experimental observations made by us with regard to specific genes whose expression has been modified in overexpressing or knock-out plants, and potential applications based on these observations, are also presented.

[0416] This example provides experimental evidence for increased biomass and abiotic stress tolerance controlled by the transcription factor polypeptides and polypeptides of the invention.

[0417] Salt stress assays are intended to find genes that confer better germination, seedling vigor or growth in high salt. Evaporation from the soil surface causes upward water movement and salt accumulation in the upper soil layer where the seeds are placed. Thus, germination normally takes place at a salt concentration much higher than the mean salt concentration in the whole soil profile. Plants differ in their tolerance to NaCl depending on their stage of development; therefore seed germination, seedling vigor, and plant growth responses are evaluated.

[0418] Osmotic stress assays (including NaCl and mannitol assays) are intended to determine if an osmotic stress phenotype is NaCl-specific or if it is a general osmotic stress related phenotype. Plants tolerant to osmotic stress could also have more tolerance to drought and/or freezing.

[0419] Drought assays are intended to find genes that mediate better plant survival after short-term, severe water deprivation. Ion leakage will be measured if needed. Osmotic stress tolerance would also support a drought tolerant phenotype.

[0420] Temperature stress assays are intended to find genes that confer better germination, seedling vigor or plant growth under temperature stress (cold, freezing and heat).

[0421] Sugar sensing assays are intended to find genes involved in sugar sensing by germinating seeds on high concentrations of sucrose and glucose and looking for degrees of hypocotyl elongation. The germination assay on mannitol controls for responses related to osmotic stress. Sugars are key regulatory molecules that affect diverse processes in higher plants including germination, growth, flowering, senescence, sugar metabolism and photosynthesis. Sucrose is the major transport form of photosynthate and its flux through cells has been shown to affect gene expression and alter storage compound accumulation in seeds (source-sink relationships). Glucose-specific hexose-sensing has also been described in plants and is implicated in cell division and repression of "famine" genes (photosynthetic or glyoxylate cycles).

[0422] Germination assays followed modifications of the same basic protocol. Sterile seeds were sown on the conditional media listed below. Plates were incubated at 22° C. under 24-hour light (120-130 µEin/m²/s) in a growth chamber. Evaluation of germination and seedling vigor was conducted 3 to 15 days after planting. The basal media was 80% Murashige-Skoog medium (MS)+vitamins.

[0423] For salt and osmotic stress germination experiments, the medium was supplemented with 150 mM NaCl or 300 mM mannitol. Growth regulator sensitivity assays were performed in MS media, vitamins, and either 0.3 µM ABA, 9.4% sucrose, or 5% glucose.

[0424] Temperature stress cold germination experiments were carried out at 8° C. Heat stress germination experiments were conducted at 32° C. to 37° C. for 6 hours of exposure.

[0425] For stress experiments conducted with more mature plants, seeds were germinated and grown for seven days on MS+vitamins+1% sucrose at 22° C. and then transferred to chilling and heat stress conditions. The plants were either exposed to chilling stress (6 hour exposure to 4-8° C.), or heat stress (32° C. was applied for five days, after which the plants were transferred back to 22° C. for recovery and evaluated after 5 days relative to controls not exposed to the depressed or elevated temperature).

Results:

G30 (SEQ ID NO: 7)

Published Information

[0426] G30 (At1g04370) is part of the BAC clone F19P19, GenBank accession number AC000104 (nid=2341023).

Experimental Observations

[0427] Initial experiments were performed with G30 knockout mutant plants. However, these experiments did not uncover the functions of the gene.

[0428] In order to characterize the gene further, 35S::G30 overexpressing lines were generated. Morphological analysis of the transgenic plants indicated that G30 could be involved in light regulation: the seedlings had long hypocotyls and elongated cotyledon petioles. In addition, some of the seedlings also had longer roots compared to control plants. At later stages, the plants became darker green, and had glossy leaves, perhaps indicating elevated levels of epidermal wax. The phenotype for G30 overexpression resembled those produced by related AP2 genes.

Utilities

[0429] Based on the appearance of 35S::G30 leaves, the gene could be used to engineer changes in the composition and amount of leaf surface components (most likely wax). The ability to manipulate wax composition, amount, or distribution could modify plant tolerance to drought and low humidity, or resistance to insects or pathogens. Additionally, in some species, wax is a valuable commodity and altering its accumulation and/or composition could enhance yield.

[0430] The phenotypes of 35S::G30 seedlings indicate that the gene may also be used to manipulate light-regulated developmental processes like shade avoidance. Eliminating shading responses might allow increased planting densities with subsequent yield enhancement.

[0431] Additionally, if the dark coloration of 35S::G30 lines reflects an increase in biochemical composition, the gene might be used to improve the nutraceutical value of foodstuffs, or increase photosynthetic capacity to improve yield.

G47 (SEQ ID NO: 11)

Published Information

[0432] G47 corresponds to gene T22J18.2 (AAC25505). No information is available about the function(s) of G47.

Experimental Observations

[0433] The function of G47 was studied using transgenic plants in which the gene was expressed under the control of the 35S promoter. Overexpression of G47 resulted in a variety of morphological and physiological phenotypic alterations.

[0434] 35S::G47 plants showed enhanced tolerance to osmotic stress. In a root growth assay on PEG containing media, G47 overexpressing transgenic seedlings were larger and had more root growth compared to the wild-type controls (FIG. 3A). Interestingly, G47 expression levels might be altered by environmental conditions, in particular reduced by salt and osmotic stresses. In addition to the phenotype observed in the osmotic stress assay, germination efficiency for the seeds from G47 overexpressors was low.

[0435] 35S::G47 plants were also significantly larger and greener in a soil-based drought assay than wild-type control plants.

[0436] Overexpression of G47 also produced a substantial delay in flowering time and caused a marked change in shoot architecture. 35S::G47 transformants were small at early stages and switched to flowering more than a week later than wild-type controls (continuous light conditions). Interestingly, the inflorescences from these plants appeared thick and fleshy, had reduced apical dominance, and exhibited reduced internode elongation leading to a short compact stature (FIG. 3B). The branching pattern of the stems also appeared abnormal, with the primary shoot becoming kinked at each coflorescence node. Additionally, the plants showed slightly reduced fertility and formed rather small siliques that were borne on short pedicels and held vertically, close against the stem.

[0437] Additional alterations were detected in the inflorescence stems of 35S::G47 plants. Stem sections from T2-21 and T2-24 plants were of wider diameter, and had large irregular vascular bundles containing a much greater number of xylem vessels than wild type. Furthermore, some of the xylem vessels within the bundles appeared narrow and were possibly more lignified than were those of controls.

[0438] G47 was expressed at higher levels in rosette leaves, and transcripts can be detected in other tissues (flower, embryo, silique, and germinating seedling), but apparently not in roots.

Utilities

[0439] G47 or its equivalogs could potentially be used to manipulate flowering time, to modify plant architecture and stem structure, including development of vascular tissues and lignin content, and to improve plant performance under drought and osmotic stress conditions.

[0440] The use of G47 or its equivalogs from tree species could offer the potential for modulating lignin content. This might allow the quality of wood used for furniture or construction to be improved.

G148 (SEQ ID NO: 39)

Published Information

[0441] G148 corresponds to AGAMOUS-LIKE 13 (AGL13), and was originally identified based on its conserved MADS domain (Purugganan et al. (1995) Genetics 140: 345-356; Rounsley et al. (1995) Plant Cell 7: 1259-1269). No functional information about G148 is available in the public domain. However, its expression pattern indicated that the gene has a role in ovule development; AGL13 transcript was present in ovules at the time of integument development, but fell following fertilization. Additionally, lower levels of expression were found in anther filaments and style tissue (Rounsley et al. (1995) supra).

Experimental Observations

[0442] Homozygotes were analyzed for a transposon insertion (SLAT collection) within G148; these plants showed no obvious macroscopic changes in morphology and exhibited a similar response to wild type in all of the physiological assays performed.

[0443] The effects of G148 overexpression were studied by generating transgenic lines in which a G148 genomic clone was expressed from the 35S CaMV promoter. 35S::G148 transformants displayed a range of morphological changes including a severe reduction in overall plant size, leaf curling, accelerated flowering, and terminal flower formation. Such changes indicate that G148 influences the genetic networks controlling various aspects of development including flowering time and meristem determinacy.

Utilities

[0444] The morphological changes seen in the overexpression lines demonstrate that G148 could be used to manipulate various aspects of plant development.

[0445] The appearance of terminal flowers in 35S::G148 transformants indicated that the gene or its orthologs can modify inflorescence architecture and confer a determinate habit in species where the shoots otherwise show an indeterminate growth pattern. Such changes completely alter the overall plant form, and may, for example, facilitate mechanical harvesting (as already exemplified by the SELF-PRUNING gene, which controls shoot determinacy in tomato, Pnueli L et al. (1998) Development 125: 1979-1989).

[0446] Additionally, the accelerated switch to reproductive growth seen in 35S::G148 plants indicated that the gene can be used to manipulate flowering time in commercial species. Specifically, the gene can accelerate flowering or eliminate any requirement for vernalization. In some instances, a faster cycling time might allow additional harvests of a crop to be made within a given growing season. Shortening generation times can also help speed up breeding programs, particularly in species such as trees, which grow for many years before flowering.