USOO880963 OB2

(12) United States Patent (10) Patent No.: US 8,809,630 B2 Kumimoto et al. (45) Date of Patent: Aug. 19, 2014

(54) POLYNUCLEOTIDES AND POLYPEPTIDES 09/958,131, filed as application No. PCT/US00/09448 IN on Apr. 6, 2000, now Pat. No. 6,946,586, said (75) Inventors: Roderick W. Kumimoto, Norman, OK application No. 09/958,131 is a continuation-in-part of (US); Luc J. Adam, Hayward, CA (US); application No. 09/713,994, filed on Nov. 16, 2000, Roger Canales, Westerly, RI (US); now abandoned, said application No. 10/412,699 is a Karen S. Century, Albany, CA (US); continuation-in-part of application No. 09/819,142, filed on Mar. 27, 2001, now abandoned, and a Robert A. Creelman, Castro Valley, CA continuation-in-part of application No. 09/934,455, (US); Neal I. Gutterson, Oakland, CA filed on Aug. 22, 2001, now abandoned, and a (US); Frederick D. Hempel, Hayward, continuation-in-part of application No. 09/837,944, CA (US); Jacqueline E. Heard, filed on Apr. 18, 2001, now abandoned, and a Stonington, CT (US); Cai-Zhong Jiang, continuation-in-part of application No. 09/713,994, Fremont, CA (US); Katherine said application No. 10/412,699 is a Krolikowski, Richmond, CA (US); continuation-in-part of application No. 10/225,068, Omaira Pineda, Vero Beach, FL (US); filed on Aug. 9, 2002, now Pat. No. 7,193,129, and a Oliver J. Ratcliffe, Oakland, CA (US); continuation-in-part of application No. 09/837,944, Peter P. Repetti, Emeryville, CA (US); said application No. 10/225,068 is a T. Lynne Reuber, San Mateo, CA (US) continuation-in-part of application No. 10/171.468, (73) Assignee: Mendel Biotechnology, Inc., Hayward, filed on Jun. 14, 2002, now abandoned, said CA (US) application No. 10/412,699 is a continuation-in-part of (*) Notice: Subject to any disclaimer, the term of this application No. 10/225,066, filed on Aug. 9, 2002, now patent is extended or adjusted under 35 Pat. No. 7.238,860, and a continuation-in-part of U.S.C. 154(b) by 911 days. (Continued) (51) Int. Cl. (21) Appl. No.: 11/986,992 AOIH 5/00 (2006.01) (22) Filed: Nov. 26, 2007 AOIH 5/10 (2006.01) (65) Prior Publication Data AOIH I/O (2006.01) CI2N 5/82 (2006.01) US 2009/02658O7 A1 Oct. 22, 2009 (52) U.S. Cl. Related U.S. Application Data USPC ...... 800/298; 800/287: 800/320.3: 800/278; 435/419 (60) Division of application No. 10/412,699, filed on Apr. (58) Field of Classification Search 10, 2003, now Pat. No. 7,345,217, which is a None continuation-in-part of application No. 10/295,403, See application file for complete search history. filed on Nov. 15, 2002, now abandoned, which is a division of application No. 09/394,519, filed on Sep. (56) References Cited 13, 1999, now abandoned, said application No. 10/412,699 is a continuation-in-part of application No. U.S. PATENT DOCUMENTS 09/489,376, filed on Jan. 21, 2000, now abandoned, 5,659,026 A 8/1997 Baszczynski et al. said application No. 10/412,699 is a 5,831,060 A 11/1998 Wada et al. continuation-in-part of application No. 10/302.267, (Continued) filed on Nov. 22, 2002, now Pat. No. 7,223,904, which is a division of application No. 09/506,720, filed on FOREIGN PATENT DOCUMENTS Feb. 17, 2000, now abandoned, said application No. 10/412,699 is a continuation-in-part of application No. DE 19503359 C1 2, 1996 10/286.264, filed on Nov. 1, 2002, now abandoned, EP O8O3572 A2 10, 1997 which is a division of application No. 09/533,030, filed (Continued) on Mar. 22, 2000, now abandoned, said application OTHER PUBLICATIONS No. 10/412,699 is a continuation-in-part of application No. 10/278,173, filed on Oct. 21, 2002, now Schellmann etal (2002, The EMBO Journal 21 (19): 5036-5046).* abandoned, which is a division of application No. (Continued) 09/533,392, filed on Mar. 22, 2000, now abandoned, said application No. 10/412,699 is a Primary Examiner — Stuart F Baum continuation-in-part of application No. 09/533,029, (74) Attorney, Agent, or Firm — Jeffrey M. Libby: Yifan filed on Mar. 22, 2000, now Pat. No. 6,664,446, said Mao application No. 10/412,699 is a continuation-in-part of (57) ABSTRACT application No. 10/278,536, filed on Oct. 22, 2002, The invention relates to transcription factor polypep now abandoned, which is a division of application No. tides, polynucleotides that encode them, homologs from a 09/532,591, filed on Mar. 22, 2000, now abandoned, variety of plant species, and methods of using the polynucle said application No. 10/412,699 is a otides and polypeptides to produce transgenic plants having continuation-in-part of application No. 10/290,627, advantageous properties compared to a reference plant. filed on Nov. 7, 2002, now abandoned, which is a Sequence information related to these polynucleotides and division of application No. 09/533,648, filed on Mar. polypeptides can also be used in bioinformatic search meth 22, 2000, now abandoned, said application No. ods and is also disclosed. 10/412,699 is a continuation-in-part of application No. 19 Claims, 21 Drawing Sheets US 8,809,630 B2 Page 2

Related U.S. Application Data 2003. O131386 A1 7/2003 Samaha et al. 2003/0188330 A1 10, 2003 Heard et al. application No. 09/837,944, and a continuation-in-part 2004/OOO6797 A1 1/2004 Shi et al. of application No. 10/171.468, said application No. 2004/0010821 A1 1/2004 McCourt et al. 2004/0031072 A1 2/2004 La Rosa et al. 10/412,699 is a continuation-in-part of application No. 2004/0098764 A1 5, 2004 Heard et al. 10/225,067, filed on Aug. 9, 2002, now Pat. No. 2004/O128712 A1 7/2004 Jiang et al. 7,135,616, and a continuation-in-part of application 2005, OO86718 A1 4/2005 Heard et al. 2005/OO97638 A1 5/2005 Jiang et al. No. 09/837,944, and a continuation-in-part of 2005/O155117 A1 7/2005 Century et al. application No. 10/171.468, said application No. 2005/0172364 A1 8, 2005 Heard et al. 10/412,699 is a continuation-in-part of application No. 2006, OOO8874 A1 1/2006 Creelman et al. 10/374,780, filed on Feb. 25, 2003, now Pat. No. 2006, OO15972 A1 1/2006 Heard et al. 2006,0162018 A1 7/2006 Gutterson et al. 7,511,190. 2006/0242738 A1 10, 2006 Sherman et al. 2006/0272060 A1 11/2006 Heard et al. (60) Provisional application No. 60/101,349, filed on Sep. 2007/0022495 A1 1/2007 Reuber et al. 22, 1998, provisional application No. 60/103,312, 2007/0101454 A1 5/2007 Jiang et al. filed on Oct. 6, 1998, provisional application No. 2007/0199.107 A1 8, 2007 Ratcliffe et al. 60/108,734, filed on Nov. 17, 1998, provisional 2007/02268.39 A1 9, 2007 Gutterson et al. 2008/0010703 A1 1/2008 Creelman et al. application No. 60/113,409, filed on Dec. 22, 1998, 2008. O1557O6 A1 6/2008 Riechmann et al. provisional application No. 60/116,841, filed on Jan. 2008. O163397 A1 7/2008 Ratcliffe et al. 22, 1999, provisional application No. 60/120,880, 2008/0229448 A1 9/2008 Libby et al. filed on Feb. 18, 1999, provisional application No. 2008/0301836 Al 12/2008 Century et al. 60/121,037, filed on Feb. 22, 1999, provisional 2008/0301840 A1 12/2008 Gutterson et al. 2008/0301841 A1 12/2008 Ratcliffe et al. application No. 60/124.278, filed on Mar. 11, 1999, 2008/0313756 Al 12/2008 Zhang et al. provisional application No. 60/129,450, filed on Apr. 2009.0049566 A1 2/2009 Zhang et al. 15, 1999, provisional application No. 60/135,134, 2009, O138981 A1 5/2009 Repetti et al. filed on May 20, 1999, provisional application No. 2009. O151015 A1 6/2009 Adam et al. 60/144,153, filed on Jul. 15, 1999, provisional 2009,0192305 A1 7/2009 Riechmann et al. 2009/0205063 A1 8/2009 Zhang et al. application No. 60/161,143, filed on Oct. 22, 1999, 2009,0265807 A1 10, 2009 Kumimoto et al. provisional application No. 60/162,656, filed on Nov. 2009,0265813 A1 10, 2009 Gutterson et al. 1, 1999, provisional application No. 60/125,814, filed 2009/02769 12 A1 11/2009 Sherman et al. on Mar. 23, 1999, provisional application No. 60/128,153, filed on Apr. 7, 1999, provisional FOREIGN PATENT DOCUMENTS application No. 60/166.228, filed on Nov. 17, 1999, provisional application No. 60/197899, filed on Apr. EP 1033405 A2 9, 2000 WO WO93,22342 11, 1993 17, 2000, provisional application No. 60/227,439, WO WO97/42327 11, 1997 filed on Aug. 22, 2000, provisional application No. WO WO98,07842 2, 1998 60/310,847, filed on Aug. 9, 2001, provisional WO WO98.37.184 A1 8, 1998 application No. 60/336,049, filed on Nov. 19, 2001, WO WO98.37755 A1 9, 1998 WO WO98,48007 10, 1998 provisional application No. 60/338,692, filed on Dec. WO WO98.58069 A1 12, 1998 11, 2001, provisional application No. 60/434,166, WO WO99.24573 5, 1999 filed on Dec. 17, 2002. WO WO99,53016 10, 1999 WO WO99,55840 11, 1999 (56) References Cited WO WOOO,53724 A2 9, 2000 WO WOO1/35727 A1 5, 2001 U.S. PATENT DOCUMENTS WO WOO1/36598 5, 2001 WO WOO1/36598 A1 5, 2001 5,939,601 A 8/1999 Klessig et al. WO WOO2/O8410 A2 1, 2002 6,121,513 A 9/2000 Zhang et al. WO WOO2/O8411 A2 1, 2002 6,664,446 B2 12/2003 Heard et al. WO WOO2, 16655 A2 2, 2002 6,717,034 B2 4/2004 Jiang WO WOO3/O12116 A2 2, 2003 6,835,540 B2 12/2004 Broun WO WOO3,O14327 2, 2003 6,946,586 B1 9, 2005 Fromm et al. WO WO2004/031349 A2 4/2004 7,109,393 B2 9, 2006 Gutterson et al. WO WO2004/076638 9, 2004 7,135,616 B2 11/2006 Heard et al. WO WO2005/047516 5, 2005 7,193,129 B2 3/2007 Reuber et al. WO WO2006,130156 A3 12/2006 7,196,245 B2 * 3/2007 Jiang et al...... 800,278 WO WO2007/028165 3, 2007 7,223,904 B2 5, 2007 Heard et al. OTHER PUBLICATIONS 7,238,860 B2 7/2007 Ratcliffe et al. 7,345,217 B2 3/2008 Zhang et al. 7,511, 190 B2 3/2009 Creelman et al. Bowie et al, Science 247: 1306-1310, 1990.* 7,598.429 B2 10/2009 Heard et al. McConnell et al. Nature 411 (6838):709–713, 2001.* 7,601,893 B2 10/2009 Reuber et al. Payne etal (1999, Development 126:671-682).* 7,635,800 B2 12/2009 Ratcliffe et al. Lietal (1994, Cereal Chem. 71(1):87-90).* 2002fOO23280 A1 2/2002 Gorlach et al. NCBI acc. No. AX366.159. Pogue, et al. (2002) Sequence 78 from 2002/0102695 A1 8, 2002 Silva et al. Patent WOO2O8411. 2002/0142319 A1 10, 2002 Gorlach et al. NCBI accession No. BI699876 (2001). Shoemaker, et al. saga9b09. 2003/0041356 A1 2/2003 Reuber et al. y1 Gm-c 1081 Glycinemax cDNA clone Genome Systems Clone ID: 2003, OO61637 A1 3/2003 Jiang et al. Gm-c1081-2009 5" Similar to TRO22059 O22059 Putative DNA 2003/0093837 A1 5, 2003 Keddie et al. Binding Protein CPC, ... mRNA sequence. 2003/0101481 A1 5/2003 Zhang et al. NCBI accession No. BI321189 (2001); Shoemaker, et al. saf48e11. 2003/O121070 A1 6, 2003 Adam et al. y3 Gm-c 1077 Glycinemax cDNA clone Genome Systems Clone ID: US 8,809,630 B2 Page 3

(56) References Cited Berger, F. et al. (Mar. 17, 1998) Positional information in root epi dermis is defined during embryogenesis and acts in domains with OTHER PUBLICATIONS strict boundaries. Current Biol. 8:421-430. Bird et al., The tomatopolygalacturonase gene and ripening-specific Gm-c1077-1773 5" Similar to TRO22059 O22059 Putative DNA expression in transgenic plants, Plant Mol. Biol. (1988) 11:651-662. Binding Protein CPC., mRNA sequence. Bohmert et al., AGO1 defines a novel locus of Arabidopsis control NCBI accession No. BQ627798 (2002); Shoemaker, et al. sao63e06. ling leaf development, EMBO J. (Jan. 2, 1998) 17:170-180. y2 Gm-c1073 Glycine max cDNA clone Soybean Clone ID: Borevitz et al., “Activation Tagging Identifies a Conserved MYB Gm-c1073-4043 5" Similar to TRO22059 O22059 Putative DNA Regulator of Phenylpropanoid Biosynthesis'. Plant Cell (Dec. 2000) Binding Protein CPC., mRNA sequence. 12:2383-2394. NCBI accession No. BQ627904 (2002); Shoemaker, et al. sao65e05. Bork, P. Powers and Pitfalls in Sequence Analysis: the 70% Hurdle. y2 Gm-c1073 Glycine max cDNA clone Soybean Clone ID: Genome Research, vol. 10, 2000; pp. 398-400. Gm-c1073-4065 5" Similar to TRO22059 O22059 Putative DNA Bowie et al., Deciphering the message in protein sequences: toler Binding Protein CPC., mRNA sequence. ance to amino acid substitutions. Science 247: 1306-1310, (Mar. 16. NCBI acc. No. AW350423 (2000); Vodkin, et al. GM210008B10E7 1990). Gm-r1021 Glycine max cDNA clone Gm-r1021-2917 3", mRNA Bowman et al., Crabs Claw, a gene that regulates carpel and nectary Sequence. developments in Arabidopsis, encodes a novel protein . . . . Develop NCBI acc. No. NP 200132 (gii: 15238679) (Aug. 21 2001); ment (Jun. 1999) 126:2387-2396. Tabata, S., et al. “putative protein Arabidopsis thaliana'; source: Broun, et al. A bifunctional oleate 12-hydroxylase: desaturase from Arabidopsis thaliana (thale cress); Title: “Sequence and analysis of Lesquerella fendleri. Plant J. (Jan. 1998) 13(2): 201-10. chromosome 5 of the plant Arabidopsis thaliana” (Nature 408 Buttner M. and Singh K., “Arabidopsis thaliana ethylene-responsive (6814), 823-826 (2000)). element biding protein (AtEBP), an ethylene-inducible, GCC box NCBI Acc. No. BM437313 (gi: 18459035) (Jan. 31, 2002); DNA-binding protein interacts with an ocs element binding protein.” Cramer.G.R. and Cushman.J.C., “VVAO17F06 54121 an expressed Proc. Natl Acad Sciences, vol. 94, No. 11, (May 27, 1997), pp. sequence tag database for abiotic stressed leaves of Vitis vinifera var. 5961-5966. Chardonnay Vitis vinifera cDNA clone VVA017F06 5, mRNA Casimiro, I. et al. (Apr. 2003) Dissecting Arabidopsis lateral root sequence”; (Vitis vinifera). development. Trends Plant Sci. 8:165-171. NCBI Acc. No. BU872107 (gi:24063631) (Oct. 16, 2002); Chandler and Bartels, Structure and function of the vp1 gene Unneberg.P. et al., “Q039C07 Populus flower cDNA library Populus homologue from the resurrection plant Craterostigma plantagineum trichocarpa cDNA 5 prime, mRNA sequence”; (Populus Hochst, Mol. Gen Genet (Nov. 1997) 256(5):539-46. trichocarpa) (Populus balsamifera Subsp. trichocarpa). Chao et al., Activation of the Ethylene Gas Response Pathway in Arabidopsis by the Nuclear Protein..., Cell (Jun. 27, 1997)89:1133 NCBI Acc. No. BU83 1849 (gi:24010621) (Oct. 15, 2002); 1144. Unneberg.P. et al., “T026E01 Populus apical shoot cDNA library Chen, Wenqiong, et al., Expression Profile Matrix of Arabidopsis Populus tremula X Populus tremuloides cDNA 5 prime, mRNA Transcription Factor Genes Suggests Their Putative Functions in sequence', (Populus tremula X Populus tremuloides). Response to Environmental Stresses, The Plant Cell (Mar. 2002) vol. NCBI Acc. No. CB289238 (gi: 28602979) (Feb. 27, 2003); Hou.H. 14, pp. 559-574. S., et al., “V-B-1 14F06 VAN-Baker-1 Vitis aestivalis cDNA clone Chenetal. (2000) Expressionprofile matrix of Arabidopsis transcrip V-B-1 14F06 5", mRNA sequence”; (Vitis aestivalis). tion factor genes Suggests their putative functions in response to NCBI Acc. No. A1495284 (gi: 4396287) (Mar. 11, 1999); environmental stresses. Plant Cell 14: 559-574. Shoemaker.R., et al., "sa90e06.y1 Gm-c1004 Glycine may cDNA Chern et al., Evidence for a disease-resistance pathway in rice similar clone Genome Systems Clone ID: Gm-c1004-6587 5" similar to to the NPR1-mediated signaling pathyway in Arabidopsis. Plant J. TR:OO4349 O04349 MYB-Like Protein Isolog, mRNA sequence': Jul. 2001; 27(2):101-13. (Glycine max.). Costa, S., and Dolan, L. (Jul. 2003) Epidermal patterning genes are Ainley et al., Regulatable endogenous production of cytokinins up to active during embryogenesis in Arabidopsis. Development 130: “toxic' levels in transgenic plants and plant tissues, The Plant Mol. 2893-2901. Biol. (Apr. 1993) 22:13–23. Coupland (Oct. 12, 1995). Flower development. LEAFY blooms in Allen, Marketal. A novel mode of DNA re. The EMBO Journal vol. aspen. Nature 377:482-483. 17 No. 18 pp. 5484-5496 (Sep. 15, 1998). Cramer and Cushman, (2002) NCBI accession No. BM437313. Amaya et al., Expression of Centroradialis (CEN) and CEN-like VAO17F06 54121 an expressed sequence tag database for abiotic genes in tobacco reveals a conserved mechanism controlling phase stressed leaves of Vitis vinifera var. Chardonnay Vitis viniferacDNA change in diverse species. Plant Cell. (Aug. 1999); 11 (8): 1405-18. clone VVAO17F065, mRNA seq. Ambrose et al., Molecular and genetic analyses of the silkyl gene Da Costa e Silva et al., BPF-1, a pathogen-induced DNA-binding reveal conservation infloral organ specification between and protein involved in the plant defense response, The Plant J. (Jul. monocots. Mol. Cell. Mar. 2000: 5(3):569-79. 1993) 4:125-135. An et al., Organ-Specific and Developmental Regulation of the Daly et al. (Dec. 2001). Plant Systematics in the Age of Genomics. Nopaline Synthase Promoter in Transgenic Tobacco Plants. (May Plant Physiology 127: 1328-1333. 1988) Plant Physiology 88:547-552. Di Cristina, M., Sessa, G., Dolan, L., Linstead, P., Baima, S., Ruberti, Duboule, Guidebook to the Homebox Genes, Oxford University I., and Morelli, G. (Sep. 1996). The Arabidopsis Athb-10 Press, Oxford, UK (1994) 27-71. (GLABRA2) is an HD-Zipprotein required for regulation of roothair The Cold Spring Harbor Laboratory, Washington University Genome development. Plant J 10, 393-402. Sequencing Center, and PE Biosystems Arabidopsis Sequencing DiLaurenzio et al., The Scarecrow Gene Regulates an Asymmetric Consortium. The complete sequence of a heterochromatic island Cell Division That is Essential . . . . Cell (Aug. 9, 1996) 86:423-433. from a higher eukaryote. Cell (Feb. 4, 2000) 100(3), 377-386. Duckett, C.M., Oparka, K.J., Prior, D.A.M., Dolan, L., and Roberts, Baerson et al., Identification of domains in an Arabidopsis acyl K. (1994). Dye-coupling in the root epidermis of Arabidopsis is carrier protein gene..., Plant Mol. Biol. (Dec. 1994) 26:1947-1959. progressively reduced during development. Development 120,3247 Baerson et al., Developmental regulation of an acyl carrier protein 3255. gene promoter in vegetative and reproductive tissues, Plant Mol. Duggleby, Identification of an acetolatate synthase Small subunit Biol. (May 1993) 22:255-267. gene in two eukaryotes. (May 6, 1997) Gene 190:245-249. Baumann et al., The DNA Binding Site of the Dof Protein NtBBF1 Is David Edwards et al., Multiple Genes Encoding the Conserved Essential for Tissue-Specific ..., The Plant Cell (Mar. 1999) 11:323 CCAAT Box Transcription Factor Complex Are Expressed in 334. Arabidopsis, Plant Physiol. (1998) 117, pp. 1015-1022. US 8,809,630 B2 Page 4

(56) References Cited Hurley, et al., Structural genomics and signaling domains. TRENDS in Biochemical Sciences (Jan. 2002) vol. 27 No. 1 pp. 48-. OTHER PUBLICATIONS Ishiguro and Nakamura, Characterization of cDNA encoding a novel DNA-binding protein, SPF1, that . . . . Mol. Gen. Genet. (Sep. 28. Egea-Cortines and Weiss. A rapid coming of age in tree biotechnol 1994) 244: 563-571. ogy. Nat. Biotechnol. Mar. 2001; 19(3):215-6. Jaglo-Ottosen, K., et al., Arabidopsis CBF1 Overexpression Induces Eisen, J., Phylogenomics: Improving Functional Predictions for COR Genes and Enhances Freezing Tolerance, Science (Apr. 3, Uncharacterized Genes by Evolutionary Analysis. Genome Research 1998) vol. 280, pp. 104-106. (Mar. 1998) 8:163-167. Jinet al., Transcriptional repression by AtMYB4 controls production Elo et al., Three MADS-box genes similar to APETALA1 and of UV-protecting sunscreens in Arabidopsis. EMBO J. Nov. 15. Fruitfull from silver birch (BEtula pendula). Physiol. Plant May 2000; 19(22):6150-61. 2001; 112(1):95-103. Jin et al., (Nov. 1999) Multifunctionality and diversity within the Elomaa et al., Transformation of antisense constructs of the chalcone plant MYB-gene family. Plant Mol Biol 41(5): 577-585. synthase gene Superfamily into Gerbera hybrida: differential effect Kaiser et al., Cis-acting elements of the CHS1 gene from white on the expression of family members, Molecular Breeding 2:41-50 mustard controlling . . . . Plant Mol. Biol. (May 1995) 28:231-243. 1996. Kasuga Mie et al., “Improving plant drought, salt, and freezing toler Forsburg and Guarente, Identification and characterization of HAP4: ance by gene transfer of a single stress-inducible transcription fac a third component of the . . . . Genes Dev. (Aug. 1989) 3: 1166-1178. tor', Nature Biotechnology, vol. 17. No. 3, (Mar. 1999), pp. 287-291. Foster et al., Plant b/IP proteins gather at ACGT elements, FASEB.J. Kater, et al. (Feb. 1998). Multiple AGAMOUS Homologs from (Feb. 1994) 8: 192-200. Cucumber and Petunia Differ in Their Ability to Induce Reproductive Frampton, J., Gibson, T.J., Ness, S.A., Doderlein, G., and Graf, T. Organ Fate. Plant Cell 10:171-182. (Dec. 1991). Proposed structure for the DNA-binding domain of the Kim et al., Isolation of a novel class ofbZIP transcription factors that Myb oncoprotein based on model building and mutational analysis. interact with ABA-responsive and embryo-specific elements....The Protein Eng 4,891-901. Plant J. (Jun. 1997) 11: 1237-1251. Fromm et al. An Octopine Synthase Enhancer Element Directs Tis Kim et al., Reported at American Society of Plant Physiologists Sue-Specific Expression and Binds ASF-1, a Factor from Tobacco meeting, 1991, abstract 394. Regulated expression of transcription Nuclear Extracts. (Oct. 1989) The Plant Cell 1:977-984. factors in transgenic rice confers stress tolerance. Fu et al., Expression of Arabidopsis GAI in transgenic rice represses King, et al., Gibberellins are not required for normal stem growth in multiple gibberellin responses. Plant Cell Aug. 2001; 13(8): 1791 Arabidopsis thaliana in the absence of GAI and RGA. Genetics (Oct. 802. 2001) 159(2):767-76. Gampala et al., ABA Insensitive-5 transactivates abscisic acid-induc Kirik, V. Simon, M., Huelskamp, M., and Schiefelbein, J. (Apr. 15. ible gene expression in rice protoplasts. Reported at 1991 ASPP 2004a). The Enhancer of TRY and CPC1 gene acts redundantly with meeting, abstract 714. TRIPTYCHON and CAPRICE intrichome and roothair cellpattern Gan and Amasino, Inhibition of Leaf Senescence by Autoregulated ing in Arabidopsis. Dev Biol 268, 506-513. Production of Cytokinin, Science (Dec. 1995) 270: 1986-1988. Kirik, V. Simon, M., Wester, K., Schiefelbein, J., and Hulskamp, M. Gatz. C., Chemical Control of Gene Expression, Annu. Rev. Plant (May 2004b). Enhancer of TRY and CPC 2 (ETC2) reveals redun Physiol. Plant Mol. Biol. (Jun. 1997) 48:89-108. dancy in the region-specific control of trichome development of Giraudat et al., Isolation of the Arabidopsis ABI3 Gene by Positional Arabidopsis. Plant Mol Biol 55, 389-398. Cloning. The Plant Cell (Oct. 1992) 4:1251-1261. Klein et al. A new family of DNA binding proteins includes putative Goff, S. (May 1992) Functional analysis of the transcriptional acti transcriptional regulators of . . . . Mol. Gen. Genet. (Jan. 15, 1996) vator encoded by the maize B gene: evidence for a direct functional 250:7-16. interaction between two classes of regulatory proteins. Genes Dev. 6: Klug and Schwabe, Zinc fingers, FASEB J. (May 1995) 9:597-604. 864-875. Koziel, M.G. et al., “Optimizing expression of transgenes with an Grace and Logan, Energy dissipation and radical scavenging by the emphasis on post-transcriptional events.” (Oct. 1996) Plant Molecu plant phenylpropanoid pathway, Philos Trans RSoc Lond B Biol Sci lar Biology, vol. 32, pp. 393-405. (Oct. 29, 2000) 355(1402): 1499-510. Kranz et al., Towards functional characterization of the members of Graf, T. (Apr. 1992). Myb: a transcriptional activator linking prolif the R2R3-MYB gene family from Arabidopsis thaliana. (Oct. 1998) eration and differentiation in hematopoietic cells. Curr Opin Genet The Plant Journal 16:263-276. Dev 2, 249-255. Kuhlemeier et al., The Pea rbcS-3A Promoter Mediates Light Guevara-Garcia, A 42 by fragment of the pmas 1 containing an ocs Responsiveness but not Organ Specificity, The Plant Cell (Apr. 1989) like element confers a development, wound- and chemically . . . . 1:471-478. Plant Mol. Biol. (Nov. 1998) 38:743-753. Kwak, S.H. Shen, R., and Schiefelbein, J. (Feb. 2005). Positional Hall et al., GOLDEN 2: A Novel Transcriptional Regulator of Cel signaling mediated by a receptor-like kinase in Arabidopsis. Science lular Differentiation in the Maize Leaf. The Plant Cell (Jun. 1998) 307, 1111-1113. 10:925-936. Kyozuka et al., Eucalyptus has functional equivalents of the He et al., Transformation of rice with the Arabidopsis floral regulator Arabidopsis AP1 gene. Plant Mol. Bio. (Nov. 1997) 35 573-584. LEAFY causes early heading. Transgenic Res. Jun. 2000; 9(3):223 Larkin, J.C., Brown, M.L., and Schiefelbein, J. (2003). How do cells 7. know what they want to be when they grow up? Lessons from epi Hollung Kristin et al. (Jun. 1997) Developmental stress and ABA dermal patterning in Arabidopsis. Annu Rev Plant Biol 54, 403-430. modulation of mRNA levels for b/IP transcription factors and Vpl in Lee, J. H. etal. (Oct. 1995) Derepression of the activity of genetically barley embryos and embryo-derived Suspension cultures, Plant engineered heat shock factor causes constitutive synthesis of heat Molecular Biology, vol. 35. No. 5, pp. 561-571. shock proteins ... Plant J. 8: 603-612. Huang, et al., Cloning and Functional Characterization of an Lee, M. and Schiefelbein, J. (Nov. 24, 1999) WEREWOLF, a MYB Arabidopsis Nitrate Transporter Gene That Encodes a Constitutive related protein in Arabidopsis is a position-dependent regulator of Component of Low-Affinity Uptake. The Plant Cell (Aug. 1999) vol. cell patterning. Cell 99: 473–483. 11, 1381-1392. Lee, M.M., and Schiefelbein, J. (Mar. 2002). Cell pattern in the Hulskamp et al., Genetic dissection of trichome cell development in Arabidopsis root epidermis determined by lateral inhibition with Arabidopsis. (Feb. 11, 1994) Cell 76:555-66. feedback. Plant Cell 14, 611-618. Hung C-Y., et al. (May 1998) A common position-dependent mecha Lee et al: 'A highly conserved kinase is an essential component for nism controls cell-type patterning and GLABRA2 regulation in the stress tolerance in yeast and plant cells' PNAS,vol. 96, May 1, 1999, root and hypocotyl epidermis of Arabidopsis: Plant Physiol. 1 17: pp. 5873-5877. 73-84. Levee et al. (1999, Molecular Breeding 5:429-440). US 8,809,630 B2 Page 5

(56) References Cited Payne, C.T., Zhang, F., and Lloyd, A.M. (Nov. 2000). GL3 encodes a bHLH protein that regulates trichome development in Arabidopsis OTHER PUBLICATIONS through interaction with GL1 and TTG1. Genetics 156, 1349-1362. Pena et al., Constitutive expression of Arabidopsis LEAFY or Li Zhongsen et al., “PEI1, an embryo-specific Zinc finger protein APETALA1 genes in citrus reduces their generation time. Nat. gene required for heart-stage embryo formation in Arabidopsis.” Biotechnol. Mar. 2001; 19(3):263-7. Plant Cell vol. 10, No. 3, Mar. 1998. Peng et al. (Dec. 1, 1997). The Arabidopsis GAI gene defines a Lin, Xlaoying, et al., Sequence and analysis of chromosome 2 of the signaling pathway that negatively regulates gibberellin responses. plant Arabidopsis thaliana, Nature (Dec. 16, 1999) vol. 402, pp. Genes and Development 11:3194-3205. T61-768. Peng et al. (Jul. 15, 1999). 'Green revolution genes encode mutant Liu, Q, et al., “Two transcription facors, DREB1 and DREB2, with an gibberellin response modulators, Nature 400:256-261. EREBP/AP2 DNA binding domain separate two cellular signal Peng et al., Extragenic Suppressors of the Arabidopsis gai mutation transduction pathways in drought-and low-temperature-responsive alter the dose-response relationship of diverse gibberellin responses. gene expression, respectively, in Arabidopsis. Plant Cell, vol. 10, Plant Physiol. (Apr. 1999) 119(4): 1199-208. No. 8 (Aug. 1998), pp. 1391-1406. Prandl, R. et al. (May 1998). HSF3, a new hat shock factor from Ma, et al. (1998) Seed-specific expression of the isopentenyl Arabidopsis thaliana, derepresses the heat shock response and transferase gene (ipt) intransgenic tobacco. Aust. J. Plant Physiol. 25: conferes thermotolerance . . . Molec. Gen. Genet. 258: 269-278. 53-59. Quattrocchio et al., Analysis of bHLH and MYB domain proteins: Mandel et al. (Oct. 2, 1992). Manipulation of flower structure in species-specific regulatory differences are caused by divergent evo transgenic tobacco, Cell 71-133-143. lution . . . . Plant Journal (Feb. 1998) 13(4), 475-489. Marchler-Bauer, et al., CDD: a database of conserved domain align Razik and Quatrano, Effect of the Nuclear Factors EmEBP1 and ments with links to domain three-dimensional structure. Nucleic Viviparous 1 on the Transcription of the Em Gene in HeLa Nuclear Acids Res. (Jan. 1, 2002) vol. 30, No. 1 pp. 281-283. Extracts. The Plant Cell (Oct. 1997) vol. 9, pp. 1791-1803. Martin and Paz-Ares, MYB transcription factors in plants, Trends in Riechmann and Meyerowitz, The AP2/EREBPFamily of Plant Tran Genetics (Feb. 1997) 13:67-73. scription Factors, J. Biol. Chem. (Jun. 1998) 379:633-646. Mayer et al., The European Union Arabidopsis Genome Sequencing Riechmann and Meyerowitz, MADS Domain Proteins in Plant Consortium & The Cold Spring Harbor, Washington University in St. Development, Biol. Chem. (Oct. 1997) 378: 1079-1101. Louis and PE Biosystems Arabidopsis Sequencing Consortium, Riechmann et al., Arabidopsis Transcription Factors: Genome-Wide Nature, vol. 402, Dec. 16, 1999, p. 769-777. Comparative Analysis Among Eukaryotes, Science (Dec. 2000) McConnell et al., Nature 411 (6838):709–713, (Jun. 2001). 290:2105-2110. Mena et al., Diversification of C-function activity in maize flower Rigola et al., CaMADS1, a MADS box gene expressed in the carpel development. Science. Nov. 29, 1996; 274(5292): 1537-40. of hazelnut. Plant Mol. Biol. Dec. 1998; 38(6): 1147-60. Merzlyak and Chivkunova, Light-stress-induced pigment changes Ringli et al., Specific interaction of the tomato b/IP transcription factor VSF-1 with a non-palindromic DNA sequence that controls and evidence for anthocyanin photoprotection in apples, J. vascular gene expression, Plant Mol. Biol. (Aug. 1998)37:977-988. Photochem Phtobiol. B. (Apr. 2000) 55(2-3): 155-63. Rolland, et al., Sugar Sensing and Signaling in Plants. The Plant Cell Mitsuhara, I., et al., Efficient Promoter Cassettes for Enhanced (2002) Supplement 2002, S185-S205. Expression of Foreign Genes in Dicotyledonous and Rottmann et al., Diverse effects of overexpression of LEAFY and Monocotyledonous Plants, Plant Cell Physiol. (Jan. 1996) 37(1):49 PTLF, a poplar (Populus) homolog of LEAFY/FLORICAULA. in 59. transgenic poplar and Arabidopsis. Plant J. May 2000; 22(3):235-45. Mouradov et al., NEEDLY, a Pinus radiata ortholog of Rounsley et al., Arabidopsis thaliana chromosome II BAC T9D9 FLORIAULA/LEAFY genes, expressed in both reproductive and genomic sequence, 1997 1-128. vegetative meristems. Proc. Natl. Acad. Sci. U.S.A. May 1998; Rouse et al., Changes in Auxin Response from Mutations in an 95(11):6537-42. AUX/IAA Gene, Science 279:1371 (Feb. 1998) 279:1371-1373. Nandi et al. A conserved function for Arabidopsis SUPERMAN in Samach and Gover, Photoperiodism: the consistent use of regulating floral-whorl cell proliferation in rice, a monocotyledonous CONSTANS. Curr. Biol. Aug. 21, 2001; 11(16):R651-4. plant. Curr. Biol. Feb. 24, 2000; 10(4):215-8. Sasaki, et al., The genome sequence and structure of rice chromo Odell et al., Seed-Specific Gene Activation Mediated by the Creflox some 1. Nature (Nov. 21, 2002) 420 (6913), 312-316. Site-Specific Recombination System, Plant Physiol. (Oct. 1994) Schaffer, R. et al. (Jun. 26, 1998) The late elongated hypocotyl 106:447-458. mutation of Arabidopsis disrupts circadian rhythms and the Odell et al., Identification of DNA sequences required for activity of photoperiodic control of flowering. Cell 93: 1219-1229. the cauliflower mosaic virus 35S promoter. (Feb. 28, 1985) Nature Schaffner and Sheen, Maize rbcS Promoter Depends on Sequence 3.13:810-812. Elements Not Found in Dicot recS Promoters, The Plant Cell (Sep. Ogas, et al., Cellular differentiation regulated by gibberellin in the 1991)3:997-1012. Arabidopsis thaliana pickle mutant. Science (Jul. 4, 1997) Schellmann, S., et al. (Oct. 1, 2002) TRIPTYCHON and CAPRICE 277(5322);91-94. mediate lateral inhibition during trichome and root hair patterning in Ogas, et al., PICKLE is a CHD3 chromatin-remodeling factor that Arabidopsis. EMBO J. 21:5036-5046. regulates the transition from embryonic to vegetative development in Schiefelbein, J. (Feb. 2003) Cell-fate specification in the epidermis: Arabidopsis. Proc. Natl. Acad. Sci. U.S.A. (Nov 23, 1999) a common patterning mechanism in the root and shoot. Curr. Opin. 96(24): 13839-44. Plant Biol. 6:74-78. Ohlet al., Functional Properties of a Phenylalanine Ammonia-Lyase Schnittger, A. (Jun. 1999) Generation of a spacing pattern: the role of Promoter from Arabidopsis, The Plant Cell (Sep. 1990) 2:837-848. TRIPTYCHON intrichome patterning in Arabidopsis. Plant Cell 11: Ohme-Takagi et al., Ethylene-Inducible DNA Binding Proteins that 1105-1116. interact with an ethylene-responsive element. The Plant Cell, vol. 7. Schnittger et al., Tissue layer and organ specificity of trichome for 173-182, Feb. 1995. mation are regulated by GLABRA1 and TRIPTYCHON in Oppenheimer et al., Amyb gene required for leaf trichome differen Arabidopsis. (Jun. 1998) Development 125:2283-2289. tiation in Arabidopsis is express in stipules. (Nov. 1, 1991) Cell Shewmaker C. K. et al: “Seed-specific overexpression of phytoene 67:483-493. synthase: increase in carotenoids and other metabolic effects.” Plant Pang and Duggleby, Expression, purification, characterization, and Journal vol. 20, No. 4 (Nov. 1999) pp. 401-412. reconstitution of the large and Small Subunits of yeast Shi et al., Gibberellin and abscisie acids regulate GAST1 expression acetohydroxyacid synthase. Biochemistry (Apr. 20, 1999) at the level of transcription, Plant Mol. Biol. (Dec. 1998) 38: 1053 38(16):5222-31. 1060. US 8,809,630 B2 Page 6

(56) References Cited Weigel and Nilsson (Oct. 12, 1995). A developmental switch suffi cient for flower initiation in diverse plants. Nature 377:495-500. OTHER PUBLICATIONS White, J.A. et al. A new set of Arabidopsis expressed sequence tags from developing seed, Plant Physiol. 124(4) pp. 1582-1594 (Dec. Siebertz et al., cis-Analysis of the Wound-Inducible Promoter wunl 2000). in Transgenic Tobacco Plants and Histochemical Localization of its Willmottet al., Dnasel footprints suggest the involvement of at least Expression, The Plant Cell (Oct. 1989) 1:961-968. three types of transcription factors in the regulation . . . . Plant Mol. Smalle, Janet al., The trihelix DNA-binding motif in higher plants is Biol. (Nov. 1998) 38:817-825. not restricted to the transcription factors GR-1 and GT-2. Proc. Natl. Winicov I, "New molecular approaches to improving salt tolerance in Acad. Sci., USA, vol. 95, pp. 3318-3322, 3 (Mar. 17, 1998). crop plants.” Annals of , vol. 82, No. 6 (Dec. 1998), pp. Song, et al. Isolation and mapping of a family of putative zinc-finger 705-71O. prtein cDNAs from rice. DNA Res. (Apr. 30, 1998) 5(2):95-101. Winicov. Ilga et al., “Transgenic overexpression of the transcription Soueretal. The No Apical Meristem Gene of Petunia is Required for factor Alfinil enhances expression of the endogenous MsPRP2 gene Pattern Formation in Embryos and . . . . Cell (Apr. 19, 1996) 85: in alfalfa and improves salinity tolerance of the plants.” Plant Physi 159-170 ology vol. 120, No. 2 (Jun. 1999) pp. 473-480. Spelt, et al., Anthocyanin1 of petunia encodes a basic helix-loop Wu et al., The Arabidopsis 14-3-3 Multigene Family, Plant Physiol. helix protein that directly activates transcription of structural (Aug. 1997) 114: 1421-1431. anthocyanin genes, Plant Cell (Sep. 2000) 12(9): 1619-32. Yano et al., Hdl., a major photoperiod sensitivity quantitative trait Sundberg, et al., ALBINO3, an Arabidopsis Nuclear Gene Essential locus in rice, is closely related to the Arabidopsis flowering time gene for Chloroplast Differentiation, Encodes a Chloroplast Protein That CONSTANS. Plant Cell. Dec. 2000; 12(12):2473-2484. Shows Homology to Proteins Present in Bacterial Membranes and Yuan, L. and Knauf, V.C. “Modification of plant components.” (Apr. Yeast Mitochondria. The Plant Cell (May 1997) vol. 9, 717-730. 1, 1997) Current Opinion in Biotechnology, vol. 8, pp. 227-233. Suzuki et al. (Nov. 2001). Maize VP1 complements Arabidopsis abi3 Zhang, F. et al. (Oct. 2003). A network of redundant bHLH proteins and confers a novel ABA'auxin interaction in roots. Plant J. 28:409 function sin all TTG1-dependent pathways of Arabidopsis. Develop 418. ment 13: 4859-4869. Zhang et al., Expression of Antisense or Sense RNA of an Ankyrin Tamagnone et al. The AmMYB308 and AmMYB330 transcription Repeat-Containing Gene . . . . The Plant Cell (Dec. 1992) 4: 1575 factors from antirhinum regulate phenylpropanoid and lignin 1588. biosynthesis in transgenic tobacco. Plant Cell Feb. 1998; 10(2):135 Zhang, Y. Brown, G., Whetten, R., Loopstra, C.A., Neale, D., 54. Kieliszewski, M.J., and Sederoff, R.R. (May 2003). An Tanimoto, M., Roberts, K., and Dolan, L. (Dec. 1995). Ethylene is a arabinogalactan protein associated with secondary cell wall forma positive regulator of root hair development in Arabidopsis thaliana. tion in differentiating xylem of loblolly pine. Plant Mol Biol 52, Plant J 8,943-948. 91-102. Theologis, A., et al., Sequence and analysis of chromosome 1 of the Masucci et al. (1994) The rhd6 Mutation of Arabidopsis thaliana plant Arabidopsis thaliana. Nature (Dec. 14, 2000) 408(6814), pp. Alters Root-Hair Initiation through an Auxin- and Ethylene-Associ 816-820. ated Process. Plant Physiol. 106: 1335-1346. Thompson, et al., CLUSTALW.Improving the sensitivity of progres Galway, et al. (1994) The TTG gene is required to specify epidermal sive multiple sequence alignment through sequence weighting, posi cell fate and cell patterning in the Arabidopsis root. Dev Biol 166: tion-specific gap penalties and weight matrix choice. Nucleic Acids T40-754. Res. (Nov. 11, 1994) 22(22):4673-80. U.S. Appl. No. 09/532,591, filed Mar. 22, 2000, Samaha, R. et al. U.S. Appl. No. 09/533,648, filed Mar. 22, 2000, Riechmann, Jose Tucker et al., Crystal structure of the adenovirus DNA binding pro Luis et al. tein reveals a hook-on model ..., EMBO J. (Jul. 1, 1994) 13:2994 U.S. Appl. No. 10/290,627, filed Nov. 7, 2002, Reichmann, Jose Luis 3002. et al. Urao T. etal. (Nov. 1993) An Arabidopsis mybhomologis induced by U.S. Appl. No. 09/713,994, filed Nov. 16, 2000, Keddie, James et al. dehydration stress and its gene product binds to the conserved MYB U.S. Appl. No. 09/837,944, filed Apr. 18, 2001, Creelman, Robert et recognition sequence. Plant Cell 5: 1529-1539. al. Van der Kop et al., Selection of Arabidopsis mutants overexpressing U.S. Appl. No. 09/594,214, filed Jun. 14, 2000, Jones, J. et al. genes driven by the promoter . . . . Plant Mol. Biol. (Mar. 1999) U.S. Appl. No. 10/456,882, filed Jun. 6, 2003, Riechmann, Jose Luis 39:979-990. et al. van der Valk et al., reported in EMBO Workshop. JIC Jul. 2001. U.S. Appl. No. 10/171.468, filed Jun. 14, 2003, Creelman, Robert et Genetic transformation of perennial ryegrass with the Arabidopsis U.S. Appl. No. 12/376,569, filed Aug. 3, 2007, Creelman, Robert et homeobox gene 1 (athl) inhibits flowering. al. Wada, T. (Mar. 1997) Epidermal cell differentiation in Arabidopsis EP Partial Srch Rpt for EP 1601758, Apr. 27, 2007, Mendel Biotech, determined by a Myb homolog, CPC. Science 277: 1113-1116. Wada, T. et al. (Dec. 2002) Role of a positive regulator of root hair .S. Appl. No. 12/526,042, filed Feb. 7, 2008, Repetti, Peter P. et al. development, CAPRICE, in Arabidopsis root epidermal cell differ .S. Appl. No. 12/638,750, filed Dec. 15, 2009, Ratclife. O. et al. entiation Development 129: 5409-5419. .S. Appl. No. 09/394,519, filed Sep. 13, 1999, Zhang, J. et al. Wang, H. et al. (Jun. 2002) Regulation of the cell expansion gene .S. Appl. No. 12/573.311, filed Oct. 5, 2009, Heard, J. et al. RHD3 during Arabidopsis development. Plant Physiol. 129: 638 .S. Appl. No. 12/577,662, filed Oct. 12, 2009, Reuber, T. et al. .S. Appl. No. 12/557, 449, filed Sep. 10, 2009, Repetti, P. et al. 649. .S. Appl. No. 09/627.348, filed Jul. 28, 2000, Thomashow, Michael Wang, Z.Y., Kenigsbuch. D., Sun, L., Harel, E., Ong, M.S., and al. Tobin, E.M. (Apr. 1997). A Myb-related transcription factor is .S. Appl. No. 09/489,376, filed Jan. 21, 2000, Heard, J. et al. involved in the phytochrome regulation of an Arabidopsis Lhcbgene. .S. Appl. No. 09/489,230, filed Jan. 21, 2000, Broun, P. et al. Plant Cell 9, 491-507. .S. Appl. No. 09/506,720, filed Feb. 17, 2000, Keddie, James et al. Wang, Z-Y et al: “Constitutive Expression of the Circadian Clock .S. Appl. No. 09/533,030, filed Mar. 22, 2000, Keddie, James et al. Associated 1 (CCA1) Gene Disrupts Circadian Rhythms and Sup .S. Appl. No. 09/533,392, filed Mar. 22, 2000, Jiang, C-Zetal. presses. Its Own Expression.” Cell vol. 93, No. 7 (Jun. 26, 1998) pp. 1207-1217. * cited by examiner U.S. Patent Aug. 19, 2014 Sheet 1 of 21 US 8,809,630 B2

Fagales Cucurbitales Rosales Fabales Oxalidales Malpighiales Sapindales Malvales Brassicales GeranialesDipsacales Asterales Apiales Aquifoliales Solanales Lamiales Gentianales Garryales Ericales Cornales Saxifracales Santalales Caryophyllales Proteales Ranunculales Zingiberales Commelinales Poales Arecates Pandanales Liliales DiOSCOreales Asparagales Alismatales ACOrales Piperales Magnoliales Laurales Ceratophyllales

FIG. 1 U.S. Patent US 8,809,630 B2

U.S. Patent Aug. 19, 2014 Sheet 4 of 21 US 8,809,630 B2

100 fiO 120 SEQ ID NO 148 SEOID NO 1972 SEQID NO 38 SEQ ID NO 2142 : t f SEO ID NO 1084 : : s . SEOID NO 1085 SEOID NO 1086 SEQID NO 1083 SEQ ID NO 1087 SEOID NO 1088 SEOID NO 559 SEQ ID NO 1082 SEQID NO 1081 SEO ID NO 1089 SEOID NO 1090

FIG. 3B U.S. Patent Aug. 19, 2014 Sheet 5 of 21 US 8,809,630 B2

40 50 SEQ ID NO 170 SEQ ID NO 1950

SEQID NO370 SEQ ID NO 1184 SEQID NO 1183 SEQ ID NO 1182 P S M T S S E K V S L S SEQ ID NO 1176 P P SEQ ID NO 1177 E SEQ ID NO 1179 A A A A A r ------A S SEQ ID NO 1178 A. A. A. A E ------T - SEQ ID NO 1186 V F A A A ...... S. G SEQ ID NO 1185 A A A A S ------A S G

90 SEQ ID NO 170 SEQ ID NO 1950 SEQ ID NO 370 SEQ ID NO 1184 SEQ ID NO 1183 SEQ ID NO 1182 SEQ ID NO 1176 SEQ ID NO 1177 SEQ ID NO 1179 SEQ ID NO 1178 SEQ ID NO 1186 SEQ ID NO 1185 S A W W DEA A S A V V E

O SEQ ID NO 170 . . . . K L E S S K Y K G W W P SEQ ID NO 1950 - - - - K L P S S K Y R G W W P SEQID NO 370 . . . . K L P S S R F K G W W P SEQ ID NO 1184 - - - - K L P S S K Y K G W W P SEQ ID NO 1183 - - - - K L P S S K Y K G W W P SEQ ID NO 1182 . . . . K L S S K Y K G W W P SEQ ID NO 1176 O S S R Y K G W W P SEQID NO 1177 ------P S S R Y K G W W P SEQ ID NO 1179 G G G GK P S S K Y K G W W P SEQ ID NO 1178 G. A. G. G. K L P S S K F K G W W P SEQID NO 1186 S W G G K L P S S R Y K G W W P SEQ ID NO 1185 G W G G L P S S R Y K G W W P K L P S S K Y K G W W P FIG. 4A

U.S. Patent Aug. 19, 2014 Sheet 8 of 21 US 8,809,630 B2

70 380 390

D S F G D

400 410 420 R N P L

G G S D T W G N N N N S R - - - K R E RA ve M

W R L. F G W D

430

460 470 480 SEQ ID NO 170 SEQ ID NO 1950 SEQ ID NO 370 SEQ ID NO 1184 SEQ ID NO 1 183 SEQ ID NO 1182 SEQ ID NO 1176 SEQ ID NO 1177 SEQ ID NO 1179 L T SEQ ID NO 1178 W SEQ ID NO 1186 W SEQ ID NO 1185 E V E Y V D G D

FIG. 4D U.S. Patent Aug. 19, 2014 Sheet 9 of 21 US 8,809,630 B2

?Eja-EE-El-–EjaEEE|<-Elº,EZZE]<

60 SEQ ID NO 186 P SEOID NO 1958 SEQ ID NO 1960 P SEQ ID NO 1962 S SEQ ID NO 1238 R SEQID NO 1242 E SEQ ID NO 1240 P SEQ ID NO 1241 P SEQ ID NO 1243 E SEQ ID NO 1222 P SEQ ID NO 1223 P SEQ ID NO 1232 P SEQ ID NO 1221 A SE3 if Nóis ...... S P K R P A G R T K F SEQ ID NO 1227 - - - - - P. P K R P A G R T K F SEQ ID NO 1235 T ss. S E P C R R L S - . . . P P S S K R P A G R T K F SEQID NO 1230 R - - - Q H Q T W W T A ------PP K R P A G R T K F SEQID NO 1229 - - E Y A TWTs A PPK R P A G R T K F SEQ ID NO 1228 s S G - - Q Q Y K K R P A G R T K F SEQ ID NO 1246 T SEO ID NO 1247 s S P O P P K K R P A SEQ ID NO 1244 s SEQID NO 1245 E G G D D E Y A TV L SA P K R P A G R T K F P K R P A G R T K F

FIG. 5A U.S. Patent Aug. 19, 2014 Sheet 10 of 21 US 8,809,630 B2

70

G R W W C E W R V

OO O

FIG. 5B U.S. Patent Aug. 19, 2014 Sheet 11 of 21 US 8,809,630 B2

f30 140 50

FIG. 5C U.S. Patent Aug. 19, 2014 Sheet 12 of 21 US 8,809,630 B2

210

N A G S - - S

p S G T Y V S

220 240

SEQ ID NO 1240 D G v D, E N C N N ------N K A S SEQID NO 1241 D W S F D E D S N S ------N K G L. SEQID NO 1243 SEQID NO 1222 SEQ ID NO 1223 SEQID NO 1232 SEQ ID NO 1221 S S P W L S P N D ------D N A S S A

SEQID NO 1231 S A P W L S A K Q C E F F L S S L D C W M L M S K L S S SEQID NO 1227 N. K. D. W L P W A A A ------E W F D A SEQID NO 1235 D. D. A W T S S S ------r S T T SEQ ID NO 1230 - - E E S A A T D G ------D E S SEO ID NO 1229 SEQID NO 1228 E. T P G P S S I D G - SEQID NO 1246 - - G G E S S S D ------SEQ ID NO 1247 Q A p A P A E R v ------SEO ID NO 1244 T K P A P A E R V ------P V E A S SEQ ID NO 1245

FIG. 5D U.S. Patent Aug. 19, 2014 Sheet 13 of 21 US 8,809,630 B2

W L S D M G W y G N R

A

FIG. 5E U.S. Patent Aug. 19, 2014 Sheet 14 of 21 US 8,809,630 B2

310 320 330 SEO ID NO 186 ------L W S F D E SEQ ID NO 1958 ------SEOID NO 1960 ------SEQ ID NO 1962 D ------SEQ ID NO 1238 DD ------SEQ ID NO 1242 D ------SEQ ID NO 1240 DD ------SEQ ID NO 1241. DD ------SEQ ID NO 1243 D...... SEQ ID NO 1222 DD H Y - - - HM SEOID NO 1223 C ------SEQ ID NO 1232 G

SEQ ID NO 1221 D) SEQID NO 1231 Y F Q W R R SEQ ID NO 1227 EE. SEQ ID NO 1235 A SEO ID NO 1230 - - - - - SEQ ID NO 1229 G G A G G SEQ ID NO 1228 DE - - - - - G SEQ ID NO 1246 - - - - - E. G. F. SEQ ID NO 1247 DC S - - H S G SEQID NO 1244 E D S G SEQ ID NO 1245 G ------A N A W E S W S S A G G W E R. W L W S

340 350 360 SEQ ID NO 186 SEQ ID NO 1958 SEQ ID NO 1960 SEQ ID NO 1962 SEO ID NO 1238 SEQ ID NO 1242 SEO ID NO 1240 SEQ ID NO 1241 SEQ ID NO 1243 SEQ ID NO 1222 SEQID NO 1223 SEQ ID NO 1232 SEQ ID NO 1221 SEQ ID NO 1231 SEQ ID NO 1227 SEQ ID NO 1235 SEQ ID NO 1230 SEQ ID NO 1229 SEQ ID NO 1228 SEQ ID NO 1246 SEQID NO 1247 SEQ ID NO 1244 SEQ ID NO 1245 R P R S G A T T G Q G R K W G S R E R V R

FIG.SF

U.S. Patent Aug. 19, 2014 Sheet 16 of 21 US 8,809,630 B2

Alignment of Arabidopsis and Soy MAF Sequences

1. 20

E

FIG 7A

U.S. Patent Aug. 19, 2014 Sheet 18 of 21 US 8,809,630 B2

; : :

; : :

f: & S.

i: : * :

8 U.S. Patent Aug. 19, 2014 Sheet 19 of 21 US 8,809,630 B2

Low Temperature

F's FLC Y Mf\

Autonomous- \SOC1 .N pathway

Vegetative growth Flowering

FIG. 9 U.S. Patent Aug. 19, 2014 Sheet 20 of 21 US 8,809,630 B2

U.S. Patent Aug. 19, 2014 Sheet 21 of 21 US 8,809,630 B2

: { US 8,809,630 B2 1. 2 POLYNUCLEOTIDES AND POLYPEPTIDES example, possess advantageous or desirable traits. Strategies IN PLANTS for manipulating traits by altering a plant cells transcription factor content can therefore result in plants and crops with RELATIONSHIP TO COPENDING new and/or improved commercially valuable properties. APPLICATIONS Transcription factors can modulate gene expression, either increasing or decreasing (inducing or repressing) the rate of This application claims the benefit of U.S. Non-provisional transcription. This modulation results in differential levels of application Ser. No. 09/394,519, filed Sep. 13, 1999 (now gene expression at various developmental stages, in different abandoned), U.S. Non-provisional application Ser. No. tissues and cell types, and in response to different exogenous 09/489,376, filed Jan. 21, 2000 (now abandoned), U.S. Non 10 (e.g., environmental) and endogenous stimuli throughout the provisional application Ser. No. 09/506,720, filed Feb. 17, life cycle of the organism. 2000 (now abandoned), U.S. Non-provisional application Because transcription factors are key controlling elements Ser. No. 09/533,030, filed Mar. 22, 2000 (now abandoned), of biological pathways, altering the expression levels of one U.S. Non-provisional application Ser. No. 09/533,392, filed or more transcription factors can change entire biological Mar. 22, 2000 (now abandoned), U.S. Non-provisional appli 15 pathways in an organism. For example, manipulation of the cation Ser. No. 09/533,029, filed Mar. 22, 2000 and issued as levels of selected transcription factors may result in increased U.S. Pat. No. 6,664,446, U.S. Non-provisional application expression of economically useful proteins or biomolecules Ser. No. 09/532.59, filed Mar. 22, 2000 (now abandoned), in plants or improvement in other agriculturally relevant char U.S. Non-provisional application Ser. No. 09/533,648, filed acteristics. Conversely, blocked or reduced expression of a Mar. 22, 2000 (now abandoned), U.S. Non-provisional appli transcription factor may reduce biosynthesis of unwanted cation Ser. No. 09/958,131, filed Jan. 30, 2002 and issued as compounds or remove an undesirable trait. Therefore, U.S. Pat. No. 6,946,586, PCT Application No. PCT/US00/ manipulating transcription factor levels in a plant offers tre 09448, filed Apr. 6, 2000, U.S. Non-provisional application mendous potential in agricultural biotechnology for modify Ser. No. 09/713,994, filed Nov. 16, 2000 (now abandoned), ing a plant's traits. A number of the agriculturally relevant U.S. Non-provisional application Ser. No. 09/819,142, filed 25 characteristics of plants, and desirable traits that may be Mar. 27, 2001 (now abandoned), U.S. Non-provisional appli imbued by modified transcription factor gene expression, are cation Ser. No. 09/837,944, filed Apr. 18, 2001, U.S. Provi listed below. sional Application No. 60/310,847, filed Aug. 9, 2001, U.S. Useful Plant Traits Provisional Application No. 60/336,049, filed Nov. 19, 2001, Category: Abiotic Stress: Desired Trait: Chilling Tolerance U.S. Provisional Application No. 60/338,692, filed Dec. 11, 30 The term "chilling sensitivity” has been used to describe 2001, U.S. Non-provisional application Ser. No. 10/171.468, many types of physiological damage produced at low, but filed Jun. 14, 2002 (now abandoned), U.S. Non-provisional above freezing, temperatures. Most crops of tropical origins application Ser. No. 10/225,066, filed Aug. 9, 2002, U.S. and Such as soybean, rice, maize and cotton are easily damaged by issued as U.S. Pat. No. 7,238.860. Non-provisional applica chilling. Typical chilling damage includes wilting, necrosis, tion Ser. No. 10/225,067, filed Aug.9, 2002 and issued as U.S. 35 chlorosis or leakage of ions from cell membranes. The under Pat. No. 7,135,616, U.S. Non-provisional application Ser. lying mechanisms of chilling sensitivity are not completely No. 10/225,068, filed Aug.9, 2002 and issued as U.S. Pat. No. understood yet, but probably involve the level of membrane 7,193,129, U.S. Provisional Application No. 60/434,166, saturation and other physiological deficiencies. For example, filed Dec. 17, 2002, and U.S. Non-provisional application photoinhibition of photosynthesis (disruption of photosyn Ser. No. 10/374,780, filed Feb. 25, 2003 and issued as U.S. 40 thesis due to high light intensities) often occurs under clear Pat. No. 7,511,190, the entire contents of which are hereby atmospheric conditions Subsequent to cold late Summer/au incorporated by reference. tumn nights. By Some estimates, chilling accounts for mon etary losses in the United States (US) second only to drought JOINT RESEARCH AGREEMENT and flooding. For example, chilling may lead to yield losses 45 and lower product quality through the delayed ripening of The claimed invention, in the field of functional genomics maize. Another consequence of poor growth is the rather poor and the characterization of plant genes for the improvement ground cover of maize fields in spring, often resulting in Soil of plants, was made by or on behalf of Mendel Biotechnology, erosion, increased occurrence of weeds, and reduced uptake Inc. and Monsanto Company as a result of activities under of nutrients. A retarded uptake of mineral nitrogen could also taken, within the scope of a joint research agreement in effect 50 lead to increased losses of nitrate into the ground water. on or before the date the claimed invention was made. Category: Abiotic Stress; Desired Trait: Freezing Tolerance. Freezing is a major environmental stress that limits where TECHNICAL FIELD crops can be grown and that reduces yields considerably, depending on the weather in a particular growing season. In This invention relates to the field of plant biology. More 55 addition to exceptionally stressful years that cause measur particularly, the present invention pertains to compositions able losses of billions of dollars, less extreme stress almost and methods for phenotypically modifying a plant. certainly causes Smaller yield reductions over larger areas to produce yield reductions of similar dollar value every year. BACKGROUND OF THE INVENTION For instance, in the US, the 1995 early fall frosts are estimated 60 to have caused losses of over one billion dollars to corn and A plant's traits, such as its biochemical, developmental, or soybeans. The spring of 1998 saw an estimated S200 M of phenotypic characteristics, may be controlled through a num damages to Georgia alone, in the peach, blueberry and Straw ber of cellular processes. One important way to manipulate berry industries. The occasional freezes in Florida have that control is through transcription factors—proteins that shifted the citrus belt further south due to S100 M or more influence the expression of a particular gene or sets of genes. 65 losses. California sustained S650M of damage in 1998 to the Transformed and transgenic plants that comprise cells having citrus crop due to a winter freeze. In addition, certain crops altered levels of at least one selected transcription factor, for Such as Eucalyptus, which has the very favorable properties US 8,809,630 B2 3 4 of rapid growth and good wood quality for pulping, are not major natural disasters of all time, causing not only economic able to grow in the Southeastern states due to occasional damage, but also loss of human lives. For example, losses freezes. from the US drought of 1988 exceeded $40 billion, exceeding Inherent winter hardiness of the crop determines in which the losses caused by Hurricane Andrew in 1992, the Missis agricultural areas it can Survive the winter. For example, for 5 sippi River floods of 1993, and the San Francisco earthquake wheat, the northern central portion of the US has winters that in 1989. In some areas of the world, the effects of drought can are too cold for goodwinter wheatcrops. Approximately 20% be far more severe. In the Horn of Africa the 1984-1985 of the US wheat crop is spring wheat, with a market value of drought led to a famine that killed 750,000 people. S2 billion. Areas growing spring wheat could benefit by grow Problems for plants caused by low water availability ing winter wheat that had increased winter hardiness. ASSum 10 include mechanical stresses caused by the withdrawal of cel ing a 25% yield increase when growing winter wheat, this lular water. Drought also causes plants to become more Sus would create S500 M of increased value. Additionally, the ceptible to various diseases (Simpson (1981). “The Value of existing winter wheat is severely stressed by freezing condi Physiological Knowledge of Water Stress in Plants’. In Water tions and should have improved yields with increased toler Stress on Plants, (Simpson, G. M., ed.), Praeger, N.Y., pp. ance to these stresses. An estimate of the yield benefit of these 15 235-265). traits is 10% of the S4.4 billion winter wheatcrop in the US or In addition to the many land regions of the world that are S444 Mofyield increase, as well as better survival in extreme too arid for most if not all crop plants, overuse and over freezing conditions that occur periodically. utilization of available water is resulting in an increasing loss Thus plants more resistant to freezing, both midwinter ofagriculturally-usable land, a process which, in the extreme, freezing and Sudden freezes, would protect a farmers invest results in desertification. The problem is further compounded ment, improve yield and quality, and allow some geographies by increasing salt accumulation in soils, as described above, to grow more profitable and productive crops. Additionally, which adds to the loss of available water in soils. winter crops such as canola, wheat and barley have 25% to Category: Abiotic Stress; Desired Trait: Heat Tolerance. 50% yield increases relative to spring planted varieties of the Germination of many crops is very sensitive to tempera same crops. This yield increase is due to the “head start” the 25 ture. A transcription factor that would enhance germination in fall planted crop has over the spring planted crop and its hot conditions would be useful for crops that are planted late reaching maturity earlier while the temperatures, soil mois in the season or in hot climates. ture and lack of pathogens provide more favorable conditions. Seedlings and mature plants that are exposed to excess heat Category: Abiotic Stress; Desired Trait: Salt Tolerance. may experience heat shock, which may arise in various One in five hectares of irrigated land is damaged by salt, an 30 organs, including leaves and particularly fruit, when transpi important historical factor in the decline of ancient agrarian ration is insufficient to overcome heat stress. Heat also dam societies. This condition is only expected to worsen, further ages cellular structures, including organelles and cytoskel reducing the availability of arable land and crop production, eton, and impairs membrane function (Buchanan, Supra). since none of the top five food crops—wheat, corn, rice, Heat shock may result a decrease in overall protein synthe potatoes, and Soybean—can tolerate excessive salt. 35 sis, accompanied by expression of heat shock proteins. Heat Detrimental effects of salt on plants are a consequence of shock proteins function as chaperones and are involved in both water deficit resulting in osmotic stress (similar to refolding proteins denatured by heat. drought stress) and the effects of excess Sodium ions on Category: Abiotic Stress: Desired Trait: Tolerance to Low critical biochemical processes. As with freezing and drought, Nitrogen and Phosphorus. high saline causes water deficit; the presence of high salt 40 The ability of all plants to remove nutrients from their makes it difficult for plant roots to extract water from their environment is essential to survival. Thus, identification of environment (Buchanan et al. (2000) in Biochemistry and genes that encode polypeptides with transcription factor Molecular Biology of Plants, American Society of Plant activity may allow for the generation of transgenic plants that Physiologists, Rockville, Md.). Soil salinity is thus one of the are better able to make use of available nutrients in nutrient more important variables that determines where a plant may 45 poor environments. thrive. In many parts of the world, sizable land areas are Among the most important macronutrients for plant uncultivable due to naturally high soil salinity. To compound growth that have the largest impact on crop yield are nitrog the problem, salination of soils that are used for agricultural enous and phosphorus-containing compounds. Nitrogen- and production is a significant and increasing problem in regions phosphorus-containing fertilizers are used intensively in agri that rely heavily on agriculture. The latter is compounded by 50 culture practices today. An increase in grain crop yields from over-utilization, over-fertilization and water shortage, typi 0.5 to 1.0 metric tons per hectare to 7 metric tons per hectare cally caused by climatic change and the demands of increas accompanied the use of commercial fixed nitrogen fertilizer ing population. Salt tolerance is of particular importance in production farming (Vance (2001) Plant Physiol. 127: early in a plants lifecycle, since evaporation from the soil 390-397). Given current practices, in order to meet food pro Surface causes upward water movement, and Salt accumulates 55 duction demands in years to come, considerable increases in in the upper soil layer where the seeds are placed. Thus, the amount of nitrogen- and phosphorus-containing fertiliz germination normally takes place at a salt concentration ers will be required (Vance, Supra). much higher than the mean salt level in the whole soil profile. Nitrogen is the most abundant element on earth yet it is one Category: Abiotic Stress: Desired Trait: Drought Tolerance. of the most limiting elements to plant growth due to its lack of While much of the weather that we experience is brief and 60 availability in the soil. Plants obtain N from the soil from short-lived, drought is a more gradual phenomenon, slowly several Sources including commercial fertilizers, manure and taking hold of an area and tightening its grip with time. In the mineralization of organic matter. The intensive use of N severe cases, drought can last for many years, and can have fertilizers in present agricultural practices is problematic, the devastating effects on agriculture and water Supplies. With energy intensive Haber-Bosch process makes N fertilizer and burgeoning population and chronic shortage of available 65 it is estimated that the US uses annually between 3-5% of the fresh water, drought is not only the number one weather nation's natural gas for this process. In addition to the expense related problem in agriculture, it also ranks as one of the ofN fertilizer production and the depletion of non-renewable US 8,809,630 B2 5 6 resources, the use of N fertilizers has led to the eutrophication Black Sigatoka, a fungal disease caused by Mycosphaer of freshwater ecosystems and the contamination of drinking ella species that attacks banana foliage, is spreading through water due to the runoff of excess fertilizer into ground water out the regions of the world that are responsible for producing Supplies. most of the world's banana crop; Phosphorus is second only to N in its importance as a Eutylpa dieback, caused by Euty palata, affects a number of macronutrient for plant growth and to its impact on crop yield. crop plants, including vine grape. Eutypa dieback delays Phosphorus (P) is extremely immobile and not readily avail shoot emergence, and causes chlorosis, stunting, and tattering able to roots in the soil and is therefore often growth limiting of leaves; to plants. Inorganic phosphate (Pi) is a constituent of several Pierce's disease, caused by the bacterium Xylella fastid 10 iosa, precludes growth of grapes in the Southeastern United important molecules required for energy transfer, metabolic States, and threatens the profitable wine grape industry in regulation and protein activation (Marschner (1995) Mineral northern California. The bacterium clogs the vasculature of Nutrition of Higher Plants, 2nd ed., Academic Press, San the grapevines, resulting in foliar Scorching followed by slow Diego, Calif.). Plants have evolved several strategies to help death of the vines. There is no known treatment for Pierce's cope with P and N deprivation that include metabolic as well 15 disease; as developmental adaptations. Most, if not all, of these strat Bacterial Spot caused by the bacterium Xanthomonas egies have components that are regulated at the level of tran campestris causes serious disease problems on tomatoes and Scription and therefore are amenable to manipulation by tran peppers. It is a significant problem in the Florida tomato Scription factors. Metabolic adaptations include increasing industry because it spreads rapidly, especially in warm peri the availability of P and N by increasing uptake from the soil ods where there is wind-driven rain. Under these conditions, though the induction of high affinity and low affinity trans there are no adequate control measures; porters, and/or increasing its mobilization in the plant. Devel Diseases caused by viruses of the family Geminiviridae are opmental adaptations include increases in primary and sec a growing agricultural problem worldwide. Geminiviruses ondary roots, increases in root hair number and length, and have caused severe crop losses intomato, cassaya, and cotton. associations with mycorrhizal fungi (Bates and Lynch (1996) 25 For instance, in the 1991-1992 growing season in Florida, Plant Cell Environ. 19:529-538; Harrison (1999) Annu. Rev. geminiviruses caused S140 million in damages to the tomato Plant Physiol. Plant Mol. Biol. 50: 361-389). crop (Moffat (1991) Science 286: 1835). Geminiviruses have Category: Biotic Stress: Desired Trait: Disease Resistance. the ability to recombine between strains to rapidly produce Disease management is a significant expense in crop pro new virulent varieties. Therefore, there is a pressing need for duction worldwide. According to EPA reports for 1996 and 30 broad-spectrum geminivirus control; 1997, us farmers spend approximately S6 billion on fungi The Soybean cyst nematode, Heterodera glycines, causes cides annually. Despite this expenditure, according to a sur stunting and chlorosis of soybean plants, which results in vey conducted by the food and agriculture organization, plant yield losses or plant death from severe infestation. Annual diseases still reduce worldwide crop productivity by 12% and losses in the United States have been estimated at S1.5 billion in the United States alone, economic losses due to plant 35 (University of Minnesota Extension Service). pathogens amounts to 9.1 billion dollars (FAO, 1993). Data The aforementioned pathogens represent a very small frac from these reports and others demonstrate that despite the tion of diverse species that seriously affect plant health and availability of chemical control only a small proportion of the yield. For a more complete description of numerous plant losses due to disease can be prevented. Not only are fungi diseases, see, for example, Vidhyasekaran (1997) Fungal cides and anti-bacterial treatments expensive to growers, but 40 Pathogenesis in Plants and Crops. Molecular Biology and their widespread application poses both environmental and Host Defense Mechanisms, Marcel Dekker, Monticello, health risks. The use of plant biotechnology to engineer dis N.Y.), or Agrios (1997) Plant Pathology, Academic Press, ease resistant crops has the potential to make a significant New York, N.Y.). Plants that are able to resist disease may economic impact on agriculture and forestry industries in two produce significantly higher yields and improved food qual ways: reducing the monetary and environmental expense of 45 ity. It is thus of considerable importance to find genes that fungicide application and reducing both pre-harvest and post reduce or prevent disease. harvest crop losses that occur now despite the use of costly Category: Light Response: Desired Trait: Reduced Shade disease management practices. Avoidance. Fungal, bacterial, oomycete, viral, and nematode diseases Shade avoidance describes the process in which plants of plants are ubiquitous and important problems, and often 50 grown in close proximity attempt to out-compete each other severely impact yield and quality of crop and other plants. A by increasing stem length at the expense of leaf fruit and very few examples of diseases of plants include: storage organ development. This is caused by the plants Powdery mildew, caused by the fungi Erysiphe, Sphaeroth response to far-red radiation reflected from leaves of neigh eca, Phyllactinia, Microsphaera, Podosphaera or Uncinula, boring plants, which is mediated by phytochrome photore in, for example, wheat, bean, cucurbit, lettuce, pea, grape, tree 55 ceptors. Close proximity to other plants, as is produced in fruit crops, as well as roses, phlox, lilacs, grasses, and Euony high-density crop plantings, increases the relative proportion muS. of far-red irradiation, and therefore induces the shade avoid Fusarium-caused diseases such as Fusarium wilt in cucur ance response. Shade avoidance adversely affects biomass bits, Fusarium head blight in barley and wheat, wilt and and yield, particularly when leaves, fruits or other storage crown and root rot in tomatoes; 60 organs constitute the desired crop (see, for example, Smith Sudden oak death, caused by the oomycete Phytophthora (1982) Annu. Rev. Plant Physiol. 33: 481-518; Ballare et al. ramorum; this disease was first detected in 1995 in California (1990) Science 247:329-332: Smith (1995) Annu. Dev. Plant tan oaks. The disease has since killed more than 100,000 tan Physiol. Mol. Biol., 46: 289-315; and Schmitt et al. (1995), oaks, coast live oaks, black oaks, and Shreve's oaks in coastal American Naturalist, 146: 937-953). Alteration of the shade regions of northern California, and more recently in South 65 avoidance response in tobacco through alteration of phyto western Oregon (Roach (2001) National Geographic News, chrome levels has been shown to produce an increase in Dec. 6, 2001): harvest index (leaf biomass/total biomass) at high planting US 8,809,630 B2 7 8 density, which would result in higher yield (Robson et al. cence in an agricultural setting has significant value. For (1996) Nature Biotechnol. 14:995-998). example, a delay in leafsenescence in some maize hybrids is Category: Flowering Time: Desired Trait: Altered Flowering associated with a significant increase in yields and a delay of Time and Flowering Control. a few days in the Senescence of soybean plants can have a Timing of flowering has a significant impact on production large impact on yield. In an experimental setting, tobacco of agricultural products. For example, varieties with different plants engineered to inhibit leaf senescence had a longer flowering responses to environmental cues are necessary to photosynthetic lifespan, and produced a 50% increase in dry adapt crops to different production regions or systems. Such weight and seed yield (Ganand Amasino (1995) Science 270: a range of varieties have been developed for many crops, 1986-1988). Delayed flower senescence may generate plants including wheat, corn, soybean, and Strawberry. Improved 10 methods for alteration of flowering time will facilitate the that retain their blossoms longer and this may be of potential development of new, geographically adapted varieties. interest to the ornamental horticulture industry, and delayed Breeding programs for the development of new varieties foliar and fruit senescence could improve post-harvest shelf can be limited by the seed-to-seed cycle. Thus, breeding new life of produce. varieties of plants with multi-year cycles (such as biennials, 15 Further, programmed cell death plays a role in other plant e.g. carrot, or fruit trees, such as citrus) can be very slow. With responses, including the resistance response to disease, and respect to breeding programs, there would be a significant Some symptoms of diseases, for example, as caused by advantage in having commercially valuable plants that necrotrophic pathogens such as Botrytis cinerea and Sclero exhibit controllable and modified periods to flowering tinia sclerotiorum (Dickman et al. Proc. Natl. Acad. Sci., 98: (“flowering times'). For example, accelerated flowering 6.957-6962). Localized senescence and/or cell death can be would shorten crop and tree breeding programs. used by plants to contain the spread of harmful microorgan Improved flowering control allows more than one planting isms. A specific localized cell death response, the "hypersen and harvest of a crop to be made within a single season. Early sitive response', is a component of race-specific disease flowering would also improve the time to harvest plants in resistance mediated by plant resistance genes. The hypersen which the flower portion of the plant constitutes the product 25 sitive response is thought to help limit pathogen growth and to (e.g., broccoli, cauliflower, and other edible flowers). In addi initiate a signal transduction pathway that leads to the induc tion, chemical control of flowering through induction or inhi tion of systemic plant defenses. Accelerated senescence may bition of flowering in plants could provide a significant be a defense against obligate pathogens, such as powdery advantage to growers by inducing more uniform fruit produc mildew, that rely on healthy plant tissue for nutrients. With tion (e.g., in Strawberry) 30 regard to powdery mildew, Botrytis cinerea and Sclerotinia A sizable number of plants for which the vegetative portion sclerotiorum and other pathogens, transcription factors that of the plant forms the valuable crop tend to “bolt” dramati ameliorate cell death and/or damage may reduce the signifi cally (e.g., spinach, onions, lettuce), after which biomass cant economic losses encountered, such as, for example, Bot production declines and product quality diminishes (e.g., rytis cinerea in Strawberry and grape. through flowering-triggered senescence of vegetative parts). 35 Category: Growth Regulator; Desired Trait: Altered Sugar Delay or prevention of flowering may also reduce or preclude Sensing dissemination of pollen from transgenic plants. Sugars are key regulatory molecules that affect diverse Category: Growth Rate: Desired Trait: Modified Growth processes in higher plants including germination, growth, Rate. flowering, senescence, Sugar metabolism and photosynthesis. For almost all commercial crops, it is desirable to use 40 Sucrose, for example, is the major transport form of photo plants that establish more quickly, since seedlings and young synthate and its flux through cells has been shown to affect plants are particularly Susceptible to stress conditions such as gene expression and alter storage compound accumulation in salinity or disease. Since many weeds may outgrow young seeds (source-sink relationships). Glucose-specific hexose crops or out-compete them for nutrients, it would also be sensing has also been described in plants and is implicated in desirable to determine means for allowing young crop plants 45 cell division and repression of “famine genes (photosyn to out compete weed species. Increasing seedling growth rate thetic or glyoxylate cycles). (emergence) contributes to seedling vigor and allows for Category: Morphology; Desired Trait: Altered Morphology crops to be planted earlier in the season with less concern for Trichomes are branched or unbranched epidermal out losses due to environmental factors. Early planting helps add growths or hair structures on a plant. Trichomes produce a days to the critical grain-filling period and increases yield. 50 variety of secondary biochemicals such as diterpenes and Providing means to speed up or slow down plant growth waxes, the former being important as, for example, insect would also be desirable to ornamental horticulture. If such pheromones, and the latter as protectants against desiccation means be provided, slow growing plants may exhibit pro and herbivorous pests. Since diterpenes also have commercial longed pollen-producing or fruiting period, thus improving value as flavors, aromas, pesticides and cosmetics, and poten fertilization or extending harvesting season. 55 tial value as anti-tumor agents and inflammation-mediating Category: Growth Rate: Desired Trait: Modified Senescence Substances, they have been both products and the target of and Cell Death. considerable research. In most cases where the metabolic Premature senescence, triggered by various plant stresses, pathways are impossible to engineer, increasing trichome can limit production of both leaf biomass and seed yield. density or size on leaves may be the only way to increase plant Transcription factor genes that Suppress premature Senes 60 productivity. Thus, it would be advantageous to discover tri cence or cell death in response to stresses can provide means chome-affecting transcription factor genes for the purpose of for increasing yield. Delay of normal developmental senes increasing trichome density, size, or type to produce plants cence could enhance yield also, particularly for those plants that are better protected from insects or that yield higher for which the vegetative part of the plant represents the com amounts of secondary metabolites. mercial product (e.g., spinach, lettuce). 65 The ability to manipulate wax composition, amount, or Although leaf senescence is thought to be an evolutionary distribution could modify plant tolerance to drought and low adaptation to recycle nutrients, the ability to control senes humidity or resistance to insects, as well as plant appearance. US 8,809,630 B2 10 In particular, a possible application for a transcription factor Altered amino acid composition of seeds, through altered gene that reduces wax production in Sunflower seed coats protein composition, is also a desired objective for nutritional would be to reduce fouling during seed oil processing. Anti improvement. sense or co-suppression of transcription factors involved in Category: Seed Biochemistry Desired Trait: Altered Prenyl wax biosynthesis in a tissue specific manner can be used to Lipids. specifically alterwax composition, amount, or distribution in Prenyl lipids, including the tocopherols, play a role in those plants and crops from which wax is either a valuable anchoring proteins in membranes or membranous organelles. attribute or product or an undesirable constituent of plants. Tocopherols have both anti-oxidant and vitamin E activity. Other morphological characteristics that may be desirable Modified tocopherol composition of plants may thus be use 10 ful in improving membrane integrity and function, which in plants include those of an ornamental nature. These include may mitigate abiotic stresses such as heat stress. Increasing changes in seed color, overall color, leaf and flower shape, the anti-oxidant and vitamin content of plants through leaf color, leaf size, or glossiness of leaves. Plants that pro increased tocopherol content can provide useful human duce dark leaves may have benefits for human health; fla health benefits. vonoids, for example, have been used to inhibit tumor growth, 15 Category: LeafBiochemistry: Desired Trait: Altered Glucosi prevent of bone loss, and prevention lipid oxidation in ani nolate Levels mals and humans. Plants in which leaf size is increased would Increases or decreases in specific glucosinolates or total likely provide greater biomass, which would be particularly glucosinolate content can be desirable depending upon the valuable for crops in which the vegetative portion of the plant particular application. For example: (i) glucosinolates are constitutes the product. Plants with glossy leaves generally undesirable components of the oilseeds used in animal feed, produce greater epidermal wax, which, if it could be aug since they produce toxic effects; low-glucosinolate varieties mented, resulted in a pleasing appearance for many ornamen of canola have been developed to combat this problem; (ii) tals, help prevent desiccation, and resist herbivorous insects Some glucosinolates have anti-cancer activity; thus, increas and disease-causing agents. Changes in plant or plant part ing the levels or composition of these compounds can be of coloration, brought about by modifying, for example, antho 25 use in production of nutraceuticals; and (iii) glucosinolates cyanin levels, would provide novel morphological features. form part of a plant's natural defense against insects; modi In many instances, the seeds of a plant constitute a valuable fication of glucosinolate composition or quantity could there crop. These include, for example, the seeds of many legumes, fore afford increased protection from herbivores. Further nuts and grains. The discovery of means for producing larger more, tissue specific promoters can be used in edible crops to seed would provide significant value by bringing about an 30 ensure that these compounds accumulate specifically in par increase in crop yield. ticular tissues, such as the epidermis, which are not taken for Plants with altered inflorescence, including, for example, human consumption. larger flowers or distinctive floral configurations, may have Category: Leaf Biochemistry: Desired Trait: Flavonoid Pro high value in the ornamental horticulture industry. duction. Modifications to flower structure may have advantageous 35 Expression of transcription factors that increase flavonoid or deleterious effects on fertility, and could be used, for production in plants, including anthocyanins and condensed example, to decrease fertility by the absence, reduction or tannins, may be used to alter pigment production for horti screening of reproductive components. This could be a desir cultural purposes, and possibly to increase stress resistance. able trait, as it could be exploited to prevent or minimize the Flavonoids have antimicrobial activity and could be used to escape of the pollen of genetically modified organisms into 40 engineer pathogen resistance. Several flavonoid compounds the environment. have human health promoting effects Such as inhibition of Manipulation of inflorescence branching patterns may also tumor growth, prevention of bone loss and prevention of lipid be used to influence yield and offer the potential for more oxidation. Increased levels of condensed tannins in forage effective harvesting techniques. For example, a “self prun legumes would provide agronomic benefits in ruminants by ing mutation of tomato results in a determinate growth pat 45 preventing pasture bloat by collapsing protein foams within tern and facilitates mechanical harvesting (Pnueli et al. the rumen. For a review on the utilities of flavonoids and their (2001) Plant Cell 13(12): 2687-2702). derivatives, see Dixon et al. (1999) Trends Plant Sci. 4: 394 Alterations of apical dominance or plantarchitecture could 400. create new plant varieties. Dwarf plants may be of potential Genetic and molecular studies on Arabidopsis have interest to the ornamental horticulture industry. 50 revealed that the timing of flowering is influenced by a large Category: Seed Biochemistry Desired Trait: Altered Seed Oil number of different genes (Martinez-Zapater and Somerville The composition of seeds, particularly with respect to seed (1990) Plant Physiol. 92: 770-776: Koornneef et al. (1991) oil quantity and/or composition, is very important for the Mol. Gen. Genet. 229:57-66; Martinez-Zapater et al. (1994) nutritional value and production of various food and feed In Meyerowitz and Somerville, editors, Arabidopsis, Cold products. Desirable improvements to oils include enhanced 55 Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., heat stability, improved nutritional quality through, for pp. 403-433; Koornneef et al. (1998a) Annu. Rev. Plant example, reducing the number of calories in seed, increasing Physiol. Plant Mol. Biol. 49: 345-370; Koornneef et al. the number of calories in animal feeds, or altering the ratio of (1998b) Genetics 148: 885-892: Levy and Dean (1998) Plant saturated to unsaturated lipids comprising the oils. Cell 10: 1973-1990; Simpson et al. (1999) Annu. Rev. Cell Category: Seed Biochemistry: Desired Trait: Altered Seed 60 Dev. Biol. 15: 519-550; Simpson and Dean (2002) Science Protein 296: 285-289; and Ratcliffe and Riechmann (2002) Curr: As with seed oils, seed protein content and composition is Issues Mol. Biol. 4: 77-91). Such loci ensure that the switch very important for the nutritional value and production of from vegetative to reproductive growth takes place at the most various food and feed products. Altered protein content or appropriate time with respect to a variety of abiotic and biotic concentration in seeds may be used to provide nutritional 65 variables. Amongst the most intensively studied effects are benefits, and may also prolong storage capacity, increase seed the responses to day length and prolonged exposure to low pest or disease resistance, or modify germination rates. temperatures (vernalization). US 8,809,630 B2 11 12 Arabidopsis flowers rapidly in long day photoperiodic con During vernalization, FLC transcript and protein levels ditions of 16 hours or continuous light. However, under short fall, and the plants become competent to flower (Michaels and day conditions of 8-10 hours of light, the plants display a Amasino (1999) Plant Cell 11:949-956; Michaels and Ama much more extensive period of vegetative growth prior to sino (2001) supra; Sheldon et al. (1999) supra; Sheldon et al. flowering. Genes that control this day length response were (2000) Proc. Natl. Acad. Sci. 97: 3753-3758; Johanson et al. originally identified via mutations that cause late flowering (2000) supra; and Rouse et al. (2002) supra). Additionally, under long days, but which do not alter flowering time in short overexpression of FLC from a 35S CaMV promoter in the day conditions. Examples of photoperiod pathway genes Landsberg ecotype (which lacks an active FRI allele) is suf include CONSTANS (CO), GIGANTEA (GI), FE, FD, and ficient to severely delay or prevent flowering, and renders the FHA. A second group of genes, which includes LUMINIDE 10 plants insensitive to Vernalization (Michaels and Amasino PENDENS (LD), FCA, FVE, FY, and FPA, form an autono (1999) supra; Sheldon et al. (1999) supra). These findings mous pathway that monitors the developmental state of the indicate that FLC is a potent floral repressor; it has now been plant and is active under all photoperiodic conditions. shown that such repression is achieved by FLC inhibiting Mutants for this second class of genes flower later than wild downstream genes that promote flowering, including SOC1 type controls irrespective of the day length (Koornneef et al. 15 and FT (Borner et al. (2000) Plant J. 24: 591-599; Lee et al. (1991); Martinez-Zapater et al. (1994); Koornneef et al. (2000) Genes Dev. 14: 2366-2376; Onouchi et al. (2000) (1998a); and Koornneef et al. (1998b); all supra). Plant Cell 12: 885-900; Samach et al. (2000) Science 288: Importantly, mutants from the photoperiod and autono 1613-1616; Michaels and Amasino (2001) supra). Thus, pro mous pathways also show a differential response to Vernal motion of flowering by either the autonomous pathway or ization. Via a Vernalization response, Arabidopsis ecotypes vernalization involves repression of FLC and the subsequent from northern latitudes, such as Stockholm, Sweden, are de-repression of FLC targets. Recently, regions within the adapted to flower in the spring following exposure to cold FLC gene and its promoter have been defined which are winter conditions. This avoids flowering in the late summer required for its vernalization induced repression (Sheldon et when seed maturation might be curtailed by the onset of al. (2002) Supra). However, the molecular signaling events winter conditions. (See, for example, Vince-Prue (1975) In 25 that lead to a fall in FLC levels during vernalization are still Photoperiodism in plants McGraw Hill, London, UK, pp unclear. The products of VERNALIZATION2 and VER 263-291; Napp-Zinn (1957) Z. Indukt. Abstammungs Ver NALIZATION1 maintain repression of FLC, once levels of erbungsl. 88: 253-285; and Reeves and Coupland (2000) FLC transcript have declined (Gendalletal. (2001) Cell 107: Curr. Opin. Plant Biol. 3:37-42). 525-535; Levy et al. (2002) Science 297: 243-246), but it is When such ecotypes are grown in the laboratory they 30 not yet known how the decline is initially achieved. flower late, but will flower much earlier if subjected to a cold A number of additional questions, regarding the molecular period of 4-8 weeks while the seed is germinating. In a com basis of Vernalization, still remain unanswered. First, it has parable manner, mutants from the autonomous pathway been observed that null fle mutants are responsive to vernal exhibit a very marked reduction in flowering time when sub ization (Michaels and Amasino (2001) supra). Therefore ver jected to Vernalization. By contrast, mutants from the photo 35 nalization can promote flowering by other mechanisms as period pathway show only a minor response to cold treat well as via repression of FLC. In addition, vernalization is a ments. Thus, Vernalization can overcome the requirement for quantitative response to prolonged periods of cold (Sheldon the autonomous pathway. (See Martinez-Zapater and Somer et al. (2000) supra); a mechanism must therefore exist to ville (1990) supra; Koornneef et al. (1991) supra; Bagnall ensure that Vernalization does not always occur in response to (1992) Aust. J. Plant Physiol. 19:401-409: Burnet al. (1993) 40 short periods of cold, lasting only a few days. Proc. Natl. Acad. Sci. 90: 287-291; Lee et al. (1993) Mol. Vernalization may also be desirable in plants that do not Gen. Genet. 237: 171-176: Clarke and Dean (1994) Mol. Gen. normally have a vernalization response. Such plants in which Genet. 242: 81-89; Chandler et al. (1996) Plant J. 10: 637 expression of a polynucleotide creates a vernalization 644: Koornneef et al. (1998b) supra.) response therefore may be propagated and cultivated at dif Genetic and molecular analyses have revealed that a 45 ferent latitudes and/or altitudes compared with the native MADS box protein, FLOWERING LOCUS C (FLC), is a plant species that do not express a polynucleotide creating a major determinant of the Vernalization response (Koornneef Vernalization response. et al. (1994) supra; Lee et al. (1994) supra; Sanda and Ama The present invention relates to methods and compositions sino (1996) Mol. Gen. Genet. 251: 69-74; Michaels and Ama for producing transgenic plants with modified traits, particu sino (2000) Plant Cell and Environment 23: 1145-1153; Shel 50 larly traits that address the agricultural and food needs donetal. (1999) Plant Cell 11:445-458: Sheldon et al. (2002) described in the above background information. These traits Plant Cell 14:2527-2537; and Rouse et al. (2002) Plant J. may provide significant value in that they allow the plant to 29:183-191). High levels of both FLC gene transcript and thrive in hostile environments, where, for example, tempera protein are present in mutants for the autonomous pathway ture, water and nutrient availability or salinity may limit or and also in naturally late flowering northern ecotypes, which 55 prevent growth of non-transgenic plants. The traits may also contain active alleles of a second locus, FRIGIDA (FRI; Burn comprise desirable morphological alterations, larger or et al. (1993) supra; Clarke and Dean (1994) supra; Johanson Smaller size, disease and pest resistance, alterations in flow et al. (2000) Science 290: 344-347). By contrast, mutants ering time, light response, and others. from the photoperiod pathway, and backgrounds lacking an We have identified polynucleotides encoding transcription active FRI allele, show relatively low levels of FLC transcript. 60 factors, developed numerous transgenic plants using these Furthermore, null alleles of fle completely suppress the late polynucleotides, and have analyzed the plants for a variety of flowering caused by autonomous pathway mutations and important traits. In so doing, we have identified important active FRI alleles, but have no effect on the delayed flowering polynucleotide and polypeptide sequences for producing in photoperiod pathway mutants (Michaels and Amasino commercially valuable plants and crops as well as the meth (2001) Plant Cell 13:935-941). FLC gene expression there 65 ods for making them and using them. Other aspects and fore appears to be supported by FRI and strongly repressed by embodiments of the invention are described below and can be floral activators within the autonomous pathway. derived from the teachings of this disclosure as a whole. US 8,809,630 B2 13 14 SUMMARY OF THE INVENTION 1662, 1663, 1664, 1665, 1666, 1667, 1668, 1669, 1670, 1671, 1674, 1675, 1676, 1677, 1678, 1679, 1681, 1683, 1684, 1685, Transgenic plants and methods for producing transgenic 1690, 1691, 1692, 1693, 1694, 1695, 1697, 1698, 1699, 1700, plants are provided. The transgenic plants a recombinant 1701, 1704, 1705, 1706, 1707, 1708, 1709, 1710, 1711, 1944, polynucleotide having a polynucleotide sequence, or a 1946, 1948, 1950, 1952, 1954, 1956, 1958, 1960, 1962, 1964, complementary polynucleotide sequence thereof, that 1966, 1968, 1970, and 1972. encodes a transcription factor. The transcription factors are comprised of polypeptide The polynucleotide sequences that may encode the tran sequences listed in the Sequence Listing and include any of Scription factors are listed in the Sequence Listing and include SEQ ID NO: 2N, wherein N=1-480, SEQ ID NO: 2N-1, any of SEQID NO: 2N-1, where N=1-480, SEQID NO: 2N, 10 where N=857-970, or SEQ ID NO: 989, 990, 991, 1001, where N=856-969, or SEQID NO: 961, 962, 963,964,965, 1002, 1012, 1018, 1021, 1022, 1025, 1027, 1028, 1029, 1034, 967,968, 969, 970,971,972,973, 974,975, 976, 977,978, 1050, 1051, 1072, 1073, 1074, 1075, 1076, 1091, 1092, 1093, 979,980, 981,982,983, 984, 985, 986,987, 988, 992, 993, 1094, 1095, 1109, 1110, 1111, 1112, 1150, 1165, 1166, 1167, 994, 995, 996,997, 998,999, 1000, 1003, 1004, 1005, 1006, 1168, 1169, 1189, 1190, 1191, 1197, 1198, 1199, 1213, 1214, 1007, 1008, 1009, 1010, 1011, 1013, 1014, 1015, 1016, 1017, 15 1215, 1216, 1226, 1227, 1233, 1239, 1246, 1247, 1258, 1259, 1019, 1020, 1023, 1024, 1026, 1030, 1031, 1032, 1033, 1035, 1269,1307, 1308, 1309, 1310, 1323, 1329, 1330, 1331, 1332, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043, 1044, 1045, 1338,1339,1340, 1361, 1362, 1373, 1374, 1375, 1384, 1389, 1046, 1047, 1048, 1049, 1052, 1053, 1054, 1055, 1056, 1057, 1390, 1391, 1396, 1411, 1412, 1413, 1414, 1424, 1435, 1436, 1058, 1059, 1060, 1061, 1062, 1063, 1064, 1065, 1066, 1067, 1437, 1448, 1456, 1457, 1458, 1459, 1460, 1472, 1483, 1484, 1068, 1069,1070, 1071, 1077, 1078, 1079, 1080, 1081, 1082, 1500, 1508,1510, 1511, 1520, 1538, 1539, 1540, 1541, 1542, 1083, 1084, 1085, 1086, 1087, 1088, 1089, 1090, 1096, 1097, 1543, 1563, 1564, 1565, 1566, 1567, 1568, 1569, 1582, 1583, 1098, 1099, 1100, 1101, 1102,1103, 1104, 1105, 1106, 1107, 1594, 1611, 1612, 1618, 1619, 1620, 1626, 1627, 1635, 1636, 1108, 1113, 1114, 1115, 1116, 1117, 1118, 1119, 1120, 1121, 1640, 1641, 1655, 1656, 1657, 1658, 1672, 1673, 1680, 1682, 1122, 1123, 1124, 1125, 1126, 1127, 1128, 1129, 1130, 1131, 1686, 1687, 1688, 1689, 1696, 1702, 1703, 1945, 1947, 1949, 1132, 1133, 1134, 1135, 1136, 1137, 1138, 1139, 1140, 1141, 25 1951, 1953, 1955, 1957, 1959, 1961, 1963, 1965, 1967, 1969, 1142, 1143, 1144, 1145,1146, 1147, 1148, 1149, 1151,1152, 1971, and 1973. 1153,1154, 1155, 1156, 1157,1158, 1159, 1160, 1161,1162, The transgenic plant that comprises the recombinant poly 1163, 1164, 1170, 1171, 1172, 1173, 1174, 1175, 1176, 1177, nucleotide has a polynucleotide sequence (or a sequence that 1178, 1179, 1180, 1181,1182, 1183,1184, 1185, 1186, 1187, is complementary to this polynucleotide sequence) selected 1188, 1192, 1193, 1194, 1195, 1196, 1200, 1201, 1202, 1203, 30 from 1204, 1205, 1206, 1207, 1208, 1209, 1210, 1211, 1212, 1217, (a) a nucleotide sequence that encodes one of the afore 1218, 1219, 1220, 1221, 1222, 1223, 1224, 1225, 1228, 1229, mentioned transcription factor polypeptide sequences; or 1230, 1231, 1232, 1234, 1235, 1236, 1237, 1238, 1240, 1241, (b) a polypeptide sequence that comprises one of the afore 1242, 1243, 1244, 1245, 1248, 1249, 1250, 1251, 1252, 1253, mentioned transcription factor polypeptides. 1254, 1255, 1256, 1257, 1260, 1261, 1262, 1263, 1264, 1265, 35 In an example of a preferred embodiment, the transcription 1266, 1267,1268, 1270, 1271, 1272, 1273, 1274, 1275, 1276, factor polynucleotide sequence of (a) comprises G682, or 1277, 1278, 1279, 1280, 1281, 1282, 1283, 1284, 1285, 1286, SEQID NO: 467. In another example of a preferred embodi 1287, 1288, 1289, 1290, 1291, 1292, 1293, 1294, 1295, 1296, ment, the transcription factor polypeptide of (b) comprises 1297, 1298, 1299, 1300, 1301, 1302, 1303,1304, 1305, 1306, G682, or SEQID NO: 468. 1311, 1312, 1313, 1314, 1315, 1316, 1317, 1318, 1319, 1320, 40 The transgenic plant may also comprise a polynucleotide 1321, 1322, 1324, 1325, 1326, 1327, 1328, 1333, 1334, 1335, sequence that is a variant of the sequences in (a) and (b) that 1336, 1337, 1341, 1342, 1343, 1344, 1345, 1346, 1347, 1348, encode a polypeptide and initiate transcription, including: 1349, 1350, 1351, 1352, 1353, 1354, 1355, 1356, 1357, 1358, (c) a sequence variant of the nucleotide sequences of (a) or 1359,1360, 1363, 1364, 1365, 1366, 1367, 1368, 1369,1370, (b): 1371, 1372, 1376, 1377, 1378,1379, 1380, 1381, 1382, 1383, 45 (d) an allelic variant of the nucleotide sequences of (a) or 1385, 1386, 1387, 1388, 1392, 1393, 1394, 1395, 1397, 1398, (b): 1399, 1400, 1401, 1402, 1403, 1404,1405, 1406, 1407, 1408, (e) a splice variant of the nucleotide sequences of (a) or (b): 1409, 1410, 1415, 1416, 1417, 1418, 1419, 1420, 1421, 1422, (f) an orthologous sequence of the nucleotide sequences of 1423, 1425, 1426, 1427, 1428, 1429, 1430, 1431, 1432,1433, (a) or (b): 1434, 1438, 1439, 1440, 1441, 1442, 1443, 1444, 1445, 1446, 50 (g) a paralogous sequence of the nucleotide sequences of 1447, 1449, 1450, 1451, 1452, 1453, 1454, 1455, 1461, 1462, (a) or (b): 1463, 1464, 1465, 1466, 1467, 1468, 1469, 1470, 1471, 1473, (h) a nucleotide sequence encoding a polypeptide compris 1474, 1475, 1476, 1477, 1478, 1479, 1480, 1481, 1482, 1485, ing a conserved domain that exhibits at least 70% sequence 1486, 1487, 1488, 1489, 1490, 1491, 1492, 1493, 1494, 1495, homology with the polypeptide of (a), and the polypeptide 1496, 1497, 1498, 1499, 1501, 1502,1503, 1504,1505, 1506, 55 comprises a conserved domain that initiates transcription; or 1507, 1509, 1512, 1513, 1514, 1515, 1516, 1517, 1518, 1519, (i) a nucleotide sequence that hybridizes under stringent 1521, 1522, 1523, 1524, 1525, 1526, 1527, 1528, 1529, 1530, conditions to a nucleotide sequence of one or more poly 1531, 1532, 1533, 1534, 1535, 1536, 1537, 1544, 1545, 1546, nucleotides of (a) or (b), and the nucleotide sequence encodes 1547, 1548, 1549, 1550, 1551, 1552, 1553, 1554, 1555, 1556, a polypeptide that initiates transcription. 1557, 1558, 1559, 1560, 1561, 1562, 1570, 1571, 1572, 1573, 60 A transcription factor sequence variant is one having at 1574, 1575, 1576, 1577, 1578, 1579, 1580, 1581, 1584, 1585, least 26% amino acid sequence similarity, at least 40% amino 1586, 1587, 1588, 1589, 1590, 1591, 1592, 1593, 1595, 1596, acid sequence identity, a preferred transcription factor 1597, 1598, 1599, 1600, 1601, 1602, 1603, 1604, 1605, 1606, sequence variant is one having at least 50% amino acid 1607, 1608, 1609, 1610, 1613, 1614, 1615, 1616, 1617, 1621, sequence identity and a more preferred transcription factor 1622, 1623, 1624, 1625, 1628, 1629, 1630, 1631, 1632, 1633, 65 sequence variant is one having at least 65% amino acid 1634, 1637, 1638, 1639, 1642, 1643, 1644, 1645, 1646, 1647, sequence identity to the transcription factor amino acid 1648, 1649, 1650, 1651, 1652, 1653, 1654, 1659, 1660, 1661, sequences SEQID NO: 2N, wherein N=1-480, SEQID NO: US 8,809,630 B2 15 16 2N-1, where N=857-970, or SEQ ID NO: 989, 990, 991, Sequences having lesser degrees of identity but comparable 1001, 1002, 1012, 1018, 1021, 1022, 1025, 1027, 1028, 1029, biological activity are considered to be equivalents. 1034, 1050, 1051, 1072, 1073, 1074, 1075, 1076, 1091, 1092, In a further aspect, the invention provides a progeny plant 1093, 1094, 1095, 1109, 1110, 1111, 1112, 1150, 1165, 1166, derived from a parental plant wherein said progeny plant 1167, 1168, 1169, 1189, 1190, 1191, 1197, 1198, 1199, 1213, 1214, 1215, 1216, 1226, 1227, 1233,1239, 1246, 1247, 1258, exhibits, with respect to a specific gene, at least three fold 1259, 1269, 1307, 1308, 1309, 1310, 1323, 1329, 1330, 1331, greater messenger RNA (mRNA) levels than said parental 1332, 1338, 1339, 1340, 1361, 1362, 1373, 1374, 1375, 1384, plant, wherein the mRNA encodes a DNA-binding protein 1389, 1390, 1391, 1396, 1411, 1412, 1413, 1414, 1424, 1435, that is capable of binding to a DNA regulatory sequence and 1436, 1437, 1448, 1456, 1457, 1458, 1459, 1460, 1472, 1483, inducing expression of a plant trait gene, wherein the progeny 1484, 1500, 1508,1510, 1511, 1520, 1538, 1539, 1540, 1541, 10 plant is characterized by a change in the plant trait compared 1542, 1543, 1563, 1564, 1565, 1566, 1567, 1568, 1569, 1582, to said parental plant. In yet a further aspect, the progeny plant 1583, 1594, 1611, 1612, 1618, 1619, 1620, 1626, 1627, 1635, exhibits at least ten fold greater mRNA levels compared to 1636, 1640, 1641, 1655, 1656, 1657, 1658, 1672, 1673, 1680, said parental plant. In yet a further aspect, the progeny plant 1682, 1686, 1687, 1688, 1689, 1696, 1702, 1703, 1945, 1947, exhibits at least fifty fold greater mRNA levels compared to 1949, 1951, 1953, 1955, 1957, 1959, 1961, 1963, 1965, 1967, 15 said parental plant. 1969, 1971, and 1973, and which contains at least one func tional or structural characteristic of the transcription factor Various types of plants may be used to generate the trans amino acid sequences. Sequences having lesser degrees of genic plants, including soybean, wheat, corn, potato, cotton, identity but comparable biological activity are considered to rice, oilseed rape, Sunflower, alfalfa, clover, Sugarcane, turf. be equivalents. banana, blackberry, blueberry, strawberry, raspberry, canta The transcription factor polypeptides of the present inven loupe, carrot, cauliflower, coffee, cucumber, eggplant, tion include at least one conserved domain, and the portions grapes, honeydew, lettuce, mango, melon, onion, papaya, of the nucleotide sequences encoding the conserved domain peas, peppers, pineapple, pumpkin, spinach, Squash, Sweet exhibit at least 70% sequence identity with the aforemen corn, tobacco, tomato, watermelon, mint and other labiates, tioned preferred nucleotide sequences. In the case of Zinc 25 rosaceous fruits, and vegetable brassicas. finger transcription factors, the percent identity across the The transgenic plant may be monocotyledonous, plant, and conserved domain may be as low as 50%. the polynucleotide sequences used to transform the trans The present invention also encompasses MAF transcrip genic plant may be derived from either a monocot or a dicot tion factor sequence variants. A MAF transcription factor plant. Alternatively, the transgenic plant may be dicotyledon sequence variant is one having at least 26% amino acid 30 sequence similarity, at least 40% amino acid sequence iden ous, plant, and the polynucleotide sequences used to trans tity, a preferred MAF transcription factor sequence variant is form the transgenic plant may be derived from either a mono one having at least 50% amino acid sequence identity and a cot or a dicot plant. more preferred MAF transcription factor sequence variant is These transgenic plants will generally possess traits that one having at least 65% amino acid sequence identity to the 35 are altered as compared to a control plant, such as a wild-type MAF transcription factor sequences SEQID NO: 568, SEQ or non-transformed plant (i.e., the non-transformed plant IDNO:944, SEQIDNO:946, SEQID NO:948, SEQID NO: does not comprise the recombinant polynucleotide), thus pro 1735, SEQ ID NO: 1875, SEQ ID NO: 1971, SEQID NO: ducing an phenotype that is altered when compared to the 1973, SEQ ID NO: 1945, SEQ ID NO: 1947, SEQID NO: control, wild-type or non-transformed plant. These trans 1949, SEQ ID NO: 1951, SEQ ID NO: 1953, SEQID NO: 40 genic plants may also express an altered level of one or more 1955, SEQ ID NO: 1957, SEQ ID NO: 1959, SEQID NO: genes associated with a plant trait as compared to the non 1961, SEQ ID NO: 1963, SEQ ID NO: 1965, SEQID NO: transformed plant. The encoded polypeptides in these trans 1967, SEQID NO: 1969, SEQID NO: 2010, or SEQID NO: genic plants will generally be expressed and regulate tran 2011, and which contains at least one functional or structural Scription of at least one gene; this gene will generally confer characteristic of the MAF transcription factor amino acid 45 at least one altered trait, phenotype or expression level. sequences. In a further embodiment, the invention is a poly The polynucleotide sequences (those listed in the nucleotide encoding a polypeptide having at least 36% amino Sequence Listing, their complements, of functional variants) acid residue identity to a MAF transcription factor selected used to transform the transgenic plants of the present inven from the group consisting of SEQID NOs: 568, SEQID NO: tion may further comprise regulatory elements, for example, 944, SEQID NO:946, SEQID NO:948, SEQID NO: 1735, 50 a constitutive, inducible, or tissue-specific promoter operably SEQ ID NO: 1875, SEQID NO: 1971, SEQ ID NO: 1973, linked to the polynucleotide sequence. SEQ ID NO: 1945, SEQID NO: 1947, SEQ ID NO: 1949, Transformation of plants with presently disclosed tran SEQ ID NO: 1951, SEQID NO: 1953, SEQ ID NO: 1955, Scription factor sequences will produce a variety of improved SEQ ID NO: 1957, SEQID NO: 1959, SEQ ID NO: 1961, traits. For example, the altered trait may be an enhanced SEQ ID NO: 1963, SEQID NO: 1965, SEQ ID NO: 1967, 55 SEQID NO: 1969, SEQID NO: 2010, or SEQID NO: 2011, tolerance to abiotic stress, such as Salt tolerance. Salt toler and having MAF transcription factor activity. In a yet further ance, a form of osmotic stress, may be mediated by increased embodiment the invention is a polynucleotide encoding a root growth or increased root hairs relative to a non-trans polypeptide having a conserved domain of a MAF transcrip formed, control or wild-type plant. Tolerance to abiotic tion factor wherein the conserved domain has at least 54% 60 stresses Such as Salt tolerance may confer a number of Sur identity to the conserved domain of SEQID NO: 568 com vival, quality and yield improvements, including improved prising amino acid residues 2-74 of SEQID NO:568. In a still seed germination and improved seedling vigor, as well as further embodiment the invention is a polynucleotide encod improved yield, quality, and range. ing a polypeptide having a conserved domain of a MAF Another example of an altered trait that may be conferred transcription factor wherein the conserved domain has at least 65 by transforming plants with the presently disclosed transcrip 64% identity to the conserved domain of SEQ ID NO: 568 tion factor sequences includes altered Sugar sensing. Altered comprising amino acid residues 2-57 of SEQ ID NO: 568. Sugar sensing may also be used to confer improved seed US 8,809,630 B2 17 18 germination and improved seedling vigor, as well as altered In another aspect the invention is a method of Screening a flowering, senescence, Sugar metabolism and photosynthesis plurality of plants to identify at least one plant that comprises characteristics. a polynucleotide encoding a MAF transcription factor protein The invention also pertains to method to produce these wherein the expression of the polynucleotide alters at least transgenic plants. one of the plants traits. The method comprises (a) selecting a The present invention also relates to a method of using first polynucleotide from the group consisting of a combina transgenic plants transformed with the presently disclosed tion of plant polynucleotide sequences of SEQID NO: 567, transcription factor sequences, their complements or their SEQID NO: 943, SEQID NO: 945, SEQID NO:947, SEQ variants to grow a progeny plant by crossing the transgenic ID NO: 1734, SEQID NO: 1874, SEQID NO: 1014, SEQID plant with either itself or another plant, selecting seed that 10 develops as a result of the crossing; and then growing the NO: 1970, SEQ ID NO: 1972, SEQ ID NO: 1944, SEQ ID progeny plant from the seed. The progeny plant will generally NO: 1946, SEQ ID NO: 1948, SEQ ID NO: 1950, SEQ ID express mRNA that encodes a transcription factor: that is, a NO: 1952, SEQ ID NO: 1954, SEQ ID NO: 1956, SEQ ID DNA-binding protein that binds to a DNA regulatory NO: 1958, SEQ ID NO: 1960, SEQ ID NO: 1962, SEQ ID sequence and induces expression, such as that of a plant trait 15 NO: 1964, SEQ ID NO: 1966, or SEQ ID NO: 1968; (b) gene. The mRNA will generally be expressed at a level comparing the first polynucleotide sequence with a second greater than a non-transformed plant; and the progeny plantis polynucleotide sequence wherein the second polynucleotide characterized by a change in a plant trait compared to the sequence is isolated from a second plant; (c) selecting the non-transformed plant. second polynucleotide sequence, wherein the second poly The present invention also pertains to an expression cas nucleotide sequence encodes a polypeptide sequence that has sette. The expression cassette comprises at least two ele at least 60% identity with a polypeptide sequence encoded by ments, including: the sequence of the first polynucleotide; (d) comparing the (1) a constitutive, inducible, or tissue-specific promoter; second polynucleotide sequence with a third polynucleotide and sequence wherein the third polynucleotide sequence is iso (2) a recombinant polynucleotide having a polynucleotide 25 lated from a third plant, wherein the third plant is selected sequence, or a complementary polynucleotide sequence from a plurality of plants; (e) selecting the third polynucle thereof, selected from the group consisting of a nucleotide otide sequence, wherein the third polynucleotide sequence sequence encoding a polypeptide sequence selected from the encodes a polypeptide sequence that has at least one amino transcription factor sequences in the sequence listing, for acid Substitution compared with a polypeptide sequence example, polypeptide sequence G682, SEQ ID NO: 468; a 30 encoded by the sequence of the second polynucleotide; (f) nucleotide sequence selected from the transcription factor polynucleotides of the Sequence Listing, for example, poly identifying the third plant from which the third polynucle nucleotide sequence G682, SEQ ID NO: 467, or sequence otide came; (g) measuring the expression level of the endog variants such as allelic or splice variants of the nucleotide enous third polynucleotide sequence in another third plant; sequences of (a) or (b), where the sequence variant encodes a 35 (h) identifying which other third plant expresses the third polypeptide that initiates transcription. The nucleotide polynucleotide sequence; and (i) identifying a trait in the sequence may also comprise an orthologous or paralogous other third plant of step (h) which is changed when compared sequence of the nucleotide sequences of (a) or (b), and these with the same trait in the second plant, wherein the trait is sequences encodes a polypeptide that initiates transcription, a selected from the group consisting of at least one trait listed nucleotide sequence that encodes a polypeptide having a 40 below. conserved domain that exhibits 72% or greater sequence In another embodiment, the method further comprises the homology with the polypeptide of (a), where the polypeptide third polynucleotide sequence that has at last one nucleotide comprising the conserved domain initiates transcription, or a base Substitution compared with the polynucleotide sequence nucleotide sequence that hybridizes under stringent condi of the second polynucleotide. tions to a nucleotide sequence of one or more polynucleotides 45 of (a) or (b), where the latter nucleotide sequence initiates transcription. In all of these cases, the recombinant poly BRIEF DESCRIPTION OF THE SEQUENCE nucleotide is operably linked to the promoter of the expres LISTING, TABLES, AND DRAWINGS sion cassette. The invention includes a host cell that comprises the 50 The Sequence Listing provides exemplary polynucleotide expression cassette. The host cell may be a plant cell. Such as and polypeptide sequences of the invention. The traits asso a cell of a crop plant. ciated with the use of the sequences are included in the The invention also concerns a method for identifying at Examples. least one downstream polynucleotide sequence that is subject CD-ROM 1 (Copy 1) and CD-ROM2 (Copy 2) are read to a regulatory effect of any of the polypeptide transcription 55 only memory computer-readable compact discs and each factors of the present invention, or a sequence variant, contains a copy of the Sequence Listing in ASCII text format ortholog or paralog of any of these sequences. This method is and a copy of Table 8 in ASCII text format. The Sequence conducted by expressing a polypeptide transcription factor, Listing is named "MBIOO48.5T25.txt, was created on Apr. variant, ortholog or paralog in a plant cell, and then identify 10, 2003, and is 4,267 kilobytes in size. Table 8 is named ing an expression product, Such as RNA or protein, produced 60 “MBI-0048CIPTable 8 rev.txt” and is 822 kilobytes in size. as a result. The identification method used can be any method that identifies RNA or protein products of expression, such as, Table 8 as filed was created on May 15, 2006. The copies of for example, Northern analysis, RT-PCR, microarray gene the Sequence Listing and Table 8 on the CD-ROM discs are expression assays, reporter gene expression systems subtrac hereby incorporated by reference in their entirety. tive hybridization, differential display, representational dif 65 CD-ROM2 (Copy 2) is an exact copy of CD-R1 (Copy 1). ferential analysis, or by two-dimensional gel electrophoresis CD-ROM3 contains a computer-readable format (CRF) of one or more protein products. copy of the Sequence Listing as a text (..txt) file. US 8,809,630 B2 19 20

LENGTHY TABLES The patent contains a lengthy table section. A copy of the table is available in electronic form from the USPTO Web site (http://seqdata.uspto.gov/?pageRequest=docIDetail&DocID=US08809630B2). An electronic copy of the table will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).

Table 8 lists a Summary of homologous sequences identi phenotype). The fifth column shows the observed phenotype fied using BLAST (tblastX program). The first column shows in the T2 transgenic plants (T2 phenotype). The sixth column the polynucleotide sequence identifier (SEQ ID NO:), the shows the number of days of cold treatment for the T2 lines second column shows the corresponding cDNA identifier 15 (Days of cold treatment). The seventh column show the num (Gene ID or GID), the third column shows the orthologous or ber of hours the transgenic plants were exposed to light per homologous polynucleotide GenBank Accession Number day (Photoperiod (hours)). The eighth column shows the (Test Sequence ID), the fourth column shows the calculated number of days to a visible flower bud in all transgenic plants probability value that the sequence identity is due to chance (Days to visible flower bud; range and (mean+/-S.E.M.)). (Smallest Sum Probability), the fifth column shows the plant The ninth column shows the total leaf number of all trans species from which the test sequence was isolated (Test genic plants (Total leaf number, range and (mean+/- Sequence Species), and the sixth column shows the ortholo S.E.M.)). NA: not applicable; ND: not determined. gous or homologous test sequence GenBank annotation (Test FIG. 1 shows a conservative estimate of phylogenetic rela Sequence GenBank Annotation). tionships among the orders of flowering plants (modified Table 14 shows the polypeptide sequence identities and 25 from Angiosperm Phylogeny Group (1998) Ann. Missouri similarities between exemplary polypeptides of the invention. Bot. Gard. 84: 1-49). Those plants with a single cotyledon The first column and first row shows the polypeptide SEQID (monocots) are a monophyletic clade nested within at least NO (polypeptide SEQID NO); the second column and sec two major lineages of dicots; the eudicots are further divided ond row shows the Mendel Name (Name); the third column into and asterids. Arabidopsis is a rosid eudicot clas and third row shows the Mendel Gene identifier number 30 sified within the order Brassicales; rice is a member of the (Gene ID). The percentage identity, and percentage similarity monocot order Poales. FIG. 1 was adapted from Daly et al. in parentheses, between the two polypeptide sequences are (2001) Plant Physiol. 127: 1328-1333. indicated at the intersection of each column and row. FIG. 2 shows a phylogenic dendogram depicting phyloge Table 15 shows flowering times for the maf2 mutant. The netic relationships of higher plant taxa, including clades con first column shows the genotype of the Arabidopsis plant 35 taining tomato and Arabidopsis; adapted from Ku et al. tested. The second column shows the number of plants tested; (2000) Proc. Natl. Acad. Sci. 97: 9121-9126; and Chase et al. N=number. The third column shows the number of days the (1993) Ann. Missouri Bot. Gard. 80: 528-580. plant was cold-treated. The fourth column shows the photo FIGS. 3A, and 3B show an alignment of G682 (SEQ ID period in hours. The fifth column shows the number of days NO: 468) and polypeptide sequences that are paralogous and elapsed from beginning of cold treatment at which time 40 orthologous to G682. flower buds were visible. The sixth column shows the total FIGS. 4A, 4B, 4C and 4D show an alignment of G867 leaf primordia produced by primary shoot meristem before (SEQID NO: 580) and polypeptide sequences that are paralo first flower. Four separate experiments were done (Experi gous and orthologous to G867. ments 1 through 4). FIGS.5A, 5B, 5C, 5D, 5E and 5F show an alignment of Table 16 shows flowering times of transgenic plants over 45 G912 (SEQID NO: 616) and polypeptide sequences that are expressing MAF2 (SEQID NO: 567), MAF3 (SEQ ID NO: paralogous and orthologous to G912. 943), MAF4 (SEQID NO: 945), and MAF5 (SEQ ID NO: FIG. 6 shows an alignment of the polypeptide sequences of 947) in Arabidopsis Stockholm accession and Columbia FLC (SEQ ID NO: 1875) and MAF1-5 (SEQID NOs: 1735, accession transgenic T1 lines. The first column lists the geno 568,944, 946, and 948) full-length polypeptides encoded by type of the transgenic plants (Genotype). The second column 50 SEQID NOs: 1874, 1734, 567,943, 945, and 947, respec shows the observed phenotype in the transgenic plants (Phe tively. The alignment was produced by manual alignment of notype). The third column shows the penetrance of the the polypeptide sequences. observed altered phenotype as a fraction of total transgenic FIGS. 7A and 7B show an alignment of the polypeptide plants (Penetrance). The fourth column shows the number of sequences of SEQID NO:948 (G1844-pep; MAF5), SEQID days to a visible flower bud in all transgenic plants (Days to 55 NO: 946 (G1843 pep; MAF4), SEQIDNO: 944 (G1842.pep: visible flower bud; range and (mean+/-S.E.M.)). The fifth MAF3), SEQID NO: 568 (G859.pep; MAF2), SEQID NO: column shows the total leaf number of all transgenic plants 1735 (G157 pep; MAF1), SEQ ID NO: 1875 (G1759.pep; (Total leaf number; range and (mean+/-S.E.M.)). FLC), SEQ ID NO: 1971 (Soy1 pep; SOY MADS1), and Table 17 shows flowering times of transgenic plants over SEQ ID NO: 1973 (Soy3 pep; SOY MADS3) full-length expressing MAF2 (SEQID NO: 567), MAF3 (SEQ ID NO: 60 polypeptides encoded by SEQ ID NOs: 947, 945,943, 567, 943), MAF4 (SEQID NO: 945), and MAF5 (SEQ ID NO: 1734, 1874, 1970, and 1972, respectively. Regions of identity 947) in Arabidopsis T2 Columbia accession and Landsberg between the polypeptide sequences are boxed. The calculated accession transgenic T1 lines. The first column lists the geno consensus sequence is shown beneath the alignments. The type of the transgenic plants (Genotype). The second column alignment was made using the CLUSTALW alignment pro shows the T2 line analyzed (Line). The third column shows 65 gram in the MACVECTOR sequence data package the total number of plants analyzed (N). The fourth column (MACVECTOR 6.0 or MACVECTOR 6.5 applications, shows the observed phenotype in the T1 transgenic plants (T1 Accelrys, San Diego Calif.). US 8,809,630 B2 21 22 FIG. 8 shows the effects of vernalization on endogenous also be used to transform a plant and introduce desirable traits expression of MAF2-5 (SEQ ID NOs: 567, 943, 945, and not found in the wild-type cultivar or strain. Plants may then 947) in different genetic backgrounds (accessions). Expres be selected for those that produce the most desirable degree of sion was monitored by RT-PCR (MAF1, 2, 3, 4, and 5, and over- or under-expression of target genes of interest and coin FLC transcripts). Vernalized (+) samples were cold-treated 5 cident trait improvement. for 6 weeks at 4° C., whereas non-vernalized (-) samples The sequences of the present invention may be from any were stratified for only 3 days at 4° C. as imbibed seeds. species, particularly plant species, in a naturally occurring Col=Columbia, Pi-0=Pitztal, St-0=Stockholm, fca=fca-9 form or from any source whether natural, synthetic, semi mutant. synthetic or recombinant. The sequences of the invention may FIG. 9 is a schematic diagram Summarizing the responses 10 also include fragments of the present amino acid sequences. of FLC and MAF1-5 (SEQID NOs: 567,943, 945, and 947) In this context, a "fragment refers to afragment of a polypep to vernalization, and their potential effects on the floral tran tide sequence which is at least 5 to about 15 amino acids in sition. Arrows indicate positive interactions, blunt-ended length, most preferably at least 14 amino acids, and which lines denote inhibition. retain some biological activity of a transcription factor. FIG. 10 shows the effect of Vernalization on the maf2 15 Where “amino acid sequence” is recited to refer to an amino mutant: (A) in the absence of vernalization; (B) following a acid sequence of a naturally occurring protein molecule, vernalization treatment; (C) days to visible bud of ma) “amino acid sequence' and like terms are not meant to limit mutant compared with wild type; and (D) RT-PCR transcript the amino acid sequence to the complete native amino acid analysis of endogenous genes of maf2 mutant compared with sequence associated with the recited protein molecule. those in wild type. As one of ordinary skill in the art recognizes, transcription FIG. 11 shows the effects of MAF2 overexpression in the factors can be identified by the presence of a region or domain Columbia ecotype: (A) shows vernalized or non-vernalized of structural similarity or identity to a specific consensus transgenic 35S:MAF2 plants and wild type plants; and (B) sequence or the presence of a specific consensus DNA-bind shows RT-PCR transcript analysis of endogenous FLC, ing site or DNA-binding site motif (see, for example, Riech MAF2, and SOC1 transcripts in wild type (Columbia acces 25 mann et al. (2000) Science 290: 2105-2110). The plant tran sion), transgenic 35S:MAF2, and transgenic 35S:FLC seed Scription factors may belong to one of the following lings. transcription factor families: the AP2 (APETALA2) domain transcription factor family (Riechmann and Meyerowitz DETAILED DESCRIPTION OF EXEMPLARY (1998) Biol. Chem. 379: 633-646); the MYB transcription EMBODIMENTS 30 factor family (ENBib; Martin and Paz-Ares (1997) Trends Genet. 13: 67-73); the MADS domain transcription factor In an important aspect, the present invention relates to family (Riechmann and Meyerowitz (1997) Biol. Chem. 378: polynucleotides and polypeptides, for example, for modify 1079-1101: Immink et al. (2003) Mol. Gen. Genomics 268: ing phenotypes of plants. Throughout this disclosure, various 598-606); the WRKY protein family (Ishiguro and Nakamura information Sources are referred to and/or are specifically 35 (1994) Mol. Gen. Genet. 244: 563-571); the ankyrin-repeat incorporated. The information Sources include Scientific jour protein family (Zhanget al. (1992) Plant Cell 4: 1575-1588): nal articles, patent documents, textbooks, and World Wide the Zinc finger protein (Z) family (Klug and Schwabe (1995) Web browser-inactive page addresses, for example. While the FASEBJ 9: 597-604): Takatsuji (1998) Cell. Mol. Life. Sci. reference to these information sources clearly indicates that 54:582-596); the homeobox (HB) protein family (Buerglin they can be used by one of skill in the art, each and every one 40 (1994) in Guidebook to the Homeobox Genes. Duboule (ed.) of the information sources cited herein are specifically incor Oxford University Press); the CAAT-element binding pro porated in their entirety, whether or not a specific mention of teins (Forsburg and Guarente (1989) Genes Dev. 3: 1166 “incorporation by reference' is noted. The contents and 1178); the squamosa promoter binding proteins (SPB) (Klein teachings of each and every one of the information sources et al. (1996) Mol. Gen. Genet. 1996 250: 7-16); the NAM can be relied on and used to make and use embodiments of the 45 protein family (Souer et al. (1996) Cell 85: 159-170); the invention. IAA/AUX proteins (Abel et al. (1995) J. Mol. Biol. 251: It must be noted that as used herein and in the appended 533-549): the HLH/MYC protein family (Littlewood et al. claims, the singular forms “a,” “an,” and “the include plural (1994) Prot. Profile 1: 639-709); the DNA-binding protein reference unless the context clearly dictates otherwise. Thus, (DBP) family (Tuckeret al. (1994) EMBO.J. 13:2994-3002); for example, a reference to “a plant includes a plurality of 50 the bZIP family of transcription factors (Foster et al. (1994) Such plants, and a reference to “a stress' is a reference to one FASEBJ 8: 192-200); the Box P-binding protein (the BPF-1) or more stresses and equivalents thereof known to those family (da Costa e Silva et al. (1993) Plant J4: 125-135); the skilled in the art, and so forth. high mobility group (HMG) family (Bustin and Reeves The polynucleotide sequences of the invention encode (1996) Prog Nucl. Acids Res. Mol. Biol. 54: 35-100); the polypeptides that are members of well-known transcription 55 scarecrow (SCR) family (Di Laurenzio et al. (1996) Cell 86: factor families, including plant transcription factor families, 423-433); the GF14 family (Wu et al. (1997) Plant Physiol. as disclosed in Tables 4-5. Generally, the transcription factors 114: 1421-1431); the polycomb (PCOMB) family (Goodrich encoded by the present sequences are involved in cell differ et al. (1997) Nature 386:44-51); the teosintebranched (TEO) entiation and proliferation and the regulation of growth. family (Luo et al. (1996) Nature 383: 794-799); the AB13 Accordingly, one skilled in the art would recognize that by 60 family (Giraudat et al. (1992) Plant Cell 4: 1251-1261); the expressing the present sequences in a plant, one may change triple helix (TH) family (Dehesh et al. (1990) Science 250: the expression of autologous genes or induce the expression 1397-1399); the EIL family (Chao et al. (1997) Cell 89: of introduced genes. By affecting the expression of similar 1133-44); the AT-HOOKfamily (Reeves and Nissen (1990).J. autologous sequences in a plant that have the biological activ Biol. Chem. 265: 8573-8582); the S1FA family (Zhou et al. ity of the present sequences, or by introducing the present 65 (1995) Nucleic Acids Res. 23: 1165-1169); the bZIPT2 family sequences into a plant, one may alter a plants phenotype to (Lu and Ferl (1995) Plant Physiol. 109: 723); the YABBY one with improved traits. The sequences of the invention may family (Bowman et al. (1999) Development 126: 2387-96): US 8,809,630 B2 23 24 the PAZ family (Bohmert et al. (1998) EMBO.J. 17: 170-80); nucleotide sequence encoding a polypeptide (or protein) or a a family of miscellaneous (MISC) transcription factors domain or fragment thereof. Additionally, the polynucleotide including the DPBF family (Kim et al. (1997) Plant J 11: may comprise a promoter, an intron, an enhancer region, a 1237-1251) and the SPF1 family (Ishiguro and Nakamura polyadenylation site, a translation initiation site, 5' or 3 (1994) Mol. Gen. Genet. 244: 563-571): the GARP family untranslated regions, a reporter gene, a selectable marker, or (Hall et al. (1998) Plant Cell 10: 925-936), the TUBBY the like. The polynucleotide can be single stranded or double family (Boggin etal (1999) Science 286: 21 19-2125), the heat stranded DNA or RNA. The polynucleotide optionally com shock family (Wu (1995) Annu. Rev. Cell Dev. Biol. 11: prises modified bases or a modified backbone. The polynucle 441-469), the ENBP family (Christiansen et al. (1996) Plant otide can be, e.g., genomic DNA or RNA, a transcript (such as Mol. Biol. 32:809-821), the RING-zinc family (Jensen et al. 10 an mRNA), a cDNA, a PCR product, a cloned DNA, a syn (1998) FEBS Letters 436: 283-287), the PDBP family (Janik thetic DNA or RNA, or the like. The polynucleotide can et al. (1989) Virology 168: 320-329), the PCF family (Cubas comprise a sequence in either sense or antisense orientations. et al. Plant J. (1999) 18:215-22), the SRS(SHI-related) fam Definitions ily (Fridborget al. (1999) Plant Cell 11: 1019-1032), the CPP A “recombinant polynucleotide' is a polynucleotide that is (cysteine-rich polycomb-like) family (Cvitanich et al. (2000) 15 not in its native state, e.g., the polynucleotide comprises a Proc. Natl. Acad. Sci. 97: 8.163-8168), the ARF (auxin nucleotide sequence not found in nature, or the polynucle response factor) family (Ulmasov et al. (1999) Proc. Natl. otide is in a context other than that in which it is naturally Acad. Sci. 96: 5844-5849), the SWI/SNF family (Colling found, e.g., separated from nucleotide sequences with which wood et al. (1999) J. Mol. Endocrinol. 23:255-275), the it typically is in proximity in nature, or adjacent (or contigu ACBF family (Seguin et al. (1997) Plant Mol. Biol. 35: 281 ous with) nucleotide sequences with which it typically is not 291), PCGL (CG-1 like) family (da Costa e Silva et al. (1994) in proximity. For example, the sequence at issue can be Plant Mol. Biol. 25: 921-924) the ARID family (Vazquez et cloned into a vector, or otherwise recombined with one or al. (1999) Development 126: 733-742), the Jumonji family more additional nucleic acid. (Balciunas et al. (2000), Trends Biochem. Sci. 25: 274-276), An "isolated polynucleotide' is a polynucleotide whether the bZIP-NIN family (Schauser et al. (1999) Nature 402: 25 naturally occurring or recombinant, that is present outside the 191-195), the E2F family (Kaelin et al. (1992) Cell 70:351 cell in which it is typically found in nature, whether purified 364) and the GRF-like family (Knaap et al. (2000) Plant or not. Optionally, an isolated polynucleotide is Subject to one Physiol. 122: 695-704). As indicated by any part of the list or more enrichment or purification procedures, e.g., cell lysis, above and as known in the art, transcription factors have been extraction, centrifugation, precipitation, or the like. Sometimes categorized by class, family, and Sub-family 30 A "polypeptide' is an amino acid sequence comprising a according to their structural content and consensus DNA plurality of consecutive polymerized amino acid residues binding site motif, for example. Many of the classes and many e.g., at least about 15 consecutive polymerized amino acid of the families and sub-families are listed here. However, the residues, optionally at least about 30 consecutive polymer inclusion of one Sub-family and not another, or the inclusion ized amino acid residues, at least about 50 consecutive poly of one family and not another, does not mean that the inven 35 merized amino acid residues. In many instances, a polypep tion does not encompass polynucleotides or polypeptides of a tide comprises a polymerized amino acid residue sequence certain family or sub-family. The list provided here is merely that is a transcription factor or a domain or portion or frag an example of the types of transcription factors and the ment thereof. Additionally, the polypeptide may comprise 1) knowledge available concerning the consensus sequences a localization domain, 2) an activation domain, 3) a repres and consensus DNA-binding site motifs that help define them 40 sion domain, 4) an oligomerization domain, or 5) a DNA as known to those of skill in the art (each of the references binding domain, or the like. The polypeptide optionally com noted above are specifically incorporated herein by refer prises modified amino acid residues, naturally occurring ence). A transcription factor may include, but is not limited to, amino acid residues not encoded by a codon, non-naturally any polypeptide that can activate or repress transcription of a occurring amino acid residues. single gene or a number of genes. This polypeptide group 45 A “recombinant polypeptide' is a polypeptide produced by includes, but is not limited to, DNA-binding proteins, DNA translation of a recombinant polynucleotide. A “synthetic binding protein binding proteins, protein kinases, protein polypeptide' is a polypeptide created by consecutive poly phosphatases, protein methyltransferases, GTP-binding pro merization of isolated amino acid residues using methods teins, and receptors, and the like. well known in the art. An "isolated polypeptide, whether a In addition to methods for modifying a plant phenotype by 50 naturally occurring or a recombinant polypeptide, is more employing one or more polynucleotides and polypeptides of enriched in (or out of) a cell than the polypeptide in its natural the invention described herein, the polynucleotides and state in a wild-type cell, e.g., more than about 5% enriched, polypeptides of the invention have a variety of additional more than about 10% enriched, or more than about 20%, or uses. These uses include their use in the recombinant produc more than about 50%, or more, enriched, i.e., alternatively tion (i.e., expression) of proteins; as regulators of plant gene 55 denoted: 105%, 110%, 120%, 150% or more, enriched rela expression, as diagnostic probes for the presence of comple tive to wildtype standardized at 100%. Such an enrichment is mentary or partially complementary nucleic acids (including not the result of a natural response of a wild-type plant. for detection of natural coding nucleic acids); as Substrates Alternatively, or additionally, the isolated polypeptide is for further reactions, e.g., mutation reactions, PCR reactions, separated from other cellular components with which it is or the like; as Substrates for cloning e.g., including digestion 60 typically associated, e.g., by any of the various protein puri or ligation reactions; and for identifying exogenous or endog fication methods herein. enous modulators of the transcription factors. A "polynucle “Identity” or “similarity” refers to sequence similarity otide' is a nucleic acid sequence comprising a plurality of between two polynucleotide sequences or between two polymerized nucleotides, e.g., at least about 15 consecutive polypeptide sequences, with identity being a more strict com polymerized nucleotides, optionally at least about 30 con 65 parison. The phrases “percent identity” and “96 identity” refer secutive nucleotides, at least about 50 consecutive nucle to the percentage of sequence similarity found in a compari otides. In many instances, a polynucleotide comprises a Son of two or more polynucleotide sequences or two or more US 8,809,630 B2 25 26 polypeptide sequences. “Sequence similarity refers to the 60% identity, or more preferably greater than about 70% percent similarity in base pair sequence (as determined by any identity, most preferably 72% or greater identity with dis suitable method) between two or more polynucleotide closed transcription factors. sequences. Two or more sequences can be anywhere from The term “equivalog describes members of a set of 0-100% similar, or any integer value therebetween. Identity homologous proteins that are conserved with respect to func or similarity can be determined by comparing a position in tion since their last common ancestor. Related proteins are each sequence that may be aligned for purposes of compari grouped into equivalog families, and otherwise into protein son. When a position in the compared sequence is occupied families with other hierarchically defined homology types by the same nucleotide base oramino acid, then the molecules (definition provided the Institute for Genomic Research 10 (TIGR) website). are identical at that position. A degree of similarity or identity The term “variant, as used herein, may refer to polynucle between polynucleotide sequences is a function of the num otides or polypeptides, that differ from the presently dis ber of identical or matching nucleotides at positions shared by closed polynucleotides or polypeptides, respectively, in the polynucleotide sequences. A degree of identity of sequence from each other, and as set forth below. polypeptide sequences is a function of the number of identical 15 With regard to polynucleotide variants, differences amino acids at positions shared by the polypeptide sequences. between presently disclosed polynucleotides and polynucle A degree of homology or similarity of polypeptide sequences otide variants are limited so that the nucleotide sequences of is a function of the number of amino acids at positions shared the former and the latter are closely similar overall and, in by the polypeptide sequences. many regions, identical. Due to the degeneracy of the genetic Alignment” refers to a number of DNA or amino acid code, differences between the former and latter nucleotide sequences aligned by lengthwise comparison so that compo sequences may be silent (i.e., the amino acids encoded by the nents in common (i.e., nucleotide bases or amino acid resi polynucleotide are the same, and the variant polynucleotide dues) may be readily and graphically identified. The number sequence encodes the same amino acid sequence as the pres of components in common is related to the homology or ently disclosed polynucleotide. Variant nucleotide sequences identity between the sequences. Alignments such as those of 25 may encode different amino acid sequences, in which case FIG. 3, 4 or 5 may be used to identify “conserved domains' such nucleotide differences will result in amino acid substi and relatedness within these domains. An alignment may tutions, additions, deletions, insertions, truncations or fusions Suitably be determined by means of computer programs with respect to the similar disclosed polynucleotide known in the art, such as MacVector (1999) (Accelrys, Inc., sequences. These variations result in polynucleotide variants San Diego, Calif.). 30 encoding polypeptides that share at least one functional char The terms “highly stringent” or “highly stringent condi acteristic. The degeneracy of the genetic code also dictates tion” refer to conditions that permit hybridization of DNA that many different variant polynucleotides can encode iden Strands whose sequences are highly complementary, wherein tical and/or Substantially similar polypeptides in addition to these same conditions exclude hybridization of significantly those sequences illustrated in the Sequence Listing. mismatched DNAs. Polynucleotide sequences capable of 35 Also within the scope of the invention is a variant of a hybridizing under stringent conditions with the polynucle transcription factor nucleic acid listed in the Sequence List otides of the present invention may be, for example, variants ing, that is, one having a sequence that differs from the one of of the disclosed polynucleotide sequences, including allelic the polynucleotide sequences in the Sequence Listing, or a or splice variants, or sequences that encode orthologs or para complementary sequence, that encodes a functionally equiva logs of presently disclosed polypeptides. Nucleic acid 40 lent polypeptide (i.e., a polypeptide having some degree of hybridization methods are disclosed in detail by Kashima et equivalent or similar biological activity) but differs in al. (1985) Nature 313:402-404, and Sambrook et al. (1989) sequence from the sequence in the Sequence Listing, due to Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold degeneracy in the genetic code. Included within this defini Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (“Sam tion are polymorphisms that may or may not be readily detect brook”); and by Haymes et al., “Nucleic Acid Hybridization: 45 able using a particular oligonucleotide probe of the poly A Practical Approach, IRL Press, Washington, D.C. (1985), nucleotide encoding polypeptide, and improper or which references are incorporated herein by reference. unexpected hybridization to allelic variants, with a locus In general, stringency is determined by the temperature, other than the normal chromosomal locus for the polynucle ionic strength, and concentration of denaturing agents (e.g., otide sequence encoding polypeptide. formamide) used in a hybridization and washing procedure 50 “Allelic variant' or “polynucleotide allelic variant” refers (for a more detailed description of establishing and determin to any of two or more alternative forms of a gene occupying ing stringency, see below). The degree to which two nucleic the same chromosomal locus. Allelic variation arises natu acids hybridize under various conditions of stringency is cor rally through mutation, and may result in phenotypic poly related with the extent of their similarity. Thus, similar morphism within populations. Gene mutations may be nucleic acid sequences from a variety of Sources. Such as 55 'silent” or may encode polypeptides having altered amino within a plant's genome (as in the case of paralogs) or from acid sequence. “Allelic variant' and “polypeptide allelic vari another plant (as in the case of orthologs) that may perform ant’ may also be used with respect to polypeptides, and in this similar functions can be isolated on the basis of their ability to case the term refer to a polypeptide encoded by an allelic hybridize with known transcription factor sequences. Numer variant of a gene. ous variations are possible in the conditions and means by 60 “Splice variant' or “polynucleotide splice variant” as used which nucleic acid hybridization can be performed to isolate herein refers to alternative forms of RNA transcribed from a transcription factor sequences having similarity to transcrip gene. Splice variation naturally occurs as a result of alterna tion factor sequences known in the art and are not limited to tive sites being spliced within a single transcribed RNA mol those explicitly disclosed herein. Such an approach may be ecule or between separately transcribed RNA molecules, and used to isolate polynucleotide sequences having various 65 may result in several different forms of mRNA transcribed degrees of similarity with disclosed transcription factor from the same gene. This, splice variants may encode sequences, such as, for example, transcription factors having polypeptides having different amino acid sequences, which US 8,809,630 B2 27 28 may or may not have similar functions in the organism. A “transgenic plant” refers to a plant that contains genetic “Splice variant' or “polypeptide splice variant may also material not found in a wild-type plant of the same species, refer to a polypeptide encoded by a splice variant of a tran variety or cultivar. The genetic material may include a trans scribed mRNA. gene, an insertional mutagenesis event (Such as by transposon As used herein, “polynucleotide variants' may also refer to or T-DNA insertional mutagenesis), an activation tagging polynucleotide sequences that encode paralogs and orthologs sequence, a mutated sequence, a homologous recombination of the presently disclosed polypeptide sequences. “Polypep event or a sequence modified by chimeraplasty. Typically, the tide variants' may refer to polypeptide sequences that are foreign genetic material has been introduced into the plant by paralogs and orthologs of the presently disclosed polypeptide human manipulation, but any method can be used as one of Sequences. 10 Differences between presently disclosed polypeptides and skill in the art recognizes. polypeptide variants are limited so that the sequences of the A transgenic plant may contain an expression vector or former and the latter are closely similar overall and, in many cassette. The expression cassette typically comprises a regions, identical. Presently disclosed polypeptide sequences polypeptide-encoding sequence operably linked (i.e., under and similar polypeptide variants may differ in amino acid 15 regulatory control of) to appropriate inducible or constitutive sequence by one or more substitutions, additions, deletions, regulatory sequences that allow for the expression of fusions and truncations, which may be present in any combi polypeptide. The expression cassette can be introduced into a nation. These differences may produce silent changes and plant by transformation or by breeding after transformation of result in a functionally equivalent transcription factor. Thus, it a parent plant. A plant refers to a whole plant as well as to a will be readily appreciated by those of skill in the art, that any plant part, Such as seed, fruit, leaf, or root, plant tissue, plant of a variety of polynucleotide sequences is capable of encod cells or any other plant material, e.g., a plant explant, as well ing the transcription factors and transcription factor homolog as to progeny thereof, and to in vitro systems that mimic polypeptides of the invention. A polypeptide sequence variant biochemical or cellular components or processes in a cell. may have "conservative' changes, wherein a Substituted “Fragment', with respect to a polynucleotide, refers to a amino acid has similar structural or chemical properties. 25 clone or any part of a polynucleotide molecule that retains a Deliberate amino acid substitutions may thus be made on the usable, functional characteristic. Useful fragments include basis of similarity in polarity, charge, solubility, hydropho oligonucleotides and polynucleotides that may be used in bicity, hydrophilicity, and/or the amphipathic nature of the hybridization or amplification technologies or in the regula residues, as long as the functional or biological activity of the tion of replication, transcription or translation. A polynucle transcription factor is retained. For example, negatively 30 otide fragment” refers to any Subsequence of a polynucle charged amino acids may include aspartic acid and glutamic otide, typically, of at least about 9 consecutive nucleotides, acid, positively charged amino acids may include lysine and preferably at least about 30 nucleotides, more preferably at arginine, and amino acids with uncharged polar head groups least about 50 nucleotides, of any of the sequences provided having similar hydrophilicity values may include leucine, herein. Exemplary polynucleotide fragments are the first isoleucine, and Valine; glycine and alanine; asparagine and 35 sixty consecutive nucleotides of the transcription factor poly glutamine; serine and threonine; and phenylalanine and nucleotides listed in the Sequence Listing. Exemplary frag tyrosine (for more detail on conservative Substitutions, see ments also include fragments that comprise a region that Table 2). More rarely, a variant may have “non-conservative' encodes a conserved domain of a transcription factor. changes, e.g., replacement of a glycine with a tryptophan. Fragments may also include Subsequences of polypeptides Similar minor variations may also include amino acid dele 40 and protein molecules, or a Subsequence of the polypeptide. tions or insertions, or both. Related polypeptides may com Fragments may have uses in that they may have antigenic prise, for example, additions and/or deletions of one or more potential. In some cases, the fragment or domain is a Subse N-linked or O-linked glycosylation sites, or an addition and/ quence of the polypeptide which performs at least one bio or a deletion of one or more cysteine residues. Guidance in logical function of the intact polypeptide in Substantially the determining which and how many amino acid residues may 45 same manner, or to a similar extent, as does the intact be substituted, inserted or deleted without abolishing func polypeptide. For example, a polypeptide fragment can com tional or biological activity may be found using computer prise a recognizable structural motif or functional domain programs well known in the art, for example, DNASTAR such as a DNA-binding site or domain that binds to a DNA software (see U.S. Pat. No. 5,840,544). promoter region, an activation domain, or a domain for pro The term “plant' includes whole plants, shoot vegetative 50 tein-protein interactions, and may initiate transcription. Frag organs/structures (e.g., leaves, stems and tubers), roots, flow ments can vary in size from as few as 3 amino acids to the full ers and floral organs/structures (e.g., bracts, sepals, petals, length of the intact polypeptide, but are preferably at least stamens, carpels, anthers and ovules), seed (including about 30 amino acids in length and more preferably at least embryo, endosperm, and seed coat) and fruit (the mature about 60 amino acids in length. Exemplary polypeptide frag ovary), plant tissue (e.g., vascular tissue, ground tissue, and 55 ments are the first twenty consecutive amino acids of a mam the like) and cells (e.g., guard cells, egg cells, and the like), malian protein encoded by are the first twenty consecutive and progeny of same. The class of plants that can be used in amino acids of the transcription factor polypeptides listed in the method of the invention is generally as broad as the class the Sequence Listing. Exemplary fragments also include of higher and lower plants amenable to transformation tech fragments that comprise a conserved domain of a transcrip niques, including angiosperms (monocotyledonous and 60 tion factor, for example, amino acid residues 27-63 of G682 dicotyledonous plants), gymnosperms, ferns, horsetails, (SEQ ID NO: 468), as noted in Table 5. psilophytes, lycophytes, bryophytes, and multicellular algae. The invention also encompasses production of DNA (See for example, FIG. 1, adapted from Daly et al. (2001) sequences that encode transcription factors and transcription Plant Physiol. 127: 1328-1333; FIG.2, adapted from Ku et al. factor derivatives, or fragments thereof, entirely by synthetic (2000) Proc. Natl. Acad. Sci. 97: 9121-9126; and see also 65 chemistry. After production, the synthetic sequence may be Tudge, in The Variety of Life, Oxford University Press, New inserted into any of the many available expression vectors and York, N.Y. (2000) pp. 547-606). cell systems using reagents well known in the art. Moreover, US 8,809,630 B2 29 30 synthetic chemistry may be used to introduce mutations into supra); the teosinte branched (TEO) family (Luo et al. (1996) a sequence encoding transcription factors or any fragment supra); the AB13 family (Giraudat et al. (1992) supra); the thereof. triple helix (TH) family (Dehesh et al. (1990) supra); the EIL A “conserved domain or “conserved region' as used family (Chao et al. (1997) Cell supra); the AT-HOOK family herein refers to a region in heterologous polynucleotide or 5 (Reeves and Nissen (1990 supra); the SIFA family (Zhou et polypeptide sequences where there is a relatively high degree al. (1995) supra); the bZIPT2 family (Lu and Ferl (1995) of sequence identity between the distinct sequences. supra); the YABBY family (Bowman et al. (1999) supra); the With respect to polynucleotides encoding presently dis PAZ family (Bohmert et al. (1998) supra); a family of mis closed transcription factors, a conserved region is preferably cellaneous (MISC) transcription factors including the DPBF at least 10 base pairs (bp) in length. 10 family (Kim et al. (1997) supra) and the SPF1 family (Ishig A “conserved domain, with respect to presently disclosed uro and Nakamura (1994) supra); the GARP family (Hall et polypeptides refers to a domain within a transcription factor al. (1998) supra), the TUBBY family (Boggin et al. (1999) family that exhibits a higher degree of sequence homology, supra), the heat shock family (Wu (1995 supra), the ENBP Such as at least 26% sequence similarity, at least 16% family (Christiansen et al. (1996) supra), the RING-zinc fam sequence identity, preferably at least 40% sequence identity, 15 ily (Jensen et al. (1998) supra), the PDBP family (Janik et al. preferably at least 65% sequence identity including conser (1989) supra), the PCF family (Cubas et al. (1999) supra), the vative substitutions, and more preferably at least 80% SRS(SHI-related) family (Fridborg et al. (1999) supra), the sequence identity, and even more preferably at least 85%, or CPP (cysteine-rich polycomb-like) family (Cvitanich et al. at least about 86%, or at least about 87%, or at least about (2000) supra), the ARF (auxin response factor) family (Ulma 88%, or at least about 90%, or at least about 95%, or at least sov et al. (1999) supra), the SWI/SNF family (Collingwoodet about 98% amino acid residue sequence identity of a al. (1999) supra), the ACBF family (Seguin et al. (1997) polypeptide of consecutive amino acid residues. A fragment supra), PCGL (CG-1 like) family (da Costa e Silva et al. or domain can be referred to as outside a conserved domain, (1994) supra) the ARID family (Vazquez et al. (1999) supra), outside a consensus sequence, or outside a consensus DNA the Jumonji family, (Balciunas et al. (2000) supra), the bZIP binding site that is known to exist or that exists for a particular 25 NIN family (Schauser et al. (1999) supra), the E2F family transcription factor class, family, or Sub-family. In this case, Kaelin et al. (1992) supra) and the GRF-like family (Knaap et the fragment or domain will not include the exact amino acids all (2000) supra). of a consensus sequence or consensus DNA-binding site of a The conserved domains for each of polypeptides of SEQ transcription factor class, family or Sub-family, or the exact ID NO: 2N, wherein N=1-480, are listed in Table 5 as amino acids of a particular transcription factor consensus 30 described in Example VII. Also, many of the polypeptides of sequence or consensus DNA-binding site. Furthermore, a Table 5 have conserved domains specifically indicated by particular fragment, region, or domain of a polypeptide, or a start and stop sites. A comparison of the regions of the polynucleotide encoding a polypeptide, can be “outside a polypeptides in SEQ ID NO: 2N, wherein N=1-480, or of conserved domain if all the amino acids of the fragment, those in Table 5, allows one of skill in the art to identify region, or domain fall outside of a defined conserved 35 conserved domain(s) for any of the polypeptides listed or domain(s) for a polypeptide or protein. Sequences having referred to in this disclosure, including those in Tables 4-8. lesser degrees of identity but comparable biological activity The conserved domains for each of polypeptides of SEQ are considered to be equivalents. ID NOs: 568,944, 946,948, 1735, 1875, 1971, and 1973, are As one of ordinary skill in the art recognizes, conserved listed in Table 5 as described in Example VII. Also, many of domains may be identified as regions or domains of identity to 40 the polypeptides of Table 5 have conserved domains specifi a specific consensus sequence (see, for example, Riechmann cally indicated by start and stop sites. A comparison of the et al. (2000) supra). Thus, by using alignment methods well regions of the polypeptides in SEQID NO:568, SEQID NO: known in the art, the conserved domains of the plant tran 944, SEQID NO:946, SEQID NO:948, SEQID NO: 1735, scription factors for each of the following may be determined: SEQ ID NO: 1875, SEQID NO: 1971, SEQ ID NO: 1973, the AP2 (APETALA2) domain transcription factor family 45 SEQ ID NO: 1945, SEQID NO: 1947, SEQ ID NO: 1949, (Riechmann and Meyerowitz (1998) supra; the MYB tran SEQ ID NO: 1951, SEQID NO: 1953, SEQ ID NO: 1955, scription factor family (ENBib: Martin and Paz-Ares (1997) SEQ ID NO: 1957, SEQID NO: 1959, SEQ ID NO: 1961, supra); the MADS domain transcription factor family (Riech SEQID NO: 1963, SEQID NO: 1965, SEQID NO: 1967, or mann and Meyerowitz (1997) supra: Immink et al. (2003) SEQID NO: 1969, or of those in Table 5, allows one of skill supra); the WRKY protein family (Ishiguro and Nakamura 50 in the art to identify conserved domain(s) for any of the (1994) Supra); the ankyrin-repeat protein family (Zhang et al. polypeptides listed or referred to in this disclosure, including (1992) supra); the Zinc finger protein (Z) family (Klug and those in Tables 4-8. Schwabe (1995) supra; Takatsuji (1998) supra); the A gene is a functional unit of inheritance, and in physical homeobox (HB) protein family (Buerglin (1994) supra); the terms is aparticular segment or sequence of nucleotides along CAAT-element binding proteins (Forsburg and Guarente 55 a molecule of DNA (or RNA, in the case of RNA viruses) (1989) Supra); the squamosa promoter binding proteins involved in producing a polypeptide chain. The latter may be (SPB) (Klein et al. (1996) supra); the NAM protein family Subjected to Subsequent processing Such as splicing and fold (Souer et al. (1996) supra); the IAA/AUX proteins (Abeletal. ing to obtain a functional protein or polypeptide. A gene may (1995) supra); the HLH/MYC protein family (Littlewood et be isolated, partially isolated, or be found with an organism’s al. (1994) supra); the DNA-binding protein (DBP) family 60 genome. By way of example, a transcription factor gene (Tucker et al. (1994) supra); the bZIP family of transcription encodes a transcription factor polypeptide, which may be factors (Foster et al. (1994) supra); the Box P-binding protein functional or require processing to function as an initiator of (the BPF-1) family (da Costa e Silva et al. (1993) supra); the transcription. high mobility group (HMG) family (Bustin and Reeves Operationally, genes may be defined by the cis-trans test, a (1996) supra); the scarecrow (SCR) family (DiLaurenzio et 65 genetic test that determines whether two mutations occur in al. (1996) supra); the GF14 family (Wu et al. (1997) supra); the same gene and which may be used to determine the limits the polycomb (PCOMB) family (Goodrich et al. (1997) of the genetically active unit (Rieger et al. (1976) Glossary of US 8,809,630 B2 31 32 Genetics and Cytogenetics. Classical and Molecular; 4th ed., e.g., a transgenic plant or plant tissue, is different from the Springer Verlag. Berlin). A gene generally includes regions expression pattern in a wild-type plant or a reference plant of preceding (“leaders'; upstream) and following (“trailers': the same species. The pattern of expression may also be downstream) of the coding region. A gene may also include compared with a reference expression pattern in a wild-type intervening, non-coded sequences, referred to as “introns'. plant of the same species. For example, the polynucleotide or located between individual coding segments, referred to as polypeptide is expressed in a cell or tissue type other than a “exons'. Most genes have an associated promoter region, a cell or tissue type in which the sequence is expressed in the regulatory sequence 5' of the transcription initiation codon wild-type plant, or by expression at a time other than at the (there are some genes that do not have an identifiable pro time the sequence is expressed in the wild-type plant, or by a moter). The function of a gene may also be regulated by 10 response to different inducible agents. Such as hormones or enhancers, operators, and other regulatory elements. environmental signals, or at different expression levels (either A “trait” refers to a physiological, morphological, bio higher or lower) compared with those found in a wild-type chemical, or physical characteristic of a plant or particular plant. The term also refers to altered expression patterns that plant material or cell. In some instances, this characteristic is are produced by lowering the levels of expression to below the visible to the human eye, Such as seed or plant size, or can be 15 detection level or completely abolishing expression. The measured by biochemical techniques, such as detecting the resulting expression pattern can be transient or stable, consti protein, starch, or oil content of seed or leaves, or by obser tutive or inducible. In reference to a polypeptide, the term Vation of a metabolic or physiological process, e.g. by mea “ectopic expression or altered expression' further may relate suring uptake of carbon dioxide, or by the observation of the to altered activity levels resulting from the interactions of the expression level of a gene or genes, e.g., by employing North polypeptides with exogenous or endogenous modulators or ern analysis, RT-PCR, microarray gene expression assays, or from interactions with factors or as a result of the chemical reporter gene expression systems, or by agricultural observa modification of the polypeptides. tions such as stress tolerance, yield, or pathogen tolerance. The term “overexpression' as used herein refers to a Any technique can be used to measure the amount of com greater expression level of a gene in a plant, plant cell or plant parative level of, or difference in any selected chemical com 25 tissue, compared to expression in a wild-type plant, cell or pound or macromolecule in the transgenic plants, however. tissue, at any developmental or temporal stage for the gene. “Trait modification” refers to a detectable difference in a Overexpression can occur when, for example, the genes characteristic in a plant ectopically expressing a polynucle encoding one or more transcription factors are under the otide or polypeptide of the present invention relative to a plant control of a strong expression signal. Such as one of the not doing so. Such as a wild-type plant. In some cases, the trait 30 promoters described herein (e.g., the cauliflowermosaic virus modification can be evaluated quantitatively. For example, 35S transcription initiation region). Overexpression may the trait modification can entail at least about a 2% increase or occur throughout a plant or in specific tissues of the plant, decrease in an observed trait (difference), at least a 5% dif depending on the promoter used, as described below. ference, at least about a 10% difference, at least about a 20% Overexpression may take place in plant cells normally difference, at least about a 30%, at least about a 50%, at least 35 lacking expression of polypeptides functionally equivalent or about a 70%, or at least about a 100%, or an even greater identical to the present transcription factors. Overexpression difference compared with a wild-type plant. It is known that may also occur in plant cells where endogenous expression of there can be a natural variation in the modified trait. There the present transcription factors or functionally equivalent fore, the trait modification observed entails a change of the molecules normally occurs, but such normal expression is at normal distribution of the trait in the plants compared with the 40 a lower level. Overexpression thus results in a greater than distribution observed in wild-type plant. normal production, or “overproduction of the transcription “Wild type', as used herein, refers to a cell, tissue or plant factor in the plant, cell or tissue. that has not been genetically modified to knock out or over The term "phase change” refers to a plant's progression express one or more of the presently disclosed transcription from embryo to adult, and, by some definitions, the transition factors. Wild-type cells, tissue or plants may be used as con 45 wherein flowering plants gain reproductive competency. It is trols to compare levels of expression and the extent and nature believed that phase change occurs either after a certain num of trait modification with cells, tissue or plants in which ber of cell divisions in the shoot apex of a developing plant, or transcription factor expression is altered or ectopically when the shoot apex achieves a particular distance from the expressed, e.g., in that it has been knocked out or overex roots. Thus, altering the timing of phase changes may affect a pressed. 50 plant's size, which, in turn, may affect yield and biomass. The term “transcript profile” refers to the expression levels Traits that Maybe Modified in Overexpressing or Knock-Out of a set of genes in a cell in a particular state, particularly by Plants comparison with the expression levels of that same set of Trait modifications of particular interest include those to genes in a cell of the same type in a reference state. For seed (such as embryo or endosperm), fruit, root, flower, leaf. example, the transcript profile of a particular transcription 55 stem, shoot, seedling or the like, including: enhanced toler factor in a suspension cell is the expression levels of a set of ance to environmental conditions including freezing, chill genes in a cell overexpressing that transcription factor com ing, heat, drought, water saturation, radiation and oZone; pared with the expression levels of that same set of genes in a improved tolerance to microbial, fungal or viral diseases; Suspension cell that has normal levels of that transcription improved tolerance to pest infestations, including insects, factor. The transcript profile can be presented as a list of those 60 nematodes, mollicutes, parasitic higher plants or the like; genes whose expression level is significantly different decreased herbicide sensitivity; improved tolerance of heavy between the two treatments, and the difference ratios. Differ metals or enhanced ability to take up heavy metals; improved ences and similarities between expression levels may also be growth under poor photoconditions (e.g., low light and/or evaluated and calculated using statistical and clustering meth short day length), or changes in expression levels of genes of ods. 65 interest. Other phenotype that can be modified relate to the “Ectopic expression' or “altered expression' in reference production of plant metabolites, such as variations in the to a polynucleotide indicates that the pattern of expression in, production of taxol, tocopherol, tocotrienol, Sterols, phy US 8,809,630 B2 33 34 tosterols, vitamins, wax monomers, anti-oxidants, amino Gao et al. (2002) Plant Molec. Biol. 49: 459-471) have acids, lignins, cellulose, tannins, prenyllipids (such as chlo recently described four CBF transcription factors from Bras rophylls and carotenoids), glucosinolates, and terpenoids, sica napus: BNCBFs 5, 7, 16 and 17. They note that the first enhanced or compositionally altered protein or oil production three CBFs (GenBank Accession Numbers AAM18958, (especially in seeds), or modified Sugar (insoluble or soluble) AAM18959, and AAM18960, respectively) are very similar and/or starch composition. Physical plant characteristics that to Arabidopsis CBF1, whereas BNCBF 17 (GenBankAcces can be modified include cell development (such as the num sion Number AAM18961) is similar but contains two extra ber of trichomes), fruit and seed size and number, yields of regions of 16 and 21 amino acids in its acidic activation plant parts such as stems, leaves, inflorescences, and roots, domain. All four B. napus CBFs accumulate in leaves of the the stability of the seeds during storage, characteristics of the 10 plants after cold-treatment, and BNCBFs 5, 7, 16 accumu seed pod (e.g., Susceptibility to shattering), root hair length lated after salt stress treatment. The authors concluded that and quantity, internode distances, or the quality of seed coat. these BNCBFs likely function in low-temperature responses Plant growth characteristics that can be modified include in B. napus. growth rate, germination rate of seeds, vigor of plants and In a functional study of CBF genes, Hsieh et al. (2002) seedlings, leaf and flower senescence, male sterility, apo 15 Plant Physiol. 129: 1086-1094) found that heterologous mixis, flowering time, flower abscission, rate of nitrogen expression of Arabidopsis CBF1 in tomato plants confers uptake, osmotic sensitivity to soluble Sugar concentrations, increased tolerance to chilling and considerable tolerance to biomass or transpiration characteristics, as well as plant oxidative stress, which Suggested to the authors that ectopic architecture characteristics Such as apical dominance, Arabidopsis CBF1 expression may induce several tomato branching patterns, number of organs, organ identity, organ stress responsive genes to protect the plants. shape or size. Polypeptides and Polynucleotides of the Invention Transcription Factors Modify Expression of Endogenous The present invention provides, among other things, tran Genes Scription factors (TFs), and transcription factor homolog Expression of genes that encode transcription factors that polypeptides, and isolated or recombinant polynucleotides modify expression of endogenous genes, polynucleotides, 25 encoding the polypeptides, or novel sequence variant and proteins are well known in the art. In addition, transgenic polypeptides or polynucleotides encoding novel variants of plants comprising isolated polynucleotides encoding tran transcription factors derived from the specific sequences pro Scription factors may also modify expression of endogenous vided here. These polypeptides and polynucleotides may be genes, polynucleotides, and proteins. Examples include Peng employed to modify a plant's characteristics. et al. (1997) Genes and Development 11: 3194-3205, and 30 Exemplary polynucleotides encoding the polypeptides of Peng et al. (1999) Nature 400: 256-261. In addition, many the invention were identified in the Arabidopsis thaliana others have demonstrated that an Arabidopsis transcription GenBank database using publicly available sequence analy factor expressed in an exogenous plant species elicits the sis programs and parameters. Sequences initially identified same or very similar phenotypic response. See, for example, were then further characterized to identify sequences com Fu et al. (2001) Plant Cell 13: 1791-1802: Nandietal. (2000, 35 prising specified sequence strings corresponding to sequence Curr. Biol. 10:215-218: Coupland (1995) Nature 377: 482 motifs present in families of known transcription factors. In 483; and Weigel and Nilsson (1995) Nature 377: 482-500. addition, further exemplary polynucleotides encoding the In another example, Mandeletal. (1992) Cell 71-133-143 polypeptides of the invention were identified in the plant and Suzuki et al. (2001) Plant J. 28: 409–418, teach that a GenBank database using publicly available sequence analy transcription factor expressed in another plant species elicits 40 sis programs and parameters. Sequences initially identified the same or very similar phenotypic response of the endog were then further characterized to identify sequences com enous sequence, as often predicted in earlier studies of Ara prising specified sequence strings corresponding to sequence bidopsis transcription factors in Arabidopsis (see Mandel et motifs present in families of known transcription factors. al. (1992) supra; Suzuki et al. (2001) supra). Polynucleotide sequences meeting Such criteria were con Other examples include Müller et al. (2001) Plant J. 28: 45 firmed as transcription factors. 169-179; Kimetal. (2001) Plant J. 25: 247-259; Kyozuka and Additional polynucleotides of the invention were identi Shimamoto (2002) Plant Cell Physiol. 43: 130-135; Boss and fied by Screening Arabidopsis thaliana and/or other plant Thomas (2002) Nature 416: 847-850; He et al. (2000) Trans cDNA libraries with probes corresponding to known tran genic Res. 9: 223-227; and Robson et al. (2001) Plant J. 28: Scription factors under low stringency hybridization condi 619-631. 50 tions. Additional sequences, including full length coding In yet another example, Gilmour et al. (1998) Plant J 16: sequences were Subsequently recovered by the rapid ampli 433-442, teach an Arabidopsis AP2 transcription factor, fication of cDNA ends (RACE) procedure, using a commer CBF1 (SEQID NO:44), which, when overexpressed in trans cially available kit according to the manufacturers instruc genic plants, increases plant freezing tolerance. Jaglo et al. tions. Where necessary, multiple rounds of RACE are (2001) Plant Physiol. 127: 910-917, further identified 55 performed to isolate 5' and 3' ends. The full-length cDNA was sequences in Brassica napus which encode CBF-like genes then recovered by a routine end-to-end polymerase chain and that transcripts for these genes accumulated rapidly in reaction (PCR) using primers specific to the isolated 5' and 3 response to low temperature. Transcripts encoding CBF-like ends. Exemplary sequences are provided in the Sequence proteins were also found to accumulate rapidly in response to Listing. low temperature in wheat, as well as in tomato. An alignment 60 The polynucleotides of the invention can be or were ectopi of the CBF proteins from Arabidopsis, B. napus, wheat, rye, cally expressed in overexpressor or knockout plants and the and tomato revealed the presence of conserved consecutive changes in the characteristic(s) or trait(s) of the plants amino acid residues, PKK/RPAGRXKFXETRHP and observed. Therefore, the polynucleotides and polypeptides DSAWR, that bracket the AP2/EREBP DNA binding can be employed to improve the characteristics of plants. domains of the proteins and distinguish them from other 65 The polynucleotides of the invention can be or were ectopi members of the AP2/EREBP protein family. (See Jaglo et al. cally expressed in overexpressor plant cells and the changes Supra). in the expression levels of a number of genes, polynucle US 8,809,630 B2 35 36 otides, and/or proteins of the plant cells observed. Therefore, SOY MADS1), and 34% identity and 56% similarity to SEQ the polynucleotides and polypeptides can be employed to ID NO: 1973 (SOY3: SOY MADS3), respectively. change expression levels of a genes, polynucleotides, and/or In addition, Table 14 shows that SEQID NO: 946 (MAF4; proteins of plants. G1843) has 57% identity and 74% similarity to SEQID NO: Mads Affecting Flowering (MAF) Transcription Factor Gene 1875 (FLC: G1759), 65% identity and 81% similarity to SEQ Family Polynucleotide and Polypeptide Sequences ID NO: 1735 (MAF 1: G157), 64% identity and 77% similar Examination of the Arabidopsis genome sequence has ity to SEQID NO:568 (MAF2: G859), 64% identity and 78% revealed the existence offive MADS box genes which encode similarity to SEQID NO: 944 (MAF3: G1842), 71% identity proteins that are highly related to FLC (Alvarez-Buylla et al. and 83% similarity to SEQID NO:948 (MAF5: G 1844).35% 10 identity and 53% similarity to SEQ ID NO: 1971 (SOY1: (2000a) Proc. Natl. Acad. Sci. 97: 5328-5333: Ratcliffe et al. SOY MADS1), and 36% identity and 55% similarity to SEQ (2001) Plant Physiol. 126: 122-132). The first of the genes to ID NO: 1973 (SOY3: SOY MADS3), respectively. be analyzed, MADS AFFECTING FLOWERING1, MAF1 In addition, Table 14 shows that SEQID NO: 948 (MAF5; (which has also been referred to as FLOWERINGLOCUSM, G1844) has 53% identity and 73% similarity to SEQID NO: FLM, Scortecci et al. (2001) Plant J. 26: 229-236, and as 15 1875 (FLC: G1759), 63% identity and 80% similarity to SEQ AGL27, Alvarez-Buylla et al. (2000a) supra), was shown to ID NO: 1735 (MAF 1: G157), 63% identity and 79% similar be a floral repressor (Ratcliffe et al. (2001) supra; Scortecci et ity to SEQID NO:568 (MAF2: G859), 65% identity and 81% al. (2001) supra). MAF1 expression shows a less clear-cut similarity to SEQID NO: 944 (MAF3: G1842), 71% identity association with the vernalization response than that of FLC, and 83% similarity to SEQID NO:946 (MAF4: G 1843), 36% and the gene potentially acts downstream or independently of identity and 54% similarity to SEQ ID NO: 1971 (SOY1: FLC transcription (Ratcliffe et al. (2001) supra). The func SOY MADS1), and 36% identity and 56% similarity to SEQ tions of four FLC related genes were analyzed and it was ID NO: 1973 (SOY3: SOY MADS3), respectively. demonstrated that they influence the timing of flowering. In The results show that a polypeptide of 186 amino acid particular, a mechanism that prevents Arabidopsis plants residues (the length of the shortest of the MAF sequences: becoming vernalized by short periods of cold was revealed. 25 Soy 1..SOY MADS1; SEQID NO: 1971) shows at least 32% The Arabidopsis genome contains four genes, which are identity over the complete polypeptide sequence compared highly related to FLC and MAF1, arranged in a tightcluster at with SEQID NO: 944 (MAF3: G 1842). the bottom of chromosome 5 (Ratcliffe et. al. (2001) supra). FIGS. 7A and 7B shows the alignment between the Arabi The gene cluster occupies approximately 22 kb and com dopsis and Soy MAF polypeptide sequences produced using prises At3.g65050 (which corresponds to AGL31; Alvarez 30 the CLASTAL W(1.4) multiple sequence alignment algo Buylla et al. (2000a) supra), At5g65060, AtSg65070, and rithm. Atg65080. As shown in FIG.6, an alignmentofthefull-length As shown in FIGS. 7A and 7B, a conserved domain of SEQ protein sequences of SEQID NO: 1875 (FLC), SEQID NO: ID NO: 568 (MAF2, G859 pep; amino acid residues Gly2 1735 (MAF1), SEQID NO:568 (MAF2), SEQID NO: 944 through Ser57) has amino acid sequence identity with a con (MAF3), SEQID NO: 946 (MAF4), and SEQ ID NO: 948 35 served domain of SEQ ID NO: 1875 (FLC; G1759 pep: (MAF5), shows that the protein sequences share a large 83.9%), SEQID NO: 1735 (MAF1, G157 pep: 87.5%), SEQ degree of identity (residue identity denoted by asterisk: “*”: ID NO: 944 (MAF3, G1842.pep; 92.9%), SEQ ID NO: 946 residue similarity denoted by period: “”) across the entire (MAF4, G1843 pep; 76.8%), SEQ ID NO: 948 (MAF5, sequence, depending upon the pair-wise combination. This G1844-pep; 78.6%), SEQ ID NO:14 (SOY MADS1, similarity Suggested that the polynucleotide sequences and 40 Soy1 pep; 69.6%), and SEQ ID NO: 1973 (SOY MADS3, the encoded polypeptide sequences of MAF2, MAF3, MAF4. Soy3.pep; 66.1%). SOY MADS1 (SEQID NO:14) and SOY and MAF5 are MADS-domain family transcription factors MADS3 (SEQ ID NO:16) are therefore soy (Glycine max) which have functions related to those of MAF1 and FLC in MAF homologs of the Arabidopsis SEQID NOs: 568,944, the regulation offlowering time (Riechmann and Meyerowitz 946, 948, 1735, and 1875. (1997) supra). 45 The conserved domain of MAF transcription factors exem MAF polypeptide sequences from Arabidopsis and Soy plified by amino acid residues Gly2 through Ser57 of SEQID were aligned using the CLUSTALW (1.4) multiple sequence NO: 568 as shown in FIG. 7A is defined by the consensus alignment algorithm (MACVETOR) to create pairwise align amino acid residue Sequence ments. The results are shown in Table 14. GX1X1X1X2EIKRIENKSXRQX2TFXKRRXGLX As shown in Table 14, SEQID NO:568 (MAF2: G859) has 50 XKARX3LSX2 LCXXXX2AX2XX2XSXX4GX1LYXX 61% identity and 74% similarity to SEQID NO: 1875 (FLC: (SEQ ID NO: 2010), wherein X1 represent a basic residue G1759), 76% identity and 85% similarity to SEQ ID NO: Such as Kor R, X2 represent an aliphatic residue Such as V. I. 1735 (MAF 1: G157), 87% identity and 90% similarity to L, A, or G, X3 represents an acid or amide residue Such as E SEQ ID NO: 944 (MAF3: G 1842), 64% identity and 77% or Q, or D or N, X4 represents an aliphatic hydroxyl residue similarity to SEQID NO: 946 (MAF4: G1843), 63% identity 55 Such as T or S, and wherein X represents any amino acid and 79% similarity to SEQID NO:948 (MAF5: G 1844), 34% residue. An exemplary consensus sequence is SEQ ID NO: identity and 52% similarity to SEQ ID NO: 1971 (SOY1: 2010. SOY MADS1), and 34% identity and 53% similarity to SEQ As also shown in FIG. 7A, an additional conserved domain ID NO: 1973 (SOY3: SOY MADS3), respectively. (amino acid residues Alaš8 through GlnT4) flanks a region In addition, Table 14 shows that SEQID NO: 944 (MAF3; 60 C-terminal to the conserved domain of amino acid residues G1842) has 60% identity and 75% similarity to SEQID NO: Gly2 through Ser57 of SEQ ID NO: 568. This conserved 1875 (FLC: G1759), 78% identity and 87% similarity to SEQ domain of SEQID NO: 568 (MAF2, G859 pep; amino acid ID NO: 1735 (MAF 1: G157), 87% identity and 90% similar residues Ala58 through Gln14) has amino acid sequence ity to SEQID NO:568 (MAF2: G859), 64% identity and 78% identity with a conserved domain of SEQID NO: 1875 (FLC: similarity to SEQID NO: 946 (MAF4: G1843), 65% identity 65 G1759 pep; 58.8%), SEQ ID NO: 1735 (MAF1, G157 pep: and 81% similarity to SEQID NO:948 (MAF5: G 1844), 32% 76.5%), SEQID NO: 944 (MAF3, G1842-pep: 100%), SEQ identity and 43% similarity to SEQ ID NO: 1971 (SOY1: ID NO: 946 (MAF4, G1843.pep; 59.2%), SEQ ID NO: 948 US 8,809,630 B2 37 38 (MAF5, G1844-pep; 58.8%), SEQ ID NO: 1971 (SOY thereby prevents a flowering response being triggered. In MADS1, Soy 1 pep; 23.5%), and SEQ ID NO: 1973 (SOY Some environments, promotion of flowering in response to a MADS3, Soy3-pep; 23.5%). few days of cold weather might be advantageous. However, As also shown in FIG. 7A, the larger conserved domain winter annual strains of Arabidopsis from northern latitudes (amino acid residues Gly2 through Gln14) of SEQ ID NO: 5 have evolved to over-winter vegetatively and commence 568. This conserved domain of SEQ ID NO: 568 (MAF2, flowering in the spring only after a Sustained period of low G859. pep; amino acid residues Gly2 through Gln14) has temperature (Reeves and Coupland (2000) supra; Michaels and Amasino (2000) supra). Individual plants without MAF2 amino acid sequence identity with a conserved domain of like activity would be more susceptible to transient cold spells SEQ ID NO: 1875 (FLC; G1759 pep; 78.1%), SEQID NO: in the autumn, when conditions for seed set are unfavorable. 1735 (MAF1, G157 pep; 84.9%), SEQID NO: 944 (MAF3, 10 Thus, there would be likely a selective advantage for a plant to G1842pep: 93.2%), SEQID NO: 946 (MAF4, G1843.pep: evolve MAF2 function. It would be advantageous for a plant 72.6%), SEQID NO: 948 (MAF5, G1844-pep; 72.6%), SEQ that does not have an endogenous MAF2-like activity to be ID NO: 1971 (SOY MADS1, Soy1 pep: 65.8%), and SEQID bred and/or transformed to have MAF2-like (MAF) activity NO: 1973 (SOY MADS3, Soy3 pep; 54.8%). in order to be less susceptible to transient cold. The larger conserved domain shown in FIG. 7A is defined 15 The polynucleotide sequences of SEQID NOs: 1970, and by the consensus amino acid residue sequence 1972 may be altered such that amino acid sequences SEQID GX1X1X1X2EIKRIENKSXRQX2TFXKRRXGL NOs: 1971 and 1973 (SOY MADS1 and SOY MADS3, XXKARX3LSX2 respectively) have conservative and non-conservative similar LCXXXX2AX2XX2XSXX4GX1LYXXXXGDXXX amino acid substitutions to create sequences which can have XX2X2XXX5XXXX (SEQID NO: 2011), wherein X1 rep MAF (such as MAF2-like) activity. resentabasic residue Such as Kor R, X2 representanaliphatic In general, a wide variety of applications exist for systems residue such as V.I., L, A, or G, X3 represents an acid oramide that either lengthen or shorten the time to flowering. residue such as E or Q, or D or N, X4 represents an aliphatic Accelerated Flowering: hydroxyl residue such as T or S, X5 represents an aromatic Most modern crop varieties are the result of extensive residue such as For Y, and wherein X represents any amino 25 breeding programs. Many generations of backcrossing can be acid residue. An exemplary consensus sequence is SEQ ID required to introduce desired traits. Transgenic plants com prising systems that accelerate flowering could have valuable NO: 2011. applications in Such programs. A faster generation time can In addition, FIG.7 shows that FLC, MAF1, MAF2, MAF3, allow additional harvests of a crop to be made within a given MAF4, MAF5, SOY MADS1, and SOY MADS3 share three growing season. With the advent of transformation systems potential protein kinase C phosphorylation sites atamino acid 30 for tree species such as oil palm and Eucalyptus, forest bio residues Ser15, Ser22, and Ser51; two potential protein technology is a growing area of interest. Acceleration of kinase A phosphorylation sites at amino acid residues Thr20 flowering, again, can reduce generation times and make and Ser36; a potential CaMPKII phosphorylation site at breeding programs feasible which would otherwise be impos amino acid residue Ser36; FLC, MAF1, MAF2, MAF3, sible. That this is a real possibility has already been demon MAF4, and MAF5 share three potential protein kinase casein 35 strated in aspen, a tree species that usually takes 8-20 years to kinase I phosphorylation sites at amino acid residues Y55, flower. Transgenic aspen that over-express the Arabidopsis S116, and S129; and MAF1, MAF2, MAF3, and MAF5 share LFY gene flower after only 5 months. The flowers produced three potential protein kinase casein kinase II phosphoryla by these young aspen plants, however, were sterile (Weigel tion sites at amino acid residue S57, S59, and S119. and Nilsson (1995) Nature 377: 495-500). Allelic variants of MAF2-5 are represented by SEQ ID 40 Delayed Flowering: NOs: 567, 1944, 1946, 1948, and 1950 (MAF2 variants): In species such as Sugarbeet, where the vegetative parts of SEQ ID NOs: 943, 1952, 1954, 1956, and 1958 (MAF3 the plants constitute the crop and the reproductive tissues are variants); SEQ ID NOs: 945, 1960, 1962, 1964, and 1966 discarded, it would be advantageous to delay or prevent flow (MAF4 variants); and SEQ ID NOs: 947 and 1968 (MAF5 ering. Extending vegetative development could bring about variants). 45 large increases in yields. FIG. 8 shows the effects of vernalization on expression of Inducible Flowering: MAF2-5 (SEQID NOs: 567,943, 945, and 947) in different By regulating the expression of flowering-time controlling genetic backgrounds (Arabidopsis accessions). FIG.9 shows genes, using inducible promoters, flowering could potentially a schematic diagram Summarizing the responses of FLC and be triggered as desired (for example, by application of a MAF1-5 to vernalization, and their potential effects on the 50 chemical inducer). This would allow, for example, flowering floral transition. Arrows indicate positive interactions, blunt to be synchronized across a crop and facilitate more efficient ended lines denote inhibition. harvesting. Such inducible systems could be used to tune the MAF2-5 (SEQID NOs: 567,943,945, and 947; encoding flowering of crop varieties to different latitudes. At present, SEQ ID NOs: 568, 944, 946, and 948, respectively) are species such as Soybean and cotton are available as a series of involved in regulation of the vernalization response. MAF2 55 maturity groups that are suitable for different latitudes on the (SEQ ID NO: 567) encodes a floral repressor (SEQID NO: basis of their flowering time (which is governed by day 568), which participates in a previously unrecognized mecha length). A system in which flowering could be chemically nism that prevents the plants being vernalized by short cold controlled would allow a single high-yielding northern matu periods. MAF3 and MAF4 (SEQ ID NO: 943 and SEQ ID rity group to be grown at any latitude. In Southern regions NO: 945, encoding SEQID NOs: 944 and 946, respectively) 60 Such plants could be grown for longer, thereby increasing may have parallel roles to FLC in the maintenance of a ver yields, before flowering was induced. In more northern areas, nalization requirement. MAF5 (SEQID NO: 947; encoding the induction would be used to ensure that the crop flowers SEQ ID NO: 948), is activated by vernalization, and could prior to the first winter frosts. Currently, the existence of a therefore have an opposing role to FLC and MAF1-4. series of maturity groups for different latitudes represents a Therefore, it was concluded that MAF2 (SEQID NO:567; 65 major barrier to the introduction of new valuable traits. Any encoding SEQID NO: 568) compensates for the decrease in trait (e.g. disease resistance, Vernalization response) has to be FLC levels that occurs following short cold spells, and bred into each of the different maturity groups separately; a US 8,809,630 B2 39 40 laborious and costly exercise. The availability of single strain, sequence, e.g., a polynucleotide encoding all or part of a which could be grown at any latitude, would therefore greatly transcription factor. For example, chemical synthesis using increase the potential for introducing new traits to crop spe the phosphoramidite method is described, e.g., by Beaucage cies such as soybean and cotton. et al. (1981) Tetrahedron Letters 22: 1859-1869; and Matthes Producing Polypeptides et al. (1984) EMBO.J. 3: 801-805. According to such meth The polynucleotides of the invention include sequences ods, oligonucleotides are synthesized, purified, annealed to that encode transcription factors and transcription factor their complementary Strand, ligated and then optionally homolog polypeptides and sequences complementary cloned into suitable vectors. And if so desired, the polynucle thereto, as well as unique fragments of coding sequence, or otides and polypeptides of the invention can be custom sequence complementary thereto. Such polynucleotides can 10 ordered from any of a number of commercial Suppliers. be, e.g., DNA or RNA, e.g., mRNA, cRNA, synthetic RNA, Homologous Sequences genomic DNA, cDNA synthetic DNA, oligonucleotides, etc. Sequences homologous, i.e., that share significant The polynucleotides are either double-stranded or single sequence identity or similarity, to those provided in the Stranded, and include either, or both sense (i.e., coding) Sequence Listing, derived from Arabidopsis thaliana or from sequences and antisense (i.e., non-coding, complementary) 15 other plants of choice, are also an aspect of the invention. sequences. The polynucleotides include the coding sequence Homologous sequences can be derived from any plant includ of a transcription factor, or transcription factor homolog ing monocots and dicots and in particular agriculturally polypeptide, in isolation, in combination with additional cod important plant species, including but not limited to, crops ing sequences (e.g., a purification tag, a localization signal, as Such as soybean, wheat, corn (maize), potato, cotton, rice, a fusion-protein, as a pre-protein, or the like), in combination rape, oilseed rape (including canola), Sunflower, alfalfa, clo with non-coding sequences (e.g., introns or inteins, regula ver, Sugarcane, and turf; or fruits and vegetables, such as tory elements such as promoters, enhancers, terminators, and banana, blackberry, blueberry, strawberry, and raspberry, the like), and/or in a vector or host environment in which the cantaloupe, carrot, cauliflower, coffee, cucumber, eggplant, polynucleotide encoding a transcription factor or transcrip grapes, honeydew, lettuce, mango, melon, onion, papaya, tion factor homolog polypeptide is an endogenous or exog 25 peas, peppers, pineapple, pumpkin, spinach, Squash, Sweet enous gene. corn, tobacco, tomato, tomatillo, watermelon, rosaceous A variety of methods exist for producing the polynucle fruits (such as apple, peach, pear, cherry and plum) and Veg otides of the invention. Procedures for identifying and isolat etable brassicas (such as broccoli, cabbage, cauliflower, Brus ing DNA clones are well known to those of skill in the art, and sels sprouts, and kohlrabi). Other crops, including fruits and are described in, e.g., Berger and Kimmel, Guide to Molecu 30 Vegetables, whose phenotype can be changed and which com lar Cloning Techniques, Methods in Enzymology, Vol. 152 prise homologous sequences include barley, rye: millet; Sor Academic Press, Inc., San Diego, Calif. (“Berger); Sam ghum.; currant; avocado: citrus fruits such as oranges, lemons, brook et al. (1989) Molecular Cloning A Laboratory grapefruit and tangerines, artichoke, cherries; nuts such as the Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, walnut and peanut, endive; leek; roots such as arrowroot, Cold Spring Harbor, N.Y., and Current Protocols in Molecu 35 beet, cassaya, turnip, radish, yam, and Sweet potato; and lar Biology, Ausubel et al. eds. Current Protocols, a joint beans. The homologous sequences may also be derived from venture between Greene Publishing Associates, Inc. and John woody species, such pine, poplar and eucalyptus, or mint or Wiley & Sons, Inc., (supplemented through 2000) other labiates. In addition, homologous sequences may be (Ausubel'). derived from plants that are evolutionarily-related to crop Alternatively, polynucleotides of the invention, can be pro 40 plants, but which may not have yet been used as crop plants. duced by a variety of in vitro amplification methods adapted Examples include deadly nightshade (Atropa belladona), to the present invention by appropriate selection of specific or related to tomato; jimson weed (Datura stromnium), related degenerate primers. Examples of protocols Sufficient to direct to peyote; and teosinte (Zea species), related to corn (maize). persons of skill through in vitro amplification methods, Orthologs and Paralogs including the polymerase chain reaction (PCR) the ligase 45 Homologous sequences as described above can comprise chain reaction (LCR), Qbeta-replicase amplification and orthologous or paralogous sequences. Several different meth other RNA polymerase mediated techniques (e.g., NASBA), ods are known by those of skill in the art for identifying and e.g., for the production of the homologous nucleic acids of the defining these functionally homologous sequences. Three invention are found in Berger (Supra), Sambrook (Supra), and general methods for defining orthologs and paralogs are Ausubel (supra), as well as Mullis et al. (1987) PCR Proto 50 described; an ortholog or paralog, including equivalogs, may cols A Guide to Methods and Applications (Innis et al. eds) be identified by one or more of the methods described below. Academic Press Inc. San Diego, Calif. (1990) (Innis). Orthologs and paralogs are evolutionarily related genes Improved methods for cloning invitro amplified nucleic acids that have similar sequence and similar functions. Orthologs are described in Wallace et al. U.S. Pat. No. 5,426,039. are structurally related genes in different species that are Improved methods for amplifying large nucleic acids by PCR 55 derived by a speciation event. Paralogs are structurally related are summarized in Cheng et al. (1994) Nature 369: 684-685 genes within a single species that are derived by a duplication and the references cited therein, in which PCR amplicons of event. up to 40 kb are generated. One of skill will appreciate that Within a single plant species, gene duplication may cause essentially any RNA can be converted into a double stranded two copies of a particular gene, giving rise to two or more DNA suitable for restriction digestion, PCR expansion and 60 genes with similar sequence and often similar function known sequencing using reverse transcriptase and a polymerase. as paralogs. A paralog is therefore a similar gene formed by See, e.g., Ausubel, Sambrook and Berger, all Supra. duplication within the same species. Paralogs typically clus Alternatively, polynucleotides and oligonucleotides of the ter together or in the same clade (a group of similar genes) invention can be assembled from fragments produced by when a gene family phylogeny is analyzed using programs Solid-phase synthesis methods. Typically, fragments of up to 65 such as CLUSTAL (Thompson et al. (1994) Nucleic Acids approximately 100 bases are individually synthesized and Res. 22:4673-4680; Higgins et al. (1996) Methods Enzymol. then enzymatically or chemically ligated to produce a desired 266: 383-402). Groups of similar genes can also be identified US 8,809,630 B2 41 42 with pair-wise BLAST analysis (Feng and Doolittle (1987).J. 75:519-530; Linet al. (1991) Nature 353: 569-571; Sadowski Mol. Evol. 25:351-360). For example, a clade of very similar et al. (1988) Nature 335:563-564).et al. Plants are no excep MADS domain transcription factors from Arabidopsis all tion to this observation; diverse plant species possess tran share a common function in flowering time (Ratcliffe et al. Scription factors that have similar sequences and functions. (2001) Plant Physiol. 126: 122-132), and a group of very 5 Orthologous genes from different organisms have highly similar AP2 domain transcription factors from Arabidopsis conserved functions, and very often essentially identical are involved in tolerance of plants to freezing (Gilmour et al. functions (Lee etal. (2002) Genome Res. 12:493-502; Remm (1998) Plant J. 16: 433-442). Analysis of groups of similar etal. (2001).J. Mol. Biol. 314: 1041-1052). Paralogous genes, genes with similar function that fall within one clade can yield which have diverged through gene duplication, may retain Sub-sequences that are particular to the clade. These Sub 10 similar functions of the encoded proteins. In Such cases, para sequences, known as consensus sequences, can not only be logs can be used interchangeably with respect to certain used to define the sequences within each clade, but define the embodiments of the instant invention (for example, trans functions of these genes; genes within a clade may contain genic expression of a coding sequence). An example of Such paralogous sequences, or orthologous sequences that share highly related paralogs is the CBF family, with three well the same function (see also, for example, Mount (2001), in 15 defined members in Arabidopsis and at least one ortholog in Bioinformatics. Sequence and Genome Analysis, Cold Brassica napus (SEQID NOs: 44, 46, 48, and 1941, respec Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., tively), all of which control pathways involved in both freez page 543.) ing and drought stress (Gilmour et al. (1998) Plant J. 16: Speciation, the production of new species from a parental 433-442; Jaglo et al. (1998) Plant Physiol. 127: 910-917). species, can also give rise to two or more genes with similar The following references represent a small sampling of the sequence and similar function. These genes, termed many studies that demonstrate that conserved transcription orthologs, often have an identical function within their host factor genes from diverse species are likely to function simi plants and are often interchangeable between species without larly (i.e., regulate similar target sequences and control the losing function. Because plants have common ancestors, same traits), and that transcription factors may be trans many genes in any plant species will have a corresponding 25 formed into diverse species to confer or improve traits. orthologous gene in another plant species. Once a phylogenic (1) The Arabidopsis NPR1 gene regulates systemic tree for a gene family of one species has been constructed acquired resistance (SAR); over-expression of NPR1 leads to using a program such as CLUSTAL (Thompson et al. (1994) enhanced resistance in Arabidopsis. When either Arabidopsis Nucleic Acids Res. 22: 4673-4680; Higgins et al. (1996) NPR1 or the rice NPR1 ortholog was overexpressed in rice Supra) potential orthologous sequences can be placed into the 30 (which, as a monocot, is diverse from Arabidopsis), challenge phylogenetic tree and their relationship to genes from the with the rice bacterial blight pathogen Xanthomonas Oryzae species of interest can be determined. Orthologous sequences pv. Oryzae, the transgenic plants displayed enhanced resis can also be identified by a reciprocal BLAST strategy. Once tance (Chern et al. (2001) Plant J. 27: 101-113). NPR1 acts an orthologous sequence has been identified, the function of through activation of expression of transcription factor genes, the ortholog can be deduced from the identified function of 35 such as TGA2 (Fan and Dong (2002) Plant Cell 14: 1377 the reference sequence. 1389). Results from alignment analysis using SSERCH of 2,079 (2) E2F genes are involved in transcription of plant genes protein domains have shown that alignment of two unrelated for proliferating cell nuclear antigen (PCNA). Plant E2Fs polypeptide sequences have an upper threshold of percentage share a high degree of similarity in amino acid sequence amino acid residue identity and that this percentage identity 40 between monocots and dicots, and are even similar to the threshold decreases with the length of the alignment (Brenner conserved domains of the animal E2Fs. Such conservation et al. (1998) Proc. Natl. Acad. Sci. 95: 6073-6078). Brenner indicates a functional similarity between plant and animal Suggested that it is probably necessary for alignments to be at E2Fs. E2F transcription factors that regulate meristem devel least 70 residues in length before 40% identity is a reasonable opment act through common cis-elements, and regulate threshold, such that the likelihood of two sequences being 45 related (PCNA) genes (Kosugi and Ohashi, (2002) Plant J. related is not due to chance. Furthermore, Brenner et al. 29:45-59). (1998, supra) showed that two sequences which have 30% (3) The AB15 gene (abscisic acid (ABA) insensitive 5) identity is a reliable threshold for the database of 2,079 encodes a basic leucine Zipper factor required for ABA domains only for sequence alignments of at least 150 resi response in the seed and vegetative tissues. Co-transforma dues. Brenneretal. disclosed a chart which showed where the 50 tion experiments with AB15 cDNA constructs in rice proto percentage identity and alignment length limits of such unre plasts resulted in specific transactivation of the ABA-induc lated sequences cluster (see Brenner et al. (1998, supra), FIG. ible wheat, Arabidopsis, bean, and barley promoters. These 3, page 6075). From such a chart, an alignment of two results demonstrate that sequentially similar AB15 transcrip polypeptide sequences may be plotted to ascertain whether tion factors are key targets of a conserved ABA signaling they fall within or without the region of unrelated sequences. 55 pathway in diverse plants. (Gampala et al. (2001) J. Biol. For example, two polypeptide sequences which are aligned Chem. 277: 1689-1694). over 73 residues having at least 50% sequence identity are (4) Sequences of three Arabidopsis GAMYB-like genes above the threshold for which they might be considered unre were obtained on the basis of sequence similarity to GAMYB lated and therefore are, more likely than not, related to one genes from barley, rice, and L. temulentum. These three Ara another. 60 bidopsis genes were determined to encode transcription fac In addition, Bork has shown that there is a 70% accuracy tors (AtMYB33, AtMYB65, and AtMYB101) and could sub rate for bioinformatics-based predictions in general, and a stitute for a barley GAMYB and control alpha-amylase 90% accuracy rate for the prediction of functional features by expression (Gocal et al. (2001) Plant Physiol. 127: 1682 homology (Bork (2000) Genome Res. 10:398-400; Table Ion 1693). page 399). 65 (5) The floral control gene LEAFY from Arabidopsis can Transcription factor gene sequences are conserved across dramatically accelerate flowering in numerous dictoyledon diverse eukaryotic species lines (Goodrich et al. (1993) Cell ous plants. Constitutive expression of Arabidopsis LEAFY US 8,809,630 B2 43 44 also caused early flowering in transgenic rice (a monocot), encoded protein. Conserved domains within a transcription with a heading date that was 26-34 days earlier than that of factor family may exhibit a higher degree of sequence homol wild-type plants. These observations indicate that floral regu ogy. Such as at least 65% amino acid sequence identity includ latory genes from Arabidopsis are useful tools for heading ing conservative substitutions, and preferably at least 80% date improvement in cereal crops (He et al. (2000) Transgenic sequence identity, and more preferably at least 85%, or at Res. 9: 223-227). least about 86%, or at least about 87%, or at least about 88%, (6) Bioactive gibberellins (GAs) are essential endogenous or at least about 90%, or at least about 95%, or at least about regulators of plant growth. GA signaling tends to be con 98% sequence identity. Transcription factors that are homolo served across the plant kingdom. GA signaling is mediated gous to the listed sequences should share at least 30%, or at via GAI, a nuclear member of the GRAS family of plant 10 transcription factors. Arabidopsis GAI has been shown to least about 60%, or at least about 75%, or at least about 80%, function in rice to inhibit gibberellin response pathways (Fu or at least about 90%, or at least about 95% amino acid et al. (2001) Plant Cell 13: 1791-1802). sequence identity over the entire length of the polypeptide or (7) The Arabidopsis gene SUPERMAN(SUP), encodes a the homolog. putative transcription factor that maintains the boundary 15 Percent identity can be determined electronically, e.g., by between stamens and carpels. By over-expressing Arabidop using the MEGALIGN program (DNASTAR, Inc. Madison, sis SUP in rice, the effect of the gene’s presence on whorl Wis.). The MEGALIGN program can create alignments boundaries was shown to be conserved. This demonstrated between two or more sequences according to different meth that SUP is a conserved regulator of floral whorl boundaries ods, for example, the clustal method. (See, for example, Hig and affects cell proliferation (Nandi et al. (2000) Curr. Biol. gins and Sharp (1988) Gene 73:237-244.) The clustal algo 10: 215-218). rithm groups sequences into clusters by examining the (8) Maize, petunia and Arabidopsis myb transcription fac distances between all pairs. The clusters are aligned pairwise tors that regulate flavonoid biosynthesis are very genetically and then in groups. Other alignment algorithms or programs similar and affect the same trait in their native species, there may be used, including FASTA, BLAST, or ENTREZ, fore sequence and function of these myb transcription factors 25 FASTA and BLAST, and which may be used to calculate correlate with each other in these diverse species (Borevitz, et percent similarity. These are available as a part of the GCG al. (2000) Plant Cell 12:2383-2394). sequence analysis package (University of Wisconsin, Madi (9) Wheat reduced height-1 (Rht-B1/Rht-D1) and maize son, Wis.), and can be used with or without default settings. dwarf-8 (d8) genes are orthologs of the Arabidopsis gibber ENTREZ is available through the National Center for Bio ellin insensitive (GAI) gene. Both of these genes have been 30 technology Information. In one embodiment, the percent used to produce dwarf grain varieties that have improved identity of two sequences can be determined by the GCG grain yield. These genes encode proteins that resemble program with a gap weight of I, e.g., each amino acid gap is nuclear transcription factors and contain an SH2-like domain, weighted as if it were a single amino acid or nucleotide indicating that phosphotyrosine may participate in gibberel mismatch between the two sequences (see U.S. Pat. No. lin signaling. Transgenic rice plants containing a mutant GAI 35 6,262,333). allele from Arabidopsis have been shown to produce reduced Other techniques for alignment are described in Doolittle, responses to gibberellin and are dwarfed, indicating that R. F. (1996) Methods in Enzymology: Computer Methods for mutant GAI orthologs could be used to increase yield in a Macromolecular Sequence Analysis, Vol. 266, Academic wide range of crop species (Peng et al. (1999) Nature 400: Press, Orlando, Fla., USA. Preferably, an alignment program 256-261). 40 that permits gaps in the sequence is utilized to align the Transcription factors that are homologous to the listed sequences. The Smith-Waterman is one type of algorithm that sequences will typically share, in at least one conserved permits gaps in sequence alignments (see Shpaer (1997) domain, at least about 70% amino acid sequence identity, and Methods Mol. Biol. 70: 173-187). Also, the GAP program with regard to Zinc finger transcription factors, at least about using the Needleman and Wunsch alignment method can be 50% amino acid sequence identity. More closely related tran 45 utilized to align sequences. An alternative search strategy scription factors can share at least about 70%, or about 75% or uses MPSRCH software, which runs on a MASPAR com about 80% or about 90% or about 95% or about 98% or more puter. MPSRCH uses a Smith-Waterman algorithm to score sequence identity with the listed sequences, or with the listed sequences on a massively parallel computer. This approach sequences but excluding or outside a known consensus improves ability to pick up distantly related matches, and is sequence or consensus DNA-binding site, or with the listed 50 especially tolerant of Small gaps and nucleotide sequence sequences excluding one or all conserved domain. Factors errors. Nucleic acid-encoded amino acid sequences can be that are most closely related to the listed sequences share, e.g., used to search both protein and DNA databases. at least about 85%, about 90% or about 95% or more 96 The percentage similarity between two polypeptide sequence identity to the listed sequences, or to the listed sequences, e.g., sequence A and sequence B, is calculated by sequences but excluding or outside a known consensus 55 dividing the length of sequence A, minus the number of gap sequence or consensus DNA-binding site or outside one or all residues in sequence A, minus the number of gap residues in conserved domain. At the nucleotide level, the sequences will sequence B, into the Sum of the residue matches between typically share at least about 40% nucleotide sequence iden sequence A and sequence B, times one hundred. Gaps of low tity, preferably at least about 50%, about 60%, about 70% or or of no similarity between the two amino acid sequences are about 80% sequence identity, and more preferably about 60 not included in determining percentage similarity. Percent 85%, about 90%, about 95% or about 97% or more sequence identity between polynucleotide sequences can also be identity to one or more of the listed sequences, or to a listed counted or calculated by other methods known in the art, e.g., sequence but excluding or outside a known consensus the Jotun Hein method. (See, e.g., Hein (1990) Methods Enzy sequence or consensus DNA-binding site, or outside one or mol. 183: 626-645.) Identity between sequences can also be all conserved domain. The degeneracy of the genetic code 65 determined by other methods known in the art, e.g., by vary enables major variations in the nucleotide sequence of a poly ing hybridization conditions (see US Patent Application No. nucleotide while maintaining the amino acid sequence of the 20010010913). US 8,809,630 B2 45 46 The percent identity between two conserved domains of a Transcription factor-encoding cDNA is then isolated using, transcription factor DNA-binding domain consensus for example, PCR, using primers designed from a presently polypeptide sequence can be as low as 16%, as exemplified in disclosed transcription factor gene sequence, or by probing the case of GATA1 family of eukaryotic Cys/Cys-type Zinc with a partial or complete cDNA or with one or more sets of finger transcription factors. The DNA-binding domain con degenerate probes based on the disclosed sequences. The sensus polypeptide sequence of the GATA1 family is cDNA library may be used to transform plant cells. Expres CXCX,CX-C, where X is any amino acid residue. (See, for sion of the cDNAs of interest is detected using, for example, example, Takatsuji, Supra.) Other examples of Such con methods disclosed herein such as microarrays, Northern served consensus polypeptide sequences with low overall blots, quantitative PCR, or any other technique for monitor percent sequence identity are well known to those of skill in 10 ing changes in expression. Genomic clones may be isolated the art. using similar techniques to those. Thus, the invention provides methods for identifying a Identifying Polynucleotides or Nucleic Acids by Hybridiza sequence similar or paralogous or orthologous or homolo tion gous to one or more polynucleotides as noted herein, or one or 15 Polynucleotides homologous to the sequences illustrated more target polypeptides encoded by the polynucleotides, or in the Sequence Listing and tables can be identified, e.g., by otherwise noted herein and may include linking or associat ing a given plant phenotype or gene function with a sequence. hybridization to each other under Stringent or under highly In the methods, a sequence database is provided (locally or stringent conditions. Single stranded polynucleotides hybrid across an internet or intranet) and a query is made against the ize when they associate based on a variety of well character sequence database using the relevant sequences herein and ized physical-chemical forces, such as hydrogen bonding, associated plant phenotypes or gene functions. Solvent exclusion, base stacking and the like. The stringency In addition, one or more polynucleotide sequences or one or more polypeptides encoded by the polynucleotide of a hybridization reflects the degree of sequence identity of sequences may be used to search against a BLOCKS (Bairoch 25 the nucleic acids involved, such that the higher the Stringency, et al. (1997) Nucleic Acids Res. 25: 217-221), PFAM, and the more similar are the two polynucleotide strands. Strin other databases which contain previously identified and gency is influenced by a variety of factors, including tempera annotated motifs, sequences and gene functions. Methods that search for primary sequence patterns with secondary ture, salt concentration and composition, organic and non in both the structure gap penalties (Smith et al. (1992) Protein Engineer 30 organic additives, solvents, etc. present ing 5: 35-51) as well as algorithms such as Basic Local hybridization and wash solutions and incubations (and num Alignment Search Tool (BLAST: Altschul (1993) J. Mol. berthereof), as described in more detail in the references cited Evol. 36: 290-300; Altschul et al. (1990) supra), BLOCKS above. (Henikoff and Henikoff (1991) Nucleic Acids Res. 19: 6565 6572), Hidden Markov Models (HMM; Eddy (1996) Curr: 35 Encompassed by the invention are polynucleotide Opin. Str. Biol. 6: 361-365: Sonnhammer et al. (1997) Pro sequences that are capable of hybridizing to the claimed teins 28: 405-420), and the like, can be used to manipulate and polynucleotide sequences, including any of the transcription analyze polynucleotide and polypeptide sequences encoded factor polynucleotides within the Sequence Listing, and frag by polynucleotides. These databases, algorithms and other ments thereofunder various conditions of stringency (See, for methods are well known in the art and are described in 40 example, Wahl and Berger (1987) Methods Enzymol. 152: Ausubel et al. (1997: Short Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y., unit 7.7) and in Meyers 399-407; and Kimmel (1987) Methods Enzymol. 152: 507 (1995; Molecular Biology and Biotechnology, Wiley VCH, 511). In addition to the nucleotide sequences listed in Tables New York, N.Y., p856-853). 4 and 5, full length cDNA, orthologs, and paralogs of the Furthermore, methods using manual alignment of 45 present nucleotide sequences may be identified and isolated sequences similar or homologous to one or more polynucle using well-known methods. The cDNA libraries, orthologs, otide sequences or one or more polypeptides encoded by the and paralogs of the present nucleotide sequences may be polynucleotide sequences may be used to identify regions of screened using hybridization methods to determine their util similarity and conserved domains. Such manual methods are ity as hybridization target or amplification probes. well-known of those of skill in the art and can include, for 50 example, comparisons of tertiary structure between a With regard to hybridization, conditions that are highly polypeptide sequence encoded by a polynucleotide which stringent, and means for achieving them, are well known in comprises a known function with a polypeptide sequence the art. See, for example, Sambrook et al. (1989) “Molecular encoded by a polynucleotide sequence which has a function Cloning: A Laboratory Manual (2nd ed., Cold Spring Har not yet determined. Such examples of tertiary structure may 55 bor Laboratory); Berger and Kimmel, eds., (1987) “Guide to comprise predicted alpha helices, beta-sheets, amphipathic Molecular Cloning Techniques'. In Methods in Enzymology: helices, leucine Zipper motifs, Zinc finger motifs, proline-rich 152: 467-469; and Anderson and Young (1985) “Quantitative regions, cysteine repeat motifs, and the like. Filter Hybridisation.” In: Hames and Higgins, ed., Nucleic Orthologs and paralogs of presently disclosed transcrip Acid Hybridisation, A Practical Approach. Oxford, IRL tion factors may be cloned using compositions provided by 60 Press, 73-111. the present invention according to methods well known in the Stability of DNA duplexes is affected by such factors as art. cDNAs can be cloned using mRNA from a plant cell or base composition, length, and degree of base pair mismatch. tissue that expresses one of the present transcription factors. Hybridization conditions may be adjusted to allow DNAs of Appropriate mRNA sources may be identified by interrogat different sequence relatedness to hybridize. The melting tem ing Northern blots with probes designed from the present 65 perature (T,) is defined as the temperature when 50% of the transcription factor sequences, after which a library is pre duplex molecules have dissociated into their constituent pared from the mRNA obtained from a positive cell or tissue. single strands. The melting temperature of a perfectly US 8,809,630 B2 47 48 matched duplex, where the hybridization buffer contains for nucleic acids that have more than 100 complementary resi mamide as a denaturing agent, may be estimated by the fol dues is about 5° C. to 20° C. lower than the thermal melting lowing equations: point (T) for the specific sequence at a defined ionic strength and pH. Conditions used for hybridization may include about (I)DNA-DNA: 0.02 M to about 0.15M sodium chloride, about 0.5% to about T(C.)=81.5+16.6(log Na+I)+0.41 (% G+C)- 5% casein, about 0.02% SDS or about 0.1% N-laurylsar 0.62(% formamide)-500/L cosine, about 0.001 M to about 0.03 M sodium citrate, at hybridization temperatures between about 50° C. and about (II) DNA-RNA: 70° C. More preferably, high stringency conditions are about 10 0.02 M sodium chloride, about 0.5% casein, about 0.02% SDS, about 0.001M sodium citrate, at a temperature of about 50° C. Nucleic acid molecules that hybridize under stringent conditions will typically hybridize to a probe based on either (III)RNA-RNA: the entire DNA molecule or selected portions, e.g., to a 15 unique Subsequence, of the DNA. Stringent salt concentration will ordinarily be less than about 750 mM NaCl and 75 mM trisodium citrate. Increas where L is the length of the duplex formed, Na+ is the ingly stringent conditions may be obtained with less than molar concentration of the sodium ion in the hybridization or about 500 mM. NaCl and 50 mM trisodium citrate, to even washing solution, and % G+C is the percentage of (guanine-- greater stringency with less than about 250 mM NaCl and 25 cytosine) bases in the hybrid. For imperfectly matched mM trisodium citrate. Low stringency hybridization can be hybrids, approximately 1°C. is required to reduce the melting obtained in the absence of organic solvent, e.g., formamide, temperature for each 1% mismatch. whereas high Stringency hybridization may be obtained in the Hybridization experiments are generally conducted in a presence of at least about 35% formamide, and more prefer buffer of pH between 6.8 to 7.4, although the rate of hybrid 25 ably at least about 50% formamide. Stringent temperature ization is nearly independent of pH at ionic strengths likely to conditions will ordinarily include temperatures of at least be used in the hybridization buffer (Anderson et al. (1985) about 30° C., more preferably of at least about 37° C., and Supra). In addition, one or more of the following may be used most preferably of at least about 42° C. with formamide to reduce non-specific hybridization: Sonicated Salmon sperm present. Varying additional parameters, such as hybridization DNA or another non-complementary DNA, bovine serum 30 time, the concentration of detergent, e.g., sodium dodecyl albumin, Sodium pyrophosphate, Sodium dodecylsulfate sulfate (SDS) and ionic strength, are well known to those (SDS), polyvinyl-pyrrolidone, ficolland Denhardt’s solution. skilled in the art. Various levels of stringency are accom Dextran sulfate and polyethylene glycol 6000 act to exclude plished by combining these various conditions as needed. DNA from solution, thus raising the effective probe DNA The washing steps that follow hybridization may also vary concentration and the hybridization signal within a given unit 35 in stringency; the post-hybridization wash steps primarily of time. In some instances, conditions of even greater Strin determine hybridization specificity, with the most critical gency may be desirable or required to reduce non-specific factors being temperature and the ionic strength of the final and/or background hybridization. These conditions may be wash Solution. Wash stringency can be increased by decreas created with the use of higher temperature, lower ionic ing salt concentration or by increasing temperature. Stringent strength and higher concentration of a denaturing agent Such 40 salt concentration for the wash steps will preferably be less as formamide. thanabout 30 mMNaCl and 3 mMtrisodium citrate, and most Stringency conditions can be adjusted to screen for mod preferably less than about 15 mM NaCl and 1.5 mM triso erately similar fragments such as homologous sequences dium citrate. from distantly related organisms, or to highly similar frag Thus, hybridization and wash conditions that may be used ments such as genes that duplicate functional enzymes from 45 to bind and remove polynucleotides with less than the desired closely related organisms. The stringency can be adjusted homology to the nucleic acid sequences or their complements either during the hybridization step or in the post-hybridiza that encode the present transcription factors include, for tion washes. Salt concentration, formamide concentration, example: hybridization temperature and probe lengths are variables 6xSSC at 65° C.: that can be used to alter stringency (as described by the 50 50% formamide, 4xSSC at 42°C.; or formula above). As a general guidelines high Stringency is 0.5xSSC, 0.1% SDS at 65° C.; typically performed at T-5° C. to T-20° C. moderate with, for example, two wash steps of 10-30 minutes each. stringency at T-20°C. to T-35°C. and low stringency at Useful variations on these conditions will be readily apparent T-35°C. to T-50° C. for duplex >150 base pairs. Hybrid to those skilled in the art. ization may be performed at low to moderate stringency (25 55 A person of skill in the art would not expect substantial 50° C. below T), followed by post-hybridization washes at variation among polynucleotide species encompassed within increasing stringencies. Maximum rates of hybridization in the scope of the present invention because the highly stringent solution are determined empirically to occurat T-25°C. for conditions set forth in the above formulae yield structurally DNA-DNA duplex and T-15° C. for RNA-DNA duplex. similar polynucleotides. Optionally, the degree of dissociation may be assessed after 60 If desired, one may employ wash steps of even greater each wash step to determine the need for Subsequent, higher stringency, including about 0.2xSSC, 0.1% SDS at 65° C. and stringency wash steps. washing twice, each wash step being about 30 min, or about High stringency conditions may be used to select for 0.1xSSC, 0.1% SDS at 65° C. and washing twice for 30 min. nucleic acid sequences with high degrees of identity to the The temperature for the wash solutions will ordinarily be at disclosed sequences. An example of stringent hybridization 65 least about 25°C., and for greater stringency at least about 42 conditions obtained in a filter-based method such as a South C. Hybridization stringency may be increased further by ern or northern blot for hybridization of complementary using the same conditions as in the hybridization steps, with US 8,809,630 B2 49 50 the wash temperature raised about 3° C. to about 5°C., and NO: 1962, SEQ ID NO: 1964, SEQ ID NO: 1966, SEQ ID stringency may be increased even further by using the same NO: 1968, SEQID NO: 1970, SEQID NO: 1972, and frag conditions except the wash temperature is raised about 6°C. ments thereof under various conditions of stringency. (See, to about 9° C. For identification of less closely related e.g., Wahl and Berger (1987) Methods Enzymol. 152:399 homolog, wash steps may be performed at a lower tempera 407; Kimmel (1987) Methods Enzymol. 152:507-511.) Esti ture, e.g., 50° C. mates of homology are provided by either DNA-DNA or An example of a low stringency wash step employs a DNA-RNA hybridization under conditions of stringency as is solution and conditions of at least 25°C. in 30 mM NaCl, 3 well understood by those skilled in the art (Hames and Hig mM trisodium citrate, and 0.1% SDS over 30 min. Greater gins, Eds. (1985) Nucleic Acid Hybridisation, IRL Press, stringency may be obtained at 42°C. in 15 mMNaCl, with 1.5 10 Oxford, U.K.). Stringency conditions can be adjusted to mM trisodium citrate, and 0.1% SDS over 30 min. Even screen for moderately similar fragments, such as homologous higher stringency wash conditions are obtained at 65° C.-68° sequences from distantly related organisms, to highly similar C. in a solution of 15 mMNaCl, 1.5 mMtrisodium citrate, and fragments, such as genes that duplicate functional enzymes 0.1% SDS. Wash procedures will generally employ at least from closely related organisms. Post-hybridization washes two final wash steps. Additional variations on these condi 15 determine stringency conditions. tions will be readily apparent to those skilled in the art (see, Sequence Variations for example, U.S. Patent Application No. 20010010913). It will readily be appreciated by those of skill in the art, that Stringency conditions can be selected Such that an oligo any of a variety of polynucleotide sequences are capable of nucleotide that is perfectly complementary to the coding oli encoding the transcription factors and transcription factor gonucleotide hybridizes to the coding oligonucleotide with at homolog polypeptides of the invention. Due to the degen least about a 5-10x higher signal to noise ratio than the ratio eracy of the genetic code, many different polynucleotides can for hybridization of the perfectly complementary oligonucle encode identical and/or Substantially similar polypeptides in otide to a nucleic acid encoding a transcription factor known addition to those sequences illustrated in the Sequence List as of the filing date of the application. It may be desirable to ing. Nucleic acids having a sequence that differs from the select conditions for a particular assay Such that a higher 25 sequences shown in the Sequence Listing, or complementary signal to noise ratio, that is, about 15x or more, is obtained. sequences, that encode functionally equivalent peptides (i.e., Accordingly, a subject nucleic acid will hybridize to a unique peptides having some degree of equivalent or similar biologi coding oligonucleotide with at least a 2x or greater signal to cal activity) but differ in sequence from the sequence shown noise ratio as compared to hybridization of the coding oligo in the Sequence Listing due to degeneracy in the genetic code, nucleotide to a nucleic acid encoding known polypeptide. 30 are also within the scope of the invention. The particular signal will depend on the label used in the Altered polynucleotide sequences encoding polypeptides relevantassay, e.g., a fluorescent label, a colorimetric label, a include those sequences with deletions, insertions, or substi radioactive label, or the like. Labeled hybridization or PCR tutions of different nucleotides, resulting in a polynucleotide probes for detecting related polynucleotide sequences may be encoding a polypeptide with at least one functional charac produced by oligolabeling, nick translation, end-labeling, or 35 teristic of the instant polypeptides. Included within this defi PCR amplification using a labeled nucleotide. nition are polymorphisms which may or may not be readily Identifying Polynucleotides or Nucleic Acids with Expres detectable using a particular oligonucleotide probe of the sion Libraries polynucleotide encoding the instant polypeptides, and In addition to hybridization methods, transcription factor improper or unexpected hybridization to allelic variants, with homolog polypeptides can be obtained by Screening an 40 a locus other than the normal chromosomal locus for the expression library using antibodies specific for one or more polynucleotide sequence encoding the instant polypeptides. transcription factors. With the provision herein of the dis Allelic variant refers to any of two or more alternative closed transcription factor, and transcription factor homolog forms of a gene occupying the same chromosomal locus. nucleic acid sequences, the encoded polypeptide(s) can be Allelic variation arises naturally through mutation, and may expressed and purified in a heterologous expression system 45 result in phenotypic polymorphism within populations. Gene (e.g., E. coli) and used to raise antibodies (monoclonal or mutations can be silent (i.e., no change in the encoded polyclonal) specific for the polypeptide(s) in question. Anti polypeptide) or may encode polypeptides having altered bodies can also be raised against Synthetic peptides derived amino acid sequence. The term allelic variant is also used from transcription factor, or transcription factor homolog, herein to denote a protein encoded by an allelic variant of a amino acid sequences. Methods of raising antibodies are well 50 gene. Splice variant refers to alternative forms of RNA tran known in the art and are described in Harlow and Lane scribed from a gene. Splice variation arises naturally through (1988), Antibodies: A Laboratory Manual, Cold Spring Har use of alternative splicing sites within a transcribed RNA bor Laboratory, New York. Such antibodies can then be used molecule, or less commonly between separately transcribed to screen an expression library produced from the plant from RNA molecules, and may result in several mRNAs tran which it is desired to clone additional transcription factor 55 scribed from the same gene. Splice variants may encode homologs, using the methods described above. The selected polypeptides having altered amino acid sequence. The term cDNAS can be confirmed by sequencing and enzymatic activ splice variant is also used herein to denote a protein encoded ity. by a splice variant of an mRNA transcribed from a gene. Encompassed by the invention are polynucleotide Those skilled in the art would recognize that, for example, sequences that are capable of hybridizing to the claimed 60 G682, SEQ ID NO: 468, represents a single transcription polynucleotide sequences, and, in particular, to those shown factor; allelic variation and alternative splicing may be in SEQIDNO:567, SEQID NO:943, SEQID NO:945, SEQ expected to occur. Allelic variants of SEQID NO: 467 can be ID NO:947, SEQID NO: 1734, SEQID NO: 1874, SEQID cloned by probing cDNA or genomic libraries from different NO: 1014, SEQ ID NO: 1970, SEQID NO: 1972, SEQ ID individual organisms according to standard procedures. NO: 1944, SEQ ID NO: 1946, SEQ ID NO: 1948, SEQ ID 65 Allelic variants of the DNA sequence shown in SEQID NO: NO: 1950, SEQ ID NO: 1952, SEQID NO: 1954, SEQ ID 467, including those containing silent mutations and those in NO: 1956, SEQ ID NO: 1958, SEQID NO: 1960, SEQ ID which mutations result in amino acid sequence changes, are US 8,809,630 B2 51 52 within the scope of the present invention, as are proteins of the above nucleotide sequences. Related nucleic acid mol which are allelic variants of SEQID NO: 468. cDNAs gen ecules also include nucleotide sequences encoding a polypep erated from alternatively spliced mRNAs, which retain the tide comprising or consisting essentially of a Substitution, properties of the transcription factor are included within the modification, addition and/or deletion of one or more amino Scope of the present invention, as are polypeptides encoded acid residues compared to the polypeptide as set forth in any by such cDNAs and mRNAs. Allelic variants and splice vari of SEQID NO: 2N, wherein N=1-SEQID NO: 2N, wherein ants of these sequences can be cloned by probing cDNA or N=1-480, SEQID NO: 2N-1, where N=857-970, or SEQID genomic libraries from different individual organisms or tis NO: 989, 990, 991, 1001, 1002, 1012, 1018, 1021, 1022, Sues according to standard procedures known in the art (see 1025, 1027, 1028, 1029, 1034, 1050, 1051, 1072, 1073, 1074, U.S. Pat. No. 6,388,064). 10 Those skilled in the art would recognize that G859, SEQID 1075, 1076, 1091, 1092, 1093, 1094, 1095, 1109, 1110, 1111, NOs: 567 and 568, represents a single MAF transcription 1112, 1150, 1165, 1166, 1167, 1168, 1169, 1189, 1190, 1191, factor, allelic variation and alternative splicing may be 1197, 1198, 1199, 1213, 1214, 1215, 1216, 1226, 1227, 1233, expected to occur. Allelic variants of SEQID NO:567 can be 1239, 1246, 1247, 1258, 1259, 1269, 1307, 1308, 1309, 1310, cloned by probing cDNA or genomic libraries from different 15 1323, 1329, 1330, 1331, 1332, 1338, 1339, 1340, 1361, 1362, individual organisms according to standard procedures. 1373, 1374, 1375, 1384, 1389, 1390, 1391, 1396, 1411, 1412, Allelic variants of the DNA sequence shown in SEQID NO: 1413, 1414, 1424, 1435, 1436, 1437, 1448, 1456, 1457, 1458, 567, including those containing silent mutations and those in 1459, 1460, 1472, 1483, 1484, 1500, 1508,1510, 1511, 1520, which mutations result in amino acid sequence changes, are 1538, 1539, 1540, 1541, 1542, 1543, 1563, 1564, 1565, 1566, within the scope of the present invention, as are proteins 1567, 1568, 1569, 1582, 1583, 1594, 1611, 1612, 1618, 1619, which are allelic variants of SEQID NO: 568. cDNAs gen 1620, 1626, 1627, 1635, 1636, 1640, 1641, 1655, 1656, 1657, erated from alternatively spliced mRNAs, which retain the 1658, 1672, 1673, 1680, 1682, 1686, 1687, 1688, 1689, 1696, properties of the MAF transcription factor are included 1702 or 1703. Such related polypeptides may comprise, for within the scope of the present invention, as are polypeptides example, additions and/or deletions of one or more N-linked encoded by such cDNAs and mRNAs. Allelic variants and 25 or O-linked glycosylation sites, or an addition and/or a dele splice variants of these sequences can be cloned by probing tion of one or more cysteine residues. cDNA or genomic libraries from different individual organ isms or tissues according to standard procedures known in the For example, Table I illustrates, e.g., that the codons AGC, art (see U.S. Pat. No. 6,388,064). AGT, TCA, TCC, TCG, and TCT all encode the same amino An example of allelic variants or alternatively spliced vari 30 acid: serine. Accordingly, at each position in the sequence ants of SEQ ID NO: 567 (G859) are SEQ ID NO: 1946 where there is a codon encoding serine, any of the above (G859.1), SEQID NO: 1944 (G859.3), SEQID NO: 1948 trinucleotide sequences can be used without altering the (G859.4), and SEQ ID NO: 1950 (G849.5). These variants encoded polypeptide. encode SEQ ID NOs: 1947, 1945, 1949, and 1951, respec tively, which are variants of SEQID NO: 568. 35 TABLE 1. Thus, in addition to the sequences set forth in the Sequence Listing and in Table 4, the invention also encompasses related Amino acid Possible Codons nucleic acid molecules that include allelic or splice variants of Alanine Ala A. GCA GCC GCG GCU SEQID NO:567, SEQID NO: 943, SEQID NO: 945, SEQ Cysteine Cys C TGC TGT ID NO:947, SEQID NO: 1734, SEQID NO: 1874, SEQID 40 NO: 1014, SEQ ID NO: 1970, SEQID NO: 1972, SEQ ID Aspartic acid Asp D GAC GAT NO: 1944, SEQ ID NO: 1946, SEQ ID NO: 1948, SEQ ID NO: 1950, SEQ ID NO: 1952, SEQID NO: 1954, SEQ ID Glutamic acid Glu E GAA GAG NO: 1956, SEQ ID NO: 1958, SEQID NO: 1960, SEQ ID Phenylalanine Phe F TTC TTT NO: 1962, SEQID NO: 1964, SEQID NO: 1966, or SEQID 45 NO: 1968, and include sequences which are complementary Glycine Gly G GGA. GGC GGG GGT to any of the above nucleotide sequences. Related nucleic acid molecules also include nucleotide sequences encoding a Histidine His H CAC CAT polypeptide comprising or consisting essentially of a Substi Isoleucine Ile I ATA ATC ATT tution, modification, addition and/or deletion of one or more 50 amino acid residues compared to the polypeptide as set forth Lysine AAG in any of SEQ ID NO: 568, SEQID NO: 944, SEQID NO: Leucine Le L TTA TTG CTA CTC CTG CTT 946, SEQIDNO:948, SEQID NO: 1735, SEQIDNO: 1875, SEQ ID NO: 1971, SEQID NO: 1973, SEQ ID NO: 1945, Methionine Met M ATG SEQ ID NO: 1947, SEQID NO: 1949, SEQ ID NO: 1951, 55 SEQ ID NO: 1953, SEQID NO: 1955, SEQ ID NO: 1957, Asparagine As n N AAC AAT SEQ ID NO: 1959, SEQID NO: 1961, SEQ ID NO: 1963, SEQID NO: 1965, SEQID NO: 1967, or SEQID NO: 1969. Proline Pro P CCA CCC CCG CCT Such related polypeptides may comprise, for example, addi Glutamine Glin Q CAA CAG tions and/or deletions of one or more N-linked or O-linked 60 glycosylation sites, or an addition and/or a deletion of one or Arginine Arg R AGA AGG CGA CGC CGG CGT more cysteine residues. Thus, in addition to the transcription factor sequences set Serine Ser S AGC AGT TCA TCC TCG TCT forth in the Sequence Listing, the invention also encompasses Threonine Thir T ACA ACC ACG ACT related nucleic acid molecules that include allelic or splice 65 variants of the transcription factor sequences in the Sequence Waline Wal W GTA GTC GTG GTT Listing, and include sequences that are complementary to any US 8,809,630 B2 53 54 TABLE 1 - continued TABLE 2-continued

Amino acid Possible Codons Conservative Residue Substitutions Tryptophan Trp W TGG Ser Thr; Gly Thr Ser; Val Tyrosine Tyr Y TAC TAT Trp Tyr Tyr Trp; Phe Wal Ile: Leu Sequence alterations that do not change the amino acid sequence encoded by the polynucleotide are termed “silent” 10 variations. With the exception of the codons ATG and TGG. Similar substitutions are those in which at least one residue encoding methionine and tryptophan, respectively, any of the in the amino acid sequence has been removed and a different possible codons for the same amino acid can be substituted by residue inserted in its place. Such Substitutions generally are a variety of techniques, e.g., site-directed mutagenesis, avail made in accordance with the Table 3 when it is desired to able in the art. Accordingly, any and all Such variations of a 15 maintain the activity of the protein. Table 3 shows amino sequence selected from the above table are a feature of the acids which can be substituted for an amino acid in a protein invention. and which are typically regarded as structural and functional In addition to silent variations, other conservative varia substitutions. For example, a residue in column 1 of Table 3 tions that alter one, or a few amino acids in the encoded may be substituted with a residue in column 2; in addition, a polypeptide, can be made without altering the function of the residue in column 2 of Table 3 may be substituted with the polypeptide, these conservative variants are, likewise, a fea residue of column 1. ture of the invention. For example, Substitutions, deletions and insertions intro TABLE 3 duced into the sequences provided in the Sequence Listing, Residue Similar Substitutions are also envisioned by the invention. Such sequence modifi 25 cations can be engineered into a sequence by site-directed Ala Ser: Thr; Gly; Val; Leu: Ile Arg Lys; His; Gly mutagenesis (Wu (ed.) Methods Enzymol. (1993) vol. 217, ASn Gln: His: Gly: Ser: Thr Academic Press) or the other methods noted below. Amino Asp Glu, Ser: Thr acid Substitutions are typically of single residues; insertions Gln ASn; Ala 30 Cys Ser: Gly usually will be on the order of about from 1 to 10 amino acid Glu Asp residues; and deletions will rangeabout from 1 to 30 residues. Gly Pro; Arg In preferred embodiments, deletions or insertions are made in His ASn; Gln: Tyr; Phe, Lys; Arg adjacent pairs, e.g., a deletion of two residues or insertion of Ile Ala; Leu; Val; Gly; Met Leu Ala: Ile:Val; Gly; Met two residues. Substitutions, deletions, insertions or any com 35 Lys Arg; His; Glin; Gly; Pro bination thereof can be combined to arrive at a sequence. The Met Leu: Ile: Phe mutations that are made in the polynucleotide encoding the Phe Met; Leu: Tyr; Trp: His; Val; Ala transcription factor should not place the sequence out of Ser Thr; Gly; Asp; Ala; Val; Ile: His reading frame and should not create complementary regions Thr Ser; Val; Ala; Gly Trp Tyr; Phe: His that could produce secondary mRNA structure. Preferably, 40 Tyr Trp; Phe: His the polypeptide encoded by the DNA performs the desired Wall Ala: Ile; Leu; Gly: Thr; Ser; Glu function. Conservative substitutions are those in which at least one residue in the amino acid sequence has been removed and a Substitutions that are less conservative than those in Table different residue inserted in its place. Such substitutions gen 2 can be selected by picking residues that differ more signifi erally are made in accordance with the Table 2 when it is 45 cantly in their effect on maintaining (a) the structure of the desired to maintain the activity of the protein. Table 2 shows polypeptide backbone in the area of the substitution, for amino acids which can be substituted for an amino acid in a example, as a sheet or helical conformation, (b) the charge or protein and which are typically regarded as conservative Sub hydrophobicity of the molecule at the target site, or (c) the stitutions. bulk of the side chain. The substitutions which in general are 50 expected to produce the greatest changes in protein properties TABLE 2 will be those in which (a) a hydrophilic residue, e.g., seryl or threonyl, is substituted for (orby) a hydrophobic residue, e.g., Conservative Residue Substitutions leucyl, isoleucyl phenylalanyl, Valyl or alanyl; (b) a cysteine 55 or proline is substituted for (or by) any other residue; (c) a Ala Ser Arg Lys residue having an electropositive side chain, e.g., lysyl, argi ASn Gln: His nyl, or histidyl, is substituted for (or by) an electronegative Asp Glu residue, e.g., glutamyl or aspartyl; or (d) a residue having a Gln ASn bulky side chain, e.g., phenylalanine, is Substituted for (orby) Cys Ser Glu Asp 60 one not having a side chain, e.g., glycine. Gly Pro Further Modifying Sequences of the Invention—Mutation/ His ASn; Glin Forced Evolution Ile Leu, Val In addition to generating silent or conservative Substitu Leu Ile: Val Lys Arg: Gln tions as noted, above, the present invention optionally Met Leu: Ile 65 includes methods of modifying the sequences of the Phe Met; Leu: Tyr Sequence Listing. In the methods, nucleic acid or protein modification methods are used to alter the given sequences to US 8,809,630 B2 55 56 produce new sequences and/or to chemically or enzymati activation domain. A transcription activation domain assists cally modify given sequences to change the properties of the in initiating transcription from a DNA-binding site. Examples nucleic acids or proteins. include the transcription activation region of VP16 or GAL4 Thus, in one embodiment, given nucleic acid sequences are (Moore et al. (1998) Proc. Natl. Acad. Sci. 95: 376-381: modified, e.g., according to standard mutagenesis or artificial Aoyama et al. (1995) Plant Cell 7:1773-1785), peptides evolution methods to produce modified sequences. The modi derived from bacterial sequences (Ma and Ptashne (1987) fied sequences may be created using purified natural poly Cell 51: 113-119) and synthetic peptides (Giniger and nucleotides isolated from any organism or may be synthe Ptashne (1987) Nature 330: 670-672). sized from purified compositions and chemicals using Expression and Modification of Polypeptides chemical means well know to those of skill in the art. For 10 example, Ausubel, Supra, provides additional details on Typically, polynucleotide sequences of the invention are mutagenesis methods. Artificial forced evolution methods are incorporated into recombinant DNA (or RNA) molecules that described, for example, by Stemmer (1994) Nature 370: 389 direct expression of polypeptides of the invention in appro 391, Stemmer (1994) Proc. Natl. Acad. Sci. 91: 10747-10751, priate host cells, transgenic plants, in vitro translation sys and U.S. Pat. Nos. 5,811,238, 5,837,500, and 6.242,568. 15 tems, or the like. Due to the inherent degeneracy of the genetic Methods for engineering synthetic transcription factors and code, nucleic acid sequences which encode Substantially the other polypeptides are described, for example, by Zhanget al. same or a functionally equivalent amino acid sequence can be (2000).J. Biol. Chem. 275: 33.850-33860, Liu et al. (2001).J. Substituted for any listed sequence to provide for cloning and Biol. Chem. 276: 11323-11334, and Isalan et al. (2001) expressing the relevant homolog. Nature Biotechnol. 19: 656-660. Many other mutation and The transgenic plants of the present invention comprising evolution methods are also available and expected to be recombinant polynucleotide sequences are generally derived within the skill of the practitioner. from parental plants, which may themselves be non-trans Similarly, chemical or enzymatic alteration of expressed formed (or non-transgenic) plants. These transgenic plants nucleic acids and polypeptides can be performed by standard may eitherhave a transcription factor gene"knocked out' (for methods. For example, sequence can be modified by addition 25 example, with a genomic insertion by homologous recombi of lipids, Sugars, peptides, organic or inorganic compounds, nation, an antisense or ribozyme construct) or expressed to a by the inclusion of modified nucleotides or amino acids, or normal or wild-type extent. However, overexpressing trans the like. For example, protein modification techniques are genic “progeny' plants will exhibit greater mRNA levels, illustrated in Ausubel, supra. Further details on chemical and wherein the mRNA encodes a transcription factor, that is, a enzymatic modifications can be found herein. These modifi 30 DNA-binding protein that is capable of binding to a DNA cation methods can be used to modify any given sequence, or regulatory sequence and inducing transcription, and prefer to modify any sequence produced by the various mutation and ably, expression of a plant trait gene. Preferably, the mRNA artificial evolution modification methods noted herein. expression level will be at least three-fold greater than that of Accordingly, the invention provides for modification of the parental plant, or more preferably at least ten-fold greater any given nucleic acid by mutation, evolution, chemical or 35 mRNA levels compared to said parental plant, and most pref enzymatic modification, or other available methods, as well erably at least fifty-fold greater compared to said parental as for the products produced by practicing such methods, e.g., plant. using the sequences herein as a starting Substrate for the Vectors, Promoters, and Expression Systems various modification approaches. The present invention includes recombinant constructs For example, optimized coding sequence containing 40 comprising one or more of the nucleic acid sequences herein. codons preferred by a particular prokaryotic or eukaryotic The constructs typically comprise a vector, such as a plasmid, host can be used e.g., to increase the rate of translation or to a cosmid, a phage, a virus (e.g., a plant virus), a bacterial produce recombinant RNA transcripts having desirable prop artificial chromosome (BAC), a yeast artificial chromosome erties, such as a longer half-life, as compared with transcripts (YAC), or the like, into which a nucleic acid sequence of the produced using a non-optimized sequence. Translation stop 45 invention has been inserted, in a forward or reverse orienta codons can also be modified to reflect host preference. For tion. In a preferred aspect of this embodiment, the construct example, preferred stop codons for Saccharomyces cerevi further comprises regulatory sequences, including, for siae and mammals are TAA and TGA, respectively. The pre example, a promoter, operably linked to the sequence. Large ferred stop codon for monocotyledonous plants is TGA, numbers of suitable vectors and promoters are known to those whereas insects and E. coli prefer to use TAA as the stop 50 of skill in the art, and are commercially available. codon. General texts that describe molecular biological tech The polynucleotide sequences of the present invention can niques useful herein, including the use and production of also be engineered in order to alter a coding sequence for a vectors, promoters and many other relevant topics, include variety of reasons, including but not limited to, alterations Berger, Sambrook, Supra and Ausubel, Supra. Any of the which modify the sequence to facilitate cloning, processing 55 identified sequences can be incorporated into a cassette or and/or expression of the gene product. For example, alter vector, e.g., for expression in plants. A number of expression ations are optionally introduced using techniques which are vectors suitable for stable transformation of plant cells or for well known in the art, e.g., site-directed mutagenesis, to insert the establishment of transgenic plants have been described new restriction sites, to alterglycosylation patterns, to change including those described in Weissbach and Weissbach codon preference, to introduce splice sites, etc. 60 (1989) Methods for Plant Molecular Biology, Academic Furthermore, a fragment or domain derived from any of the Press, and Gelvin et al. (1990) Plant Molecular Biology polypeptides of the invention can be combined with domains Manual, Kluwer Academic Publishers. Specific examples derived from other transcription factors or synthetic domains include those derived from a T1 plasmid of Agrobacterium to modify the biological activity of a transcription factor. For tumefaciens, as well as those disclosed by Herrera-Estrella et instance, a DNA-binding domainderived from a transcription 65 al. (1983) Nature 303: 209, Bevan (1984) Nucleic Acids Res. factor of the invention can be combined with the activation 12: 8711-8721, Klee (1985) Bio/Technology 3: 637-642, for domain of another transcription factor or with a synthetic dicotyledonous plants. US 8,809,630 B2 57 58 Alternatively, non-Ti vectors can be used to transfer the (U.S. Pat. No. 4,943,674) and the tomato polygalacturonase DNA into monocotyledonous plants and cells by using free promoter (Bird et al. (1988) Plant Mol. Biol. 11: 651-662), DNA delivery techniques. Such methods can involve, for root-specific promoters, such as those disclosed in U.S. Pat. example, the use of liposomes, electroporation, microprojec Nos. 5,618,988, 5,837,848 and 5,905,186, pollen-active pro tile bombardment, silicon carbide whiskers, and viruses. By moters such as PTA29, PTA26 and PTA13 (U.S. Pat. No. using these methods transgenic plants such as wheat, rice 5,792.929), promoters active in vascular tissue (Ringli and (Christou (1991) Bio/Technology 9:957-962) and corn (Gor Keller (1998) Plant Mol. Biol. 37: 977-988), flower-specific don-Kamm (1990) Plant Cell 2: 603-618) can be produced. (Kaiser et al. (1995) Plant Mol. Biol. 28: 231-243), pollen An immature embryo can also be a good target tissue for (Baerson et al. (1994) Plant Mol. Biol. 26: 1947-1959), car monocots for direct DNA delivery techniques by using the 10 pels (Ohl et al. (1990) Plant Cell 2: 837-848), pollen and particle gun (Weeks et al. (1993) Plant Physiol. 102: 1077 ovules (Baerson et al. (1993) Plant Mol. Biol. 22:255-267), 1084; Vasil (1993) Bio/Technology 10: 667-674; Wan and auxin-inducible promoters (such as that described in Van der Lemeaux (1994) Plant Physiol. 104:37-48, and for Agrobac Kopetal. (1999) Plant Mol. Biol. 39:979-990 or Baumannet terium-mediated DNA transfer (Ishida et al. (1996) Nature al. (1999) Plant Cell 11: 323-334), cytokinin-inducible pro Biotechnol. 14: 745-750). 15 moter (Guevara-Garcia (1998) Plant Mol. Biol. 38: 743-753), Typically, plant transformation vectors include one or promoters responsive to gibberellin (Shi et al. (1998) Plant more cloned plant coding sequence (genomic or cDNA) Mol. Biol. 38: 1053-1060, Willmottet al. (1998)38: 817-825) under the transcriptional control of 5' and 3' regulatory and the like. Additional promoters are those that elicit expres sequences and a dominant selectable marker. Such plant sion in response to heat (Ainley et al. (1993) Plant Mol. Biol. transformation vectors typically also contain a promoter (e.g., 22: 13-23), light (e.g., the pearbcS-3A promoter, Kuhlemeier a regulatory region controlling inducible or constitutive, envi et al. (1989) Plant Cell 1: 471-478, and the maize rbcS pro ronmentally- or developmentally-regulated, or cell- or tissue moter, Schaffner and Sheen (1991) Plant Cell 3: 997-1012): specific expression), a transcription initiation start site, an wounding (e.g., wun I, Siebertz et al. (1989) Plant Cell 1: RNA processing signal (such as intron splice sites), a tran 961-968); pathogens (such as the PR-1 promoter described in Scription termination site, and/or a polyadenylation signal. 25 Buchel et al. (1999) Plant Mol. Biol. 40: 387-396, and the A potential utility for the transcription factor polynucle PDF1.2 promoter described in Manners et al. (1998) Plant otides disclosed herein is the isolation of promoter elements Mol. Biol. 38: 1071-1080), and chemicals such as methyl from these genes that can be used to program expression in jasmonate or salicylic acid (Gatz (1997) Annu. Rev. Plant plants of any genes. Each transcription factor gene disclosed Physiol. Plant Mol. Biol. 48: 89-108). In addition, the timing herein is expressed in a unique fashion, as determined by 30 of the expression can be controlled by using promoters such promoter elements located upstream of the start of transla as those acting at Senescence (Gan and Amasino (1995) Sci tion, and additionally within an intron of the transcription ence 270: 1986-1988); or late seed development (Odelletal. factor gene or downstream of the termination codon of the (1994) Plant Physiol. 106:447-458). gene. As is well known in the art, for a significant portion of Plant expression vectors can also include RNA processing genes, the promoter sequences are located entirely in the 35 signals that can be positioned within, upstream or down region directly upstream of the start of translation. In Such stream of the coding sequence. In addition, the expression cases, typically the promotersequences are located within 2.0 vectors can include additional regulatory sequences from the kb of the start of translation, or within 1.5 kb of the start of 3'-untranslated region of plant genes, e.g., a 3' terminator translation, frequently within 1.0kb of the start of translation, region to increase mRNA stability of the mRNA, such as the and sometimes within 0.5 kb of the start of translation. 40 PI-II terminator region of potato or the octopine or nopaline The promoter sequences can be isolated according to meth synthase 3' terminator regions. ods known to one skilled in the art. Additional Expression Elements Examples of constitutive plant promoters which can be Specific initiation signals can aid in efficient translation of useful for expressing the TF sequence include: the cauli coding sequences. These signals can include, e.g., the ATG flower mosaic virus (CaMV) 35S promoter, which confers 45 initiation codon and adjacent sequences. In cases where a constitutive, high-level expression in most plant tissues (see, coding sequence, its initiation codon and upstream sequences e.g., Odell et al. (1985) Nature 313: 810-812); the nopaline are inserted into the appropriate expression vector, no addi synthase promoter (An et al. (1988) Plant Physiol. 88: 547 tional translational control signals may be needed. However, 552); and the octopine synthase promoter (Fromm et al. in cases where only coding sequence (e.g., a mature protein (1989) Plant Cell 1:977-984). 50 coding sequence), or a portion thereof, is inserted, exogenous A variety of plant gene promoters that regulate gene transcriptional control signals including the ATG initiation expression in response to environmental, hormonal, chemi codon can be separately provided. The initiation codon is cal, developmental signals, and in a tissue-active manner can provided in the correct reading frame to facilitate transcrip be used for expression of a TF sequence in plants. Choice of tion. Exogenous transcriptional elements and initiation a promoter is based largely on the phenotype of interest and is 55 codons can be of various origins, both natural and synthetic. determined by Such factors as tissue (e.g., seed, fruit, root, The efficiency of expression can be enhanced by the inclusion pollen, vascular tissue, flower, carpel, etc.), inducibility (e.g., of enhancers appropriate to the cell System in use. in response to wounding, heat, cold, drought, light, patho Expression Hosts gens, etc.), timing, developmental stage, and the like. Numer The present invention also relates to host cells which are ous known promoters have been characterized and can favor 60 transduced with vectors of the invention, and the production ably be employed to promote expression of a polynucleotide of polypeptides of the invention (including fragments of the invention in a transgenic plant or cell of interest. For thereof) by recombinant techniques. Host cells are geneti example, tissue specific promoters include: seed-specific pro cally engineered (i.e., nucleic acids are introduced, e.g., trans moters (such as the napin, phaseolin or DC3 promoter duced, transformed or transfected) with the vectors of this described in U.S. Pat. No. 5,773.697), fruit-specific promot 65 invention, which may be, for example, a cloning vector or an ers that are active during fruit ripening (such as the dru 1 expression vector comprising the relevant nucleic acids promoter (U.S. Pat. No. 5,783,393), or the 2A11 promoter herein. The vector is optionally a plasmid, a viral particle, a US 8,809,630 B2 59 60 phage, a naked nucleic acid, etc. The engineered host cells can amino acids, etc. References adequate to guide one of skill in be cultured in conventional nutrient media modified as appro the modification of amino acid residues are replete through priate for activating promoters, selecting transformants, or out the literature. amplifying the relevant gene. The culture conditions, such as The modified amino acid residues may prevent or increase temperature, pH and the like, are those previously used with 5 affinity of the polypeptide for another molecule, including, the host cell selected for expression, and will be apparent to but not limited to, polynucleotide, proteins, carbohydrates, those skilled in the art and in the references cited herein, lipids and lipid derivatives, and other organic or synthetic including, Sambrook, Supra and Ausubel, Supra. compounds. The host cell can be a eukaryotic cell. Such as a yeast cell, Identification of Additional Factors 10 A transcription factor provided by the present invention or a plant cell, or the host cell can be a prokaryotic cell. Such can also be used to identify additional endogenous or exog as a bacterial cell. Plant protoplasts are also suitable for some enous molecules that can affect a phenotype or trait of inter applications. For example, the DNA fragments are introduced est. On the one hand, such molecules include organic (Small into plant tissues, cultured plant cells or plant protoplasts by or large molecules) and/or inorganic compounds that affect standard methods including electroporation (Fromm et al. 15 expression of (i.e., regulate) a particular transcription factor. (1985) Proc. Natl. Acad. Sci. 82: 5824-5828, infection by Alternatively, Such molecules include endogenous molecules viral vectors such as cauliflower mosaic virus (CaMV) (Hohn that are acted upon either at a transcriptional level by a tran et al. (1982) Molecular Biology of Plant Tumors Academic Scription factor of the invention to modify a phenotype as Press, NewYork, N.Y., pp. 549-560; U.S. Pat. No. 4,407,956), desired. For example, the transcription factors can be high velocity ballistic penetration by small particles with the employed to identify one or more downstream genes that are nucleic acid either within the matrix of small beads or par Subject to a regulatory effect of the transcription factor. In one ticles, or on the surface (Klein et al. (1987) Nature 327: approach, a transcription factor or transcription factor 70-73), use of pollen as vector (WO 85/01856), or use of homolog of the invention is expressed in a host cell, e.g., a Agrobacterium tumefaciens or A. rhizogenes carrying a transgenic plant cell, tissue or explant, and expression prod T-DNA plasmid in which DNA fragments are cloned. The 25 ucts, either RNA or protein, of likely or random targets are T-DNA plasmid is transmitted to plant cells upon infection by monitored, e.g., by hybridization to a microarray of nucleic Agrobacterium tumefaciens, and a portion is stably integrated acid probes corresponding to genes expressed in a tissue or into the plant genome (Horsch et al. (1984) Science 233: cell type of interest, by two-dimensional gel electrophoresis 496-498; Fraley et al. (1983) Proc. Natl. Acad. Sci. 80: 4803 of protein products, or by any other method known in the art 30 for assessing expression of gene products at the level of RNA 4807). or protein. Alternatively, a transcription factor of the inven The cell can include a nucleic acid of the invention that tion can be used to identify promoter sequences (such as encodes a polypeptide, wherein the cell expresses a polypep binding sites on DNA sequences) involved in the regulation of tide of the invention. The cell can also include vector a downstream target. After identifying a promoter sequence, sequences, or the like. Furthermore, cells and transgenic 35 interactions between the transcription factor and the promoter plants that include any polypeptide or nucleic acid above or sequence can be modified by changing specific nucleotides in throughout this specification, e.g., produced by transduction the promoter sequence or specific amino acids in the tran of a vector of the invention, are an additional feature of the Scription factor that interact with the promoter sequence to invention. altera plant trait. Typically, transcription factor DNA-binding For long-term, high-yield production of recombinant pro 40 sites are identified by gel shift assays. After identifying the teins, stable expression can be used. Host cells transformed promoter regions, the promoter region sequences can be with a nucleotide sequence encoding a polypeptide of the employed in double-stranded DNA arrays to identify mol invention are optionally cultured under conditions Suitable ecules that affect the interactions of the transcription factors for the expression and recovery of the encoded protein from with their promoters (Bulyket al. (1999) Nature Biotechnol. cell culture. The protein or fragment thereof produced by a 45 17:573-577). recombinant cell may be secreted, membrane-bound, or con The identified transcription factors are also useful to iden tained intracellularly, depending on the sequence and/or the tify proteins that modify the activity of the transcription fac vector used. As will be understood by those of skill in the art, tor. Such modification can occur by covalent modification, expression vectors containing polynucleotides encoding Such as by phosphorylation, or by protein-protein (homo or mature proteins of the invention can be designed with signal 50 -heteropolymer) interactions. Any method suitable for detect sequences which direct secretion of the mature polypeptides ing protein-protein interactions can be employed. Among the through a prokaryotic or eukaryotic cell membrane. methods that can be employed are co-immunoprecipitation, Modified Amino Acid Residues cross-linking and co-purification through gradients or chro Polypeptides of the invention may contain one or more matographic columns, and the two-hybrid yeast system. modified amino acid residues. The presence of modified 55 The two-hybrid system detects protein interactions in vivo amino acids may be advantageous in, for example, increasing and is described in Chien et al. (1991) Proc. Natl. Acad. Sci. polypeptide half-life, reducing polypeptide antigenicity or 88: 9578-9582, and is commercially available from Clontech toxicity, increasing polypeptide storage stability, or the like. (Palo Alto, Calif.). In Such a system, plasmids are constructed Amino acid residue(s) are modified, for example, co-transla that encode two hybrid proteins: one consists of the DNA tionally or post-translationally during recombinant produc 60 binding domain of a transcription activator protein fused to tion or modified by synthetic or chemical means. the TF polypeptide and the other consists of the transcription Non-limiting examples of a modified amino acid residue activator protein's activation domain fused to an unknown include incorporation or other use of acetylated amino acids, protein that is encoded by a cDNA that has been recombined glycosylated amino acids, Sulfated amino acids, prenylated into the plasmid as part of a cDNA library. The DNA-binding (e.g., farnesylated, geranylgeranylated) amino acids, PEG 65 domain fusion plasmid and the cDNA library are transformed modified (e.g., “PEGylated') amino acids, biotinylated into a strain of the yeast Saccharomyces cerevisiae that con amino acids, carboxylated amino acids, phosphorylated tains a reporter gene (e.g., lacZ) whose regulatory region US 8,809,630 B2 61 62 contains the transcription activator's binding site. Either example, amino acids) in every possible way for a given hybrid protein alone cannot activate transcription of the compound length (i.e., the number of amino acids in a reporter gene. Interaction of the two hybrid proteins recon polypeptide compound of a set length). Exemplary libraries stitutes the functional activator protein and results in expres include peptide libraries, nucleic acid libraries, antibody sion of the reporter gene, which is detected by an assay for the libraries (see, e.g., Vaughn et al. (1996) Nature Biotechnol. reporter gene product. Then, the library plasmids responsible 14: 309-314 and PCT/US96/10287), carbohydrate libraries for reporter gene expression are isolated and sequenced to (see, e.g., Liang et al. Science (1996) 274: 1520-1522 and identify the proteins encoded by the library plasmids. After U.S. Pat. No. 5,593.853), peptide nucleic acid libraries (see, identifying proteins that interact with the transcription fac e.g., U.S. Pat. No. 5,539,083), and small organic molecule tors, assays for compounds that interfere with the TF protein 10 protein interactions can be preformed. libraries (see, e.g., benzodiazepines, in Baum Chem. & Engi Identification of Modulators neering News Jan. 18, 1993, page 33; isoprenoids, U.S. Pat. In addition to the intracellular molecules described above, No. 5,569,588; thiazolidinones and metathiazanones, U.S. extracellular molecules that alter activity or expression of a Pat. No. 5,549,974; pyrrolidines, U.S. Pat. Nos. 5,525,735 transcription factor, either directly or indirectly, can be iden 15 and 5,519,134; morpholino compounds, U.S. Pat. No. 5,506, tified. For example, the methods can entail first placing a 337) and the like. candidate molecule in contact with a plant or plant cell. The Preparation and screening of combinatorial or other librar molecule can be introduced by topical administration, such as ies is well known to those of skill in the art. Such combina spraying or soaking of a plant, or incubating a plant in a torial chemical libraries include, but are not limited to, pep Solution containing the molecule, and then the molecule's tide libraries (see, e.g., U.S. Pat. No. 5,010, 175: Furka, (1991) effect on the expression or activity of the TF polypeptide or Int. J. Pept. Prot. Res. 37: 487-493; and Houghton et al. the expression of the polynucleotide monitored. Changes in (1991) Nature 354: 84-88). Other chemistries for generating the expression of the TF polypeptide can be monitored by use chemical diversity libraries can also be used. of polyclonal or monoclonal antibodies, gel electrophoresis In addition, as noted, compound screening equipment for or the like. Changes in the expression of the corresponding 25 high-throughput screening is generally available, e.g., using polynucleotide sequence can be detected by use of microar any of a number of well known robotic systems that have also rays, Northerns, quantitative PCR, or any other technique for been developed for Solution phase chemistries useful in assay monitoring changes in mRNA expression. These techniques systems. These systems include automated workstations are exemplified in Ausubel et al. (eds.) Current Protocols in including an automated synthesis apparatus and robotic sys Molecular Biology, John Wiley & Sons (1998, and supple 30 ments through 2001).Changes in the activity of the transcrip tems utilizing robotic arms. Any of the above devices are tion factor can be monitored, directly or indirectly, by assay suitable for use with the present invention, e.g., for high ing the function of the transcription factor, for example, by throughput Screening of potential modulators. The nature and measuring the expression of promoters known to be con implementation of modifications to these devices (if any) so trolled by the transcription factor (using promoter-reporter 35 that they can operate as discussed herein will be apparent to constructs), measuring the levels of transcripts using microar persons skilled in the relevant art. rays, Northern blots, quantitative PCR, etc. Such changes in Indeed, entire high-throughput screening systems are com the expression levels can be correlated with modified plant mercially available. These systems typically automate entire traits and thus identified molecules can be useful for soaking procedures including all sample and reagent pipetting, liquid or spraying on fruit, vegetable and grain crops to modify traits 40 dispensing, timed incubations, and final readings of the in plants. microplate in detector(s) appropriate for the assay. These Essentially any available composition can be tested for configurable systems provide high throughput and rapid start modulatory activity of expression or activity of any nucleic up as well as a high degree of flexibility and customization. acid or polypeptide herein. Thus, available libraries of com Similarly, microfluidic implementations of screening are also pounds such as chemicals, polypeptides, nucleic acids and the 45 commercially available. like can be tested for modulatory activity. Often, potential The manufacturers of Such systems provide detailed pro modulator compounds can be dissolved in aqueous or organic tocols the various high throughput. Thus, for example, (e.g., DMSO-based) solutions for easy delivery to the cell or Zymark Corp. provides technical bulletins describing screen plant of interest in which the activity of the modulator is to be ing systems for detecting the modulation of gene transcrip tested. Optionally, the assays are designed to screen large 50 tion, ligand binding, and the like. The integrated systems modulator composition libraries by automating the assay herein, in addition to providing for sequence alignment and, steps and providing compounds from any convenient source optionally, synthesis of relevant nucleic acids, can include to assays, which are typically run in parallel (e.g., in micro Such screening apparatus to identify modulators that have an titer formats on microplates in robotic assays). effect on one or more polynucleotides or polypeptides In one embodiment, high throughput screening methods 55 according to the present invention. involve providing a combinatorial library containing a large In some assays it is desirable to have positive controls to number of potential compounds (potential modulator com ensure that the components of the assays are working prop pounds). Such “combinatorial chemical libraries' are then erly. At least two types of positive controls are appropriate. screened in one or more assays, as described herein, to iden That is, known transcriptional activators or inhibitors can be tify those library members (particular chemical species or 60 incubated with cells or plants, for example, in one sample of Subclasses) that display a desired characteristic activity. The the assay, and the resulting increase? decrease in transcription compounds thus identified can serve as target compounds. can be detected by measuring the resulting increase in RNA A combinatorial chemical library can be, e.g., a collection levels and/or protein expression, for example, according to of diverse chemical compounds generated by chemical Syn the methods herein. It will be appreciated that modulators can thesis or biological synthesis. For example, a combinatorial 65 also be combined with transcriptional activators or inhibitors chemical library such as a polypeptide library is formed by to find modulators that inhibit transcriptional activation or combining a set of chemical building blocks (e.g., in one transcriptional repression. Either expression of the nucleic US 8,809,630 B2 63 64 acids and proteins herein or any additional nucleic acids or polypeptide fragment can comprise a recognizable structural proteins activated by the nucleic acids or proteins herein, or motif or functional domain Such as a DNA binding domain both, can be monitored. that activates transcription, e.g., by binding to a specific DNA In an embodiment, the invention provides a method for promoter region an activation domain, or a domain for pro identifying compositions that modulate the activity or expres tein-protein interactions. sion of a polynucleotide or polypeptide of the invention. For Production of Transgenic Plants example, a test compound, whether a small or large molecule, Modification of Traits is placed in contact with a cell, plant (or plant tissue or The polynucleotides of the invention are favorably explant), or composition comprising the polynucleotide or employed to produce transgenic plants with various traits, or polypeptide of interestanda resulting effect on the cell, plant, 10 characteristics, that have been modified in a desirable man (or tissue or explant) or composition is evaluated by monitor ner, e.g., to improve the seed characteristics of a plant. For ing, either directly or indirectly, one or more of expression example, alteration of expression levels or patterns (e.g., spa level of the polynucleotide or polypeptide, activity (or modu tial or temporal expression patterns) of one or more of the lation of the activity) of the polynucleotide or polypeptide. In transcription factors (or transcription factor homologs) of the Some cases, an alteration in a plant phenotype can be detected 15 invention, as compared with the levels of the same protein following contact of a plant (or plant cell, or tissue or explant) found in a wild-type plant, can be used to modify a plants with the putative modulator, e.g., by modulation of expres traits. An illustrative example of trait modification, improved sion or activity of a polynucleotide or polypeptide of the characteristics, by altering expression levels of a particular invention. Modulation of expression or activity of a poly transcription factor is described further in the Examples and nucleotide or polypeptide of the invention may also be caused the Sequence Listing. by molecular elements in a signal transduction second mes Arabidopsis as a Model System senger pathway and Such modulation can affect similar ele Arabidopsis thaliana is the object of rapidly growing atten ments in the same or another signal transduction second mes tion as a model for genetics and metabolism in plants. Ara senger pathway. bidopsis has a small genome, and well-documented Studies Subsequences 25 are available. It is easy to grow in large numbers and mutants Also contemplated are uses of polynucleotides, also defining important genetically controlled mechanisms are referred to herein as oligonucleotides, typically having at either available, or can readily be obtained. Various methods least 12 bases, preferably at least 15, more preferably at least to introduce and express isolated homologous genes are avail 20, 30, or 50 bases, which hybridize under at least highly able (see Koncz et al. eds., et al. Methods in Arabidopsis stringent (or ultra-high Stringent or ultra-ultra-high Stringent 30 Research (1992) et al. World Scientific, New Jersey, N.J., in conditions) conditions to a polynucleotide sequence “Preface'). Because of its small size, short life cycle, obligate described above. The polynucleotides may be used as probes, autogamy and high fertility, Arabidopsis is also a choice primers, sense and antisense agents, and the like, according to organism for the isolation of mutants and studies in morpho methods as noted Supra. genetic and development pathways, and control of these path Subsequences of the polynucleotides of the invention, 35 ways by transcription factors (Koncz. Supra, p. 72). A number including polynucleotide fragments and oligonucleotides are of studies introducing transcription factors into A. thaliana useful as nucleic acid probes and primers. An oligonucleotide have demonstrated the utility of this plant for understanding suitable for use as a probe or primer is at least about 15 the mechanisms of gene regulation and trait alteration in nucleotides in length, more often at least about 18 nucle plants. (See, for example, Koncz supra, and U.S. Pat. No. otides, often at least about 21 nucleotides, frequently at least 40 6,417,428). about 30 nucleotides, or about 40 nucleotides, or more in Arabidopsis Genes in Transgenic Plants. length. A nucleic acid probe is useful in hybridization proto Expression of genes which encode transcription factors cols, e.g., to identify additional polypeptide homologs of the modify expression of endogenous genes, polynucleotides, invention, including protocols for microarray experiments. and proteins are well known in the art. In addition, transgenic Primers can be annealed to a complementary target DNA 45 plants comprising isolated polynucleotides encoding tran strand by nucleic acid hybridization to form a hybrid between Scription factors may also modify expression of endogenous the primer and the target DNA strand, and then extended genes, polynucleotides, and proteins. Examples include Peng along the target DNA strand by a DNA polymerase enzyme. et al. (1997) et al. Genes and Development 11: 3194-3205, Primer pairs can be used for amplification of a nucleic acid and Penget al. (1999) Nature 400: 256-261. In addition, many sequence, e.g., by the polymerase chain reaction (PCR) or 50 others have demonstrated that an Arabidopsis transcription other nucleic-acid amplification methods. See Sambrook, factor expressed in an exogenous plant species elicits the Supra, and Ausubel, Supra. same or very similar phenotypic response. See, for example, In addition, the invention includes an isolated or recombi Fuet al. (2001) Plant Cell 13:1791-1802: Nandietal. (2000) nant polypeptide including a Subsequence of at least about 15 Curr. Biol. 10: 215-218: Coupland (1995) Nature 377: 482 contiguous amino acids encoded by the recombinant or iso 55 483; and Weigel and Nilsson (1995) Nature 377: 482-500. lated polynucleotides of the invention. For example, such Homologous Genes Introduced into Transgenic Plants. polypeptides, or domains or fragments thereof, can be used as Homologous genes that may be derived from any plant, or immunogens, e.g., to produce antibodies specific for the from any source whether natural, synthetic, semi-synthetic or polypeptide sequence, or as probes for detecting a sequence recombinant, and that share significant sequence identity or of interest. A Subsequence can range in size from about 15 60 similarity to those provided by the present invention, may be amino acids in length up to and including the full length of the introduced into plants, for example, crop plants, to confer polypeptide. desirable or improved traits. Consequently, transgenic plants To be encompassed by the present invention, an expressed may be produced that comprise a recombinant expression polypeptide which comprises such a polypeptide Subse vector or cassette with a promoter operably linked to one or quence performs at least one biological function of the intact 65 more sequences homologous to presently disclosed polypeptide in Substantially the same manner, or to a similar sequences. The promoter may be, for example, a plant or viral extent, as does the intact polypeptide. For example, a promoter. US 8,809,630 B2 65 66 The invention thus provides for methods for preparing plant products, such as different fatty acid ratios. For transgenic plants, and for modifying plant traits. These meth example, overexpression of G720 caused a plant to become ods include introducing into a planta recombinant expression more freezing tolerant, but knocking out the same transcrip vector or cassette comprising a functional promoter operably tion factor imparted greater Susceptibility to freezing. Thus, Suppressing a gene that causes a plant to be more sensitive to linked to one or more sequences homologous to presently cold may improve a plants tolerance of cold. More than one disclosed sequences. Plants and kits for producing these transcription factor gene may be introduced into a plant, plants that result from the application of these methods are either by transforming the plant with one or more vectors also encompassed by the present invention. comprising two or more transcription factors, or by selective Transcription Factors of Interest for the Modification of 10 breeding of plants to yield hybrid crosses that comprise more Plant Traits than one introduced transcription factor. Currently, the existence of a series of maturity groups for A listing of specific effects and utilities that the presently different latitudes represents a major barrier to the introduc disclosed transcription factor genes have on plants, as deter tion of new valuable traits. Any trait (e.g. disease resistance) mined by direct observation and assay analysis, is provided in has to be bred into each of the different maturity groups 15 Table 4. Table 4 shows the polynucleotides identified by SEQ separately, a laborious and costly exercise. The availability of IDNO; Mendel Gene ID No. (GID); and if the polynucleotide single strain, which could be grown at any latitude, would was tested in a transgenic assay. The first column shows the therefore greatly increase the potential for introducing new polynucleotide SEQ ID NO; the second column shows the traits to crop species such as soybean and cotton. GID; the third column shows whether the gene was overex For many of the specific effects, traits and utilities listed in pressed (OE) or knocked out (KO) in plant studies; the fourth Table 4 and Table 6 that may be conferred to plants, one or column shows the trait(s) resulting from the knock out or more transcription factor genes may be used to increase or overexpression of the polynucleotide in the transgenic plant; decrease, advance or delay, or improve or prove deleterious to the fifth column shows the category of the trait; and the sixth a given trait. Overexpressing or suppressing one or more column (“Comment'), includes specific observations made genes can impart significant differences in production of with respect to the polynucleotide of the first column. TABLE 4 Traits, trait categories, and effects and utilities that transcription factor genes have on plants. SEQID NO: GID No. OE/KO Trait(s) Category Observations 1. G3 OE Size Dev and morph Small plant OE Hea Abiotic stress More sensitive to heat in a growth assay 5 G5 OE Size Dev and morph Small plant 11 G8 OE Flowering time Flowering time Late flowering 13 G9 OE Roo Dev and morph increased root mass 21 G19 OE Erysiphe Disease increased tolerance to Erysiphe Hormone sensitivity Hormone sensitivity Repressed by methylasmonate and induced by ACC 23 G20 OE Seed sterols Seed biochemistry increase in campesterol 2S G21 OE Size Dev and morph Reduced size 27 G22 OE Sodium chloride Abiotic stress increased tolerance to high salt 29 G24 OE Morphology: other Dev and morph Reduced size OE Necrosis Dev and morph Necrotic patches 31 G25 OE Trichome Dev and morph Fewer trichomes at seedling stage OE Fusarium Disease Expression induced by Fusarium infection 33 G26 OE Sugar sensing Sugar sensing Decreased germination and growth on glucose medium 35 G27 OE Morphology: other Dev and morph Abnormal development, Small 37 G28 OE Botrytis Disease increased tolerance to Botrytis OE Erysiphe Disease increased resistance to Erysiphe OE Scierotinia Disease increased tolerance to Scierotinia 39 G32 OE Leaf Dev and morph Curled leaves 41 G38 OE Sugar sensing Sugar sensing Reduced germination on glucose medium 49 G43 OE Sugar sensing Sugar sensing Decreased germination and growth on glucose medium S3 G46 OE Size Dev and morph increased size OE Drought Abiotic stress increased tolerance to drought SS G134 OE Flower Dev and morph Homeotic transformation OE Cold Abiotic stress increased sensitivity to cold 57 G142 OE Flowering time Flowering time Early flowering S9 G145 OE Flowering time Flowering time Early flowering OE Inflorescence Dev and morph Terminal flowers 61 G146 OE Abiotic stress Abiotic stress Better growth in low nitrogen OE Nutrient uptake Abiotic stress Altered C:N sensing: reduced anthocyanin production on high Sucroseflow nitrogen OE Flowering time Flowering time Early flowering 6S G153 OE Abiotic stress Abiotic stress Tolerant to low nitrogen conditions 67 G156 KO Seed Dev and morph Seed color alteration US 8,809,630 B2 67 68 TABLE 4-continued Traits, trait categories, and effects and utilities that transcription factor genes have on plants. SEQID NO: D No. E/KO Trait(s) Category Observations 69 57 O E Flowering time Flowering time Modest overexpression triggers early flowering; greater overexpression delays flowering 71 62 Seed oil content Seed biochemistry increased seed oil content Seed protein content Seed biochemistry Altered seed protein content 77 71 Cold Abiotic stress Expression induced by cold and heat Heat Abiotic stress 87 Seed oil content Seed biochemistry Decreased seed oil Flowering time Flowering time Early flowering 89 84 Flowering time Flowering time Early flowering Size Dev and morph Small plant 91 85 Leaf glucosinolates Leaf biochemistry increased M39481 Flowering time Flowering time Early flowering 95 87 Morphology: other Dev and morph Variety of morphological alterations 97 88 Osmotic Abiotic stress Better germination under osmotic StreSS Fusarium isease increased susceptibility to Fusarium O1 92 Seed oil content eed biochemistry Decreased oil content Flowering time owering time Late flowering O3 Size ev and morph Small plant 05 Sodium chloride biotic stress increased tolerance to high Salt 09 Flowering time owering time Late flowering 11 Seed protein content Seed biochemistry increased seed protein content Seed oil content Seed biochemistry Decreased seed oil content 15 Seed Dev and morph Large seeds 17 Sugar sensing Sugar sensing Decreased germination on glucose medium Botrytis Disease increased susceptibility to Botrytis 19 G208 Flowering time Flowering time Early flowering 21 G211 Leaf insoluble Sugars Leaf biochemistry increase in xylose 23 G212 Trichome Dev and morph Partially to fully glabrous on adaxial surface of leaves 27 G214 Leaf fatty acids Leaf biochemistry increased leaf fatty acids g Leaf prenyl lipids Leaf biochemistry increased leaf chlorophyll and carotenoids Flowering time Flowering time Late flowering Seed prenyl lipids Seed biochemistry increased seed lutein 135 G222 Seed oil content Seed biochemistry Decreased seed oil content Seed protein content Seed biochemistry increased seed protein content 137 G.224 Cold Abiotic stress increased tolerance to cold Leaf Dev and morph Altered leaf shape Sugar sensing Sugar sensing increased germination and seedling vigor on high glucose 139 G225 Root Dev and morph increased root hairs Trichome Dev and morph Glabrous, lack of trichomes : Nutrient uptake Abiotic stress increased tolerance to nitrogen-limited medium 141 G226 E Nutrient uptake Abiotic stress increased tolerance to nitrogen-limited medium Seed protein content Seed biochemistry increased seed protein Root Dev and morph increased root hairs Trichome Dev and morph Glabrous, lack of trichomes Sodium chloride Abiotic stress increased tolerance to high Salt 43 G227 Flowering time Flowering time Early flowering 47 G229 Seed protein content Seed biochemistry Decreased seed protein Seed oil content Seed biochemistry increased seed oil Biochemistry: other Biochem: misc Up-regulation of genes involved in secondary metabolism 51 G231 Leaf fatty acids Leaf biochemistry increased leaf unsaturated fatty acids Seed protein content Seed biochemistry Decreased seed protein content Seed oil content Seed biochemistry increased seed oil content 57 G234 Flowering time Flowering time Late flowering, Small plant 59 G237 Leaf insoluble Sugars Leaf biochemistry increased leaf insoluble Sugars Erysiphe Disease increased tolerance to Erysiphe 61 G239 ABA Abiotic stress Expression induced by ABA Drought Abiotic stress Expression induced by drought Hea Abiotic stress Expression induced by heat Osmotic stress Abiotic stress Expression induced by osmotic stress 63 G241 Seed oil content Seed biochemistry Decreased seed oil Seed protein content Seed biochemistry Altered seed protein content Sugar sensing Sugar sensing Decreased germination and growth on glucose medium 65 G242 Leaf insoluble Sugars Leaf biochemistry increased arabinose 69 G247 Trichome Dev and morph Altered trichome distribution 71 G248 Botrytis Disease increased susceptibility to Botrytis US 8,809,630 B2 69 70 TABLE 4-continued Traits, trait categories, and effects and utilities that transcription factor genes have on plants. SEQI NO: GID No. E/KO Trait(s) Category Observations 173 G249 Flowering time Flowering time Late flowering Senescence Dev and morph Delayed senescence 179 Sugar sensing Sugar sensing Decreased germination and growth on glucose medium 181 Flowering time Flowering time Early flowering 183 Cold Abiotic stress Better germination and growth in cold 187 Size Dev and morph Reduced size 191 Botrytis Disease increased susceptibility to Botrytis 193 Sugar sensing Sugar sensing Decreased root growth on Sucrose medium, root specific expression 199 G274 Leaf insoluble Sugars Leaf biochemistry increased leafarabinose 2O3 G280 Size Dev and morph Reduced size Leaf prenyl lipids Leaf biochemistry increased delta and gammatocopherol 209 Seed oil content Seed biochemistry increased seed oil content 215 Leaf insoluble Sugars Leaf biochemistry Altered leaf insoluble Sugars 217 Sugar sensing Sugar sensing No germination on glucose medium 223 Osmotic Abiotic stress Better germination on high Sucrose and high NaCl 231 G346 Leaf fatty acids Leaf biochemistry Altered leaf fatty acids Seed oil content Seed biochemistry Decreased seed oil 235 G351 Light response Dev and morph Altered leaf orientation and lightgreen coloration 237 G361 Flowering time Flowering time Late flowering 239 G374 Embryo lethal Dev and morph Embryo lethal 241 G378 Erysiphe Disease increased resistance 245 G385 Dev and morph Small plant inflorescence Dev and morph Short inflorescence stems Leaf Dev and morph Dark green plant 249 G390 Flowering time Flowering time Early flowering Architecture Dev and morph Altered shoot development 251 G394 Cold Abiotic stress More sensitive to chilling 261 G409 Erysiphe Disease increased tolerance to Erysiphe 267 G418 Pseudomonas Disease increased tolerance Seed protein content Seed biochemistry Decreased seed protein content 269 G419 Nutrient uptake Abiotic stress increased tolerance to potassium-free medium 271 G428 Leaf insoluble Sugars Leaf biochemistry increased leaf insoluble Sugars LC8 Dev and morph Altered leaf shape 273 G431 Morphology: other Dev and morph Developmental defect, sterile 275 G434 Flowering time Flowering time Late flowering 277 G435 Leaf insoluble Sugars Leaf biochemistry increased leaf insoluble Sugars 283 G438 Architecture Dev and morph Reduced branching Stem Dev and morph Reduced lignin LC8 Dev and morph increased leaf size LC8 Dev and morph Altered leaf shape 285 G447 Size Dev and morph Reduced size Morphology: other Dev and morph Altered cotyledon shape LC8 Dev and morph Dark green leaves 287 G456 Seed protein content Seed biochemistry Decreased seed protein Seed oil content Seed biochemistry increased seed oil 291 G464 Seed oil content Seed biochemistry increased seed oil Hea Abiotic stress Better germination and growth in heat LC8 Dev and morph Altered leaf shape Seed protein content Seed biochemistry Decreased seed protein content 295 G470 Fertility Dev and morph Short stamen filaments 301 G475 Flowering time Flowering time Early flowering 303 G477 Scierotinia Disease increased susceptibility to Sclerotinia Oxidative Abiotic stress increased sensitivity to oxidative stress 305 G482 Sodium chloride Abiotic stress Tolerant to high salt 307 G486 Flowering time Flowering time Late flowering 309 G489 Osmotic Abiotic stress increased tolerance to osmotic stress 313 GSO2 Osmotic Abiotic stress increased sensitivity to osmotic stress 317 GSO9 Seed oil content Seed biochemistry Altered seed oil content Seed protein content Seed biochemistry Altered seed protein content 327 G515 Morphology: other Dev and morph Lethal when overexpressed 331 GS21 Leaf Dev and morph Leaf cell expansion 337 G525 Pseudomonas Disease increased tolerance to Pseudomonas Leaf insoluble Sugars Leaf biochemistry increased leaf insoluble Sugars 339 G526 Osmotic Abiotic stress increased sensitivity to osmotic stress 343 GS36 Sugar sensing Sugar sensing Decreased germination and growth on glucose medium 345 GS45 Sodium chloride Abiotic stress Susceptible to high salt Nutrient uptake Abiotic stress increased tolerance to phosphate-free medium US 8,809,630 B2 71 72 TABLE 4-continued Traits, trait categories, and effects and utilities that transcription factor genes have on plants. SEQID NO: GID No. E/KO Trait(s) Category Observations Erysiphe Disease increased Susceptibility to Erysiphe Pseudomonas Disease increased Susceptibility to Pseudomonas Fusarium Disease increased susceptibility to Fusarium 355 G558 Defense gene Disease increased expression of defense genes expression 357 G559 Architecture Dev and morph Loss of apical dominance Fertility Dev and morph Reduced fertility 359 GS61 Seed oil content Seed biochemistry Altered seed oil content Nutrient uptake Abiotic stress increased tolerance to potassium-free medium 361 GS62 Flowering time Flowering time Late flowering 369 GS67 Seed protein content Seed biochemistry Altered seed protein content Seed oil content Seed biochemistry Altered seed oil content Sugar sensing Sugar sensing Decreased seedling vigor on high glucose 3.71 G569 E Defense gene Disease Decreased expression of defense genes expression 375 G571 Senescence Dev and morph Delayed senescence Flowering time Flowering time Late flowering 379 G578 Morphology: other Dev and morph Lethal when overexpressed 381 GS8O Flower Dev and morph Altered development Architecture Dev and morph Altered inflorescences 38S GS84 Seed Dev and morph Large seeds 387 G590 Seed oil content Seed biochemistry increased seed oil content Flowering time Flowering time Early flowering 389 G591 Erysiphe Disease increased tolerance to Erysiphe Flowering time Flowering time Late flowering 391 GS92 Flowering time Flowering time Early flowering 393 G598 Seed oil content Seed biochemistry increased seed oil Leaf insoluble Sugars Leaf biochemistry Altered insoluble Sugars 39S G605 Leaf fatty acids Leaf biochemistry Altered leaf fatty acid composition 399 G615 Architecture Dev and morph Altered plant architecture Fertility Dev and morph Little or no pollen production, poor filament elongation 401 G616 Erysiphe Disease increased tolerance to Erysiphe 403 G624 g Sodium chloride Abiotic stress increased tolerance to high Salt Size Dev and morph increased biomass Nutrient uptake Abiotic stress increased tolerance to low phosphate Flowering time Flowering time Late flowering 40S G627 Flowering time Flowering time Early flowering 407 G629 Lea Dev and morph Altered leaf morphology Seed oil content Seed biochemistry increased seed protein content 409 G-630 Seed protein content Seed biochemistry increased seed protein content, embryo specific expression 415 G634 Trichome Dev and mor increased trichome density and size 417 G636 Dev and mor Premature senescence Dev and mor Reduced size 419 G638 Fl OWe Dev and morph Altered flower development 435 G663 oil content Seed biochemistry Decreased seed oil Seed protein content Seed biochemistry increased seed protein Biochemistry: other Biochem: misc increased anthocyanins in leaf, root, seed 437 G664 Col Abiotic stress Better germination and growth in cold 443 G668 See protein content Seed biochemistry increased seed protein content See oil content Seed biochemistry Decreased seed oil content See Dev and morph Reduced seed color 447 G670 Size Dev and morph Small plant 449 G671 Stem Dev and morph Altered inflorescence stem structure Flower Dev and morph Reduced petal abscission Leaf Dev and morph Altered leaf shape Size Dev and morph Small plant Fertility Dev and morph Reduced fertility 457 G676 Trichome Dev and morph Reduced trichomes 463 G68O Flowering time Flowering time Late flowering Sugar sensing Sugar sensing Reduced germination on glucose medium 46S G681 Leaf glucosinolates Leaf biochemistry increase in M394.80 467 G682 Heat Abiotic stress Better germination and growth in heat Trichome Dev and morph Glabrous, lack of trichomes 473 G718 Seed protein content Seed biochemistry increased seed protein Leaf fatty acids Leaf biochemistry Altered leaf fatty acid composition Seed prenyl lipids Seed biochemistry increased seed lutein Seed oil content Seed biochemistry Decreased seed oil US 8,809,630 B2 73 74 TABLE 4-continued Traits, trait categories, and effects and utilities that transcription factor genes have on plants. SEQID NO: GID No. E/KO Trait(s) Category Observations 483 G732 E Seed protein content Seed biochemistry One OE line had increased, another decreased seed protein content O E Seed oil content Seed biochemistry One OE line had increased, another decreased seed oil content Architecture Dev and morph Reduced apical dominance Flower Dev and morph Abnormal flowers 487 G736 Flowering time Flowering time Late flowering Leaf Dev and morph Altered leaf shape 489 G738 Flowering time Flowering time Late flowering Size Dev and morph Reduced size 491 G740 Morphology: other Dev and morph Slow growth 497 G748 Stem Dev and morph More vascular bundles in stem Flowering time Flowering time Late flowering Seed prenyl lipids Seed biochemistry increased lutein content 503 G752 Flowering time Flowering time Late flowering 507 G760 Hormone sensitivity Hormone sensitivity Hypersensitive to ACC Size Dev and morph Reduced size 519 G776 Seed oil composition Seed biochemistry Altered seed fatty acid composition 521 G777 Seed oil content Seed biochemistry Decreased seed oil Leaf insoluble Sugars Leaf biochemistry increased leaf rhamnose 523 G778 Seed oil composition Seed biochemistry increased seed 18:1 fatty acid 525 G779 Fertility Dev and morph Reduced fertility Flower Dev and morph Homeotic transformations 529 G782 Sugar sensing Sugar sensing Better germination and growth on Sucrose medium 531 G783 E Sugar sensing Sugar sensing Better germination and growth on Sucrose medium 539 G789 Scierotinia Disease increased susceptibility to Sclerotinia Flowering time Flowering time Early flowering Oxidative Abiotic stress increased sensitivity 541 G791 Seed oil composition Seed biochemistry Decreased seed fatty acid composition Leaf insoluble Sugars Leaf biochemistry Altered leaf cell wall polysaccharide composition Leaf fatty acids Leaf biochemistry Altered leaf fatty acid composition 549 G8O1 Sodium chloride Abiotic stress Better germination on NaCl 553 G805 Scierotinia Disease increased susceptibility to Sclerotinia 557 G831 Size Dev and morph Reduced size S6S G849 Seed protein content Seed biochemistry Altered seed protein content Seed oil content Seed biochemistry increased seed oil content 567 G859 Flowering time Flowering time Late flowering 571 G861 Seed oil composition Seed biochemistry increase in 16:1 573 G864 Size Dev and morph Small plant Cold Abiotic stress Adult stage chilling sensitivity Heat Abiotic stress Better germination in heat 575 G865 Erysiphe Disease increased Susceptibility to Erysiphe Seed protein content Seed biochemistry increased seed protein Botrytis Disease increased susceptibility to Botrytis Flowering time Flowering time Early flowering Morphology: other Dev and morph Altered morphology 579 G867 Sugar sensing Sugar sensing Better seedling vigor on Sucrose medium Sodium chloride Abiotic stress Better seedling vigor on high Salt 581 G869 Leaf insoluble Sugars Leaf biochemistry increased fucose Erysiphe Disease increased tolerance to Erysiphe Morphology: other Dev and morph Small and spindly plant Flower Dev and morph Abnormal anther development Seed oil composition Seed biochemistry Altered seed fatty acids Leaf fatty acids Leaf biochemistry Altered leaf fatty acids 583 G877 Embryo lethal Dev and morph Embryo lethal 585 G878 Senescence Dev and morph Delayed senescence Flowering time Flowering time Late flowering S87 G881 Botrytis Disease increased susceptibility to Botrytis Erysiphe Disease increased Susceptibility to Erysiphe S89 G883 Seed prenyl lipids Seed biochemistry Decreased seed lutein 591 G884 Sodium chloride Abiotic stress increased root growth in high salt Size Dev and morph Reduced size 595 G896 Fusarium Disease increased susceptibility to Fusarium 603 G903 Leaf Dev and morph Altered leaf morphology 60S G905 Flowering time Flowering time Late flowering Leaf Dev and morph Altered leaf shape Sugar sensing Sugar sensing increased seedling vigor on high glucose 613 G911 O E Nutrient uptake Abiotic stress C8Se. growth on potassium-free medium US 8,809,630 B2 75 76 TABLE 4-continued Traits, trait categories, and effects and utilities that transcription factor genes have on plants. SEQID NO: GID No. E/KO Trait(s) Category Observations Seed protein content Seed biochemistry increased seed protein content Seed oil content Seed biochemistry Decreased seed oil content 615 G912 Freezing Abiotic stress increased freezing tolerance Morphology: other Dev and morph Dark green color Drought Abiotic stress increased Survival in drought conditions E Sugar sensing Sugar sensing Reduced cotyledon expansion in glucose Size Dev and morph Small plant Flowering time Flowering time Late flowering 619 G921 Osmotic Abiotic stress increased sensitivity to osmotic stress Leaf Dev and morph Serrated leaves 627 G932 Leaf Dev and morph Altered development, dark green color Size Dev and morph Reduced size 631 G938 Seed oil composition Seed biochemistry Overexpressors had increased 16:0, 8:0, 20:0, and 18:3 fatty acids, decreased 18:2, 20:1, 22:1 fatty acids 633 G961 Seed oil content Seed biochemistry increased seed oil content 63S G964 s Heat Abiotic stress More tolerant to heat in germination assay 637 G965 Seed oil composition Seed biochemistry increase in 18:1 fatty acid 639 G971 Flowering time Flowering time Late flowering 641 G974 Seed oil content Seed biochemistry Altered seed oil content 643 G975 Leaf fatty acids Leaf biochemistry increased wax in leaves 647 G977 Size Dev and morph Small plant Morphology: other Dev and morph Dark green Leaf Dev and morph Altered leaf shape Fertility Dev and morph Reduced fertility 649 G979 O Seed Dev and morph Altered seed development, ripening, and germination 653 G987 KO Leaf fatty acids Leaf biochemistry Reduction in 16:3 fatty acid K O Leaf prenyl lipids Leaf biochemistry Presence of two xanthophylls, ocopherol not normally found in eaves; reduced chlorophylla and b 6SS G991 Size Dev and morph Slightly reduced size 657 G994 Flowering time Flowering time Late flowering Size Dev and morph Small plants 659 G996 Sugar sensing Sugar sensing Reduced germination on glucose medium 671 G1012 Leaf insoluble Sugars Leaf biochemistry Decreased rhamnose 673 G102O Size Dev and morph Very small T1 plants 677 G1023 Size Dev and morph Reduced size 68S G1037 Flowering time Flowering time Early flowering 687 G1038 Leaf Dev and morph Altered leaf shape Leaf insoluble Sugars Leaf biochemistry Decreased insoluble Sugars 695 G1048 Erysiphe Disease increased tolerance to Erysiphe Orontii Seed protein content Seed biochemistry increased seed protein content 697 G1 OSO Senescence Dev and morph Delayed senescence 699 G1052 Flowering time Flowering time Late flowering Seed prenyl lipids Seed biochemistry Decrease in lutein and increase in Xanthophyll 1 701 G1053 Size Dev and morph Small plant 713 G1062 Light response Dev and morph Constitutive photomorphogenesis Hormone sensitivity Hormone sensitivity Altered response to ethylene Seed Dev and morp Altered seed shape Morphology: other Dev and morp Slow growth 717 G1067 Leaf Dev and morp Altered leaf shape Size Dev and mor Small plant Fertility Dev and mor Reduced fertility 719 G1068 Sugar sensing Sugar sensing Reduced cotyledon expansion in glucose 721 G1069 O E Osmotic Abiotic stress Better germination under osmotic StreSS Hormone sensitivity Hormone sensitivity Reduced ABA sensitivity Leaf glucosinolates Leaf biochemistry Altered composition 723 G1073 Size Dev and morph increased plant size Leaf Dev and morph Serrated leaves Flowering time Flowering time Flowering slightly delayed 725 G1075 Size Dev and morph Small plant Flower Dev and morph Reduced or absent petals, sepals and Stamens Fertility Dev and morph Reduced fertility Leaf Dev and morph Altered leaf shape 727 G1076 Morphology: other Dev and morph Lethal when overexpressed US 8,809,630 B2 77 78 TABLE 4-continued Traits, trait categories, and effects and utilities that transcription factor genes have on plants. SEQID NO: D No. O E/KO Trait(s) Category Observations 731 KO Osmotic Abiotic stress Better germination under osmotic StreSS O E Morphology: other Dev and morph Developmental defects at seedling Stage 737 128 Lea Dev and morph Dark green leaves Senescence Dev and morph Premature leaf and flower senescence 739 133 Leaf prenyl lipids Leaf biochemistry Decreased leaflutein 741 134 Silique Dev and morph Siliques with altered shape Hormone sensitivity Hormone sensitivity Altered response to the growth hormone ethylene 745 136 Flowering time Flowering time Late flowering Nutrient uptake Abiotic stress increased sensitivity to low nitrogen 749 145 See Seed morphology Reduced seed size See Seed morphology Altered seed shape 753 181 Size Dev and morph Small T1 plants 759 190 Seed oil content Seed biochemistry increased seed oil content 763 198 Seed oil content Seed biochemistry increased seed oil content Leaf glucosinolates Leaf biochemistry Altered glucosinolate composition 775 228 Size Dev and morph Reduced size 787 242 Flowering time Flowering time Early flowering 793 255 See Dev and morph increased seed size Morphology: other Dev and morph Reduced apical dominance Botrytis Disease increased susceptibility to Botrytis 799 266 Leaf fatty acids Leaf biochemistry Changes in leaf fatty acids Erysiphe Disease increased tolerance to Erysiphe Size Dev and morph Small plant Fertility Dev and morph Reduced fertility Leaf insoluble Sugars Leaf biochemistry Changes in leaf insoluble Sugars G1267 Size Dev and morph Small plant Leaf Leaf morphology Dark green leaves Leaf Leaf morphology Shiny leaves 803 G1269 Leaf Dev and morph Long petioles, upturned leaves 805 G1274 Cold Abiotic stress increased tolerance to cold Nutrient uptake Abiotic stress increased tolerance to nitrogen-limited medium Morphology: other Dewan O inflorescence architecture Leaf Dewan O Large leaves 807 275 Size Dewan O Small plant Architecture Dewan O Reduced apical dominance 809 277 Size Dewan O Small plant 817 304 Morphology: other Dewan O Lethal when overexpressed 819 305 Flowering time Flowering tim Early flowering Heat Abiotic stress Reduced chlorosis at high temperature 825 309 Size Dev and morph Small plant Leaf insoluble Sugars Leaf biochemistry increased mannose 827 311 Fertility Dev and morp Reduced fertility Size Dev and morph Small plant 829 313 Morphology: other increased seedling size 831 314 Sugar sensing Reduced seedling vigor on high glucose Size Reduced size 837 317 Size Reduced size 841 322 Cold increased seedling vigor in cold conditions Leaf glucosinolates Leaf biochemistry increase in M394.80 Light response Dev and morph Photomorphogenesis in the dark Size Dev and morph Reduced size 843 323 Seed oil content Seed biochemistry Decreased seed oil Seed protein content Seed biochemistry increased seed protein Size Dev and morph Small T1 plants, darkgreen 845 324 Leaf prenyl lipids Leaf biochemistry Decreased leaflutein, increased leaf Xanthophyll 849 326 Flower Dewan O Petals and sepals are Smaller Size Dewan O Small plant Fertility Dewan O) Reduced fertility 851 327 Leaf Dev and mor Dark green leaves 853 328 Seed prenyl lipids Seed biochemistry Decreased seed lutein 855 332 Trichome Dev and morph Reduced trichome density Size Dev and morph Reduced plant size 857 334 Size Oil Small plants Leaf Leaf morphology Dark green leaves 859 335 Flowering time Flowering time Late flowering Dev and morph Dev and morph Slow growth US 8,809,630 B2 79 80 TABLE 4-continued Traits, trait categories, and effects and utilities that transcription factor genes have on plants. SEQID NO: D No. OE/KO Trait(s) Category Observations 861 337 OE Sugar sensing Sugar sensing Decreased germination on Sucrose medium OE Leaf fatty acids Leaf biochemistry Altered leaf fatty acid composition 867 380 OE Flowering time Flowering time Early flowering 869 382 OE Size Dev and morph Small plant 877 399 OE Leaf fatty acids Leaf biochemistry Altered composition 879 412 OE Hormone sensitivity Hormone sensitivity ABA insensitive Osmotic Abiotic stress increased tolerance to osmotic stress 881 417 KO Morphology: other Dev and morph Reduced seedling germination and vigor KO Seed oil composition Seed biochemistry increase in 18:2, decrease in 18:3 fatty acids 883 425 E Flower Dev and morph Altered flower and inflorescence development 887 435 Flowering time Flowering time Late flowering Size Dev and morph increased plant size 891 449 Seed protein content Seed biochemistry increased seed protein content Flower Dev and morph Altered flower structure 893 451 Leaf Dev and morph Large leaf size Seed oil content Seed biochemistry Altered seed oil content Morphology: other Dev and morph increased plant size Flowering time Flowering time Late flowering 895 468 Flowering time Flowering time Late flowering Size Dev and morph increased biomass Leaf Dev and morph Grayish and narrow leaves Morphology: other Dev and morph Slow growth rate 897 471 Seed oil content Seed biochemistry increased seed oil content 899 472 Morphology: other Dev and morph No shoot meristem 901 474 Morphology: other Dev and morph Reduced plant size Flowering time Flowering time Late flowering Inflorescence Dev and morph inflorescence architecture 903 476 Morphology: other Dev and morph Faster seedling growth rate 905 482 Root Dev and morph increased root growth Biochemistry: other Biochem: misc increased anthocyanins 911 493 Sugar sensing Sugar sensing increased seedling vigor on high glucose Flowering time Flowering time Late flowering Leaf Dev and morph Altered leaf shape 913 499 Architecture Dev and morph Altered plantarchitecture Flower Dev and morph Altered floral organ identity and development Morphology: other Dark green color Dark green leaves 919 Morphology: other Dev and morph Reduced cell differentiation in meristem 921 545 Flowering time Flowering time Early flowering Size Dev and morph Reduced size 925 S60 Fertility Dev and morph Reduced fertility Size Dev and morph Reduced size Flower Dev and morph Abnormal flowers 927 634 Seed protein content Seed biochemistry Altered seed protein content 929 645 inflorescence Dev and morph Altered inflorescence structure Leaf Dev and morph Altered leaf development Heat Abiotic stress Reduced germination vigor 937 760 Flowering time Flowering time Early flowering 939 816 Trichome Dev and morph Glabrous Root Dev and morph Ectopic root hairs, more root hairs Nutrient uptake Abiotic stress improved tolerance to low nitrogen Sugar sensing Sugar sensing insensitive to growth retardation effects of high glucose Pigment Dev and morph Reduced pigment 941 G1820 Osmotic Abiotic stress Better germination in NaCl Seed protein content Seed biochemistry increased seed protein content Seed oil content Seed biochemistry Decreased seed oil content Hormone sensitivity Hormone sensitivity Reduced ABA sensitivity Flowering time owering time Early flowering Drought biotic stress increased tolerance to drought 943 G1842 Flowering time owering time Early flowering 945 G1843 Flowering time owering time Early flowering 949 G1947 Fertility ev and morph Reduced fertility Flower ev and morph Extended period of flowering 951 G2010 Flowering time owering time Early flowering 957 G2347 Flowering time owering time Early flowering 959 G2718 Trichome Devel and morph Reduction in trichome density ranging rom mild to glabrous US 8,809,630 B2 81 82 TABLE 4-continued Traits, trait categories, and effects and utilities that transcription factor genes have on plants. SEQID NO: GID No. OE/KO Trait(s) Category Observations OE Nutrient uptake Abiotic stress Tolerant to low nitrogen conditions OE Root Dev and morph Ectopic root hairs, more root hairs OE Pigment Dev and morph Reduced pigment

Tables 5A and 5B show the polypeptides identified by SEQ TABLE 5A-continued ID NO; Mendel Gene ID (GID) No.: the transcription factor family to which the polypeptide belongs, and conserved Gene families and conserved domains domains of the polypeptide. The first column shows the 15 Polypeptide Conserved Domains polypeptide SEQ ID NO; the third column shows the tran SEQI in Amino Acid scription factor family to which the polynucleotide belongs: NO: GID No. Family Coordinates and the fourth column shows the amino acid residue positions OO G190 WRKY of the conserved domain in amino acid (AA) co-ordinates. O2 G192 WRKY O4 G194 WRKY O6 G196 WRKY TABLE 5A O8 G197 10 G198 Gene families and conserved domains 12 G2O1 14 G2O3 Polypeptide Conserved Domains 16 G2O6 SEQID in Amino Acid 25 18 G2O7 NO: GID No. Family Coordinates 2O G208 22 G211 2 G3 AP2 28-95 24 G212 4 G4 AP2 121-188 26 G213 6 G5 AP2 149-216 28 G214 MYB-related 8 G6 AP2 22-89 30 30 G215 MYB-related 10 G7 AP2 58-125 32 G216 12 G8 AP2 151-217, 243-296 34 G22O 14 G9 AP2 62-127 36 G222 16 G10 AP2 21-88 38 G.224 PMR 18 G13 AP2 19-86 40 G225 MYB-related 2O G14 AP2 122-189 35 42 G226 MYB-related 22 G19 AP2 76-145 44 G227 24 G20 AP2 68-144 46 G228 26 G21 AP2 97-164 48 G229 28 G22 AP2 89-157 50 G230 30 G24 AP2 25-93 52 G231 32 G2S AP2 47-114 40 S4 G232 34 G26 AP2 67-134 56 G233 36 G27 AP2 37-104 58 G234 38 G28 AP2 145-213 60 G237 40 G32 AP2 17-84 62 G239 2 1 2 42 G38 AP2 76-143 64 G241 44 G40 AP2 45-112 66 G242 46 G41 AP2 39-106 45 68 G245 48 G42 AP2 48-115 70 G247 50 G43 AP2 104-172 72 G248 2 52 G44 AP2 85-154 74 G249 S4 G46 AP2 107-175 76 G2S1 56 G134 MADS 1-57 78 G2S2 58 G142 MADS 2-57 50 8O G2S4 60 G145 MADS 1-57 82 G255 62 G146 MADS 1-57 84 G2S6 64 G152 MADS 2-57 86 G257 66 G153 MADS 1-57 88 G258 68 G156 MADS 2-57 90 G260 70 G157 MADS 2-57 92 G261 72 G162 MADS 2-57 55 74 G166 MADS 2-56 94 G263 76 G170 MADS 2-57 96 G266 45-13S 78 G171 MADS 1-57 98 G270 259-424 8O G173 MADS 1-57 2OO G274 82 G176 WRKY 117-173,234-290 2O2 G279 84 G177 WRKY 166-221, 328-384 60 204 G280 97-104, 130-137-155-162, 86 G179 WRKY 65-121 185-192 88 G18O WRKY 118-174 2O6 G285 90 G184 WRKY 295-352 208 G290 538-784,919-958, 92 G185 WRKY 113-172 1086-1169 94 G186 WRKY 312-369 210 G291 132-160 96 G187 WRKY 172-228 65 212 G295 98 G188 WRKY 175-222 214 G301

US 8,809,630 B2 87 88 TABLE 5A-continued TABLE 5A-continued

Gene families and conserved domains Gene families and conserved domains

Polypeptide Conserved Domains Polypeptide Conserved Domains SEQID in Amino Acid SEQID in Amino Acid NO: D No. Family Coordinates NO: GID No. Family Coordinates 802 267 WRKY 70-127 948 G1844 MADS 2-57 804 269 MYB-related 27-83 950 G1947 HS 37-120 806 274 WRKY 11-164 952 G2010 SBP 53-127 808 275 WRKY 10 954 G2119 RINGi C3H2C3 810 277 956 G2120 WAR O-O 812 278 958 G2347 SBP 60-136 814 290 96.O G2718 MYB-related 21-76 816 300 818 304 82O 305 15 822 307 TABLE 5B 824 3O8 826 309 828 311 MAF Gene family and conserved domains 830 313 Polypeptide 832 314 SEQID NO: GID No. Family Conserved Domains 834 315 836 316 568 G859 MADS 2-57 838 317 944 G1842 MADS 2-57 840 319 946 G1843 MADS 2-57 842 322 948 G1844 MADS 2-57 84.4 323 735 G157 MADS 2-57 846 324 25 875 G1759 MADS 2-57 848 325 971 Soy1 MADS 2-57 8SO 326 973 Soy3 MADS 2-57 852 327 945 G859.3 MADS 2-57 854 328 947 G859.1 MADS 2-57 856 332 949 G859.4 MADS 2-57 858 334 30 951 G859.5 MADS 2-57 860 335 953 G1842.2 MADS 2-57 862 337 955 G1842.7 MADS 2-57 864 362 957 G1842.8 MADS 2-57 866 366 959 G1842.6 MADS 2-57 868 380 961 G1843.1 MADS 2-57 870 382 210-266,385-437 35 963 G1843.2 MADS 2-57 872 391 230-277 96S G1843.3 MADS 2-57 874 395 -72 967 G1843.4 MADS 2-57 876 398 969 G1844.2 MADS 2-57 878 399 971 Soy MADS1 MADS 2-57 880 412 973 Soy MADS3 MADS 2-57 882 417 884 425 40 886 426 For many crops, high yielding winter strains can only be 888 435 grown in regions where the growing season is Sufficiently 890 448 43-50, 144-154, 187-220, 254-290 cold and prolonged to elicit Vernalization. A system that could 892 449 IAA 48-53, 74-107, 122-152 trigger flowering at higher temperatures would greatly 894 451 ARF 22-357 45 expand the acreage over which winter varieties can be culti 896 468 Z-C2H2 95-115, 170-190 vated. The finding that G157 (SEQID NO:1734) overexpres 898 471 Z-C2H2 49-70 900 472 Z-C2H2 83-106 sion causes early flowering in Arabidopsis Stockholm and 902 474 Z-C2H2 41-68 Pitztal plants, indicates that the gene can overcome the high 904 476 Z-C2H2 37-57 level of FRIGIDA and FLC activity present in those late 906 482 Z-CO-like S-63 50 ecotypes. The effects are similar to those caused by Vernal 908 483 Z-CO-like 17-66 910 490 GARP 193-241 ization, which implies that G157 (SEQID NO: 1734) and the 912 493 GARP 242-289 related MADS-box transcription factors (MAFs: SEQ ID 914 499 HLHAMYC 118-181 NOs: 567, 1944, 1946, 1948, 1950, 943, 1952, 1954, 1956, 916 SO4 GATAZn 193-2O6 1958, 945, 1960, 1962, 1964, 1966, 947, 1968, 1970, and 918 SO8 GATAZn 38-63 55 920 S4O HB 1972) might be used in winter strains of crop species. To date, 922 545 HB 54-117 a Substantial number of genes have been found to promote 924 552 SBP 101-177 flowering. Many, however, including those encoding the tran 926 S60 HS 62-151 scription factors, APETALA1, LEAFY, and CONSTANS, 928 634 MYB-related 129-18O 930 645 MYB-(R1)R2R3 90-210 produce extreme dwarfing and/or shoot termination when 932 6SO HLHAMYC 284-334 60 over-expressed. Importantly, overexpression of G157 was not 934 664 HLHAMYC observed to have deleterious effects. In particular, 35S::G157 936 669 Z-CO-like transgenic Arabidopsis plants are healthy and attain a wild 938 760 MADS 2-57 940 816 MYB-related 31-81 type stature when mature. Irrespective of the mode of G157 942 820 CAAT 70-133 action, and whetherits true biological role is as an activator or 944 842 MADS 2-57 65 a repressor of flowering, the results suggest that G157 can be 946 843 MADS 2-57 applied to produce either early or late flowering, according to the level of over-expression of the particular polynucleotide. US 8,809,630 B2 89 90 Examples of some of the utilities that may be desirable in herbivore resistance, or increased cotton yield, but the latter plants, and that may be provided by transforming the plants may be undesirable in that it may reduce biomass. However, with the presently disclosed sequences, are listed in Table 6. by operably linking G362 with a flower-specific promoter, Many of the transcription factors listed in Table 6 may be one may achieve the desirable benefits of the gene without operably linked with a specific promoter that causes the tran affecting overall biomass to a significant degree. For Scription factor to be expressed in response to environmental, examples of flower specific promoters, see Kaiser et al. (Su tissue-specific or temporal signals. For example, G362 pra). For examples of other tissue-specific, temporal-specific induces ectopic trichomes on flowers but also produces Small or inducible promoters, see the above discussion under the plants. The former may be desirable to produce insect or heading “Vectors, Promoters, and Expression Systems’. TABLE 6 Genes, traits and utilities that affect plant characteristics Transcription factor Trait Category Phenotype(s) genes that impact traits Utility Abiotic stress Effect of chilling on plants increased sensitivity G394; G864 improved germination, growth Ce3S(OO(8Ce G256; G664: G1274 rate, earlier planting, yield Germination in cold increased sensitivity G134 Earlier planting; improved Ce3S(OO(8Ce G224: G256; G664: Survival, yield G1274; G1322 Freezing tolerance Ce3S(OO(8Ce G912 Earlier planting; improved quality, Survival, yield Drought improved Survival, vigor, appearance, yield Heat increased sensitivity improved germination, growth Ce3S(OO(8Ce rate, later planting, yield Osmotic stress increased sensitivity Abiotic stress response manipulation G188: G325; G489; Improved germination rate, G1069; G1089; G1412: Survival, yield G1820 Salt tolerance increased sensitivity Manipulation of response to high Salt conditions increased tolerance Improved germination rate, Survival, yield; extended growth range Nitrogen stress More sensitive to N limitation G1136 Manipulation of response to low nutrient conditions Less sensitive to N limitation G153; G225; G226; improved yield and nutrient G1274; G1816; G2718 stress tolerance, decreased ertilizer usage Phosphate stress More sensitive to Plimitation Manipulation of response to low nutrient conditions Less sensitive to Plimitation improved yield and nutrient stress tolerance, decreased ertilizer usage Potassium stress increased tolerance to K improved yield and nutrient imitation stress tolerance, decreased ertilizer usage Oxidative stress increased sensitivity improved yield, quality, increased tolerance ultraviolet and chemical stress olerance Herbicide Glyphosate Generation of glyphosate resistant plants to improve weed control Hormone sensitivity Abscisic acid (ABA) sensitivity Reduced sensitivity or insensitive G1412: G1069; G 1820 Modification of seed to ABA development, improved seed dormancy, cold and dehydration tolerance Sensitivity to ethylene Hypersensitive to ACC G760 Manipulation of fruit ripening Altered response G1062; G1134 Manipulation of fruit ripening Insensitive to ethylene Manipulation of fruit ripening US 8,809,630 B2 91 92 TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor Trait Category genes that impact traits Utility Disease increased Susceptibility G207: G248; G261; Manipulation of response to G865; G881; G1255 disease organism increased resistance or tolerance G28 Improved yield, appearance, Survival, extended range Fusarium increased Susceptibility G188: G545; G896 Manipulation of response to disease organism increased resistance or tolerance Improved yield, appearance, Survival, extended range Erysiphe increased Susceptibility G545; G865; G881 Manipulation of response to disease organism increased resistance or tolerance Improved yield, appearance, Survival, extended range

Pseudomonas increased Susceptibility GS45 Manipulation of response to disease organism increased resistance or tolerance G418; G525 Improved yield, appearance, Survival, extended range Scierotinia increased Susceptibility G477; G789; G805 Manipulation of response to disease organism increased resistance or tolerance G28 Improved yield, appearance, Survival, extended range Growth regulator Altered Sugar sensing Decreased tolerance to Sugars Alteration of energy balance, photosynthetic rate, carbohydrate accumulation, biomass production, source-sink relationships, Senescence; Increased tolerance to Sugars alteration of storage compound accumulation in seeds Altered C/N sensing Modification of sensing and respond to changes C and N metabolite levels, regulation of expression and activity of proteins involved in C and N transport and metabolism Flowering time Early flowering G142; G145; G146; Faster generation time; G157: G 180; G184: synchrony of flowering: G185; G208; G227; additional harvests within a G255; G390; G475; growing season, shortening of G590; G592; G627; breeding programs G789; G865; G1037; G1242; G1305; G1380; G1545; G1760; G1820; G1842; G1843; G2010; G2347 Late flowering G8; G157: G192: G198: Increased yield or biomass, G214; G234: G249; alleviate risk of transgenic G361; G434; G486; pollen escape, synchrony of G562; G571; G591; flowering G624; G680; G736; G738; G748; G752; G859; G878; G903; G9121 G971; G994: G1052; G 1073; G1136; G1335; G1435; G1451; G1468; G1474; G1493 Extended period of flowering G1947 Increased fertilization, yield, ornamental applications Development and Altered flower structure morphology Stamen G470; G615; G869; Ornamental modification of G1425; G1499; G1560 plant architecture, improved or Sepal G1075; G1326 reduced fertility to mitigate Petal G638; G671; G 1075; escape of transgenic pollen, G1326; G1449; G1499; improved fruit size, shape, G1560 number or yield US 8,809,630 B2 93 94 TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor Trait Category Phenotype(s) genes that impact traits Utility Pedicel Carpel Homeotic transformation G134; G638; G779 Multiple alterations G187: G580; G615; G671; G732; G 1075; G1425; G1449; G1499; G1560; G1645 Enlarged floral organs Siliques G1134 Reduced fertility G559; G615; G638; G671; G779; G977; G1067; G 1075; G1266; G1311; G1326; G1560; G1645; G1947 Aerial rosettes Inflorescence architectural change Altered branching pattern Ornamental modification of Short internodes/bushy G385; G580; G865; flower architecture; timing of inflorescences G1274: G1425; G1474 flowering; altered plant habit for Internode elongation G1274 yield or harvestability benefit; Terminal flowers G145 reduction in pollen production Poorly developed inflorescences of genetically modified plants; Lack of inflorescence G1499 manipulation of seasonality and annual or perennial habit; manipulation of determinate vs. indeterminate growth Altered shoot meristem development Stem bifurcations G390 Ornamental modification of No shoot meristem G1472 plantarchitecture, manipulation Reduced meristem cell G1540 of growth and development, differentiation increase in leaf numbers, modulation of branching patterns to provide improved yield or biomass Altered branching pattern G438: G1499 Ornamental modification of plantarchitecture, improved odging resistance Altered phyllotaxy Ornamental modification of plantarchitecture Apical dominance Reduced apical dominance G559; G732; G1255; Ornamental modification of G1275; G1645 plantarchitecture Altered trichome density; development, or structure Reduced or no trichomes G25; G212; G225; Ornamental modification of G226; G676; G682: plantarchitecture, increased G1332: G1816; G2718 plant product (e.g., diterpenes, Ectopic trichomesfaltered G247 cotton) productivity, insect and trichome distribution or herbivore resistance development cell fate Increased trichome number, density and/or size Stem morphology and altered G438: G748 Modulation of lignin content; vascular tissue structure improvement of wood, palatability of fruits and vegetables Root development Increased root growth and Improved yield, stress tolerance; proliferation anchorage Decreased root growth Modification of root architecture and mass Increased root hairs G225, G.226, G682: Improved yield, stress tolerance; G1816; G2718 anchorage Altered seed development, G979 Modification of seed ripening and germination germination properties and performance Cell differentiation and cell G1540 Increase in carpel or fruit proliferation development; improve regeneration of shoots from callus in transformation or micro-propagation systems Rapid growth andfor Promote faster development and development reproduction in plants US 8,809,630 B2 95 96 TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor Trait Category Phenotype(s) genes that impact traits Utility Slow growth G447; G740; G1062; Ornamental applications G1335; G1468; G1474 Senescence Premature Senescence G636; G1128 improvement in response to Delayed senescence G249; G571; G878: disease, fruit ripening G1OSO Lethality when overexpressed G374; G515; G578; Herbicide target; ablation of G877; G 1076; G1304 specific tissues or organs such as stamen to prevent pollen escape Necrosis G24 Disease resistance Plant size increased plantsize G46; G624; G 1073; improved yield, biomass, G1435; G1451; G1468 appearance Larger seedlings G1313 increased Survival and vigor of Seedlings, yield Dwarfed or more compact plants G3; G5; G21; G24; G27; Dwarfism, lodging resistance, G184: G194; G258: manipulation of gibberellin G280; G385; G447; responses G636; G670; G671; G738; G760; G831; G864; G869; G884; G932; G977: G991; G994; G1020; G1023; G1053; G1067; G 1075; G1181; G1228: G1266; G1267; G1275; G1277; G1309; G 1311; G 1314: G1317; G1322; G1323; G1326; G1332: G1334; G1382; G1474; G1545; G1560 Leaf morphology Dark green leaves G385; G447: G912; Increased photosynthesis, G932; G977: G1128; biomass, yield; ornamental G1267; G1323; G1327; applications G1334: G1499 Change in leaf shape G32; G224: G428; Ornamental applications G464; G629; G671; G736: G903; G905; G921; G932; G977; G1038; G1067; G 1073; G1075; G1269; G1468; G1493; G1645 Altered leaf development G1645 Ornamental applications Altered leaf size Increased leaf size and mass G438: G1274: G1451 Increased yield, ornamental applications Gray leaves G1468 Ornamental applications Variegation Ornamental applications Glossy leaves G975; G1267 Ornamental applications, manipulation of wax composition, amount, or distribution Cell expansion GS21 Ornamental applications; modification of adaptation to environmental changes or damage Seed morphology Altered seed coloration G156; G663; G668 Appearance, ornamental applications Seed size and shape Increased seed size G206; G584: G1255 Yield, appearance Decreased seed size G1145 Appearance Altered seed shape G1062; G1145 Appearance Leaf biochemistry Increased leaf wax G975; G1267 Insect, pathogen resistance Leaf prenyl lipids Reduced chlorophyll Improved antioxidant and Increase in tocopherols G280; G987 vitamin E. content, nutritional Increased lutein content G214 content; prevention of ARMD; Decreased lutein content G1133; G1324; G1328 modified photosynthetic Increase in chlorophyll or G214: G987: G1324 capability carotenoids Decrease in chlorophyll or carotenoids US 8,809,630 B2 97 98 TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor Trait Category Phenotype(s) genes that impact traits Utility Leaf insoluble Sugars Increased leaf insoluble Sugars Alteration of plant cell wall including xylose, fucose, composition affecting food rhamnose, galactose, arabinose or digestibility, plant tensile 8OSC strength, wood quality, pathogen resistance and pulp production. Decreased leaf insoluble Sugars, G307: G428: G1012 Alteration of plant cell wall including rhamnose, arabinose or composition affecting food 8OSC digestibility, plant tensile strength, wood quality, pathogen resistance and pulp production. Increased leaf anthocyanins Increased photosynthesis, biomass, yield; ornamental applications Leaf fatty acids Reduction in leaf fatty acids G718; G1266; G1347 Modification of nutritional Increase in leaf fatty acids G214; G231 content; heat stability of essential oils Altered leaf glucosinolate content G185; G681; G1069; Modification of toxic G1198; G1322 glucosinolate content in animal feeds; anti-cancer activity Seed biochemistry Seed oil content Increased oil content Improved oil yield

Decreased oil content Reduced caloric content

Altered oil content Modification of seed caloric content and nutritional value Seed protein content Increased protein content Improved protein yield, nutritional value

Decreased protein content Reduced caloric content

Altered protein content Modification of seed caloric content and nutritional value Altered seed prenyl lipid content Improved antioxidant and vitamin E content; prevention of ARMD Seed glucosinolate Altered profile Modification of toxic glucosinolate content in animal feeds; anti-cancer activity Increased seed anthocyanins Ornamental applications Increased seed sterols G20 Alteration of human steroid precursors content, some of which have cholesterol-lowering activity Increased seed fatty acid G778; G861; G869; Modification of seed caloric composition G938: G965; G1399; content and nutritional value G1417 Decreased seed fatty acid G776; G791; G938; Modification of seed caloric composition G987: G1417 content and nutritional value Up-regulation of genes involved G229 Alteration of tolerance to insect in secondary metabolism herbivores, UV, oxidative stress, and pathogen attack; increased production of valuable alkaloid base medicines Root Biochemistry Increased root anthocyanins Ornamental applications, improved anti-cancer activity Light response shade Altered cotyledon, hypocotyl, G351; G1062; G1322 Potential for increased planting avoidance development; altered leaf densities and yield enhancement orientation; constitutive photomorphogenesis; photomorphogenesis in low light US 8,809,630 B2 99 100 TABLE 6-continued Genes, traits and utilities that affect plant characteristics Transcription factor Trait Category Phenotype(s) genes that impact traits Utility Pigment Increased anthocyanin level G663; G1482 Enhanced health benefits, improved ornamental appearance, increased stress resistance, attraction of pollinating and seed-dispersing animals

Abbreviations: N = nitrogen P = phosphate ABA = abscisic acid C/N = carbon nitrogen balance

Detailed Description of Genes, Traits and Utilities that Affect Chilling tolerance could also serve as a model for under Plant Characteristics standing how plants adapt to water deficit. Both chilling and The following descriptions of traits and utilities associated water stress share similar sensory transduction pathways and with the present transcription factors offer a more compre tolerance/adaptation mechanisms. For example, acclimation hensive description than that provided in Table 6. to chilling temperatures can be induced by water stress or Abiotic Stress, General Considerations treatment with abscisic acid. Genes induced by low tempera Plant transcription factors can modulate gene expression, ture include dehydrins (or LEA proteins). Dehydrins are also and, in turn, be modulated by the environmental experience of 25 induced by Salinity, abscisic acid, water stress or during the a plant. Significant alterations in a plant's environment invari late stages of embryogenesis. Thus, genes that protect the ably result in a change in the plant's transcription factor gene plant against chilling could also have a role in protection expression pattern. Altered transcription factor expression against water deficit. patterns generally result in phenotypic changes in the plant. Abiotic Stress: Cold Germination. Transcription factor gene product(s) in transgenic plants then 30 Several of the presently disclosed transcription factor differ(s) in amounts or proportions from that found in wild genes confer better germination and growth in cold condi type or non-transformed plants, and those transcription fac tions. For example, the improved germination in cold condi tors likely represent polypeptides that are used to alter the tions seen with G224, G256, G664, G1274, and G1322, indi response to the environmental change. By way of example, it cates a role in regulation of cold responses by these genes and is well accepted in the art that analytical methods based on 35 their equivalogs. These genes might be engineered to manipu altered expression patterns may be used to screen for pheno late the response to low temperature stress. Genes that would typic changes in a plant far more effectively than can be allow germination and seedling vigor in the cold would have achieved using traditional methods. highly significant utility in allowing seeds to be planted ear Abiotic Stress: Adult Stage Chilling. lier in the season with a high rate of survival. Transcription Enhanced chilling tolerance may extend the effective 40 factor genes that confer better survival in cooler climates growth range of chilling sensitive crop species by allowing allow a grower to move up planting time in the spring and earlier planting or later harvest. a gene that enhances growth extend the growing season further into autumn for higher crop under Such conditions could also enhance yields, extend the yields. Germination of seeds and Survival at temperatures effective growth range of chilling sensitive crop species, and significantly below that of the mean temperature required for reduce fertilizer and herbicide usage. Chilling tolerance 45 germination of seeds and Survival of non-transformed plants could also serve as a model for understanding how plants would increase the potential range of a crop plant into regions adapt to water deficit. Both chilling and water stress share in which it would otherwise fail to thrive. similar signal transduction pathways and tolerance/adapta Abiotic Stress: Freezing Tolerance and Osmotic Stress. tion mechanisms. For example, acclimation to chilling tem Presently disclosed transcription factor genes, including peratures can be induced by water stress or treatment with 50 G188, G325, G489, G1069, G1089, G1412, G1820 and their abscisic acid. Genes induced by low temperature include equivalogs, that may increase germination rate and/or growth dehydrins (or LEA proteins). Dehydrins are also induced by under adverse osmotic conditions, could impact Survival and salinity, abscisic acid, water stress, and during the late stages yield of seeds and plants. Osmotic stresses may be regulated of embryogenesis. by specific molecular control mechanisms that include genes Another large impact of chilling occurs during post-har 55 controlling water and ion movements, functional and struc Vest storage. For example, Some fruits and vegetables do not tural stress-induced proteins, signal perception and transduc store well at low temperatures (for example, bananas, avoca tion, and free radical scavenging, and many others (Wang et dos, melons, and tomatoes). The normal ripening process of al. (2001) Acta Hort. (ISHS) 560: 285-292). Instigators of the tomato is impaired if it is exposed to cool temperatures. osmotic stress include freezing, drought and high salinity, Transcription factor genes conferring resistance to chilling 60 each of which are discussed in more detail below. temperatures, including G256, G664, G1274 and their In many ways, freezing, high salt and drought have similar equivalogs, may thus enhance tolerance during post-harvest effects on plants, not the least of which is the induction of Storage. common polypeptides that respond to these different stresses. Improved chilling tolerance may be conferred by increased For example, freezing is similar to water deficit in that freez expression of glycerol-3-phosphate acetyltransferase in chlo 65 ing reduces the amount of water available to a plant. Exposure roplasts (see, for example, Wolter et al. (1992) et al. EMBO.J. to freezing temperatures may lead to cellular dehydration as 4685-4692. water leaves cells and forms ice crystals in intercellular US 8,809,630 B2 101 102 spaces (Buchanan, Supra). As with high Salt concentration In addition, by providing improved nitrogen uptake capabil and freezing, the problems for plants caused by low water ity, these genes can be used to alter seed protein amounts availability include mechanical stresses caused by the with and/or composition in Such a way that could impact yield as drawal of cellular water. Thus, the incorporation of transcrip well as the nutritional value and production of various food tion factors that modify a plant's response to osmotic stress or 5 products. improve tolerance to (e.g., by G912 or its equivalogs) into, for example, a crop or ornamental plant, may be useful in reduc A number of the transcription factor-overexpressing lines ing damage or loss. Specific effects caused by freezing, high make less anthocyanin on high Sucrose plus glutamine indi salt and drought are addressed below. cates that these genes can be used to modify carbon and Abiotic Stress: Drought and Low Humidity Tolerance. nitrogen status, and hence assimilate partitioning (assimilate Exposure to dehydration invokes similar Survival strategies 10 partitioning refers to the manner in which an essential ele in plants as does freezing stress (see, for example, Yelenosky ment, Such as nitrogen, is distributed among different pools (1989) Plant Physiol89: 444-451) and drought stress induces inside a plant, generally in a reduced form, for the purpose of freezing tolerance (see, for example, Siminovitch et al. transport to various tissues). (1982) Plant Physiol 69: 250-255; and Guy et al. (1992) Increased Tolerance of Plants to Oxidative Stress. Planta 188: 265-270). In addition to the induction of cold 15 acclimation proteins, strategies that allow plants to Survive in In plants, as in all living things, abiotic and biotic stresses low water conditions may include, for example, reduced Sur induce the formation of oxygen radicals, including SuperoX face area, or Surface oil or wax production. A number of ide and peroxide radicals. This has the effect of accelerating presently disclosed transcription factor genes, e.g., G46. senescence, particularly in leaves, with the resulting loss of G912, and G 1820, increase a plants tolerance to low water yield and adverse effect on appearance. Generally, plants that conditions and, along with their equivalogs, would provide have the highest level of defense mechanisms. Such as, for the benefits of improved survival, increased yield and an example, polyunsaturated moieties of membrane lipids, are extended geographic and temporal planting range. most likely to thrive under conditions that introduce oxidative Abiotic Stress: Heat Stress Tolerance. stress (e.g., high light, oZone, water deficit, particularly in The germination of many crops is also sensitive to high 25 combination). Introduction of the presently disclosed tran temperatures. Presently disclosed transcription factor genes scription factor genes, including G477, G789 and their that provide increased heat tolerance, including G464, G682, equivalogs, that increase the level of oxidative stress defense G864, G964, G1305 and their equivalogs, would be generally mechanisms would provide beneficial effects on the yield and useful in producing plants that germinate and grow in hot appearance of plants. One specific oxidizing agent, oZone, conditions, may find particular use for crops that are planted 30 has been shown to cause significant foliar injury, which late in the season, or extend the range of a plant by allowing impacts yield and appearance of crop and ornamental plants. growth in relatively hot climates. In addition to reduced foliar injury that would be found in Abiotic Stress: Salt. oZone resistant plant created by transforming plants with The genes in Table 6 that provide tolerance to salt may be Some of the presently disclosed transcription factor genes, the used to engineer salt tolerant crops and trees that can flourish 35 latter have also been shown to have increased chlorophyll in soils with high Saline content or under drought conditions. fluorescence (Yu-Sen Changetal. (2001) Bot. Bull. Acad. Sin. In particular, increased salt tolerance during the germination 42: 265-272). stage of a plant enhances Survival and yield. Presently dis Decreased Herbicide Sensitivity. closed transcription factor genes, including G22, G196, Presently disclosed transcription factor genes that confer G226, G482, G624, G801, G867, G884, and their equivalogs 40 resistance or tolerance to herbicides (e.g., glyphosate) will that provide increased salt tolerance during germination, the find use in providing means to increase herbicide applications seedling stage, and throughout a plant's life cycle, would find without detriment to desirable plants. This would allow for particular value for imparting Survival and yield in areas the increased use of a particular herbicide in a local environ where a particular crop would not normally prosper. ment, with the effect of increased detriment to undesirable Nutrient Uptake and Utilization: Nitrogen and Phospho 45 species and less harm to transgenic, desirable cultivars. U.S. Knockouts of a number of the presently disclosed tran Presently disclosed transcription factor genes introduced scription factor genes, including G374 and G877, have been into plants provide a means to improve uptake of essential shown to be lethal to developing embryos. Thus, these genes nutrients, including nitrogenous compounds, phosphates, and their equivalogs are potentially useful as herbicide tar potassium, and trace minerals. The enhanced performance of 50 getS. for example, G153, G225, G.226, G682, G1274, G1816, Hormone Sensitivity. G2718 and other overexpressing lines under low nitrogen, ABA plays regulatory roles in a host of physiological pro and G545 and G624 under low phosphorous conditions indi cesses in all higher as well as in lower plants (Davies et al. cate that these genes and their equivalogs can be used to (1991) Abscisic Acid: Physiology and Biochemistry. Bios engineer crops that could thrive under conditions of reduced 55 Scientific Publishers, Oxford, UK: Zeevaartet al. (1988) Ann nutrient availability. Phosphorus, in particular, tends to be a Rev Plant Physiol. Plant Mol. Biol. 49: 439-473; Shimizu limiting nutrient in Soils and is generally added as a compo Sato et al. (2001) Plant Physiol 127: 1405-1413). ABA medi nent in fertilizers. Young plants have a rapid intake of phos ates stress tolerance responses in higher plants, is a key signal phate and sufficient phosphate is important for yield of root compound that regulates stomatal aperture and, in concert crops such as carrot, potato and parsnip. 60 with other plant signaling compounds, is implicated in medi The effect of these modifications is to increase the seedling ating responses to pathogens and wounding or oxidative dam germination and range of ornamental and crop plants. The age (for example, see Larkindale et al. (2002) Plant Physiol. utilities of presently disclosed transcription factor genes con 128: 682-695). In seeds, ABA promotes seed development, ferring tolerance to conditions of low nutrients also include embryo maturation, synthesis of storage products (proteins cost savings to the grower by reducing the amounts of fertil 65 and lipids), desiccation tolerance, and is involved in mainte izer needed, environmental benefits of reduced fertilizer run nance of dormancy (inhibition of germination), and apoptosis off into watersheds; and improved yield and stress tolerance. (Zeevaart et al. (1988) Ann Rev Plant Physiol. Plant Mol. US 8,809,630 B2 103 104 Biol. 49: 439-473; Davies (1991), supra: Thomas (1993) regulatory molecules that control several aspects of plant Plant Cell 5: 1401-1410; and Bethke et al. (1999) Plant Cell physiology, metabolism and development (Hsiehet al. (1998) 11: 1033-1046). ABA also affects plant architecture, includ Proc. Natl. Acad. Sci.95: 13965-13970). It is thought that this ing root growth and morphology and root-to-shoot ratios. control is achieved by regulating gene expression and, in ABA action and metabolism is modulated not only by envi higher plants, Sugars have been shown to repress or activate ronmental signals but also by endogenous signals generated plant genes involved in many essential processes such as by metabolic feedback, transport, hormonal cross-talk and photosynthesis, glyoxylate metabolism, respiration, starch developmental stage. Manipulation of ABA levels, and hence and Sucrose synthesis and degradation, pathogen response, by extension the sensitivity to ABA, has been described as a wounding response, cell cycle regulation, pigmentation, very promising means to improve productivity, performance 10 flowering and senescence. The mechanisms by which Sugars and architecture in plants Zeevaart (1999) in: Biochemistry control gene expression are not understood. and Molecular Biology of Plant Hormones, Hooykaas et al. Because Sugars are important signaling molecules, the eds, Elsevier Science pp 189-207; and Cutler et al. (1999) ability to control either the concentration of a signaling Sugar Trends Plant Sci. 4: 472-478). or how the plant perceives or responds to a signaling Sugar A number of the presently disclosed transcription factor 15 could be used to control plant development, physiology or genes affect plant abscisic acid (ABA) sensitivity, including metabolism. For example, the flux of sucrose (a disaccharide G1069, G1412, and G1820. Thus, by affecting ABA sensi Sugar used for systemically transporting carbon and energy in tivity, these introduced transcription factor genes and their most plants) has been shown to affect gene expression and equivalogs would affect cold, drought, oxidative and other alter storage compound accumulation in seeds. Manipulation stress sensitivities, plant architecture, and yield. of the Sucrose signaling pathway in seeds may therefore cause Several other of the present transcription factor genes have seeds to have more protein, oil or carbohydrate, depending on been used to manipulate ethylene signal transduction and the type of manipulation. Similarly, in tubers. Sucrose is con response pathways. These genes, including G760, G1062, Verted to starch which is used as an energy store. It is thought G1134 and their equivalogs, may thus be used to manipulate that Sugar signaling pathways may partially determine the the processes influenced by ethylene, Such as seed germina 25 levels of starch synthesized in the tubers. The manipulation of tion or fruit ripening, and to improve seed or fruit quality. Sugar signaling in tubers could lead to tubers with a higher Diseases, Pathogens and Pests. starch content. A number of the presently disclosed transcription factor Thus, the presently disclosed transcription factor genes genes have been shown to or are likely to affect a plants that manipulate the Sugar signal transduction pathway, response to various plant diseases, pathogens and pests. The 30 including G26, G38, G43, G207, G224, G241, G254, G263, offending organisms include fungal pathogens Fusarium G308, G536, G567, G680, G782, G783,G867, G905, G912, Oxysporum, Botrytis cinerea, Sclerotinia Sclerotiorum, and G996, G1068, G1314, G1347, and G1493, along with their Erysiphe Orontii. Bacterial pathogens to which resistance equivalogs, may lead to altered gene expression to produce may be conferred include Pseudomonas Syringae. Other plants with desirable traits. In particular, manipulation of problem organisms may potentially include nematodes, mol 35 Sugar signal transduction pathways could be used to alter licutes, parasites, or herbivorous arthropods. In each case, one Source-sink relationships in seeds, tubers, roots and other or more transformed transcription factor genes may provide storage organs leading to increase in yield. some benefit to the plant to help prevent or overcome infes Growth Regulator: C/N Sensing. tation, or be used to manipulate any of the various plant Nitrogen and carbon metabolism are tightly linked in responses to disease. These mechanisms by which the tran 40 almost every biochemical pathway in the plant. Carbon Scription factors work could include increasing Surface waxes metabolites regulate genes involved in N acquisition and or oils, Surface thickness, or the activation of signal transduc metabolism, and are known to affect germination and the tion pathways that regulate plant defense in response to expression of photosynthetic genes (Coruzzi et al. (2001) attacks by herbivorous pests (including, for example, pro Plant Physiol. 125: 61-64) and hence growth. Early studies on tease inhibitors). Another means to combat fungal and other 45 nitrate reductase (NR) in 1976 showed that NR activity could pathogens is by accelerating local cell death or senescence, be affected by Glc/Suc (Crawford (1995) Plant Cell 7:859 mechanisms used to impair the spread of pathogenic micro 886; Daniel-Vedele et al. (1996) CR Acad Sci Paris 319: organisms throughout a plant. For instance, the best known 961-968). Those observations were supported by later experi example of accelerated cell death is the resistance gene-me ments that showed sugars induce NR mRNA in dark-adapted, diated hypersensitive response, which causes localized cell 50 green seedlings (Cheng CL, et al. (1992) Proc Natl Acad Sci death at an infection site and initiates a systemic defense USA 89: 1861-1864). C and N may have antagonistic rela response. Because many defenses, signaling molecules, and tionships as signaling molecules; light induction of NRactiv signal transduction pathways are common to defense against ity and mRNA levels can be mimicked by C metabolites and different pathogens and pests, such as fungal, bacterial, N-metabolites cause repression of NR induction in tobacco oomycete, nematode, and insect, transcription factors that are 55 (Vincentz et al. (1992) Plant J 3: 315-324). Gene regulation implicated in defense responses against the fungal pathogens by C/N status has been demonstrated for a number of N-meta tested may also function in defense against other pathogens bolic genes (Stitt (1999) Curr. Opin. Plant. Biol. 2:178-186); and pests. These transcription factors include, for example, CoruZZi et al. (2001) Supra). Thus, transcription factor genes G28 (improved resistance or tolerance to Botrytis), G19, G28, that affect C/N sensing such as G153 or its equivalogs can be G237, G378, G409, G591, G616, G869, G1048, G1266 (im 60 used to alter or improve germination and growth under nitro proved resistance or tolerance to Erysiphe), G418, G525 (im gen-limiting conditions. proved resistance or tolerance to Pseudomonas), G28 (im Flowering Time: Early, Late and Inducible Flowering. proved resistance or tolerance to Sclerotinia), and their Early Flowering. equivalogs. Presently disclosed transcription factor genes that acceler Growth Regulator: Sugar Sensing. 65 ate flowering, which include G142, G145, G146, G153, In addition to their important role as an energy source and G157, G180, G184, G185, G208, G227, G255, G390, G475, structural component of the plant cell, Sugars are central G590, G592, G627, G789, G865, G1037, G1242, G1305, US 8,809,630 B2 105 106 G1380, G1545, G1760, G1820, G1842, G1843, G1844, G736, G738, G748, G752, G859, G878, G903, G9121 G971, G2010, G2347, and their functional equivalogs, could have G994, G1052, G1073, G1136, G1335, G1435, G1451, valuable applications in Such programs, since they allow G1468, G1474, G1493, and equivalogs, delay flowering time much faster generation times. In a number of species, for in transgenic plants. Extending vegetative development with example, broccoli, cauliflower, where the reproductive parts 5 presently disclosed transcription factor genes could thus of the plants constitute the crop and the vegetative tissues are bring about large increases in yields. Prevention of flowering discarded, it would be advantageous to accelerate time to can help maximize vegetative yields and prevent escape of flowering. Accelerating flowering could shorten crop and tree genetically modified organism (GMO) pollen. breeding programs. Additionally, in some instances, a faster Presently disclosed transcription factors that extend flow generation time would allow additional harvests of a crop to 10 ering time, including G1947, have utility in engineering be made within a given growing season. A number of Arabi dopsis genes have already been shown to accelerate flowering plants with longer-lasting flowers for the horticulture indus when constitutively expressed. These include LEAFY. try, and for extending the time in which the plant is fertile. APETALA1 and CONSTANS (Mandel et al. (1995) Nature A number of the presently disclosed transcription factors 377:522-524; Weigel and Nilsson (1995) Nature 377:et al. 15 may extend flowering time, and delay flower abscission, 495-500; Simon et al. (1996) Nature 384: 59-62). which would have utility in engineering plants with longer With the advent of transformation systems for tree species lasting flowers for the horticulture industry. This would pro Such as oil palm and Eucalyptus, forest biotechnology is a vide a significant benefit to the ornamental industry, for both growing area of interest. Acceleration of flowering, again, cut flowers and woody plant varieties (of, for example, might reduce generation times and make breeding programs maize), as well as have the potential to lengthen the fertile feasible which would otherwise be impossible, such as with period of a plant, which could positively impact yield and plants with multi-year cycles (such as biennials, e.g. carrot, or breeding programs. fruit trees, such as citrus) that can be very slow to develop and General Development and Morphology: Flower Structure begin flowering. That this is a real possibility has already been and Inflorescence: Architecture. Altered Flower Organs, demonstrated in aspen, a tree species that usually takes 8-20 25 Reduced Fertility, Multiple Alterations, Aerial Rosettes, years to flower. Transgenic aspen that over-express the Ara Branching, Internode Distance, Terminal Flowers and Phase bidopsis LFY gene flower after only 5 months. The flowers Change. produced by these young aspen plants, however, were sterile; Presently disclosed transgenic transcription factors such as the challenge of producing fertile early flowering trees there G1134, G187, G470, G580, G615, G638, G671, G732, G779, fore still remains (Weigel, D. and Nilsson, O., 1995, Nature 30 377,495-500). G869, G1075, G1134, G1326, G1425, G1449, G1499, Breeding programs for the development of new varieties G1645, and their equivalogs, may be used to create plants can be limited by the seed-to-seed cycle. with larger flowers or arrangements offlowers that are distinct Inducible Flowering. from wild-type or non-transformed cultivars. This would By regulating the expression of potential flowering using 35 likely have the most value for the ornamental horticulture inducible promoters, flowering could be triggered by appli industry, where larger flowers or interesting floral configura cation of an inducer chemical. This would allow flowering to tions are generally preferred and command the highest prices. be synchronized across a crop and facilitate more efficient Flower structure may have advantageous or deleterious harvesting (e.g., strawberry). Such inducible systems could effects on fertility, and could be used, for example, to also be used to tune the flowering of crop varieties to different 40 decrease fertility by the absence, reduction or screening of latitudes. At present, species such as soybean and cotton are reproductive components. In fact, plants that overexpress a available as a series of maturity groups that are suitable for sizable number of the presently disclosed transcription factor different latitudes on the basis of their flowering time (which genes e.g., G559, G615, G638, G671, G779, G977, G1067, is governed by day-length). A system in which flowering G1075, G1266, G1311, G1326, G1645, G1947 and their could be chemically controlled would allow a single high 45 functional equivalogs, possess reduced fertility; flowers are yielding northern maturity group to be grown at any latitude. infertile and fail to yield seed. These could be desirable traits, In Southern regions such plants could be grown for longer as low fertility could be exploited to prevent or minimize the periods before flowering was induced, thereby increasing escape of the pollen of genetically modified organisms yields. In more northern areas, the induction would be used to (GMOs) into the environment. ensure that the crop flowers prior to the first winter frosts. 50 The morphological phenotype shown by plants overex Currently, the existence of a series of maturity groups for pressing some of the present transcription factors indicate different latitudes represents a major barrier to the introduc that these genes and their equivalogs may be used to alter tion of new valuable traits. Any trait (e.g. disease resistance) inflorescence architecture. In particular, a reduction in has to be bred into each of the different maturity groups pedicellength and a change in the position at which flowers separately; a laborious and costly exercise. The availability of 55 and fruits are held, might influence harvesting or pollination single strain, which could be grown at any latitude, would efficiency. Additionally, such changes may produce attractive therefore greatly increase the potential for introducing new novel forms for the ornamental markets. traits to crop species such as soybean and cotton. One interesting application for manipulation of flower Late Flowering. structure, for example, by introduced transcription factors In a sizeable number of species, for example, root crops, 60 could be in the increased production of edible flowers or where the vegetative parts of the plants constitute the crop flower parts, including saffron, which is derived from the (e.g., onions, lettuce) and the reproductive tissues are dis Stigmas of Crocus sativus. carded, it is advantageous to identify and incorporate tran Genes that later silique conformation in brassicates may be Scription factor genes that delay or prevent flowering in order used to modify fruit ripening processes in brassicates and to prevent resources being diverted into reproductive devel 65 other plants, which may positively affect seed or fruit quality. opment. For example, G8, G0157, G192, G198, G214, G234, A number of the presently disclosed transcription factors G249, G361, G434, G486, G562, G571, G591, G624, G680, may affect the timing of phase changes in plants. Since the US 8,809,630 B2 107 108 timing or phase changes generally affects a plant's eventual way meristems and/or cells develop during different phases size, these genes may prove beneficial by providing means for of the plant life cycle. In particular, altering the timing of improving yield and biomass. phase changes could afford positive effects on yield and bio General Development and Morphology: Shoot Meristem mass production. and Branching Patterns. General Development and Morphology: Stem Morphol Several of the presently disclosed transcription factor ogy and Altered Vascular Tissue Structure. genes, including G390, when introduced into plants, have Plants transformed with transcription factor genes that been shown to cause stem bifurcations in developing shoots in modify stem morphology or lignin content may be used to which the shoot meristems split to form two or three separate affect overall plant architecture and the distribution of ligni shoots. These transcription factors and their functional 10 equivalogs may thus be used to manipulate branching. This fied fiber cells within the stem. would provide a unique appearance, which may be desirable Modulating lignin content might allow the quality of wood in ornamental applications, and may be used to modify lateral used for furniture or construction to be improved. Lignin is branching for use in the forestry industry. A reduction in the energy rich; increasing lignin composition could therefore be formation of lateral branches (e.g., with G1499 or equivalogs) 15 valuable in raising the energy content of wood used for fuel. could reduce knot formation. Conversely, increasing the Conversely, the pulp and paper industries seek wood with a number of lateral branches (e.g., with G438 or equivalogs) reduced lignin content. Currently, lignin must be removed in could provide utility when a plant is used as a view- or a costly process that involves the use of many polluting windscreen. chemicals. Consequently, lignin is a serious barrier to effi General Development and Morphology: Apical Domi cient pulp and paper production (Tzfira et al. (1998) aCC TIBTECH 16:439-446: Robinson (1999) Nature Biotechnol The modified expression of presently disclosed transcrip ogy 17: 27-30). In addition to forest biotechnology applica tion factors (e.g., G.559, G732, G1255, G1275, G1645, and tions, changing lignin content by selectively expressing or their equivalogs) that reduce apical dominance could be used repressing transcription factors in fruits and vegetables might in ornamental horticulture, for example, to modify plant 25 increase their palatability. architecture, for example, to produce a shorter, more bushy Transcription factors that modify stem structure, including stature than wildtype. The latterform would have ornamental G438, G748 and their equivalogs, may also be used to achieve utility as well as provide increased resistance to lodging. reduction of higher-order shoot development, resulting in General Development and Morphology: Trichome Den significant plant architecture modification. Overexpression sity, Development or Structure. 30 of the genes that encode these transcription factors in Woody Several of the presently disclosed transcription factor plants might result in trees that lack side branches, and have genes have been used to modify trichome number, density, fewer knots in the wood. Altering branching patterns could trichome cell fate, amount of trichome products produced by also have applications amongst ornamental and agricultural plants, or produce ectopic trichome formation. These may crops. For example, applications might exist in any species include G25, G212, G225, G.226, G247, G634, G676, G682, 35 where secondary shoots currently have to be removed manu G1332 G 1816, G2718, and their equivalogs. In most cases ally, or where changes in branching pattern could increase where the metabolic pathways are impossible to engineer, yield or facilitate more efficient harvesting. increasing trichome density or size on leaves may be the only General Development and Morphology: Altered Root way to increase plant productivity. Thus, by increasing tri Development. chome density, size or type, these trichome-affecting genes 40 By modifying the structure or development of roots by and their functional equivalogs would have profound utilities transforming into a plant one or more of the presently dis in molecular farming practices by making use of trichomes as closed transcription factor genes, including G9, G.225, G.226, a manufacturing system for complex secondary metabolites. G1482, and their equivalogs, plants may be produced that Trichome glands on the Surface of many higher plants have the capacity to thrive in otherwise unproductive soils. produce and secrete exudates that give protection from the 45 For example, grape roots extending further into rocky soils elements and pests Such as insects, microbes and herbivores. would provide greater anchorage, greater coverage with These exudates may physically immobilize insects and increased branching, or would remain viable in waterlogged spores, may be insecticidal orant-microbial or they may act as soils, thus increasing the effective planting range of the crop allergens or irritants to protect against herbivores. By modi and/or increasing yield and Survival. It may be advantageous fying trichome location, density or activity with presently 50 to manipulate a plant to produce short roots, as when a soil in disclosed transcription factors that modify these plant char which the plant will be growing is occasionally flooded, or acteristics, plants that are better protected and higher yielding when pathogenic fungi or disease-causing nematodes are may be the result. prevalent. A potential application for these trichome-affecting genes In addition, presently disclosed transcription factors and their equivalogs also exists in cotton: cotton fibers are 55 including G225; G226; G682: G1816: G2718 and their modified unicellular trichomes that develop from the outer equivalogs may be used to increase root hair density and thus ovule epidermis. In fact, only about 30% of these epidermal increase tolerance to abiotic stresses, thereby improvingyield cells develop into trichomes, but all have the potential to and quality. develop a trichome fate. Trichome-affecting genes can trigger General Development and Morphology: Seed Develop an increased number of these cells to develop as trichomes 60 ment, Ripening and Germination Rate. and thereby increase the yield of cotton fibers. Since the A number of the presently disclosed transcription factor mallow family is closely related to the Brassica family, genes genes (e.g., G979) have been shown to modify seed develop involved in trichome formation will likely have homologs in ment and germination rate, including when the seeds are in cotton or function in cotton. conditions normally unfavorable for germination (e.g., cold, If the effects on trichome patterning reflect a general 65 heat or salt stress, or in the presence of ABA), and may, along change in heterochronic processes, trichome-affecting tran with functional equivalogs, thus be used to modify and Scription factors or their equivalogs can be used to modify the improve germination rates under adverse conditions. US 8,809,630 B2 109 110 General Development and Morphology: Cell Differentia Transcription factors that promote faster development Such as tion and Cell Proliferation. G807 and its functional equivalogs may also be used to Several of the disclosed transcription factors regulate cell modify the reproductive cycle of plants. proliferation and/or differentiation, including G1540 and its General Development and Morphology: Slow Growth functional equivalogs. Control of these processes could have valuable applications in plant transformation, cell culture or Rate. micro-propagation systems, as well as in control of the pro A number of the presently disclosed transcription factor liferation of particular useful tissues or cell types. Transcrip genes, including G447, G740, G1062, G1335, G1468, and tion factors that induce the proliferation of undifferentiated G1474, have been shown to have significant effects on retard cells can be operably linked with an inducible promoter to ing plant growth rate and development. These observations promote the formation of callus that can be used for transfor 10 have included, for example, delayed growth and development mation or production of cell Suspension cultures. Transcrip of reproductive organs. Slow growing plants may be highly tion factors that prevent cells from differentiating, such as desirable to ornamental horticulturists, both for providing G1540 or its equivalogs, could be used to confer stem cell house plants that display little change in their appearance over identity to cultured cells. Transcription factors that promote differentiation of shoots could be used in transformation or 15 time, or outdoor plants for which wild-type or rapid growth is micro-propagation systems, where regeneration of shoots undesirable (e.g., ornamental palm trees). Slow growth may from callus is currently problematic. In addition, transcrip also provide for a prolonged fruiting period, thus extending tion factors that regulate the differentiation of specific tissues the harvesting season, particularly in regions with long grow could be used to increase the proportion of these tissues in a ing seasons. Slow growth could also provide a prolonged plant. Genes that promote the differentiation of carpel tissue period in which pollen is available for improved self- or could be introduced into commercial species to induce for cross-fertilization, or cross-fertilization of cultivars that nor mation of increased numbers of carpels or fruits. A particular mally flower over non-overlapping time periods. The latter application might exist in Saffron, one of the world's most aspect may be particularly useful to plants comprising two or expensive spices. Saffron filaments, or threads, are actually more distinct grafted cultivars (e.g., fruit trees) with normally the dried Stigmas of the Saffron flower, Crocus sativus Lin 25 non-overlapping flowering periods. neaus. Each flower contains only three Stigmas, and more General Development And Morphology: Senescence. than 75,000 of these flowers are needed to produce just one Presently disclosed transcription factor genes may be used pound of Saffron filaments. An increase in carpel number to alter senescence responses in plants. Although leaf Senes would increase the quantity of Stigmatic tissue and improve cence is thought to be an evolutionary adaptation to recycle yield. 30 nutrients, the ability to control senescence in an agricultural General Development and Morphology: Cell Expansion. setting has significant value. For example, a delay in leaf Plant growth results from a combination of cell division senescence in some maize hybrids is associated with a sig and cell expansion. Transcription factors such as G521 or its nificant increase in yields and a delay of a few days in the equivalogs may be useful in regulation of cell expansion. senescence of soybean plants can have a large impact on Altered regulation of cell expansion could affect stem length, 35 yield. In an experimental setting, tobacco plants engineered an important agronomic characteristic. For instance, short to inhibit leaf Senescence had a longer photosynthetic cultivars of wheat contributed to the Green Revolution, lifespan, and produced a 50% increase in dry weight and seed because plants that put fewer resources into stem elongation yield (Gan and Amasino (1995) Science 270: 1986-1988). allocate more resources into developing seed and produce Delayed flower senescence caused by overexpression of tran higher yield. These plants are also less vulnerable to wind and 40 scription factors (e.g., G249, G571, G878, G1050 or their rain damage. These cultivars were found to be altered in their equivalogs) may generate plants that retain their blossoms sensitivity to gibberellins, hormones that regulate stem elon longer and this may be of potential interest to the ornamental gation through control of both cell expansion and cell divi horticulture industry, and delayed foliar and fruit senescence Sion. Altered cell expansion in leaves could also produce could improve post-harvest shelf-life of produce. novel and ornamental plant forms. 45 Premature senescence caused by, for example, G636, General Development and Morphology: Phase Change and G1128 and their equivalogs may be used to improve a plants Floral Reversion. response to disease and hasten fruit ripening. Transcription factors that regulate phase change can modu Growth Rate and Development: Lethality and Necrosis. late the developmental programs of plants and regulate devel Overexpression of transcription factors, for example, G24. opmental plasticity of the shoot meristem. In particular, these 50 G374, G515; G578, G877, G1076, G1304 and their equiva genes might be used to manipulate seasonality and influence logs that have a role in regulating cell death may be used to whether plants display an annual or perennial habit. induce lethality in specific tissues or necrosis in response to General Development and Morphology: Rapid Growth pathogen attack. For example, if a transcription factor gene and/or Development. inducing lethality or necrosis was specifically active in A number of the presently disclosed transcription factor 55 gametes or reproductive organs, its expression in these tissues genes have been shown to have significant effects on plant would lead to ablation and subsequent male or female steril growth rate and development. These observations have ity. Alternatively, under pathogen-regulated expression, a included, for example, more rapid or delayed growth and necrosis-inducing transcription factor can restrict the spread development of reproductive organs. Thus, by causing more of a pathogen infection through a plant. rapid development, genes that induce rapid growth or devel 60 Plant Size: Large Plants. opment and their functional equivalogs would prove useful Plants overexpressing G46, G624, G1073, G1435, G1451, for regions with short growing seasons; other transcription and G1468, for example, have been shown to be larger than factors that delay development may be useful for regions with controls. For some ornamental plants, the ability to provide longer growing seasons. Accelerating plant growth would larger varieties with these genes or their equivalogs may be also improve early yield or increase biomass at an earlier 65 highly desirable. For many plants, including fruit-bearing stage, when Such is desirable (for example, in producing trees, trees that are used for lumber production, or trees and forestry products or vegetable sprouts for consumption). shrubs that serve as view or wind screens, increased stature US 8,809,630 B2 111 112 provides improved benefits in the forms of greater yield or Leaf Morphology: Changes in Leaf Shape. improved screening. Crop species may also produce higher Presently disclosed transcription factors produce marked yields on larger cultivars, particularly those in which the and diverse effects on leaf development and shape. The tran vegetative portion of the plant is edible. scription factors include G32; G224: G428: G464; G629; Plant Size: Large Seedlings. G671; G736: G903; G905; G921; G932; G977; G1038; Presently disclosed transcription factor genes, that produce G1067; G 1073; G 1075; G1269; G1493; G1645, G1468, and large seedlings can be used to produce crops that become their equivalogs. At early stages of growth, transgenic seed established faster. Large seedlings are generally hardier, less lings have developed narrow, upward pointing leaves with Vulnerable to stress, and better able to out-compete weed long petioles, possibly indicating a disruption in circadian 10 clock controlled processes or nyctinastic movements. Other species. Seedlings transformed with presently disclosed tran transcription factor genes can be used to alter leaf shape in a scription factors, including G1313, for example, have been significant manner from wild type, Some of which may find shown to possess larger cotyledons and were more develop use in ornamental applications. mentally advanced than control plants. Rapid seedling devel Leaf Morphology: Altered Leaf Size. opment made possible by manipulating expression of these 15 Large leaves, such as those produced in plants overexpress genes or their equivalogs is likely to reduce loss due to dis ing G438, G1274, G1451 and their functional equivalogs, eases particularly prevalent at the seedling stage (e.g., damp generally increase plant biomass. This provides benefit for ing off) and is thus important for Survivability of plants ger crops where the vegetative portion of the plant is the market minating in the field or in controlled environments. able portion. Plant Size: Dwarfed Plants. Leaf Morphology: Light Green, Gray and Variegated Presently disclosed transcription factor genes, including Leaves. G24 and many others and their equivalogs, for example, that Transcription factor genes that provide an altered appear can be used to decrease plant stature are likely to produce ance, including G1468 and its equivalogs, may positively plants that are more resistant to damage by wind and rain, affect a plant's value to the ornamental horticulture industry. have improved lodging resistance, or more resistant to heat or 25 Leaf Morphology: Glossy Leaves. low humidity or water deficit. Dwarf plants are also of sig Transcription factor genes such as G1267 and its equiva nificant interest to the ornamental horticulture industry, and logs that induce the formation of glossy leaves generally do so particularly for home garden applications for which space by elevating levels of epidermal wax. Thus, the genes could availability may be limited. be used to engineer changes in the composition and amount of 30 leaf Surface components, including waxes. The ability to Plant Size: Fruit Size and Number. manipulate wax composition, amount, or distribution could Introduction of presently disclosed transcription factor modify plant tolerance to drought and low humidity, or resis genes that affect fruit size will have desirable impacts on fruit tance to insects or pathogens. Additionally, wax may be a size and number, which may comprise increases in yield for valuable commodity in Some species, and altering its accu fruit crops, or reduced fruit yield, Such as when vegetative 35 mulation and/or composition could enhance yield. growth is preferred (e.g., with bushy ornamentals, or where Seed Morphology: Altered Seed Coloration. fruit is undesirable, as with ornamental olive trees). Presently disclosed transcription factor genes, including Leaf Morphology: Dark Leaves. G156, and G668 have been used to modify seed color, which, Color-affecting components in leaves include chlorophylls along with the equivalogs of these genes, could provide added (generally green), anthocyanins (generally red to blue) and 40 appeal to seeds or seed products. carotenoids (generally yellow to red). Transcription factor Seed Morphology: Altered Seed Size and Shape. genes that increase these pigments in leaves, including G385. The introduction of presently disclosed transcription factor G447, G912, G932, G977, G1128, G1267, G1323, G1327, genes into plants that increase (e.g., G206.G584.G1255) or G1334, G1499, and their equivalogs, may positively affect a decrease (e.g., G1145). the size of seeds may have a signifi plant's value to the ornamental horticulture industry. Varie 45 cant impact on yield and appearance, particularly when the gated varieties, in particular, would show improved contrast. product is the seed itself (e.g., in the case of grains, legumes, Other uses that result from overexpression of transcription nuts, etc.). Seed size, in addition to seed coat integrity, thick factor genes include improvements in the nutritional value of ness and permeability, seed water content and a number of foodstuffs. For example, lutein is an important nutraceutical; other components including antioxidants and oligosaccha lutein-rich diets have been shown to help prevent age-related 50 rides, also affects affect seed longevity in storage, with larger macular degeneration (ARMD), the leading cause of blind seeds often being more desirable for prolonged storage. ness in elderly people. Consumption of dark green leafy Transcription factor genes that alter seed shape, including vegetables has been shown in clinical studies to reduce the G1062, G1145 and their equivalogs may have both ornamen risk of ARMD. tal applications and improve or broaden the appeal of seed Enhanced chlorophyll and carotenoid levels could also 55 products. improve yield in crop plants. Lutein, like other Xanthophylls Leaf Biochemistry: Increased Leaf Wax. Such as Zeaxanthin and violaxanthin, is an essential compo Overexpression of transcription factors genes, including nent in the protection of the plant against the damaging effects G975 and its equivalogs, which results in increased leaf wax of excessive light. Specifically, lutein contributes, directly or could be used to manipulate wax composition, amount, or indirectly, to the rapid rise of non-photochemical quenching 60 distribution. These transcription factors can improve yield in in plants exposed to high light. Crop plants engineered to those plants and crops from which wax is a valuable product. contain higher levels of lutein could therefore have improved The genes may also be used to modify plant tolerance to photo-protection, leading to less oxidative damage and better drought and/or low humidity or resistance to insects, as well growth under highlight (e.g., during long Summer days, or at as plant appearance (glossy leaves). The effect of increased higher altitudes or lower latitudes than those at which a non 65 wax deposition on leaves of a plant like may improve water transformed plant would survive). Additionally, elevated use efficiency. Manipulation of these genes may reduce the chlorophyll levels increases photosynthetic capacity. wax coating on Sunflower seeds; this wax fouls the oil extrac US 8,809,630 B2 113 114 tion system during Sunflower seed processing for oil. For the serve as precursors for prostaglandin synthesis. In mamma latter purpose or any other where wax reduction is valuable, lian connective tissue, prostaglandins serve as important sig antisense or coSuppression of the transcription factor genes in nals regulating the balance between resorption and formation a tissue-specific manner would be valuable. in bone and cartilage. Thus dietary fatty acid ratios altered in Leaf Biochemistry: Leaf Prenyl Lipids, Including Toco seeds may affect the etiology and outcome of bone loss. pherol. Transcription factors that reduce leaf fatty acids, for Prenyl lipids play a role in anchoring proteins in mem example, 16:3 fatty acids, may be used to control thylakoid branes or membranous organelles. Thus modifying the prenyl membrane development, including proplastid to chloroplast lipid content of seeds and leaves could affect membrane development. The genes that encode these transcription fac integrity and function. One important group of prenyl lipids, 10 tors (e.g., G718, G1266, and G1347) might thus be useful for the tocopherols, have both anti-oxidant and vitamin E activ controlling the transition from proplastid to chromoplast in ity. A number of presently disclosed transcription factor fruits and vegetables. It may also be desirable to change the genes, including G214, G280, G987. G1133, G1324, and expression of these genes to prevent cotyledon greening in G1328 have been shown to modify the prenyl lipid content of Brassica napus or B. campestris to avoid green oil due to leaves in plants, and these genes and their equivalogs may 15 early frost. thus be used to alter prenyl lipid content of leaves. Transcription factor genes that increase leaf fatty acid pro Leaf Biochemistry: Altered Leaf Insoluble Sugars. duction, including G214 and G231, could potentially be used Overexpression of a number of presently disclosed tran to manipulate seed composition, which is very important for scription factors, including G211, G237, G242. G274, G307, the nutritional value and production of various food products. G428, G435, G525, G598, G777, G869, G1012, and G1309, A number of transcription factor genes are involved in medi resulted in plants with altered leaf insoluble sugar content. ating an aspect of the regulatory response to temperature. This transcription factor and its equivalogs that alterplant cell These genes may be used to alter the expression of desatu wall composition have several potential applications includ rases that lead to production of 18:3 and 16:3 fatty acids, the ing altering food digestibility, plant tensile strength, wood balance of which affects membrane fluidity and mitigates quality, pathogen resistance and in pulp production. In par 25 damage to cell membranes and photosynthetic structures at ticular, hemicellulose is not desirable in paper pulps because high and low temperatures. of its lack of strength compared with cellulose. Thus modu Leaf and Seed Biochemistry: Glucosinolates. lating the amounts of cellulose vs. hemicellulose in the plant A number of glucosinolates have been shown to have anti cell wall is desirable for the paper/lumber industry. Increasing cancer activity; thus, increasing the levels or composition of the insoluble carbohydrate content in various fruits, veg 30 these compounds by introducing several of the presently dis etables, and other edible consumer products will result in closed transcription factors, including G185, G681, G1069; enhanced fiber content. Increased fiber content would not G1198, and G1322, can have a beneficial effect on human only provide health benefits in food products, but might also diet. increase digestibility of forage crops. In addition, the hemi Glucosinolates are undesirable components of the oilseeds cellulose and pectin content of fruits and berries affects the 35 used in animal feed since they produce toxic effects. Low quality of jam and catsup made from them. Changes in hemi glucosinolate varieties of canola, for example, have been cellulose and pectin content could result in a Superior con developed to combat this problem. Glucosinolates form part Sumer product. of a plant's natural defense against insects. Modification of Leaf Biochemistry: Increased Leaf Anthocyanin. glucosinolate composition or quantity by introducing tran Several presently disclosed transcription factor genes, 40 scription factors that affect these characteristics can therefore including G663 and its equivalogs, may be used to alter afford increased protection from herbivores. Furthermore, in anthocyanin production in numerous plant species. Expres edible crops, tissue specific promoters can be used to ensure sion of presently disclosed transcription factor genes that that these compounds accumulate specifically in tissues. Such increase flavonoid production in plants, including anthocya as the epidermis, which are not taken for consumption. nins and condensed tannins, may be used to alter in pigment 45 Leaf and Seed Biochemistry: Production of Seed and Leaf production for horticultural purposes, and possibly increas Phytosterols: ing stress resistance. A number of flavonoids have been Presently disclosed transcription factor genes that modify shown to have antimicrobial activity and could be used to levels of phytosterols in plants may have at least two utilities. engineer pathogen resistance. Several flavonoid compounds First, phytosterols are an important source of precursors for have health promoting effects such as inhibition of tumor 50 the manufacture of human steroid hormones. Thus, regulation growth, prevention of bone loss and prevention of the oxida of transcription factor expression or activity could lead to tion of lipids. Increased levels of condensed tannins, in forage elevated levels of important human steroid precursors for legumes would be an important agronomic trait because they steroid semi-synthesis. For example, transcription factors prevent pasture bloat by collapsing protein foams within the that cause elevated levels of campesterol in leaves, or sito rumen. For a review on the utilities of flavonoids and their 55 sterols and stigmasterols in seed crops, would be useful for derivatives, refer to Dixon et al. (1999) Trends Plant Sci. 4: this purpose. Phytosterols and their hydrogenated derivatives 394-400. phytostanols also have proven cholesterol-lowering proper Leaf and Seed Biochemistry: Altered Fatty Acid Content. ties, and transcription factor genes that modify the expression A number of the presently disclosed transcription factor of these compounds in plants would thus provide health ben genes have been shown to alter the fatty acid composition in 60 efits. plants, and seeds and leaves in particular. This modification Seed Biochemistry: Modified Seed Oil and Fatty Acid Suggests several utilities, including improving the nutritional Content. value of seeds or whole plants. Dietary fatty acids ratios have The composition of seeds, particularly with respect to seed been shown to have an effect on, for example, bone integrity oil amounts and/or composition, is very important for the and remodeling (see, for example, Weiler (2000) Pediatr. Res. 65 nutritional and caloric value and production of various food 47:5692-697). The ratio of dietary fatty acids may alter the and feed products. Several of the presently disclosed tran precursorpools of long-chain polyunsaturated fatty acids that Scription factor genes in seed lipid saturation that alter seed US 8,809,630 B2 115 116 oil content could be used to improve the heat stability of oils response to various light intensities, quality or duration to or to improve the nutritional quality of seed oil, by, for which a non-transformed plant would not similarly respond. example, reducing the number of calories in seed by decreas Examples of Such responses that have been demonstrated ing oil or fatty acid content (e.g., G180, G192. G201, G.222, include leaf number and arrangement, and early flower bud G241, G663, G668, G718, G732, G777, G911, G1323, and appearances. Elimination of shading responses may lead to G1820), increasing the number of calories in animal feeds by increased planting densities with Subsequent yield enhance increasing oil or fatty acid content (e.g. G162. G229, G231, ment. As these genes may also alter plant architecture, they G291, G456, G464, G561, G590, G598, G732,G849,G961, may find use in the ornamental horticulture industry. G1190, and G1198), or altering seed oil content (G509, Pigment: Increased Anthocyanin Level in Various Plant G567.G732, G974,G1451, and G1471). 10 Organs and Tissues. Seed Biochemistry: Modified Seed Protein Content. In addition to seed, leaves and roots, as mentioned above, As with seed oils, the composition of seeds, particularly several presently disclosed transcription factor genes (i.e., with respect to protein amounts and/or composition, is very G663 and equivalogs) can be used to alter anthocyanin levels important for the nutritional value and production of various in one or more tissues, depending on the organ in which these food and feed products. A number of the presently disclosed 15 genes are expressed. The potential utilities of these genes transcription factor genes modify the protein concentrations include alterations in pigment production for horticultural in seeds, including G201.G.222,G226, G241, G629, G630, purposes, and possibly increasing stress resistance, antimi G663, G668,G718,G732, G865,G911,G1048.G1323, crobial activity and health promoting effects such as inhibi G1449, and G 1820, which increase seed protein, G229.G231, tion of tumor growth, prevention of bone loss and prevention G418,G456, G464,G732, and G1634, which decrease seed of the oxidation of lipids. protein, and G162, G509, G567, G732, and G849, which alter Miscellaneous Biochemistry: Diterpenes in Leaves and seed protein content, would provide nutritional benefits, and Other Plant Parts. may be used to prolong storage, increase seed pest or disease Depending on the plant species, varying amounts of resistance, or modify germination rates. diverse secondary biochemicals (often lipophilic terpenes) Seed Biochemistry: Seed Prenyl Lipids. 25 are produced and exuded or volatilized by trichomes. These Prenyl lipids play a role in anchoring proteins in mem exotic secondary biochemicals, which are relatively easy to branes or membranous organelles. Thus, presently disclosed extract because they are on the surface of the leaf, have been transcription factor genes and their equivalogs that modify widely used in Such products as flavors and aromas, drugs, the prenyl lipid content of seeds and leaves could affect mem pesticides and cosmetics. Thus, the overexpression of genes brane integrity and function. A number of presently disclosed 30 that are used to produce diterpenes in plants may be accom transcription factor genes, including G214, G718.G748, plished by introducing transcription factor genes that induce G883, and G1052, have been shownto modify the tocopherol said overexpression. One class of secondary metabolites, the composition of plants. O-Tocopherol is better known as Vita diterpenes, can effect several biological systems such as min E. Tocopherols such as C.- and Y-tocopherol both have tumor progression, prostaglandin synthesis and tissue inflam anti-oxidant activity. 35 mation. In addition, diterpenes can act as insect pheromones, Seed Biochemistry: Increased Seed Anthocyanin. termite allomones, and can exhibit neurotoxic, cytotoxic and Several presently disclosed transcription factor genes, antimitotic activities. As a result of this functional diversity, including G663 and its equivalogs, may be used to alter diterpenes have been the target of research several pharma anthocyanin production in the seeds of plants. As with leaf ceutical ventures. In most cases where the metabolic path anthocyanins, expression of presently disclosed transcription 40 ways are impossible to engineer, increasing trichome density factor genes that increase flavonoid (anthocyanins and con or size on leaves may be the only way to increase plant densed tannins) production in seeds, including G663 and its productivity. equivalogs, may be used to alter in pigment production for Miscellaneous Biochemistry: Production of Miscella horticultural purposes, and possibly increasing stress resis neous Secondary Metabolites. tance, antimicrobial activity and health promoting effects 45 Microarray data Suggests that flux through the aromatic Such as inhibition of tumor growth, prevention of bone loss amino acid biosynthetic pathways and primary and secondary and prevention of the oxidation of lipids. metabolite biosynthetic pathways are up-regulated. Gene Root Biochemistry: Increased Root Anthocyanin. coding for enzymes involved in alkaloid biosynthesis include Presently disclosed transcription factor genes, including indole-3-glycerol phosphatase and strictosidine synthase are G663, may be used to alter anthocyanin production in the root 50 induced in G.229 overexpressors. Genes for enzymes of plants. As described above for seed anthocyanins, expres involved in aromatic amino acid biosynthesis are also up sion of presently disclosed transcription factor genes that regulated including tryptophan synthase and tyrosine tran increase flavonoid (anthocyanins and condensed tannins) saminase. Phenylalanine ammonia lyase, chalcone synthase production in seeds, including G663 and its equivalogs, may and trans-cinnamate mono-oxygenase are also induced and be used to alter in pigment production for horticultural pur 55 are involved in phenylpropenoid biosynthesis. poses, and possibly increasing stress resistance, antimicrobial Antisense and Co-Suppression activity and health promoting effects such as inhibition of In addition to expression of the nucleic acids of the inven tumor growth, prevention of bone loss and prevention of the tion as gene replacement or plant phenotype modification oxidation of lipids. nucleic acids, the nucleic acids are also useful for sense and Light Response/Shade Avoidance: 60 anti-sense Suppression of expression, e.g., to down-regulate altered cotyledon, hypocotyl, petiole development, altered expression of a nucleic acid of the invention, e.g., as a further leaf orientation, constitutive photomorphogenesis, photo mechanism for modulating plant phenotype. That is, the morphogenesis in low light. Presently disclosed transcription nucleic acids of the invention, or Subsequences or anti-sense factor genes, including G351, G1062, and G1322, that sequences thereof, can be used to block expression of natu modify a plant's response to light may be useful for modify 65 rally occurring homologous nucleic acids. A variety of sense ing plant growth or development, for example, photomorpho and anti-sense technologies are known in the art, e.g., as set genesis in poor light, or accelerating flowering time in forth in Lichtenstein and Nellen (1997) Antisense Technol US 8,809,630 B2 117 118 ogy: A Practical Approach IRL Press at Oxford University increased, e.g., as the introduced sequence is lengthened, Press, Oxford, U.K. Antisense regulation is also described in and/or as the sequence similarity between the introduced Crowley et al. (1985) Cell 43: 633-641; Rosenberg et al. sequence and the endogenous transcription factor gene is (1985) Nature 313: 703-706; Preiss et al. (1985) Nature 313: increased. 27-32: Melton (1985) Proc. Natl. Acad. Sci. 82: 144-148: Vectors expressing an untranslatable form of the transcrip Izant and Weintraub (1985) Science 229: 345-352; and Kim tion factor mRNA, e.g., sequences comprising one or more and Wold (1985) Cell 42: 129-138. Additional methods for stop codon, or nonsense mutation) can also be used to Sup antisense regulation are known in the art. Antisense regula press expression of an endogenous transcription factor, tion has been used to reduce or inhibit expression of plant thereby reducing or eliminating its activity and modifying genes in, for example in European Patent Publication No. 10 one or more traits. Methods for producing Such constructs are 271988. Antisense RNA may be used to reduce gene expres described in U.S. Pat. No. 5,583,021. Preferably, such con sion to produce a visible or biochemical phenotypic change in structs are made by introducing a premature stop codon into a plant (Smith et al. (1988) Nature, 334: 724-726; Smith et al. the transcription factor gene. Alternatively, a plant trait can be (1990) Plant Mol. Biol. 14: 369-379). In general, sense or modified by gene silencing using double-strand RNA (Sharp anti-sense sequences are introduced into a cell, where they are 15 (1999) Genes and Development 13: 139-141). Another optionally amplified, e.g., by transcription. Such sequences method for abolishing the expression of a gene is by insertion include both simple oligonucleotide sequences and catalytic mutagenesis using the T-DNA of Agrobacterium tumefa sequences such as ribozymes. ciens. After generating the insertion mutants, the mutants can For example, a reduction or elimination of expression (i.e., be screened to identify those containing the insertion in a a “knock-out”) of a transcription factor or transcription factor transcription factor or transcription factor homolog gene. homolog polypeptide in a transgenic plant, e.g., to modify a Plants containing a single transgene insertion event at the plant trait, can be obtained by introducing an antisense con desired gene can be crossed to generate homozygous plants struct corresponding to the polypeptide of interest as a cDNA. for the mutation. Such methods are well known to those of For antisense Suppression, the transcription factor or skill in the art (See for example Koncz et al. (1992) Methods homolog cDNA is arranged in reverse orientation (with 25 in Arabidopsis Research, World Scientific Publishing Co. Pte. respect to the coding sequence) relative to the promoter Ltd., River Edge, N.J.). sequence in the expression vector. The introduced sequence Alternatively, a plant phenotype can be altered by elimi need not be the full length cDNA or gene, and need not be nating an endogenous gene. Such as a transcription factor or identical to the cDNA or gene found in the plant type to be transcription factor homolog, e.g., by homologous recombi transformed. Typically, the antisense sequence need only be 30 nation (Kempinet al. (1997) Nature 389: 802-803). capable of hybridizing to the target gene or RNA of interest. A plant trait can also be modified by using the Cre-lox Thus, where the introduced sequence is of shorter length, a system (for example, as described in U.S. Pat. No. 5,658, higher degree of homology to the endogenous transcription 772). A plant genome can be modified to include first and factor sequence will be needed for effective antisense Sup second lox sites that are then contacted with a Cre recombi pression. While antisense sequences of various lengths can be 35 nase. If the loX sites are in the same orientation, the interven utilized, preferably, the introduced antisense sequence in the ing DNA sequence between the two sites is excised. If the lox vector will be at least 30 nucleotides in length, and improved sites are in the opposite orientation, the intervening sequence antisense Suppression will typically be observed as the length is inverted. of the antisense sequence increases. Preferably, the length of The polynucleotides and polypeptides of this invention can the antisense sequence in the vector will be greater than 100 40 also be expressed in a plant in the absence of an expression nucleotides. Transcription of an antisense construct as cassette by manipulating the activity or expression level of the described results in the production of RNA molecules that are endogenous gene by other means, such as, for example, by the reverse complement of mRNA molecules transcribed ectopically expressing a gene by T-DNA activation tagging from the endogenous transcription factor gene in the plant (Ichikawa et al. (1997) Nature 390 698-701; Kakimoto et al. cell. 45 (1996) Science 274: 982-985). This method entails trans Suppression of endogenous transcription factor gene forming a plant with a gene tag containing multiple transcrip expression can also be achieved using a ribozyme. tional enhancers and once the tag has inserted into the Ribozymes are RNA molecules that possess highly specific genome, expression of a flanking gene coding sequence endoribonuclease activity. The production and use of becomes deregulated. In another example, the transcriptional ribozymes are disclosed in U.S. Pat. No. 4,987,071 and U.S. 50 machinery in a plant can be modified so as to increase tran Pat. No. 5,543,508. Synthetic ribozyme sequences including Scription levels of a polynucleotide of the invention (See, e.g., antisense RNAs can be used to confer RNA cleaving activity PCT Publications WO 96/06166 and WO 98/53057 which on the antisense RNA, such that endogenous mRNA mol describe the modification of the DNA-binding specificity of ecules that hybridize to the antisense RNA are cleaved, which Zinc finger proteins by changing particular amino acids in the in turn leads to an enhanced antisense inhibition of endog 55 DNA-binding motif). enous gene expression. The transgenic plant can also include the machinery nec Vectors in which RNA encoded by a transcription factor or essary for expressing or altering the activity of a polypeptide transcription factor homolog cDNA is over-expressed can encoded by an endogenous gene, for example, by altering the also be used to obtain co-suppression of a corresponding phosphorylation state of the polypeptide to maintain it in an endogenous gene, e.g., in the manner described in U.S. Pat. 60 activated State. No. 5,231,020 to Jorgensen. Such co-suppression (also Transgenic plants (or plant cells, or plant explants, or plant termed sense Suppression) does not require that the entire tissues) incorporating the polynucleotides of the invention transcription factor cDNA be introduced into the plant cells, and/or expressing the polypeptides of the invention can be nor does it require that the introduced sequence be exactly produced by a variety of well established techniques as identical to the endogenous transcription factor gene of inter 65 described above. Following construction of a vector, most est. However, as with antisense Suppression, the Suppressive typically an expression cassette, including a polynucleotide, efficiency will be enhanced as specificity of hybridization is e.g., encoding a transcription factor or transcription factor US 8,809,630 B2 119 120 homolog, of the invention, standard techniques can be used to For example, the instruction set can include, e.g., a introduce the polynucleotide into a plant, a plant cell, a plant sequence comparison or other alignment program, e.g., an explant or a plant tissue of interest. Optionally, the plant cell, available program Such as, for example, the Wisconsin Pack explant or tissue can be regenerated to produce a transgenic age Version 10.0, such as BLAST, FASTA, PILEUP. FIND plant. PATTERNS or the like (GCG, Madison, Wis.). Public The plant can be any higher plant, including gymnosperms, sequence databases such as GenBank, EMBL, Swiss-Prot monocotyledonous and dicotyledenous plants. Suitable pro and PIR or private sequence databases such as PHYTOSEQ tocols are available for Leguminosae (alfalfa, soybean, clo sequence database (Incyte Genomics, Palo Alto, Calif.) can ver, etc.), Umbelliferae (carrot, celery, parsnip), Cruciferae be searched. (cabbage, radish, rapeseed, broccoli, etc.), Curcurbitaceae 10 Alignment of sequences for comparison can be conducted (melons and cucumber), Gramineae (wheat, corn, rice, bar ley, millet, etc.), Solanaceae (potato, tomato, tobacco, pep by the local homology algorithm of Smith and Waterman pers, etc.), and various other crops. See protocols described in (1981) Adv. Appl. Math. 2: 482-489, by the homology align Ammirato et al., Eds. (1984) Handbook of Plant Cell Cul ment algorithm of Needleman and Wunsch (1970) J. Mol. ture-Crop Species, Macmillan Publ. Co., New York, N.Y.: 15 Biol. 48: 443-453, by the search for similarity method of Shimamoto et al. (1989) Nature 338: 274–276: Fromm et al. Pearson and Lipman (1988) Proc. Natl. Acad. Sci. 85: 2444 (1990) Bio/Technol 8: 833-839; and Vasil et al. (1990) Bio/ 2448, by computerized implementations of these algorithms. Technol. 8: 429-434. After alignment, sequence comparisons between two (or Transformation and regeneration of both monocotyledon more) polynucleotides or polypeptides are typically per ous and dicotyledonous plant cells is now routine, and the formed by comparing sequences of the two sequences over a selection of the most appropriate transformation technique comparison window to identify and compare local regions of will be determined by the practitioner. The choice of method sequence similarity. The comparison window can be a seg will vary with the type of plant to be transformed; those ment of at least about 20 contiguous positions, usually about skilled in the art will recognize the suitability of particular 50 to about 200, more usually about 100 to about 150 con methods for given plant types. Suitable methods can include, 25 tiguous positions. A description of the method is provided in but are not limited to: electroporation of plant protoplasts; Ausubel et al. Supra. liposome-mediated transformation; polyethylene glycol A variety of methods for determining sequence relation (PEG) mediated transformation; transformation using ships can be used, including manual alignment and computer viruses; micro-injection of plant cells; micro-projectile bom assisted sequence alignment and analysis. This later approach bardment of plant cells; vacuum infiltration; and Agrobacte 30 is a preferred approach in the present invention, due to the rium tumefaciens mediated transformation. Transformation increased throughput afforded by computer assisted methods. means introducing a nucleotide sequence into a plant in a As noted above, a variety of computer programs for perform manner to cause stable or transient expression of the ing sequence alignment are available, or can be produced by Sequence. one of skill. Successful examples of the modification of plant charac 35 One example algorithm that is suitable for determining teristics by transformation with cloned sequences which percent sequence identity and sequence similarity is the serve to illustrate the current knowledge in this field of tech BLAST algorithm, which is described in Altschul et al. nology, and which are herein incorporated by reference, (1990).J. Mol. Biol. 215: 403-410. Software for performing include: U.S. Pat. Nos. 5,571,706; 5,677,175: 5,510,471; BLAST analyses is publicly available, e.g., through the 5,750,386; 5,597,945; 5,589,615; 5,750,871; 5,268,526; 40 National Library of Medicine's National Center for Biotech 5,780,708: 5,538,880; 5,773,269; 5,736,369 and 5,610,042. nology Information (ncbi.nlm.nih; see at world wide web Following transformation, plants are preferably selected (www) National Institutes of Health US government (gov) using a dominant selectable marker incorporated into the website). This algorithm involves first identifying high scor transformation vector. Typically, such a marker will confer ing sequence pairs (HSPs) by identifying short words of antibiotic or herbicide resistance on the transformed plants, 45 length W in the query sequence, which either match or satisfy and selection of transformants can be accomplished by some positive-valued threshold score T when aligned with a exposing the plants to appropriate concentrations of the anti word of the same length in a database sequence. T is referred biotic or herbicide. to as the neighborhood word score threshold (Altschul et al. After transformed plants are selected and grown to matu supra). These initial neighborhood word hits act as seeds for rity, those plants showing a modified trait are identified. The 50 initiating searches to find longer HSPs containing them. The modified trait can be any of those traits described above. word hits are then extended in both directions along each Additionally, to confirm that the modified trait is due to sequence for as far as the cumulative alignment score can be changes in expression levels or activity of the polypeptide or increased. Cumulative scores are calculated using, for nucle polynucleotide of the invention can be determined by analyz otide sequences, the parameters M (reward score for a pair of ing mRNA expression using Northern blots, RT-PCR or 55 matching residues; always >0) and N (penalty score for mis microarrays, or protein expression using immunoblots or matching residues; always <0). For amino acid sequences, a Western blots or gel shift assays. scoring matrix is used to calculate the cumulative score. Integrated Systems—Sequence Identity Extension of the word hits in each direction are halted when: Additionally, the present invention may be an integrated the cumulative alignment score falls off by the quantity X system, computer or computer readable medium that com 60 from its maximum achieved value; the cumulative score goes prises an instruction set for determining the identity of one or to zero or below, due to the accumulation of one or more more sequences in a database. In addition, the instruction set negative-scoring residue alignments; or the end of either can be used to generate or identify sequences that meet any sequence is reached. The BLAST algorithm parameters W.T. specified criteria. Furthermore, the instruction set may be and X determine the sensitivity and speed of the alignment. used to associate or link certain functional benefits, such 65 The BLASTN program (for nucleotide sequences) uses as improved characteristics, with one or more identified defaults a wordlength (W) of 11, an expectation (E) of 10, a Sequence. cutoff of 100, M-5, N=-4, and a comparison of both strands. US 8,809,630 B2 121 122 For amino acid sequences, the BLASTP program uses as database Such as those noted herein, and the querying can be defaults a wordlength (W) of 3, an expectation (E) of 10, and done from a remote terminal or computer across an internet or the BLOSUM62 scoring matrix (see Henikoff and Henikoff intranet. (1992) Proc. Natl. Acad. Sci. 89: 10915-10919). Unless oth Any sequence herein can be used to identify a similar, erwise indicated, “sequence identity” here refers to the % homologous, paralogous, or orthologous sequence in another sequence identity generated from a thiastx using the NCBI plant. This provides means for identifying endogenous version of the algorithm at the default settings using gapped sequences in other plants that may be useful to alter a trait of alignments with the filter “off” (see, for example, NIHNLM progeny plants, which results from crossing two plants of NCBI website at ncbi.nlm.nih, supra). different strain. For example, sequences that encode an In addition to calculating percent sequence identity, the 10 ortholog of any of the sequences herein that naturally occur in a plant with a desired trait can be identified using the BLAST algorithm also performs a statistical analysis of the sequences disclosed herein. The plant is then crossed with a similarity between two sequences (see, e.g. Karlin and Alts second plant of the same species but which does not have the chul (1993) Proc. Natl. Acad. Sci. 90: 5873-5787). One mea desired trait to produce progeny which can then be used in sure of similarity provided by the BLAST algorithm is the 15 further crossing experiments to produce the desired trait in the smallest sum probability (P(N)), which provides an indica second plant. Therefore the resulting progeny plant contains tion of the probability by which a match between two nucle no transgenes; expression of the endogenous sequence may otide or amino acid sequences would occur by chance. For also be regulated by treatment with a particular chemical or example, a nucleic acid is considered similar to a reference other means, such as EMR. Some examples of Such com sequence (and, therefore, in this context, homologous) if the pounds well known in the art include: ethylene; cytokinins; Smallest Sum probability in a comparison of the test nucleic phenolic compounds, which stimulate the transcription of the acid to the reference nucleic acid is less than about 0.1, or less genes needed for infection; specific monosaccharides and than about 0.01, and or even less than about 0.001. An addi acidic environments which potentiate vir gene induction; tional example of a useful sequence alignment algorithm is acidic polysaccharides which induce one or more chromo PILEUP, PILEUP creates a multiple sequence alignment 25 Somal genes; and opines; other mechanisms include light or from a group of related sequences using progressive, pairwise dark treatment (for a review of examples of Such treatments, alignments. The program can align, e.g., up to 300 sequences see, Winans (1992) Microbiol. Rev. 56: 12-31; Eyalet al. of a maximum length of 5,000 letters. (1992) Plant Mol. Biol. 19:589-599; Chrispeels et al. (2000) The integrated system, or computer typically includes a Plant Mol. Biol. 42: 279-290; Piazza et al. (2002) Plant user input interface allowing a user to selectively view one or 30 Physiol. 128: 1077-1086). more sequence records corresponding to the one or more Table 7 lists sequences discovered to be orthologous to a character strings, as well as an instruction set which aligns the number of representative transcription factors of the present one or more character strings with each other or with an invention. The column headings include the transcription fac additional character string to identify one or more region of tors listed by SEQ ID NO; corresponding Gene ID (GID) sequence similarity. The system may include a link of one or 35 numbers; the species from which the orthologs to the tran more character strings with a particular phenotype or gene scription factors are derived; the type of sequence (i.e., DNA function. Typically, the system includes a user readable out or protein) discovered to be orthologous to the transcription put element that displays an alignment produced by the align factors; and the SEQID NO of the orthologs, the latter cor ment instruction set. responding to the ortholog SEQ ID NOs listed in the The methods of this invention can be implemented in a 40 Sequence Listing. localized or distributed computing environment. In a distrib uted environment, the methods may implemented on a single TABLE 7 computer comprising multiple processors or on a multiplicity of computers. The computers can be linked, e.g. through a Orthologs of Representative Arabidopsis Transcription Factor Genes common bus, but more preferably the computer(s) are nodes 45 SEQID SEQ ID NO: of on a network. The network can be a generalizedora dedicated NO: of Sequence Nucleotide local or wide-area network and, in certain preferred embodi Ortholog type GID NO of Encoding O Species from used for Orthologous Orthologous ments, the computers may be components of an intra-net oran Nucleotide Which determination Arabidopsis Arabidopsis internet. Encoding Orthologis (DNA Transcription Transcription Thus, the invention provides methods for identifying a 50 Ortholog Derived or Protein) Factor Factor sequence similar or homologous to one or more polynucle 961 Glycine max DNA G8 1 otides as noted herein, or one or more target polypeptides 962 Glycine max DNA G8 1 encoded by the polynucleotides, or otherwise noted herein 963 Glycine max DNA G8 1 and may include linking or associating a given plant pheno 964 Glycine max DNA G8 1 type or gene function with a sequence. In the methods, a 55 96S Oryza sativa DNA G8 1 966 Oryza sativa DNA G8 1 sequence database is provided (locally or across an inter or 967 Zea mays DNA G8 1 intra net) and a query is made against the sequence database 968 Zea mays DNA G8 1 using the relevant sequences herein and associated plant phe 969 Zea mays DNA G8 1 notypes or gene functions. 970 Glycine max DNA G19 2 971 Glycine max DNA G19 2 Any sequence herein can be entered into the database, 60 972 Glycine max DNA G19 2 before or after querying the database. This provides for both 973 Glycine max DNA G19 2 expansion of the database and, if done before the querying 974 Oryza sativa DNA G19 2 step, for insertion of control sequences into the database. The 975 Oryza sativa DNA G19 2 976 Oryza sativa DNA G19 2 control sequences can be detected by the query to ensure the 977 Zea mays DNA G19 2 general integrity of both the database and the query. As noted, 65 978 Zea mays DNA G19 2 the query can be performed using a web browser based inter 979 Glycine max DNA G22 27 face. For example, the database can be a centralized public US 8,809,630 B2 123 124 TABLE 7-continued TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Orthologs of Representative Arabidopsis Transcription Factor Genes SEQ ID SEQ ID NO: of SEQID SEQ ID NO: of NO: of Sequence Nucleotide 5 NO: of Sequence Nucleotide Ortholog type GID NO of Encoding Ortholog type GID NO of Encoding O Species from used for Orthologous Orthologous O Species from used for Orthologous Orthologous Nucleotide Which determination Arabidopsis Arabidopsis Nucleotide Which determination Arabidopsis Arabidopsis Encoding Orthologis (DNA Transcription Transcription Encoding Orthologis (DNA Transcription Transcription Ortholog Derived or Protein) Factor Factor Ortholog Derived or Protein) Factor Factor H 10 980 Glycine max DNA G22 27 O41 Zea mays DNA G214 27 981 Glycine max DNA G24 29 Zea mays DNA G68O 463 982 Glycine max DNA G24 29 042 Zea mays DNA G214 27 983 Glycine max DNA G24 29 Zea mays DNA G68O 463 984 Glycine max DNA G24 29 043 Zea mays DNA G214 27 985 Glycine max DNA G24 29 15 Zea mays DNA G68O 463 986 Glycine max DNA G24 29 044 Glycine max DNA G226 4 987 Glycine max DNA G24 29 Glycine max DNA G682 467 988 Oryza sativa DNA G24 29 Glycine max DNA G1816 939 989 Oryza sativa PRT G24 29 Glycine max DNA G2718 959 990 Oryza sativa PRT G24 29 O45 Glycine max DNA G226 4 991 Oryza sativa PRT G24 29 Glycine max DNA G1816 939 992 Zea mays DNA G24 29 2O Glycine max DNA G2718 959 993 Glycine max DNA G28 37 O46 Glycine max DNA G226 4 994 Glycine max DNA G28 37 Glycine max DNA G682 467 995 Glycine max DNA G28 37 Glycine max DNA G1816 939 996 Glycine max DNA G28 37 Glycine max DNA G2718 959 997 Glycine max DNA G28 37 O47 Glycine max DNA G226 4 998 Glycine max DNA G28 37 25 Glycine max DNA G682 467 999 Glycine max DNA G28 37 Glycine max DNA G1816 939 000 Glycine max DNA G28 37 Glycine max DNA G2718 959 001 Oryza sativa PRT G28 37 048 Glycine max DNA G226 4 002 Oryza sativa PRT G28 37 Glycine max DNA G682 467 003 Zea mays DNA G28 37 Glycine max DNA G1816 939 004 Glycine max DNA G46 53 30 Glycine max DNA G2718 959 005 Glycine max DNA G46 53 049 Oryza sativa DNA G226 4 006 Glycine max DNA G46 53 Oryza sativa DNA G682 467 007 Glycine max DNA G46 53 Oryza sativa DNA G1816 939 008 Glycine max DNA G46 53 Oryza sativa DNA G2718 959 009 Glycine max DNA G46 53 050 Oryza sativa PRT G226 4 010 Glycine max DNA G46 53 35 Oryza sativa PRT G682 467 011 Glycine max DNA G46 53 Oryza sativa PRT G1816 939 012 Oryza sativa PRT G46 53 Oryza sativa PRT G2718 959 013 Zea mays DNA G46 53 051 Oryza sativa PRT G226 4 014 Glycine max DNA G157 69 Oryza sativa PRT G682 467 Glycine max DNA G1842 943 Oryza sativa PRT G1816 939 Glycine max DNA G1843 945 Oryza sativa PRT G2718 959 Glycine max DNA G859 567 40 052 Zea mays DNA G226 4 015 Glycine max DNA G18O 87 Zea mays DNA G682 467 016 Glycine max DNA G18O 87 Zea mays DNA G1816 939 017 Oryza sativa DNA G18O 87 Zea mays DNA G2718 959 018 Oryza sativa PRT G18O 87 053 Zea mays DNA G226 4 O19 Zea mays DNA G18O 87 Zea mays DNA G682 467 020 Glycine max DNA G188 97 45 Zea mays DNA G1816 939 021 Oryza sativa PRT G188 97 Zea mays DNA G2718 959 022 Oryza sativa PRT G188 97 054 Glycine max DNA G24 63 023 Zea mays DNA G188 97 055 Glycine max DNA G24 63 024 Glycine max DNA G192 O1 056 Glycine max DNA G24 63 O25 Oryza sativa PRT G192 O1 057 Oryza sativa DNA G24 63 026 Glycine max DNA G196 05 50 058 Zea mays DNA G24 63 027 Oryza sativa PRT G196 05 059 Zea mays DNA G24 63 028 Oryza sativa PRT G196 05 060 Zea mays DNA G24 63 029 Oryza sativa PRT G196 05 061 Zea mays DNA G24 63 030 Zea mays DNA G196 05 062 Zea mays DNA G24 63 031 Zea mays DNA G196 05 063 Glycine max DNA G2S4 79 032 Glycine max DNA G211 21 55 O64 Glycine max DNA G2S6 83 033 Oryza sativa DNA G211 21 O65 Glycine max DNA G2S6 83 O34 Oryza sativa PRT G211 21 066 Glycine max DNA G2S6 83 035 Glycine max DNA G214 27 067 Glycine max DNA G2S6 83 Glycine max DNA G68O 463 O68 Glycine max DNA G2S6 83 O36 Glycine max DNA G214 27 O69 Glycine max DNA G2S6 83 Glycine max DNA G68O 463 070 Glycine max DNA G2S6 83 O37 Glycine max DNA G214 27 60 071 Oryza sativa DNA G2S6 83 Glycine max DNA G68O 463 072 Oryza sativa PRT G2S6 83 O38 Glycine max DNA G214 27 073 Oryza sativa PRT G2S6 83 Glycine max DNA G68O 463 074 Oryza sativa PRT G2S6 83 O39 Oryza sativa DNA G214 27 O75 Oryza sativa PRT G2S6 83 Oryza sativa DNA G68O 463 O76 Oryza sativa PRT G2S6 83 040 Oryza sativa DNA G214 27 65 077 Zea mays DNA G2S6 83 Oryza sativa DNA G68O 463 078 Zea mays DNA G2S6 83 US 8,809,630 B2 125 126 TABLE 7-continued TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Orthologs of Representative Arabidopsis Transcription Factor Genes SEQ ID SEQ ID NO: of SEQID SEQ ID NO: of NO: of Sequence Nucleotide 5 NO: of Sequence Nucleotide Ortholog type GID NO of Encoding Ortholog type GID NO of Encoding O Species from used for Orthologous Orthologous O Species from used for Orthologous Orthologous Nucleotide Which determination Arabidopsis Arabidopsis Nucleotide Which determination Arabidopsis Arabidopsis Encoding Orthologis (DNA Transcription Transcription Encoding Orthologis (DNA Transcription Transcription Ortholog Derived or Protein) Factor Factor Ortholog Derived or Protein) Factor Factor 10 079 Zea mays DNA G256 183 26 Glycine max DNA G409 26 080 Zea mays DNA G256 183 27 Glycine max DNA G409 26 081 Zea mays DNA G256 183 28 Glycine max DNA G409 26 082 Zea mays DNA G256 183 29 Glycine max DNA G409 26 083 Glycine max DNA G325 223 30 Glycine max DNA G409 26 084 Zea mays DNA G325 223 15 31 Glycine max DNA G409 26 O85 Glycine max DNA G361 237 32 Oryza sativa DNA G409 26 086 Glycine max DNA G361 237 33 Oryza sativa DNA G409 26 087 Glycine max DNA G361 237 34 Oryza sativa DNA G409 26 O88 Glycine max DNA G361 237 35 Zea mays DNA G409 26 O89 Glycine max DNA G361 237 36 Zea mays DNA G409 26 090 Oryza sativa DNA G361 237 37 Zea mays DNA G409 26 091 Oryza sativa PRT G361 237 2O 38 Zea mays DNA G409 26 092 Oryza sativa PRT G361 237 39 Zea mays DNA G409 26 093 Oryza sativa PRT G361 237 40 Zea mayS DNA G409 26 094 Oryza sativa PRT G361 237 41 Zea mayS DNA G409 26 095 Oryza sativa PRT G361 237 42 Glycine max DNA G438 283 096 Zea mays DNA G361 237 43 Oryza sativa DNA G438 283 097 Zea mays DNA G361 237 25 44 Oryza sativa DNA G438 283 098 Glycine max DNA G390 249 45 Oryza sativa DNA G438 283 Glycine max DNA G438 283 46 Oryza sativa DNA G438 283 099 Glycine max DNA G390 249 47 Oryza sativa DNA G438 283 Glycine max DNA G438 283 48 Zea mays DNA G438 283 00 Glycine max DNA G390 249 49 Oryza sativa DNA G464 291 Glycine max DNA G438 283 30 50 Oryza sativa PRT G464 291 01 Glycine max DNA G390 249 51 Zea mays DNA G464 291 Glycine max DNA G438 283 52 Glycine max DNA G470 295 02 Glycine max DNA G390 249 53 Oryza sativa DNA G470 295 Glycine max DNA G438 283 54 Oryza sativa DNA G470 295 03 Glycine max DNA G390 249 55 Glycine max DNA G475 301 Glycine max DNA G438 283 35 56 Glycine max DNA G482 305 04 Glycine max DNA G390 249 57 Glycine max DNA G482 305 Glycine max DNA G438 283 58 Glycine max DNA G482 305 05 Glycine max DNA G390 249 59 Glycine max DNA G482 305 06 Glycine max DNA G390 249 60 Glycine max DNA G482 305 Glycine max DNA G438 283 61 Glycine max DNA G482 305 07 Glycine max DNA G390 249 62 Glycine max DNA G482 305 Glycine max DNA G438 283 40 63 Glycine max DNA G482 305 08 Oryza sativa DNA G390 249 64 Glycine max DNA G482 305 09 Oryza sativa PRT G390 249 65 Oryza sativa PRT G482 305 Oryza sativa PRT G438 283 66 Oryza sativa PRT G482 305 O Oryza sativa PRT G390 249 67 Oryza sativa PRT G482 305 Oryza sativa PRT G438 283 68 Oryza sativa PRT G482 305 1 Oryza sativa PRT G390 249 45 69 Oryza sativa PRT G482 305 Oryza sativa PRT G438 283 70 Oryza sativa DNA G482 305 2 Oryza sativa PRT G390 249 71 Zea mays DNA G482 305 Oryza sativa PRT G438 283 72 Zea mays DNA G482 305 3 Oryza sativa DNA G390 249 73 Zea mays DNA G482 305 Oryza sativa DNA G438 283 74 Zea mays DNA G482 305 4 Zea mayS DNA G390 249 50 75 Zea mays DNA G482 305 Zea mayS DNA G438 283 76 Zea mays DNA G482 305 5 Zea mays DNA G390 249 77 Zea mays DNA G482 305 Zea mayS DNA G438 283 78 Zea mays DNA G482 305 6 Zea mayS DNA G390 249 79 Zea mays DNA G482 305 Zea mayS DNA G438 283 80 Glycine max DNA G489 309 7 Zea mays DNA G390 249 55 81 Glycine max DNA G489 309 8 Zea mays DNA G390 249 82 Glycine max DNA G489 309 Zea mayS DNA G438 283 83 Glycine max DNA G489 309 9 Zea mays DNA G390 249 84 Glycine max DNA G489 309 Zea mayS DNA G438 283 85 Glycine max DNA G489 309 20 Zea mays DNA G390 249 86 Glycine max DNA G489 309 Zea mayS DNA G438 283 87 Oryza sativa DNA G489 309 21 Zea mayS DNA G390 249 60 88 Oryza sativa DNA G489 309 Zea mayS DNA G438 283 89 Oryza sativa PRT G489 309 22 Zea mayS DNA G390 249 90 Oryza sativa PRT G489 309 Zea mayS DNA G438 283 91 Oryza sativa PRT G489 309 23 Zea mays DNA G390 249 92 Zea mays DNA G489 309 Zea mayS DNA G438 283 93 Glycine max DNA GSO9 317 24 Glycine max DNA G409 261 65 94 Glycine max DNA GSO9 317 25 Glycine max DNA G409 261 95 Glycine max DNA GSO9 317 US 8,809,630 B2 127 128 TABLE 7-continued TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Orthologs of Representative Arabidopsis Transcription Factor Genes SEQ ID SEQ ID NO: of SEQID SEQ ID NO: of NO: of Sequence Nucleotide 5 NO: of Sequence Nucleotide Ortholog type GID NO of Encoding Ortholog type GID NO of Encoding O Species from used for Orthologous Orthologous O Species from used for Orthologous Orthologous Nucleotide Which determination Arabidopsis Arabidopsis Nucleotide Which determination Arabidopsis Arabidopsis Encoding Orthologis (DNA Transcription Transcription Encoding Orthologis (DNA Transcription Transcription Ortholog Derived or Protein) Factor Factor Ortholog Derived or Protein) Factor Factor H 10 196 Oryza sativa DNA GSO9 317 266 Glycine max DNA G627 40S 197 Oryza sativa PRT GSO9 317 267 Oryza sativa DNA G627 40S 198 Oryza sativa PRT GSO9 317 268 Oryza sativa DNA G634 415 199 Oryza sativa PRT GSO9 317 269 Oryza sativa PRT G634 415 200 Oryza sativa DNA GSO9 317 270 Oryza sativa DNA G634 415 201 Zea mays DNA GSO9 317 15 271 Oryza sativa DNA G634 415 202 Zea mays DNA GSO9 317 272 Zea mays DNA G634 415 203 Zea mays DNA GSO9 317 273 Zea mays DNA G634 415 204 Zea mays DNA GSO9 317 274 Zea mays DNA G634 415 205 Glycine max DNA GS45 345 275 Glycine max DNA G636 417 206 Glycine max DNA GS45 345 276 Glycine max DNA G636 417 207 Glycine max DNA GS45 345 277 Glycine max DNA G636 417 208 Glycine max DNA GS45 345 2O 278 Glycine max DNA G636 417 209 Glycine max DNA GS45 345 279 Glycine max DNA G636 417 210 Glycine max DNA GS45 345 280 Glycine max DNA G636 417 211 Glycine max DNA GS45 345 281 Glycine max DNA G636 417 212 Oryza sativa DNA GS45 345 282 Glycine max DNA G636 417 213 Oryza sativa PRT GS45 345 283 Oryza sativa DNA G636 417 214 Oryza sativa PRT GS45 345 25 284 Oryza sativa DNA G636 417 215 Oryza sativa PRT GS45 345 285 Oryza sativa DNA G636 417 216 Oryza sativa PRT GS45 345 286 Oryza sativa DNA G636 417 217 Zea mays DNA GS45 345 287 Zea mays DNA G636 417 218 Zea mays DNA GS45 345 288 Zea mays DNA G636 417 219 Zea mays DNA GS45 345 289 Zea mays DNA G636 417 220 Zea mays DNA GS61 359 30 290 Zea mays DNA G636 417 221 Glycine max DNA GS62 361 291 Glycine max DNA G638 419 222 Glycine max DNA GS62 361 292 Glycine max DNA G638 419 223 Glycine max DNA GS62 361 293 Glycine max DNA G638 419 224 Glycine max DNA GS62 361 294 Glycine max DNA G638 419 225 Glycine max DNA GS62 361 295 Glycine max DNA G663 435 226 Oryza sativa PRT GS62 361 35 296 Glycine max DNA G664 437 227 Oryza sativa PRT GS62 361 297 Glycine max DNA G664 437 228 Zea mays DNA GS62 361 298 Glycine max DNA G664 437 229 Zea mays DNA GS62 361 299 Glycine max DNA G664 437 230 Zea mays DNA GS62 361 300 Glycine max DNA G664 437 231 Glycine max DNA G567 369 301 Glycine max DNA G664 437 232 Oryza sativa DNA G567 369 302 Glycine max DNA G664 437 233 Oryza sativa PRT G567 369 40 303 Oryza sativa DNA G664 437 234 Glycine max DNA GS84 385 304 Oryza sativa DNA G664 437 235 Glycine max DNA GS84 385 305 Oryza sativa DNA G664 437 236 Glycine max DNA GS84 385 306 Oryza sativa DNA G664 437 237 Glycine max DNA GS84 385 307 Oryza sativa PRT G664 437 238 Glycine max DNA GS84 385 308 Oryza sativa PRT G664 437 239 Oryza sativa PRT GS84 385 45 309 Oryza sativa PRT G664 437 240 Zea mays DNA GS84 385 310 Oryza sativa PRT G664 437 241 Zea mayS DNA GS84 385 311 Zea mays DNA G664 437 242 Zea mayS DNA GS84 385 312 Zea mays DNA G664 437 243 Glycine max DNA G590 387 313 Zea mays DNA G664 437 244 Glycine max DNA G590 387 314 Zea mays DNA G664 437 245 Glycine max DNA G590 387 50 315 Zea mays DNA G664 437 246 Oryza sativa PRT G590 387 316 Zea mays DNA G664 437 247 Oryza sativa PRT G590 387 317 Zea mays DNA G664 437 248 Oryza sativa DNA G590 387 318 Zea mays DNA G664 437 249 Zea mays DNA G590 387 319 Oryza sativa DNA G68O 463 250 Glycine max DNA GS92 391 320 Zea mays DNA G68O 463 251 Glycine max DNA GS92 391 55 321 Glycine max DNA G736 487 252 Glycine max DNA GS92 391 322 Glycine max DNA G736 487 253 Glycine max DNA GS92 391 323 Oryza sativa PRT G736 487 254 Glycine max DNA GS92 391 324 Glycine max DNA G748 497 255 Oryza sativa DNA GS92 391 325 Glycine max DNA G748 497 256 Oryza sativa DNA GS92 391 326 Glycine max DNA G748 497 257 Oryza sativa DNA GS92 391 327 Oryza sativa DNA G748 497 258 Oryza sativa PRT GS92 391 60 328 Oryza sativa DNA G748 497 259 Oryza sativa PRT GS92 391 329 Oryza sativa PRT G748 497 260 Oryza sativa DNA GS92 391 330 Oryza sativa PRT G748 497 261 Zea mays DNA GS92 391 331 Oryza sativa PRT G748 497 262 Zea mays DNA GS92 391 332 Oryza sativa PRT G748 497 263 Zea mays DNA GS92 391 333 Zea mays DNA G748 497 264 Zea mays DNA GS92 391 65 334 Glycine max DNA G789 539 265 Glycine max DNA G627 40S 335 Glycine max DNA G789 539 US 8,809,630 B2 129 130 TABLE 7-continued TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Orthologs of Representative Arabidopsis Transcription Factor Genes SEQ ID SEQ ID NO: of SEQID SEQ ID NO: of NO: of Sequence Nucleotide 5 NO: of Sequence Nucleotide Ortholog type GID NO of Encoding Ortholog type GID NO of Encoding O Species from used for Orthologous Orthologous O Species from used for Orthologous Orthologous Nucleotide Which determination Arabidopsis Arabidopsis Nucleotide Which determination Arabidopsis Arabidopsis Encoding Orthologis (DNA Transcription Transcription Encoding Orthologis (DNA Transcription Transcription Ortholog Derived or Protein) Factor Factor Ortholog Derived or Protein) Factor Factor H 10 336 Oryza sativa DNA G789 539 406 Glycine max DNA G912 615 337 Oryza sativa DNA G789 539 407 Glycine max DNA G912 615 338 Oryza sativa PRT G789 539 408 Glycine max DNA G912 615 339 Oryza sativa PRT G789 539 409 Glycine max DNA G912 615 340 Oryza sativa PRT G789 539 410 Oryza sativa DNA G912 615 341 Zea mays DNA G789 539 15 411 Oryza sativa PRT G912 615 342 Glycine max DNA G8O1 S49 412 Oryza sativa PRT G912 615 343 Glycine max DNA G8O1 S49 413 Oryza sativa PRT G912 615 344 Zea mays DNA G8O1 S49 414 Oryza sativa PRT G912 615 345 Glycine max DNA G849 565 415 Oryza sativa DNA G912 615 346 Glycine max DNA G849 565 416 Zea mays DNA G912 615 347 Glycine max DNA G849 565 417 Zea mays DNA G912 615 348 Glycine max DNA G849 565 2O 418 Zea mays DNA G912 615 349 Glycine max DNA G849 565 419 Zea mays DNA G912 615 350 Glycine max DNA G849 565 420 Zea mays DNA G912 615 351 Zea mays DNA G849 565 421 Glycine max DNA G961 633 352 Zea mays DNA G849 565 422 Glycine max DNA G961 633 353 Zea mays DNA G849 565 423 Oryza sativa DNA G961 633 354 Glycine max DNA G864 573 25 424 Oryza sativa PRT G961 633 355 Glycine max DNA G864 573 425 Zea mays DNA G961 633 356 Glycine max DNA G864 573 426 Zea mays DNA G961 633 357 Glycine max DNA G864 573 427 Zea mays DNA G961 633 358 Glycine max DNA G864 573 428 Glycine max DNA G974 641 359 Glycine max DNA G864 573 429 Glycine max DNA G974 641 360 Oryza sativa DNA G864 573 30 430 Glycine max DNA G974 641 361 Oryza sativa PRT G864 573 431 Glycine max DNA G974 641 362 Oryza sativa PRT G864 573 432 Glycine max DNA G974 641 363 Zea mays DNA G864 573 433 Glycine max DNA G974 641 364 Zea mays DNA G864 573 434 Oryza sativa DNA G974 641 365 Zea mays DNA G864 573 435 Oryza sativa PRT G974 641 366 Glycine max DNA G867 579 35 436 Oryza sativa PRT G974 641 367 Glycine max DNA G867 579 437 Oryza sativa PRT G974 641 368 Glycine max DNA G867 579 438 Zea mays DNA G974 641 369 Glycine max DNA G867 579 439 Zea mays DNA G974 641 370 Glycine max DNA G867 579 440 Zea mays DNA G974 641 371 Glycine max DNA G867 579 441 Zea mayS DNA G974 641 372 Oryza sativa DNA G867 579 442 Glycine max DNA G975 643 373 Oryza sativa PRT G867 579 40 443 Glycine max DNA G975 643 374 Oryza sativa PRT G867 579 444 Glycine max DNA G975 643 375 Oryza sativa PRT G867 579 445 Glycine max DNA G975 643 376 Oryza sativa DNA G867 579 446 Glycine max DNA G975 643 377 Zea mays DNA G867 579 447 Oryza sativa DNA G975 643 378 Zea mays DNA G867 579 448 Oryza sativa PRT G975 643 379 Zea mays DNA G867 579 45 449 Oryza sativa DNA G975 643 380 Zea mays DNA G867 579 450 Zea mays DNA G975 643 381 Glycine max DNA G869 581 451 Zea mays DNA G975 643 382 Glycine max DNA G869 581 452 Glycine max DNA G979 649 383 Oryza sativa DNA G869 581 453 Glycine max DNA G979 649 384 Oryza sativa PRT G869 581 454 Glycine max DNA G979 649 385 Zea mays DNA G869 581 50 455 Oryza sativa DNA G979 649 386 Glycine max DNA G877 583 456 Oryza sativa PRT G979 649 387 Oryza sativa DNA G877 583 457 Oryza sativa PRT G979 649 388 Oryza sativa DNA G877 583 458 Oryza sativa PRT G979 649 389 Oryza sativa PRT G877 583 459 Oryza sativa PRT G979 649 390 Oryza sativa PRT G877 583 460 Oryza sativa PRT G979 649 391 Oryza sativa PRT G877 583 55 461 Zea mays DNA G979 649 392 Zea mays DNA G877 583 462 Zea mays DNA G979 649 393 Zea mays DNA G877 583 463 Zea mays DNA G979 649 394 Zea mays DNA G877 583 464 Glycine max DNA G987 653 395 Glycine max DNA G881 587 465 Glycine max DNA G987 653 396 Oryza sativa PRT G881 587 466 Glycine max DNA G987 653 397 Oryza sativa DNA G881 587 467 Glycine max DNA G987 653 398 Oryza sativa DNA G881 587 60 468 Glycine max DNA G987 653 399 Zea mays DNA G881 587 469 Glycine max DNA G987 653 400 Zea mays DNA G881 587 470 Oryza sativa DNA G987 653 401 Zea mays DNA G881 587 471 Oryza sativa DNA G987 653 402 Zea mays DNA G881 587 472 Oryza sativa PRT G987 653 403 Glycine max DNA G912 615 473 Zea mays DNA G987 653 404 Glycine max DNA G912 615 65 474 Glycine max DNA G1052 699 405 Glycine max DNA G912 615 475 Glycine max DNA G1052 699 US 8,809,630 B2 131 132 TABLE 7-continued TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Orthologs of Representative Arabidopsis Transcription Factor Genes SEQ ID SEQ ID NO: of SEQID SEQ ID NO: of NO: of Sequence Nucleotide 5 NO: of Sequence Nucleotide Ortholog type GID NO of Encoding Ortholog type GID NO of Encoding O Species from used for Orthologous Orthologous O Species from used for Orthologous Orthologous Nucleotide Which determination Arabidopsis Arabidopsis Nucleotide Which determination Arabidopsis Arabidopsis Encoding Orthologis (DNA Transcription Transcription Encoding Orthologis (DNA Transcription Transcription Ortholog Derived or Protein) Factor Factor Ortholog Derived or Protein) Factor Factor H 10 476 Glycine max DNA G1052 699 546 Zea mays DNA G1145 749 477 Glycine max DNA G1052 699 547 Zea mays DNA G1145 749 478 Glycine max DNA G1052 699 548 Zea mays DNA G1145 749 479 Glycine max DNA G1052 699 549 Zea mays DNA G1145 749 480 Glycine max DNA G1052 699 550 Glycine max DNA G1198 763 481 Oryza sativa DNA G1052 699 15 551 Glycine max DNA G1198 763 482 Oryza sativa DNA G1052 699 552 Glycine max DNA G1198 763 483 Oryza sativa PRT G1052 699 553 Glycine max DNA G1198 763 484 Oryza sativa PRT G1052 699 554 Glycine max DNA G1198 763 485 Zea mays DNA G1052 699 555 Glycine max DNA G1198 763 486 Zea mays DNA G1052 699 556 Glycine max DNA G1198 763 487 Zea mays DNA G1052 699 557 Glycine max DNA G1198 763 488 Zea mays DNA G1052 699 2O 558 Oryza sativa DNA G1198 763 489 Zea mays DNA G1052 699 559 Oryza sativa DNA G1198 763 490 Zea mays DNA G1052 699 560 Oryza sativa DNA G1198 763 491 Zea mays DNA G1052 699 561 Oryza sativa DNA G1198 763 492 Zea mays DNA G1052 699 562 Oryza sativa DNA G1198 763 493 Zea mays DNA G1052 699 563 Oryza sativa PRT G1198 763 494 Glycine max DNA G1062 713 25 564 Oryza sativa PRT G1198 763 495 Glycine max DNA G1062 713 565 Oryza sativa PRT G1198 763 496 Glycine max DNA G1062 713 566 Oryza sativa PRT G1198 763 497 Glycine max DNA G1062 713 567 Oryza sativa PRT G1198 763 498 Oryza sativa DNA G1062 713 568 Oryza sativa PRT G1198 763 499 Oryza sativa DNA G1062 713 569 Oryza sativa PRT G1198 763 500 Oryza sativa PRT G1062 713 30 570 Zea mays DNA G1198 763 501 Zea mays DNA G1062 713 571 Zea mays DNA G1198 763 502 Zea mays DNA G1062 713 572 Zea mays DNA G1198 763 503 Zea mays DNA G1062 713 573 Zea mays DNA G1198 763 504 Zea mays DNA G1062 713 574 Zea mays DNA G1198 763 505 Zea mays DNA G1062 713 575 Zea mays DNA G1198 763 506 Glycine max DNA G1069 721 35 576 Zea mays DNA G1198 763 507 Glycine max DNA G1069 721 577 Zea mays DNA G1198 763 508 Oryza sativa PRT G1069 721 578 Zea mays DNA G1198 763 509 Zea mays DNA G1069 721 579 Zea mays DNA G1198 763 510 Oryza sativa PRT G1073 723 580 Glycine max DNA G1242 787 511 Oryza sativa PRT G1073 723 581 Oryza sativa DNA G1242 787 512 Glycine max DNA G1075 725 582 Oryza sativa PRT G1242 787 513 Glycine max DNA G1075 725 40 583 Oryza sativa PRT G1242 787 514 Glycine max DNA G1075 725 584 Zea mays DNA G1242 787 515 Glycine max DNA G1075 725 585 Zea mays DNA G1242 787 516 Glycine max DNA G1075 725 586 Glycine max DNA G1255 793 517 Oryza sativa DNA G1075 725 587 Glycine max DNA G1255 793 518 Oryza sativa DNA G1075 725 588 Glycine max DNA G1255 793 519 Oryza sativa DNA G1075 725 45 589 Glycine max DNA G1255 793 520 Oryza sativa PRT G1089 731 590 Glycine max DNA G1255 793 521 Oryza sativa DNA G1089 731 591 Glycine max DNA G1255 793 522 Zea mays DNA G1089 731 592 Glycine max DNA G1255 793 523 Zea mays DNA G1089 731 593 Oryza sativa DNA G1255 793 524 Zea mays DNA G1089 731 594 Oryza sativa PRT G1255 793 525 Zea mays DNA G1089 731 50 595 Oryza sativa DNA G1255 793 526 Zea mays DNA G1089 731 596 Oryza sativa DNA G1255 793 527 Glycine max DNA G1134 741 597 Oryza sativa DNA G1255 793 528 Glycine max DNA G1134 741 598 Zea mays DNA G1255 793 529 Oryza sativa DNA G1134 741 599 Zea mays DNA G1255 793 530 Glycine max DNA G1145 749 600 Zea mays DNA G1255 793 531 Glycine max DNA G1145 749 55 601 Zea mays DNA G1255 793 532 Glycine max DNA G1145 749 602 Zea mays DNA G1255 793 533 Glycine max DNA G1145 749 603 Zea mays DNA G1255 793 534 Glycine max DNA G1145 749 604 Glycine max DNA G1266 799 535 Glycine max DNA G1145 749 605 Glycine max DNA G1266 799 536 Glycine max DNA G1145 749 606 Glycine max DNA G1266 799 537 Glycine max DNA G1145 749 607 Glycine max DNA G1266 799 538 Oryza sativa PRT G1145 749 60 608 Oryza sativa DNA G1266 799 539 Oryza sativa PRT G1145 749 609 Glycine max DNA G1274 805 540 Oryza sativa PRT G1145 749 610 Glycine max DNA G1274 805 541 Oryza sativa PRT G1145 749 611 Oryza sativa PRT G1274 805 542 Oryza sativa PRT G1145 749 612 Oryza sativa PRT G1274 805 543 Oryza sativa PRT G1145 749 613 Zea mays DNA G1274 805 544. Oryza sativa DNA G1145 749 65 614 Zea mays DNA G1274 805 545 Zea mays DNA G1145 749 615 Zea mays DNA G1274 805 US 8,809,630 B2 133 134 TABLE 7-continued TABLE 7-continued Orthologs of Representative Arabidopsis Transcription Factor Genes Orthologs of Representative Arabidopsis Transcription Factor Genes SEQ ID SEQ ID NO: of SEQID SEQ ID NO: of NO: of Sequence Nucleotide 5 NO: of Sequence Nucleotide Ortholog type GID NO of Encoding Ortholog type GID NO of Encoding O Species from used for Orthologous Orthologous O Species from used for Orthologous Orthologous Nucleotide Which determination Arabidopsis Arabidopsis Nucleotide Which determination Arabidopsis Arabidopsis Encoding Orthologis (DNA Transcription Transcription Encoding Orthologis (DNA Transcription Transcription Ortholog Derived or Protein) Factor Factor Ortholog Derived or Protein) Factor Factor 10 616 Zea mays DNA G1274 805 686 Oryza sativa PRT G1560 925 617 Oryza sativa DNA G1275 807 687 Oryza sativa PRT G1560 925 618 Oryza sativa PRT G1275 807 688 Oryza sativa PRT G1560 925 619 Oryza sativa PRT G1275 807 689 Oryza sativa PRT G1560 925 620 Oryza sativa PRT G1275 807 690 Zea mays DNA G1560 925 621 Zea mays DNA G1275 807 15 691 Zea mays DNA G1560 925 622 Zea mays DNA G1275 807 692 Zea mays DNA G1560 925 623 Zea mays DNA G1275 807 693 Zea mays DNA G1560 925 624 Glycine max DNA G1313 829 694 Zea mays DNA G1560 925 625 Oryza sativa DNA G1313 829 695 Zea mays DNA G1560 925 626 Oryza sativa PRT G1313 829 696 Oryza sativa PRT G1645 929 627 Oryza sativa PRT G1313 829 697 Zea mays DNA G1645 929 628 Zea mays DNA G1313 829 698 Zea mays DNA G1645 929 629 Zea mays DNA G1313 829 699 Zea mays DNA G1645 929 630 Zea mays DNA G1313 829 700 Glycine max DNA G1760 937 631 Glycine max DNA G1322 84 701 Glycine max DNA G1760 937 632 Glycine max DNA G1322 84 702 Oryza sativa PRT G1760 937 633 Glycine max DNA G1322 84 703 Oryza sativa PRT G1760 937 634 Oryza sativa DNA G1322 84 25 704 Zea mays DNA G1760 937 635 Oryza sativa PRT G1322 84 705 Zea mays DNA G1760 937 636 Oryza sativa PRT G1322 84 706 Zea mays DNA G1760 937 637 Zea mays DNA G1323 843 707 Oryza sativa DNA G1816 939 638 Zea mays DNA G1323 843 708 Glycine max DNA G2010 951 639 Glycine max DNA G1417 88 Glycine max DNA G2347 957 640 Oryza sativa PRT G1417 88 30 709 Oryza sativa DNA G2010 951 641 Oryza sativa PRT G1417 88 Oryza sativa DNA G2347 957 642 Glycine max DNA G1449 89 710 Zea mays DNA G2010 951 643 Glycine max DNA G1449 89 711 Zea mays DNA G2010 951 644. Oryza sativa DNA G1449 89 Zea mays DNA G2347 957 645 Oryza sativa DNA G1449 89 970 Glycine max DNA G859 567 646 Zea mays DNA G1449 89 35 Glycine max DNA G1842 943 647 Zea mays DNA G1449 89 Glycine max DNA G1843 945 648 Zea mays DNA G1449 89 Glycine max DNA G1844 947 649 Zea mays DNA G1449 89 971 Glycine max PRT G859 568 650 Glycine max DNA G1451 893 Glycine max PRT G1842 944 651 Glycine max DNA G1451 893 Glycine max PRT G1843 946 652 Oryza sativa DNA G1451 893 Glycine max PRT G1844 948 653 Oryza sativa DNA G1451 893 40 972 Glycine max DNA G859 567 654 Oryza sativa DNA G1451 893 Glycine max DNA G1842 943 655 Oryza sativa PRT G1451 893 Glycine max DNA G1843 945 656 Oryza sativa PRT G1451 893 Glycine max DNA G1844 947 657 Oryza sativa PRT G1451 893 973 Glycine max PRT G859 568 658 Oryza sativa PRT G1451 893 Glycine max PRT G1842 944 659 Zea mays DNA G1451 893 45 Glycine max PRT G1843 946 660 Zea mays DNA G1451 893 Glycine max PRT G1844 948 661 Zea mays DNA G1451 893 662 Zea mays DNA G1451 893 663 Glycine max DNA G1482 905 Table 8 lists a Summary of homologous sequences identi 664 Glycine max DNA G1482 905 fied using BLAST (tblastX program). The first column shows 665 Glycine max DNA G1482 905 50 666 Glycine max DNA G1482 905 the polynucleotide sequence identifier (SEQ ID NO:), the 667 Glycine max DNA G1482 905 second column shows the corresponding cDNA identifier 668 Oryza sativa DNA G1482 905 669 Oryza sativa DNA G1482 905 (Gene ID or GID), the third column shows the orthologous or 670 Oryza sativa DNA G1482 905 homologous polynucleotide GenBank Accession Number 671 Oryza sativa DNA G1482 905 55 (Test Sequence ID), the fourth column shows the calculated 672 Oryza sativa PRT G1482 905 probability value that the sequence identity is due to chance 673 Oryza sativa PRT G1482 905 674 Zea mays DNA G1482 905 (Smallest Sum Probability), the fifth column shows the plant 675 Zea mays DNA G1482 905 species from which the test sequence was isolated (Test 676 Zea mays DNA G1482 905 Sequence Species), and the sixth column shows the ortholo 677 Zea mays DNA G1482 905 678 Zea mays DNA G1482 905 60 gous or homologous test sequence GenBank annotation (Test 679 Zea mays DNA G1482 905 Sequence GenBank Annotation). 680 Oryza sativa PRT G1499 913 Table 9 lists sequences discovered to be paralogous to a 681 Glycine max DNA G1540 919 682 Oryza sativa PRT G1540 919 number of transcription factors of the present invention. The 683 Glycine max DNA G1560 925 columns headings include, from left to right, the Arabidopsis 684 Glycine max DNA G1560 925 65 SEQ ID NO; corresponding Arabidopsis Gene ID (GID) 685 Oryza sativa DNA G1560 925 numbers; the GID numbers of the paralogs discovered in a database search; and the SEQID NOs of the paralogs.

US 8,809,630 B2 137 138 TABLE 10 Honologous relationships found within the Sequence Listing DNA or Species from Which SEQID Protein Homologous NO: GID No. (PRT) Sequence is Derived Relationship of SEQID NO: to Other Genes

961 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

962 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

963 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

964 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

96S DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous

967 DNA Zea mays Predicted polypeptide seq uence is or O Ogous

968 DNA Zea mays Predicted polypeptide seq uence is or O Ogous

969 DNA Zea mays Predicted polypeptide seq uence is or O Ogous

970 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

971 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

972 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

973 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

974 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous

975 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous

976 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous

977 DNA Zea mays Predicted polypeptide seq uence is or O ogous

978 DNA Zea mays Predicted polypeptide seq uence is or O Ogous

979 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

98O DNA Glycine max Predicted polypeptide seq uence is or O Ogous

981 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

982 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

983 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

984 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

985 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

986 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

987 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

988 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous O G24 989 PRT Oryza sativa Orthologous to G24 990 PRT Oryza sativa Orthologous to G24 991 PRT Oryza sativa Orthologous to G24 992 DNA Zea mays Predicted polypeptide seq uence is or O Ogous

993 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

994 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

995 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

996 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

997 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

998 DNA Glycine max Predicted polypeptide seq uence is or O Ogous

999 DNA Glycine max Predicted polypeptide seq uence is or O Ogous US 8,809,630 B2 139 140 TABLE 10-continued Honologous relationships found within the Sequence Listing DNA or Species from Which SEQID Protein Homologous NO: GID No. (PRT) Sequence is Derived Relationship of SEQID NO: to Other Genes OOO DNA Glycine max Predicted polypeptide sequence is orthologous O G28 OO1 PRT Oryza sativa Orthologous to G28 OO2 PRT Oryza sativa Orthologous to G28 OO3 DNA Zea mays Predicted polypeptide sequence is orthologous O G28 OO4 DNA Glycine max Predicted polypeptide sequence is orthologous O G46 005 DNA Glycine max Predicted polypeptide sequence is orthologous O G46 OO6 DNA Glycine max Predicted polypeptide sequence is orthologous O G46 OO7 DNA Glycine max Predicted polypeptide sequence is orthologous O G46 OO8 DNA Glycine max Predicted polypeptide sequence is orthologous O G46 O09 DNA Glycine max Predicted polypeptide sequence is orthologous O G46 O10 DNA Glycine max Predicted polypeptide sequence is orthologous O G46 O11 DNA Glycine max Predicted polypeptide sequence is orthologous O G46 O12 PRT Oryza sativa Orthologous to G46 O13 DNA Zea mays Predicted polypeptide sequence is orthologous O G46 O14 DNA Glycine max Predicted polypeptide sequence is orthologous O G157, G859, G1842, G1843 O15 DNA Glycine max Predicted polypeptide sequence is orthologous O G18O O16 DNA Glycine max Predicted polypeptide sequence is orthologous O G18O O17 DNA Oryza sativa Predicted polypeptide sequence is orthologous O G18O O18 PRT Oryza sativa Orthologous to G180 O19 DNA Zea mays Predicted polypeptide sequence is orthologous O G18O O2O DNA Glycine max Predicted polypeptide sequence is orthologous O G188 O21 PRT Oryza sativa Orthologous to G188 O22 PRT Oryza sativa Orthologous to G188 O23 DNA Zea mays Predicted polypeptide sequence is orthologous O G188 O24 DNA Glycine max Predicted polypeptide sequence is orthologous O G192 O25 PRT Oryza sativa Orthologous to G192 O26 DNA Glycine max Predicted polypeptide sequence is orthologous O G196 O27 PRT Oryza sativa Orthologous to G196 O28 PRT Oryza sativa Orthologous to G196 O29 PRT Oryza sativa Orthologous to G196 O3O DNA Zea mays Predicted polypeptide sequence is orthologous O G196 O31 DNA Zea mays Predicted polypeptide sequence is orthologous O G196 O32 DNA Glycine max Predicted polypeptide sequence is orthologous O G211 O33 DNA Oryza sativa Predicted polypeptide sequence is orthologous O G211 O34 PRT Oryza sativa Orthologous to G21 O35 DNA Glycine max Predicted polypeptide sequence is orthologous o G214, G680 O36 DNA Glycine max Predicted polypeptide sequence is orthologous o G214, G680 O37 DNA Glycine max Predicted polypeptide sequence is orthologous o G214, G680 O38 DNA Glycine max Predicted polypeptide sequence is orthologous o G214, G680 O39 DNA Oryza sativa Predicted polypeptide sequence is orthologous o G214, G680 O40 DNA Oryza sativa Predicted polypeptide sequence is orthologous o G214, G680 O41 DNA Zea mays Predicted polypeptide sequence is orthologous o G214, G680 US 8,809,630 B2 141 142 TABLE 10-continued Honologous relationships found within the Sequence Listing DNA or Species from Which SEQID Protein Homologous NO: GID No. (PRT) Sequence is Derived Relationship of SEQID NO: to Other Genes O42 DNA Zea mays Predic polypeptide sequence is orthologous O G21 G68O O43 DNA Zea mays Predic polypeptide sequence is orthologous O G21 G68O O44 DNA Glycine max Predic polypeptide sequence is orthologous O G.22 G682, G1816, G2718 O45 DNA Glycine max Predic polypeptide sequence is orthologous O G.22 G1816, G2718 O46 DNA Glycine max Predic polypeptide sequence is orthologous O G.22 G682, 16, G2718 O47 DNA Glycine max Predic polypeptide sequence is orthologous O G.22 G682, G1816, G2718 O48 DNA Glycine max Predic polypeptide sequence is orthologous O G.22 G682, G1816, G2718 O49 DNA Oryza sativa Predic polypeptide sequence is orthologous O G.22 G 6 8 2. G 816, G2718 050 PRT Oryza sativa Orthologous to G2 26, G682, G1816, G2718 051 PRT Oryza sativa Orthologous to G.226, G682, G1816, G2718 052 DNA Zea mays Predicted polypeptide sequence is orthologous o G.226, G682, G1816, G2718 053 DNA Zea mays Predicted polypeptide sequence is orthologous o G.226, G682, G1816, G2718 OS4 DNA Glycine max Predicted polypeptide sequence is orthologous O G24 055 DNA Glycine max Predicted polypeptide sequence is orthologous O G24 OS6 DNA Glycine max Predicted polypeptide sequence is orthologous O G24 057 DNA Oryza sativa Predicted polypeptide sequence is orthologous O G24 OS8 DNA Zea mays Predicted polypeptide sequence is orthologous O G24 059 DNA Zea mays Predicted polypeptide sequence is orthologous O G24 O60 DNA Zea mays Predicted polypeptide sequence is orthologous O G24 O61 DNA Zea mays Predicted polypeptide sequence is orthologous O G24 O62 DNA Zea mays Predicted polypeptide sequence is orthologous O G24 O63 DNA Glycine max Predic polypeptide sequence is orthologous O G2S O64 DNA Glycine max Predic polypeptide sequence is orthologous O G2S O6S DNA Glycine max Predic polypeptide sequence is orthologous O G2S O66 DNA Glycine max Predic polypeptide sequence is orthologous O G2S O67 DNA Glycine max Predic polypeptide sequence is orthologous O G2S O68 DNA Glycine max Predic polypeptide sequence is orthologous O G2S O69 DNA Glycine max Predic polypeptide sequence is orthologous O G2S DNA Glycine max Predic polypeptide sequence is orthologous O G2S DNA Oryza sativa Predicted polypeptide sequence is orthologous O G256 PRT Oryza sativa Orthologous to G.256 PRT Oryza sativa Orthologous to G.256 PRT Oryza sativa Orthologous to G.256 PRT Oryza sativa Orthologous to G.256 PRT Oryza sativa Orthologous to G.256 DNA Zea mays Predicted polypeptide sequence is orthologous to G256 DNA Zea mays Predicted polypeptide sequence is orthologous to G256 DNA Zea mays Predicted polypeptide sequence is orthologous to G256 DNA Zea mays Predicted polypeptide sequence is orthologous to G256 DNA Zea mays Predicted polypeptide sequence is orthologous to G256 US 8,809,630 B2 143 144 TABLE 10-continued Honologous relationships found within the Sequence Listing DNA or Species from Which SEQID Protein Homologous NO: GID No. (PRT) Sequence is Derived Relationship of SEQID NO: to Other Genes

DNA Zea mays Predicted polype i (StG uence is or O Ogous O G256 DNA Glycine max Predicted polype (StG uence is or O Ogous O G325 DNA Zea mays Predicted polype (StG uence is or O Ogous O G325 DNA Glycine max Predicted polype (StG uence is or O Ogous O G361 DNA Glycine max Predicted polype (StG uence is or O Ogous O G361 DNA Glycine max Predicted polype (StG uence is or O Ogous O G361 DNA Glycine max Predicted polype (StG uence is or O Ogous O G361 DNA Glycine max Predicted polype (StG uence is or O Ogous O G361 DNA Oryza sativa Predicted polype i (StG uence is or O Ogous O G361 PRT Oryza sativa Orthologous to G36 PRT Oryza sativa Orthologous to G36 PRT Oryza sativa Orthologous to G36 PRT Oryza sativa Orthologous to G36 PRT Oryza sativa Orthologous to G36 DNA Zea mays Predic polype (StG uence is or O Ogous O G36 DNA Zea mays Predic polype (StG uence is or O Ogous O G36 DNA Glycine max Predic polype (StG uence is or O Ogous O G39 G438 DNA Glycine max Predic polype (StG uence is or O Ogous O G39 G438 DNA Glycine max Predic polype (StG uence is or O Ogous O G39 G438 DNA Glycine max Predic polype (StG uence is or O Ogous O G39 G438 DNA Glycine max Predic polype (StG uence is or O Ogous O G39 G438 DNA Glycine max Predic polype (StG uence is or O Ogous O G39 G438 DNA Glycine max Predic polype (StG uence is or O Ogous O G39 G438 DNA Glycine max Predic polype (StG uence is or O Ogous O G39 DNA Glycine max Predic polype (StG uence is or O Ogous O G39 G438 DNA Glycine max Predicted polype (StG uence is or O Ogous O G390, G438 DNA Oryza sativa Predicted polype i (StG uence is or O Ogous O G390 PRT Oryza sativa Orthologous to G390, G438 PRT Oryza sativa Orthologous to G390, G438 PRT Oryza sativa Orthologous to G390, G438 PRT Oryza sativa Orthologous to G390, G438 DNA Oryza sativa Predicted polype (StG uence is or O Ogous O G 3 9 G438 4 DNA Zea mays polype (StG uence is or O Ogous G438 DNA Zea mays polype (StG uence is or O Ogous O G 3 9 G438 DNA Zea mays polype (StG uence is or O Ogous O G 3 9 G438 DNA Zea mays polype (StG uence is or O Ogous

DNA Zea mays polype (StG uence is or O Ogous O G 3 9 G438 DNA Zea mays polype (StG uence is or O Ogous O G 3 9 G438 DNA Zea mays polype (StG uence is or O Ogous G438 21 DNA Zea mays OO GG 33 99 polype (StG uence is or O Ogous G438 22 DNA Zea mays Predicted polype (StG uence is or O Ogous o G390, G438 US 8,809,630 B2 145 146 TABLE 10-continued Honologous relationships found within the Sequence Listing DNA or Species from Which SEQID Protein Homologous NO: GID No. (PRT) Sequence is Derived Relationship O SEQ ID NO: to Other Genes

23 DNA Zea mays Predicted polypeptide seq uence is or O Ogous o G390, G438 24 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G409 25 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G409 26 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G409 27 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G409 28 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G409 29 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G409 30 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G409 31 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G409 32 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous O G409 33 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous O G409 34 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous O G409 35 DNA Zea mays Predicted polypeptide seq uence is or O Ogous O G409 36 DNA Zea mays Predicted polypeptide seq uence is or O Ogous O G409 37 DNA Zea mays Predicted polypeptide seq uence is or O Ogous O G409 38 DNA Zea mays Predicted polypeptide seq uence is or O ogous O G409 39 DNA Zea mays Predicted polypeptide seq uence is or O Ogous O G409 40 DNA Zea mays Predicted polypeptide seq uence is or O Ogous O G409 41 DNA Zea mays Predicted polypeptide seq uence is or O Ogous O G409 42 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G438 43 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous O G438 44 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous O G438 45 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous O G438 46 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous O G438 47 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous O G438 48 DNA Zea mays Predicted polypeptide seq uence is or O Ogous O G438 49 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous O G464 50 PRT Oryza sativa Orthologous to G464 51 DNA Zea mays Predicted polypeptide seq uence is or O Ogous O G464 52 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G470 53 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous O G470 S4 DNA Oryza sativa Predicted polypeptide seq uence is or O Ogous O G470 55 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G475 56 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G482 57 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G482 58 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G482 59 DNA Glycine max Predicted polypeptide seq uence is or O Ogous O G482