US007598429B2

(12) United States Patent
Heard et al.
(10) Patent No.: US 7,598,429 B2
(45) Date of Patent: *Oct. 6, 2009

(54) TRANSCRIPTION FACTOR SEQUENCES FOR CONFERRING ADVANTAGEOUS PROPERTIES TO PLANTS

(75) Inventors: Jacqueline E. Heard, Stonington, CT (US); Jose Luis Riechmann, Pasadena, CA (US); Oliver Ratcliffe, Oakland, CA (US); Omaira Pineda, Vero Beach, FL (US)

(73) Assignee: Mendel Biotechnology, Inc., Hayward, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 170 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 11/375,241

(22) Filed: Mar. 13, 2006

(65) Prior Publication Data
US 2006/0195944 A1 Aug. 31, 2006

Related U.S. Application Data

(63) Continuation-in-part of application No. 10/225,067, filed on Aug. 9, 2002, now Pat. No. 7,135,616, which is a continuation-in-part of application No. 09/837,944, filed on Apr. 18, 2001, now abandoned, and a continuation-in-part of application No. 10/171,468, filed on Jun. 14, 2002, now abandoned, application No. 11/375,241, filed on Mar. 13, 2006, which is a continuation-in-part of application No. 10/714,887, filed on Nov. 13, 2003, and a continuation-in-part of application No. 10/666,642, filed on Sep. 18, 2003, now Pat. No. 7,196,245.

(60) Provisional application No. 60/713,952, filed on Aug. 31, 2005, provisional application No. 60/336,049, filed on Nov. 19, 2001, provisional application No. 60/310,847, filed on Aug. 9, 2001, provisional application No. 60/338,692, filed on Dec. 11, 2001, provisional application No. 60/465,809, filed on Apr. 24, 2003, provisional application No. 60/434,166, filed on Dec. 17, 2002, provisional application No. 60/411,837, filed on Sep. 18, 2002.

(51) Int. Cl. C12N 5/82 (2006.01)

(52) U.S. Cl. ........ 800/290; 800/289

(58) Field of Classification Search ........ None
See application file for complete search history.

Primary Examiner: Cynthia Collins

(74) Attorney, Agent, or Firm: Jeffrey M. Libby

(57) ABSTRACT

The invention relates to plant transcription factor polypeptides, polynucleotides that encode them, homologs from a variety of plant species, and methods of using the polynucleotides and polypeptides to produce transgenic plants having advantageous properties compared to a reference or control plant, including increased plant size, seed size, increased leaf size, lignification, water deprivation tolerance, cold tolerance, or altered flowering time. Sequence information related to these polynucleotides and polypeptides can also be used in bioinformatic search methods and is also disclosed.

21 Claims, 9 Drawing Sheets


FIG. 1. Taxa labelled on the tree: Fagales, Cucurbitales, Rosales, Fabales, Oxalidales, Malpighiales, Sapindales, Malvales, Brassicales, Myrtales, Geraniales, Dipsacales, Asterales, Apiales, Aquifoliales, Solanales, Lamiales, Gentianales, Garryales, Ericales, Cornales, Saxifragales, Santalales, Caryophyllales, Proteales, Zingiberales, Ranunculales, Commelinales, Poales, Arecales, Pandanales, Dioscoreales, Asparagales, Alismatales, Acorales, Piperales, Magnoliales, Laurales, Ceratophyllales.

FIG. 3. Sequences labelled on the tree: G3652 (O. sativa), G3653 (O. sativa), G3655 (O. sativa), G3654 (O. sativa), G2576 (A. thaliana), G872 (A. thaliana), G3657 (O. sativa), G2115 (A. thaliana), G2294 (A. thaliana), G1090 (A. thaliana), CBF4 (A. thaliana), CBF2 (A. thaliana), CBF3 (A. thaliana), CBF1 (A. thaliana), G3644 (O. sativa), G3651 (O. sativa), G3650 (Z. mays), G3649 (O. sativa), G3643 (G. max), G3647 (Z. elegans), G2133 (A. thaliana), G3646 (B. oleracea), G3645 (B. rapa), G47 (A. thaliana), G3656 (Z. mays), G12 (A. thaliana), G24 (A. thaliana), G1277 (A. thaliana), G1379 (A. thaliana), G867 (A. thaliana).


TRANSCRIPTION FACTOR SEQUENCES FOR CONFERRING ADVANTAGEOUS PROPERTIES TO PLANTS

This application claims the benefit of U.S. Provisional Application No. 60/713,952, filed Aug. 31, 2005; and this application is a continuation-in-part of prior U.S. application Ser. No. 10/225,067, filed Aug. 9, 2002, which claims the benefit of U.S. Provisional Application No. 60/336,049, filed Nov. 19, 2001, U.S. Provisional Application No. 60/310,847, filed Aug. 9, 2001, and U.S. Provisional Application No. 60/338,692, filed Dec. 11, 2001; and, prior U.S. application Ser. No. 10/225,067, filed Aug. 9, 2002, is a continuation-in-part of U.S. Non-provisional application Ser. No. 09/837,944, filed Apr. 18, 2001 (now abandoned), and U.S. Non-provisional application Ser. No. 10/171,468, filed Jun. 14, 2002 (now abandoned); and, this application is a continuation-in-part of prior U.S. application Ser. No. 10/714,887, filed Nov. 13, 2003; and, this application is a continuation-in-part of prior U.S. application Ser. No. 10/666,642, filed Sep. 18, 2003 (pending), which claims the benefit of U.S. Provisional Application No. 60/465,809, filed Apr. 24, 2003, U.S. Provisional Application No. 60/434,166, filed Dec. 17, 2002, and U.S. Provisional Application No. 60/411,837, filed Sep. 18, 2002. Each of these applications is hereby incorporated by reference in its entirety.

The claimed invention, in the field of functional genomics and the characterization of plant genes for the improvement of plants, was made by or on behalf of Mendel Biotechnology, Inc. and Monsanto Company as a result of activities undertaken within the scope of a joint research agreement, in effect on or before the date the claimed invention was made.

FIELD OF THE INVENTION

This invention relates to the field of plant biology. More particularly, the present invention pertains to compositions and methods for phenotypically modifying a plant.

INTRODUCTION

Transgenic plants with improved traits, including enhanced yield, environmental stress tolerance, pest resistance, herbicide tolerance, improved seed compositions, and the like are desired by both farmers and consumers. Although considerable efforts in plant breeding have provided significant gains in desired traits, the ability to introduce specific DNA into plant genomes provides further opportunities for generation of plants with improved and/or unique traits.

Fortunately, a plant's traits, such as its biochemical, developmental, or phenotypic characteristics, may be controlled through a number of cellular processes. One important way to manipulate that control is through transcription factors, proteins that influence the expression of a particular gene or sets of genes. Transformed and transgenic plants that comprise cells having altered levels of at least one selected transcription factor, for example, possess advantageous or desirable traits. Strategies for manipulating traits by altering a plant cell's transcription factor content can therefore result in plants and crops with commercially valuable properties.

Polynucleotides encoding transcription factors have been identified, transformed into transgenic plants, and the plants have been analyzed for a variety of important improved traits. In so doing, important polynucleotide and polypeptide sequences for producing commercially valuable plants and crops as well as the methods for making and using them were identified. In some cases, because of epigenetic effects, positional effects, or the like, introducing recombinant DNA into a plant genome does not result in a transgenic plant having the desired phenotype with the enhanced agronomic trait. Therefore, methods to select individual transgenic events from a population may be required to identify those transgenic events that are characterized by the enhanced agronomic trait.

Other aspects and embodiments of the invention are described below and can be derived from the teachings of this disclosure as a whole.

BACKGROUND OF THE INVENTION

Transcription factors can modulate gene expression, either increasing or decreasing (inducing or repressing) the rate of transcription. This modulation results in differential levels of gene expression at various developmental stages, in different tissues and cell types, and in response to different exogenous (e.g., environmental) and endogenous stimuli throughout the life cycle of the organism.

Because transcription factors are key controlling elements of biological pathways, altering the expression levels of one or more transcription factors can change entire biological pathways in an organism. For example, manipulation of the levels of selected transcription factors may increase expression of economically useful proteins or metabolic chemicals in plants or improve other agriculturally relevant characteristics. Conversely, blocked or reduced expression of a transcription factor may reduce biosynthesis of unwanted compounds or remove an undesirable trait. Therefore, manipulating transcription factor levels in a plant offers tremendous potential in agricultural biotechnology for modifying a plant's traits.

The present invention provides novel transcription factors useful for modifying a plant's phenotype in desirable ways.

SUMMARY OF THE INVENTION

The present invention pertains to transgenic plants, and methods for producing the transgenic plants, that have desirable characteristics relative to wild-type or control plants. The desirable characteristics in the transgenic plants, which have been transformed with a sequence that is closely or phylogenetically related to G47, polynucleotide SEQ ID NO: 65 and polypeptide SEQ ID NO: 66, include increased size and/or biomass, tolerance to osmotic stress or drought, and/or increased lignification. The transgenic plants may also be delayed in their flowering, relative to a control or wild-type plant of the same species. The transgenic plants are made by first producing an expression vector that comprises a nucleotide sequence encoding a polypeptide with a conserved domain, said domain having at least 69%, or at least 73%, or at least 80%, or at least 87% amino acid identity to the conserved domain of G47 (amino acid coordinates 11-80 of G47 or SEQ ID NO: 66). The expression vector is next introduced into a suitable target plant, and the polypeptide is overexpressed in this now transgenic plant. This results in the transgenic plant having increased size and/or biomass, tolerance to the osmotic stress or drought, delayed flowering, and/or increased lignification.

Methods for increasing plant size and/or biomass, increasing osmotic stress or drought tolerance of a plant, increasing lignin content, or causing a delay in development or flowering are also encompassed by the invention.

BRIEF DESCRIPTION OF THE SEQUENCE LISTING, TABLES, AND FIGURES

The Sequence Listing provides exemplary polynucleotide and polypeptide sequences of the invention. The traits associated with the use of the sequences are included in the Examples.

This application contains a Sequence Listing. CD-ROMs Copy 1 and Copy 2, and the CRF copy of the Sequence Listing under CFR Section 1.821(e), are read-only memory computer-readable compact discs. Each contains a copy of the Sequence Listing in ASCII text format. The Sequence Listing is named "MBI-0036-3CIPST25.txt"; the electronic file of the Sequence Listing contained on each of these CD-ROMs was created on Mar. 13, 2006, and is 516 kilobytes in size. The copies of the Sequence Listing on the CD-ROM discs are hereby incorporated by reference in their entirety.

FIG. 1 shows a phylogenic tree of related plant families adapted from Daly et al. (2001) Plant Physiology 127: 1328-1333.

FIG. 2 shows a phylogenic dendrogram depicting phylogenetic relationships of higher plant taxa, including clades containing tomato and Arabidopsis; adapted from Ku et al. (2000) Proc. Natl. Acad. Sci. USA 97: 9121-9126; and Chase et al. (1993) Ann. Missouri Bot. Gard. 80: 528-580.

FIG. 3 shows a phylogenetic tree of G47 and related full-length proteins. The tree and the multiple sequence alignments were constructed using ClustalW (CLUSTAL W Multiple Sequence Alignment Program version 1.83, 2003) and MEGA2 (http://www.megasoftware.net) software. Sequences closely related to G47, SEQ ID NO: 66, fall within the G47 clade and descend from a common ancestral sequence represented by the arrow at an ancestral node of the tree. These phylogenetically-related sequences within the G47 clade that have thus far been shown to have a transcriptional regulatory activity of G47 by conferring similar morphological and physiological characteristics have conserved domains that are at least 69% identical to the conserved domain of G47 (amino acid coordinates 11-80). The percentage identity was determined by BLASTp analysis against a database containing G47 homologs, with default settings of a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff (1989) Proc. Natl. Acad. Sci. USA 89: 10915-10919). ClustalW multiple alignment parameters for FIG. 3 were as follows:

Gap Opening Penalty: 10.00; Gap Extension Penalty: 0.20; Delay divergent sequences: 30%; DNA Transitions Weight: 0.50; Protein weight matrix: Gonnet series; DNA weight matrix: IUB; Use negative matrix: OFF.

A FastA formatted alignment was then used to generate a phylogenetic tree in MEGA2 using the neighbor joining algorithm and a p-distance model. A test of phylogeny was done via bootstrap with 1000 replications and Random Seed set to default. Cut off values of the bootstrap tree were set to 50%. Orthologs of G47 are considered as being those proteins

two valines and a histidine residue at these positions, respectively. The SEQ ID NOs of the subsequences appear within the parentheses in this Figure.

FIG. 5 shows the conserved domain of G47 (SEQ ID NO: 66) aligned against the conserved domains of Arabidopsis paralog sequence G2133 (SEQ ID NO: 152; 62 of 71 or 87% of residues identical) and three orthologs, soy G3643 (SEQ ID NO: 158; 45 of 65 or 69% of residues identical), rice G3649 (SEQ ID NO: 154; 35 of 44 or 80% of residues identical) and rice G3644 (SEQ ID NO: 156; 35 of 48 or 73% of residues identical). Alignments and percentage identity were determined from BLASTp analysis in which the conserved domain of G47, amino acid coordinates 11-80, was queried against a database containing the G47 homologs, with default settings of a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff (1989) supra).

FIGS. 6A-6C show Arabidopsis G47, SEQ ID NO: 66 (FIG. 6A, plant at left), rice G3649, SEQ ID NO: 154 (FIG. 6B, plants at left and center), and soy G3643, SEQ ID NO: 158 (FIG. 6C, plants at left and center) overexpressors at 58, 44, and 33 days after planting, respectively. The overexpressors generally developed later, and some lines had larger rosettes and an increased amount of vegetative tissue compared to the control plants at the right of each photograph.

FIGS. 7A-7B compare seedlings ectopically expressing rice sequence G3644, SEQ ID NO: 156 (FIG. 7A) and wild-type seedling controls. The 35S::G3644 seedlings (FIG. 7A) were generally larger and greener after germination in 150 mM NaCl than the wild-type control seedlings exposed to the same conditions (FIG. 7B). The small pale seedlings in FIG. 7A represent wild-type segregants, based on kanamycin resistance segregation data from the same population.

As shown in FIGS. 8A-8B, seedlings ectopically expressing rice sequence G3649, SEQ ID NO: 154 (FIG. 8A) were generally larger and greener after germination in a medium containing 0.3 µM abscisic acid than the wild-type control seedlings exposed to the same conditions (FIG. 8B).

FIG. 9 illustrates a dramatic example of osmotic-stress tolerance. Seedlings overexpressing Arabidopsis G2133, SEQ ID NO: 152, in the pot at the left were significantly greener and more vigorous than the wild-type control seedlings, seen at right, after both sets of plants had been exposed to the same severe drought conditions and rewatered. The overexpressors readily recovered from the severe treatment after resumption of watering, whereas the few control plants at right that survived had been severely and adversely affected by the drought treatment.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

The present invention relates to polynucleotides and polypeptides for modifying phenotypes of plants, particularly those associated with increased biomass, increased disease resistance, and/or abiotic stress tolerance. Throughout this disclosure, various information sources are referred to and/or are specifically incorporated.
The information sources within the node of the tree below with a bootstrap value of 93, include Scientific journal articles, patent documents, text bounded by G3644 and G47, as indicated by the sequences books, and World Wide Web browser-inactive page within the box. addresses. While the reference to these information sources FIG. 4 shows a ClustalWalignment of the AP2 domains of clearly indicates that they can be used by one of skill in the art, the G47 clade and other representative AP2 proteins. The 65 each and every one of the information sources cited hereinare three residues indicated by the boxes define the G47 clade: specifically incorporated in their entirety, whether or not a clade members (indicated by the vertical line at left) have a specific mention of “incorporation by reference' is noted. US 7,598.429 B2 5 6 The contents and teachings of each and every one of the have an identifiable promoter). The function of a gene may information sources can be relied on and used to make and use also be regulated by enhancers, operators, and other regula embodiments of the invention. tory elements. As used herein and in the appended claims, the singular A “recombinant polynucleotide' is a polynucleotide that is forms “a”, “an', and “the include the plural reference unless 5 not in its native state, e.g., the polynucleotide comprises a the context clearly dictates otherwise. Thus, for example, a nucleotide sequence not found in nature, or the polynucle reference to “a host cell includes a plurality of such host otide is in a context other than that in which it is naturally cells, and a reference to “a stress' is a reference to one or more found, e.g., separated from nucleotide sequences with which stresses and equivalents thereof known to those skilled in the it typically is in proximity in nature, or adjacent (or contigu art, and so forth. 10 ous with) nucleotide sequences with which it typically is not in proximity. 
For example, the sequence at issue can be DEFINITIONS cloned into a vector, or otherwise recombined with one or more additional nucleic acid. “Nucleic acid molecule' refers to an oligonucleotide, poly An "isolated polynucleotide' is a polynucleotide, whether nucleotide or any fragment thereof. It may be DNA or RNA of 15 naturally occurring or recombinant, that is present outside the genomic or synthetic origin, double-stranded or single cell in which it is typically found in nature, whether purified Stranded, and combined with carbohydrate, lipids, protein, or or not. Optionally, an isolated polynucleotide is Subject to one other materials to perform a particular activity Such as trans or more enrichment or purification procedures, e.g., cell lysis, formation or form a useful composition Such as a peptide extraction, centrifugation, precipitation, or the like. nucleic acid (PNA). A "polypeptide' is an amino acid sequence comprising a "Polynucleotide' is a nucleic acid molecule comprising a plurality of consecutive polymerized amino acid residues plurality of polymerized nucleotides, e.g., at least about 15 e.g., at least about 15 consecutive polymerized amino acid consecutive polymerized nucleotides. A polynucleotide may residues. In many instances, a polypeptide comprises a poly be a nucleic acid, oligonucleotide, nucleotide, or any frag merized amino acid residue sequence that is a transcription ment thereof. In many instances, a polynucleotide comprises 25 factor or a domain or portion or fragment thereof. Addition a nucleotide sequence encoding a polypeptide (or protein) or ally, the polypeptide may comprise: (i) a localization domain; a domain or fragment thereof. Additionally, the polynucle (ii) an activation domain; (iii) a repression domain; (iv) an otide may comprise a promoter, an intron, an enhancer region, oligomerization domain; (v) a DNA-binding domain; or the a polyadenylation site, a translation initiation site, 5' or 3' like. 
The polypeptide optionally comprises modified amino untranslated regions, a reporter gene, a selectable marker, or 30 acid residues, naturally occurring amino acid residues not the like. The polynucleotide can be single-stranded or double encoded by a codon, non-naturally occurring amino acid resi stranded DNA or RNA. The polynucleotide optionally com dues. prises modified bases or a modified backbone. The polynucle “Protein’ refers to an amino acid sequence, oligopeptide, otide can be, e.g., genomic DNA or RNA, a transcript (such as peptide, polypeptide or portions thereof whether naturally an mRNA), a cDNA, a PCR product, a cloned DNA, a syn 35 occurring or synthetic. thetic DNA or RNA, or the like. The polynucleotide can be “Portion', as used herein, refers to any part of a protein combined with carbohydrate, lipids, protein, or other materi used for any purpose, but especially for the screening of a als to perform a particular activity Such as transformation or library of molecules which specifically bind to that portion or form a useful composition Such as a peptide nucleic acid for the production of antibodies. (PNA). The polynucleotide can comprise a sequence in either 40 A “recombinant polypeptide' is a polypeptide produced by sense orantisense orientations. “Oligonucleotide' is Substan translation of a recombinant polynucleotide. A “synthetic tially equivalent to the terms amplimer, primer, oligomer, polypeptide' is a polypeptide created by consecutive poly element, target, and probe and is preferably single-stranded. merization of isolated amino acid residues using methods "Gene' or “gene sequence” refers to the partial or complete well known in the art. An "isolated polypeptide, whether a coding sequence of a gene, its complement, and its 5' or 3' 45 naturally occurring or a recombinant polypeptide, is more untranslated regions. 
A gene is also a functional unit of inher enriched in (or out of) a cell than the polypeptide in its natural itance, and in physical terms is a particular segment or state in a wild-type cell, e.g., more than about 5% enriched, sequence of nucleotides along a molecule of DNA (or RNA, more than about 10% enriched, or more than about 20%, or in the case of RNA viruses) involved in producing a polypep more than about 50%, or more, enriched, i.e., alternatively tide chain. The latter may be subjected to Subsequent process 50 denoted: 105%, 110%, 120%, 150% or more, enriched rela ing such as chemical modification or folding to obtain a tive to wildtype standardized at 100%. Such an enrichment is functional protein or polypeptide. A gene may be isolated, not the result of a natural response of a wild-type plant. partially isolated, or found with an organism's genome. By Alternatively, or additionally, the isolated polypeptide is way of example, a transcription factor gene encodes a tran separated from other cellular components with which it is Scription factor polypeptide, which may be functional or 55 typically associated, e.g., by any of the various protein puri require processing to function as an initiator of transcription. fication methods herein. Operationally, genes may be defined by the cis-trans test, a “Homology” refers to sequence similarity between a ref genetic test that determines whether two mutations occur in erence sequence and at least a fragment of a newly sequenced the same gene and that may be used to determine the limits of clone insert or its encoded amino acid sequence. the genetically active unit (Rieger et al. (1976)). A gene 60 “Identity” or “similarity” refers to sequence similarity generally includes regions preceding (“leaders'; upstream) between two polynucleotide sequences or between two and following (“trailers'; downstream) the coding region. 
A polypeptide sequences, with identity being a more strict com gene may also include intervening, non-coding sequences, parison. The phrases “percent identity” and “96 identity” refer referred to as “introns, located between individual coding to the percentage of sequence similarity found in a compari segments, referred to as “exons'. Most genes have an associ 65 Son of two or more polynucleotide sequences or two or more ated promoter region, a regulatory sequence 5' of the tran polypeptide sequences. “Sequence similarity refers to the Scription initiation codon (there are some genes that do not percent similarity in base pair sequence (as determined by any US 7,598.429 B2 7 8 suitable method) between two or more polynucleotide acids of a particular transcription factor consensus sequence sequences. Two or more sequences can be anywhere from or consensus DNA-binding site. Furthermore, a particular 0-100% similar, or any integer value therebetween. Identity fragment, region, or domain of a polypeptide, or a polynucle or similarity can be determined by comparing a position in otide encoding a polypeptide, can be "outside a conserved each sequence that may be aligned for purposes of compari 5 domain if all the amino acids of the fragment, region, or son. When a position in the compared sequence is occupied domain fall outside of a defined conserved domain(s) for a by the same nucleotide base oramino acid, then the molecules polypeptide or protein. Sequences having lesser degrees of are identical at that position. A degree of similarity or identity identity but comparable biological activity are considered to between polynucleotide sequences is a function of the num be equivalents. ber of identical, matching or corresponding nucleotides at 10 As one of ordinary skill in the art recognizes, conserved positions shared by the polynucleotide sequences. 
A degree domains may be identified as regions or domains of identity to of identity of polypeptide sequences is a function of the a specific consensus sequence (see, for example, Riechmann number of identical amino acids at corresponding positions et al. (2000) Science 290: 2105-2110, Riechmann et al. shared by the polypeptide sequences. A degree of homology (2000b) Curr. Opin. Plant Biol. 3: 423-434). Thus, by using or similarity of polypeptide sequences is a function of the 15 alignment methods well known in the art, the conserved number of amino acids at corresponding positions shared by domains of the plant transcription factors, for example, for the the polypeptide sequences. AT-hook proteins (Reeves and Beckerbauer (2001) Biochim. Alignment” refers to a number of nucleotide bases or Biophys. Acta 1519: 13-29; and Reeves (2001) Gene 277: amino acid residue sequences aligned by lengthwise com 63-81), may be determined. parison so that components in common (i.e., nucleotide bases The conserved domains for many of the transcription factor or amino acid residues at corresponding positions) may be sequences of the invention are listed in Table 4. A comparison visually and readily identified. The fraction or percentage of of the regions of these polypeptides allows one of skill in the components in common is related to the homology or identity art (see, for example, Reeves and Nissen (1995) Prog. Cell between the sequences. Alignments such as those of FIG. 4 or Cycle Res. 1: 339-349) to identify domains or conserved FIG.5 may be used to identify conserved domains and relat 25 domains for any of the polypeptides listed or referred to in this edness within these domains. An alignment may suitably be disclosure. determined by means of computer programs known in the art, “Complementary” refers to the natural hydrogen bonding such as MACVECTOR software (1999) (Accelrys, Inc., San by base pairing between purines and pyrimidines. For Diego, Calif.). 
example, the sequence A-C-G-T (5'->3') forms hydrogen A “conserved domain or “conserved region' as used 30 bonds with its complements A-C-G-T (5'->3') or A-C-G-U herein refers to a region in heterologous polynucleotide or (5'->3'). Two single-stranded molecules may be considered polypeptide sequences where there is a relatively high degree partially complementary, if only some of the nucleotides of sequence identity between the distinct sequences. For bond, or “completely complementary’ if all of the nucle example, an AT-hook” domain'. Such as is found in a otides bond. The degree of complementarity between nucleic polypeptide member of AT-hook transcription factor family, 35 acid strands affects the efficiency and strength of hybridiza is an example of a conserved domain. An AP2’ domain’. tion and amplification reactions. “Fully complementary Such as is found in a polypeptide member of AP2 transcription refers to the case where bonding occurs between every base factor family, is another example of a conserved domain. With pair and its complement in a pair of sequences, and the two respect to polynucleotides encoding presently disclosed tran sequences have the same number of nucleotides. Scription factors, a conserved domain is preferably at least 40 The terms “highly stringent' or “highly stringent condi nine base pairs (bp) in length. A conserved domain (for tion” refer to conditions that permit hybridization of DNA example, a DNA binding domain) with respect to presently Strands whose sequences are highly complementary, wherein disclosed polypeptides refers to a domain that exhibits at least these same conditions exclude hybridization of significantly about 38% sequence identity, or at least about 55% sequence mismatched DNAs. 
Polynucleotide sequences capable of identity, or at least about 62% sequence identity, or at least 45 hybridizing under stringent conditions with the polynucle about 69%, or at least about 70%, or at least about 73%, or at otides of the present invention may be, for example, variants least about 76%, or at least about 78%, or at least about 80%, of the disclosed polynucleotide sequences, including allelic or at least about 82%, or at least about 85%, or at least about or splice variants, or sequences that encode orthologs or para 87%, or at least about 89%, or at least about 95%, amino acid logs of presently disclosed polypeptides. Nucleic acid residue sequence identity, to a conserved domain of a 50 hybridization methods are disclosed in detail by Kashima et polypeptide of the invention. Sequences that possess or al. (1985) Nature 313: 402-404, Sambrook et al. (1989) encode for conserved domains that meet these criteria of Molecular Cloning A Laboratory Manual (2nd Ed.), Vol. percentage identity, and may have comparable biological 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, activity to the present transcription factor sequences. This is N.Y., 1989 (“Sambrook”), and by Haymes et al. (1985) particularly true for sequences that derive from a common 55 Nucleic Acid Hybridization: A Practical Approach, IRL ancestral sequence that had the same or similar function, and Press, Washington, D.C., which references are incorporated for which the function has been retained. These sequences, herein by reference. which are closely and phylogenetically related, being mem In general, stringency is determined by the temperature, bers of a particular clade of transcription factor polypeptides, ionic strength, and concentration of denaturing agents (e.g., are encompassed by the invention. 
A fragment or domain can 60 formamide) used in a hybridization and washing procedure be referred to as outside a conserved domain, outside a con (for a more detailed description of establishing and determin sensus sequence, or outside a consensus DNA-binding site ing stringency, see the section “Identifying Polynucleotides that is known to exist or that exists for a particular transcrip or Nucleic Acids by Hybridization', below). The degree to tion factor class, family, or Sub-family. In this case, the frag which two nucleic acids hybridize under various conditions ment or domain will not include the exact amino acids of a 65 of stringency is correlated with the extent of their similarity. consensus sequence or consensus DNA-binding site of a tran Thus, similar nucleic acid sequences from a variety of Scription factor class, family or Sub-family, or the exactamino Sources, such as within a plant's genome (as in the case of US 7,598.429 B2 10 paralogs) or from another plant (as in the case of orthologs) nucleotide encoding polypeptide, and improper or that may perform similar functions can be isolated on the unexpected hybridization to allelic variants, with a locus basis of their ability to hybridize with known transcription other than the normal chromosomal locus for the polynucle factor sequences. Numerous variations are possible in the otide sequence encoding polypeptide. conditions and means by which nucleic acid hybridization “Allelic variant' or “polynucleotide allelic variant” refers can be performed to isolate transcription factor sequences to any of two or more alternative forms of a gene occupying having similarity to transcription factor sequences known in the same chromosomal locus. Allelic variation arises natu the art and are not limited to those explicitly disclosed herein. rally through mutation, and may result in phenotypic poly Such an approach may be used to isolate polynucleotide morphism within populations. 
Gene mutations may be sequences having various degrees of similarity with disclosed 10 'silent” or may encode polypeptides having altered amino transcription factor sequences, such as, for example, encoded acid sequence. “Allelic variant' and “polypeptide allelic vari transcription factors having 38% or greater identity with the ant’ may also be used with respect to polypeptides, and in this conserved domain of disclosed transcription factors. case the terms refer to a polypeptide encoded by an allelic The terms “paralog and "ortholog” are defined below in variant of a gene. the section entitled “Orthologs and Paralogs”. In brief, 15 “Splice variant' or “polynucleotide splice variant” as used orthologs and paralogs are evolutionarily related genes that herein refers to alternative forms of RNA transcribed from a have similar sequences and functions. Orthologs are structur gene. Splice variation naturally occurs as a result of alterna ally related genes in different species that are derived by a tive sites being spliced within a single transcribed RNA mol speciation event. Paralogs are structurally related genes ecule or between separately transcribed RNA molecules, and within a single species that are derived by a duplication event. may result in several different forms of mRNA transcribed The term “equivalog describes members of a set of from the same gene. Thus, splice variants may encode homologous proteins that are conserved with respect to func polypeptides having different amino acid sequences, which tion since their last common ancestor. Related proteins are may or may not have similar functions in the organism. grouped into equivalog families, and otherwise into protein “Splice variant' or “polypeptide splice variant may also families with other hierarchically defined homology types. 25 refer to a polypeptide encoded by a splice variant of a tran This definition is provided at the Institute for Genomic scribed mRNA. 
Research (TIGR) WorldWideWeb (www) website, “tigr.org As used herein, “polynucleotide variants' may also refer to under the heading “Terms associated with TIGRFAMs”. polynucleotide sequences that encode paralogs and orthologs In general, the term “variant” refers to molecules with of the presently disclosed polypeptide sequences. “Polypep Some differences, generated synthetically or naturally, in 30 tide variants' may refer to polypeptide sequences that are their base oramino acid sequences as compared to a reference paralogs and orthologs of the presently disclosed polypeptide (native) polynucleotide or polypeptide, respectively. These sequences. differences include Substitutions, insertions, deletions or any Differences between presently disclosed polypeptides and desired combinations of such changes in a native polynucle polypeptide variants are limited so that the sequences of the otide of amino acid sequence. 35 former and the latter are closely similar overall and, in many With regard to polynucleotide variants, differences regions, identical. Presently disclosed polypeptide sequences between presently disclosed polynucleotides and polynucle and similar polypeptide variants may differ in amino acid otide variants are limited so that the nucleotide sequences of sequence by one or more substitutions, additions, deletions, the former and the latter are closely similar overall and, in fusions and truncations, which may be present in any combi many regions, identical. Due to the degeneracy of the genetic 40 nation. These differences may produce silent changes and code, differences between the former and latter nucleotide result in a functionally equivalent transcription factor. 
Thus, it sequences may be silent (i.e., the amino acids encoded by the will be readily appreciated by those of skill in the art, that any polynucleotide are the same, and the variant polynucleotide of a variety of polynucleotide sequences is capable of encod sequence encodes the same amino acid sequence as the pres ing the transcription factors and transcription factor homolog ently disclosed polynucleotide. Variant nucleotide sequences 45 polypeptides of the invention. A polypeptide sequence variant may encode different amino acid sequences, in which case may have "conservative changes, wherein a Substituted such nucleotide differences will result in amino acid substi amino acid has similar structural or chemical properties. tutions, additions, deletions, insertions, truncations or fusions Deliberate amino acid substitutions may thus be made on the with respect to the similar disclosed polynucleotide basis of similarity in polarity, charge, solubility, hydropho sequences. These variations may result in polynucleotide 50 bicity, hydrophilicity, and/or the amphipathic nature of the variants encoding polypeptides that share at least one func residues, as long as a significant amount of the functional or tional characteristic. The degeneracy of the genetic code also biological activity of the transcription factor is retained. For dictates that many different variant polynucleotides can example, negatively charged amino acids may include aspar encode identical and/or Substantially similar polypeptides in tic acid and glutamic acid, positively charged amino acids addition to those sequences illustrated in the Sequence List 55 may include lysine and arginine, and amino acids with ing. 
uncharged polar head groups having similar hydrophilicity Also within the scope of the invention is a variant of a values may include leucine, isoleucine, and valine; glycine transcription factor nucleic acid listed in the Sequence List and alanine; asparagine and glutamine; serine and threonine; ing, that is, one having a sequence that differs from the one of and phenylalanine and tyrosine. More rarely, a variant may the polynucleotide sequences in the Sequence Listing, or a 60 have “non-conservative changes, e.g., replacement of a gly complementary sequence, that encodes a functionally equiva cine with a tryptophan. Similar minor variations may also lent polypeptide (i.e., a polypeptide having some degree of include amino acid deletions or insertions, or both. Related equivalent or similar biological activity) but differs in polypeptides may comprise, for example, additions and/or sequence from the sequence in the Sequence Listing, due to deletions of one or more N-linked or O-linked glycosylation degeneracy in the genetic code. Included within this defini 65 sites, or an addition and/or a deletion of one or more cysteine tion are polymorphisms that may or may not be readily detect residues. Guidance in determining which and how many able using a particular oligonucleotide probe of the poly amino acid residues may be substituted, inserted or deleted US 7,598.429 B2 11 12 without abolishing functional or biological activity may be phytes, bryophytes, and multicellular algae (see for example, found using computer programs well known in the art, for FIG. 1, adapted from Daly et al. (2001) supra, FIG. 2, adapted example, DNASTAR software (see U.S. Pat. No. 5,840,544). from Ku et al. (2000) supra; and see also Tudge (2000) in The “Fragment, with respect to a polynucleotide, refers to a Variety of Life, Oxford University Press, New York, N.Y. pp. clone or any part of a polynucleotide molecule that retains a 547-606. usable, functional characteristic. 
Useful fragments include A “control plant’ as used in the present invention refers to oligonucleotides and polynucleotides that may be used in a plant cell, seed, plant component, plant tissue, plant organ or hybridization or amplification technologies or in the regula whole plant used to compare against transgenic or genetically tion of replication, transcription or translation. A polynucle modified plant for the purpose of identifying an enhanced otide fragment” refers to any Subsequence of a polynucle 10 phenotype in the transgenic or genetically modified plant. A otide, typically, of at least about 9 consecutive nucleotides, control plant may in Some cases be a transgenic plant line that preferably at least about 30 nucleotides, more preferably at comprises an empty vector or marker gene, but does not least about 50 nucleotides, of any of the sequences provided contain the recombinant polynucleotide of the present inven herein. Exemplary polynucleotide fragments are the first tion that is expressed in the transgenic or genetically modified sixty consecutive nucleotides of the transcription factor poly 15 plant being evaluated. In general, a control plant is a plant of nucleotides listed in the Sequence Listing. Exemplary frag the same line or variety as the transgenic or genetically modi ments also include fragments that comprise a region that fied plant being tested. A suitable control plant would include encodes an conserved domain of a transcription factor. Exem a genetically unaltered or non-transgenic plant of the parental plary fragments also include fragments that comprise a con line used to generate a transgenic plant herein. served domain of a transcription factor. Exemplary fragments A “transgenic plant” refers to a plant that contains genetic include fragments that comprise an conserved domain of a material not found in a wild-type plant of the same species, transcription factor, for example, amino acid residues 11-80 variety or cultivar. 
of G47 (SEQ ID NO: 66).

Fragments may also include subsequences of polypeptides and protein molecules, or a subsequence of the polypeptide. Fragments may have uses in that they may have antigenic potential. In some cases, the fragment or domain is a subsequence of the polypeptide which performs at least one biological function of the intact polypeptide in substantially the same manner, or to a similar extent, as does the intact polypeptide. For example, a polypeptide fragment can comprise a recognizable structural motif or functional domain such as a DNA-binding site or domain that binds to a DNA promoter region, an activation domain, or a domain for protein-protein interactions, and may initiate transcription. Fragments can vary in size from as few as 3 amino acid residues to the full length of the intact polypeptide, but are preferably at least about 30 amino acid residues in length and more preferably at least about 60 amino acid residues in length.

The invention also encompasses production of DNA sequences that encode transcription factors and transcription factor derivatives, or fragments thereof, entirely by synthetic chemistry. After production, the synthetic sequence may be inserted into any of the many available expression vectors and cell systems using reagents well known in the art. Moreover, synthetic chemistry may be used to introduce mutations into a sequence encoding transcription factors or any fragment thereof.

“Derivative” refers to the chemical modification of a nucleic acid molecule or amino acid sequence. Chemical modifications can include replacement of hydrogen by an alkyl, acyl, or amino group or glycosylation, pegylation, or any similar process that retains or enhances biological activity or lifespan of the molecule or sequence.

The term “plant” includes whole plants, shoot vegetative organs/structures (for example, leaves, stems and tubers), roots, flowers and floral organs/structures (for example, bracts, sepals, petals, stamens, carpels, anthers and ovules), seed (including embryo, endosperm, and seed coat) and fruit (the mature ovary), plant tissue (for example, vascular tissue, ground tissue, and the like) and cells (for example, guard cells, egg cells, and the like), and progeny of same. The class of plants that can be used in the method of the invention is generally as broad as the class of higher and lower plants amenable to transformation techniques, including angiosperms (monocotyledonous and dicotyledonous plants), gymnosperms, ferns, horsetails, psilophytes, lyco

The genetic material may include a transgene, an insertional mutagenesis event (such as by transposon or T-DNA insertional mutagenesis), an activation tagging sequence, a mutated sequence, a homologous recombination event or a sequence modified by chimeraplasty. Typically, the foreign genetic material has been introduced into the plant by human manipulation, but any method can be used as one of skill in the art recognizes.

A transgenic plant may contain an expression vector or cassette. The expression cassette typically comprises a polypeptide-encoding sequence operably linked (i.e., under regulatory control of) to appropriate inducible or constitutive regulatory sequences that allow for the controlled expression of polypeptide. The expression cassette can be introduced into a plant by transformation or by breeding after transformation of a parent plant. A plant refers to a whole plant as well as to a plant part, such as seed, fruit, leaf, or root, plant tissue, plant cells or any other plant material, e.g., a plant explant, as well as to progeny thereof, and to in vitro systems that mimic biochemical or cellular components or processes in a cell.

“Wildtype” or “wild-type”, as used herein, refers to a plant cell, seed, plant component, plant tissue, plant organ or whole plant that has not been genetically modified or treated in an experimental sense. Wild-type cells, seed, components, tissue, organs or whole plants may be used as controls to compare levels of expression and the extent and nature of trait modification with cells, tissue or plants of the same species in which a transcription factor expression is altered, e.g., in that it has been knocked out, overexpressed, or ectopically expressed.

A “trait” refers to a physiological, morphological, biochemical, or physical characteristic of a plant or particular plant material or cell. In some instances, this characteristic is visible to the human eye, such as seed or plant size, or can be measured by biochemical techniques, such as detecting the protein, starch, or oil content of seed or leaves, or by observation of a metabolic or physiological process, e.g. by measuring tolerance to water deprivation or particular salt or sugar concentrations, or by the observation of the expression level of a gene or genes, e.g., by employing Northern analysis, RT-PCR, microarray gene expression assays, or reporter gene expression systems, or by agricultural observations such as hyperosmotic stress tolerance or yield. Any technique can be used to measure the amount of, comparative level of, or difference in any selected chemical compound or macromolecule in the transgenic plants, however.

As used herein an “enhanced trait” means a characteristic of a transgenic plant that includes, but is not limited to, an enhanced agronomic trait characterized by enhanced plant morphology, physiology, growth and development, yield, nutritional enhancement, disease or pest resistance, or environmental or chemical tolerance. In more specific aspects of this invention, the enhanced trait is selected from the group of enhanced traits consisting of enhanced water use efficiency, enhanced cold tolerance, increased yield, enhanced nitrogen use efficiency, enhanced seed protein and enhanced seed oil. In an important aspect of the invention the enhanced trait is enhanced yield including increased yield under non-stress conditions and increased yield under environmental stress conditions. Stress conditions may include, for example, drought, shade, fungal disease, viral disease, bacterial disease, insect infestation, nematode infestation, cold temperature exposure, heat exposure, osmotic stress, reduced nitrogen nutrient availability, reduced phosphorus nutrient availability and high plant density. “Yield” can be affected by many properties including without limitation, plant height, pod number, pod position on the plant, number of internodes, incidence of pod shatter, grain size, efficiency of nodulation and nitrogen fixation, efficiency of nutrient assimilation, resistance to biotic and abiotic stress, carbon assimilation, plant architecture, resistance to lodging, percent seed germination, seedling vigor, and juvenile traits. Yield can also be affected by efficiency of germination (including germination in stressed conditions), growth rate (including growth rate in stressed conditions), ear number, seed number per ear, seed size, composition of seed (starch, oil, protein) and characteristics of seed fill.

Increased yield of a transgenic plant of the present invention can be measured in a number of ways, including plant volume, plant biomass, test weight, seed number per plant, seed weight, seed number per unit area (i.e., seeds, or weight of seeds, per acre), bushels per acre (bu/a), tonnes per acre, tons per acre, and/or kilograms per hectare. For example, maize yield may be measured as production of shelled corn kernels per unit of production area, for example in bushels per acre or metric tons per hectare, often reported on a moisture adjusted basis, for example at 15.5 percent moisture. Increased yield may result from improved utilization of key biochemical compounds, such as nitrogen, phosphorus and carbohydrate, or from improved responses to environmental stresses, such as cold, heat, drought, salt, and attack by pests or pathogens. Recombinant DNA used in this invention can also be used to provide plants having improved growth and development, and ultimately increased yield, as the result of modified expression of plant growth regulators or modification of cell cycle or photosynthesis pathways. Also of interest is the generation of transgenic plants that demonstrate enhanced yield with respect to a seed component that may or may not correspond to an increase in overall plant yield. Such properties include enhancements in seed oil, seed molecules such as tocopherol, protein and starch, or particular oil components as may be manifest by an alteration in the ratios of seed components.

“Trait modification” refers to a detectable difference in a characteristic in a plant ectopically expressing a polynucleotide or polypeptide of the present invention relative to a plant not doing so, such as a wild-type plant. In some cases, the trait modification can be evaluated quantitatively. For example, the trait modification can entail at least about a 2% increase or decrease, or an even greater difference, in an observed trait as compared with a control or wild-type plant. It is known that there can be a natural variation in the modified trait. Therefore, the trait modification observed entails a change of the normal distribution and magnitude of the trait in the plants as compared to control or wild-type plants.

When two or more plants have “similar morphologies”, “substantially similar morphologies”, “a morphology that is substantially similar”, or are “morphologically similar”, the plants have comparable forms or appearances, including analogous features such as overall dimensions, height, width, mass, root mass, shape, glossiness, color, stem diameter, leaf size, leaf dimension, leaf density, internode distance, branching, root branching, number and form of inflorescences, and other macroscopic characteristics, and the individual plants are not readily distinguishable based on morphological characteristics alone.

“Modulates” refers to a change in activity (biological, chemical, or immunological) or lifespan resulting from specific binding between a molecule and either a nucleic acid molecule or a protein.

The term “transcript profile” refers to the expression levels of a set of genes in a cell in a particular state, particularly by comparison with the expression levels of that same set of genes in a cell of the same type in a reference state. For example, the transcript profile of a particular transcription factor in a suspension cell is the expression levels of a set of genes in a cell knocking out or overexpressing that transcription factor compared with the expression levels of that same set of genes in a suspension cell that has normal levels of that transcription factor. The transcript profile can be presented as a list of those genes whose expression level is significantly different between the two treatments, and the difference ratios. Differences and similarities between expression levels may also be evaluated and calculated using statistical and clustering methods.

With regard to transcription factor gene knockouts as used herein, the term “knockout” refers to a plant or plant cell having a disruption in at least one transcription factor gene in the plant or cell, where the disruption results in a reduced expression or activity of the transcription factor encoded by that gene compared to a control cell. The knockout can be the result of, for example, genomic disruptions, including transposons, TILLING, and homologous recombination, antisense constructs, sense constructs, RNA silencing constructs, or RNA interference. A T-DNA insertion within a transcription factor gene is an example of a genotypic alteration that may abolish expression of that transcription factor gene.

“Ectopic expression or altered expression” in reference to a polynucleotide indicates that the pattern of expression in, e.g., a transgenic plant or plant tissue, is different from the expression pattern in a wild-type plant or a reference plant of the same species. The pattern of expression may also be compared with a reference expression pattern in a wild-type plant of the same species. For example, the polynucleotide or polypeptide is expressed in a cell or tissue type other than a cell or tissue type in which the sequence is expressed in the wild-type plant, or by expression at a time other than at the time the sequence is expressed in the wild-type plant, or by a response to different inducible agents, such as hormones or environmental signals, or at different expression levels (either higher or lower) compared with those found in a wild-type plant. The term also refers to altered expression patterns that are produced by lowering the levels of expression to below the detection level or completely abolishing expression. The resulting expression pattern can be transient or stable, constitutive or inducible. In reference to a polypeptide, the term “ectopic expression or altered expression” further may relate to altered activity levels resulting from the interactions of the polypeptides with exogenous or endogenous modulators or from interactions with factors or as a result of the chemical modification of the polypeptides.

The term “overexpression” as used herein refers to a greater expression level of a gene in a plant, plant cell or plant tissue, compared to expression in a wild-type plant, cell or tissue, at any developmental or temporal stage for the gene. Overexpression can occur when, for example, the genes encoding one or more transcription factors are under the control of a strong promoter (e.g., the cauliflower mosaic virus 35S transcription initiation region). Overexpression may also be under the control of an inducible or tissue specific promoter. Thus, overexpression may occur throughout a plant, in specific tissues of the plant, or in the presence or absence of particular environmental signals, depending on the promoter used.

Overexpression may take place in plant cells normally lacking expression of polypeptides functionally equivalent or identical to the present transcription factors. Overexpression may also occur in plant cells where endogenous expression of the present transcription factors or functionally equivalent molecules normally occurs, but such normal expression is at a lower level. Overexpression thus results in a greater than normal production, or “overproduction” of the transcription factor in the plant, cell or tissue.

The term “transcription regulating region” refers to a DNA regulatory sequence that regulates expression of one or more genes in a plant when a transcription factor having one or more specific binding domains binds to the DNA regulatory sequence. Transcription factors of the present invention possess a conserved domain. The transcription factors of the invention also comprise an amino acid subsequence that forms a transcription activation domain that regulates expression of one or more abiotic stress tolerance genes in a plant when the transcription factor binds to the regulating region.

Traits which May be Modified

Trait modifications of particular interest include those to seed (such as embryo or endosperm), fruit, root, flower, leaf, stem, shoot, seedling or the like, including: enhanced tolerance to environmental conditions including freezing, chilling, heat, drought, water saturation, radiation and ozone; improved tolerance to microbial, fungal or viral diseases; improved tolerance to pest infestations, including nematodes, mollicutes, parasitic higher plants or the like; decreased herbicide sensitivity; improved tolerance of heavy metals or enhanced ability to take up heavy metals; improved growth under poor photoconditions (e.g., low light and/or short day length), or changes in expression levels of genes of interest. Other phenotypes that can be modified relate to the production of plant metabolites, such as variations in the production of taxol, tocopherol, tocotrienol, sterols, phytosterols, vitamins, wax monomers, anti-oxidants, amino acids, lignins, cellulose, tannins, prenyllipids (such as chlorophylls and carotenoids), glucosinolates, and terpenoids, enhanced or compositionally altered protein or oil production (especially in seeds), or modified sugar (insoluble or soluble) and/or starch composition. Physical plant characteristics that can be modified include cell development (such as the number of trichomes), fruit and seed size and number, yields of plant parts such as stems, leaves, inflorescences, and roots, the stability of the seeds during storage, characteristics of the seed pod (e.g., susceptibility to shattering), root hair length and quantity, internode distances, or the quality of seed coat. Plant growth characteristics that can be modified include growth rate, germination rate of seeds, vigor of plants and seedlings, leaf and flower senescence, male sterility, apomixis, flowering time, flower abscission, rate of nitrogen uptake, osmotic sensitivity to soluble sugar concentrations, biomass or transpiration characteristics, as well as plant architecture characteristics such as apical dominance, branching patterns, number of organs, organ identity, organ shape or size.

Transcription Factors Modify Expression of Endogenous Genes

Expression of genes which encode transcription factors that modify expression of endogenous genes, polynucleotides, and proteins is well known in the art. In addition, transgenic plants comprising isolated polynucleotides encoding transcription factors may also modify expression of endogenous genes, polynucleotides, and proteins. Examples include Peng et al. ((1997) Genes Develop. 11: 3194-3205) and Peng et al. ((1999) Nature 400: 256-261). In addition, many others have demonstrated that an Arabidopsis transcription factor expressed in an exogenous plant species elicits the same or very similar phenotypic response. See, for example, Fu et al. ((2001) Plant Cell 13: 1791-1802); Nandi et al. ((2000) Curr. Biol. 10: 215-218); Coupland ((1995) Nature 377: 482-483); and Weigel and Nilsson ((1995) Nature 377: 482-500).

In another example, Mandel et al. ((1992) Cell 71: 133-143) and Suzuki et al. ((2001) Plant J. 28: 409-418) teach that a transcription factor expressed in another plant species elicits the same or very similar phenotypic response of the endogenous sequence, as often predicted in earlier studies of Arabidopsis transcription factors in Arabidopsis (see Mandel et al. (1992) supra; Suzuki et al. (2001) supra).

Other examples include Müller et al. ((2001) Plant J. 28: 169-179); Kim et al. ((2001) Plant J. 25: 247-259); Kyozuka and Shimamoto ((2002) Plant Cell Physiol. 43: 130-135); Boss and Thomas ((2002) Nature 416: 847-850); He et al. ((2000) Transgenic Res. 9: 223-227); and Robson et al. ((2001) Plant J. 28: 619-631).

In yet another example, Gilmour et al. ((1998) Plant J. 16: 433-442) teach an Arabidopsis AP2 transcription factor, CBF1, which, when overexpressed in transgenic plants, increases plant freezing tolerance. Jaglo et al. ((2001) Plant Physiol. 127: 910-917) further identified sequences in Brassica napus which encode CBF-like genes and that transcripts for these genes accumulated rapidly in response to low temperature. Transcripts encoding CBF-like proteins were also found to accumulate rapidly in response to low temperature in wheat, as well as in tomato. An alignment of the CBF proteins from Arabidopsis, B. napus, wheat, rye, and tomato revealed the presence of conserved amino acid sequences, PKK/RPAGRxKFxETRHP and DSAWR, which bracket the AP2/EREBP DNA binding domains of the proteins and distinguish them from other members of the AP2/EREBP protein family. (See Jaglo et al. (2001) supra.)

Polypeptides and Polynucleotides of the Invention

The present invention provides, among other things, transcription factors (TFs), and transcription factor homologue polypeptides, and isolated or recombinant polynucleotides encoding the polypeptides, or novel variant polypeptides or polynucleotides encoding novel variants of transcription factors derived from the specific sequences provided here. These polypeptides and polynucleotides may be employed to modify a plant's characteristic.

Exemplary polynucleotides encoding the polypeptides of the invention were identified in the Arabidopsis thaliana GenBank database using publicly available sequence analysis programs and parameters. Sequences initially identified were then further characterized to identify sequences comprising specified sequence strings corresponding to sequence motifs present in families of known transcription factors. In addition, further exemplary polynucleotides encoding the polypeptides of the invention were identified in the plant GenBank database using publicly available sequence analysis programs and parameters. Sequences initially identified were then further characterized to identify sequences comprising specified sequence strings corresponding to sequence motifs present in families of known transcription factors. Polynucleotide sequences meeting such criteria were confirmed as transcription factors.

Additional polynucleotides of the invention were identified by screening Arabidopsis thaliana and/or other plant cDNA libraries with probes corresponding to known transcription factors under low stringency hybridization conditions. Additional sequences, including full length coding sequences, were subsequently recovered by the rapid amplification of cDNA ends (RACE) procedure, using a commercially available kit according to the manufacturer's instructions. Where necessary, multiple rounds of RACE are performed to isolate 5' and 3' ends. The full length cDNA was then recovered by a routine end-to-end polymerase chain reaction (PCR) using primers specific to the isolated 5' and 3' ends. Exemplary sequences are provided in the Sequence Listing.

The polynucleotides of the invention can be or were ectopically expressed in overexpressor or knockout plants and the changes in the characteristic(s) or trait(s) of the plants observed. Therefore, the polynucleotides and polypeptides can be employed to improve the characteristics of plants.

The polynucleotides of the invention can be or were ectopically expressed in overexpressor plant cells and the changes in the expression levels of a number of genes, polynucleotides, and/or proteins of the plant cells observed. Therefore, the polynucleotides and polypeptides can be employed to change expression levels of genes, polynucleotides, and/or proteins of plants.

The polynucleotide sequences of the invention encode polypeptides that are members of well-known transcription factor families, including plant transcription factor families, as disclosed in Table 4. Generally, the transcription factors encoded by the present sequences are involved in cell differentiation and proliferation and the regulation of growth. Accordingly, one skilled in the art would recognize that by expressing the present sequences in a plant, one may change the expression of autologous genes or induce the expression of introduced genes. By affecting the expression of similar autologous sequences in a plant that have the biological activity of the present sequences, or by introducing the present sequences into a plant, one may alter a plant's phenotype to one with improved traits. The sequences of the invention may also be used to transform a plant and introduce desirable traits not found in the wild-type cultivar or strain. Plants may then be selected for those that produce the most desirable degree of over- or under-expression of target genes of interest and coincident trait improvement.

The sequences of the present invention may be from any species, particularly plant species, in a naturally occurring form or from any source whether natural, synthetic, semi-synthetic or recombinant. The sequences of the invention may also include fragments of the present amino acid sequences. In this context, a “fragment” refers to a fragment of a polypeptide sequence which is at least 5 to about 15 amino acids in length, most preferably at least 14 amino acids, and which retains some biological activity of a transcription factor. Where “amino acid sequence” is recited to refer to an amino acid sequence of a naturally occurring protein molecule, “amino acid sequence” and like terms are not meant to limit the amino acid sequence to the complete native amino acid sequence associated with the recited protein molecule.

As one of ordinary skill in the art recognizes, transcription factors can be identified by the presence of a region or domain of structural similarity or identity to a specific consensus sequence or the presence of a specific consensus DNA-binding site or DNA-binding site motif (see, for example, Riechmann et al. (2000a) supra). The plant transcription factors may belong to one of the following transcription factor families: the AP2 (APETALA2) domain transcription factor family (Riechmann and Meyerowitz (1998) Biol. Chem. 379: 633-646); the MYB transcription factor family (ENBib; Martin and Paz-Ares (1997) Trends Genet. 13: 67-73); the MADS domain transcription factor family (Riechmann and Meyerowitz (1997) Biol. Chem. 378: 1079-1101); the WRKY protein family (Ishiguro and Nakamura (1994) Mol. Gen. Genet. 244: 563-571); the ankyrin-repeat protein family (Zhang et al. (1992) Plant Cell 4: 1575-1588); the zinc finger protein (Z) family (Klug and Schwabe (1995) FASEB J. 9: 597-604; Takatsuji (1998) Cell. Mol. Life Sci. 54: 582-596); the homeobox (HB) protein family (Buerglin (1994) in Guidebook to the Homeobox Genes, Duboule (ed.) Oxford University Press); the CAAT-element binding proteins (Forsburg and Guarente (1989) Genes Dev. 3: 1166-1178); the squamosa promoter binding proteins (SPB) (Klein et al. (1996) Mol. Gen. Genet. 1996 250: 7-16); the NAM protein family (Souer et al. (1996) Cell 85: 159-170); the IAA/AUX proteins (Abel et al. (1995) J. Mol. Biol. 251: 533-549); the HLH/MYC protein family (Littlewood et al. (1994) Prot. Profile 1: 639-709); the DNA-binding protein (DBP) family (Tucker et al. (1994) EMBO J. 13: 2994-3002); the bZIP family of transcription factors (Foster et al. (1994) FASEB J. 8: 192-200); the Box P-binding protein (the BPF-1) family (da Costa e Silva et al. (1993) Plant J. 4: 125-135); the high mobility group (HMG) family (Bustin and Reeves (1996) Prog. Nucl. Acids Res. Mol. Biol. 54: 35-100); the scarecrow (SCR) family (Di Laurenzio et al. (1996) Cell 86: 423-433); the GF14 family (Wu et al. (1997) Plant Physiol. 114: 1421-1431); the polycomb (PCOMB) family (Goodrich et al. (1997) Nature 386: 44-51); the teosinte branched (TEO) family (Luo et al. (1996) Nature 383: 794-799); the ABI3 family (Giraudat et al. (1992) Plant Cell 4: 1251-1261); the triple helix (TH) family (Dehesh et al. (1990) Science 250: 1397-1399); the EIL family (Chao et al. (1997) Cell 89: 1133-44); the AT-HOOK family (Reeves and Nissen (1990) J. Biol. Chem. 265: 8573-8582); the S1FA family (Zhou et al. (1995) Nucleic Acids Res. 23: 1165-1169); the bZIPT2 family (Lu and Ferl (1995) Plant Physiol. 109: 723); the YABBY family (Bowman et al. (1999) Development 126: 2387-96); the PAZ family (Bohmert et al. (1998) EMBO J. 17: 170-80); a family of miscellaneous (MISC) transcription factors including the DPBF family (Kim et al. (1997) Plant J. 11: 1237-1251) and the SPF1 family (Ishiguro and Nakamura (1994) Mol. Gen. Genet. 244: 563-571); the GARP family (Hall et al. (1998) Plant Cell 10: 925-936), the TUBBY family (Boggin et al. (1999) Science 286: 2119-2125), the heat shock family (Wu (1995) Annu. Rev. Cell Dev. Biol. 11: 441-469), the ENBP family (Christiansen et al. (1996) Plant Mol. Biol. 32: 809-821), the RING-zinc family (Jensen et al. (1998) FEBS Letters 436: 283-287), the PDBP family (Janik et al. (1989) Virology 168: 320-329), the PCF family (Cubas et al. Plant J. (1999) 18: 215-22), the SRS (SHI-related) family (Fridborg et al. (1999) Plant Cell 11: 1019-1032), the CPP (cysteine-rich polycomb-like) family (Cvitanich et al. (2000) Proc. Natl. Acad. Sci. 97: 8163-8168), the ARF (auxin response factor) family (Ulmasov et al. (1999) Proc. Natl. Acad. Sci. 96: 5844-5849), the SWI/SNF family (Collingwood et al. (1999) J. Mol. Endocrinol. 23: 255-275), the ACBF family (Seguin et al. (1997) Plant Mol. Biol. 35: 281-291), PCGL (CG-1 like) family (da Costa e Silva et al. (1994) Plant Mol. Biol. 25: 921-924), the ARID family (Vazquez et al. (1999) Development 126: 733-742), the Jumonji family (Balciunas et al. (2000), Trends Biochem. Sci. 25: 274-276), the bZIP-NIN family (Schauser et al. (1999) Nature 402: 191-195), the E2F family (Kaelin et al. (1992) Cell 70: 351-364) and the GRF-like family (Knaap et al. (2000) Plant Physiol. 122: 695-704). As indicated by any part of the list above and as known in the art, transcription factors have been sometimes categorized by class, family, and sub-family according to their structural content and consensus DNA-binding site motif, for example. Many of the classes and many of the families and sub-families are listed here. However, the inclusion of one sub-family and not another, or the inclusion of one family and not another, does not mean that the invention does not encompass polynucleotides or polypeptides of a certain family or sub-family. The list provided here is merely an example of the types of transcription factors and the knowledge available concerning the consensus sequences and consensus DNA-binding site motifs that help define them as known to those of skill in the art (each of the references noted above is specifically incorporated herein by reference). A transcription factor may include, but is not limited to, any polypeptide that can activate or repress transcription of a single gene or a number of genes. This polypeptide group includes, but is not limited to, DNA-binding proteins, DNA-binding protein binding proteins, protein kinases, protein phosphatases, protein methyltransferases, GTP-binding proteins, and receptors, and the like.

In addition to methods for modifying a plant phenotype by employing one or more polynucleotides and polypeptides of the invention described herein, the polynucleotides and polypeptides of the invention have a variety of additional uses. These uses include their use in the recombinant production (i.e., expression) of proteins; as regulators of plant gene expression, as diagnostic probes for the presence of complementary or partially complementary nucleic acids (including for detection of natural coding nucleic acids); as substrates for further reactions, e.g., mutation reactions, PCR reactions, or the like; as substrates for cloning e.g., including digestion or ligation reactions; and for identifying exogenous or endogenous modulators of the transcription factors.

Producing Polypeptides

The polynucleotides of the invention include sequences that encode transcription factors and transcription factor homologue polypeptides and sequences complementary thereto, as well as unique fragments of coding sequence, or sequence complementary thereto. Such polynucleotides can be, e.g., DNA or RNA, e.g., mRNA, cRNA, synthetic RNA, genomic DNA, cDNA, synthetic DNA, oligonucleotides, etc. The polynucleotides are either double-stranded or single-stranded, and include either, or both sense (i.e., coding) sequences and antisense (i.e., non-coding, complementary) sequences. The polynucleotides include the coding sequence of a transcription factor, or transcription factor homologue polypeptide, in isolation, in combination with additional coding sequences (e.g., a purification tag, a localization signal, as a fusion-protein, as a pre-protein, or the like), in combination with non-coding sequences (e.g., introns or inteins, regulatory elements such as promoters, enhancers, terminators, and the like), and/or in a vector or host environment in which the polynucleotide encoding a transcription factor or transcription factor homologue polypeptide is an endogenous or exogenous gene.

A variety of methods exist for producing the polynucleotides of the invention. Procedures for identifying and isolating DNA clones are well known to those of skill in the art, and are described in, e.g., Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152 Academic Press, Inc., San Diego, Calif. (“Berger”); Sambrook et al. supra, and Current Protocols in Molecular Biology, F. M. Ausubel et al., eds. Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (supplemented through 2000) (“Ausubel”).

Alternatively, polynucleotides of the invention can be produced by a variety of in vitro amplification methods adapted to the present invention by appropriate selection of specific or degenerate primers. Examples of protocols sufficient to direct persons of skill through in vitro amplification methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA), e.g., for the production of the homologous nucleic acids of the invention are found in Berger (supra), Sambrook (supra), and Ausubel (supra), as well as Mullis et al., (1987) PCR Protocols A Guide to Methods and Applications (Innis et al. eds) Academic Press Inc. San Diego, Calif. (1990) (Innis). Improved methods for cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. Improved methods for amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369: 684-685 and the references cited therein, in which PCR amplicons of up to 40 kb are generated. One of skill will appreciate that essentially any RNA can be converted into a double stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, e.g., Ausubel, Sambrook and Berger, all supra.

Alternatively, polynucleotides and oligonucleotides of the invention can be assembled from fragments produced by solid-phase synthesis methods. Typically, fragments of up to approximately 100 bases are individually synthesized and then enzymatically or chemically ligated to produce a desired sequence, e.g., a polynucleotide encoding all or part of a transcription factor. For example, chemical synthesis using the phosphoramidite method is described, e.g., by Beaucage et al. (1981) Tetrahedron Letters 22: 1859-1869; and Matthes et al. (1984) EMBO J. 3: 801-805. According to such methods, oligonucleotides are synthesized, purified, annealed to their complementary strand, ligated and then optionally cloned into suitable vectors. And if so desired, the polynucleotides and polypeptides of the invention can be custom ordered from any of a number of commercial suppliers.

Homologous Sequences

Sequences homologous, i.e., that share significant sequence identity or similarity, to those provided in the Sequence Listing, derived from Arabidopsis thaliana or from other plants of choice are also an aspect of the invention. Homologous sequences can be derived from any plant including monocots and dicots and in particular agriculturally important plant species, including but not limited to, crops such as soybean, wheat, corn, potato, cotton, rice, rape, oilseed rape (including canola), sunflower, alfalfa, sugarcane and turf; or fruits and vegetables, such as banana, blackberry, blueberry, strawberry, and raspberry, cantaloupe, carrot, cauliflower, coffee, cucumber, eggplant, grapes, honeydew, lettuce, mango, melon, onion, papaya, peas, peppers, pineapple, pumpkin, spinach, squash, sweet corn, tobacco, tomato, watermelon, rosaceous fruits (such as apple, peach, pear, cherry and plum) and vegetable brassicas (such as broccoli, cabbage, cauliflower, Brussels sprouts, and kohlrabi). Other crops, fruits and vegetables whose phenotype can be changed include barley, rye, millet, sorghum, currant, avocado, citrus fruits such as oranges, lemons, grapefruit and tangerines, artichoke, cherries, nuts such as the walnut and peanut, endive, leek, roots, such as arrowroot, beet, cassava, turnip, radish, yam, and sweet potato, and beans. The homologous sequences may also be derived from woody species, such as pine, poplar and eucalyptus, or mint or other labiates.

Orthologs and Paralogs

Several different methods are known by those of skill in the art for identifying and defining these functionally homologous sequences. Three general methods for defining paralogs and orthologs are described; a paralog or ortholog or homolog may be identified by one or more of the methods described below.

Orthologs and paralogs are evolutionarily related genes that have similar sequence and similar functions. Orthologs are structurally related genes in different species that are derived from a speciation event. Paralogs are structurally related genes within a single species that are derived by a duplication event.

Within a single plant species, gene duplication may cause two copies of a particular gene, giving rise to two or more genes with similar sequence and similar function known as paralogs. A paralog is therefore a similar gene with a similar function within the same species. Paralogs typically cluster together or in the same clade (a group of similar genes) when a gene family phylogeny is analyzed using programs such as CLUSTAL (Thompson et al. (1994) Nucleic Acids Res.

cies. Functional predictions can be greatly improved by focusing on how the genes became similar in sequence (i.e., evolution) rather than on the sequence similarity itself (Eisen, (1998) Genome Res. 8: 163-167): “the first step in making functional predictions is the generation of a phylogenetic tree representing the evolutionary history of the gene of interest and its homologs. Such trees are distinct from clusters and other means of characterizing sequence similarity because they are inferred by techniques that help convert patterns of similarity into evolutionary relationships. . . . After the gene tree is inferred, biologically determined functions of the various homologs are overlaid onto the tree. Finally, the structure of the tree and the relative phylogenetic positions of genes of different functions are used to trace the history of functional changes, which is then used to predict functions of as yet uncharacterized genes” (Eisen, supra). Thus, once a phylogenic tree for a gene family of one species has been constructed using a program such as CLUSTAL (Thompson et al. (1994) Nucleic Acids Res. 22: 4673-4680; Higgins et al. (1996) Methods Enzymol. 266: 383-402), potential orthologous sequences can be placed into the phylogenetic tree and its relationship to genes from the species of interest can be determined. Once the ortholog pair has been identified, the function of the test ortholog can be determined by determining the function of the reference ortholog. It is then a matter of routine to align sequences that are most closely related by virtue of their presence in a related clade (e.g., a group of sequences descending from a strong node of a phylogenetic tree representing a common ancestral sequence) using BLAST or similar analysis, or compare similarity or identity
22: of the amino acid residues of these sequences and/or their 4673-4680; Higgins et al. (1996) Methods Enzymol. 266: conserved domains or motifs that confer and correlate with 383-402). Groups of similar genes can also be identified with conserved function. pair-wise BLAST analysis (Feng and Doolittle (1987).J. Mol. Transcription factors that are homologous to the listed Evol. 25: 351-360). For example, a clade of very similar 35 sequences will typically share at least about 30% amino acid MADS domain transcription factors from Arabidopsis all sequence identity, or at least about 30% amino acid sequence share a common function in flowering time (Ratcliffe et al. identity outside of a known consensus sequence or consensus (2001) Plant Physiol. 126: 122-132), and a group of very DNA-binding site. More closely related transcription factors similar AP2 domain transcription factors from Arabidopsis can share at least about 50%, about 60%, about 65%, about are involved in tolerance of plants to freezing (Gilmour et al. 40 70%, about 75% or about 80% or about 90% or about 95% or (1998) Plant J. 16: 433-442). Analysis of groups of similar about 98% or more sequence identity with the listed genes with similar function that fall within one clade can yield sequences, or with the listed sequences but excluding or out Sub-sequences that are particular to the clade. These Sub side a known consensus sequence or consensus DNA-binding sequences, known as consensus sequences, can not only be site, or with the listed sequences excluding one or all con used to define the sequences within each clade, but define the 45 served domain. Factors that are most closely related to the functions of these genes; genes within a clade may contain listed sequences share, e.g., at least about 85%, about 90% or paralogous or orthologous sequences that share the same about 95% or more % sequence identity to the listed function. (See also, for example, Mount, D. W. 
(2001) Bio sequences, or to the listed sequences but excluding or outside informatics. Sequence and Genome Analysis, Cold Spring a known consensus sequence or consensus DNA-binding site Harbor Laboratory Press, Cold Spring Harbor, N.Y. page 50 or outside one or all conserved domain. At the nucleotide 543.) level, the sequences will typically share at least about 40% Speciation, the production of new species from a parental nucleotide sequence identity, preferably at least about 50%, species, can also give rise to two or more genes with similar about 60%, about 70% or about 80% sequence identity, and sequence and similar function. These genes, termed more preferably about 85%, about 90%, about 95% or about orthologs, often have an identical function within their host 55 97% or more sequence identity to one or more of the listed plants and are often interchangeable between species without sequences, or to a listed sequence but excluding or outside a losing function. Because plants have common ancestors, known consensus sequence or consensus DNA-binding site, many genes in any plant species will have a corresponding or outside one orall conserved domain. The degeneracy of the orthologous gene in another plant species. Transcription fac genetic code enables major variations in the nucleotide tor gene sequences are thus conserved across diverse eukary 60 sequence of a polynucleotide while maintaining the amino otic species lines (Goodrich et al. (1993) Cell 75: 519-530; acid sequence of the encoded protein. Conserved domains Linet al. (1991) Nature 353: 569-571; Sadowski et al. (1988) (for example, a DNA binding domain) within a transcription Nature 335: 563-564). Plants are no exception to this obser factor family may exhibit a high degree of sequence homol Vation; diverse plant species possess transcription factors that ogy. Such as at least about at least about 65%, or at least about have similar sequences and functions. 
It is well known in the 65 69%, or at least about 70%, or at least about 73%, or at least art that protein function can be classified using phylogenetic about 76%, or at least about 78%, or at least about 80%, or at analysis of gene trees combined with the corresponding spe least about 82%, or at least about 85%, or at least about 87%, US 7,598.429 B2 23 24 or at least about 89%, or at least about 95%, amino acid gous to one or more polynucleotides as noted herein, or one or residue sequence identity, to a conserved domain of a tran more target polypeptides encoded by the polynucleotides, or scription factor polypeptide of the invention listed in the otherwise noted herein and may include linking or associat Sequence Listing. Transcription factors that are homologous ing a given plant phenotype or gene function with a sequence. to the listed sequences should share at least 30%, or at least In the methods, a sequence database is provided (locally or about 60%, or at least about 75%, or at least about 80%, or at across an inter or intra net) and a query is made against the least about 90%, or at least about 95% amino acid sequence sequence database using the relevant sequences herein and identity over the entire length of the polypeptide or the associated plant phenotypes or gene functions. homolog. In addition, transcription factors that are homolo In addition, one or more polynucleotide sequences or one gous to the listed sequences should share at least 30%, or at 10 or more polypeptides encoded by the polynucleotide least about 60%, or at least about 75%, or at least about 80%, sequences may be used to search against a BLOCKS (Bairoch or at least about 90%, or at least about 95% amino acid et al. (1997) Nucleic Acids Res. 25: 217-221), PFAM, and sequence similarity over the entire length of the polypeptide other databases which contain previously identified and or the homolog. annotated motifs, sequences and gene functions. 
Methods Percent identity can be determined electronically, e.g., by 15 that search for primary sequence patterns with secondary using the MEGALIGN program (DNASTAR, Inc. Madison, structure gap penalties (Smith et al. (1992) Protein Engineer Wis.). The MEGALIGN program can create alignments ing 5: 35-51) as well as algorithms such as Basic Local between two or more sequences according to different meth Alignment Search Tool (BLAST: Altschul (1993) J. Mol. ods, e.g., the clustal method. (See, e.g., Higgins and Sharp Evol. 36: 290-300; Altschul et al. (1990) supra), BLOCKS (1988) Gene 73: 237-244.) The clustal algorithm groups (Henikoff and Henikoff (1991) Nucl. Acids Res. 19: 6565 sequences into clusters by examining the distances between 6572), Hidden Markov Models (HMM; Eddy (1996) Curr: all pairs. The clusters are aligned pairwise and then in groups. Opin. Str. Biol. 6: 361-365: Sonnhammer et al. (1997) Pro Otheralignment algorithms or programs may be used, includ teins 28: 405-420), and the like, can be used to manipulate and ing FASTA, BLAST, or ENTREZ, FASTA and BLAST. analyze polynucleotide and polypeptide sequences encoded These are available as a part of the GCG sequence analysis 25 by polynucleotides. These databases, algorithms and other package (University of Wisconsin, Madison, Wis.), and can methods are well known in the art and are described in be used with or without default settings. ENTREZ is available Ausubel et al. (1997) Short Protocols in Molecular Biology, through the National Center for Biotechnology Information. John Wiley & Sons, New York N.Y., unit 7.7) and in Meyers, In one embodiment, the percent identity of two sequences can R. A. (1995) Molecular Biology and Biotechnology, Wiley be determined by the GCG program with a gap weight of 1. 30 VCH, New York N.Y., p856-853). 
e.g., each amino acid gap is weighted as if it were a single Furthermore, methods using manual alignment of amino acid or nucleotide mismatch between the two sequences similar or homologous to one or more polynucle sequences (see U.S. Pat. No. 6,262.333). otide sequences or one or more polypeptides encoded by the Other techniques for alignment are described in Methods polynucleotide sequences may be used to identify regions of in Enzymology, Vol. 266. Computer Methods for Macromo 35 similarity and conserved domains. Such manual methods are lecular Sequence Analysis (1996), ed. Doolittle, Academic well-known of those of skill in the art and can include, for Press, Inc., San Diego, Calif., USA. Preferably, an alignment example, comparisons of tertiary structure between a program that permits gaps in the sequence is utilized to align polypeptide sequence encoded by a polynucleotide which the sequences. The Smith-Waterman is one type of algorithm comprises a known function, with a polypeptide sequence that permits gaps in sequence alignments (Shpaer (1997) 40 encoded by a polynucleotide sequence which has a function Methods Mol. Biol. 70: 173-187). Also, the GAP program not yet determined. Such examples of tertiary structure may using the Needleman and Wunsch alignment method can be comprise predicted a helices, B-sheets, amphipathic helices, utilized to align sequences. An alternative search strategy leucine Zipper motifs, Zinc finger motifs, proline-rich regions, uses MPSRCH software, which runs on a MASPAR com cysteine repeat motifs, and the like. puter. MPSRCH uses a Smith-Waterman algorithm to score 45 sequences on a massively parallel computer. This approach VI. Identifying Polynucleotides or Nucleic Acids by Hybrid improves ability to pick up distantly related matches, and is ization especially tolerant of Small gaps and nucleotide sequence Polynucleotides homologous to the sequences illustrated errors. 
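As a rough illustration of the gap-permitting local alignment referred to above, the following sketch scores two sequences with a Smith-Waterman recurrence. The function name and the scoring values (match +2, mismatch -1, linear gap penalty -2) are assumptions chosen for the example. They are not taken from MPSRCH, GAP, or any other program cited here.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b (linear gap penalty).

    Illustrative sketch only; the scoring parameters are assumed values.
    """
    best = 0
    prev = [0] * (len(b) + 1)          # scores for the previous row of the matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because a cell can open a gap in either sequence, a single inserted residue lowers the score by one gap penalty rather than breaking the alignment, which is the property that makes this family of algorithms tolerant of small gaps.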
Nucleic acid-encoded amino acid sequences can be in the Sequence Listing and tables can be identified, e.g., by used to search both protein and DNA databases. 50 hybridization to each other under Stringent or under highly The percentage similarity between two polypeptide stringent conditions. Single stranded polynucleotides hybrid sequences, e.g., sequence A and sequence B, is calculated by ize when they associate based on a variety of well character dividing the length of sequence A, minus the number of gap ized physical-chemical forces, such as hydrogen bonding, residues in sequence A, minus the number of gap residues in Solvent exclusion, base stacking and the like. The stringency sequence B, into the Sum of the residue matches between 55 of a hybridization reflects the degree of sequence identity of sequence A and sequence B, times one hundred. Gaps of low the nucleic acids involved, such that the higher the Stringency, or of no similarity between the two amino acid sequences are the more similar are the two polynucleotide strands. Strin not included in determining percentage similarity. Percent gency is influenced by a variety of factors, including tempera identity between polynucleotide sequences can also be ture, salt concentration and composition, organic and non counted or calculated by other methods known in the art, e.g., 60 organic additives, solvents, etc. present in both the the Jotun Hein method. (See, e.g., Hein (1990) Methods Enzy hybridization and wash solutions and incubations (and num mol. 183: 626-645.) Identity between sequences can also be berthereof), as described in more detail in the references cited determined by other methods known in the art, e.g., by vary above. Encompassed by the invention are polynucleotide ing hybridization conditions (see US Patent Application No. sequences that are capable of hybridizing to the polynucle 20010010913). 
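The percentage similarity calculation described above can be sketched as follows. The function names are hypothetical. The second helper rests on two assumptions that the text does not state: that the "length of sequence A" is the length of its aligned row including gap characters, and that a "residue match" means an identical residue in the same alignment column.

```python
def percent_similarity(matches: int, length_a: int,
                       gaps_a: int, gaps_b: int) -> float:
    """100 * matches / (length of A - gap residues in A - gap residues in B)."""
    denominator = length_a - gaps_a - gaps_b
    if denominator <= 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * matches / denominator


def percent_similarity_from_alignment(row_a: str, row_b: str,
                                      gap: str = "-") -> float:
    """Apply the formula to two aligned rows of equal length (assumed input)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Count columns where both rows carry the same residue (not a gap).
    matches = sum(1 for x, y in zip(row_a, row_b) if x == y and x != gap)
    return percent_similarity(matches, len(row_a),
                              row_a.count(gap), row_b.count(gap))
```

For example, the aligned rows `MKT-LV` and `MRTALV` share four identical residues over five ungapped columns, which this sketch reports as 80 percent.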
65 otide sequences, listed in the Sequence Listing; and frag Thus, the invention provides methods for identifying a ments, thereof under various conditions of stringency. (See, sequence similar or paralogous or orthologous or homolo e.g., Wahl and Berger (1987) Methods Enzymol. 152:399 US 7,598.429 B2 25 26 407; Kimmel, A. R. (1987) Methods Enzymol. 152:507-511.) The washing steps that follow hybridization can also vary Estimates of homology are provided by either DNA-DNA or in stringency. Wash Stringency conditions can be defined by DNA-RNA hybridization under conditions of stringency as is salt concentration and by temperature. As above, wash Strin well understood by those skilled in the art (Hames and Hig gency can be increased by decreasing salt concentration or by gins, eds. (1985) Nucleic Acid Hybridisation, IRL Press, increasing temperature. For example, Stringent salt concen Oxford, U.K.). Stringency conditions can be adjusted to tration for the wash steps will preferably be less than about 30 screen for moderately similar fragments, such as homologous mM NaCl and 3 mM trisodium citrate, and most preferably sequences from distantly related organisms, to highly similar less than about 15 mM NaCl and 1.5 mM trisodium citrate. fragments, such as genes that duplicate functional enzymes Stringent temperature conditions for the wash steps will ordi from closely related organisms. Post-hybridization washes 10 narily include temperature of at least about 25° C., more determine stringency conditions. preferably of at least about 42°C. Another preferred set of In addition to the nucleotide sequences listed in Table 4, highly stringent conditions uses two final washes in 0.1 xSSC, full length cDNA, orthologs, paralogs and homologs of the 0.1% SDS at 65° C. The most preferred high stringency present nucleotide sequences may be identified and isolated washes are of at least about 68°C. For example, in a preferred using well known methods. 
The cDNA libraries orthologs, 15 embodiment, wash steps will occur at 25°C. in 30 mM NaCl, paralogs and homologs of the present nucleotide sequences 3 mM trisodium citrate, and 0.1% SDS. In a more preferred may be screened using hybridization methods to determine embodiment, wash steps will occur at 42°C. in 15 mM NaCl, their utility as hybridization target or amplification probes. 1.5 mM trisodium citrate, and 0.1% SDS. In a most preferred An example of stringent hybridization conditions for embodiment, the wash steps will occur at 68°C. in 15 mM hybridization of complementary nucleic acids which have NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. Additional more than 100 complementary residues on a filter in a South variations on these conditions will be readily apparent to ern or northern blot is about 5° C. to 20° C. lower than the those skilled in the art (see U.S. Patent Application No. thermal melting point (T) for the specific sequence at a 20010010913). defined ionic strength and pH. The T is the temperature As another example, stringent conditions can be selected (under defined ionic strength and pH) at which 50% of the 25 Such that an oligonucleotide that is perfectly complementary target sequence hybridizes to a perfectly matched probe. to the coding oligonucleotide hybridizes to the coding oligo Nucleic acid molecules that hybridize under stringent condi nucleotide with at least about a 5-10x higher signal to noise tions will typically hybridize to a probe based on either the ratio than the ratio for hybridization of the perfectly comple entire cDNA or selected portions, e.g., to a unique Subse mentary oligonucleotide to a nucleic acid encoding a tran quence, of the cDNA under wash conditions of 0.2xSSC to 30 Scription factor known as of the filing date of the application. 2.0xSSC, 0.1% SDS at 50-65° C. For example, high strin Conditions can be selected Such that a higher signal to noise gency is about 0.2xSSC, 0.1% SDS at 65° C. 
Ultra-high ratio is observed in the particular assay which is used, e.g., stringency will be the same conditions except the wash tem about 15x, 25x, 35x. 50x or more. Accordingly, the subject perature is raised about 3 to about 5°C., and ultra-ultra-high nucleic acid hybridizes to the unique coding oligonucleotide stringency will be the same conditions except the wash tem 35 with at least a 2x higher signal to noise ratio as compared to perature is raised about 6 to about 9° C. For identification of hybridization of the coding oligonucleotide to a nucleic acid less closely related homologues washes can be performed at encoding known polypeptide. Again, higher signal to noise a lower temperature, e.g., 50° C. In general, stringency is ratios can be selected, e.g., about 5x, 10x, 25x, 35x. 50x or increased by raising the wash temperature and/or decreasing more. The particular signal will depend on the label used in the concentration of SSC, as known in the art. 40 the relevant assay, e.g., a fluorescent label, a colorimetric In another example, Stringent salt concentration will ordi label, a radioactive label, or the like. narily be less than about 750 mMNaCl and 75 mM trisodium Alternatively, transcription factor homolog polypeptides citrate, preferably less than about 500 mM. NaCl and 50 mM can be obtained by screening an expression library using trisodium citrate, and most preferably less than about 250 antibodies specific for one or more transcription factors. With mM NaCl and 25 mM trisodium citrate. Low stringency 45 the provision herein of the disclosed transcription factor, and hybridization can be obtained in the absence of organic Sol transcription factor homologue nucleic acid sequences, the vent, e.g., formamide, while high Stringency hybridization encoded polypeptide(s) can be expressed and purified in a can be obtained in the presence of at least about 35% forma heterologous expression system (e.g., E. 
coli) and used to mide, and most preferably at least about 50% formamide. raise antibodies (monoclonal or polyclonal) specific for the Stringent temperature conditions will ordinarily include tem 50 polypeptide(s) in question. Antibodies can also be raised peratures of at least about 30°C., more preferably of at least against Synthetic peptides derived from transcription factor, about 37° C., and most preferably of at least about 42° C. or transcription factor homologue, amino acid sequences. Varying additional parameters, such as hybridization time, Methods of raising antibodies are well known in the art and the concentration of detergent, e.g., sodium dodecyl Sulfate are described in Harlow and Lane (1988) Antibodies. A Labo (SDS), and the inclusion or exclusion of carrier DNA, are well 55 ratory Manual, Cold Spring Harbor Laboratory, New York. known to those skilled in the art. Various levels of stringency Such antibodies can then be used to Screen an expression are accomplished by combining these various conditions as library produced from the plant from which it is desired to needed. In a preferred embodiment, hybridization will occur clone additional transcription factor homologues, using the at 30°C. in 750 mM. NaCl, 75 mM trisodium citrate, and 1% methods described above. The selected cDNAs can be con SDS. In a more preferred embodiment, hybridization will 60 firmed by sequencing and enzymatic activity. occurat 37°C. in 500 mMNaCl, 50mMtrisodium citrate, 1% SDS, 35% formamide, and 100 g/ml denatured salmon Sequence Variations sperm DNA (ssDNA). In a most preferred embodiment, It will readily be appreciated by those of skill in the art, that hybridization will occur at 42°C. in 250 mM. NaCl, 25 mM any of a variety of polynucleotide sequences are capable of trisodium citrate, 1% SDS, 50% formamide, and 200 ug/ml 65 encoding the transcription factors and transcription factor ssDNA. 
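The salt and temperature thresholds given above for stringent hybridization can be collected into a small lookup, sketched below. The tier labels and the function name are hypothetical. Pairing each salt threshold with the temperature threshold of the same rank is an assumption that follows the three embodiments, and "about" is treated as inclusive. Formamide, SDS and carrier DNA are left out for brevity.

```python
from typing import Optional

# (label, max NaCl in mM, max trisodium citrate in mM, min temperature in deg C),
# listed from most to least stringent. Labels are illustrative only.
TIERS = [
    ("most preferred", 250.0, 25.0, 42.0),
    ("preferred", 500.0, 50.0, 37.0),
    ("ordinary", 750.0, 75.0, 30.0),
]


def stringency_tier(temp_c: float, nacl_mm: float,
                    citrate_mm: float) -> Optional[str]:
    """Return the most stringent tier whose thresholds are all met, else None."""
    for label, max_nacl, max_citrate, min_temp in TIERS:
        if nacl_mm <= max_nacl and citrate_mm <= max_citrate and temp_c >= min_temp:
            return label
    return None
```

Under these assumptions, hybridization at 42° C. in 250 mM NaCl and 25 mM trisodium citrate falls in the top tier, while the same temperature in 750 mM NaCl and 75 mM trisodium citrate only meets the ordinary thresholds, because the salt concentration limits the classification.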
Useful variations on these conditions will be readily apparent to those skilled in the art.

homologue polypeptides of the invention. Due to the degeneracy of the genetic code, many different polynucleotides can encode identical and/or substantially similar polypeptides in addition to those sequences illustrated in the Sequence Listing. Nucleic acids having a sequence that differs from the sequences shown in the Sequence Listing, or complementary sequences, that encode functionally equivalent peptides (i.e., peptides having some degree of equivalent or similar biological activity) but differ in sequence from the sequence shown in the Sequence Listing due to degeneracy in the genetic code, are also within the scope of the invention.

Altered polynucleotide sequences encoding polypeptides include those sequences with deletions, insertions, or substitutions of different nucleotides, resulting in a polynucleotide encoding a polypeptide with at least one functional characteristic of the instant polypeptides. Included within this definition are polymorphisms which may or may not be readily detectable using a particular oligonucleotide probe of the polynucleotide encoding the instant polypeptides, and improper or unexpected hybridization to allelic variants, with a locus other than the normal chromosomal locus for the polynucleotide sequence encoding the instant polypeptides.

Allelic variant refers to any of two or more alternative forms of a gene occupying the same chromosomal locus. Allelic variation arises naturally through mutation, and may result in phenotypic polymorphism within populations. Gene mutations can be silent (i.e., no change in the encoded polypeptide) or may encode polypeptides having altered amino acid sequence. The term allelic variant is also used herein to denote a protein encoded by an allelic variant of a gene.

Splice variant refers to alternative forms of RNA transcribed from a gene. Splice variation arises naturally through use of alternative splicing sites within a transcribed RNA molecule, or less commonly between separately transcribed RNA molecules, and may result in several mRNAs transcribed from the same gene. Splice variants may encode polypeptides having altered amino acid sequence. The term splice variant is also used herein to denote a protein encoded by a splice variant of an mRNA transcribed from a gene.

Those skilled in the art would recognize that G47, SEQ ID NO: 66, represents a single transcription factor; allelic variation and alternative splicing may be expected to occur.

TABLE 1

Amino acid                 Possible Codons
Alanine         Ala   A    GCA GCC GCG GCU
Cysteine        Cys   C    TGC TGT
Aspartic acid   Asp   D    GAC GAT
Glutamic acid   Glu   E    GAA GAG
Phenylalanine   Phe   F    TTC TTT
Glycine         Gly   G    GGA GGC GGG GGT
Histidine       His   H    CAC CAT
Isoleucine      Ile   I    ATA ATC ATT
Lysine          Lys   K    AAA AAG
Leucine         Leu   L    TTA TTG CTA CTC CTG CTT
Methionine      Met   M    ATG
Asparagine      Asn   N    AAC AAT
Proline         Pro   P    CCA CCC CCG CCT
Glutamine       Gln   Q    CAA CAG
Arginine        Arg   R    AGA AGG CGA CGC CGG CGT
Serine          Ser   S    AGC AGT TCA TCC TCG TCT
Threonine       Thr   T    ACA ACC ACG ACT
Valine          Val   V    GTA GTC GTG GTT
Tryptophan      Trp   W    TGG
Tyrosine        Tyr   Y    TAC TAT

Sequence alterations that do not change the amino acid sequence encoded by the polynucleotide are termed "silent" variations. With the exception of the codons ATG and TGG, encoding methionine and tryptophan, respectively, any of the possible codons for the same amino acid can be substituted by a variety of techniques, e.g., site-directed mutagenesis, available in the art. Accordingly, any and all such variations of a sequence selected from the above table are a feature of the invention.

Allelic variants of SEQ ID NO: 65 can be cloned by probing cDNA
or genomic libraries from different individual organisms In addition to silent variations, other conservative varia according to standard procedures. Allelic variants of the DNA tions that alter one, or a few amino acids in the encoded sequence shown in SEQID NO: 65, including those contain polypeptide, can be made without altering the function of the ing silent mutations and those in which mutations result in 50 polypeptide, these conservative variants are, likewise, a fea amino acid sequence changes, are within the scope of the ture of the invention. present invention, as are proteins which are allelic variants of For example, Substitutions, deletions and insertions intro SEQID NO: 66. cDNAs generated from alternatively spliced duced into the sequences provided in the Sequence Listing are mRNAs, which retain the properties of the transcription fac also envisioned by the invention. Such sequence modifica tor are included within the scope of the present invention, as 55 tions can be engineered into a sequence by site-directed are polypeptides encoded by such cDNAs and mRNAs. mutagenesis (Wu (ed.) Meth. Enzymol. (1993) vol. 217, Aca Allelic variants and splice variants of these sequences can be demic Press) or the other methods noted below. Amino acid cloned by probing cDNA or genomic libraries from different Substitutions are typically of single residues; insertions usu individual organisms or tissues according to standard proce ally will be on the order of about from 1 to 10 amino acid 60 residues; and deletions will range about from 1 to 30 residues. dures known in the art (see U.S. Pat. No. 6,388,064). In preferred embodiments, deletions or insertions are made in For example, Table 1 illustrates, e.g., that the codons AGC, adjacent pairs, e.g., a deletion of two residues or insertion of AGT, TCA, TCC, TCG, and TCT all encode the same amino two residues. Substitutions, deletions, insertions or any com acid: serine. 
Accordingly, at each position in the sequence bination thereof can be combined to arrive at a sequence. The where there is a codon encoding serine, any of the above 65 mutations that are made in the polynucleotide encoding the trinucleotide sequences can be used without altering the transcription factor should not place the sequence out of encoded polypeptide. reading frame and should not create complementary regions US 7,598.429 B2 29 30 that could produce secondary mRNA structure. Preferably, cantly in their effect on maintaining (a) the structure of the the polypeptide encoded by the DNA performs the desired polypeptide backbone in the area of the substitution, for function. example, as a sheet or helical conformation, (b) the charge or Conservative substitutions are those in which at least one hydrophobicity of the molecule at the target site, or (c) the residue in the amino acid sequence has been removed and a bulk of the side chain. The substitutions which in general are different residue inserted in its place. Such substitutions gen expected to produce the greatest changes in protein properties erally are made in accordance with the Table 2 when it is will be those in which (a) a hydrophilic residue, e.g., seryl or desired to maintain the activity of the protein. Table 2 shows threonyl, is substituted for (orby) a hydrophobic residue, e.g., amino acids which can be substituted for an amino acid in a leucyl, isoleucyl phenylalanyl, Valyl or alanyl; (b) a cysteine protein and which are typically regarded as conservative Sub 10 or proline is substituted for (or by) any other residue; (c) a stitutions. residue having an electropositive side chain, e.g., lysyl, argi nyl, or histidyl, is substituted for (or by) an electronegative TABLE 2 residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g., phenylalanine, is Substituted for (orby) Conservative 15 one not having a side chain, e.g., glycine. 
Residue Substitutions Ala Ser Further Modifying Sequences of the Invention—Mutation/ Arg Lys Forced Evolution ASn Gln: His In addition to generating silent or conservative Substitu Asp Glu tions as noted, above, the present invention optionally Gln ASn Cys Ser includes methods of modifying the sequences of the Glu Asp Sequence Listing. In the methods, nucleic acid or protein Gly Pro modification methods are used to alter the given sequences to His ASn; Glin produce new sequences and/or to chemically or enzymati Ile Leu, Val Leu Ile: Val 25 cally modify given sequences to change the properties of the Lys Arg: Gln nucleic acids or proteins. Met Leu: Ile Thus, in one embodiment, given nucleic acid sequences are Phe Met; Leu: Tyr Ser Thr; Gly modified, e.g., according to standard mutagenesis or artificial Thr Ser; Val evolution methods to produce modified sequences. The modi Trp Tyr 30 fied sequences may be created using purified natural poly Tyr Trp; Phe nucleotides isolated from any organism or may be synthe Wall Ile: Leu sized from purified compositions and chemicals using chemical means well know to those of skill in the art. For Similar substitutions are those in which at least one residue example, Ausubel, Supra, provides additional details on in the amino acid sequence has been removed and a different 35 mutagenesis methods. Artificial forced evolution methods are residue inserted in its place. Such Substitutions generally are described, for example, by Stemmer (1994) Nature 370: 389 made in accordance with the Table 3 when it is desired to 391, Stemmer (1994) Proc. Natl. Acad. Sci. USA 91: 10747 maintain the activity of the protein. Table 3 shows amino 10751, and U.S. Pat. Nos. 5,811,238, 5,837,500, and 6,242, acids which can be substituted for an amino acid in a protein 568. 
Methods for engineering synthetic transcription factors and which are typically regarded as structural and functional 40 and other polypeptides are described, for example, by Zhang substitutions. For example, a residue in column 1 of Table 3 et al. (2000) J. Biol. Chem. 275: 33.850-33860, Liu et al. may be substituted with residue in column 2; in addition, a (2001).J. Biol. Chem. 276: 11323-11334, and Isalan et al. residue in column 2 of Table 3 may be substituted with the (2001) Nature Biotechnol. 19: 656-660. Many other mutation residue of column 1. and evolution methods are also available and expected to be 45 within the skill of the practitioner. TABLE 3 Similarly, chemical or enzymatic alteration of expressed nucleic acids and polypeptides can be performed by standard Residue Similar Substitutions methods. For example, sequence can be modified by addition Ala Ser: Thr; Gly; Val; Leu: Ile of lipids, Sugars, peptides, organic or inorganic compounds, Arg Lys; His; Gly 50 by the inclusion of modified nucleotides or amino acids, or ASn Gln: His: Gly: Ser: Thr the like. For example, protein modification techniques are Asp Glu, Ser: Thr Gln ASn; Ala illustrated in Ausubel, supra. Further details on chemical and Cys Ser: Gly enzymatic modifications can be found herein. These modifi Glu Asp cation methods can be used to modify any given sequence, or Gly Pro; Arg 55 to modify any sequence produced by the various mutation and His ASn; Gln: Tyr; Phe, Lys; Arg artificial evolution modification methods noted herein. 
Ile Ala; Leu; Val; Gly; Met Leu Ala: Ile:Val; Gly; Met Accordingly, the invention provides for modification of Lys Arg; His; Glin; Gly; Pro any given nucleic acid by mutation, evolution, chemical or Met Leu: Ile: Phe Phe Met; Leu: Tyr; Trp: His; Val; Ala enzymatic modification, or other available methods, as well Ser Thr; Gly; Asp; Ala; Val: Ile: His 60 as for the products produced by practicing Such methods, e.g., Thr Ser; Val; Ala; Gly using the sequences herein as a starting Substrate for the Trp Tyr; Phe: His various modification approaches. Tyr Trp; Phe: His For example, optimized coding sequence containing Wal Ala: Ile; Leu; Gly: Thr; Ser; Glu codons preferred by a particular prokaryotic or eukaryotic 65 host can be used e.g., to increase the rate of translation or to Substitutions that are less conservative than those in Table produce recombinant RNA transcripts having desirable prop 2 can be selected by picking residues that differ more signifi erties. Such as a longer half-life, as compared with transcripts US 7,598.429 B2 31 32 produced using a non-optimized sequence. Translation stop include those derived from a Ti plasmid of Agrobacterium codons can also be modified to reflect host preference. For tumefaciens, as well as those disclosed by Herrera-Estrella et example, preferred stop codons for Saccharomyces cerevi al. (1983) Nature 303: 209, Bevan (1984) Nucl Acid Res. 12: siae and mammals are TAA and TGA, respectively. The pre 8711-8721, Klee (1985) Bio/Technology 3: 637-642, for ferred stop codon for monocotyledonous plants is TGA, dicotyledonous plants. whereas insects and E. coli prefer to use TAA as the stop Alternatively, non-Ti vectors can be used to transfer the codon. DNA into monocotyledonous plants and cells by using free The polynucleotide sequences of the present invention can DNA delivery techniques. 
Such methods can involve, for example, the use of liposomes, electroporation, microprojectile bombardment, silicon carbide whiskers, and viruses. By using these methods transgenic plants such as wheat, rice (Christou (1991) Bio/Technology 9: 957-962) and corn (Gordon-Kamm (1990) Plant Cell 2: 603-618) can be produced. An immature embryo can also be a good target tissue for monocots for direct DNA delivery techniques by using the particle gun (Weeks et al. (1993) Plant Physiol. 102: 1077-1084; Vasil (1993) Bio/Technology 10: 667-674; Wan and Lemeaux (1994) Plant Physiol. 104: 37-48), and for Agrobacterium-mediated DNA transfer (Ishida et al. (1996) Nature Biotech. 14: 745-750).

Typically, plant transformation vectors include one or more cloned plant coding sequence (genomic or cDNA) under the transcriptional control of 5' and 3' regulatory sequences and a dominant selectable marker. Such plant transformation vectors typically also contain a promoter (e.g., a regulatory region controlling inducible or constitutive, environmentally- or developmentally-regulated, or cell- or tissue-specific expression), a transcription initiation start site, an RNA processing signal (such as intron splice sites), a transcription termination site, and/or a polyadenylation signal.

Examples of constitutive plant promoters which can be useful for expressing the TF sequence include: the cauliflower mosaic virus (CaMV) 35S promoter, which confers constitutive, high-level expression in most plant tissues (see, e.g., Odell et al. (1985) Nature 313: 810-812); the nopaline synthase promoter (An et al. (1988) Plant Physiol. 88: 547-552); and the octopine synthase promoter (Fromm et al. (1989) Plant Cell 1: 977-984).

A variety of plant gene promoters that regulate gene expression in response to environmental, hormonal, chemical, developmental signals, and in a tissue-active manner can be used for expression of a TF sequence in plants. Choice of a promoter is based largely on the phenotype of interest and is determined by such factors as tissue (e.g., seed, fruit, root, pollen, vascular tissue, flower, carpel, etc.), inducibility (e.g., in response to wounding, heat, cold, drought, light, pathogens, etc.), timing, developmental stage, and the like. Numerous known promoters have been characterized and can favorably be employed to promote expression of a polynucleotide of the invention in a transgenic plant or cell of interest. For example, tissue specific promoters include: seed-specific promoters (such as the napin, phaseolin or DC3 promoter described in U.S. Pat. No. 5,773,697), fruit-specific promoters that are active during fruit ripening (such as the dru 1 promoter (U.S. Pat. No. 5,783,393), or the 2A11 promoter (U.S. Pat. No. 4,943,674) and the tomato polygalacturonase promoter (Bird et al. (1988) Plant Mol. Biol. 11: 651)), root specific promoters, such as those disclosed in U.S. Pat. Nos.

also be engineered in order to alter a coding sequence for a variety of reasons, including but not limited to, alterations which modify the sequence to facilitate cloning, processing and/or expression of the gene product. For example, alterations are optionally introduced using techniques which are well known in the art, e.g., site-directed mutagenesis, to insert new restriction sites, to alter glycosylation patterns, to change codon preference, to introduce splice sites, etc.

Furthermore, a fragment or domain derived from any of the polypeptides of the invention can be combined with domains derived from other transcription factors or synthetic domains to modify the biological activity of a transcription factor. For instance, a DNA-binding domain derived from a transcription factor of the invention can be combined with the activation domain of another transcription factor or with a synthetic activation domain. A transcription activation domain assists in initiating transcription from a DNA-binding site. Examples include the transcription activation region of VP16 or GAL4 (Moore et al. (1998) Proc. Natl. Acad. Sci. USA 95: 376-381; and Aoyama et al. (1995) Plant Cell 7: 1773-1785), peptides derived from bacterial sequences (Ma and Ptashne (1987) Cell 51: 113-119) and synthetic peptides (Giniger and Ptashne (1987) Nature 330: 670-672).

Expression and Modification of Polypeptides

Typically, polynucleotide sequences of the invention are incorporated into recombinant DNA (or RNA) molecules that direct expression of polypeptides of the invention in appropriate host cells, transgenic plants, in vitro translation systems, or the like. Due to the inherent degeneracy of the genetic code, nucleic acid sequences which encode substantially the same or a functionally equivalent amino acid sequence can be substituted for any listed sequence to provide for cloning and expressing the relevant homologue.

Vectors, Promoters, and Expression Systems

The present invention includes recombinant constructs comprising one or more of the nucleic acid sequences herein. The constructs typically comprise a vector, such as a plasmid, a cosmid, a phage, a virus (e.g., a plant virus), a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), or the like, into which a nucleic acid sequence of the invention has been inserted, in a forward or reverse orientation. In a preferred aspect of this embodiment, the construct further comprises regulatory sequences, including, for example, a promoter, operably linked to the sequence. Large numbers of suitable vectors and promoters are known to those of skill in the art, and are commercially available.

General texts that describe molecular biological techniques useful herein, including the use and production of vectors, promoters and many other relevant topics, include Berger, Sambrook and Ausubel, supra.
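The degeneracy of the genetic code and the host-preferred stop codons discussed in this section can be illustrated with a minimal sketch. It assumes the standard genetic code; the names CODON_TABLE, PREFERRED_STOP, translate and with_preferred_stop, and the twelve-base example sequences in the usage note, are hypothetical and are not sequences of the Sequence Listing. The stop-codon preferences are those stated in the text.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Host-preferred stop codons as stated in the text.
PREFERRED_STOP = {
    "S. cerevisiae": "TAA",
    "mammals": "TGA",
    "monocots": "TGA",
    "insects": "TAA",
    "E. coli": "TAA",
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def with_preferred_stop(dna: str, host: str) -> str:
    """Replace the terminal stop codon with the one preferred by the host."""
    dna = dna.upper()
    if len(dna) % 3 != 0 or CODON_TABLE.get(dna[-3:]) != "*":
        raise ValueError("sequence must end with an in-frame stop codon")
    return dna[:-3] + PREFERRED_STOP[host]


# The six trinucleotides that encode serine in the standard code.
SERINE_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa == "S")
```

For instance, translate("ATGTCTAAATAG") and translate("ATGAGCAAATAG") give the same peptide because TCT and AGC both encode serine, and with_preferred_stop("ATGTCTAAATAG", "monocots") changes only the terminal TAG to TGA, leaving the encoded polypeptide unchanged.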
Any of the identified sequences can be incorporated into a cassette or vector, e.g., for expression in plants. A number of expression vectors suitable for stable transformation of plant cells or for the establishment of transgenic plants have been described including those described in Weissbach and Weissbach (1989) Methods for Plant Molecular Biology, Academic Press, and Gelvin et al. (1990) Plant Molecular Biology Manual, Kluwer Academic Publishers. Specific examples

5,618,988, 5,837,848 and 5,905,186, pollen-active promoters such as PTA29, PTA26 and PTA13 (U.S. Pat. No. 5,792,929), promoters active in vascular tissue (Ringli and Keller (1998) Plant Mol. Biol. 37: 977-988), flower-specific (Kaiser et al. (1995) Plant Mol. Biol. 28: 231-243), pollen (Baerson et al. (1994) Plant Mol. Biol. 126: 1947-1959), carpels (Ohl et al. (1990) Plant Cell 2: 837-848), pollen and ovules (Baerson et al. (1993) Plant Mol. Biol. 22: 255-267), auxin-inducible promoters (such as that described in van der Kop et al. (1999) Plant Mol. Biol. 39: 979-990 or Baumann et al. (1999) Plant Cell 11: 323-334), cytokinin-inducible promoter (Guevara-Garcia (1998) Plant Mol. Biol. 38: 743-753), promoters responsive to gibberellin (Shi et al. (1998) Plant Mol. Biol. 38: 1053-1060, Willmott et al. (1998) Plant Mol. Biol. 38: 817-825) and the like. Additional promoters are those that elicit expression in response to heat (Ainley et al. (1993) Plant Mol. Biol. 22: 13-23), light (e.g., the pea rbcS-3A promoter, Kuhlemeier et al. (1989) Plant Cell 1: 471, and the maize rbcS promoter, Schaffner and Sheen (1991) Plant Cell 3: 997); wounding (e.g., wunI, Siebertz et al. (1989) Plant Cell 1: 961); pathogens (such as the PR-1 promoter described in Buchel et al. (1999) Plant Mol. Biol. 40: 387-396, and the PDF1.2 promoter described in Manners et al. (1998) Plant Mol. Biol. 38: 1071-80), and chemicals such as methyl jasmonate or salicylic acid (Gatz et al. (1997) Plant Mol. Biol. 48: 89-108). In addition, the timing of the expression can be controlled by using promoters such as those acting at senescence (Gan and Amasino (1995) Science 270: 1986-1988); or late seed development (Odell et al. (1994) Plant Physiol. 106: 447-458).

Plant expression vectors can also include RNA processing signals that can be positioned within, upstream or downstream of the coding sequence. In addition, the expression vectors can include additional regulatory sequences from the 3'-untranslated region of plant genes, e.g., a 3' terminator region to increase stability of the mRNA, such as the PI-II terminator region of potato or the octopine or nopaline synthase 3' terminator regions.

The following represent specific examples of expression constructs used to overexpress sequences of the invention. The choice of promoters may include, for example, the constitutive CaMV 35S promoter, the STM shoot apical meristem-specific promoter, the CUT1 epidermal-specific promoter, the LTP1 epidermal-specific promoter, the SUC2 vascular-specific promoter, the RBCS3 leaf-specific promoter, the ARSK1 root-specific promoter, the RD29A stress inducible promoter, and the AP1 floral meristem-specific promoter (SEQ ID NO: 209-217, respectively). Many of these examples have been used to produce transgenic plants. Expression vectors in which these or other inducible or tissue-specific promoters are operably linked to a transcription factor polynucleotide of the invention can be envisioned and produced.

P894 (SEQ ID NO: 218) contained a 35S:G47 direct fusion and carries KanR. The construct contains a G47 cDNA clone.

An alternative means of overexpressing G47 makes use of the two constructs P6506 (SEQ ID NO: 233; 35S::LexA-GAL4TA) and P3853 (SEQ ID NO: 224; opLexA::G47), which together constituted a two-component system for expression of G47 from the 35S promoter. A kanamycin resistant transgenic line containing P6506 was established, and this was then supertransformed with the P3853 construct containing a cDNA clone of G47 and a sulfonamide resistance marker.

P1572 (SEQ ID NO: 219) comprised a 35S:G2133 direct promoter fusion and carries KanR. The construct contains a cDNA clone of G2133.

P23456 (SEQ ID NO: 220) contained a 35S:G3649 direct promoter fusion and carries KanR. The construct contains a cDNA clone of G3649.

P23455 (SEQ ID NO: 221) contained a 35S:G3644 direct promoter fusion and carries KanR. The construct contains a cDNA clone of G3644.

P23465 (SEQ ID NO: 222) contained a 35S:G3643 direct fusion and carries KanR. The construct harbors a cDNA clone of G3643.

P25402 (SEQ ID NO: 223) contained a 35S:G3650 direct fusion and carries KanR. The construct contains a cDNA clone.

The two constructs P5318 (SEQ ID NO: 225; STM::LexA-GAL4TA) and P3853 (SEQ ID NO: 224; opLexA::G47) together constitute a two-component system for expression of G47 from the STM promoter. Kanamycin resistant transgenic lines containing P5318 were established (lines #5 and #10), and these were then supertransformed with the P3853 construct containing a cDNA clone of G47 and a sulfonamide resistance marker.

The two constructs P5288 (SEQ ID NO: 226; CUT1::LexA-GAL4TA) and P3853 (SEQ ID NO: 224; opLexA::G47) together constitute a two-component system for expression of G47 from the CUT1 promoter. A kanamycin resistant transgenic line containing P5288 was established, and this was then supertransformed with the P3853 construct containing a cDNA clone of G47 and a sulfonamide resistance marker.

The two constructs P5284 (SEQ ID NO: 235; RBCS3::LexA-GAL4TA) and P3853 (SEQ ID NO: 224; opLexA::G47) together constituted a two-component system for expression of G47 from the RBCS3 promoter. A kanamycin resistant transgenic line containing P5284 was established, and this was then supertransformed with the P3853 construct containing a cDNA clone of G47 and a sulfonamide resistance marker.

The two constructs P5290 (SEQ ID NO: 234; SUC2::LexA-GAL4TA) and P3853 (SEQ ID NO: 224; opLexA::G47) together constitute a two-component system for expression of G47 from the SUC2 promoter. A kanamycin resistant transgenic line containing P5290 was established, and this was then supertransformed with the P3853 construct containing a cDNA clone of G47 and a sulfonamide resistance marker.

The two constructs P5311 (SEQ ID NO: 236; ARSK1::LexA-GAL4TA) and P3853 (SEQ ID NO: 224; opLexA::G47) together constitute a two-component system for expression of G47 from the ARSK1 promoter. A kanamycin resistant transgenic line containing P5311 was established, and this was then supertransformed with the P3853 construct containing a cDNA clone of G47 and a sulfonamide resistance marker.

The two constructs P9002 (SEQ ID NO: 237; RD29A::LexA-GAL4TA) and P3853 (SEQ ID NO: 224; opLexA::G47) together constitute a two-component system for expression of G47 from the RD29A promoter. A kanamycin resistant transgenic line (#5) containing P9002 was established, and this was then supertransformed with the P3853 construct containing a cDNA clone of G47 and a sulfonamide resistance marker.

The two constructs P5326 (SEQ ID NO: 238; AP1::LexA-GAL4TA) and P3853 (SEQ ID NO: 224; opLexA::G47) together constitute a two-component system for expression of G47 from the AP1 promoter. A kanamycin resistant transgenic line containing P5326 was established, and this was then supertransformed with the P3853 construct containing a cDNA clone of G47 and a sulfonamide resistance marker.

P25186 (SEQ ID NO: 239) contains a 35S:GAL4-G47 fusion and carries KanR (addition to the G47 protein of a strong transcription activation domain from the yeast GAL4 gene). SEQ ID NO: 240 is the predicted polypeptide that results from expression of the vector comprising SEQ ID NO: 239.

P25279 (SEQ ID NO: 241) carries a 35S:G47-GFP fusion directly fused to the 35S promoter and a KanR marker. SEQ ID NO: 242 is the predicted polypeptide that results from expression of the vector comprising SEQ ID NO: 239.

Similar to constructs made with G47, other vectors may be produced that incorporate a promoter and other transcription factor polynucleotide combination. For example, the two constructs P9002 (SEQ ID NO: 237; RD29A::LexA-GAL4TA) and P4361 (SEQ ID NO: 227; opLexA::G2133) together constitute a two-component system for expression of G2133 from the RD29A promoter. A kanamycin resistant transgenic line containing P9002 was established, and this was then supertransformed with the P4361 construct containing a cDNA clone of G2133 and a sulfonamide resistance marker.

Additional Expression Elements

Specific initiation signals can aid in efficient translation of coding sequences.

Agrobacterium tumefaciens, and a portion is stably integrated into the plant genome (Horsch et al. (1984) Science 233: 496-498; Fraley et al. (1983) Proc. Natl. Acad. Sci. USA 80: 4803-4807).

The cell can include a nucleic acid of the invention which encodes a polypeptide, wherein the cell expresses a polypeptide of the invention. The cell can also include vector sequences, or the like. Furthermore, cells and transgenic plants that include any polypeptide or nucleic acid above or throughout this specification, e.g., produced by transduction of a vector of the invention, are an additional feature of the invention.

For long-term, high-yield production of recombinant proteins, stable expression can be used. Host cells transformed with a nucleotide sequence encoding a polypeptide of the invention are optionally cultured under conditions suitable for the expression and recovery of the encoded protein from cell culture.
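The two-component constructs described in this section pair a driver (a promoter::LexA-GAL4TA construct selected on kanamycin) with a responder (an opLexA::gene construct carrying a sulfonamide resistance marker). As a minimal bookkeeping sketch, the pairs can be held in a small lookup. The plasmid names and SEQ ID NOs are those given in the text; the names Construct, DRIVERS, RESPONDERS, DOCUMENTED and describe_pair are hypothetical, and the sketch models nothing about the biology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Construct:
    name: str        # plasmid identifier, e.g. "P5318"
    seq_id_no: int   # SEQ ID NO given in the text
    cassette: str    # promoter::coding-region notation
    selection: str   # resistance used to select transgenic lines


# Driver constructs (promoter::LexA-GAL4TA), as listed in the text.
_DRIVER_IDS = {
    "35S": ("P6506", 233),
    "STM": ("P5318", 225),
    "CUT1": ("P5288", 226),
    "RBCS3": ("P5284", 235),
    "SUC2": ("P5290", 234),
    "ARSK1": ("P5311", 236),
    "RD29A": ("P9002", 237),
    "AP1": ("P5326", 238),
}
DRIVERS = {
    promoter: Construct(name, seq_id, f"{promoter}::LexA-GAL4TA", "kanamycin")
    for promoter, (name, seq_id) in _DRIVER_IDS.items()
}

# Responder constructs (opLexA::gene), as listed in the text.
RESPONDERS = {
    "G47": Construct("P3853", 224, "opLexA::G47", "sulfonamide"),
    "G2133": Construct("P4361", 227, "opLexA::G2133", "sulfonamide"),
}

# Combinations the text actually describes; any other pairing is hypothetical.
DOCUMENTED = {(p, "G47") for p in DRIVERS} | {("RD29A", "G2133")}


def describe_pair(promoter: str, gene: str) -> str:
    """Summarize the driver/responder pair for a promoter and target gene."""
    driver, responder = DRIVERS[promoter], RESPONDERS[gene]
    note = "" if (promoter, gene) in DOCUMENTED else " (combination not described in the text)"
    return (
        f"{driver.name} [{driver.cassette}] + {responder.name} [{responder.cassette}]: "
        f"{gene} expressed from the {promoter} promoter{note}"
    )
```

Calling describe_pair("RD29A", "G2133") summarizes the P9002 and P4361 pair described above, while a pairing such as ("STM", "G2133") is flagged as not described in the text.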
These signals can include, e.g., the ATG initiation codon and adjacent sequences. In cases where a coding sequence, its initiation codon and upstream sequences are inserted into the appropriate expression vector, no additional translational control signals may be needed. However, in cases where only coding sequence (e.g., a mature protein coding sequence), or a portion thereof, is inserted, exogenous transcriptional control signals including the ATG initiation codon can be separately provided. The initiation codon is provided in the correct reading frame to facilitate transcription. Exogenous transcriptional elements and initiation codons can be of various origins, both natural and synthetic. The efficiency of expression can be enhanced by the inclusion of enhancers appropriate to the cell system in use.

Expression Hosts

The present invention also relates to host cells which are transduced with vectors of the invention, and the production of polypeptides of the invention (including fragments thereof) by recombinant techniques. Host cells are genetically engineered (i.e., nucleic acids are introduced, e.g., transduced, transformed or transfected) with the vectors of this invention, which may be, for example, a cloning vector or an expression vector comprising the relevant nucleic acids herein. The vector is optionally a plasmid, a viral particle, a phage, a naked nucleic acid, etc. The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants, or amplifying the relevant gene. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to those skilled in the art and in the references cited herein, including Sambrook and Ausubel.

The host cell can be a eukaryotic cell, such as a yeast cell, or a plant cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Plant protoplasts are also suitable for some applications. For example, the DNA fragments are introduced into plant tissues, cultured plant cells or plant protoplasts by standard methods including electroporation (Fromm et al. (1985) Proc. Natl. Acad. Sci. USA 82: 5824-5828), infection by viral vectors such as cauliflower mosaic virus (CaMV) (Hohn et al. (1982) Molecular Biology of Plant Tumors (Academic Press, New York) pp. 549-560; U.S. Pat. No. 4,407,956), high velocity ballistic penetration by small particles with the nucleic acid either within the matrix of small beads or particles, or on the surface (Klein et al. (1987) Nature 327: 70-73), use of pollen as vector (WO 85/01856), or use of Agrobacterium tumefaciens or A. rhizogenes carrying a T-DNA plasmid in which DNA fragments are cloned. The T-DNA plasmid is transmitted to plant cells upon infection by

The protein or fragment thereof produced by a recombinant cell may be secreted, membrane-bound, or contained intracellularly, depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing polynucleotides encoding mature proteins of the invention can be designed with signal sequences which direct secretion of the mature polypeptides through a prokaryotic or eukaryotic cell membrane.

Modified Amino Acid Residues

Polypeptides of the invention may contain one or more modified amino acid residues. The presence of modified amino acids may be advantageous in, for example, increasing polypeptide half-life, reducing polypeptide antigenicity or toxicity, increasing polypeptide storage stability, or the like. Amino acid residue(s) are modified, for example, co-translationally or post-translationally during recombinant production or modified by synthetic or chemical means.

Non-limiting examples of a modified amino acid residue include incorporation or other use of acetylated amino acids, glycosylated amino acids, sulfated amino acids, prenylated (e.g., farnesylated, geranylgeranylated) amino acids, PEG modified (e.g., "PEGylated") amino acids, biotinylated amino acids, carboxylated amino acids, phosphorylated amino acids, etc. References adequate to guide one of skill in the modification of amino acid residues are replete throughout the literature.

The modified amino acid residues may prevent or increase affinity of the polypeptide for another molecule, including, but not limited to, polynucleotides, proteins, carbohydrates, lipids and lipid derivatives, and other organic or synthetic compounds.

Identification of Additional Factors

A transcription factor provided by the present invention can also be used to identify additional endogenous or exogenous molecules that can affect a phenotype or trait of interest. On the one hand, such molecules include organic (small or large molecules) and/or inorganic compounds that affect expression of (i.e., regulate) a particular transcription factor. Alternatively, such molecules include endogenous molecules that are acted upon either at a transcriptional level by a transcription factor of the invention to modify a phenotype as desired. For example, the transcription factors can be employed to identify one or more downstream genes that are subject to a regulatory effect of the transcription factor. In one approach, a transcription factor or transcription factor homologue of the invention is expressed in a host cell, e.g., a transgenic plant cell, tissue or explant, and expression products, either RNA or protein, of likely or random targets are monitored, e.g., by hybridization to a microarray of nucleic acid probes corresponding to genes expressed in a

monitoring changes in mRNA expression.
These techniques are exemplified in Ausubel et al. (eds) Current Protocols in Molecular Biology, John Wiley & Sons (1998, and supplements through 2001). Such changes in the expression levels can be correlated with modified plant traits and thus identified molecules can be useful for soaking or spraying on fruit, vegetable and grain crops to modify traits in plants.

Essentially any available composition can be tested for modulatory activity of expression or activity of any nucleic acid or polypeptide herein. Thus, available libraries of compounds such as chemicals, polypeptides, nucleic acids and the like can be tested for modulatory activity. Often, potential modulator compounds can be dissolved in aqueous or organic (e.g., DMSO-based) solutions for easy delivery to the cell or plant of interest in which the activity of the modulator is to be tested. Optionally, the assays are designed to screen large modulator composition libraries by automating the assay steps and providing compounds from any convenient source to assays, which are typically run in parallel (e.g., in microtiter formats on microtiter plates in robotic assays).

In one embodiment, high throughput screening methods involve providing a combinatorial library containing a large number of potential compounds (potential modulator compounds). Such "combinatorial chemical libraries" are then screened in one or more assays, as described herein, to identify those library members (particular chemical species or subclasses) that display a desired characteristic activity. The compounds thus identified can serve as target compounds.

A combinatorial chemical library can be, e.g., a collection of diverse chemical compounds generated by chemical synthesis or biological synthesis. For example, a combinatorial chemical library such as a polypeptide library is formed by combining a set of chemical building blocks (e.g., in one example, amino acids) in every possible way for a given compound length (i.e., the number of amino acids in a polypeptide compound of a set length). Exemplary libraries include peptide libraries, nucleic acid libraries, antibody libraries (see, e.g., Vaughn et al. (1996) Nature Biotechnol. 14: 309-314 and PCT/US96/10287), carbohydrate libraries (see, e.g., Liang et al. (1996) Science 274: 1520-1522 and U.S. Pat. No. 5,593,853), peptide nucleic acid libraries (see, e.g., U.S. Pat. No. 5,539,083), and small organic molecule libraries (see, e.g., benzodiazepines, Baum Chem. Eng. News January 18, page 33 (1993); isoprenoids, U.S. Pat. No. 5,569,588; thiazolidinones and metathiazanones, U.S. Pat. No. 5,549,974; pyrrolidines, U.S. Pat. Nos. 5,525,735 and 5,519,134; morpholino compounds, U.S. Pat. No. 5,506,337) and the like.

Preparation and screening of combinatorial or other libraries is well known to those of skill in the art. Such combinatorial chemical libraries include, but are not limited to, peptide libraries (see, e.g., U.S. Pat. No. 5,010,175; Furka (1991) Int. J. Pept. Prot. Res. 37: 487-493; and Houghton et al. (1991) Nature 354: 84-88). Other chemistries for generating chemical diversity libraries can also be used.

In addition, as noted, compound screening equipment for high-throughput screening is generally available, e.g., using any of a number of well known robotic systems that have also been developed for solution phase chemistries useful in assay systems. These systems include automated workstations including an automated synthesis apparatus and robotic systems utilizing robotic arms. Any of the above devices are suitable for use with the present invention, e.g., for high throughput screening of potential modulators. The nature and

tissue or cell type of interest, by two-dimensional gel electrophoresis of protein products, or by any other method known in the art for assessing expression of gene products at the level of RNA or protein. Alternatively, a transcription factor of the invention can be used to identify promoter sequences (i.e., binding sites) involved in the regulation of a downstream target. After identifying a promoter sequence, interactions between the transcription factor and the promoter sequence can be modified by changing specific nucleotides in the promoter sequence or specific amino acids in the transcription factor that interact with the promoter sequence to alter a plant trait. Typically, transcription factor DNA-binding sites are identified by gel shift assays. After identifying the promoter regions, the promoter region sequences can be employed in double-stranded DNA arrays to identify molecules that affect the interactions of the transcription factors with their promoters (Bulyk et al. (1999) Nature Biotechnol. 17: 573-577).

The identified transcription factors are also useful to identify proteins that modify the activity of the transcription factor. Such modification can occur by covalent modification, such as by phosphorylation, or by protein-protein (homo- or heteropolymer) interactions. Any method suitable for detecting protein-protein interactions can be employed. Among the methods that can be employed are co-immunoprecipitation, cross-linking and co-purification through gradients or chromatographic columns, and the two-hybrid yeast system.

The two-hybrid system detects protein interactions in vivo and is described in Chien et al. (1991) Proc. Natl. Acad. Sci. USA 88: 9578-9582, and is commercially available from Clontech (Palo Alto, Calif.). In such a system, plasmids are constructed that encode two hybrid proteins: one consists of the DNA-binding domain of a transcription activator protein fused to the TF polypeptide and the other consists of the transcription activator protein's activation domain fused to an unknown protein that is encoded by a cDNA that has been recombined into the plasmid as part of a cDNA library. The DNA-binding domain fusion plasmid and the cDNA library are transformed into a strain of the yeast Saccharomyces cerevisiae that contains a reporter gene (e.g., lacZ) whose regulatory region contains the transcription activator's binding site. Either hybrid protein alone cannot activate transcription of the reporter gene. Interaction of the two hybrid proteins reconstitutes the functional activator protein and results in expression of the reporter gene, which is detected by an assay for the reporter gene product. Then, the library plasmids responsible for reporter gene expression are isolated and sequenced to identify the proteins encoded by the library plasmids. After identifying proteins that interact with the transcription factors, assays for compounds that interfere with the TF protein-protein interactions can be performed.

Identification of Modulators

In addition to the intracellular molecules described above, extracellular molecules that alter activity or expression of a transcription factor, either directly or indirectly, can be identified. For example, the methods can entail first placing a candidate molecule in contact with a plant or plant cell. The molecule can be introduced by topical administration, such as spraying or soaking of a plant, and then the molecule's effect on the expression or activity of the TF polypeptide or the expression of the polynucleotide monitored. Changes in the expression of the TF polypeptide can be monitored by use of polyclonal or monoclonal antibodies, gel electrophoresis or the like.
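The statement that a polypeptide library is formed by combining building blocks "in every possible way for a given compound length" can be made concrete with a short sketch. It assumes the 20 standard amino acids as the building-block set; the function names peptide_library and library_size are hypothetical. With n building blocks and length L the library holds n^L members, which is why such libraries are screened with automated, parallel assays.

```python
from itertools import product
from typing import Iterator

# One-letter codes for the 20 standard amino acids (assumed building blocks).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_library(length: int, building_blocks: str = AMINO_ACIDS) -> Iterator[str]:
    """Lazily yield every peptide of the given length over the building blocks."""
    return ("".join(residues) for residues in product(building_blocks, repeat=length))


def library_size(length: int, n_building_blocks: int = len(AMINO_ACIDS)) -> int:
    """Number of distinct members: n_building_blocks ** length."""
    return n_building_blocks ** length
```

A dipeptide library built this way has 400 members and a tripeptide library 8,000; the generator is lazy, so longer lengths can be iterated without holding the whole library in memory.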
Changes in the expression of the corresponding 65 implementation of modifications to these devices (if any) so polynucleotide sequence can be detected by use of microar that they can operate as discussed herein will be apparent to rays, Northerns, quantitative PCR, or any other technique for persons skilled in the relevant art. US 7,598.429 B2 39 40 Indeed, entire high throughput Screening systems are com useful as nucleic acid probes and primers. An oligonucleotide mercially available. These systems typically automate entire suitable for use as a probe or primer is at least about 15 procedures including all sample and reagent pipetting, liquid nucleotides in length, more often at least about 18 nucle dispensing, timed incubations, and final readings of the otides, often at least about 21 nucleotides, frequently at least microplate in detector(s) appropriate for the assay. These about 30 nucleotides, or about 40 nucleotides, or more in configurable systems provide high throughput and rapid start length. A nucleic acid probe is useful in hybridization proto up as well as a high degree of flexibility and customization. cols, e.g., to identify additional polypeptide homologues of Similarly, microfluidic implementations of screening are also the invention, including protocols for microarray experi commercially available. ments. Primers can be annealed to a complementary target The manufacturers of Such systems provide detailed pro 10 DNA strand by nucleic acid hybridization to form a hybrid tocols the various high throughput. Thus, for example, between the primer and the target DNA strand, and then Zymark Corp. provides technical bulletins describing screen extended along the target DNA strand by a DNA polymerase ing systems for detecting the modulation of gene transcrip enzyme. Primer pairs can be used for amplification of a tion, ligand binding, and the like. 
The integrated systems nucleic acid sequence, e.g., by the polymerase chain reaction herein, in addition to providing for sequence alignment and, 15 (PCR) or other nucleic-acid amplification methods. See Sam optionally, synthesis of relevant nucleic acids, can include brook and Ausubel, Supra. Such screening apparatus to identify modulators that have an In addition, the invention includes an isolated or recombi effect on one or more polynucleotides or polypeptides nant polypeptide including a Subsequence of at least about 15 according to the present invention. contiguous amino acids encoded by the recombinant or iso In some assays it is desirable to have positive controls to lated polynucleotides of the invention. For example, such ensure that the components of the assays are working prop polypeptides, or domains or fragments thereof, can be used as erly. At least two types of positive controls are appropriate. immunogens, e.g., to produce antibodies specific for the That is, known transcriptional activators or inhibitors can be polypeptide sequence, or as probes for detecting a sequence incubated with cells/plants/etc. in one sample of the assay, of interest. A Subsequence can range in size from about 15 and the resulting increase? decrease in transcription can be 25 amino acids in length up to and including the full length of the detected by measuring the resulting increase in RNA/protein polypeptide. expression, etc., according to the methods herein. It will be To be encompassed by the present invention, an expressed appreciated that modulators can also be combined with tran polypeptide which comprises such a polypeptide Subse scriptional activators or inhibitors to find modulators that quence performs at least one biological function of the intact inhibit transcriptional activation or transcriptional repression. 
30 polypeptide in Substantially the same manner, or to a similar Either expression of the nucleic acids and proteins herein or extent, as does the intact polypeptide. For example, a any additional nucleic acids or proteins activated by the polypeptide fragment can comprise a recognizable structural nucleic acids or proteins herein, or both, can be monitored. motif or functional domain Such as a DNA binding domain In an embodiment, the invention provides a method for that binds to a specific DNA promoter region, an activation identifying compositions that modulate the activity or expres 35 domain or a domain for protein-protein interactions. sion of a polynucleotide or polypeptide of the invention. For example, a test compound, whether a small or large molecule, Production of Transgenic Plants is placed in contact with a cell, plant (or plant tissue or Modification of Traits explant), or composition comprising the polynucleotide or The polynucleotides of the invention are favorably polypeptide of interestanda resulting effect on the cell, plant, 40 employed to produce transgenic plants with various traits, or (or tissue or explant) or composition is evaluated by monitor characteristics, that have been modified in a desirable man ing, either directly or indirectly, one or more of expression ner, e.g., to improve the seed characteristics of a plant. For level of the polynucleotide or polypeptide, activity (or modu example, alteration of expression levels or patterns (e.g., spa lation of the activity) of the polynucleotide or polypeptide. 
In tial or temporal expression patterns) of one or more of the Some cases, an alteration in a plant phenotype can be detected 45 transcription factors (or transcription factor homologues) of following contact of a plant (or plant cell, or tissue or explant) the invention, as compared with the levels of the same protein with the putative modulator, e.g., by modulation of expres found in a wild type plant, can be used to modify a plants sion or activity of a polynucleotide or polypeptide of the traits. An illustrative example of trait modification, improved invention. Modulation of expression or activity of a poly characteristics, by altering expression levels of a particular nucleotide or polypeptide of the invention may also be caused 50 transcription factor is described further in the Examples and by molecular elements in a signal transduction second mes the Sequence Listing. senger pathway and Such modulation can affect similar ele ments in the same or another signal transduction second mes Arabidopsis as a Model System senger pathway. Arabidopsis thaliana is the object of rapidly growing atten 55 tion as a model for genetics and metabolism in plants. Ara Subsequences bidopsis has a small genome, and well documented Studies Also contemplated are uses of polynucleotides, also are available. It is easy to grow in large numbers and mutants referred to herein as oligonucleotides, typically having at defining important genetically controlled mechanisms are least 12 bases, preferably at least 15, more preferably at least either available, or can readily be obtained. Various methods 20, 30, or 50 bases, which hybridize under at least highly 60 to introduce and express isolated homologous genes are avail stringent (or ultra-high Stringent or ultra-ultra-high Stringent able (see Koncz, et al., eds. Methods in Arabidopsis Research. conditions) conditions to a polynucleotide sequence et al. (1992), World Scientific, New Jersey, N.J., in “Pref. described above. 
The polynucleotides may be used as probes, ace'). Because of its small size, short life cycle, obligate primers, sense and antisense agents, and the like, according to autogamy and high fertility, Arabidopsis is also a choice methods as noted Supra. 65 organism for the isolation of mutants and studies in morpho Subsequences of the polynucleotides of the invention, genetic and development pathways, and control of these path including polynucleotide fragments and oligonucleotides are ways by transcription factors (Koncz. Supra, p. 72). A number US 7,598.429 B2 41 42 of studies introducing transcription factors into A. thaliana ods include introducing into a planta recombinant expression have demonstrated the utility of this plant for understanding vector or cassette comprising a functional promoter operably the mechanisms of gene regulation and trait alteration in linked to one or more sequences homologous to presently plants. See, for example, Koncz, supra, and U.S. Pat. No. disclosed sequences. Plants and kits for producing these 6,417,428). plants that result from the application of these methods are Arabidopsis Genes in Transgenic Plants. also encompassed by the present invention. Expression of genes which encode transcription factors Traits of Interest modify expression of endogenous genes, polynucleotides, Examples of some of the traits that may be desirable in and proteins are well known in the art. In addition, transgenic plants, and that may be provided by transforming the plants plants comprising isolated polynucleotides encoding tran 10 with the presently disclosed sequences, are listed in Tables 4 Scription factors may also modify expression of endogenous and 6. genes, polynucleotides, and proteins. Examples include Peng The first column of Table 4 shows the polynucleotide SEQ et al. (1997) Genes Develop. 11:3194-3205) and Peng et al. ID NO; the second column shows the Mendel Gene ID No., (1999) Nature 400: 256-261). 
In addition, many others have GID; the third column shows the transcription factor family to demonstrated that an Arabidopsis transcription factor 15 which the polynucleotide belongs; the fourth column shows expressed in an exogenous plant species elicits the same or the category of the trait; the fifth column shows the trait(s) very similar phenotypic response. See, for example, Fu et al. resulting from the knock out or overexpression of the poly (2001) Plant Cell 13: 1791-1802): Nandi et al. (2000) Curr: nucleotide in the transgenic plant; the sixth column (“Com Biol. 10: 215-218); Coupland (1995) Nature 377: 482-483); ment'), includes specific effects and utilities conferred by the polynucleotide of the first column; the seventh column shows and Weigel and Nilsson (1995, Nature 377: 482-500). the SEQID NO of the polypeptide encoded by the polynucle Homologous Genes Introduced into Transgenic Plants. otide; and the eighth column shows the amino acid residue Homologous genes that may be derived from any plant, or positions of the conserved domain in amino acid (AA) co from any source whether natural, synthetic, semi-synthetic or ordinates. recombinant, and that share significant sequence identity or 25 The first column (Col. 1) of Table 4 lists the SEQID NO: of similarity to those provided by the present invention, may be presently disclosed polynucleotide sequences. The second introduced into plants, for example, crop plants, to confer column lists the corresponding GID number. The third col desirable or improved traits. Consequently, transgenic plants umn shows the transcription factor family in which each of may be produced that comprise a recombinant expression the respective sequences is found. The fourth column lists the vector or cassette with a promoter operably linked to one or 30 conserved domains in amino acid coordinates of the respec more sequences homologous to presently disclosed tive encoded polypeptide sequences. The fifth and sixth col sequences. 
The promoter may be, for example, a plant or viral umns list the trait category and specific traits observed for promoter. plants overexpressing the respective sequences (except where The invention thus provides for methods for preparing noted as “KO in Col. 2 for plants in which the respective transgenic plants, and for modifying plant traits. These meth sequence was knocked out).
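As an illustration of the combinatorial chemical libraries discussed earlier in this section, in which a set of building blocks is combined in every possible way for a given compound length, the following is a minimal sketch and not a description of any particular library synthesis. The function name `enumerate_library` and the three-residue alphabet are assumptions chosen only for illustration.

```python
from itertools import product


def enumerate_library(building_blocks, length):
    """Yield every compound of the given length as a string of one-letter codes.

    Hypothetical helper: each position is filled independently from
    `building_blocks`, so the library holds len(building_blocks) ** length
    members.
    """
    for combo in product(building_blocks, repeat=length):
        yield "".join(combo)


# Three amino acids (one-letter codes) combined into every possible dipeptide.
library = list(enumerate_library("AGS", 2))
print(library)

# The library size grows exponentially with compound length:
# all pentapeptides over the 20 standard amino acids.
print(20 ** 5)
```

The exponential growth in library size with compound length is one reason such libraries are paired with the automated high throughput screening systems described in the text.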

TABLE 4 Sequences of the invention and the traits they confer in plants

Col. 1 SEQ ID NO: | Col. 2 GID No. | Col. 3 Family | Col. 4 Conserved domains | Col. 5 Trait Category | Col. 6 Observed trait(s)
1 | G1272 | PAZ | 800-837 | Seed glucosinolates | Decrease in seed glucosinolate M39497
3 | G1506 | GATA/Zn | 7-33 | Seed glucosinolates | Increase in glucosinolates M39502 and M39498
5 | G1897 | Z-Dof | 34-62 | Seed glucosinolates | Increase in seed glucosinolates M39491 and M39493
7 | G1946 | HS | 37-128 | Seed glucosinolates | Increase in seed glucosinolate M39501; increased tolerance to phosphate-free media
9 | G2113 | AP2 | 55-122 | Seed glucosinolates | Decrease in seed glucosinolate M39497; increase of glucosinolates M39501, M39494 and M39478
11 | G2117 | bZIP | 46-106 | Seed glucosinolates | Decrease in M39496
13 | G2155 | AT-hook | 18-38 | Seed glucosinolates; plant size | Increase in M39497; large plant size
15 | G2290 | WRKY | 147-205 | Seed glucosinolates | Increase in M39496
17 | G2340 | MYB (R1)R2R3 | 14-120 | Seed glucosinolates | Altered glucosinolate profile
21 | G353 | Z-C2H2 | 41-61, 84-104 | Seed glucosinolates | Increase in M39494
23 | G484 (KO) | CAAT | 11-104 | Seed glucosinolates | Altered glucosinolate profile
25 | G674 | MYB | 20-120 | Seed glucosinolates | Increase in M39501
27 | G1052 | bZIP | 201-261 | Seed prenyl lipids | Decrease in lutein and increase in xanthophyll 1
29 | G1328 |  | 14-119 | Seed prenyl lipids | Decreased seed lutein
31 | G1930 |  | 59-124, 179-273 | Seed prenyl lipids; C/N sensing | Increased chlorophyll a and b content; increased tolerance to low nitrogen conditions in C/N sensing assay
33 | G214 | MYB-related | 25-71 | Seed prenyl lipids; leaf fatty acids; prenyl lipids; plant size | Increased seed lutein; increased leaf fatty acids; increased chlorophyll, carotenoids; larger biomass (increased leaf number and size); darker green in vegetative and reproductive tissues due to a higher chlorophyll content in the later stages of development
35 | G2509 | AP2 | 89-156 | Seed prenyl lipids | Increase in α-tocopherol
37 | G2520 | HLH/MYC | 139-197 | Seed prenyl lipids; leaf glucosinolates; C/N sensing | Increase in seed δ-tocopherol and decrease in seed γ-tocopherol; increase in M39478; increased tolerance to low nitrogen conditions in C/N sensing assay
39 | G259 | HS | 40-131 | Seed prenyl lipids | Increase in α-tocopherol
41 | G490 | CAAT | 48-143 | Seed prenyl lipids | Increase in seed δ-tocopherol
43 | G652 | Z-CLDSH | 28-49, 137-151, 182-196 | Seed prenyl lipids; leaf glucosinolates | Increase in α-tocopherol; increase in M39480
45 | G748 |  | 112-140 | Seed prenyl lipids | Increased lutein content
47 | G883 |  | 245-302 | Seed prenyl lipids | Decreased seed lutein
49 | G20 |  | 68-144 | Seed sterols | Increase in campesterol
51 | G974 |  | 80-147 | Seed oil content | Altered seed oil content
53 | G2343 |  | 14-116 | Seed oil content | Increased seed oil content
55 | G1777 |  | 124-247 | Seed oil and protein content | Increased seed oil content and decreased seed protein
57 | G229 |  | 14-120 | Biochemistry: other | Up-regulation of genes involved in secondary metabolism; genes coding for enzymes involved in alkaloid biosynthesis, including indole-3-glycerol phosphatase and strictosidine synthase, were induced; genes for enzymes involved in aromatic amino acid biosynthesis were also up-regulated, including tryptophan synthase and tyrosine transaminase; phenylalanine ammonia lyase, chalcone synthase and trans-cinnamate mono-oxygenase, involved in phenylpropanoid biosynthesis, were also induced
59 | G663 |  | 9-111 | Biochemistry: other | Increased anthocyanins in leaf, root, seed
61 | G362 |  | 62-82 | Biochemistry: other | Increased trichome density and trichome products; increased anthocyanins in various tissues
63 | G2105 | TH | 100-153 | Biochemistry: other | Increased trichome density and trichome products
65 | G47 | AP2 | 11-80 | Flowering time; biochemistry: other; abiotic stress tolerance | Increased lignin content; increased cold tolerance; increased drought tolerance; increased desiccation tolerance; increased salt tolerance; late flowering; dark green; increased leaf size, larger rosettes and/or increased amount of vegetative tissue
67 | G2123 | GF14 | 99-109 | Biochemistry: other | Putative 14-3-3 protein
69 | G1266 | AP2 | 79-147 | Leaf fatty acids, insoluble sugars; C/N sensing | Changes in leaf fatty acids, insoluble sugars; decreased sensitivity to ABA; increased tolerance to low nitrogen conditions in C/N sensing assay
71 | G1337 | Z-CO-like | 9-75 | Leaf fatty acids; sugar sensing | Increase in the amount of oleic acid; decreased tolerance to sucrose
73 | G1399 | AT-hook | 86-93 | Leaf fatty acids | Increase of the percentage of the 16:0 fatty acid
75 | G1465 | NAC | 242-306 | Leaf fatty acids | Increases in the percentages of 16:0, 16:1, 18:0 and 18:2 and decreases in 16:3 and 18:3 fatty acids
77 | G1512 | RING/C3HC4 | 39-93 | Leaf fatty acids | Increase in 18:2 fatty acids
79 | G1537 | HB | 14-74 | Leaf fatty acids | Altered leaf fatty acid composition
81 | G2136 | MADS | 43-100 | Leaf fatty acids | Decrease in 18:3 fatty acid
83 | G2147 | HLH/MYC | 163-220 | Leaf fatty acids | Increase in 16:0, increase in 18:2 fatty acids
85 | G377 | RING/C3H2C3 | 85-128 | Leaf fatty acids | Increased 18:2 and decreased 18:3 leaf fatty acids
87 | G962 | NAC | 53-175 | Leaf fatty acids | Increased 16:0 and decreased 18:3 leaf fatty acids
89 | G975 | AP2 | 4-71 | Leaf fatty acids; C/N sensing | Increased wax in leaves; increased C29, C31, and C33 alkanes (increased up to 10-fold compared to control plants); more drought tolerant than controls; increased tolerance to low nitrogen conditions in C/N sensing assay
91 | G987 | SCR | 395-462, 525-613, 1027-1102, 1162-1255 | Leaf fatty acids; leaf prenyl lipids | Reduction in 16:3 fatty acids; altered chlorophyll, tocopherol, carotenoid
93 | G1069 | AT-hook | 67-74 | Leaf and seed glucosinolates; C/N sensing | Altered leaf glucosinolate composition; increased seed glucosinolate M39497; increase in 16:0 fatty acid, decrease in 18:2 fatty acids, decreased sensitivity to ABA; increased tolerance to low nitrogen conditions in C/N sensing assay
95 | G1198 | bZIP | 173-223 | Leaf glucosinolates | Increase in M39481
97 | G1322 | MYB (R1)R2R3 | 26-130 | Leaf glucosinolates; C/N sensing | Increase in M39480; increased tolerance to low nitrogen conditions in C/N sensing assay
99 | G1421 | AP2 | 74-151 | Leaf glucosinolates | Increased leaf content of glucosinolate M39482
101 | G1794 | AP2 | 182-249 | Leaf glucosinolates | Increased leaf content of glucosinolate M39480
103 | G2144 | HLH/MYC | 207-265 | Leaf glucosinolates; C/N sensing | Increased leaf content of glucosinolate M39480; increased tolerance to low nitrogen conditions in C/N sensing assay
105 | G2512 | AP2 | 79-147 | Leaf glucosinolates; C/N sensing | Increased leaf content of glucosinolate M39481; increased tolerance to low nitrogen conditions in C/N sensing assay
107 | G2552 | HLH/MYC | 124-181 | Leaf glucosinolates | Increased leaf content of glucosinolate M39480
109 | G264 | HS | 23-114 | Leaf glucosinolates | Increased leaf content of glucosinolate M39481
111 | G681 | MYB (R1)R2R3 | 14-120 | Leaf glucosinolates | Increased leaf content of glucosinolate M39480
113 | G1012 | WRKY | 30-86 | Leaf insoluble sugars | Decreased rhamnose
115 | G1309 | MYB (R1)R2R3 | 9-114 | Leaf insoluble sugars | [illegible]
117 | G158 | MADS | 2-57 | Leaf insoluble sugars | Increased rhamnose
119 | G1641 | MYB-related | 32-82, 141-189 | Leaf insoluble sugars | Increased rhamnose
121 | G1865 | GRF-like | 45-162 | Leaf insoluble sugars | Increased galactose, decreased xylose
123 | G2094 | GATA/Zn | 43-68 | Leaf insoluble sugars | Increase in arabinose
125 | G211 | MYB (R1)R2R3 | 24-137 | Leaf insoluble sugars | Increase in xylose
127 | G242 | MYB (R1)R2R3 | 6-105 | Leaf insoluble sugars | Increased arabinose
129 | G2589 | MADS | 1-57 | Leaf insoluble sugars | Increase in arabinose
131 | G274 | AKR | 94-600 | Leaf insoluble sugars | Increased leaf arabinose
133 | G598 | DBP | 205-263 | Leaf insoluble sugars | Altered insoluble sugars (increased galactose levels)
135 | G1543 | HB | 135-195 | Leaf prenyl lipids | Increase in chlorophyll a and b; increased biomass
137 | G280 | AT-hook | 97-104, 130-137, 155-162, 185-192 | Leaf prenyl lipids | Increased δ- and γ-tocopherol
139 | G2131 | AP2 | 50-121, 146-217 | Leaf sterols; C/N sensing | Increase in campesterol; increased tolerance to low nitrogen conditions in C/N sensing assay
141 | G2424 | MYB (R1)R2R3 | 107-219 | Leaf sterols | Increase in stigmastanol
143 | G2583 | AP2 | 4-71 | Leaf wax; flowering time | Glossy leaves, increased epicuticular wax content or altered composition; late developing, late flowering time
147 | G977 | AP2 | 5-72 | Leaf wax | Altered epicuticular wax content or composition
151 | G2133 | AP2 | 11-82 | Flowering time; biochemistry: other; abiotic stress tolerance | Increased cold tolerance; increased drought tolerance; increased desiccation tolerance; increased salt tolerance; late flowering; dark green; increased leaf size and/or larger rosette; increased seed size
157 | G3643 | AP2 | 14-79 | Flowering time; biochemistry: other; abiotic stress tolerance | Increased cold tolerance; increased drought tolerance; increased desiccation tolerance; increased heat tolerance; late flowering; dark green; larger plants
155 | G3644 | AP2 | 55-102 | Flowering time; biochemistry: other; abiotic stress tolerance | Increased salt tolerance; late flowering; dark green; large seedlings; large rosettes with long, broad leaves
153 | G3649 | AP2 | 18-61 | Flowering time; biochemistry: other; abiotic stress tolerance | Increased cold tolerance; increased drought tolerance; increased desiccation tolerance; decreased heat tolerance; late flowering; dark green; larger rosettes; large cauline leaves
145 | G1387 | AP2 | 4-68 |  | Few lines of overexpressors have been produced or examined
149 | G4294 | AP2 | 5-72 |  | Overexpressors not yet produced or examined

Abbreviations: KO—knockout
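To make the layout of Table 4 concrete, the following minimal sketch shows one way a row could be held as a record and how a conserved-domain entry such as "41-61, 84-104" could be turned into residue ranges. The record type `Table4Entry`, the helper names, and the reading of the coordinates as 1-based and inclusive are assumptions for illustration; they are not taken from the specification.

```python
import re
from typing import List, NamedTuple, Tuple


class Table4Entry(NamedTuple):
    """Hypothetical record for Cols. 1-4 of one Table 4 row."""
    seq_id: int
    gid: str
    knockout: bool                     # True when Col. 2 carries the "(KO)" note
    family: str
    domains: List[Tuple[int, int]]     # (start, end) amino acid coordinates


def parse_domains(field: str) -> List[Tuple[int, int]]:
    """Turn '41-61, 84-104' into [(41, 61), (84, 104)]."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\s*-\s*(\d+)", field)]


def parse_entry(seq_id: str, gid: str, family: str, domains: str) -> Table4Entry:
    """Build a record from the text of the first four columns."""
    knockout = "(KO)" in gid
    clean_gid = gid.replace("(KO)", "").strip()
    return Table4Entry(int(seq_id), clean_gid, knockout, family, parse_domains(domains))


def domain_residues(protein: str, domain: Tuple[int, int]) -> str:
    """Slice a domain out of a protein string, assuming 1-based inclusive coordinates."""
    start, end = domain
    return protein[start - 1:end]


g353 = parse_entry("21", "G353", "Z-C2H2", "41-61, 84-104")
g484 = parse_entry("23", "G484 (KO)", "CAAT", "11-104")
print(g353.domains)
print(g484.gid, g484.knockout)
```

A record of this kind keeps the knockout note separate from the GID, so that rows describing knockout lines can be distinguished from overexpression lines without re-parsing the identifier.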

Table 5 lists a summary of orthologous and homologous sequences identified using BLAST (tblastx program). The first column shows the polynucleotide sequence identifier (SEQ ID NO), the second column shows the corresponding cDNA identifier (Gene ID), the third column shows the orthologous or homologous polynucleotide GenBank Accession Number (Test Sequence ID), the fourth column shows the calculated probability value that the sequence identity is due to chance (Smallest Sum Probability), the fifth column shows the plant species from which the test sequence was isolated (Test Sequence Species), and the sixth column shows the orthologous or homologous test sequence GenBank annotation (Test Sequence GenBank Annotation).

Of the identified sequences homologous to the Arabidopsis sequences provided in Table 5, the percent sequence identity among these sequences can be as low as 47%, or even lower sequence identity. The entire NCBI GenBank database was filtered for sequences from all plants except Arabidopsis thaliana by selecting all entries in the NCBI GenBank database associated with NCBI taxonomic ID 33090 (Viridiplantae; all plants) and excluding entries associated with taxonomic ID 3701 (Arabidopsis thaliana). These sequences are compared to those listed in the Sequence Listing, using the Washington University TBLASTX algorithm (version 2.0a19MP) at the default settings using gapped alignments with the filter "off". For each sequence listed in the Sequence Listing, individual comparisons were ordered by probability score (P-value), where the score reflects the probability that a particular alignment occurred by chance. For example, a score of 3.6e-40 is 3.6×10⁻⁴⁰. In addition to P-values, comparisons were also scored by percentage identity. Percentage identity reflects the degree to which two segments of DNA or protein are identical over a particular length. The identified homologous polynucleotide and polypeptide sequences and homologs of the Arabidopsis polynucleotides and polypeptides may be orthologs of the Arabidopsis polynucleotides and polypeptides and/or closely, phylogenetically-related sequences.
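The database filtering and P-value ordering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the `Hit` record, the function names, the accession strings, and every taxonomic ID other than 33090 and 3701 are hypothetical, and each hit is assumed to carry the set of taxonomic IDs of its source organism and that organism's ancestors.

```python
from typing import FrozenSet, List, NamedTuple

VIRIDIPLANTAE = 33090         # taxonomic ID given in the text (all plants)
ARABIDOPSIS_THALIANA = 3701   # taxonomic ID given in the text


class Hit(NamedTuple):
    accession: str            # hypothetical identifier, not a real GenBank entry
    lineage: FrozenSet[int]   # taxonomic IDs of the source organism and its ancestors
    p_value: float            # smallest sum probability reported for the alignment


def filter_plant_hits(hits: List[Hit]) -> List[Hit]:
    """Keep hits from plants other than Arabidopsis thaliana."""
    return [h for h in hits
            if VIRIDIPLANTAE in h.lineage and ARABIDOPSIS_THALIANA not in h.lineage]


def rank_by_p_value(hits: List[Hit]) -> List[Hit]:
    """Order hits from least to most likely to have arisen by chance."""
    return sorted(hits, key=lambda h: h.p_value)


# Made-up hits; taxonomic IDs other than 33090 and 3701 are placeholders.
hits = [
    Hit("HIT_A", frozenset({33090, 3701}), float("1.0e-103")),  # Arabidopsis: excluded
    Hit("HIT_B", frozenset({33090, 900001}), float("5.3e-41")),
    Hit("HIT_C", frozenset({33090, 900002}), float("3.6e-40")),  # 3.6e-40 = 3.6 x 10^-40
    Hit("HIT_D", frozenset({900003}), float("2.0e-50")),         # not a plant: excluded
]
ranked = rank_by_p_value(filter_plant_hits(hits))
print([h.accession for h in ranked])
```

Scores written in the "3.6e-40" notation used in Table 5 parse directly as floating-point numbers, so ordering by P-value reduces to an ordinary numeric sort.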

TABLE 5 Sequences phylogenetically related to Arabidopsis sequences shown to confer useful traits in plants

Col. 2 GID or Related Col. 3 Col. 4 Col. 1 Sequence Related to Smallest Col. 5 Col. 6 SEQ Identifier GID/SEQID Sum Species from which Test Sequence ID NO (Accession No.) NO Probability Sequence is Derived GenBank Annotation 19 G671 G2340, 7 1.OE-103 Arabidopsis thaliana BG269414 G2340, 1.60 E-45 Mesembryanthemum L0-3478T3 Ice plant crystallinum Lambda Un BG448527 G2340, 5.30 E-41 Medicago truncatula NFO36FO4RT1F1032 Developing root Medica AIf30649 G2340, 1.1OE Gossypium hirsutium BNLGHTS95 Six day Cotton fiber Gossypiu G2340, 1.2OE Glycine max skó4f05.y1 Gm c1016 Glycine max cDNA clone GENO PHMYBPH31 G2340, Petunia X hybrida P hybrida myb.Ph3 gene encoding protein A491024 G2340, 4.1OE Lycopersicon EST241733 tomato escientiin shoot, Cornell Lyc AMMIXTA G2340, Antirrhinum maius A. maints mixta mRNA. OSMYB1355 G2340, Oryza sativa O. Saiiva mRNA for myb factor, 1355 bp. BE4953OO G2340, Secale cereale WHE1268 FO2 KO4ZS Secale cereale anther cDNA BG300704 G2340, Hordeum vulgare HVSMEbOO18BO3f Hordeum vulgare Seedling sho gi2605617 G2340, 1...SO Oryza sativa OSMYB1. gi2O563 G2340, 7.30 Petunia X hybrida protein 1. gi485867 G2340, 4.OO Antirrhinum maius mixta. gi437327 G2340, 2.OO Gossypium hirsutium MYBA; putative. gi19051 G2340, 3.10 Hordeum vulgare MybHv1. gi227030 G2340, 3.10 Hordeum vulgare myb-related gene war. distichum Hw1. gi1101770 G2340, 7 3 8 Picea mariana MYB-like transcriptional factor MBF1. gi1430846 G2340, 6.30 Lycopersicon myb-related escientiin transcription factor. gi5139814 G2340, 2.50 3 5 Glycine max GMYB29B2. gió651292 G2340, 1.70 3 4 Pimpinella myb-related brachycarpa transcription factor. 
145 G1387 G2583, 43 6.OO Arabidopsis thaliana 89 G975 G2583, 43 3.00 Arabidopsis thaliana 149 G4294 G2583, 43 2.OO Oryza sativa AW92846S G2583, 43 140 43 Lycopersicon EST337253 tonato escientiin flower buds 8 mm tSm80e10.y1 BEO23297 G2583, 43 42 Glycine max Gm c1015 Glycine max cDNA clone GENO APOO3615 G2583, 43 Oryza sativa chromosome 6 clone P0486H12, *** SEQUENCING IN AUO88998 G2583, 43 2.90E Lotus japonicus AUO88998 Lois japonicus flower bud cDNA Lo ATOO1828 G2583, 43 Brassica rapa Subsp. AT001828 Flower pekinensis bud cDNA Br BG415973 G2583, 43 Hordeum vulgare HVSMEkOOO9EO6f Hordeum vulgare testaperica BF647090 G2583, 43 Medicago truncatula NFOO7A06EC1F1038 Elicited cell culture BG560598 G2583, 43 2.90E Sorghum. RHIZ2 59 D07.b1 AO03 propinquilin Rhizome2 (RHIZ2). So AWO11200 G2583, 43 Pinus taeda ST17HO8 Pine TriplEx shoot tip library Pinus ta US 7,598.429 B2 53 54


Col. 2 GID or Related Col. 3 Col. 4 Col. 1 Sequence Related to Smallest Col. 5 Col. 6 SEQ Identifier GID/SEQID Sum Species from which Test Sequence ID NO (Accession No.) NO Probability Sequence is Derived GenBank Annotation BF479478 G2583, 43 1.60 E 5 Mesembryanthemum L48-3155T3 Ice plant crystallinum Lambda U gi19507 G2583, 43 140 E 6 Lupinus polyphyllius put. pPLZ2 product (AA 1-164). gi10798644 G2583, 43 1.OO E Nicotiana tabacum AP2 domain containing transcription fac gi8571476 G2583, 43 4.70 E Atriplex hortensis apetala2 domain containing protein. gi2213783 G2583, 43 840 E Lycopersicon Pt.S. escientiin gi8809573 G2583, 43 5.30 E Nicotiana Sylvestris ethylene-responsive element binding gi4099914 G2583, 43 840 E Stylosanthes hamata ethylene-responsive element binding p gió478845 G2583, 43 8.90 E Matricaria ethylene-responsive chamomilia element binding gi15290041 G2583, 43 9.40 Oryza sativa hypothetical protein. gi12225.884 G2583, 43 1.70 Zea mayS unnamed protein product. gi3264767 G2583, 43 340 E Prints armeniaca AP2 domain containing protein. 242 G361 G362.6 7.OEe-17 Arabidopsis thaliana 244 G2826 G362.6 S.OE-14 Arabidopsis thaliana 246 G2838 G362.6 2.OE-12 Arabidopsis thaliana 248 G1995 G362.6 S.OE-10 Arabidopsis thaliana 250 G370 G362.6 S.OE-10 Arabidopsis thaliana BGS8113S G362.6 1.7OE-1 Medicago truncatula EST482865. 
GVN Medicago truncatula cDNA BI2O6903 G362.6 R 1 Lycopersicon ESTS24943 cTOS escientiin Lycopersicon esculen G362.6 Glycine max saa71c12.y1 Gm c1060 Glycine max cDNA clone GEN APOO3214 G362.6 3.OOE-1 Oryza sativa chromosome 1 clone OSJNBa0083M16, *** SEQUENCI BE366047 G362.6 6.40E-1 Sorghum bicolor PI1 30 G05.b2 AO02 Pathogen induced 1 (PI1) BF616974 G362.6 1.90E-O Hordeum vulgare HVSMEcOO14CO8f Hordeum vulgare Seedling sho BG444243 G362.6 Gossypium arboretin GA Ea?)023L22f Gossypium arboretin 7-10 d BESOO26S G362.6 O.OOO15 Titictim aestivitin WHEO981 F11 L2OZS Wheat pre-anthesis spik ABOO6604 G362.6 O.OOO23 Petunia X hybrida mRNA for ZPT2-9, complete cols. AI163O84 G362.6 OOOO4 Populus tremitia X AO31pé5u Hybrid Populus tremuloides aspen gi15528588 G362.6 5 Oryza sativa hypothetical protein. gi2346984 G362.6 8 Petunia X hybrida ZPT2-9. gi7228329 G362.6 Medicago Saiiva putative TFIIIA (or kruppel)-like Zinc fi gi1763063 G362.6 Glycine max SCOF-1 gi485814 G362.6 Titictim aestivitin WZF1. gi4666360 G362.6 Datisca glomerata zinc-finger protein 1. gi2O58504 G362.6 Brassica rapa zinc-finger protein-1. gi861091 G362.6 Pistin sativum putative Zinc finger protein. gi2981169 G362.6 O42 Nicotiana tabacum osmotic stress induced zinc-finger prot BM110736 G2105,63 5 Soianum tuberosum EST558272 potato roots Soianum tuberosum US 7,598.429 B2 55 56


Col. 2 GID or Related Col. 3 Col. 4 Col. 1 Sequence Related to Smallest Col. 5 Col. 6 SEQ Identifier GID/SEQID Sum Species from which Test Sequence ID NO (Accession No.) NO Probability Sequence is Derived GenBank Annotation BF646615 OS, 63 6.60 E-36 Medicago truncatula NFO66CO8EC1F1065 Elicited cell culture ABOS2729 OS, 63 E-30 Pistin sativum mRNA for DNA binding protein DF1, complete cod OSNOOO22 OS, 63 1.1OE Oryza sativa chromosome 4 clone OSJNBa0011L07, *** SEQUENC AIf 77.252 OS, 63 4.2OE Lycopersicon EST 258217 tomato escientiin resistant, Cornell BMSOOO43 OS, 63 Zea mayS 952036C09.y1952 BMS tissue from Walbot Lab (red APOO4839 OS, 63 1.90E Oryza sativa () chromosome 2 clo (japonica cultivar group) AW596787 OS, 63 2.3OE Glycine max s16f10.y1 Gm-c1032 Glycine max cDNA clone GENO AV41071S OS, 63 Lotus japonicus AV410715 Lotus japonicits young plants (two BM357046 OS, 63 3.1OE Triphysaria 16I-G5 Triphysaria versicolor versicolor root-t gi13646986 OS, 63 Pistin sativum DNA-binding protein DF1. gi20249 OS, 63 1.30 Oryza sativa gt-2. gi18182311 OS, 63 8.20 Glycine max GT-2 factor. gi8096269 OS, 63 O.24 Nicotiana tabacum KED. 167 G3645 G47.65 Brassica rapa subsp. Pekinensis 151 G2133 G47.65 1.OE-47 Arabidopsis thaliana 16S G3646 G47.65 2.OE-46 Brassica oleracea 163 G3647 G47.65 2.OE-33 Zinnia elegans 157 G3643 G47.65 1.OE-29 Glycine max 155 G3644 G47.65 9.OEe-26 Oryza sativa (japonica cultivar group) 159 G47.65 1.OE-23 Zea mayS 153 G47.65 1.OE-23 Oryza sativa (japonica cultivar group) 161 G3651 G47.65 9.OE-21 Oryza sativa (japonica cultivar group) BE32O193 G47.65 Medicago truncatula NFO24BO4RT1F1029 Developing root Medica APOO3379 G47.65 8.9 OE Oryza sativa chromosome 1 clone P0408G07, 8.8% SEQUENCING IN G47.65 7.90 Lycopersicon EST3O2937 tomato escientiin root during after B434553 G47.65 8.90 Soianum tuberosum EST537314 P. 
infestans-challenged leaf So BF6101.98 G47.65 1.30 Pinus taeda NXSI 055 HO4 F NXSI (Nsf Xylem Side wood Inclin BE659994 G47.65 2.50 E-15 Glycine max 4-G2 GmaxSC Glycine max cDNA, mRNA sequence. BG446456 G47.65 S.OO Gossypium arboretin GA EbOO34M18f Gossypium arboretin 7-10 d BG321374 G47.65 1.10 E-14 Deschirainia Sophia DSO1 06dO8 R DSO1. AAFC ECORC cold stress G47.65 240 Gossypium hirsutium BNLGH11133 Six day Cotton fiber Gossypi US 7,598.429 B2 57 58


Col. 1 SEQ ID NO | Col. 2 GID or Related Sequence Identifier (Accession No.) | Col. 3 Related to GID/SEQ ID NO | Col. 4 Smallest Sum Probability | Col. 5 Species from which Test Sequence is Derived | Col. 6 Test Sequence GenBank Annotation
 | gi14140155 | G47/65 | 2.90E-16 | Oryza sativa | putative AP2 domain transcription factor.
 | gi5616086 | G47/65 | 7.90 E 4 | Brassica napus | dehydration responsive element binding pro
 | gi12225916 | G47/65 | 8.70 E 4 | Zea mays | unnamed protein product.
 | gi8571476 | G47/65 | 1.30 E 3 | Atriplex hortensis | apetala2 domain containing protein.
 | gi8980313 | G47/65 | E 3 | Catharanthus roseus | AP2-domain DNA binding protein.
 | gi6478845 | G47/65 | E 2 | Matricaria chamomilla | ethylene-responsive element binding
 | gi1208498 | G47/65 | 6.40 | Nicotiana tabacum | EREBP-2.
 | gi8809573 | G47/65 | 2.2O | Nicotiana sylvestris | ethylene-responsive element binding
 | gi7528276 | G47/65 | 340 E 1 | Mesembryanthemum crystallinum | AP2-related transcription f
 | gi3342211 | G47/65 | 4...SO E 1 | Lycopersicon esculentum | Pti4.
149 | G4294 | G975/89 | 2.0E-65 | Oryza sativa |
143 | G2583 | G975/89 | 3.0E-56 | Arabidopsis thaliana |
145 | G1387 | G975/89 | 5.0E-54 | Arabidopsis thaliana |
 | AP003615 | G975/89 | 1.10E-51 | Oryza sativa | chromosome 6 clone P0486H12, *** SEQUENCING IN
 | BG642554 | G975/89 | 1.10E-50 | Lycopersicon esculentum | EST356031 tomato flower buds, anthe
 | AW705973 | G975/89 | 3.20E-45 | Glycine max | sk64c02.y1 Gm-c1016 Glycine max cDNA clone GENO
 | AT001828 | G975/89 | | Brassica rapa subsp. pekinensis | AT001828 Flower bud cDNA Br
 | BG415973 | G975/89 | 3.70E-29 | Hordeum vulgare | HVSMEk0009E06f Hordeum vulgare testaperica
 | AU088998 | G975/89 | 2.10E-27 | Lotus japonicus | AU088998 Lotus japonicus flower bud cDNA Lo
 | AL377839 | G975/89 | 8.40E-21 | Medicago truncatula | MtBB34C04F1 MtBB Medicago truncatula cD
 | BF479478 | G975/89 | 2.20E-18 | Mesembryanthemum crystallinum | L48-3155T3 Ice plant Lambda U
 | BG560598 | G975/89 | 3.40E-18 | Sorghum propinquum | RHIZ2 59 D07.b1 A003 Rhizome2 (RHIZ2). So
 | L46408 | G975/89 | 5.90E-18 | Brassica rapa | BNAF1258 Mustard flower buds Brassica rapa cD
 | gi19507 | G975/89 | 2.10E-19 | Lupinus polyphyllus | put. pPLZ2 product (AA 1-164).
 | gi2213783 | G975/89 | 1.80E-15 | Lycopersicon esculentum | Pti5.
 | gi8571476 | G975/89 | 2.80E-14 | Atriplex hortensis | apetala2 domain containing protein.
 | gi4099914 | G975/89 | 7.90E-14 | Stylosanthes hamata | ethylene-responsive element binding p
 | gi6478845 | G975/89 | 3.40E-13 | Matricaria chamomilla | ethylene-responsive element binding
 | gi12225884 | G975/89 | | Zea mays | unnamed protein product.
 | gi8809573 | G975/89 | 7.00E-13 | Nicotiana sylvestris | ethylene-responsive element binding
 | gi15290041 | G975/89 | 1.2O | Oryza sativa | hypothetical protein.
 | gi8980313 | G975/89 | 1.2O | Catharanthus roseus | AP2-domain DNA binding protein.
 | gi7528276 | G975/89 | 1.30E-12 | Mesembryanthemum crystallinum | AP2-related transcription f

252 | | 4,33 | 1.0E-116 | Arabidopsis thaliana |
 | | 4,33 | 4.40 | Lycopersicon esculentum | EST310415 tomato root deficiency, C
 | BG156656 | 4,33 | 1.80E-33 | Glycine max | sab31d11.y1 Gm-c1026 Glycine max cDNA clone GEN
 | BE597638 | 4,33 | | Sorghum bicolor | PI1 72 C05.b1 A002 Pathogen induced 1 (PI1)
 | BI272895 | 4,33 | | Medicago truncatula | NF091A11FL1F1084 Developing flower Medi
 | BE129981 | 4,33 | 3.90E | Zea mays | 945034C05X1 945 - Mixed adult tissues from Walbot
 | BF889434 | 4,33 | | Oryza sativa | EST003 Magnaporthe grisea infected 16 day-old
 | gi15528628 | 4,33 | | Oryza sativa | hypothetical protein-similar to Oryza sativa
 | gi7677132 | 4,33 | | Secale cereale | c-myb-like transcription factor.
 | gi13676413 | 4,33 | 0.43 | Glycine max | hypothetical protein.
 | gi12406993 | 4,33 | 0.57 | Hordeum vulgare | MCB1 protein.
 | gi940288 | 4,33 | 0.85 | Pisum sativum | protein localized in the nucleoli of pea nu
 | gi1279563 | 4,33 | 0.92 | Medicago sativa | nuM1.
 | gi12005328 | 4,33 | 0.98 | Hevea brasiliensis | unknown.
 | gi7688744 | 4,33 | 0.99 | Lycopersicon esculentum | asc1.
 | gi1070004 | 4,33 | 0.99 | Brassica napus | Biotin carboxyl carrier protein.
 | gi5326994 | 4,33 | | Daucus carota | DNA topoisomerase
254 | G5 | | 1.0E-76 | Arabidopsis thaliana |
 | B421315 | | 7.10E-54 | Lycopersicon esculentum | EST531981 tomato callus, TAMU Lycop
 | | | | Glycine max | sc38e09.y1 Gm-c1014 Glycine max cDNA clone GENO
 | AF274033 | | | Atriplex hortensis | apetala2 domain containing protein mRNA,
 | BG592917 | | | Solanum tuberosum | EST491595 cSTS Solanum tuberosum cDNA clo
 | AI166481 | | 6.20E | Populus balsamifera subsp. trichocarpa | xylem.est.309 Poplar
 | AW776927 | | 2.10E | Medicago truncatula | EST335992 DSIL Medicago truncatula cDNA
 | AP004119 | | | Oryza sativa | chromosome 2 clone OJ1288 G09, *** SEQUENCING
 | BE918036 | | | Sorghum bicolor | OV1 1 B03.b1 A002 Ovary 1 (OV1) Sorghum bic
 | gi8571476 | | 7.00E | Atriplex hortensis | apetala2 domain containing protein.
 | gi14140155 | | | Oryza sativa | putative AP2 domain transcription factor.
 | gi3342211 | | | Lycopersicon esculentum | Pti4.
 | gi1208498 | | 1...SO | Nicotiana tabacum | EREBP-2.
 | gi12225884 | | 1...SO | Zea mays | unnamed protein product.
 | gi7528276 | | 3.90E | Mesembryanthemum crystallinum | AP2-related transcription f

 | gi8809571 | G974/51 | 3.90E-19 | Nicotiana sylvestris | ethylene-responsive element binding
 | gi1688233 | G974/51 | 3.50E-18 | Solanum tuberosum | DNA binding protein homolog.
 | gi3264767 | G974/51 | 9.40E-18 | Prunus armeniaca | AP2 domain containing protein.
 | gi6478845 | G974/51 | 2.00E-17 | Matricaria chamomilla | ethylene-responsive element binding
 | BI311137 | G2343/53 | 4.00E-45 | Medicago truncatula | EST5312887 GESD Medicago truncatula cDN
 | BG130765 | G2343/53 | | Lycopersicon esculentum | EST463657 tomato crown gall Lycoper
 | | G2343/53 | 2.30E | Sorghum bicolor | LG1 354 G05.b1 A002 Light Grown 1 (LG1) Sor
 | AV421932 | G2343/53 | | Lotus japonicus | AV421932 Lotus japonicus young plants (two
 | BE611938 | G2343/53 | | Glycine max | sr01h04.y1 Gm-c1049 Glycine max cDNA clone GENO
 | BF484214 | G2343/53 | 1.90E | Triticum aestivum | WHE2309 F07 K13ZS Wheat pre-anthesis spik
 | BG301022 | G2343/53 | | Hordeum vulgare | HVSMEb0019E16f Hordeum vulgare seedling sho
 | AP003018 | G2343/53 | 3.20E | Oryza sativa | genomic DNA, chromosome 1, BAC clone: OSJNBa000
 | BE495300 | G2343/53 | 3.30E | Secale cereale | WHE1268 F02 K04ZS Secale cereale anther cDNA
 | AI657290 | G2343/53 | 3.50E | Zea mays | 486093A08.y1 486 leaf primordia cDNA library fro
 | gi1167486 | G2343/53 | 9.50E | Lycopersicon esculentum | transcription factor.
 | gi13366181 | G2343/53 | 1.30E | Oryza sativa | putative transcription factor.
 | gi2130045 | G2343/53 | 1.50E | Hordeum vulgare | MybHv33 protein barley.
 | gi82310 | G2343/53 | | Antirrhinum majus | myb protein 330 garden snapdragon.
 | gi1732247 | G2343/53 | RR 33 44 | Nicotiana tabacum | transcription factor
 | gi1841475 | G2343/53 | 7.8O | Pisum sativum |
 | gi5139814 | G2343/53 | 2.8O | Glycine max |
 | gi13346178 | G2343/53 | 4.90 | Gossypium hirsutum | BNLGH233.
 | gi6651292 | G2343/53 | 2.70 | Pimpinella brachycarpa | myb-related transcription factor.
 | gi8247759 | G2343/53 | 1.10 2 9 | Triticum aestivum | GAMyb protein.
 | AF272573 | 23,67 | 1.30 5 O | Populus alba x Populus tremula | clone INRA717-1-B4 14-3-3 pr
 | BG581482 | 23,67 | | Medicago truncatula | EST483216 GVN Medicago truncatula cDNA
 | | 23,67 | | Solanum tuberosum | 09A12 Mature tuber lambda ZAP Solanum
 | LETFT7 | 23,67 | 1.20E | Lycopersicon esculentum | mRNA for 14-3-3 protein, TFT7.
 | AF228501 | 23,67 | 4.50E | Glycine max | 14-3-3-like protein mRNA, complete cds.
 | BE643058 | 23,67 | | Ceratopteris richardii | Cri. 7 M14 SP6 Ceratopteris Spore Li
 | AF222805 | 23,67 | 7.00E | Euphorbia esula | 14-3-3-like protein mRNA, complete cds.

 | PSA238682 | 23,67 | 1.30E-42 | Pisum sativum | mRNA for 14-3-3-like protein, sequence 2.
 | BG443252 | 23,67 | | Gossypium arboreum | GA Ea0020A13f Gossypium arboreum 7-10 d
 | AI727536 | 23,67 | | Gossypium hirsutum | BNLGH8338 Six day Cotton fiber Gossypiu
 | gi8515890 | 23,67 | 5 2 | Populus alba x Populus tremula | 14-3-3 protein.
 | gi8099061 | 23,67 | 3.70 | Populus x canescens | 14-3-3 protein.
 | gi7576887 | 23,67 | 1.00 | Glycine max | 14-3-3-like protein.
 | gi3925703 | 23,67 | 8.90 | Lycopersicon esculentum | 14-3-3 protein.
 | gi6752903 | 23,67 | 8.90 | Euphorbia esula | 14-3-3-like protein.
 | gi913214 | 23,67 | 2.10 | Nicotiana tabacum | T14-3-3.
 | gi11138322 | 23,67 | 340 | Vicia faba | wf14-3-3d protein.
 | gi2879818 | 23,67 | 8.50 | Solanum tuberosum | 14-3-3 protein.
 | gi1015462 | 23,67 | 8.90 | Chlamydomonas reinhardtii | 14-3-3 protein.
 | gi2921512 | 23,67 | 1.10 | agrestis | GF14 protein.
 | AC091246 | 777/55 | 3.50 | Oryza sativa | chromosome 3 clone OSJNBa0002I03, *** SEQUENCI
 | BG136684 | 777/55 | 1.10E-67 | Lycopersicon pennellii | EST477126 wild tomato pollen Lycoper
 | AW703793 | 777/55 | 2.50E | Glycine max | sk12f08.y1 Gm-c1023 Glycine max cDNA clone GENO
 | BE051040 | 777/55 | | Zea mays | za71g01.b50 Maize Glume cDNAs Library Zea mays cDN
 | AW933922 | 777/55 | 2.90E | Lycopersicon esculentum | EST359765 tomato fruit mature green
 | BG600834 | 777/55 | | Solanum tuberosum | EST505729 cSTS Solanum tuberosum cDNA clo
 | BF440069 | 777/55 | 3.20E | Thellungiella salsuginea | Sc0136 Thellungiella salsuginea ZA
 | BF587440 | 777/55 | 4.20E | Sorghum propinquum | FM1 36 D07.b1 A003 Floral-Induced Merist
 | BI267961 | 777/55 | 2.10E | Medicago truncatula | NF118E09IN1F1071 Insect herbivory Medic
 | BE415217 | 777/55 | 2.50E | Triticum aestivum | MWL025.F02F000208 ITEC MWL Wheat Root Lib
 | gi1666171 | 777/55 | | Nicotiana plumbaginifolia | unknown.
 | | 777/55 | | Fragaria x ananassa | unknown.
 | AW928317 | | | Lycopersicon esculentum | EST307050 tomato flower buds 8 mm t
 | BF271147 | | | Gossypium arboreum | GA Eb0010K15f Gossypium arboreum 7-10 d
 | BE329654 | | | Glycine max | so67c05.y1 Gm-c1040 Glycine max cDNA clone GENO
 | BG103016 | | | Sorghum propinquum | RHIZ2 36 A10.b1 A003 Rhizome2 (RHIZ2). So
 | BE606980 | | 1.00E | Triticum aestivum | WHE0914 F04 K08ZS Wheat 5-15 DAP spike cD
 | BG048756 | | | Sorghum bicolor | OV1 22 F05.b1 A002 Ovary 1 (OV1) Sorghum bi

 | AI162779 | G2520/37 | 2.10E-22 | Populus tremula x Populus tremuloides | A023P62U Hybrid aspen
 | BI270049 | G2520/37 | 2.90E-22 | Medicago truncatula | NF004D04FL1F1042 Developing flower Medi
 | BE921054 | G2520/37 | 3.90E-22 | Solanum tuberosum | EST424823 potato leaves and petioles Sola
 | BF200249 | G2520/37 | 9.10E-22 | Triticum monococcum | WHE2254 F11 L22ZE Triticum monococcum s
 | gi11862964 | G2520/37 | 4.50E-16 | Oryza sativa | hypothetical protein.
 | gi5923912 | G2520/37 | 6.30E-16 | Tulipa gesneriana | bHLH transcription factor GBOF-1.
 | gi6166283 | G2520/37 | 0.69 | Pinus taeda | helix-loop-helix protein 1A.
 | gi1086538 | G2520/37 | 1 | Oryza rufipogon | transcriptional activator Rb homolog.

For many of the traits listed in Table 6 that may be conferred to plants by ectopically expressing transcription factors of the invention, a single transcription factor gene may be used to increase or decrease, advance or delay, or improve or prove deleterious to a given trait. For example, overexpression of a transcription factor gene that naturally occurs in a plant may cause early flowering relative to non-transformed or wild-type plants. By knocking out the gene, or suppressing the gene (with, for example, antisense suppression), the plant may experience delayed flowering. Similarly, overexpressing or suppressing one or more genes can impart significant differences in production of plant products, such as different fatty acid ratios. Thus, suppressing a gene that causes a plant to be more sensitive to cold may improve a plant's tolerance of cold.

The first and second columns of Table 6 provide the trait category and the specific traits that were generally observed in plants overexpressing the listed transcription factor sequence of the invention, or, where noted, in plants in which a specific transcription factor has been knocked out (KO). The third column lists the sequences for which a specific trait was observed when the expression of the sequence was altered, and the last column provides the utility and specific observations, relative to controls, for each of the sequences.
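The four-column layout of Table 6 described above can be sketched as a simple record type. The Python below is only an illustrative sketch: the names `TraitRow` and `traits_for_gene` are hypothetical, and the three example rows are transcribed from Table 6 rather than taken from any sequence database.

```python
from dataclasses import dataclass, field


@dataclass
class TraitRow:
    """One row of Table 6 (hypothetical representation)."""
    category: str                      # column 1: trait category
    trait: str                         # column 2: specific trait
    genes: list                        # column 3: transcription factor GIDs
    utility: str                       # column 4: utility
    observations: dict = field(default_factory=dict)  # column 4: per-gene notes


# Three rows transcribed from Table 6 for illustration.
rows = [
    TraitRow("Hormonal", "Altered hormone sensitivity",
             ["G47", "G1069", "G1266"],
             "Seed dormancy, drought tolerance; plant form, fruit ripening",
             {"G1069": "overexpressors had decreased sensitivity to ABA"}),
    TraitRow("Seed biochemistry", "Modified seed fatty acid content",
             ["G1069", "G1421"],
             "Altered nutritional value; increase in waxes for disease resistance",
             {"G1069": "increased 16:0 fatty acids and decreased 18:2 fatty acids"}),
    TraitRow("Growth, Reproduction", "Flower abscission",
             ["G1897"],
             "Ornamental: longer retention of flowers",
             {"G1897": "delayed abscission of floral organs"}),
]


def traits_for_gene(table, gid):
    """Return the traits (column 2) of every row whose gene list contains gid."""
    return [row.trait for row in table if gid in row.genes]


print(traits_for_gene(rows, "G1069"))
```

Because a single gene can appear in several rows, looking up a GID in this way returns every trait for which altered expression of that sequence was reported.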

TABLE 6 Genes, traits and utilities that affect plant characteristics

Trait Category | Traits | Transcription factor genes that impact traits | Utility/Observations
Environmental stress resistance and tolerance | Increased osmotic stress tolerance | G353, G1069, G1930 | Enhanced germination rate, survivability, yield. G47 (in a root growth assay on PEG-containing media, G47 overexpressing seedlings were larger and had more root growth compared to the wild-type); G353 (on PEG-containing media, overexpressing seedlings were larger and greener than the wild-type); G1069 (overexpressing lines showed more tolerance to osmotic stress on high sucrose media); G1930 (with more seedling vigor on high sucrose than wild-type control plants)
 | Altered C/N sensing and tolerance to low nitrogen conditions | G975, G1069, G1266, G1322, G1930, G2131, G2144, G2512, G2520 | Improved yield, less fertilizer required, improved stress tolerance and quality. G975 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls); G1069 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls); G1266 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls); G1322 (accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls); G1930 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls); G2131 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls); G2144 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls); G2512 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls); G2520 (less anthocyanin accumulation on low nitrogen media, better tolerance to low nitrogen conditions than controls)
 | Increased tolerance to phosphate-limitation | G1946 | Improved yield, less fertilizer required, improved stress tolerance and quality. G1946 (more secondary root growth on phosphate-free media than wild-type controls)
 | Increased salt tolerance | G47, G1930, G3644 | G1930 (with more seedling vigor on high salt media than wild-type control plants); G47 and G3644 (homologs; more seedling vigor on high salt media than wild-type control plants)
 | Increased cold stress resistance and/or improved germination in cold conditions | G47, G1322, G1930, G2133, G3643, G3649 | Enhanced germination, growth, earlier planting. G1322 (at 8° C., overexpressor seedlings were slightly larger and had longer roots than wild type); G1930 (increased tolerance to 8° C. in a germination assay); G47 (with leaf RBCS3 or shoot apical meristem promoters) and closely related homologs G2133, G3643 and G3649 (35S promoter) conferred increased tolerance to 8° C. in a germination assay relative to controls
 | Increased drought or desiccation tolerance | G47, G353, G975, G1069, G2133, G3643, G3644, G3649 | Improved survivability, yield, extended range. G353 (overexpressors had greater tolerance to drought than wild type in a soil-based assay); G975 (overexpressors had greater tolerance to desiccation in plate-based assays, and greater tolerance to drought than wild type in a soil-based assay); G1069 (overexpressors had greater tolerance to drought than wild type in a soil-based assay); G47 and homologs G2133, G3643 and G3649 conferred increased water deprivation tolerance when overexpressed compared to controls (another homolog, G3644, was not tested in drought assays)
 | Altered light response and shade tolerance | G377, G1069, G1322, G1794, G2144, G2520 | Enhanced germination, growth, development, flowering time, greater planting density and improved yield. G377 (overexpressors had altered leaf orientation); G1322 (overexpressors exhibited constitutive photomorphogenesis); G1069 (overexpressors exhibited altered leaf orientation); G1794 (overexpressors exhibited constitutive photomorphogenesis); G2144 (overexpressors exhibited long hypocotyls); G2520 (overexpressors had long hypocotyls)
Sugar sensing | Altered plant response to sugars | G1337 | Photosynthetic rate, carbohydrate accumulation, biomass production, source-sink relationships, senescence. G1337 (G1337 overexpressors germinated poorly on high glucose compared to controls, thus G1337 may be involved in sugar sensing, transport, or metabolism)
Hormonal | Altered hormone sensitivity | G47, G1069, G1266 | Seed dormancy, drought tolerance; plant form, fruit ripening. G47 (overexpressors had decreased sensitivity to ABA); G1069 (overexpressors had decreased sensitivity to ABA); G1266 (overexpressors had decreased sensitivity to ABA)
Development, morphology | Altered overall plant architecture | | Altered vascular tissues, increased lignin content; altered cell wall content; and/or appearance. G47 (increased lignin content, stems were wider with a much greater number of xylem vessels than wild type); G353 (overexpressors had short pedicels, downward pointing siliques, leaves had short petioles, were rather flat, rounded, and sometimes showed changes in coloration); G1543 (some G1543 overexpressors exhibited contorted, stunted carpels; 35S:G1543 plants also exhibited altered branching pattern, and apical dominance was reduced); G1794 (overexpressors exhibited decreased apical dominance); G2509 (overexpressors exhibited decreased apical dominance)
 | Increased size, stature and/or biomass | G47, G377, G1052, G1543, G2133, G2155, G3643, G3644, G3649 | Improved yield. G47 (stem sections were of wider diameter and vascular bundles were larger, sometimes multiple cauline leaves were present at each node; overexpression of G47 and its homologs G2133, G3643, G3644 and G3649 resulted in some lines that produced larger plants than controls with larger rosettes, seedlings and/or seeds); G377 (some lines had broader, fuller rosette leaves than wild type); G214 (larger biomass, increased leaf number and size compared to controls); G1052 (larger leaves and were generally more sturdy than wild type); G1543 (some overexpressors exhibited increased biomass, including tomato plants overexpressing this sequence); G2155 (late in development, 35S:G2155 plants became very large relative to controls)
 | Size: reduced stature or dwarfism | G280, G353, G362, G652, G674, G962, G977, G1198, G1266, G1309, G1322, G1421, G1537, G1641, G1794, G2094, G2144, G2147 | Ornamental utility (creation of dwarf varieties); small stature also provides wind resistance
 | Flower structure, inflorescence | G47, G259, G353, G1543 | Ornamental horticulture; production of saffron or other edible flowers. G47 (thick and fleshy inflorescences); G259 (rosette leaves were longer, narrower, darker green than controls, sepals were longer, narrower, and often fused at the tips); G353 (35S:G353 plants had a reduction in flower pedicel length and downward pointing siliques); G1543 (some lines showed contorted, stunted carpels)
 | Number and development of trichomes | G362, G1930, G2105 | Improved resistance to pests and desiccation; essential oil production. G362 (increased trichome density); G1930 (decreased trichome density); G2105 (adaxial leaf surfaces had a somewhat lumpy appearance caused by trichomes being raised up on small mounds of epidermal cells)
 | Seed size, color, and number | G652, G2105 | Improved yield. G652 (seeds produced by knockouts of G652 plants were somewhat wrinkled and misshapen); G2105 (pale, larger seeds than controls)
 | Leaf shape, color, modifications | G377, G674, G977, G1198, G2094, G2105, G2113, G2117, G2144, G2155, G2583 | Appealing shape or shiny leaves for ornamental agriculture, increased biomass or photosynthesis. G377 (during later rosette stage, leaves were rounder, darker green, and shorter than wild type. After flowering, 35S:G377 leaves had a greater blade area than wild-type); G674 (rounded, dark green leaves that sometimes pointed upward); G977 (dark green leaves that were generally wrinkled or curled); G1198 (smaller, narrower leaves); G2094 (leaves of overexpressors were short, wide, and slightly yellowed compared to wild type; occasionally the leaves also showed mild serrations on their margins); G2105 (uneven leaf surface); G2113 (long petioles, vertical leaf orientation, leaves appeared narrow and were downward curling at the margins compared to controls); G2155 (slightly small, rounded leaves that became dark green, very large and senesced later than wild type late in development); G2144 (pale, narrow, flat leaves that had long petioles and sometimes positioned in a vertical orientation); G2583 (narrow, curled leaves)
 | Altered stem morphology | G47, G748 | Ornamental; digestibility. G47 (stems of wider diameter with large irregular vascular bundles containing greater number of xylem vessels than wild type; some xylem vessels within the bundles appeared narrow and more lignified); G748 (thicker and more vascular bundles in stems than controls)
Pigment | Production of anthocyanin and prenyl lipids | G214, G259, G362, G490, G652, G748, G883, G977, G1052, G1328, G1930, G2509, G2520 | Antioxidant activity, vitamin E. G214 (darker green in vegetative and reproductive tissues due to a higher chlorophyll content in the later stages of development; increased seed lutein); G259 (increase in seed α-tocopherol); G362 (increased pigment production compared to controls, seeds developed patches of dark purple pigmentation, increased anthocyanin in seedling leaves; late flowering lines also became darkly pigmented); G490 (increased seed δ-tocopherol); G652 (increase in seed α-tocopherol); G748 (overexpressors consistently produced greater root content than controls); G883 (decreased seed lutein); G1328 (decreased seed lutein); G977 (darker green leaves than controls); G1052 (overexpressors had decreased lutein and increased xanthophyll 1 relative to controls); G1930 (increased chlorophyll content); G2509 (increase in α-tocopherol); G2520 (increase in seed δ-tocopherol and a decrease in seed γ-tocopherol)
Seed biochemistry | Production of seed sterols | G20 | Precursors for human steroid hormones; cholesterol modulators. G20 (increased campesterol)
 | Production of seed glucosinolates | G353, G484, G674, G1069, G1272 (KO), G1506, G1897, G1946, G2113, G2117, G2155, G2290, G2340 | Defense against insects; putative anticancer activity; undesirable in animal feeds. G353 (increased M39494); G484 (altered glucosinolate profile); G674 (increased M39501); G1069 (increased M39497); G1272 (decreased M39497); G1506 (increased M39502 and M39498); G1897 (increased M39491 and M39493); G1946 (increased M39501); G2113 (decreased M39497, increased M39501 and M39494); G2117 (increased M39497, decreased M39496); G2155 (increased M39497); G2290 (increased M39496); G2340 (extreme alteration in seed glucosinolate profile)
 | Modified seed oil content | G229, G652, G663, G974, G1198, G1543, G1777, G1946, G2117, G2123, G2343 | Vegetable oil production; increased caloric value for animal feeds; lutein content. G229 (increased seed oil); G652 (decreased seed oil); G663 (decreased seed oil); G1198 (increased seed oil); G1543 (decreased seed oil observed in Arabidopsis overexpressors, increased seed oil observed in soy); G1777 (increased seed oil); G1946 (increased seed oil); G2117 (decreased seed oil); G2123 (increased seed oil)
 | Modified seed protein content | G229, G663, G1641, G1777, G1946, G2117, G2509 | Reduced caloric value for 8S. G229 (decreased seed protein); G663 (increased seed protein); G1641 (increased seed protein); G1777 (decreased seed protein); G1946 (decreased seed protein); G2117 (increased seed protein); G2509 (increased seed protein)
 | Modified seed fatty acid content | G1069, G1421 | Altered nutritional value; increase in waxes for disease resistance. G1069 (increased 16:0 fatty acids and decreased 18:2 fatty acids); G1421 (increased 18:1 and decreased 18:3 seed fatty acids)
Leaf biochemistry | Production of leaf glucosinolates | G264, G353, G652, G681, G1069, G1198, G1322, G1421, G1794, G2113, G2144, G2512, G2520, G2552 | Defense against insects; putative anticancer activity; undesirable in animal feeds. G264 (increased M39481); G353 (increased M39494); G652 (increased M39480); G681 (increased M39480); G1069 (); G1198 (increased M3948); G1322 (increased M39480); G1421 (increased M39482); G1794 (increased M39480); G2113 (increased M39478); G2144 (increased M39480); G2512 (increased M39481); G2520 (increased M39478); G2552 (increased M39480)
 | Production of leaf phytosterols, inc. stigmastanol, campesterol | G2131, G2424 | Precursors for human steroid hormones; cholesterol modulators. G2131 (increase in leaf campesterol); G2424 (increase in stigmastanol)
 | Leaf fatty acid composition | G214, G377, G962, G975, G987 (KO), G1266, G1337, G1399, G1465, G1512, G2136, G2147, G2583 | Altered nutritional value; increase in waxes for disease resistance. G214 (increased leaf fatty acids); G377 (increase in leaf 18:2 fatty acids and decrease in leaf 18:3 fatty acids); G962 (increase in 16:0 leaf fatty acids, decrease in 18:3 leaf fatty acids); G987 KO (reduction in 16:3 fatty acids relative to controls); G975 (increased leaf fatty acids, glossy leaves); G1337 (increased leaf oleic acids); G1399 (increased leaf 16:0 fatty acid); G1465 (increased in 16:0, 16:1, 18:0 and 18:2 and decreased 16:3 and 18:3 leaf fatty acids); G1512 (increased 18:2 leaf fatty acids); G2136 (decreased 18:3 leaf fatty acids); G2147 (increased 16:0 and 18:23 leaf fatty acids); G2583 (glossy leaves)
 | Production of prenyl lipids, including tocopherol | G214, G259, G280, G362, G652, G987 (KO), G1543, G1930, G2509, G2520 | Antioxidant activity, vitamin E. G214 (increased leaf chlorophyll and carotenoids); G259 (increased seed α-tocopherol); G280 (increased leaf δ and γ tocopherol); G362 (increased anthocyanin levels in various tissues at different stages of growth; seedlings showed high levels of pigment in first true leaves, late flowering lines became darkly pigmented, seeds from developed patches of dark purple pigmentation); G652 (increased seed α-tocopherol); G987 (overexpressors had two xanthophylls not present in wild-type leaves, γ-tocopherol (which normally accumulate in seed tissue), and reduced levels of chlorophyll a and chlorophyll b in leaves); G1543 (dark green color, increased levels of carotenoids and chlorophylls a and b in leaves); G1930 (increased levels of chlorophyll a and chlorophyll b in seeds compared to controls); G2509 (increased seed α-tocopherol); G2520 (increase in seed δ-tocopherol and a decrease in seed γ-tocopherol)
 | Sugar, starch, hemicellulose composition | G158, G211, G242, G274, G1012, G1266, G1309, G1641, G1865, G2094, G2589 | Improved food digestibility, increased hemicellulose & pectin content; increased fiber content; increased plant tensile strength, wood quality, pathogen resistance, pulp production and/or tuber starch content. G158 (increased leaf rhamnose); G211 (increased leaf xylose); G242 (increased leaf arabinose); G274 (increased leaf arabinose); G1012 (decreased leaf rhamnose); G1266 (alterations in rhamnose, arabinose, xylose, and mannose, and galactose); G1309 (increased leaf mannose); G1641 (increased leaf rhamnose); G1865 (increased galactose, decreased xylose); G2094 (increased leaf arabinose); G2589 (increased leaf insoluble sugars - increased arabinose)
Growth, Reproduction | Plant growth rate and development | G1543 | Faster growth, increased biomass or yield, improved appearance; delay in bolting. G1543 (faster growth of seedlings)
 | Senescence; cell death | G652, G1897, G2155, G2340 | Altered yield, appearance; response to pathogens (potential protective response without the potentially detrimental consequences of a constitutive systemic acquired resistance). G652 (premature senescence of rosette leaves); G1897 (later senescence than controls); G2155 (senesced much later than controls); G2340 (overexpressors showed necrosis of blades of rosette and cauline leaves, necrotic lesions)
 | Modified fertility | G652, G962, G977, G1266, G1421, G2094, G2113, G2147 | Prevents or minimizes escape of the pollen of genetically modified plants. G652 (poor fertility); G962 (poor fertility); G977 (poor fertility); G1266 (poor fertility); G1421 (poor fertility); G2094 (poor fertility); G2113 (poor fertility); G2147 (poor fertility)
 | Early flowering | G490, G1946, G2144, G2509 | Faster generation time; synchrony of flowering; potential for introducing new traits to single variety
 | Delayed flowering | G47, G214, G362, G748, G1052, G1865, G1930, G2155, G2133, G3643, G3644, G3649 | Delayed time to pollen production of GMO plants; synchrony of flowering; increased yield
 | Flower and leaf development | G259, G353, G377, G652, G1865, G1897, G2094 | Ornamental applications; decreased fertility. G259 (rosette leaves were longer and narrow, dark green and curled compared to control plants, sepals were long, narrow, and often fused at the tips); G353 (reduction in flower pedicel length and downward pointing siliques); G377 (inflorescence stems were shorter than wild-type, during late rosette stage, leaves were rounder, darker green, and slightly shorter than those of wild type); G652 (reduced number of stamens: 4-5 of these organs rather than 6); G1865 (short, thick inflorescence stems, greatly increased number of leaves; visible flower buds up to a month after wild type, continuous light conditions, by which time rosette leaves had become rather large and contorted); G1897 (narrow, dark-green rosette and cauline leaves, inflorescences had short internodes with various abnormalities, perianth organs were typically rather long and narrow, stamens were short, silique formation was poor); G2094 (inflorescence stems were often thin and carried short flowers, mild serrations on leaf margins)
 | Flower abscission | G1897 | Ornamental: longer retention of flowers. G1897 (delayed abscission of floral organs)

* When co-expressed with G669 and G663

Significance of Modified Plant Traits

The sequences of the Sequence Listing, those in Tables 4-6, or those disclosed here can be used to prepare transgenic plants and plants with altered traits. The specific transgenic plants listed below are produced from the sequences of the Sequence Listing, as noted. Tables 4-6 provide exemplary polynucleotide and polypeptide sequences of the invention.

Salt stress resistance. Soil salinity is one of the more important variables that determines where a plant may thrive. Salinity is especially important for the successful cultivation of crop plants, particularly in many parts of the world that have naturally high soil salt concentrations, or where the soil has been over-utilized. Thus, presently disclosed transcription factor genes that provide increased salt tolerance during germination, the seedling stage, and throughout a plant's life cycle would find particular value for imparting survivability and yield in areas where a particular crop would not normally prosper.

Osmotic stress resistance. Presently disclosed transcription factor genes that confer resistance to osmotic stress may increase germination rate under adverse conditions, which could impact survivability and yield of seeds and plants.

Cold stress resistance. The potential utility of presently disclosed transcription factor genes that increase tolerance to cold is to confer better germination and growth in cold conditions. The germination of many crops is very sensitive to cold temperatures. Genes that would allow germination and seedling vigor in the cold would have highly significant utility in allowing seeds to be planted earlier in the season with a high rate of survivability. Transcription factor genes that confer better survivability in cooler climates allow a grower to move up planting time in the spring and extend the growing season further into autumn for higher crop yields.

Tolerance to freezing. The presently disclosed transcription factor genes that impart tolerance to freezing conditions are useful for enhancing the survivability and appearance of plants under conditions that would otherwise cause extensive cellular damage. Thus, germination of seeds and survival may take place at temperatures significantly below that of the mean temperature required for germination of seeds and survival of non-transformed plants. As with salt tolerance, this has the added benefit of increasing the potential range of a crop plant into regions in which it would otherwise succumb. Cold tolerant transformed plants may also be planted earlier in the spring or later in autumn, with greater success than with non-transformed plants.

Heat stress tolerance. The germination of many crops is also sensitive to high temperatures. Presently disclosed transcription factor genes that provide increased heat tolerance are generally useful in producing plants that germinate and grow in hot conditions, and may find particular use for crops that are planted late in the season, or extend the range of a plant by allowing growth in relatively hot climates.

Drought, low humidity tolerance. Strategies that allow plants to survive in low water conditions may include, for example, reduced surface area or surface oil or wax production. A number of presently disclosed transcription factor genes increase a plant's tolerance to low water conditions and provide the benefits of improved survivability, increased yield and an extended geographic and temporal planting range.

Radiation resistance. Presently disclosed transcription factor genes have been shown to increase lutein production. Lutein, like other xanthophylls such as zeaxanthin and vio

Generally, plants that have the highest level of defense mechanisms, such as, for example, polyunsaturated moieties of membrane lipids, are most likely to thrive under conditions that introduce oxidative stress (e.g., high light, ozone, water deficit, particularly in combination). Introduction of the presently disclosed transcription factor genes that increase the level of oxidative stress defense mechanisms would provide beneficial effects on the yield and appearance of plants. One specific oxidizing agent, ozone, has been shown to cause significant foliar injury, which impacts yield and appearance of crop and ornamental plants. In addition to reduced foliar injury that would be found in ozone-resistant plants created by transforming plants with some of the presently disclosed transcription factor genes, the latter have also been shown to have increased chlorophyll fluorescence (Yu-Sen Chang et al. Bot. Bull. Acad. Sin. (2001) 42:265-272).

Heavy metal tolerance. Heavy metals such as lead, mercury, arsenic, chromium and others may have a significant adverse impact on plant respiration. Plants that have been transformed with presently disclosed transcription factor genes that confer improved resistance to heavy metals, through, for example, sequestering or reduced uptake of the metals, will show improved vigor and yield in soils with relatively high concentrations of these elements. Conversely, transgenic transcription factors may also be introduced into plants to confer an increase in heavy metal uptake, which may benefit efforts to clean up contaminated soils.

Light response. Presently disclosed transcription factor genes that modify a plant's response to light may be useful for modifying a plant's growth or development, for example, photomorphogenesis in poor light, or accelerating flowering time in response to various light intensities, quality or duration to which a non-transformed plant would not similarly respond. Examples of such responses that have been demonstrated include leaf number and arrangement, and early flower bud appearances.

Overall plant architecture.
Several presently disclosed laxanthin, are important in the protection of plants against the transcription factor genes have been introduced into plants to damaging effects of excessive light. Lutein contributes, alter numerous aspects of the plants morphology. For directly or indirectly, to the rapid rise of non-photochemical 40 example, it has been demonstrated that a number of transcrip quenching in plants exposed to highlight. Increased tolerance tion factors may be used to manipulate branching, Such as the of field plants to visible and ultraviolet light impacts surviv means to modify lateral branching, a possible application in ability and vigor, particularly for recent transplants. Also the forestry industry. Transgenic plants have also been pro affected are the yield and appearance of harvested plants or duced that have altered cell wall content, lignin production, plant parts. Crop plants engineered with presently disclosed 45 transcription factor genes that cause the plant to produce flower organ number, or overall shape of the plants. Presently higher levels of lutein therefore would have improved photo disclosed transcription factor genes transformed into plants protection, leading to less oxidative damage and increase may be used to affect plant morphology by increasing or vigor, Survivability and higher yields under high light and decreasing internode distance, both of which may be advan ultraviolet light conditions. 50 tageous under different circumstances. For example, for fast Decreased herbicide sensitivity. Presently disclosed tran growth of woody plants to provide more biomass, or fewer Scription factor genes that confer resistance or tolerance to knots, increased internode distances are generally desirable. 
herbicides (e.g., glyphosate) may find use in providing means For improved wind screening of shrubs or trees, or harvesting to increase herbicide applications without detriment to desir characteristics of for example, members of the Gramineae able plants. This would allow for the increased use of a 55 family, decreased internode distance may be advantageous. particular herbicide in a local environment, with the effect of These modifications would also prove useful in the ornamen increased detriment to undesirable species and less harm to talhorticulture industry for the creation of unique phenotypic transgenic, desirable cultivars. characteristics of ornamental plants. Increased herbicide sensitivity. Knockouts of a number of Increased stature. For some ornamental plants, the ability the presently disclosed transcription factor genes have been 60 to provide larger varieties may be highly desirable. For many shown to be lethal to developing embryos. Thus, these genes plants, including t fruit-bearing trees or trees and shrubs that are potentially useful as herbicide targets. serve as view or wind Screens, increased stature provides Oxidative stress. In plants, as in all living things, abiotic obvious benefits. Crop species may also produce higher and biotic stresses induce the formation of oxygen radicals, yields on larger cultivars including Superoxide and peroxide radicals. This has the 65 Reduced stature or dwarfism. Presently disclosed tran effect of accelerating senescence, particularly in leaves, with Scription factor genes that decrease plant stature can be used the resulting loss of yield and adverse effect on appearance. to produce plants that are more resistant to damage by wind US 7,598.429 B2 87 88 and rain, or more resistant to heat or low humidity or water Root development, modifications. By modifying the struc deficit. 
Dwarf plants are also of significant interest to the ture or development of roots by transforming into a plant one ornamental horticulture industry, and particularly for home or more of the presently disclosed transcription factor genes, garden applications for which space availability may be lim plants may be produced that have the capacity to thrive in ited. otherwise unproductive soils. For example, grape roots that Fruit size and number. Introduction of presently disclosed extend further into rocky soils, or that remain viable in water transcription factor genes that affect fruit size will have desir logged soils, would increase the effective planting range of able impacts on fruit size and number, which may comprise the crop. It may be advantageous to manipulate a plant to increases in yield for fruit crops, or reduced fruit yield, such produce short roots, as when a soil in which the plant will be as when vegetative growth is preferred (e.g., with bushy orna 10 growing is occasionally flooded, or when pathogenic fungi or mentals, or where fruit is undesirable, as with ornamental disease-causing nematodes are prevalent. olive trees). Modifications to root hairs. Presently disclosed transcrip Flower structure, inflorescence, and development. Pres tion factor genes that increase root hair length or number ently disclosed transgenic transcription factors have been potentially could be used to increase root growth or vigor, used to create plants with larger flowers or arrangements of 15 which might in turn allow better plant growth under adverse flowers that are distinct from wild-type or non-transformed conditions such as limited nutrient or water availability. cultivars. This would likely have the most value for the orna Apical dominance. 
The modified expression of presently mental horticulture industry, where larger flowers or interest disclosed transcription factors that control apical dominance ing presentations generally are preferred and command the could be used in ornamental horticulture, for example, to highest prices. Flower structure may have advantageous modify plant architecture. effects on fertility, and could be used, for example, to Branching patterns. Several presently disclosed transcrip decrease fertility by the absence, reduction or screening of tion factor genes have been used to manipulate branching, reproductive components. One interesting application for which could provide benefits in the forestry industry. For manipulation of flower structure, for example, by introduced example, reduction in the formation of lateral branches could 25 reduce knot formation. Conversely, increasing the number of transcription factors could be in the increased production of lateral branches could provide utility when a plant is used as edible flowers or flower parts, including saffron, which is a windscreen, or may also provide ornamental advantages. derived from the Stigmas of Crocus sativus. Leaf shape, color and modifications. It has been demon Number and development of trichomes. Several presently strated in laboratory experiments that overexpression of some disclosed transcription factor genes have been used to modify 30 of the presently disclosed transcription factors produced trichome number and amount of trichome products in plants. marked effects on leaf development. At early stages of Trichome glands on the surface of many higher plants pro growth, these transgenic seedlings developed narrow, upward duce and secrete exudates that give protection from the ele pointing leaves with long petioles, possibly indicating a dis ments and pests such as insects, microbes and herbivores. 
ruption in circadian-clock controlled processes or nyctinastic These exudates may physically immobilize insects and 35 movements. Other transcription factor genes can be used to spores, may be insecticidal orant-microbial or they may act as increase plant biomass; large size would be useful in crops allergens or irritants to protect against herbivores. Trichomes where the vegetative portion of the plant is the marketable have also been suggested to decrease transpiration by portion. decreasing leaf Surface air flow, and by exuding chemicals Siliques. Genes that later silique conformation in brassi that protect the leaf from the sun. 40 cates may be used to modify fruit ripening processes in bras Another potential utilities for sequences that increase tri Sicates and other plants, which may positively affect seed or chome number is to increase the density of cotton fibers in fruit quality. cotton bolls. Cotton fibers are modified unicellular trichomes Stem morphology and shoot modifications. Laboratory that are produced from the ovule epidermis. However, typi studies have demonstrated that introducing several of the cally only 30% of the epidermal cells take on a trichome fate 45 presently disclosed transcription factor genes into plants can (Basra and Malik, 1984). Thus, cotton yields might be cause stem bifurcations in shoots, in which the shoot mer increased by inducing a greater proportion of the ovule epi istems split to form two or three separate shoots. This unique dermal cells to become fibers. appearance would be desirable in ornamental applications. Seed size, color and number. The introduction of presently Diseases, pathogens and pests. 
A number of the presently disclosed transcription factor genes into plants that alter the 50 disclosed transcription factor genes have been shown to or are size or number of seeds may have a significant impact on likely to confer resistance to various plant diseases, patho yield, both when the product is the seed itself, or when bio gens and pests. The offending organisms include fungal mass of the vegetative portion of the plant is increased by pathogens Fusarium oxysporum, Botrytis cinerea, Sclero reducing seed production. In the case of fruit products, it is tinia Sclerotiorum, and Erysiphe Orontii. Bacterial pathogens often advantageous to modify a plant to have reduced size or 55 to which resistance may be conferred include Pseudomonas number of seeds relative to non-transformed plants to provide syringae. Other problem organisms may potentially include seedless or varieties with reduced numbers or smaller seeds. nematodes, mollicutes, parasites, or herbivorous arthropods. Presently disclosed transcription factor genes have also been In each case, one or more transformed transcription factor shown to affect seed size, including the development of larger genes may provide some benefit to the plant to help prevent or seeds. Seed size, in addition to seed coat integrity, thickness 60 overcome infestation. The mechanisms by which the tran and permeability, seed water content and by a number of other Scription factors work could include increasing Surface waxes components including antioxidants and oligosaccharides, or oils, Surface thickness, local senescence, or the activation may affect seed longevity in storage. This would be an impor of signal transduction pathways that regulate plant defense in tant utility when the seed of a plant is the harvested crops, as response to attacks by herbivorous pests (including, for with, for example, peas, beans, nuts, etc. Presently disclosed 65 example, protease inhibitors). 
transcription factor genes have also been used to modify seed Increased tolerance of plants to nutrient-limited soils. Pres color, which could provide added appeal to a seed product. ently disclosed transcription factor genes introduced into US 7,598.429 B2 89 90 plants may provide the means to improve uptake of essential edible crops, tissue specific promoters might be used to nutrients, including nitrogenous compounds, phosphates, ensure that these compounds accumulate specifically in tis potassium, and trace minerals. The effect of these modifica Sues, such as the epidermis, which are not taken for consump tions is to increase the seedling germination and range of tion. ornamental and crop plants. The utilities of presently dis 5 Modified seed oil content. The composition of seeds, par closed transcription factor genes conferring tolerance to con ticularly with respect to seed oil amounts and/or composition, ditions of low nutrients also include cost savings to the grower is very important for the nutritional value and production of by reducing the amounts of fertilizer needed, environmental various food and feed products. Several of the presently dis benefits of reduced fertilizer runoff; and improved yield and closed transcription factor genes in seed lipid saturation that stress tolerance. In addition, this gene could be used to alter 10 alter seed oil content could be used to improve the heat seed protein amounts and/or composition that could impact stability of oils or to improve the nutritional quality of seed yield as well as the nutritional value and production of various oil, by, for example, reducing the number of calories in seed, food products. increasing the number of calories in animal feeds, or altering Hormone sensitivity. One or more of the presently dis the ratio of saturated to unsaturated lipids comprising the oils. closed transcription factor genes have been shown to affect 15 Seed and leaf fatty acid composition. 
A number of the plant abscisic acid (ABA) sensitivity. This plant hormone is presently disclosed transcription factor genes have been likely the most important hormone in mediating the adapta shown to alter the fatty acid composition in plants, and seeds tion of a plant to stress. For example, ABA mediates conver in particular. This modification may find particular value for sion of apical meristems into dormant buds. In response to improving the nutritional value of, for example, seeds or increasingly cold conditions, the newly developing leaves whole plants. Dietary fatty acids ratios have been shown to growing above the meristem become converted into stiff bud have an effect on, for example, bone integrity and remodeling scales that closely wrap the meristem and protect it from (see, for example, Weiler Pediatr. Res. (2000)47:5692-697). mechanical damage during winter. ABA in the bud also The ratio of dietary fatty acids may alter the precursor pools enforces dormancy; during premature warm spells, the buds of long-chain polyunsaturated fatty acids that serve as pre are inhibited from Sprouting. Bud dormancy is eliminated 25 cursors for prostaglandin synthesis. In mammalian connec after either a prolonged cold period of cold or a significant tive tissue, prostaglandins serve as important signals regulat number of lengthening days. Thus, by affecting ABA sensi ing the balance between resorption and formation in bone and tivity, introduced transcription factor genes may affect cold cartilage. Thus dietary fatty acid ratios altered in seeds may sensitivity and survivability. ABA is also important in pro affect the etiology and outcome of bone loss. tecting plants from drought tolerance. 30 Modified seed protein content. 
As with seed oils, the com Several other of the present transcription factor genes have position of seeds, particularly with respect to protein amounts been used to manipulate ethylene signal transduction and and/or composition, is very important for the nutritional value response pathways. These genes can thus be used to manipu and production of various food and feed products. A number late the processes influenced by ethylene, such as seed ger of the presently disclosed transcription factor genes modify mination or fruit ripening, and to improve seed or fruit qual 35 the protein concentrations in seeds would provide nutritional ity. benefits, and may be used to prolong storage, increase seed Production of seed and leaf prenyl lipids, including toco pest or disease resistance, or modify germination rates. pherol. Prenyl lipids play a role in anchoring proteins in Production of flavonoids in leaves and other plant parts. membranes or membranous organelles. Thus, modifying the Expression of presently disclosed transcription factor genes prenyl lipid content of seeds and leaves could affect mem 40 that increase flavonoid production in plants, including antho brane integrity and function. A number of presently disclosed cyanins and condensed tannins, may be used to alter in pig transcription factor genes have been shown to modify the ment production for horticultural purposes, and possibly tocopherol composition of plants. Tocopherols have both increasing stress resistance. Flavonoids have antimicrobial anti-oxidant and vitamin E activity. activity and could be used to engineer pathogen resistance. Production of seed and leaf phytosterols: Presently dis 45 Several flavonoid compounds have health promoting effects closed transcription factor genes that modify levels of phy Such as the inhibition of tumor growth and cancer, prevention tosterols in plants may have at least two utilities. First, phy of bone loss and the prevention of the oxidation of lipids. 
tosterols are an important source of precursors for the Increasing levels of condensed tannins, whose biosynthetic manufacture of human steroid hormones. Thus, regulation of pathway is shared with anthocyanin biosynthesis, in forage transcription factor expression or activity could lead to 50 legumes is an important agronomic trait because they prevent elevated levels of important human steroid precursors for pasture bloat by collapsing protein foams within the rumen. steroid semi-synthesis. For example, transcription factors For a review on the utilities offlavonoids and their derivatives, that cause elevated levels of campesterol in leaves, or sito refer to Dixon et al. (1999) Trends Plant Sci. 4: 394-400. sterols and stigmasterols in seed crops, would be useful for Production of diterpenes in leaves and other plant parts. this purpose. Phytosterols and their hydrogenated derivatives 55 Depending on the plant species, varying amounts of diverse phytostanols also have proven cholesterol-lowering proper secondary biochemicals (often lipophilic terpenes) are pro ties, and transcription factor genes that modify the expression duced and exuded or volatilized by trichomes. These exotic of these compounds in plants would thus provide health ben secondary biochemicals, which are relatively easy to extract efits. because they are on the surface of the leaf, have been widely Production of seed and leaf glucosinolates. Some glucosi 60 used in Such products as flavors and aromas, drugs, pesticides nolates have anti-cancer activity; thus, increasing the levels or and cosmetics. Thus, the overexpression of genes that are composition of these compounds by introducing several of used to produce diterpenes in plants may be accomplished by the presently disclosed transcription factors might be of inter introducing transcription factor genes that induce said over est from a nutraceutical standpoint. (3) Glucosinolates form expression. 
One class of secondary metabolites, the diterpe part of a plants natural defense against insects. Modification 65 nes, can effect several biological systems such as tumor pro of glucosinolate composition or quantity could therefore gression, prostaglandin synthesis and tissue inflammation. In afford increased protection from predators. Furthermore, in addition, diterpenes can act as insect pheromones, termite US 7,598.429 B2 91 92 allomones, and can exhibit neurotoxic, cytotoxic and antimi alter storage compound accumulation in seeds. Manipulation totic activities. As a result of this functional diversity, diter of the Sucrose signaling pathway in seeds may therefore cause penes have been the target of research several pharmaceutical seeds to have more protein, oil or carbohydrate, depending on Ventures. In most cases where the metabolic pathways are the type of manipulation. Similarly, in tubers. Sucrose is con impossible to engineer, increasing trichome density or size on Verted to starch which is used as an energy store. It is thought leaves may be the only way to increase plant productivity. that Sugar signaling pathways may partially determine the Production of anthocyanin in leaves and other plant parts. levels of starch synthesized in the tubers. The manipulation of Several presently disclosed transcription factor genes can be Sugar signaling in tubers could lead to tubers with a higher used to alter anthocyanin production in numerous plant spe starch content. cies. The potential utilities of these genes include alterations 10 Thus, the presently disclosed transcription factor genes in pigment production for horticultural purposes, and possi that manipulate the Sugar signal transduction pathway may bly increasing stress resistance in combination with another lead to altered gene expression to produce plants with desir transcription factor. able traits. In particular, manipulation of Sugar signal trans Production of miscellaneous secondary metabolites. 
duction pathways could be used to alter source-sink relation Microarray data Suggests that flux through the aromatic 15 ships in seeds, tubers, roots and other storage organs leading amino acid biosynthetic pathways and primary and secondary to increase in yield. metabolite biosynthetic pathways are up-regulated. Presently Plant growth rate and development. A number of the pres disclosed transcription factors have been shown to be ently disclosed transcription factor genes have been shown to involved in regulating alkaloid biosynthesis, in part by up have significant effects on plant growth rate and development. regulating the enzymes indole-3-glycerol phosphatase and These observations have included, for example, more rapid or strictosidine synthase. Phenylalanine ammonia lyase, chal delayed growth and development of reproductive organs. cone synthase and trans-cinnamate mono-oxygenase are also This would provide utility for regions with short or long induced, and are involved in phenylpropenoid biosynthesis. growing seasons, respectively. Accelerating plant growth Sugar, starch, hemicellulose composition. Overexpression would also improve early yield or increase biomass at an of the presently disclosed transcription factors that affect 25 earlier stage, when Such is desirable (for example, in produc sugar content resulted in plants with altered leaf insoluble ing forestry products). Sugar content. Transcription factors that alter plant cell wall Embryo development. Presently disclosed transcription composition have several potential applications including factor genes that alter embryo development has been used to altering food digestibility, plant tensile strength, wood qual alter seed protein and oil amounts and/or composition which ity, pathogen resistance and in pulp production. The potential 30 is very important for the nutritional value and production of utilities of a gene involved in glucose-specific Sugar sensing various food products. 
Seed shape and seed coat may also be are to alter energy balance, photosynthetic rate, carbohydrate altered by these genes, which may provide for improved accumulation, biomass production, Source-sink relation storage stability. ships, and senescence. Seed germination rate. A number of the presently disclosed Hemicellulose is not desirable in paper pulps because of its 35 transcription factor genes have been shown to modify seed lack of strength compared with cellulose. Thus modulating germination rate, including when the seeds are in conditions the amounts of cellulose vs. hemicellulose in the plant cell normally unfavorable for germination (e.g., cold, heat or salt wall is desirable for the paper/lumber industry. Increasing the stress, or in the presence of ABA), and may thus be used to insoluble carbohydrate content in various fruits, vegetables, modify and improve germination rates under adverse condi and other edible consumer products will result in enhanced 40 tions. fiber content. Increased fiber content would not only provide Plant, seedling vigor. Seedlings transformed with pres health benefits in food products, but might also increase ently disclosed transcription factors have been shown to pos digestibility of forage crops. In addition, the hemicellulose sess larger cotyledons and appeared somewhat more and pectin content of fruits and berries affects the quality of advanced than control plants. This indicates that the seedlings jam and catsup made from them. Changes in hemicellulose 45 developed more rapidly that the control plants. Rapid seed and pectin content could result in a Superior consumer prod ling development is likely to reduce loss due to diseases uct. particularly prevalent at the seedling stage (e.g., damping off) Plant response to Sugars and Sugar composition. In addition and is thus important for Survivability of plants germinating to their important role as an energy source and structural in the field or in controlled environments. 
component of the plant cell, sugars are central regulatory molecules that control several aspects of plant physiology, metabolism and development. It is thought that this control is achieved by regulating gene expression and, in higher plants, sugars have been shown to repress or activate plant genes involved in many essential processes such as photosynthesis, glyoxylate metabolism, respiration, starch and sucrose synthesis and degradation, pathogen response, wounding response, cell cycle regulation, pigmentation, flowering and senescence. The mechanisms by which sugars control gene expression are not understood.

Because sugars are important signaling molecules, the ability to control either the concentration of a signaling sugar or how the plant perceives or responds to a signaling sugar could be used to control plant development, physiology or metabolism. For example, the flux of sucrose (a disaccharide sugar used for systemically transporting carbon and energy in most plants) has been shown to affect gene expression and

Senescence, cell death. Presently disclosed transcription factor genes may be used to alter senescence responses in plants. Although leaf senescence is thought to be an evolutionary adaptation to recycle nutrients, the ability to control senescence in an agricultural setting has significant value. For example, a delay in leaf senescence in some maize hybrids is associated with a significant increase in yields, and a delay of a few days in the senescence of soybean plants can have a large impact on yield. Delayed flower senescence may also generate plants that retain their blossoms longer, and this may be of potential interest to the ornamental horticulture industry.

Modified fertility. Plants that overexpress a number of the presently disclosed transcription factor genes have been shown to possess reduced fertility. This could be a desirable trait, as it could be exploited to prevent or minimize the escape of the pollen of genetically modified organisms (GMOs) into the environment.

Early and delayed flowering. Presently disclosed transcription factor genes that accelerate flowering could have valuable applications in such programs since they allow much faster generation times. In a number of species, for example, broccoli and cauliflower, where the reproductive parts of the plants constitute the crop and the vegetative tissues are discarded, it would be advantageous to accelerate time to flowering. Accelerating flowering could shorten crop and tree breeding programs. Additionally, in some instances, a faster generation time might allow additional harvests of a crop to be made within a given growing season. A number of Arabidopsis genes have already been shown to accelerate flowering when constitutively expressed. These include LEAFY, APETALA1 and CONSTANS (Mandel et al. (1995) Nature 377: 522-524; Weigel and Nilsson (1995) Nature 377: 495-500; and Simon et al. (1996) Nature 384: 59-62).

By regulating the expression of potential flowering using inducible promoters, flowering could be triggered by application of an inducer chemical. This would allow flowering to be synchronized across a crop and facilitate more efficient harvesting. Such inducible systems could also be used to tune the flowering of crop varieties to different latitudes. At present, species such as soybean and cotton are available as a series of maturity groups that are suitable for different latitudes on the basis of their flowering time (which is governed by day-length). A system in which flowering could be chemically controlled would allow a single high-yielding northern maturity group to be grown at any latitude. In southern regions such plants could be grown for longer, thereby increasing yields, before flowering was induced. In more northern areas, the induction would be used to ensure that the crop flowers prior to the first winter frosts.

In a sizeable number of species, for example, root crops, where the vegetative parts of the plants constitute the crop and the reproductive tissues are discarded, it would be advantageous to delay or prevent flowering. Extending vegetative development with presently disclosed transcription factor genes could thus bring about large increases in yields. Prevention of flowering might help maximize vegetative yields and prevent escape of genetically modified organism (GMO) pollen.

Extended flowering phase. Presently disclosed transcription factors that extend flowering time have utility in engineering plants with longer-lasting flowers for the horticulture industry, and for extending the time in which the plant is fertile.

Flower and leaf development. Presently disclosed transcription factor genes have been used to modify the development of flowers and leaves. This could be advantageous in the development of new ornamental cultivars that present unique configurations. In addition, some of these genes have been shown to reduce a plant's fertility, which is also useful for helping to prevent development of pollen of GMOs.

Flower abscission. Presently disclosed transcription factor genes introduced into plants have been used to retain flowers for longer periods. This would provide a significant benefit to the ornamental industry, for both cut flowers and woody plant varieties (of, for example, maize), as well as have the potential to lengthen the fertile period of a plant, which could positively impact yield and breeding programs.

A listing of specific effects and utilities that the presently disclosed transcription factor genes have on plants, as determined by direct observation and assay analysis, is provided in Tables 4 and 6.

Antisense and Co-Suppression

In addition to expression of the nucleic acids of the invention as gene replacement or plant phenotype modification nucleic acids, the nucleic acids are also useful for sense and anti-sense suppression of expression, e.g., to down-regulate expression of a nucleic acid of the invention, e.g., as a further mechanism for modulating plant phenotype. That is, the nucleic acids of the invention, or subsequences or anti-sense sequences thereof, can be used to block expression of naturally occurring homologous nucleic acids. A variety of sense and anti-sense technologies are known in the art, e.g., as set forth in Lichtenstein and Nellen (1997) Antisense Technology: A Practical Approach, IRL Press at Oxford University Press, Oxford, U.K. In general, sense or anti-sense sequences are introduced into a cell, where they are optionally amplified, e.g., by transcription. Such sequences include both simple oligonucleotide sequences and catalytic sequences such as ribozymes.

For example, a reduction or elimination of expression (i.e., a "knock-out") of a transcription factor or transcription factor homologue polypeptide in a transgenic plant, e.g., to modify a plant trait, can be obtained by introducing an antisense construct corresponding to the polypeptide of interest as a cDNA. For antisense suppression, the transcription factor or homologue cDNA is arranged in reverse orientation (with respect to the coding sequence) relative to the promoter sequence in the expression vector. The introduced sequence need not be the full length cDNA or gene, and need not be identical to the cDNA or gene found in the plant type to be transformed. Typically, the antisense sequence need only be capable of hybridizing to the target gene or RNA of interest. Thus, where the introduced sequence is of shorter length, a higher degree of homology to the endogenous transcription factor sequence will be needed for effective antisense suppression. While antisense sequences of various lengths can be utilized, preferably, the introduced antisense sequence in the vector will be at least 30 nucleotides in length, and improved antisense suppression will typically be observed as the length of the antisense sequence increases. Preferably, the length of the antisense sequence in the vector will be greater than 100 nucleotides. Transcription of an antisense construct as described results in the production of RNA molecules that are the reverse complement of mRNA molecules transcribed from the endogenous transcription factor gene in the plant cell.

Suppression of endogenous transcription factor gene expression can also be achieved using a ribozyme. Ribozymes are RNA molecules that possess highly specific endoribonuclease activity. The production and use of ribozymes are disclosed in U.S. Pat. No. 4,987,071 and U.S. Pat. No. 5,543,508. Synthetic ribozyme sequences including antisense RNAs can be used to confer RNA cleaving activity on the antisense RNA, such that endogenous mRNA molecules that hybridize to the antisense RNA are cleaved, which in turn leads to an enhanced antisense inhibition of endogenous gene expression.

Suppression of endogenous transcription factor gene expression can also be achieved using RNA interference, or RNAi. RNAi is a post-transcriptional, targeted gene-silencing technique that uses double-stranded RNA (dsRNA) to incite degradation of messenger RNA (mRNA) containing the same sequence as the dsRNA (Constans, (2002) The Scientist 16: 36). Small interfering RNAs, or siRNAs, are produced in at least two steps: an endogenous ribonuclease cleaves longer dsRNA into shorter, 21-23 nucleotide-long RNAs. The siRNA segments then mediate the degradation of the target mRNA (Zamore, (2001) Nature Struct. Biol. 8: 746-50).
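The antisense passage above rests on one relationship: the antisense transcript is the reverse complement of the endogenous mRNA, with stated length preferences of at least 30 and, preferably, more than 100 nucleotides. The following Python fragment is a minimal sketch of that relationship only. The function names, the category labels and the toy sequence are hypothetical illustrations, not part of the disclosure, and the 21-nucleotide window is simply one value taken from the 21-23 nucleotide siRNA range mentioned for RNAi.

```python
# Illustrative sketch only: names, labels and the toy sequence are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def antisense_rna(sense_cdna: str) -> str:
    """RNA expected from a cDNA cloned in reverse orientation to the promoter.

    It is the reverse complement of the sense strand, written with U for T.
    """
    return reverse_complement(sense_cdna).replace("T", "U")


def length_class(n_nucleotides: int) -> str:
    """Classify an antisense insert by the length preferences in the text."""
    if n_nucleotides < 30:
        return "too short"      # below the stated minimum of 30 nt
    if n_nucleotides <= 100:
        return "acceptable"     # at least 30 nt
    return "preferred"          # greater than 100 nt


def sirna_windows(rna: str, size: int = 21) -> list:
    """List every contiguous window of `size` nt (a crude stand-in for dicing)."""
    return [rna[i:i + size] for i in range(len(rna) - size + 1)]


toy_cdna = "ATGGCCAAGT"  # hypothetical 10-nt fragment, far too short in practice
print(antisense_rna(toy_cdna), length_class(len(toy_cdna)))
```

The sketch ignores hybridization energetics and partial homology, which the text notes become more important as the introduced sequence gets shorter.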
RNAi has been used for gene function determination (1996) Science 274: 982-985). This method entails trans in a manner similar to antisense oligonucleotides (Constans, forming a plant with a gene tag containing multiple transcrip (2002) The Scientist 16:36). Expression vectors that continu tional enhancers and once the tag has inserted into the ally express siRNAs in transiently and stably transfected have genome, expression of a flanking gene coding sequence been engineered to express small hairpin RNAs (shRNAs), becomes deregulated. In another example, the transcriptional which get processed in vivo into siRNAs-like molecules machinery in a plant can be modified so as to increase tran capable of carrying out gene-specific silencing (Brum Scription levels of a polynucleotide of the invention (See, e.g., melkamp et al. (2002) Science 296:550-553, and Paddisonet PCT Publications WO 96/06166 and WO 98/53057 which al. (2002) Genes & Dev. 16:948-958). Post-transcriptional describe the modification of the DNA-binding specificity of gene silencing by double-stranded RNA is discussed in fur 10 Zinc finger proteins by changing particular amino acids in the ther detail by Hammond et al. (2001) Nature Rev Gen 2: DNA-binding motif). 110-119, Fireet al. (1998) Nature 391:806-811 and Timmons The transgenic plant can also include the machinery nec and Fire (1998) Nature 395: 854. essary for expressing or altering the activity of a polypeptide Vectors in which RNA encoded by a transcription factor or encoded by an endogenous gene, for example by altering the transcription factor homologue cDNA is over-expressed can 15 phosphorylation state of the polypeptide to maintain it in an also be used to obtain co-suppression of a corresponding activated State. endogenous gene, e.g., in the manner described in U.S. Pat. Transgenic plants (or plant cells, or plant explants, or plant No. 5,231,020 to Jorgensen. 
Such co-suppression (also tissues) incorporating the polynucleotides of the invention termed sense Suppression) does not require that the entire and/or expressing the polypeptides of the invention can be transcription factor cDNA be introduced into the plant cells, produced by a variety of well established techniques as nor does it require that the introduced sequence be exactly described above. Following construction of a vector, most identical to the endogenous transcription factor gene of inter typically an expression cassette, including a polynucleotide, est. However, as with antisense Suppression, the Suppressive e.g., encoding a transcription factor or transcription factor efficiency will be enhanced as specificity of hybridization is homologue, of the invention, standard techniques can be used increased, e.g., as the introduced sequence is lengthened, 25 to introduce the polynucleotide into a plant, a plant cell, a and/or as the sequence similarity between the introduced plant explant or a plant tissue of interest. Optionally, the plant sequence and the endogenous transcription factor gene is cell, explant or tissue can be regenerated to produce a trans increased. genic plant. Vectors expressing an untranslatable form of the transcrip The plant can be any higher plant, including gymnosperms, tion factor mRNA, e.g., sequences comprising one or more 30 monocotyledonous and dicotyledonous plants. Suitable pro stop codon, or nonsense mutation) can also be used to Sup tocols are available for Leguminosae (alfalfa, soybean, clo press expression of an endogenous transcription factor, ver, etc.), Umbelliferae (carrot, celery, parsnip), Cruciferae thereby reducing or eliminating it's activity and modifying (cabbage, radish, rapeseed, broccoli, etc.), Curcurbitaceae one or more traits. Methods for producing Such constructs are (melons and cucumber), Gramineae (wheat, corn, rice, bar described in U.S. Pat. No. 5,583,021. 
Preferably, such con 35 ley, millet, etc.), Solanaceae (potato, tomato, tobacco, pep structs are made by introducing a premature stop codon into pers, etc.), and various other crops. See protocols described in the transcription factor gene. Alternatively, a plant trait can be Ammirato et al. (1984) Handbook of Plant Cell Culture-Crop modified by gene silencing using double-strand RNA (Sharp Species, Macmillan Publ. Co. Shimamoto et al. (1989) Nature (1999) Genes Devel. 13: 139-141). Another method for abol 338: 274–276: Fromm et al. (1990) Bio/Technology 8: 833 ishing the expression of a gene is by insertion mutagenesis 40 839; and Vasil et al. (1990) Bio/Technology 8: 429-434. using the T-DNA of Agrobacterium tumefaciens. After gen Transformation and regeneration of both erating the insertion mutants, the mutants can be screened to ous and dicotyledonous plant cells is now routine, and the identify those containing the insertionina transcription factor selection of the most appropriate transformation technique or transcription factor homologue gene. Plants containing a will be determined by the practitioner. The choice of method single transgene insertion event at the desired gene can be 45 will vary with the type of plant to be transformed; those crossed to generate homozygous plants for the mutation. skilled in the art will recognize the suitability of particular Such methods are well known to those of skill in the art. (See methods for given plant types. Suitable methods can include, for example Koncz et al. (1992) Methods in Arabidopsis but are not limited to: electroporation of plant protoplasts; Research, World Scientific.) liposome-mediated transformation; polyethylene glycol Alternatively, a plant phenotype can be altered by elimi 50 (PEG) mediated transformation; transformation using nating an endogenous gene. 
Such as a transcription factor or viruses; micro-injection of plant cells; micro-projectile bom transcription factor homologue, e.g., by homologous recom bardment of plant cells; vacuum infiltration; and Agrobacte bination (Kempin et al. (1997) Nature 389: 802-803). rium tumefaciens mediated transformation. Transformation A plant trait can also be modified by using the Cre-lox means introducing a nucleotide sequence into a plant in a system (for example, as described in U.S. Pat. No. 5,658, 55 manner to cause stable or transient expression of the 772). A plant genome can be modified to include first and Sequence. second lox sites that are then contacted with a Cre recombi Successful examples of the modification of plant charac nase. If the loX sites are in the same orientation, the interven teristics by transformation with cloned sequences which ing DNA sequence between the two sites is excised. If the lox serve to illustrate the current knowledge in this field of tech sites are in the opposite orientation, the intervening sequence 60 nology, and which are herein incorporated by reference, is inverted. include: U.S. Pat. Nos. 5,571,706; 5,677,175: 5,510,471; The polynucleotides and polypeptides of this invention can 5,750,386; 5,597,945; 5,589,615; 5,750,871; 5,268,526; also be expressed in a plant in the absence of an expression 5,780,708: 5,538,880; 5,773,269; 5,736,369 and 5,610,042. cassette by manipulating the activity or expression level of the Following transformation, plants are preferably selected endogenous gene by other means. For example, by ectopi 65 using a dominant selectable marker incorporated into the cally expressing a gene by T-DNA activation tagging transformation vector. Typically, such a marker will confer (Ichikawa et al. (1997) Nature 390 698-701; Kakimoto et al. antibiotic or herbicide resistance on the transformed plants, US 7,598.429 B2 97 98 and selection of transformants can be accomplished by sequence. 
T is referred to as the neighborhood word score exposing the plants to appropriate concentrations of the anti threshold (Altschulet al., supra). These initial neighborhood biotic or herbicide. word hits act as seeds for initiating searches to find longer After transformed plants are selected and grown to matu HSPs containing them. The word hits are then extended in rity, those plants showing a modified trait are identified. The both directions along each sequence for as far as the cumu modified trait can be any of those traits described above. lative alignment score can be increased. Cumulative scores Additionally, to confirm that the modified trait is due to are calculated using, for nucleotide sequences, the parameters changes in expression levels or activity of the polypeptide or M (reward score for a pair of matching residues; always >0) polynucleotide of the invention can be determined by analyz and N (penalty score for mismatching residues; always <0). ing mRNA expression using Northern blots, RT-PCR or 10 For amino acid sequences, a scoring matrix is used to calcu microarrays, or protein expression using immunoblots or late the cumulative score. Extension of the word hits in each Western blots or gel shift assays. direction are halted when: the cumulative alignment score Integrated Systems—Sequence Identity falls off by the quantity X from its maximum achieved value: Additionally, the present invention may be an integrated 15 the cumulative score goes to Zero or below, due to the accu system, computer or computer readable medium that com mulation of one or more negative-scoring residue alignments; prises an instruction set for determining the identity of one or or the end of either sequence is reached. The BLAST algo more sequences in a database. In addition, the instruction set rithm parameters W. T. and X determine the sensitivity and can be used to generate or identify sequences that meet any speed of the alignment. 
The BLASTN program (for nucle specified criteria. Furthermore, the instruction set may be otide sequences) uses as defaults a wordlength (W) of 11, an used to associate or link certain functional benefits, such expectation (E) of 10, a cutoff of 100, M-5, N=-4, and a improved characteristics, with one or more identified comparison of both Strands. For amino acid sequences, the Sequence. BLASTP program uses as defaults a wordlength (W) of 3, an For example, the instruction set can include, e.g., a sequence comparison or other alignment program, e.g., an expectation (E) of 10, and the BLOSUM62 scoring matrix 25 (see Henikoff & Henikoff (1989) Proc. Natl. Acad. Sci. USA available program Such as, for example, the Wisconsin Pack 89: 10915). Unless otherwise indicated, “sequence identity” age Version 10.0, such as BLAST, FASTA, PILEUP. FIND here refers to the '% sequence identity generated from atblastx PATTERNS or the like (GCG, Madison, Wis.). Public using the NCBI version of the algorithm at the default settings sequence databases such as GenBank, EMBL, Swiss-Prot using gapped alignments with the filter "off (see, for and PIR or private sequence databases such as PHYTOSEQ 30 sequence database (Incyte Genomics, Palo Alto, Calif.) can example, internet website at ncbi.nlm.nih.gov). be searched. In addition to calculating percent sequence identity, the Alignment of sequences for comparison can be conducted BLAST algorithm also performs a statistical analysis of the by the local homology algorithm of Smith and Waterman similarity between two sequences (see, e.g., Karlin & Alts (1981) Adv. Appl. Math. 2: 482-489, by the homology align 35 chul (1993) Proc. Natl. Acad. Sci. USA 90: 5873-5787). One ment algorithm of Needleman and Wunsch (1970) J. Mol. measure of similarity provided by the BLAST algorithm is Biol. 48: 443-453, by the search for similarity method of the smallest sum probability (P(N)), which provides an indi Pearson and Lipman (1988) Proc. Natl. Acad. Sci. 
U.S.A. 85: cation of the probability by which a match between two 2444-2448, by computerized implementations of these algo nucleotide or amino acid sequences would occur by chance. rithms. After alignment, sequence comparisons between two 40 For example, a nucleic acid is considered similar to a refer (or more) polynucleotides or polypeptides are typically per ence sequence (and, therefore, in this context, homologous) if formed by comparing sequences of the two sequences over a the Smallest Sum probability in a comparison of the test comparison window to identify and compare local regions of nucleic acid to the reference nucleic acid is less than about sequence similarity. The comparison window can be a seg 0.1, or less than about 0.01, and or even less than about 0.001. ment of at least about 20 contiguous positions, usually about 45 An additional example of a useful sequence alignment algo 50 to about 200, more usually about 100 to about 150 con rithm is PILEUP, PILEUP creates a multiple sequence align tiguous positions. A description of the method is provided in ment from a group of related sequences using progressive, Ausubel et al., Supra. pairwise alignments. The program can align, e.g., up to 300 A variety of methods for determining sequence relation sequences of a maximum length of 5,000 letters. ships can be used, including manual alignment and computer 50 The integrated system, or computer typically includes a assisted sequence alignment and analysis. This later approach user input interface allowing a user to selectively view one or is a preferred approach in the present invention, due to the more sequence records corresponding to the one or more increased throughput afforded by computer assisted methods. 
character Strings, as well as an instruction set which aligns the As noted above, a variety of computer programs for perform one or more character strings with each other or with an ing sequence alignment are available, or can be produced by 55 additional character string to identify one or more region of one of skill. sequence similarity. The system may include a link of one or One example algorithm that is Suitable for determining more character strings with a particular phenotype or gene percent sequence identity and sequence similarity is the function. Typically, the system includes a user readable out BLAST algorithm, which is described in Altschul et al. J. put element that displays an alignment produced by the align Mol. Biol. 215: 403-410 (1990). Software for performing 60 ment instruction set. BLAST analyses is publicly available, e.g., through the The methods of this invention can be implemented in a National Center for Biotechnology Information (see internet localized or distributed computing environment. In a distrib website at ncbi.nlm.nih.gov). This algorithm involves first uted environment, the methods may be implemented on a identifying high scoring sequence pairs (HSPs) by identify single computer comprising multiple processors or on a mul ing short words of length W in the query sequence, which 65 tiplicity of computers. The computers can be linked, e.g. either match or satisfy some positive-valued threshold score T through a common bus, but more preferably the computer(s) when aligned with a word of the same length in a database are nodes on a network. The network can be a generalized or US 7,598.429 B2 99 100 a dedicated local or wide-area network and, in certain pre ligation of the MarathonTM Adaptor to the cDNA to form a ferred embodiments, the computers may be components of an library of adaptor-ligated ds cDNA. intra-net or an internet. 
Gene-specific primers were designed to be used along with Thus, the invention provides methods for identifying a adaptor specific primers for both 5' and 3' RACE reactions. sequence similar or homologous to one or more polynucle Nested primers, rather than single primers, were used to otides as noted herein, or one or more target polypeptides increase PCR specificity. Using 5' and 3' RACE reactions, 5' encoded by the polynucleotides, or otherwise noted herein and 3' RACE fragments were obtained, sequenced and and may include linking or associating a given plant pheno cloned. The process can be repeated until 5' and 3' ends of the type or gene function with a sequence. In the methods, a full-length gene were identified. Then the full-length cDNA sequence database is provided (locally or across an inter or 10 was generated by PCR using primers specific to 5' and 3' ends intra net) and a query is made against the sequence database of the gene by end-to-end PCR. using the relevant sequences herein and associated plant phe notypes or gene functions. Example II Any sequence herein can be entered into the database, before or after querying the database. This provides for both 15 Construction of Expression Vectors expansion of the database and, if done before the querying step, for insertion of control sequences into the database. The The sequence was amplified from a genomic or cDNA control sequences can be detected by the query to ensure the library using primers specific to sequences upstream and general integrity of both the database and the query. As noted, downstream of the coding region. The expression vector was the query can be performed using a web browser based inter pMEN20 or pMEN65, which are both derived from face. For example, the database can be a centralized public pMON316 (Sanders et al. (1987) Nucleic Acids Res. 
15: database Such as those noted herein, and the querying can be 1543-1558) and contain the CaMV 35S promoter to express done from a remote terminal or computer across an internet or transgenes. To clone the sequence into the vector, both intranet. pMEN20 and the amplified DNA fragment were digested 25 separately with SalI and NotI restriction enzymes at 37°C. for EXAMPLES 2 hours. The digestion products were subject to electrophore sis in a 0.8% agarose gel and visualized by ethidium bromide The following examples are intended to illustrate but not staining. The DNA fragments containing the sequence and limit the present invention. The complete descriptions of the the linearized plasmid were excised and purified by using a 30 Qiaquick gel extraction kit (Qiagen, Valencia Calif.). The traits associated with each polynucleotide of the invention is fragments of interest were ligated at a ratio of 3:1 (vector to fully disclosed in Table 4 and Table 6. insert). Ligation reactions using T4 DNA ligase (New England Biolabs, Beverly Mass.) were carried out at 16° C. Example I for 16 hours. The ligated DNAs were transformed into com 35 petent cells of the E. coli strain DH5C. by using the heat shock Full Length Gene Identification and Cloning method. The transformations were plated on LB plates con taining 50 mg/l kanamycin (Sigma, St. Louis, Mo.). Indi Putative transcription factor sequences (genomic or ESTs) vidual colonies were grown overnight in five milliliters of LB related to known transcription factors were identified in the broth containing 50 mg/l kanamycin at 37° C. Plasmid DNA Arabidopsis thaliana GenBank database using the tblastn 40 was purified by using Qiaquick Mini Prep kits (Qiagen). sequence analysis program using default parameters and a P-value cutoff threshold of -4 or -5 or lower, depending on Example III the length of the query sequence. 
Putative transcription factor sequence hits were then screened to identify those containing Transformation of Agrobacterium with the particular sequence strings. If the sequence hits contained 45 Expression Vector Such sequence strings, the sequences were confirmed as tran Scription factors. After the plasmid vector containing the gene was con Alternatively, Arabidopsis thaliana cDNA libraries structed, the vector was used to transform Agrobacterium derived from different tissues or treatments, or genomic tumefaciens cells expressing the gene products. The stock of libraries were screened to identify novel members of a tran 50 Agrobacterium tumefaciens cells for transformation were Scription family using a low stringency hybridization made as described by Nagel et al. (1990) FEMS Microbiol approach. Probes were synthesized using gene specific prim Letts. 67: 325-328. Agrobacterium strain ABI was grown in ers in a standard PCR reaction (annealing temperature 60°C.) 250 ml LB medium (Sigma) overnight at 28°C. with shaking and labeled with PdCTP using the High Prime DNA Label until an absorbance (A) of 0.5-1.0 was reached. Cells were ing Kit (Boehringer Mannheim). Purified radiolabelled 55 harvested by centrifugation at 4,000xg for 15 min at 4°C. probes were added to filters immersed in Church hybridiza Cells were then resuspended in 250 ul chilled buffer (1 mM tion medium (0.5 MNaPO pH 7.0, 7% SDS, 1% w/v bovine HEPES, pH adjusted to 7.0 with KOH). Cells were centri serum albumin) and hybridized overnight at 60°C. with shak fuged again as described above and resuspended in 125 ul ing. Filters were washed two times for 45 to 60 minutes with chilled buffer. Cells were then centrifuged and resuspended 1xSCC, 1% SDS at 60° C. 60 two more times in the same HEPES bufferas described above To identify additional sequence 5' or 3' of a partial clNA at a volume of 100 ul and 750 ul, respectively. 
Resuspended sequence in a cDNA library, 5' and 3' rapid amplification of cells were then distributed into 40 ul aliquots, quickly frozen cDNA ends (RACE) was performed using the MarathonTM in liquid nitrogen, and stored at -80°C. cDNA amplification kit (Clontech, Palo Alto, Calif.). Gener Agrobacterium cells were transformed with plasmids pre ally, the method entailed first isolating poly(A) mRNA, per 65 pared as described above following the protocol described by forming first and second strand cDNA synthesis to generate Nagel et al. For each DNA construct to be transformed, double stranded cDNA, blunting cDNA ends, followed by 50-100 ng DNA (generally resuspended in 10 mM Tris-HCl, US 7,598.429 B2 101 102 1 mM EDTA, pH 8.0) was mixed with 40 ul of Agrobacterium (Clorox) was added to the seeds, and the Suspension was cells. The DNA/cell mixture was then transferred to a chilled shaken for 10 min. After removal of the bleach/detergent cuvette with a 2 mm electrode gap and subject to a 2.5 kV solution, seeds were thenwashed five times insterile distilled charge dissipated at 25uF and 200 uF using a Gene Pulser II HO. The seeds were stored in the last wash water at 4°C. for apparatus (Bio-Rad, Hercules, Calif.). After electroporation, 5 2 days in the dark before being plated onto antibiotic selection cells were immediately resuspended in 1.0 ml LB and medium (1x Murashige and Skoog salts (pH adjusted to 5.7 allowed to recover without antibiotic selection for 2-4 hours with 1MKOH), 1x Gamborg's B-5 vitamins, 0.9% phytagar at 28°C. in a shaking incubator. After recovery, cells were (Life Technologies), and 50 mg/l kanamycin). Seeds were plated onto selective medium of LB broth containing 100 germinated under continuous illumination (50-75uE/m/sec) ug/ml spectinomycin (Sigma) and incubated for 24-48 hours 10 at 22-23°C. After 7-10 days of growth under these conditions, at 28°C. 
Single colonies were then picked and inoculated in kanamycin resistant primary transformants (T generation) fresh medium. The presence of the plasmid construct was were visible and obtained. These seedlings were transferred verified by PCR amplification and sequence analysis. first to fresh selection plates where the seedlings continued to grow for 3-5 more days, and then to soil (Pro-Mix BX potting Example IV 15 medium). Primary transformants were crossed and progeny seeds Transformation of Arabidopsis Plants with (T) collected; kanamycin resistant seedlings were selected Agrobacterium tumefaciens with Expression Vector and analyzed. The expression levels of the recombinant poly nucleotides in the transformants varies from about a 5% After transformation of Agrobacterium tumefaciens with expression level increase to a least a 100% expression level plasmid vectors containing the gene, single Agrobacterium increase. Similar observations are made with respect to colonies were identified, propagated, and used to transform polypeptide level expression. Arabidopsis plants. Briefly, 500 ml cultures of LB medium containing 50 mg/lkanamycin were inoculated with the colo Example VI nies and grown at 28°C. with shaking for 2 days until an 25 optical absorbance at 600 nm wavelength over 1 cm (Agoo) of Identification of Arabidopsis Plants with >2.0 is reached. Cells were then harvested by centrifugation Transcription Factor Gene Knockouts at 4,000xg for 10 min, and resuspended in infiltration medium (/2x Murashige and Skoog salts (Sigma), 1X Gam The screening of insertion mutagenized Arabidopsis col borg's B-5 vitamins (Sigma), 5.0% (w/v) sucrose (Sigma), 30 lections for null mutants in a known target gene was essen 0.044 uMbenzylamino purine (Sigma), 200 ul/1 Silwet L-77 tially as described in Krysan et al (1999) Plant Cell 11: (Lehle Seeds) until an A of 0.8 was reached. 2283-2290. 
Briefly, gene-specific primers, nested by 5-250 Prior to transformation, Arabidopsis thaliana seeds base pairs to each other, were designed from the 5' and 3 (ecotype Columbia) were sown at a density of ~10 plants per regions of a known target gene. Similarly, nested sets of 4" pot onto Pro-Mix BX potting medium (Hummert Interna 35 primers were also created specific to each of the T-DNA or tional) covered with fiberglass mesh (18 mmx 16 mm). Plants transposon ends (the “right' and “left borders). All possible were grown under continuous illumination (50-75 uE/m/ combinations of gene specific and T-DNA/transposon prim sec) at 22-23°C. with 65-70% relative humidity. After about ers were used to detect by PCR an insertion event within or 4 weeks, primary inflorescence stems (bolts) are cut off to close to the target gene. The amplified DNA fragments were encourage growth of multiple secondary bolts. After flower 40 then sequenced which allows the precise determination of the ing of the mature secondary bolts, plants were prepared for T-DNA/transposon insertion point relative to the target gene. transformation by removal of all siliques and opened flowers. Insertion events within the coding or intervening sequence of The pots were then immersed upside down in the mixture the genes were deconvoluted from a pool comprising a plu of Agrobacterium infiltration medium as described above for rality of insertion events to a single unique mutant plant for 30 sec, and placed on their sides to allow draining into a 1'x2' 45 functional characterization. The method is described in more flat surface covered with plastic wrap. After 24 h, the plastic detail in Yu and Adam, U.S. application Ser. No. 09/177,733 wrap was removed and pots are turned upright. The immer filed Oct. 23, 1998. sion procedure was repeated one week later, for a total of two immersions per pot. 
Seeds were then collected from each transformation pot and analyzed following the protocol described below.

Example V

Identification of Arabidopsis Primary Transformants

Seeds collected from the transformation pots were sterilized essentially as follows. Seeds were dispersed into a solution containing 0.1% (v/v) Triton X-100 (Sigma) and sterile H2O and washed by shaking the suspension for 20 min. The wash solution was then drained and replaced with fresh wash solution to wash the seeds for 20 min with shaking. After removal of the second wash solution, a solution containing 0.1% (v/v) Triton X-100 and 70% ethanol (Equistar) was added to the seeds and the suspension was shaken for 5 min. After removal of the ethanol/detergent solution, a solution containing 0.1% (v/v) Triton X-100 and 30% (v/v) bleach

sis can be used with designs to remove within-block variability that would not be removed with the standard split plot analysis (Papadakis, 1973, Inst. d'Amelior. Plantes Thessaloniki (Greece) Bull. Scientif., No. 23; Papadakis, 1984, Proc. Acad. Athens, 59, 326-342).

Example VII

Morphological Analysis

Morphological analysis was performed to determine whether changes in transcription factor levels affect plant growth and development. This was primarily carried out on the T1 generation, when at least 10-20 independent lines were examined. However, in cases where a phenotype required confirmation or detailed characterization, plants from subsequent generations were also analyzed.

Primary transformants were selected on MS medium with 0.3% sucrose and 50 mg/l kanamycin. T2 and later generation plants were selected in the same manner, except that kanamycin was used at 35 mg/l. In cases where lines carry a sulfonamide marker (as in all lines generated by super-transformation), seeds were selected on MS medium with 0.3% sucrose and 1.5 mg/l sulfonamide. KO lines were usually germinated on plates without a selection. Seeds were cold treated (stratified) on plates for 3 days in the dark (in order to increase germination efficiency) prior to transfer to growth cabinets. Initially, plates were incubated at 22° C. under a light intensity of approximately 100 microEinsteins for 7 days. At this stage, transformants were green, possessed the first two true leaves, and were easily distinguished from bleached kanamycin or sulfonamide-susceptible seedlings. Resistant seedlings were then transferred onto soil (Sunshine potting mix). Following transfer to soil, trays of seedlings were covered with plastic lids for 2-3 days to maintain humidity while they became established. Plants were grown on soil under fluorescent light at an intensity of 70-95 microEinsteins and a temperature of 18-23° C. Light conditions consisted of a 24-hour photoperiod unless otherwise stated. In instances where alterations in flowering time were apparent, flowering was re-examined under both 12-hour and 24-hour light to assess whether the phenotype was photoperiod dependent. Under 24-hour light growth conditions, the typical generation time (seed to seed) was approximately 14 weeks.

Because many aspects of Arabidopsis development are dependent on localized environmental conditions, in all cases plants were evaluated in comparison to controls in the same flat. Controls for transgenic lines were generally wild-type plants or, where specifically indicated, transgenic plants harboring an empty transformation vector selected on kanamycin or sulfonamide. Careful examination was made at the following stages: seedling (1 week), rosette (2-3 weeks), flowering (4-7 weeks), and late seed set (8-12 weeks). Seed was also inspected. Seedling morphology was assessed on selection plates. At all other stages, plants were macroscopically evaluated while growing on soil. All significant differences (including alterations in growth rate, size, leaf and flower morphology, coloration and flowering time) were recorded, but routine measurements were not taken if no differences were apparent. In certain cases, stem sections were stained to reveal lignin distribution. In these instances, hand-sectioned stems were mounted in phloroglucinol saturated 2M HCl (which stains lignin pink) and viewed immediately under a dissection microscope.

Flowering time was measured by the number of rosette leaves present when a visible inflorescence of approximately 3 cm is apparent. Rosette and total leaf number on the progeny stem are tightly correlated with the timing of flowering (Koornneef et al (1991) Mol. Gen. Genet 229: 57-66). The vernalization response was measured. For vernalization treatments, seeds were sown to MS agar plates, sealed with micropore tape, and placed in a 4° C. cold room with low light levels for 6-8 weeks. The plates were then transferred to the growth rooms alongside plates containing freshly sown non-vernalized controls. Rosette leaves were counted when a visible inflorescence of approximately 3 cm was apparent.

Example VIII

Biochemical Analysis

Experiments were also performed to identify those transformants or knockouts that exhibited modified biochemical characteristics. Among the biochemicals that were assayed were insoluble sugars, such as arabinose, fucose, galactose, mannose, rhamnose or xylose or the like; prenyl lipids, such as lutein, β-carotene, xanthophyll-1, xanthophyll-2, chlorophylls A or B, or α-, δ- or γ-tocopherol or the like; fatty acids, such as 16:0 (palmitic acid), 16:1 (palmitoleic acid), 18:0 (stearic acid), 18:1 (oleic acid), 18:2 (linoleic acid), 20:0, 18:3 (linolenic acid), 20:1 (eicosenoic acid), 20:2, 22:1 (erucic acid) or the like; waxes, such as by altering the levels of C29, C31, or C33 alkanes; sterols, such as brassicasterol, campesterol, stigmasterol, sitosterol or stigmastanol or the like, glucosinolates, protein or oil levels.

Fatty acids were measured using two methods depending on whether the tissue was from leaves or seeds. For leaves, lipids were extracted and esterified with hot methanolic H2SO4 and partitioned into hexane from methanolic brine. For seed fatty acids, seeds were pulverized and extracted in methanol:heptane:toluene:2,2-dimethoxypropane:H2SO4 (39:34:20:5:2) for 90 minutes at 80° C. After cooling to room temperature the upper phase, containing the seed fatty acid esters, was subjected to GC analysis. Fatty acid esters from both seed and leaf tissues were analyzed with a Supelco SP-2330 column.

Glucosinolates were purified from seeds or leaves by first heating the tissue at 95° C. for 10 minutes. Preheated ethanol:water (50:50) is added and after heating at 95° C. for a further 10 minutes, the extraction solvent is applied to a DEAE Sephadex column which had been previously equilibrated with 0.5 M pyridine acetate. Desulfoglucosinolates were eluted with 300 μl water and analyzed by reverse phase HPLC monitoring at 226 nm.

For wax alkanes, samples were extracted using an identical method as fatty acids and extracts were analyzed on a HP 5890 GC coupled with a 5973 MSD. Samples were chromatographically isolated on a J&W DB35 mass spectrometer (J&W Scientific).

To measure prenyl lipids levels, seeds or leaves were pulverized with 1 to 2% pyrogallol as an antioxidant. For seeds, extracted samples were filtered and a portion removed for tocopherol and carotenoid/chlorophyll analysis by HPLC. The remaining material was saponified for sterol determination. For leaves, an aliquot was removed and diluted with methanol and chlorophyll A, chlorophyll B, and total carotenoids measured by spectrophotometry by determining optical absorbance at 665.2 nm, 652.5 nm, and 470 nm. An aliquot was removed for tocopherol and carotenoid/chlorophyll composition by HPLC using a Waters μBondapak C18 column (4.6 mm × 150 mm). The remaining methanolic solution was saponified with 10% KOH at 80° C. for one hour. The samples were cooled and diluted with a mixture of methanol and water. A solution of 2% methylene chloride in hexane was mixed in and the samples were centrifuged. The aqueous methanol phase was again re-extracted with 2% methylene chloride in hexane and, after centrifugation, the two upper phases were combined and evaporated. 2% methylene chloride in hexane was added to the tubes and the samples were then extracted with one ml of water. The upper phase was removed, dried, and resuspended in 400 μl of 2% methylene chloride in hexane and analyzed by gas chromatography using a 50 m DB-5 ms (0.25 mm ID, 0.25 μm phase, J&W Scientific).

Insoluble sugar levels were measured by the method essentially described by Reiter et al., (1999) Plant J. 12:335-345. This method analyzes the neutral sugar composition of cell wall polymers found in Arabidopsis leaves. Soluble sugars were separated from sugar polymers by extracting leaves with hot 70% ethanol. The remaining residue containing the insoluble polysaccharides was then acid hydrolyzed with allose added as an internal standard. Sugar monomers generated by the hydrolysis were then reduced to the corresponding alditols by treatment with NaBH4, then were acetylated to generate the volatile alditol acetates which were then analyzed by GC-FID. Identity of the peaks was determined by comparing the retention times of known sugars converted to the corresponding alditol acetates with the retention times of peaks from wild-type plant extracts. Alditol acetates were analyzed on a Supelco SP-2330 capillary column (30 m × 250 μm × 0.2 μm) using a temperature program beginning at 180° C. for 2 minutes followed by an increase to 220° C. in 4 minutes. After holding at 220° C. for 10 minutes, the oven temperature is increased to 240° C. in 2 minutes and held at this temperature for 10 minutes and brought back to room temperature.
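The oven temperature program given for the alditol acetate analysis can be tabulated to check the implied heating rates and run length. The Python sketch below is illustrative only. It assumes the two temperature increases are linear ramps (the text gives only their end points and durations) and it leaves out the final return to room temperature, for which no time is stated.

```python
# Each step is (start temperature in deg C, end temperature in deg C, minutes).
# Holds have equal start and end temperatures.
OVEN_PROGRAM = [
    (180, 180, 2),   # initial hold
    (180, 220, 4),   # first increase
    (220, 220, 10),  # hold
    (220, 240, 2),   # second increase
    (240, 240, 10),  # final hold
]


def ramp_rates(program):
    """Heating rate in deg C per minute for each ramp step (holds skipped)."""
    return [(end - start) / minutes
            for start, end, minutes in program if end != start]


def total_minutes(program):
    """Length of the programmed run, excluding cool-down."""
    return sum(minutes for _, _, minutes in program)


def temperature_at(program, t_min):
    """Oven set-point at t_min minutes, assuming linear ramps."""
    elapsed = 0.0
    for start, end, minutes in program:
        if t_min <= elapsed + minutes:
            fraction = (t_min - elapsed) / minutes
            return start + fraction * (end - start)
        elapsed += minutes
    raise ValueError("time is beyond the end of the programmed run")


print(ramp_rates(OVEN_PROGRAM), total_minutes(OVEN_PROGRAM))
```

Under these assumptions both increases correspond to 10° C. per minute and the programmed portion of the run lasts 28 minutes.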
To identify plants with alterations in total seed oil or protein content, 150 mg of seeds from T2 progeny plants were subjected to analysis by Near Infrared Reflectance Spectroscopy (NIRS) using a Foss NirSystems Model 6500 with a spinning cup transport system. NIRS is a non-destructive analytical method used to determine seed oil and protein composition. Infrared is the region of the electromagnetic spectrum located after the visible region in the direction of longer wavelengths. Near infrared owes its name to being the infrared region nearest to the visible region of the electromagnetic spectrum. For practical purposes, near infrared comprises wavelengths between 800 and 2500 nm. NIRS is applied to organic compounds rich in O-H bonds (such as moisture, carbohydrates, and fats), C-H bonds (such as organic compounds and petroleum derivatives), and N-H bonds (such as proteins and amino acids). The NIRS analytical instruments operate by statistically correlating NIRS signals at several wavelengths with the characteristic or property intended to be measured. All biological substances contain thousands of C-H, O-H, and N-H bonds. Therefore, the exposure to near infrared radiation of a biological sample, such as a seed, results in a complex spectrum which contains qualitative and quantitative information about the physical and chemical composition of that sample.

The numerical value of a specific analyte in the sample, such as protein content or oil content, is mediated by a calibration approach known as chemometrics. Chemometrics applies statistical methods such as multiple linear regression (MLR), partial least squares (PLS), and principal component analysis (PCA) to the spectral data and correlates them with a physical property or other factor; that property or factor is directly determined rather than the analyte concentration itself. The method first provides "wet chemistry" data of the samples required to develop the calibration.

Calibration for Arabidopsis seed oil composition was performed using accelerated solvent extraction using 1 g seed sample size and was validated against certified canola seed. A similar wet chemistry approach was performed for seed protein composition calibration.

Data obtained from NIRS analysis was analyzed statistically using a nearest-neighbor (N-N) analysis. The N-N analysis allows removal of within-block spatial variability in a fairly flexible fashion which does not require prior knowledge of the pattern of variability in the chamber. Ideally, all hybrids are grown under identical experimental conditions within a block (rep). In reality, even in many block designs, significant within-block variability exists. Nearest-neighbor procedures are based on the assumption that the environmental effect of a plot is closely related to that of its neighbors. Nearest

Example IX

Plate-based Physiology Experimental Methods

Plate Assays. Twelve different plate-based physiological assays (shown below), representing a variety of drought-stress related conditions, are used as a pre-screen to identify top performing lines from each project (i.e. lines from transformation with a particular construct), that will be tested in subsequent soil based assays. Typically, ten lines are subjected to plate assays, from which the best three lines are selected for subsequent soil based assays. However, in projects where significant stress tolerance is not obtained in plate based assays, lines are not submitted for soil assays.

In addition, some projects are subjected to nutrient limitation studies. A nutrient limitation assay is intended to find genes that allow more plant growth upon deprivation of nitrogen. Nitrogen is a major nutrient affecting plant growth and development that ultimately impacts yield and stress tolerance. These assays monitor primarily root but also rosette growth on nitrogen deficient media. In all higher plants, inorganic nitrogen is first assimilated into glutamate, glutamine, aspartate and asparagine, the four amino acids used to transport assimilated nitrogen from sources (e.g. leaves) to sinks (e.g. developing seeds). This process is regulated by light, as well as by C/N metabolic status of the plant. We use a C/N sensing assay to look for alterations in the mechanisms plants use to sense internal levels of carbon and nitrogen metabolites which could activate signal transduction cascades that regulate the transcription of N-assimilatory genes. To determine whether these mechanisms are altered, we exploit the observation that wild-type plants grown on media containing high levels of sucrose (3%) without a nitrogen source accumulate high levels of anthocyanins. This sucrose induced anthocyanin accumulation can be relieved by the addition of either inorganic or organic nitrogen. We use glutamine as a nitrogen source since it also serves as a compound used to transport N in plants.

Germination assays. NaCl (150 mM), mannitol (300 mM), sucrose (9.4%), ABA (0.3 μM), Heat (32° C.), Cold (8° C.), -N is basal media minus nitrogen plus 3% sucrose and -N/+Gln is basal media minus nitrogen plus 3% sucrose and 1 mM glutamine.

Growth assays. Severe dehydration (drought), heat (32° C. for 5 days followed by recovery at 22° C.), chilling (8° C.), root development (visual assessment of lateral and primary roots, root hairs and overall growth). For the nitrogen limitation assay, all components of MS medium remain constant except N is reduced to 20 mg/L of NH4NO3. Note that 80% MS has 1.32 g/L NH4NO3 and 1.52 g/L KNO3.
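To put the nitrogen limitation figures in perspective, the salt concentrations quoted above can be converted to millimolar nitrogen. The Python sketch below is a worked calculation under two assumptions that are not stated in the text: the molar masses are rounded standard values, and the limited medium is taken to contain 20 mg/L NH4NO3 as its only nitrogen source (that is, KNO3 is treated as omitted).

```python
# Rounded molar masses in g/mol (assumed standard values).
MOLAR_MASS = {"NH4NO3": 80.04, "KNO3": 101.10}
# Nitrogen atoms per formula unit.
N_ATOMS = {"NH4NO3": 2, "KNO3": 1}


def nitrogen_mM(salts_g_per_l):
    """Total nitrogen in mM contributed by the given salts (in g/L)."""
    return sum(1000.0 * grams / MOLAR_MASS[salt] * N_ATOMS[salt]
               for salt, grams in salts_g_per_l.items())


# 80% MS as quoted in the text.
full = nitrogen_mM({"NH4NO3": 1.32, "KNO3": 1.52})
# Limited medium, assuming NH4NO3 at 20 mg/L is the only nitrogen source.
limited = nitrogen_mM({"NH4NO3": 0.020})

print(f"80% MS: {full:.1f} mM N")
print(f"limited: {limited:.2f} mM N ({100 * limited / full:.1f}% of 80% MS)")
```

On these assumptions 80% MS supplies roughly 48 mM nitrogen and the limited medium roughly 0.5 mM, about one percent of the standard level.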
neighbor methods use information from adjacent plots to Unless otherwise stated, all experiments are performed adjust for within-block heterogeneity and so provide more with the Arabidopsis thaliana ecotype Columbia (col-0). precise estimates of treatment means and differences. If there Assays are usually performed on non-selected segregating T2 is within-plot heterogeneity on a spatial scale that is larger 60 populations (in order to avoid the extra stress of selection). thana single plot and Smaller than the entire block, then yields Control plants for assays onlines containing direct promoter from adjacent plots will be positively correlated. Information fusion constructs are Col-O plants transformed an empty from neighboring plots can be used to reduce or remove the transformation vector (pMEN65). Controls for 2-component unwanted effect of the spatial heterogeneity, and hence lines (generated by Supertransformation) are the background improve the estimate of the treatment effect. Data from neigh 65 promoter-driver lines (i.e. promoter:LexA-GAL4TA lines), boring plots can also be used to reduce the influence of com into which the supertransformations were initially per petition between adjacent plots. The Papadakis N N analy formed. US 7,598.429 B2 107 108 All assays are performed in tissue culture. Growing the dure, seedlings were first germinated on selection plates con plants under controlled temperature and humidity on sterile taining either kanamycin or Sulfonamide. Seeds were steril medium produces uniform plant material that has not been ized by a 2 minute ethanol treatment followed by 20 minutes exposed to additional stresses (such as water stress) which in 30% bleach/O.01% Tween and five washes in distilled could cause variability in the results obtained. All assays were water. 
Seeds are sown to MSagar in 0.1% agarose and strati designed to detect plants that are more tolerant or less tolerant fied for 3 days at 4°C., before transfer to growth cabinets with to the particular stress condition and were developed with a temperature of 22°C. After 7 days of growth on selection reference to the following publications: Jang et al. (1997) plates, seedlings are transplanted to 3.5 inch diameter clay Plant Cell 9:5-19; Smeekens (1998) Curr. Opin. Plant Biol. pots containing 80 g of a 50:50 mix of vermiculite:perlite 1: 230-234; Liu and Zhu (1997) Proc. Natl. Acad. Sci. U.S. A. 10 topped with 80 g of ProMix. Typically, each pot contains 14 94: 14960-14964; Saleki et al. (1993) Plant Physiol. 101: seedlings, and plants of the transgenic line being tested are in 839-845; Wu et al. (1996) Plant Cell 8: 617-627; Zhu et al. separate pots to the wild-type controls. Pots containing the (1998) Plant Cell 10: 1181-1191: Alia et al. (1998) Plant J. transgenic line versus control pots were interspersed in the 16: 155-161; Xin and Browse, (1998) Proc. Natl. Acad. Sci. growth room, maintained under 24-hour light conditions (18 U.S.A. 95: 7799-7804; Leon-Kloosterziel et al. (1996) Plant 15 23°C., and 90-100 uEm’s') and watered for a period of 14 Physiol. 110: 233-240. Where possible, assay conditions days. Water was then withheld and pots were placed on absor were originally tested in a blind experiment with controls that bent diaper paper for a period of 8-10 days to apply a drought had phenotypes related to the condition tested. treatment. After this period, a visual qualitative “drought Procedures score” from 0-6 is assigned to record the extent of visible Prior to plating, seed for all experiments are surface ster drought stress symptoms. 
A score of “6” corresponds to no ilized in the following manner: (1) 5 minute incubation with visible symptoms whereas a score of “0” corresponds to mixing in 70% ethanol, (2) 20 minute incubation with mixing extreme wilting and the leaves having a "crispy texture. At in 30% bleach, 0.01% triton-X 100, (3) 5x rinses with sterile the end of the drought period, pots are re-watered and scored water, (4) Seeds are re-suspended in 0.1% sterile agarose and after 5-6 days; the number of Surviving plants in each pot is stratified at 4°C. for 3-4 days. 25 counted, and the proportion of the total plants in the pot that All germination assays follow modifications of the same survived is calculated. basic protocol. Sterile seeds are sown on the conditional Slit-pot method. A variation of the above method was media that has a basal composition of 80% MS+Vitamins. Sometimes used, whereby plants for a given transgenic line Plates are incubated at 22°C. under 24-hour light (120-130 were compared to wild-type controls in the same pot. For uEm’s') in a growth chamber. Evaluation of germination 30 those studies, 7 wild-type seedlings were transplanted into and seedling vigor is done 5 days after planting. For assess one half of a 3.5 inch pot and 7 seedlings of the line being ment of root development, seedlings germinated on 80% tested were transplanted into the other half of the pot. MS+Vitamins+1% sucrose are transferred to square plates at Analysis of results. In a given experiment, we typically 7 days. Evaluation is done 5 days after transfer following compare 6 or more pots of a transgenic line with 6 or more growth in a vertical position. Qualitative differences are 35 pots of the appropriate control. (In the split pot method, 12 or recorded including lateral and primary root length, root hair more pots are used.) The mean drought score and mean pro number and length, and overall growth. portion of plants Surviving (Survival rate) are calculated for For chilling (8° C.) 
and heat sensitivity (32° C.) growth both the transgenic line and the wild-type pots. In each case a assays, seeds are germinated and grown for 7 days on MS+Vi p-value is calculated, which indicates the significance of the tamins+1% sucrose at 22° C. and then are transferred to 40 difference between the two mean values. The results for each chilling or heat stress conditions. Heat stress is applied for 5 transgenic line across each planting for aparticular projectare days, after which the plants are transferred back to 22°C. for then presented in a results table. Results where the lines show recovery and evaluated after a further 5 days. Plants are sub a significantly better or worse performance versus the control jected to chilling conditions (8°C.) and evaluated at 10 days are highlighted. and 17 days. 45 Calculation of p-values. For the assays where control and For severe dehydration (drought) assays, seedlings are experimental plants are in separate pots, Survival is analyzed grown for 14 days on MS+Vitamins+1% Sucrose at 22°C. with a logistic regression to account for the fact that the Plates are opened in the sterile hood for 3 hr for hardening and random variable is a proportion between 0 and 1. The reported then seedlings are removed from the media and let dry for 2h p-value is the significance of the experimental proportion in the hood. After this time they are transferred back to plates 50 contrasted to the control, based upon regressing the logit and incubated at 22°C. for recovery. Plants are evaluated after transformed data. 5 days. Drought score, being an ordered factor with no real Experiments were also performed to identify those trans numeric meaning, is analyzed with a non-parametric test formants or knockouts that exhibited modified Sugar-sensing. between the experimental and control groups. The p-value is For Such studies, seeds from transformants were germinated 55 calculated with a Mann-Whitney rank-sum test. 
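The rank-sum comparison of drought scores described above can be sketched in Python using only the standard library. This is an illustrative implementation, not the software used for the reported results. It computes the Mann-Whitney U statistic with average ranks for ties and a two-sided p-value from the normal approximation with a tie correction and no continuity correction. The per-pot scores in the example are invented for illustration; with only a handful of pots per group an exact test would normally be preferred.

```python
import math


def midranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney(experimental, control):
    """Return (U for the experimental group, two-sided p-value)."""
    n1, n2 = len(experimental), len(control)
    pooled = list(experimental) + list(control)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of tied values.
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return u1, 1.0  # every observation is tied; no evidence of a shift
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical drought scores (0-6), one value per pot.
transgenic_scores = [6, 5, 5, 4, 6, 5]
control_scores = [2, 3, 1, 2, 3, 4]
u_stat, p_value = mann_whitney(transgenic_scores, control_scores)
print(u_stat, round(p_value, 4))
```

Because the test uses only the ordering of the scores, it makes no assumption that the step from a score of 2 to 3 means the same as the step from 5 to 6, which matches the treatment of the drought score as an ordered factor.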
on media containing 5% glucose or 9.4% sucrose which normally partially restrict hypocotyl elongation. Plants with altered sugar sensing may have either longer or shorter hypocotyls than normal plants when grown on this media. Additionally, other plant traits may be varied such as root mass.

Example X

Soil Drought Experimental Methods

The soil drought assay (performed in clay pots) is based on that described by Haake et al. (2002). In the current proce

For the split-pot assays, matched control and experimental measurements are available for both variables. In lieu of a direct transformed regression technique for this data, the logit-transformed proportions are analyzed by parametric methods. The p-value is derived from a paired t-test on the transformed data. For the paired score data, the p-value from a Wilcoxon test is reported.

Measurement of Photosynthesis. Photosynthesis was measured using a LICOR LI-6400. The LI-6400 uses infrared gas analyzers to measure carbon dioxide to generate a photosynthesis measurement. It is based upon the difference of the CO2 reference (the amount put into the chamber) and the CO2 sample (the amount that leaves the chamber). Since photosynthesis is the process of converting CO2 to carbohydrates, we expect to see a decrease in the amount of CO2 sample. From this difference, a photosynthesis rate can be generated. In some cases, respiration may occur and an increase in CO2 detected. To perform measurements, the LI-6400 is set up and calibrated as per LI-6400 standard directions. Photosynthesis is measured in the youngest most fully expanded leaf at 300 and 1000 ppm CO2, using a metal halide light source. This light source provides about 700 μE m⁻² s⁻¹.

Fluorescence was measured in dark and light adapted leaves using either a LI-6400 (LICOR) with a leaf chamber fluorometer attachment or an OS-1 (Opti-Sciences) as described in the manufacturer's literature. When the LI-6400 was used, all manipulations were performed under a dark shade cloth. Plants were dark adapted by placing in a box under this shade cloth until used. The OS-30 utilized small clips to create dark adapted leaves.

Chlorophyll/carotenoid determination. For some experiments, chlorophyll was estimated in methanolic extracts using the method of Porra et al. (1989) Biochim. et Biophys. Acta 975: 384-394. Carotenoids were estimated in the same extract at 450 nm using an A(1%) of 2500. We currently are measuring chlorophyll using a SPAD-502 (Minolta). When the SPAD-502 is being used to measure chlorophyll, both carotenoid and chlorophyll content and amount can also be determined via HPLC. Pigments are extracted from leaf tissue by homogenizing leaves in acetone:ethyl acetate (3:2). Water was added, the mixture centrifuged, and the upper phase removed for HPLC analysis. Samples are analyzed using a Zorbax C18 (non-endcapped) column (250 × 4.6) with a gradient of acetonitrile:water (85:15) to acetonitrile:methanol (85:15) in 12.5 minutes. After holding at these conditions for two minutes, solvent conditions were changed to methanol:ethyl acetate (68:32) in two minutes. Carotenoids and chlorophylls are quantified using peak areas and response factors calculated using lutein and β-carotene as standards.

Example XI

Experimental Results

G2340: (SEQ ID NOs. 17 and 18)

G2340 was analyzed using transgenic plants in which the gene was expressed under the control of the 35S promoter. Overexpression of G2340 produced a spectrum of deleterious effects on Arabidopsis growth and development. 35S::G2340 primary transformants were generally smaller than controls, and at early stages some displayed leaves that were held in a vertical orientation. The most severely affected lines died at early stages. Others survived, but displayed necrosis of the blades in later rosette leaves and cauline leaves. Inflorescence development was also highly abnormal; stems were typically shorter than wild type, often kinked at nodes, and the tissue had a rather fleshy succulent appearance. Flower buds were frequently poorly formed, failed to open and withered away without siliques developing. Additionally, secondary shoot growth frequently failed, and the tips of such structures sometimes senesced. Due to these abnormalities, many of the primary transformants were completely infertile. Three T1 lines (#1, 5, 20) with a relatively weak phenotype, which did set some seed, were selected for further study. Plants from the T2-20 population displayed a strong phenotype, and died early in development. The other two T2 populations were slightly small, but the effects were much weaker than those seen in the parental plants, suggesting that activity of the transgene might have become reduced between the generations. It should be noted that G2340 and G671 (SEQ ID NO: 19) are part of the same clade and that they had very similar morphological phenotypes and a similar expression pattern. These two genes may have overlapping or redundant phenotypes in the plant. Small, pale seedlings with strap-like leaves that held a vertical orientation were found in the mixed line populations of 35S::G2340 transgenic seedlings when grown under sterile conditions, similar to those observed in soil grown plants in the T1 generation. The necrotic lesions observed on the T1 plants grown in soil were not observed on the plants grown in culture, leaving uncertainty as to whether the necrotic lesion phenotype is a classic lesion mimic phenotype that would suggest that G2340 is involved in cell death responses or if the G2340 overexpressor plants are simply hyper-sensitive to stresses. One class of lesion mimic forms progressive lesions following an inductive stress. Lesion formation may be induced in G2340 overexpressing plants grown in culture. In addition to the morphological changes, overexpression of G2340 resulted in an extreme alteration in seed glucosinolate profile. This phenotype was observed in one line, line 1, in seed from two independent plantings. According to RT-PCR analysis, G2340 was expressed primarily in roots and was slightly induced in leaf tissue in response to auxin and heat treatments. G2340 can be used to engineer plants with an inducible cell death response. A gene that regulates cell death in plants can be used to induce a pathogen-protective hyper-response (HR) in plants without the potentially detrimental consequences of a constitutive systemic acquired resistance (SAR). Other potential utilities include the creation of novel abscission zones or inducing death in reproductive organs to prevent the spread of pollen, transgenic or otherwise. In the case of necrotrophic pathogens that rely on dead plant tissue as a source of nutrients, prevention of cell death could confer tolerance to these diseases. Overexpression of G2340 in Arabidopsis also resulted in an extreme alteration in seed glucosinolate profile. Therefore, the gene can be used to alter glucosinolate composition in plants. Increases or decreases in specific glucosinolates or total glucosinolate content are desirable depending upon the particular application. For example: (1) Glucosinolates are undesirable components of the oilseeds used in animal feed, since they produce toxic effects. Low-glucosinolate varieties of canola have been developed to combat this problem. (2) Some glucosinolates have anti-cancer activity; thus, increasing the levels or composition of these compounds might be of interest from a nutraceutical standpoint. (3) Glucosinolates form part of a plant's natural defense against insects. Modification of glucosinolate composition or quantity can therefore afford increased protection from predators. Furthermore, in edible crops, tissue specific promoters can be used to ensure that these compounds accumulate specifically in tissues, such as the epidermis, which are not taken for consumption.

G2583: (SEQ ID NOs. 143 and 144)

G2583 was studied using transgenic plants in which the gene was expressed under the control of the 35S promoter. Most notably, 35S::G2583 plants exhibited extremely glossy leaves. At early stages, 35S::G2583 seedlings appeared normal, but by about two weeks after sowing, the plants exhibited very striking shiny leaves, which were apparent until very late in development. In addition to this phenotype, it should be noted that many lines displayed a variety of other effects such as a reduction in overall size, narrow curled leaves, or various non-specific floral abnormalities, which reduced fertility. These effects on leaf appearance were observed in 18/20 primary transformants, and in all the plants from 4/6 of the T2 lines (#2, 4, 9 and 15) examined. The glossy nature of the leaves from 35S::G2583 plants can be a consequence of changes in epicuticular wax content or composition. G2583 belongs to a small clade within the large AP2/EREBP Arabidopsis family that also contains G975 (SEQ ID NO: 89), G1387 (SEQ ID NO: 145), and G977 (SEQ ID NO: 147). Overexpression of G975 (SEQ ID NO: 89) caused a substantial increase in leaf wax components, as well as morphological phenotypes resembling those observed in 35S::G2583 plants. G2583 was ubiquitously expressed (at higher levels in root, flower, embryo, and silique tissues). G2583 can be used to modify plant appearance (shiny leaves). In addition, it can be used to manipulate wax composition, amount, or distribution, which in turn can modify plant tolerance to drought and/or low humidity or resistance to insects.

G362: (SEQ ID NOs. 61 and 62)

G362 was analyzed using transgenic plants in which G362 was expressed under the control of the 35S promoter. 35S::G362 had a number of developmental effects with the most prominent result being an increase in trichome number as well as the ectopic formation of trichomes. Overexpression of G362 also increased anthocyanin levels in various tissues at different stages of growth. Seedlings sometimes showed high levels of pigment in the first true leaves. Late flowering lines also became darkly pigmented. Seeds from a number of lines were observed to develop patches of dark purple pigmentation. Inflorescences from 35S::G362 plants were thin, and flowers sometimes displayed poorly developed organs. The seed yield from many lines was somewhat poor. As determined by RT-PCR, G362 is expressed in roots, and is expressed at significantly lower levels in siliques, seedlings and shoots. No expression of G362 was detected in the other tissues tested. G362 expression was induced in rosette leaves by heat stress. G362 can be used to alter trichome number and distribution in plants. Trichome glands on the surface of many higher plants produce and secrete exudates which give protection from the elements and pests such as insects, microbes and herbivores. These exudates may physically immobilize insects and spores, may be insecticidal or antimicrobial or they may be allergens or irritants to protect against herbivores. Trichomes have also been suggested to decrease transpiration by decreasing leaf surface airflow, and by exuding chemicals that protect the leaf from the sun. Another use for G362 is to increase the density of cotton fibers in cotton bolls.

G362 and its homologs to increase trichome density, size or type can have profound utilities in molecular farming practices (for example, the use of trichomes as a manufacturing system for complex secondary metabolites), and in producing insect resistant and herbivore resistant plants. In addition, G362 can be used to alter a plant's time to flowering.

G2105: (SEQ ID NOs. 63 and 64)

The ORF boundary of G2105 was determined and G2105 was analyzed using transgenic plants in which G2105 was expressed under the control of the 35S promoter. Two of four T2 lines examined appeared dark green and were smaller than wild type at all stages of development. Additionally, the adaxial leaf surfaces from these plants had a somewhat lumpy appearance caused by trichomes being raised up on small mounds of epidermal cells. Two lines of G2105 overexpressing plants had larger seed. G2105 expression was root specific and induced in leaves by auxin, abscisic acid, high temperature, salt and osmotic stress treatments. On the basis of the analyses, G2105 can be used to manipulate some aspect of plant growth or development, particularly in trichome development. In addition, G2105 can be used to modify seed size and/or morphology, which can have an impact on yield. The promoter of G2105 can have some utility as a root specific promoter.

G47 (SEQ ID NOs. 65 and 66)

G47 was studied using transgenic plants in which the gene was expressed under the control of the 35S promoter. Overexpression of G47 resulted in a variety of morphological and physiological phenotypic alterations. 35S::G47 plants showed enhanced tolerance to osmotic stress. In a root growth assay on PEG-containing media, G47 overexpressing transgenic seedlings were larger and had more root growth compared with wild-type controls. G47 expression levels may be altered by environmental conditions, in particular reduced by salt and osmotic stresses. In addition to the phenotype observed in the osmotic stress assay, germination efficiency for the seeds from G47 overexpressor plants was low. Overexpression of G47 also produced a substantial delay in flowering time and caused a marked change in shoot architecture. 35S::G47 transformants were small at early stages and switched to flowering more than a week later than wild-type controls (continuous light conditions). The inflorescences from these plants appeared thick and fleshy, had reduced
Cotton 45 apical dominance, and exhibited reduced internode elonga fibers are modified unicellular trichomes that are produced tion leading to a short compact stature. The branching pattern from the ovule epidermis. However, typically only 30% of the of the stems also appeared abnormal, with the primary shoot epidermal cells take on a trichome fate (Basra and Malik becoming kinked at each coflorescence node. Additionally, (1984) Int. Rev. Cytol. 89: 65-113). Thus, cotton yields can be the plants showed slightly reduced fertility and formed rather increased by inducing a greater proportion of the ovule epi 50 small siliques that were borne on short pedicels and held dermal cells to become fibers. Depending on the plant spe Vertically, close against the stem. Additional alterations were cies, varying amounts of diverse secondary biochemicals (of detected in the inflorescence stems of 35S::G47 plants. Stem ten lipophilic terpenes) are produced and exuded or sections from T2-21 and T2-24 plants were of wider diameter, volatilized by trichomes. These exotic secondary biochemi and had large irregular vascular bundles containing a much cals, which are relatively easy to extract because they are on 55 greater number of xylem vessels than wildtype. Furthermore, the surface of the leaf, have been widely used in such products some of the xylem vessels within the bundles appeared nar as flavors and aromas, drugs, pesticides and cosmetics. One row and were possibly more lignified than were those of class of secondary metabolites, the diterpenes, can effect controls. G47 was expressed at higher levels in rosette leaves, several biological systems such as tumor progression, pros and transcripts were detected in other tissues (flower, embryo, taglandin synthesis and tissue inflammation. In addition, 60 silique, and germinating seedling). 
G47 can be used to diterpenes can act as insect pheromones, termite allomones, manipulate flowering time, to modify plant architecture and and can exhibit neurotoxic, cytotoxic and antimitotic activi stem structure (including development of vascular tissues and ties. As a result of this functional diversity, diterpenes have lignin content) and to improve plant performance under been the target of research several pharmaceutical ventures. osmotic stress. The use of G47 or of G47 orthologs from tree In most cases where the metabolic pathways are impossible to 65 species can be used to modulate lignin content of a plant. This engineer, increasing trichome density or size on leaves may allows the quality of wood used for furniture or construction be the only way to increase plant productivity. Thus, the use of to be improved. Lignin is energy rich; increasing lignincom US 7,598.429 B2 113 114 position could therefore be valuable in raising the energy phylogeneitc analysis, has been shown to confer a transcrip content of wood used for fuel. Conversely, the pulp and paper tional regulatory activity of G975 in that when the polypep industries seek wood with a reduced lignin content. Cur tide sequences were overexpressed in plants and produced rently, lignin must be removed in a costly process that some lines that were later in their development and flowering, involves the use of many polluting chemicals. Consequently, and produced shiny leaves, indicating more wax production, lignin is a serious barrier to efficient pulp and paper produc similar to plants overexpressing G975 (Table 4), as compared tion. In addition to forest biotechnology applications, chang to controls. Other closely related sequences include G1387 ing lignin content might increase the palatability of various (SEQID NO: 145 and 146), and G4294 (SEQID NO: 149 and fruits and vegetables. A wide variety of applications exist for 150). 
Although detailed analyses with plants overexpressing systems that either lengthen or shorten the time to flowering. 10 these sequence have not yet been performed, plants overex Closely-related homologs of G47, determined by BLAST, pressing these related sequences are likely to confer some alignment and phylogeneitc analysis, include G2133 (SEQ similar transcriptional regulatory activity and traits as G975. ID NO: 152), G3643 (SEQ ID NO: 158), G3644 (SEQ ID G214: (SEQID NOs. 33 and 34) NO: 156), and G3649 (SEQ ID NO: 154). Each of these G214 overexpressing lines were latebolting, showed larger sequences has conferred a transcriptional regulatory activity 15 biomass (increased leaf number and size), and were darker of G47 in that when any of these sequences were overex green in vegetative and reproductive tissues due to a higher pressed in plants, they have each produced some lines that chlorophyll content in the later stages of development. In were larger, later in their development and flowering, and these later stages, the overexpressor plants also had higher more tolerant to water-deprivation, cold or salt, similar to insoluble Sugar, leaf fatty acid, and carotenoid content per plants overexpressing G47 (Table 4), as compared to controls. unit area. Line 11 also showed a significant, repeatable G975: (SEQID NOS. 89 and 90) increase in lutein levels in seeds. Micro-array data was con G975 was identified as a new member of the AP2/EREBP sistent with the morphological and biochemical data in that family (EREBP subfamily) of transcription factors. G975 the genes that were highly induced included chloroplast was expressed in flowers and, at lower levels, in shoots, 25 localized enzymes, and light regulated genes such as leaves, and siliques. GC-FID and GC-MS analyses of leaves Rubisco, carbonic anhydrase, and the photosystem 1 reaction from G975 overexpressing plants showed that the levels of center subunit precursor. 
A chlorophyll biosynthetic enzyme C29, C31, and C33 alkanes were substantially increased (up was also highly induced, consistent with the dark green color to 10-fold) compared with control plants. A number of addi of the adult leaves and perhaps a higher photosynthetic rate. A tional compounds of similar molecular weight, presumably 30 measurement of leaf fatty acid in the older overexpressors also wax components, also accumulated to significantly suggested that the overall levels were higher than wild-type higher levels in G975 overexpressing plants. C29 alkanes levels (except for the percent composition of 16:3 in line 11). constituted close to 50% of the wax content in wild-type Percent composition of 16:1 and 16:3 (fatty acids found pri plants (Millar et al. (1998) Plant Cell 11: 1889-1902), sug marily in plastids) is similar to wild-type arguing against an gesting that a major increase in total wax content occurred in 35 increase in chloroplast number as an explanation for increase the G975 transgenic plants. However, the transgenic plants chlorophyll content in the leaves. G214 overexpressing lines had an almost normal phenotype (although Small morpho 3, 11, and 15 were sensitive to germination on high glucose logical differences are detected in leaf appearance), indicat showing less cotyledon expansion and hypocotyl elongation ing that overexpression of G975 was not deleterious to the Suggesting the late bolting and darkgreen phenotype could be plant. Overexpression of G975 did not cause the dramatic 40 tied into carbon sensing which has been shown to regulate alterations in plant morphology that had been reported for phytochrome A signaling. Sugars are key regulatory mol Arabidopsis plants in which the FATTY ACID ELONGA ecules that affect diverse processes in higher plants including TION1 gene was overexpressed (Millar et al. (1998) supra). 
germination, growth, flowering, senescence, Sugar metabo G975 may regulate the expression of some of the genes lism and photosynthesis. Glucose-specific hexose-sensing involved in wax metabolism. One Arabidopsis AP2 sequence 45 has also been described in plants and implicated in cell divi (G1387: SEQID NO: 145) that is significantly more closely sion and the repression of famine genes (photosynthetic or related to G975 than the rest of the members of the AP2/ glyoxylate cycles). Potential utilities of G214 include EREBP family is predicted to have a function and a use increasing chlorophyll content allowing more growth and related to that of G975. G975 can be used to manipulate wax productivity in conditions of low light. With a potentially higher photosynthetic rate, fruits can have higher Sugar con composition, amount, or distribution, which in turn can 50 modify plant tolerance to drought and/or low humidity or tent. Increased carotenoid content can be used as a nutraceu resistance to insects, as well as plant appearance (shiny tical to produce foods with greater antioxidant capability. leaves). G975 can also be used to specifically alter wax com Also G214 can be used to manipulate seed composition position, amount, or distribution in those plants and crops which is very important for the nutritional value and produc from which wax is a valuable product. 55 tion of various food products. A non-Arabidopsis gene that is related to G975 is LA6408 G214 is homologous to a tomato (Cornell Lycopersicon BNAF1258 Mustard flower buds Brassica rapa cDNA clone esculentum) EST (cLER12A11) generated from a Pseudomo F1258. The similarity between G975 and the Brassica rapa nas resistant line. gene represented by EST LA6408 extends beyond the con G974: (SEQID NOS. 51 and 52) served AP2 domain that characterizes the AP2/EREBP fam 60 The complete sequence of G974 was obtained and G974 ily. 
This Brassica rapa gene appeared to be more closely was studied using transgenic plants in which G974 was related to G975 than Arabidopsis G1387, indicating that EST expressed under the control of the 35S promoter. Constitutive LA6408 may represent a true G975 ortholog. The similarity expression of G974 produced deleterious effects: the major between G975 and Arabidopsis G1387 (SEQ ID NO: 145) ity of 35S::G974 primary transformants showed a reduction also extends beyond the conserved AP2 domain. 65 in overall size and developed rather slowly compared to wild G2583 (SEQ ID NO: 143 and 144), a closely-related type controls. These phenotypic alterations were not homolog of G975 determined by BLAST, alignment and observed in the T2 generation, perhaps indicating silencing of US 7,598.429 B2 115 116 the transgene. The T2 plants were wild-type in the physiologi glucosinolate composition. Increases or decreases in specific cal and biochemical analyses performed. G974 was ubiqui glucosinolates or total glucosinolate content are be desirable tously expressed. 35S::G974 had altered seed oil content depending upon the particular application. For example: (1) Several AP2 proteins from a variety of species (Atriplex Glucosinolates are undesirable components of the oilseeds hortensis, Lycopersicon esculentum, Glycine max, Populus 5 used in animal feed, since they produce toxic effects. Low balsamifera, Medicago truncatula) exhibited Some sequence glucosinolate varieties of canola have been developed to com similarity with G974 outside of the signature AP2 domain bat this problem. (2) Some glucosinolates have anti-cancer sequence, and bear nearly identical AP2 domains. These pro activity; thus, increasing the levels or composition of these teins may be related. compounds might be of interest from a nutraceutical stand G2343: (SEQID NOs. 53 and 54) 10 point. (3) Glucosinolates form part of a plant's natural The complete sequence of G2343 was determined and defense against insects. 
Modification of glucosinolate com G2343 was analyzed using transgenic plants in which G2343 position or quantity can therefore afford increased protection was expressed under the control of the 35S promoter. The from predators. Furthermore, in edible crops, tissue specific phenotype of these transgenic plants was wild-type in all promoters can be used to ensure that these compounds accu 15 mulate specifically in tissues, such as the epidermis, which assays performed. As determined by RT-PCR, G2343 is are not taken for consumption. G2520 can also be used to expressed in shoots, embryos and siliques. G2343 expression modify seed tocopherol composition. Tocopherols have anti is induced in rosette leaves by auxin, heat stress, and infection by Fusarium oxysporum. 35S::G2343 had an altered seed oil oxidant and vitamin E activity. COntent G2343 is a related tomato gene LETHM1 (CAA64615). Example XII Similarity between G2343 and LETHM1 extends beyond the signature motif of the family to a level that would suggest the Identification of Homologous Sequences genes are orthologs. Homologous sequences from Arabidopsis and plant spe G2123: (SEQID NOS. 67 and 68) 25 cies other than Arabidopsis were identified using database G2123 was analyzed using transgenic plants in which sequence search tools, such as the Basic Local Alignment G2123 was expressed under the control of the 35S promoter. Search Tool (BLAST) (Altschul et al. (1990).J. Mol. Biol. The phenotype of these transgenic plants was wild-type in all 215: 403-410; and Altschulet al. (1997) Nucl. Acid Res. 25: assays performed. G2123 was expressed primarily in devel 3389-3402). The tblastx sequence analysis programs were oping seeds and silique tissue in wild-type plants. G2123 30 employed using the BLOSUM-62 scoring matrix (Henikoff corresponds to a predicted putative 14-3-3 protein in anno and Henikoff (1992) Proc. Natl. Acad. Sci. USA 89: 10915 tated BAC clone T11 I11 (AC012680), from chromosome 1 of 10919). Arabidopsis. 
The polynucleotide and polypeptide sequences derived G1777: (SEQ ID NOs. 55 and 56) from monocots (e.g., the rice or maize sequences) may be G1777 (SEQ ID NO: 55) was analyzed using transgenic 35 used to transform both monocot and dicot plants, and those plants in which G1777 was expressed under the control of the derived from dicots (e.g., the Arabidopsis and soy sequences) 35S promoter. Overexpression of G1777 in Arabidopsis may be used to transform either group, although it is expected resulted in an increase in seed oil content and a decrease in that some of these sequences will function best if the gene is seed protein content in T2 lines 1 and 13. The change in seed transformed into a plant from the same class as that from oil inline 1 was just below the significance cutoff, but the seed 40 which the sequence is derived. protein change was significant. G1777 was expressed in all examined tissue of Arabidopsis. G1777 was induced by auxin Example XIII and ABA treatment, and by heat stress. G1777 has utility in manipulating seed oil and protein content. 45 Transformation of Dicots to Produce Improved G2520; (SEQID NOs. 37 and 38) Biochemical and Other Traits G2520 was analyzed using transgenic plants in which G2520 was expressed under the control of the 35S promoter. Homologous sequences from Arabidopsis and plant spe At early stages, 35S::G2520 transformants displayed abnor cies other than Arabidopsis were identified using database mal curled cotyledons, long hypocotyls, and rather short 50 sequence search tools, such as the Basic Local Alignment roots. During the vegetative phase, these plants formed some Search Tool (BLAST) (Altschuletal. (1990) supra; and Alts what small flat leaves. Following the switch to reproductive chuletal. (1997) supra). 
The tblastx sequence analysis pro growth, 35S::G2520 inflorescences were typically very spin grams were employed using the BLOSUM-62 scoring matrix dly, slightly pale colored, and stems often split open at late (Henikoff and Henikoff (1992) supra). stages. Flowers were frequently small with narrow organs and 55 Crop species including tomato and Soybean plants that showed poor pollen production. As a result, the seed yield overexpress any of a considerable number of the transcription from 35S::G2520 plants was low compared to wild-type con factor polypeptides of the invention have been shown experi trols. These effects were observed in the majority of primary mentally to produce plants with increased drought tolerance transformants, and to varying extents, in all three of the T2 and/or biomass in field trials. For example, tomato plants populations. Overexpression of G2520 also resulted in an 60 overexpressing the G2153 polypeptide have been found to be increase in the leaf glucosinolate M39478 in lines 11 and 14. larger than wild-type control tomato plants. For example, Soy In addition, these lines showed an increase in seed 8-toco plants overexpressing a number of G481, G682, G867 and G pherol and a decrease in seed Y-tocopherol. No altered phe 1073 orthologs have been shown to be more drought tolerant notypes were detected in any of the physiological assays. than control plants. These observations indicate that these G2520 was expressed throughout the plant and was induced 65 genes, when overexpressed, will result in larger yields than by ABA, heat, salt, drought and osmotic stress. G2520 is non-transformed plants in both stressed and non-stressed useful for manipulating plant development and altering leaf conditions. US 7,598.429 B2 117 118 Thus, transcription factor polynucleotide sequences listed may be conducted using the protocols of Koornneef et al in the Sequence Listing recombined into, for example, one of (1986) In Tomato Biotechnology: Alan R. 
Liss, Inc., 169-178, the expression vectors of the invention, or another suitable and in U.S. Pat. No. 6,613,962, the latter method described in expression vector, may be transformed into a plant for the brief here. Eight day old cotyledon explants are precultured purpose of modifying plant traits for the purpose of improv ingyield and/or quality. The expression vector may contain a for 24 hours in Petri dishes containing a feeder layer of constitutive, tissue-specific or inducible promoter operably Petunia hybrida suspension cells plated on MS medium with linked to the transcription factor polynucleotide. The cloning 2% (w/v) sucrose and 0.8% agar supplemented with 10 LM vector may be introduced into a variety of plants by means C.-naphthalene acetic acid and 4.4LM 6-benzylaminopurine. well known in the art such as, for example, direct DNA 10 The explants are then infected with a diluted overnight culture transfer or Agrobacterium tumefaciens-mediated transforma of Agrobacterium tumefaciens containing an expression vec tion. It is now routine to produce transgenic plants using most tor comprising a polynucleotide of the invention for 5-10 dicot plants (see Weissbach and Weissbach, (1989) supra; minutes, blotted dry on sterile filter paper and cocultured for Gelvin et al. (1990) supra: Herrera-Estrella et al. (1983) 48 hours on the original feeder layer plates. Culture condi supra; Bevan (1984) supra; and Klee (1985) supra). Methods 15 tions are as described above. Overnight cultures of Agrobac for analysis of traits are routine in the art and examples are disclosed above. terium tumefaciens are diluted in liquid MS medium with 2% Numerous protocols for the transformation of tomato and (w/v/) sucrose, pH 5.7) to an ODoo of 0.8. soy plants have been previously described, and are well Following cocultivation, the cotyledon explants are trans known in the art. Gruber et al. 
(1993) in Methods in Plant ferred to Petri dishes with selective medium comprising MS Molecular Biology and Biotechnology, p. 89-119, and Glick medium with 4.56 uM Zeatin, 67.3 uM Vancomycin, 418.9 and Thompson (1993) Methods in Plant Molecular Biology uM cefotaxime and 171.6 uM kanamycin sulfate, and cul and Biotechnology, eds. CRC Press, Inc., Boca Raton, tured under the culture conditions described above. The describe several expression vectors and culture methods that explants are subcultured every three weeks onto fresh may be used for cell or tissue transformation and Subsequent 25 regeneration. For soybean transformation, methods are medium. Emerging shoots are dissected from the underlying described by Mikietal. (1993) in Methods in Plant Molecular callus and transferred to glass jars with selective medium Biology and Biotechnology, p. 67-88, Glick and Thompson, without Zeatin to form roots. The formation of roots in a eds., CRC Press, Inc., Boca Raton; and U.S. Pat. No. 5,563, kanamycin Sulfate-containing medium is a positive indica 055, (Townsend and Thomas), issued Oct. 8, 1996. 30 tion of a successful transformation. There are a substantial number of alternatives to Agrobac Transformation of soybean plants may be conducted using terium-mediated transformation protocols, other methods for the purpose of transferring exogenous genes into Soybeans or the methods found in, for example, U.S. Pat. No. 5,563,055 tomatoes. One Such method is microprojectile-mediated (Townsend et al., issued Oct. 8, 1996), described in briefhere. transformation, in which DNA on the surface of microprojec 35 In this method soybean seed is surface sterilized by exposure tile particles is driven into plant tissues with a biolistic device to chlorine gas evolved in a glass bell jar. Seeds are germi (see, for example, Sanford et al. (1987) Part. Sci. Technol. 5: nated by plating on 1/10 strength agar Solidified medium 27-37; Christou et al. (1992) Plant. J. 
2: 275-281; Sanford without plant growth regulators and culturing at 28°C. with a (1993) Methods Enzymol. 217: 483-509; Klein et al. (1987) 16 hour day length. After three or four days, seed may be Nature 327: 70-73: U.S. Pat. No. 5,015,580 (Christou et al), 40 issued May 14, 1991; and U.S. Pat. No. 5,322,783 (Tomes et prepared for cocultivation. The seedcoat is removed and the al.), issued Jun. 21, 1994). elongating radicle removed 3-4 mm below the cotyledons. Alternatively, Sonication methods (see, for example, Overnight cultures of Agrobacterium tumefaciens harbor Zhangetal. (1991) Bio/Technology 9: 996-997); direct uptake ing the expression vector comprising a polynucleotide of the of DNA into protoplasts using CaCl precipitation, polyvinyl 45 alcohol or poly-L-ornithine (see, for example, Hain et al. invention are grown to log phase, pooled, and concentrated by (1985) Mol. Gen. Genet. 199: 161-168; Draper et al. (1982) centrifugation. Inoculations are conducted in batches Such Plant Cell Physiol. 23: 451–458); liposome or spheroplast that each plate of seed was treated with a newly resuspended fusion (see, for example, Deshayes et al. (1985) EMBOJ, 4: pellet of Agrobacterium. The pellets are resuspended in 20 ml 2731-2737; Christou et al. (1987) Proc. Natl. Acad. Sci. USA 50 inoculation medium. The inoculum is poured into a Petridish 84:3962-3966); and electroporation of protoplasts and whole containing prepared seed and the cotyledonary nodes are cells and tissues (see, for example, Donn et al. (1990) in macerated with a surgical blade. After 30 minutes the explants Abstracts of VIIth International Congress on Plant Cell and are transferred to plates of the same medium that has been Tissue Culture IAPTC, A2-38: 53; DHalluin et al. (1992); solidified. Explants are embedded with the adaxial side up and Spencer et al. (1994) Plant Mol. Biol. 
24: 51-61) have 55 and level with the surface of the medium and cultured at 22° been used to introduce foreign DNA and expression vectors C. for three days under white fluorescent light. These plants into plants. may then be regenerated according to methods well estab After a plant or plant cell is transformed (and the latter lished in the art, such as by moving the explants after three regenerated into a plant), the transformed plant may be days to a liquid counter-selection medium (see U.S. Pat. No. crossed with itself or a plant from the same line, a non 60 transformed or wild-type plant, or another transformed plant 5,563,055). from a different transgenic line of plants. Crossing provides The explants may then be picked, embedded and cultured the advantages of producing new and often stable transgenic in solidified selection medium. After one month on selective varieties. Genes and the traits they confer that have been media transformed tissue becomes visible as green sectors of introduced into a tomato or soybean line may be moved into 65 regenerating tissue against a background of bleached, less distinct line of plants using traditional backcrossing tech healthy tissue. Explants with green sectors are transferred to niques well known in the art. Transformation of tomato plants an elongation medium. Culture is continued on this medium US 7,598.429 B2 119 120 with transfers to fresh plates every two weeks. When shoots genotype (Fromm et al. (1990) supra; Gordon-Kamm et al. are 0.5 cm in length they may be excised at the base and (1990) supra). After microprojectile bombardment the tissues placed in a rooting medium. are selected on phosphinothricin to identify the transgenic embryogenic cells (Gordon-Kamm (1990) Plant Cell 2: 603 Example XIV 618). Transgenic plants are regenerated by standard corn Transformation of Cereal Plants with an Expression regeneration techniques (Fromm et al. (1990) supra; Gordon Vector Kamm et al. (1990) supra).

Cereal plants such as, but not limited to, corn, wheat, rice, sorghum, or barley, may be transformed with the present polynucleotide sequences, including monocot or dicot-derived sequences such as those presented in Tables 4-6, cloned into a vector such as pGA643 and containing a kanamycin resistance marker, and expressed constitutively under, for example, the CaMV 35S or COR15 promoters, or with tissue-specific or inducible promoters. The expression vectors may be one found in the Sequence Listing, or any other suitable expression vector may be similarly used. For example, pMEN020 may be modified to replace the NptII coding region with the BAR gene of Streptomyces hygroscopicus that confers resistance to phosphinothricin. The KpnI and BglII sites of the Bar gene are removed by site-directed mutagenesis with silent codon changes.

The cloning vector may be introduced into a variety of cereal plants by means well known in the art including direct DNA transfer or Agrobacterium tumefaciens-mediated transformation. The latter approach may be accomplished by a variety of means, including, for example, that of U.S. Pat. No. 5,591,616, in which monocotyledon callus is transformed by contacting dedifferentiating tissue with the Agrobacterium containing the cloning vector.

The sample tissues are immersed in a suspension of 3x10 cells of Agrobacterium containing the cloning vector for 3-10 minutes. The callus material is cultured on solid medium at 25° C. in the dark for several days. The calli grown on this medium are transferred to Regeneration medium. Transfers are continued every 2-3 weeks (2 or 3 times) until shoots develop. Shoots are then transferred to Shoot-Elongation medium every 2-3 weeks. Healthy looking shoots are transferred to rooting medium and after roots have developed, the plants are placed into moist potting soil.

The transformed plants are then analyzed for the presence of the NPTII gene/kanamycin resistance by ELISA, using the ELISA NPTII kit from 5Prime-3Prime Inc. (Boulder, Colo.).

It is also routine to use other methods to produce transgenic plants of most cereal crops (Vasil (1994) Plant Mol. Biol. 25: 925-937) such as corn, wheat, rice, sorghum (Cassas et al. (1993) Proc. Natl. Acad. Sci. 90: 11212-1121), and barley (Wan and Lemeaux (1994) Plant Physiol. 104: 37-48). DNA transfer methods such as the microprojectile method can be used for corn (Fromm et al. (1990) supra; Gordon-Kamm et al. (1990) supra; Ishida (1990) Nature Biotechnol. 14: 745-750), wheat (Vasil et al. (1992) Bio/Technol. 10: 667-674; Vasil et al. (1993a) Bio/Technology 10: 667-674; Vasil et al. (1993b) Bio/Technol. 11: 1553-1558; Weeks et al. (1993) supra), and rice (Christou (1991) Bio/Technology 9: 957-962; Hiei et al. (1994) Plant J. 6: 271-282; Aldemita and Hodges (1996) Planta 199: 612-617; and Hiei et al. (1997) Plant Mol. Biol. 35: 205-218). For most cereal plants, embryogenic cells derived from immature scutellum tissues are the preferred cellular targets for transformation (Hiei et al. (1997) supra; Vasil (1994) supra). For transforming corn embryogenic cells derived from immature scutellar tissue using microprojectile bombardment, the A188XB73 genotype is the preferred

Example XV

Transcription Factor Expression and Analysis of Improved Traits

Biochemical assays such as those disclosed above may be used to identify improved characteristics in any of the transgenic or knockout plants produced with sequences of the invention, such as polynucleotides SEQ ID NO: 2n-1, wherein n=1-84, or SEQ ID NO: 2n, wherein n=121-127.

Northern blot analysis, RT-PCR or microarray analysis of the regenerated, transformed plants may also be used to show expression of a transcription factor polypeptide of the invention and related genes that are capable of inducing improved biochemical characteristics, abiotic stress tolerance, and/or larger size.

To verify the ability to confer stress resistance, mature plants overexpressing a transcription factor of the invention, or alternatively, seedling progeny of these plants, may be challenged by a stress such as drought, heat, cold, high salt, or desiccation. Alternatively, these plants may be challenged in a hyperosmotic stress condition that may also measure altered sugar sensing, such as a high sugar condition. By comparing control plants (for example, wild type) and transgenic plants similarly treated, the transgenic plants may be shown to have greater tolerance to the particular stress.

After a dicot plant, monocot plant or plant cell has been transformed (and the latter regenerated into a plant) and shown to have improved biochemical characteristics, greater size or tolerance to abiotic stress, or produce greater yield relative to a control plant under the stress conditions, the transformed monocot plant may be crossed with itself or a plant from the same line, a non-transformed or wild-type monocot plant, or another transformed monocot plant from a different transgenic line of plants.

These experiments would demonstrate that transcription factor polypeptides of the invention can be identified and shown to confer improved biochemical characteristics, larger size, greater yield, and/or abiotic stress tolerance in dicots or monocots, including multiple improved biochemical characteristics and/or tolerance to multiple stresses.

It is expected that the same methods may be applied to identify other useful and valuable sequences of the present transcription factor clades, and the sequences may be derived from a diverse range of species.

All references, publications, patent documents, web pages, and other documents cited or mentioned herein are hereby incorporated by reference in their entirety for all purposes.

Although the invention has been described with reference to specific embodiments and examples, it should be understood that one of ordinary skill can make various modifications without departing from the spirit of the invention. The scope of the invention is not limited to the specific embodiments and examples provided.


attatccttg ggatggatgt ttcacatgga t ct cotggac agtctgatgt ccc.gtc.catc

tgagttctag ggagtggc.ca citgatat coa aatatagagc atctgttcgg 21OO acacagccitt Ctalaggctga gatgattgag tcc cttgtca agaaaaatgg aactgaagac 216 O gatggcatta t caaggagtt gctgg tagat ttctacacca gct caataa gagaaaacca 222 O gaggat at Ca taattitt cag ggatggtgttg agtgaat citc aattcaatca ggttctgaat 228O attgaacttg atcagat cat cgaggcttgc aagct cittag acgcaaattg galacc caaag 234 O titccttttgt tggtggctica aaagaat cat cataccalagt tct tccago c aacgt.ct cot 24 OO gaaaatgttc CtcCagggac aat cattgac aacaaaatat gtcacccaaa gaacaatgat 246 O t to tacct ct gtgcticacgc tggaatgatt ggalactaccc gcc caactica ctaccacgt.c 252O

Ctgt atgatg agattggttt ttcagctgac gaact tcagg aacttgtc.ca citc.gctotcc 2580 tatgtgtacc aaagaa.gcac cagtgccatt tctgttgttg cgc.cgatctg ctatogct cac 264 O ttggcagctg Ctcagcttgg gacgttcatg aagtttgaag atcagtctga gacat catca 27 OO agc.catggtg gtat cacago tccaggacca atctotgttg cacagct coc aagacitcaaa 276 O gacaacgt.cg CCaact Coat gttcttctgt taa 2793
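Where a listed polynucleotide is a bare coding region, its length should equal three nucleotides per residue of the encoded polypeptide plus one stop codon. The following Python sketch illustrates that arithmetic with lengths printed in this listing for G1272 (2793 nucleotides, 930 residues), G1506 (1413, 470), and G1897 (678, 225). The function name coding_length_matches, and the assumption that each of these three entries runs from the start codon through a single terminal stop codon with no untranslated flanking sequence, are illustrative only.

```python
def coding_length_matches(nucleotides: int, residues: int, has_stop: bool = True) -> bool:
    """Return True if a coding region of the given length encodes `residues` amino acids."""
    codons, remainder = divmod(nucleotides, 3)
    if remainder != 0:
        # A complete coding region is a whole number of codons.
        return False
    expected_codons = residues + (1 if has_stop else 0)
    return codons == expected_codons


# (nucleotide length, polypeptide length) as read from the listing
pairs = {"G1272": (2793, 930), "G1506": (1413, 470), "G1897": (678, 225)}
results = {name: coding_length_matches(nt, aa) for name, (nt, aa) in pairs.items()}
print(results)
```

Entries whose polynucleotide includes untranslated sequence on either side of the coding region would not be expected to satisfy this check.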

<210> SEQ ID NO 2
<211> LENGTH: 930
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G1272 polypeptide
<400> SEQUENCE: 2

Met Asp Ser Thr Asn Gly Asn Gly Ala Asp Lieu. Glu Ser Ala Asin Gly 1. 5 1O 15

Ala Asn Gly Ser Gly Val Thr Glu Ala Lieu Pro Pro Pro Pro Pro Wall 25

Ile Pro Pro Asn. Wall Glu Pro Wall Arg Val Lys Thir Glu Luell Ala Glu 35 4 O 45

Lys Gly Pro Val Arg Val Pro Met Ala Arg Lys Gly Phe Gly Thr SO 55 6 O

Arg Gly Glin Lys Ile Pro Lieu. Lieu Thir Asn His Phe Wall Asp Wall 65 70 7s

Ala Asn Luell Gln Gly His Phe Phe His Tyr Ser Wall Ala Luell Phe Tyr 85 90 95

Asp Asp Gly Arg Pro Val Glu Glin Lys Gly Val Gly Arg Lys Ile Lieu. 105 11 O

Asp Wall His Glin Thr Tyr His Ser Asp Lieu. Asp Gly Glu Phe 115 12 O 125

Ala Tyr Asp Gly Glu Lys Thr Lieu. Phe Thr Tyr Gly Ala Luell Pro Ser 13 O 135 14 O

Asn Met Asp Phe Ser Val Val Lieu. Glu Glu Wall Ser Ala Thir Ser 145 150 155 160

Asp Phe Val Ser Arg Ala Asn Gly Asn Gly Ser Pro Asn Gly Asn 1.65 17O 17s

Glu Ser Pro Ser Asp Gly Asp Arg Lys Arg Lieu. Arg Arg Pro Asn Arg 18O 185 19 O

Ser Asn Phe Arg Val Glu Ile Ser Tyr Ala Ala Lys Ile Pro Leu 195

Glin Ala Luell Ala Asn Ala Met Arg Gly Glin Glu Ser Glu Asn Ser Glin


21 O 215 22O

Glu Ala Ile Arg Wall Lell Asp Ile Ile Luell Arg Glin His Ala Ala Arg 225 23 O 235 24 O

Glin Gly Luell Lell Wall Arg Glin Ser Phe Phe His Asn Asp Pro Thir 245 250 255

Asn Glu Pro Wall Gly Gly Asn Ile Luell Gly Arg Gly Phe His 26 O 265 27 O

Ser Ser Phe Arg Thir Thir Glin Gly Gly Met Ser Lell Asn Met Asp Wall 27s 285

Thir Thir Thir Met Ile Ile Lys Pro Gly Pro Wall Wall Asp Phe Luell Ile 29 O 295 3 OO

Ala Asn Glin Asn Ala Arg Asp Pro Ser Ile Asp Trp Ser Ala 3. OS 310 315

Arg Thir Luell Lys Asn Lell Arg Wall Lys Wall Ser Pro Ser Gly Glin 3.25 330 335

Glu Phe Ile Thir Gly Lell Ser Asp Pro Arg Glu Glin Thir 34 O 345 35. O

Phe Glu Luell Lys Arg Asn Pro Asn Glu ASn Gly Glu Phe Glu Thir 355 360 365

Thir Glu Wall Thir Wall Ala Asp Phe Arg Asp Thir Arg His Ile Asp 37 O 375

Lell Glin Ser Ala Asp Lell Pro Ile ASn Wall Gly Pro Lys 385 390 395 4 OO

Arg Pro Thir Ile Pro Lell Glu Luell Cys Ala Lell Wall Pro Luell Glin 4 OS 415

Arg Thir Lys Ala Lell Thir Thir Phe Glin Arg Ser Ala Luell Wall Glu 425 43 O

Ser Arg Glin Lys Pro Glin Glu Arg Met Thir Wall Lell Ser Ala 435 44 O 445

Lell Lys Wall Ser Asn Asp Ala Glu Pro Luell Lell Arg Ser Gly 450 45.5 460

Ile Ser Ile Ser Ser Asn Phe Thir Glin Wall Glu Gly Arg Wall Luell Pro 465 470

Ala Pro Luell Lys Met Gly Gly Ser Glu Thir Phe Pro Arg Asn 485 490 495

Gly Arg Trp Asn Phe Asn Asn Glu Phe Wall Glu Pro Thir Ile SOO 505

Glin Arg Trp Wall Wall Wall Asn Phe Ser Ala Arg Asn Wall Arg Glin 515 525

Wall Wall Asp Asp Lell Ile Lys Ile Gly Gly Ser Lys Gly Ile Glu Ile 53 O 535 54 O

Ala Ser Pro Phe Glin Wall Phe Glu Glu Gly ASn Glin Phe Arg Arg Ala 5.45 550 555 560

Pro Pro Met Ile Arg Wall Glu Asn Met Phe Asp Ile Glin Ser 565 st O sts

Lell Pro Gly Wall Pro Glin Phe Ile Luell Wall Lell Pro Asp 585 59 O

Asn Ser Asp Luell Tyr Gly Pro Trp Asn Lell Thir Glu Phe 595 605

Gly Ile Wall Thir Glin Met Ala Pro Thir Arg Glin Pro Asn Asp Glin 610 615 62O

Tyr Luell Thir Asn Lell Lell Lell Ile Asn Ala Lell Gly Gly Luell 625 630 635 64 O


Asn Ser Met Lieu. Ser Val Glu Arg Thr Pro Ala Phe Thir Wall Ile Ser 645 650 655

Wall Pro Thir Ile Ile Leu Gly Met Asp Wall Ser His Gly Ser Pro 660 665 67 O

Gly Glin Ser Asp Val Pro Ser Ile Ala Ala Wall Wall Ser Ser Arg Glu 675 685

Trp Pro Luell Ile Ser Llys Tyr Arg Ala Ser Wall Arg Thir Glin Pro Ser 69 O. 695 7 OO

Lys Ala Glu Met Ile Glu Ser Lieu. Val Llys Llys Asn Gly Thir Glu Asp 7 Os 71O 72O

Asp Gly Ile Ile Lys Glu Lieu. Lieu. Val Asp Phe Thir Ser Ser Asn 72 73 O 73

Arg Pro Glu. His Ile Ile Ile Phe Arg Asp Gly Wall Ser Glu 740 74. 7 O

Ser Glin Phe ASn Glin Wall Lieu. Asn Ile Glu Lieu. Asp Glin Ile Ile Glu 760 765

Ala Cys Lieu. Lieu. Asp Ala Asn Trp Asn Pro Lys Phe Luell Lieu. Luell 770 775

Wall Ala Glin Lys Asn His His Thr Llys Phe Phe Glin Pro Thir Ser Pro 79 O 79. 8OO

Glu Asn Wall Pro Pro Gly Thr Ile Ile Asp Asn Ile His Pro 805 810 815

Asn Asn Asp Phe Tyr Lieu. Cys Ala His Ala Gly Met Ile Gly Thr 825 83 O

Thir Arg Pro Thr His Tyr His Val Lieu. Tyr Asp Glu Ile Gly Phe Ser 835 84 O 845

Ala Asp Glu Lieu. Glin Glu Lieu Wall His Ser Lieu. Ser Wall Tyr Glin 850 855 860

Arg Ser Thir Ser Ala Ile Ser Wall Wall Ala Pro Ile Ala His 865 87O 87s 88O

Lell Ala Ala Ala Glin Lieu. Gly Thr Phe Met Lys Phe Glu Asp Glin Ser 885 890 895

Glu Thir Ser Ser Ser His Gly Gly Ile Thir Ala Pro Gly Pro Ile Ser 9 OO 905 91 O

Wall Ala Glin Lieu Pro Arg Lieu Lys Asp Asn Val Ala Asn Ser Met Phe 915 92 O 925

Phe Cys 93 O
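Polypeptide entries in this listing are written with three-letter residue codes and interspersed position numbers. The following Python sketch shows one way such an entry might be converted to one-letter notation for use with sequence comparison tools. The function name to_one_letter, the choice of "X" for any token that is not a standard three-letter code, and the normalized sample string (the first sixteen residues of SEQ ID NO: 2) are assumptions made for illustration.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(listing: str) -> str:
    """Convert a three-letter residue listing to a one-letter string.

    Purely numeric tokens (position markers) are skipped; any other
    token that is not a standard code is emitted as 'X'.
    """
    letters = []
    for token in listing.split():
        if token.isdigit():
            continue  # position number, not a residue
        letters.append(THREE_TO_ONE.get(token.capitalize(), "X"))
    return "".join(letters)


sample = "Met Asp Ser Thr Asn Gly Asn Gly Ala Asp Leu Glu Ser Ala Asn Gly 1 5 10 15"
print(to_one_letter(sample))
```

Reporting unrecognized tokens as "X" rather than discarding them keeps the output the same length as the input residue count, so positions remain aligned with the numbering in the listing.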

<210> SEQ ID NO 3
<211> LENGTH: 1413
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G1506

<400> SEQUENCE: 3

atggggaagc aaggtoctitg citat cactgt ggagttacaa gtacacct ct atggagaaac 6 O gggccaccag agaa.gc.cggit gttgttgcaat gcgtgtggitt cgaggtggag alactaaagga 12 O t cattagtaa act acacacc tott catgct cgtgctgaag gtgatgagac tgagattgag 18O gat catagaa citcaaacggit gatgattaag ggaatgtctt tgaacaaaaa gatt.cccaag 24 O aggaalaccat atcaagaaaa citt cacagtg aaaagagcta acttggaatt c cataccggit 3OO ttcaagagga aggctctgga tgaagaagct agcaatagat cgagttcagg atcggttgta 360

tcaaactic cq agagctgtgc acaatctaat gcgtgggact cgacttittcc ttgtaagaga agga catgtg tgggacgt.cc aaaggcagct tottctgttg aaaagct cac aaaggat citt tatact attc. tacaagaaca gcaat cittct tgtct citctg gtact tcaga ggaagatttg 54 O Ctttittgaga atgaaacacic aatgctgtta gga catggta gtgttctitat gagagat.cct

Cact caggtg Ctcgagaaga ggaatctgaa gctagotcac t cittagttga gag cagdaag 660 tott catcag tt cattctgt taaatttggit ggaaaagcaa. tgaag cagga gcaagtgaag 72 O aggagcaaat citcaagttctt aggaagacat agttcactac tctgtag cat agatttgaag gatgttitt ca actittgatga gttcatagaa aattt cacag aggaagaaca gcaaaaactg 84 O atgaaattac titcct caagt tact ctdtt gat.cgtc.ctg atago: ct cag aag catgttt 9 OO gagagttctic aattcaaaga gaactitat co ttgttt cago aacttgttggc agatggtgtt 96.O tittgagacaa att cqt citta togcaaaactt gaaga catta aga cacttgc aaagcttgct titat cagatc ctaacaaatc ccatttgttg gaaagctatt acatgct caa gagaa.gaga.g 108 O attgaagact gtgttactac aacat caagg tgagt ccatc gaataataat 114 O agtc.ttgtaa c cattgaaag accttgttgaa agcttaalacc aaaactt. Ct. C agaga Caaga 12 OO ggtgttgatga gaa.gc.ccgaa agaagtgatg aagattagat caaag cacac cgalagagaat 126 O ttagagaata gtgitat ctitc ctittaaac ct gtgagctgttg gtggacctict ggtgtttagc 132O tatgaagata atgatatttic tdatcaggat cittcttcttg gaacggctica 1380 titcc ct caag cagagcttct aaacatgata tga 1413

<210> SEQ ID NO 4
<211> LENGTH: 470
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G1506 polypeptide

<400> SEQUENCE: 4

Met Gly Lys Glin Gly Pro Cys Tyr His Cys Gly Wall Thir Ser Thir Pro 1. 5 15

Lell Trp Arg Asin Gly Pro Pro Glu Lys Pro Wall Lell Asn Ala Cys 2O 25

Gly Ser Arg Trp Arg Thr Lys Gly Ser Luell Wall Asn Tyr Thir Pro Leu 35 4 O 45

His Ala Arg Ala Glu Gly Asp Glu Thir Glu Ile Glu Asp His Arg Thr SO 55 6 O

Glin Thir Wall Met Ile Lys Gly Met Ser Luell ASn Ile Pro Llys 65 70 8O

Arg Pro Tyr Glin Glu Asn. Phe Thir Wall Arg Ala Asn Lieu. Glu 85 90 95

Phe His Thir Gly Phe Ala Luell Asp Glu Glu Ala Ser Asn 105 11 O

Arg Ser Ser Ser Gly Ser Wal Wall Ser Asn Ser Glu Ser Ala Glin 115 12 O 125

Ser Asn Ala Trp Asp Ser Thir Phe Pro Arg Arg Thir Cys Val 13 O 135 14 O

Gly Arg Pro Lys Ala Ala Ser Ser Wall Glu Lys Lell Thir Asp Lieu. 145 150 155 160

Thir Ile Lieu. Glin Glu Glin Glin Ser Ser Lell Ser Gly Thir Ser 1.65 17O 17s


Glu Glu Asp Lieu. Luell Phe Glu Asn Glu Thr Pro Met Lieu. Lieu. Gly His 18O 185 19 O

Gly Ser Wall Luell Met Arg Asp Pro His Ser Gly Ala Arg Glu Glu Glu 195 2OO

Ser Glu Ala Ser Ser Lieu. Lieu Wall Glu Ser Ser Lys Ser Ser Ser Wall 21 O 215 22O

His Ser Wall Llys Phe Gly Gly Lys Ala Met Lys Glin Glu Glin Val Lys 225 23 O 235 24 O

Arg Ser Ser Glin Val Lieu. Gly Arg His Ser Ser Lell Luell Cys Ser 245 250 255

Ile Asp Luell Lys Asp Wall Phe Asn Phe Asp Glu Phe Ile Glu Asn. Phe 26 O 265 27 O

Thir Glu Glu Glu Glin Glin Llys Lieu. Met Lys Lieu. Lell Pro Glin Val Asp 285

Ser Wall Asp Arg Pro Asp Ser Lieu. Arg Ser Met Phe Glu Ser Ser Glin 29 O 295 3 OO

Phe Glu Asn Lieu. Ser Leu Phe Glin Glin Lieu. Wall Ala Asp Gly Val 3. OS 310 315 32O

Phe Glu Thir Asn. Ser Ser Tyr Ala Llys Lieu. Glu Asp Ile Thir Lieu. 3.25 330 335

Ala Luell Ala Lieu. Ser Asp Pro Asn Llys Ser His Lell Luell Glu Ser 34 O 345 35. O

Met Lieu Lys Arg Arg Glu Ile Glu Asp Wall Thir Th Thr 355 360 365

Ser Arg Wall Ser Ser Leul Ser Pro Ser Asn. Asn Asn Ser Luell Wall. Thir 37 O 375

Ile Glu Arg Pro Cys Glu Ser Lieu. ASn Glin Asn Phe Ser Glu Thr Arg 385 390 395 4 OO

Gly Wall Met Arg Ser Pro Llys Glu Val Met Lys Ile Arg Ser Llys His 4 OS 41O 415

Thir Glu Glu Asn Lieu. Glu Asn. Ser Wall Ser Ser Phe Pro Wall Ser 42O 425 43 O

Gly Gly Pro Leu. Wall Phe Ser Tyr Glu Asp Asn Asp Ile Ser Asp 435 44 O 445

Glin Asp Luell Lieu. Luell Asp Val Pro Ser Asn Gly Ser Phe Pro Glin Ala 450 45.5 460

Glu Luell Luell Asn Met Ile 465 470

<210> SEQ ID NO 5
<211> LENGTH: 678
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G1897

<400> SEQUENCE: 5

atgc ctitctgaatticagtga atctogtogg gttcc talaga titc.cccacgg cCaaggagga 6 O tctgttgcga titc.cgacgga t caacaagag cagctitt citt tgaat caacc 12 O aacaccaagt tctgttacta caacaactac aactt. Ct. CaC aac ct cqtca tittctgcaa.g 18O

gtt actggac tocatggaggit act ct cogtg acatt cocqt cggtggtgtt 24 O tcc.cgtaaaa gct caaaacg titc.ccggact tatt cct citg cc.gctaccac citc.cgttgtc 3OO ggaa.gc.cgga actitt.ccctt acaagctacg cctgttctitt tcc ct cagtic gtct tccaac 360

ggcggit at Ca cgacggcgaa gggaagttgct tcqtcgttct atggcggittt cagct ctittg at Calactaca acgc.cgc.cgt. gag cagaaat gggCCtggtg gcgggitttaa tiggccagat gcttittggtc ttgggcttgg t cacgggit cq tattatgagg acgtcagata tiggcaagga 54 O ataacggit ct ggc.cgtttitc aagtggcgct actgatgctg caact actac aagccacatt gctcaaatac gcagtttgaa ggt caagaga gcaaagttctgg gttcgtgtct 660 ggagacitacg tagcgtga 678

<210> SEQ ID NO 6
<211> LENGTH: 225
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G1897 polypeptide
<400> SEQUENCE: 6

Met Pro Ser Glu Phe Ser Glu Ser Arg Arg Val Pro Lys Ile Pro His 1 5 10 15

Gly Glin Gly Gly Ser Val Ala Ile Pro Thr Asp Glin Glin Glu Gln Lieu. 25 3O

Ser Pro Arg Cys Glu Ser Thr Asn. Thir Lys Phe Cys Tyr Tyr Asn 35 4 O 45

Asn Tyr Asn Phe Ser Gln Pro Arg His Phe Cys Llys Ser Cys Arg Arg SO 55 6 O

Tyr Trp Thir His Gly Gly Thr Lieu Arg Asp Ile Pro Val Gly Gly Val 65 70 8O

Ser Arg Ser Ser Lys Arg Ser Arg Thr Tyr Ser Ser Ala Ala Thr 85 90 95

Thir Ser Wall Val Gly Ser Arg Asn Phe Pro Leu Glin Ala Thr Pro Wall 105 11 O

Lell Phe Pro Glin Ser Ser Ser Asn Gly Gly Ile Thir Thr Ala Lys Gly 115 12 O 125

Ser Ala Ser Ser Phe Tyr Gly Gly Phe Ser Ser Lieu. Ile Asn Tyr Asn 13 O 135 14 O

Ala Ala Wall Ser Arg Asn Gly Pro Gly Gly Gly Phe Asn Gly Pro Asp 145 150 155 160

Ala Phe Gly Lieu. Gly Lieu. Gly His Gly Ser Tyr Tyr Glu Asp Val Arg 1.65 17O 17s

Gly Glin Gly Ile Thr Val Trp Pro Phe Ser Ser Gly Ala Thr Asp 18O 185 19 O

Ala Ala Thir Thir Thir Ser His Ile Ala Glin Ile Pro Ala Thr Trp Glin 195 2O5

Phe Glu Gly Glin Glu Ser Llys Val Gly Phe Val Ser Gly Asp Tyr Val 21 O 215 22O

Ala 225

<210> SEQ ID NO 7
<211> LENGTH: 1605
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G1946

<400> SEQUENCE: 7

t ct caccitat tdtaaaaatc accagttt cq tatataaaac cctaattitt c ticaaaatticc 6 O

- Continued caaatattga cittggaatca aaaatcc.gaa tggatgtgag caaagta acc acaag.cgacg 12 O gcggaggaga ttcaatggag actaa.gc.cat CtcCt Calacc t cagcctg.cg gcc attctaa 18O gttcaaacgc gcct cotc.cg tittctgagca agacictatga tatggttgat gat cacaata 24 O cagatt cqat tgtct cittgg agtgctaata acaacagttt tat cqtttgg aalaccaccgg 3OO agttcgct cq cgatcttctt cctaagaact ttaa.gcataa taatt to tcc. agctt.cgitta 360 gacagottaa tacctatgt ttcaggaagg ttgacccaga tagatgggaa tittgcgaatg aaggttttitt aagaggltdag aag cacttgc tacaatcaat alactaggcga aaacctg.ccc atggac aggg acagggaCat cagcgatcto agc acticgaa tggacagaac t catctgtta 54 O gcgcatgtgt tgaagttggc aaatttggtc tcgaagaaga agttgaaagg Cttaaaagag ataagaacgt. cct tatgcaa gaact cqtca gattalagaca gcagdalacag tcc actgata 660 acca actt Ca aacgatggitt cagcgt.ct co agggcatgga gaatcggcaa. Caacaattaa. 72 O tgtcatt cct tgcaaaggca gtacaaagcc cit cattt tot atctoaattic ttacago agc agaatcagca aaacgagagt aat aggcgca t cagtgatac Cagtaagaag cggagattica 84 O agcgaga.cgg cattgtc.cgt. aataatgatt ctgct acticc tgatggacag at agtgaagt 9 OO at Calacct CC aatgcacgag caa.gc.caaag caatgtttaa acagottatg aagatggaac 96.O

Cttacaaaac cggcgatgat ggttt cottc taggtaatgg tacgt.ctact accgagggaa cagagatgga gactitcatca aac caagtat cgggtataac t cittaaggaa atgcc tacag

Cttctgagat acagtcatca toaccaattg aaacaacticc tgaaaatgtt tcggcagcat 14 O

Cagaa.gcaac cgagaactgt att cott cac citgatgatct aactott CCC gacitt cactic 2OO atatgctacc ggaaaataat t cagagaa.gc Ctc.ca.gagag titt catggaa ccaaacctgg 26 O gaggttctag to Cattacta gatccagatc tgttgat Ca tgatt ctittg tcc titcgaca 32O ttgacgacitt tccaatggat tctgatatag accctgttga ttacggttta citcgaacgct tact catgtc aagc.ccggitt ccagataata tggattcaac accagtggac aatgaaa.ca.g 44 O agcaggaa.ca aaatggatgg gacaaaacta agcatatgga taatctgact Calacagatgg SOO gtct cotcto tcc tigaaacc ttagatct ct Caaggcaaaa tcc ttgattit tgggagttitt 560 taaagttctitt tgaggtaa.ca cagtic cct ga gag cagdata tt cat 605

<210> SEQ ID NO 8
<211> LENGTH: 485
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G1946 polypeptide

<400> SEQUENCE: 8

Met Asp Wall Ser Lys Val Thr Thr Ser Asp Gly Gly Gly Asp Ser Met 1. 5 1O 15

Glu Thir Lys Pro Ser Pro Glin Pro Gln Pro Ala Ala Ile Lieu. Ser Ser 25 3O

Asn Ala Pro Pro Pro Phe Leu Ser Lys Thr Tyr Asp Met Val Asp Asp 35 4 O 45

His Asn Thir Asp Ser Ile Val Ser Trp Ser Ala Asn. Asn. Asn Ser Phe SO 55 6 O

Ile Wall Trp Llys Pro Pro Glu Phe Ala Arg Asp Lieu. Luell Pro Lys Asn 65 70

Phe Lys His Asn. Asn. Phe Ser Ser Phe Val Arg Gln Lieu. Asn Thr Tyr


85 90 95

Gly Phe Arg Lys Wall Asp Pro Asp Arg Trp Glu Phe Ala Asn Glu Gly 105 11 O

Phe Luell Arg Gly Glin His Luell Luell Glin Ser Ile Thir Arg Arg 115 12 O 125

Pro His Gly Glin Gly Glin Gly His Glin Arg Ser Glin His Ser Asn 135 14 O

Gly Asn Ser Ser Wall Ser Ala Wall Glu Wall Gly Phe Gly 145 150 155 160

Lell Glu Glu Wall Glu Arg Luell Arg Asp Asn Wall Luell Met 1.65 17s

Glin Luell Wall Arg Lell Arg Glin Glin Glin Glin Ser Thir Asp Asn Glin 18O 185 19 O

Lell Thir Met Wall Glin Arg Luell Glin Gly Met Glu Asn Arg Glin Glin 195

Glin Met Ser Phe Lell Ala Ala Wall Glin Ser Pro His Phe Luell 215 22O

Ser Phe Luell Glin Glin Glin Asn Glin Glin ASn Glu Ser Asn Arg Arg 225 23 O 235 24 O

Ile Ser Asp Thir Ser Arg Arg Phe Arg Asp Gly Ile Wall 245 250 255

Arg Asn Asn Asp Ser Ala Thir Pro Asp Gly Glin Ile Wall Lys Glin 26 O 265 27 O

Pro Pro Met His Glu Glin Ala Lys Ala Met Phe Glin Luell Met 285

Met Glu Pro Lys Thir Gly Asp Asp Gly Phe Lell Lell Gly Asn Gly 29 O 295 3 OO

Thir Ser Thir Thir Glu Gly Thir Glu Met Glu Thir Ser Ser Asn Glin Wall 3. OS 310 315

Ser Gly Ile Thir Lell Glu Met Pro Thir Ala Ser Glu Ile Glin Ser 3.25 330 335

Ser Ser Pro Ile Glu Thir Thir Pro Glu Asn Wall Ser Ala Ala Ser Glu 34 O 345 35. O

Ala Thir Glu Asn Cys Ile Pro Ser Pro Asp Asp Lell Thir Luell Pro Asp 355 360 365

Phe Thir His Met Lell Pro Glu Asn Asn Ser Glu Lys Pro Pro Glu Ser 37 O 375

Phe Met Glu Pro Asn Lell Gly Gly Ser Ser Pro Lell Lell Asp Pro Asp 385 390 395 4 OO

Lell Luell Ile Asp Asp Ser Lell Ser Phe Asp Ile Asp Asp Phe Pro Met 4 OS 415

Asp Ser Asp Ile Asp Pro Wall Asp Tyr Gly Luell Lell Glu Arg Luell Luell 425 43 O

Met Ser Ser Pro Wall Pro Asp Asn Met Asp Ser Thir Pro Wall Asp Asn 435 44 O 445

Glu Thir Glu Glin Glu Glin Asn Gly Trp Asp Thir His Met Asp 450 45.5 460

Asn Luell Thir Glin Glin Met Gly Luell Luell Ser Pro Glu Thir Luell Asp Luell 465 470 47s 48O

Ser Arg Glin Asn Pro 485

<210> SEQ ID NO 9
<211> LENGTH: 640
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G2113

SEQUENCE: 9 ataacaaact catcaaactt cct cagogtt tottt ttctt acatalaacaa. ttitt tot tact 6 O ataaacaaat cittgttgttt gttgttgtca tggcaccgac agittaaaacg gcggc.cgt.ca 12 O aaac caacga aggtaacgga gtc.cgittaca gaggagtgag gaagagacca tggggacgtt 18O acgcagc.cga gat cagagat c ctittcaaga agt cacgtgt Ctggctcggit actitt cqaca 24 O citcc tdaaga agcc.gctcgt. gcc tacgaca aacgtgctat tgagttt cqt ggagctaaag 3OO

C Caaalaccala citt cocttgt tacaa.cat Ca acgcc cactg Cttgagtttg acacagagcc 360 tgagccagag Cagcaccgtg gaatcatcgt. t to citaatct Calacct cqqa tctgact citg ttagttcgag att coctet tt cctaagattic aggittaaggc tgggatgatg gtgttcgatg aaaggagtga atcggatt ct tCgtcggtgg tgatggatgt cgittagatat gaaggacgac 54 O gtgtggittitt ggacttggat cittaattit co ctic ct coacc tgagaactga ttaagattta attatgatta ttagatataa ttaaatgttt Ctgaattgag 64 O

<210> SEQ ID NO 10
<211> LENGTH: 166
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G2113 polypeptide

<400> SEQUENCE: 10

Met Ala Pro Thr Val Lys Thr Ala Ala Val Lys Thr Asn Glu Gly Asn 1 5 10 15

Gly Wall Arg Tyr Arg Gly Val Arg Lys Arg Pro Trp Gly Arg Tyr Ala 25 3O

Ala Glu Ile Arg Asp Pro Phe Lys Llys Ser Arg Wall Trp Luell Gly Thr 35 4 O 45

Phe Asp Thir Pro Glu Glu Ala Ala Arg Ala Tyr Asp Arg Ala Ile SO 55 6 O

Glu Phe Arg Gly Ala Lys Ala Lys Thir Asn. Phe Pro ASn Ile 65 70 7s

Asn Ala His Cys Lieu Ser Lieu. Thr Glin Ser Lieu. Ser Glin Ser Ser Thr 85 90 95

Wall Glu Ser Ser Phe Pro Asn Lieu. Asn Lieu. Gly Ser Asp Ser Wall Ser 105 11 O

Ser Arg Phe Pro Phe Pro Lys Ile Glin Val Lys Ala Gly Met Met Wall 115 12 O 125

Phe Asp Glu Arg Ser Glu Ser Asp Ser Ser Ser Wall Wall Met Asp Wall 13 O 135 14 O

Wall Arg Glu Gly Arg Arg Val Val Lieu. Asp Lell Asp Luell Asn. Phe 145 150 155 160

Pro Pro Pro Pro Glu Asn 1.65

<210> SEQ ID NO 11
<211> LENGTH: 506
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G2117

SEQUENCE: 11 at acttgtca acaaaaattt tottaaagaa cqcataactg ttitt t t t cat ggctggttct 6 O gtctataa.cc titccaagt ca aaaccctaat coacagt citt tatto Calaat Ctttgttgat 12 O cgagtaccac tittcaaactt gcc toccacg tdagacgact Ctagc.cggac tgcagaagat 18O aatgagagga agcggagaag galaggtatic aaccg.cgagt Cagct cqgag atcgc.gt atg 24 O cggaaa.ca.gc gtcacatgga agaactgtgg to catgcttg ttcaact Cat caataagaac 3OO aaatct ctag ticgatgagct aagccaagcc agggaatgtt acgagaaggt tatagaagag 360 alacatgaaac titcgagagga aaact coaag ticgaggalaga tgattggtga gat.cgggctt aataggittt C ttagcgtaga ggc.cgatcag atctggacct tctaatcgt.c tcqtaagctt gttggitttitt tdttgttt at ttaaag SO 6

<210> SEQ ID NO 12
<211> LENGTH: 138
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G2117 polypeptide

<400> SEQUENCE: 12

Met Ala Gly Ser Val Tyr Asn Lieu Pro Ser Glin Asn Pro Asn Pro Glin 1. 15

Ser Leu Phe Glin Ile Phe Val Asp Arg Val Pro Lell Ser Asn Leul Pro 2O 25

Ala Thir Ser Asp Asp Ser Ser Arg Thr Ala Glu Asp Asn Glu Arg Llys 35 4 O 45

Arg Arg Arg Llys Val Ser Asn Arg Glu Ser Ala Arg Arg Ser Arg Met SO 55 6 O

Arg Lys Glin Arg His Met Glu Glu Lieu. Trp Ser Met Lell Wall Gln Lieu. 65 8O

Ile Asn Lys Asn Llys Ser Lieu Val Asp Glu Lieu. Ser Glin Ala Arg Glu 85 90 95

Cys Tyr Glu Lys Val Ile Glu Glu Asn Met Lys Lell Arg Glu Glu Asn 1OO 105 11 O

Ser Lys Ser Arg Llys Met Ile Gly Glu Ile Gly Lell Asn Arg Phe Lieu. 115 12 O 125 Ser Val Glu Ala Asp Glin Ile Trp Thr Phe 13 O 135

<210> SEQ ID NO 13
<211> LENGTH: 1050
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G2155

<400> SEQUENCE: 13

ct catatata cca.accaaac ct ct citctgc atctittatta acacaaaatt c caaaagatt 6 O aaatgttgtc. gaagct coct acacagogac acttgcacct ctic tocc to c to tcc ct coa 12 O tgga aaccgt. C9ggcgt.cca C9tggcagac Ctcgaggttc Caaaaaaaa. cctaaagctic 18O caatctttgt caccattgac cct cotatga gtcct tacat Cct Caagtg c catc.cggaa 24 O acgatgtcgt talagc ccta aaccgtttct gcc.gcggtaa agc catcggc ttittgcgt.cc 3OO

- Continued t cagtggctic aggctc.cgtt gctgatgtca ctittgcgt.ca gcc ttct cog gcagotcctg 360 gct caaccat tact t t coac ggaaagttcg a tott ct citc. tgtct cogcc actitt cotcc citcc to taco t cottacct co ttgtc.ccctc ccgt.ctic caa tttctitcacc gtctotctog ccgg acct ca ggggaaagtic atcggtggat tcgtc.gctgg t cct citcgtt gcc.gc.cggaa 54 O ctgttt actt cgt.cgc.cact agtttcaaga accottcCta t caccggitta Cctgctacgg aggaagagca aagaaact cq gcggaagggg aagaggaggg acaatcgc.cg ccggit ct ctg 660 gaggtggtgg agagtcgatg tacgtgggtg gctctgatgt catttgggat cocaacgc.ca 72 O aagctic catc gcc.gtactga CCaCaaat CC atctogttca aac tagggitt tottctt citt tagat catca agaatcaa.ca aaaagattgc atttittagat totttgtaat at cataattg 84 O act cact citt taatct ct ct at cact tott ctittagctitt ttctgcagtg tdaaact tca 9 OO catatttgta gtttgatttg act at CCC Ca agttttgtat tittat catac aaatttittgc 96.O ctgtctictaa tggttgttitt titcgtttgta taatct tatg cattgttitat tdgagct coa gagattgaat gtataatata atggtttaat 1 OSO

<210> SEQ ID NO 14
<211> LENGTH: 225
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G2155 polypeptide

<400> SEQUENCE: 14

Met Leu Ser Lys Lieu Pro Thr Glin Arg His Lieu. His Lell Ser Pro Ser 1. 5 1O 15

Ser Pro Ser Met Glu Thr Val Gly Arg Pro Arg Gly Arg Pro Arg Gly 25 3O

Ser Asn Llys Pro Lys Ala Pro Ile Phe Wall Thir Ile Asp Pro Pro 35 4 O 45

Met Ser Pro Tyr Ile Lieu. Glu Val Pro Ser Gly Asn Asp Wall Wall Glu SO 55 6 O

Ala Luell Asn Arg Phe Cys Arg Gly Lys Ala Ile Gly Phe CyS Val Lieu. 65 70 7s 8O

Ser Gly Ser Gly Ser Val Ala Asp Wall. Thir Lieu. Arg Glin Pro Ser Pro 85 90 95

Ala Ala Pro Gly Ser Thr Ile Thr Phe His Gly Phe Asp Lieu. Lieu 105 11 O

Ser Wall Ser Ala Thir Phe Leu Pro Pro Leu Pro Pro Thir Ser Luel Ser 115 12 O 125

Pro Pro Wall Ser Asn. Phe Phe Thr Wal Ser Lieu. Ala Gly Pro Glin Gly 13 O 135 14 O

Lys Wall Ile Gly Gly Phe Val Ala Gly Pro Leu Wall Ala Ala Gly Thr 145 150 155 160

Wall Phe Wall Ala Thir Ser Phe Lys Asn Pro Ser His Arg Lieu. 1.65 17O 17s

Pro Ala Thir Glu Glu Glu Glin Arg Asn Ser Ala Glu Gly Glu Glu Glu 18O 185 19 O

Gly Glin Ser Pro Pro Val Ser Gly Gly Gly Gly Glu Ser Met Tyr Val 195 2O5

Gly Gly Ser Asp Val Ile Trp Asp Pro Asn Ala Lys Ala Pro Ser Pro 21 O 215 22O


225

<210> SEQ ID NO 15
<211> LENGTH: 1312
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G2290

<4 OO SEQUENCE: 15 ttctitt ctitt ctittctitt ct citt coaatca agaacaaacc ctagotcct c tictttittctic 6 O tctic tacctic tictittcticta t cittct citta t cact acttic tict cqc.cgat caatcat cat 12 O gaacgatcct gataatc.ccg atctgagcaa cacgactict gcttggagag aact cacact 18O cacagotcaa gattctgact tctitcgaccg agacact tcc aatat cotct c to actt cqg 24 O ttggaacctic caccactic ct c coat catcc ticacagt citc agatt cact c cqatttaac 3OO acaaaccacc ggagtcaaac ctaccaccgt cactt cittct togttcct cat cogcc.gc.cgt. 360 titcc.gttgcc gttaccticta ctaataataa tocct cagot acct caagtt caagtgaaga 42O tccggc.cgag aactica accg cct cogcc.ga gaaaacacca ccaccggaga Caccagtgaa 48O ggagaagaag aaggct caaa agcgaatticg gcaac Caaga titcgcattca taccalaga.g 54 O tgatgtggat aatcttgaag atggatatic atggcgtaaa tatggacaaa aag.ccgt caa 6OO gaatagcc.ca ttc.ccalagga gct actatag atgcacaaac agcagatgca C9gtgaagaa 660 gagagtagaa cqtt catcag atgat coatic gat agtgatc acaac at acg aagga caa.ca 72 O ttgc catcaa accattggat tcc ct cqtgg toggaatcctic actgcacacg acccacatag 78O citt cacttct catcatcatc. tcc ct cotcc attaccalaat cct tattatt accaagaact 84 O cctt catcaa citt cacagag acaataatgc ticcitt caccq cqgttacccc gacct actac 9 OO tgaagataca Cctgcc.gtgt Ctact coat C agaggaaggc titacttggtg at attgt acc 96.O tcaaactatg cgcaac cctt gaggtaagct ttacgtag caatagctaa ggaggtgcta 1 O2O actic attata tatagaagat attgcagacic agaatatgcg Cagggagggit ataacaat at 108 O gg.cgttgtaa caatggat ct at at attacct cattgttga t caatagcac accaccggta 114 O cgtttgcaat ttctt catgt at atttcttgttatatatgt agittatatat coaggtataa 12 OO ttittgatgta acacaa.catt aatcttaatc gtggat.c cat cccacatttg atgcatgitat 126 O gtgcacttaa gaaaaagaac atggaggaaa taacgittatt ttt tattatt ct 1312

<210> SEQ ID NO 16
<211> LENGTH: 287
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G2290 polypeptide
<400> SEQUENCE: 16

Met Asn Asp Pro Asp ASn Pro Asp Luell Ser Asn Asp Asp Ser Ala Trp 1. 5 1O 15

Arg Glu Lieu. Thir Lieu. Thir Ala Glin Asp Ser Asp Phe Phe Asp Arg Asp 25 3O

Thir Ser Asn Ile Lieu. Ser Asp Phe Gly Trp Asn Lieu. His His Ser Ser 35 4 O 45

Asp His Pro His Ser Lieu. Arg Phe Asp Ser Asp Lieu. Thir Glin Th Thr SO 55 6 O

Gly Val Lys Pro Thir Thir Wall. Thir Ser Ser Cys Ser Ser Ser Ala Ala


65 70

Wall Ser Wall Ala Wall. Thir Ser Thr Asn. Asn. Asn Pro Ser Ala Thir Ser 85 90 95

Ser Ser Ser Glu Asp Pro Ala Glu Asn. Ser Thr Ala Ser Ala Glu Lys 105 11 O

Thir Pro Pro Pro Glu. Thir Pro Wall Lys Glu Lys Lys Ala Gln Lys 115 12 O 125

Arg Ile Arg Gln Pro Arg Phe Ala Phe Met Thir Lys Ser Asp Val Asp 13 O 135 14 O

Asn Luell Glu Asp Gly Tyr Arg Trp Arg Llys Tyr Gly Glin Ala Wall 145 150 155 160

Asn Ser Pro Phe Pro Arg Ser Thir Asn Ser Arg 1.65 17O 17s

Thir Wall Llys Lys Arg Val Glu Arg Ser Ser Asp Asp Pro Ser Ile 18O 185 19 O

Wall Ile Thir Thr Tyr Glu Gly Glin His Cys His Glin Thir Ile Gly Phe 195

Pro Arg Gly Gly Ile Lieu. Thir Ala His Asp Pro His Ser Phe Thir Ser 21 O 215 22O

His His His Leul Pro Pro Pro Leu. Pro Asn. Pro Glin Glu 225 23 O 235 24 O

Lell Luell His Glin Lieu. His Arg Asp Asn Asn Ala Pro Ser Pro Arg Lieu. 245 250 255

Pro Arg Pro Thir Thr Glu Asp Thr Pro Ala Wall Ser Thir Pro Ser Glu 26 O 265 27 O

Glu Gly Luell Lieu. Gly Asp Ile Val Pro Gin. Thir Met Arg Asn Pro 27s 285

<210> SEQ ID NO 17
<211> LENGTH: 1406
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G2340

SEQUENCE: 17 atacaaaact CCCtottcto tat cit tot to atc.ttaaaga aaaaataaga gatatt cqta 6 O alaga.gagaac acaaaattitc. agtttacgaa aagctagdaa agt cagtat cgaggaataa 12 O

Cagaataaga cgitatictato cittgc cittaa tgttcttacc aaaagat cita gtcctittctt 18O tgtatgat cq atcCat Caca agcc.cacaac aacaacaact a catctott t c totatic tect 24 O agctitc tatt tittaataCat t caagaatca agaatgg tac ttgtagagca 3OO galagggttga agaaaggagc atggacticaa gaagaagacic aaaagctitat cgc.ctatgtt 360 Caacga catg gtgaaggcgg ttggcgalacc citt coggaca aagctggact caaaagatgt ggcaaaagct gCagattgag atgggcgaat tacttaagac ctgacattaa acgtggaga.g tittago caag acgaggaaga t to catcatc. aac ct coacg c catt catgg Caacaaatgg 54 O tcggccatag citcgtaaaat accalagaaga acagacaatg agatcaagaa c cattggaac acticacat Ca agaaatgtct ggt caagaaa ggt attgatc cgttgaccca Caaat CCCtt 660

Ctcgatggag ccggtaaatc atctgaccat tcc.gc.gcatc cc.gagaaaag cagcgttcat 72 O gacgacaaag atgat Cagaa ttcaaataac aaaaagttgt caggat catc atcagct cqg tttittgaaca gagtagcaaa Cagatt.cggit catagaatca accacaatgt tctgtctgat 84 O

- Continued attattggaa gtaatggc ct act tact agt CaCactactic caact acaag tdttt cagaa 9 OO ggtgagaggt caacgagttc titcCt CCaCa catacct citt cgaat ct coc catcaac cqt 96.O agcatalaccg ttgatgcaac at ct citat co t catccacgt. t citctgactic cc.ccgaccc.g tgtttatacg aggaaatagt cggtgacatt gaagatatga cgagattitt c atcaagatgt 108 O ttgagt catg titt tat cit ca. tgaagattta ttgatgtc.cg ttgagticttg tittggagaat 114 O actt Catt Ca tgagggaaat tacaatgatc titt Caagagg ataaaatcga gacgacgt.cg 12 OO tittaatgata gctacgtgac gcc gatcaat gaagttgatg act cotgtga agggattgac 126 O aattattittg gatgagtt at attgatgatg atgaaaattit gcatttggca tgtaaat caa 132O ttagagtttg atttgctato gtgtttittag tttgttgttgttg tag tdtgttt cgaccgt caa 1380 aaaaaaaaaa. aaaaaaaaaa. aaaaaa. 14 O6

<210> SEQ ID NO 18
<211> LENGTH: 333
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G2340 polypeptide
<400> SEQUENCE: 18

Met Val Arg Thr Pro Cys Cys Arg Ala Glu Gly Leu Gly Ala 1 5 10 15

Trp Thr Gln Glu Glu Asp Gln Lys Leu Ile Ala Val Gln Arg His 25

Gly Glu Gly Gly Trp Arg Thr Leu Pro Asp Lys Ala Gly Leu 35 40 45

Gly Ser Cys Arg Leu Arg Trp Ala Asn Tyr Leu Arg Pro Asp 50 55 60

Ile Arg Gly Glu Phe Ser Gln Asp Glu Glu Asp Ser Ile Ile Asn 65 70 75 80

Leu His Ala Ile His Gly Asn Lys Trp Ser Ala Ile Ala Arg Lys Ile 85 90 95

Arg Arg Thr Asp Asn Glu Ile Lys Asn His Trp Asn Thr His Ile 105 110

Cys Leu Val Lys Lys Gly Ile Asp Pro Leu Thr His Lys Ser 115 120 125

Leu Leu Asp Gly Ala Gly Lys Ser Ser Asp His Ser Ala His Pro Glu 130 135 140

Lys Ser Ser Val His Asp Asp Lys Asp Asp Gln Asn Ser Asn Asn Lys 145 150 155 160

Leu Ser Gly Ser Ser Ser Ala Arg Phe Leu Asn Arg Val Ala Asn 165 170 175

Arg Phe Gly His Arg Ile Asn His Asn Val Leu Ser Asp Ile Ile Gly 185 190

Ser Asn Gly Leu Leu Thr Ser His Thr Thr Pro Thr Thr Ser Val Ser 195

Glu Gly Glu Arg Ser Thr Ser Ser Ser Ser Thr His Thr Ser Ser Asn 210 215 220

Leu Pro Ile Asn Arg Ser Ile Thr Val Asp Ala Thr Ser Leu Ser Ser 225 230 235 240

Ser Thr Phe Ser Asp Ser Pro Asp Pro Cys Leu Glu Glu Ile Val 245 250 255

Gly Asp Ile Glu Asp Met Thr Arg Phe Ser Ser Arg Leu Ser His

<400> SEQUENCE:

Met Val Arg Thr Pro Ala Glu Leu Gly Leu Lys Gly 1 15

Ala Trp Thr Pro Glu Glu Asp Gln Lys Leu Leu Ser Leu Asn Arg 20 25

His Gly Glu Gly Gly Trp Arg Thr Leu Pro Glu Ala Gly Leu 35 40 45

Arg Cys Gly Ser Arg Leu Arg Trp Ala Asn Tyr Leu Arg Pro 50 55 60

Asp Ile Arg Gly Glu Phe Thr Glu Asp Glu Glu Arg Ser Ile Ile 65 70 80

Ser Leu His Ala Leu His Gly Asn Trp Ser Ala Ile Ala Arg Gly 85 90 95

Leu Pro Gly Arg Thr Asp Asn Glu Ile Asn Tyr Trp Asn Thr His 105 110

Ile Lys Arg Leu Ile Lys Gly Ile Asp Pro Val Thr His 115 120 125

Gly Ile Thr Ser Gly Thr Asp Ser Glu Asn Leu Pro Glu Gln 130 135 140

Asn Val Asn Leu Thr Thr Ser Asp His Asp Leu Asp Asn Asp Ala 145 150 155 160

Asn Asn Lys Asn Phe Gly Leu Ser Ser Ala Ser Phe Leu Asn 165 170 175

Val Ala Asn Arg Phe Gly Arg Ile Asn Gln Ser Val Leu Ser 180 185 190

Glu Ile Ile Gly Ser Gly Gly Pro Leu Ala Ser Thr Ser His Thr Thr 195

Asn Thr Thr Thr Thr Ser Val Ser Val Asp Ser Glu Ser Val Ser 210 215

Thr Ser Ser Ser Phe Ala Pro Thr Ser Asn Leu Leu His Gly Thr 225 230 235 240

Val Ala Thr Thr Pro Val Ser Ser Asn Phe Asp Val Asp Gly Asn Val 245 250 255

Asn Leu Thr Cys Ser Ser Ser Thr Phe Ser Asp Ser Ser Val Asn Asn 260 265 270

Pro Leu Met Asp Asn Phe Val Gly Asn Asn Asn Val Asp Asp 280 285

Glu Asp Thr Ile Gly Phe Ser Thr Phe Leu Asn Asp Glu Asp Phe Met 290 295 300

Met Leu Glu Glu Ser Cys Val Glu Asn Thr Ala Phe Met Glu Leu 305 310 315

Thr Arg Phe Leu His Glu Asp Glu Asn Asp Val Val Asp Val Thr Pro 325 330 335

Val Glu Arg Gln Asp Leu Phe Asp Glu Ile Asp Asn Tyr Phe Gly 340 345 350

<210> SEQ ID NO 21 <211> LENGTH: 727 <212> TYPE: DNA <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G353

<400> SEQUENCE: 21

accaaactica aaaaacacala accacaagag gat catttca tttitt tattg titt cqttitta 60 atcatcatca totagaagaaa aatggttgcg at atcggaga t caagttcgac ggtggatgtc. 120 acggcggcga attgtttgat gcttittat ct agagttggac aagaaaacgt tacggtggc 180 gatcaaaaac gogttitt cac atgtaaaacg tgtttgaagc agttt catt c gttccaa.gc.c 240 ttaggagg to accgtgcgag t cacaagaag cctaacaacg acgctttgtc. gtctggattg 300 atgaagaagg taaaacgt.c gtc.gcatcct tgtcc catat gtggagtgga gtttc.cgatg 360 ggacaa.gctt togaggaca Catgaggaga Cacaggaacg agagtggggc tigctggtggc gcgttggitta cacgc.gctitt gttgc.cggag CCC acggtga citacgttgaa gaaatctago agtgggalaga gagtggcttg tittggatctg agt ct aggga tggtggacaa tittgaatctic 540 aagttggagc titggaagaac agtttattga titt tatt tatt titt cottaaa ttittctgaat at atttgttt citctoattct ttgaatttitt Cttaatatto tagattatac atacatcc.gc 660 agatttagga aactitt cata gagtgtaatc ttt tott tot gtaaaaatat attitt acttg 720 tagcaaa 727

<210> SEQ ID NO 22 <211> LENGTH: 162 <212> TYPE: PRT <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G353 polypeptide

<400> SEQUENCE: 22

Met Val Ala Ile Ser Glu Ile Lys Ser Thr Val Asp Val Thr Ala Ala 1 5 10 15

Asn Cys Leu Met Leu Leu Ser Arg Val Gly Gln Glu Asn Val Asp Gly 25 30

Gly Asp Gln Lys Arg Val Phe Thr Cys Lys Thr Leu Lys Gln Phe 35 40 45

His Ser Phe Gln Ala Leu Gly Gly His Arg Ala His Lys Lys Pro 50 55

Asn Asn Asp Ala Leu Ser Ser Gly Leu Met Lys Val Lys Thr Ser 65 70 75

Ser His Pro Cys Pro Ile Cys Gly Val Glu Phe Pro Met Gly Gln Ala 85 90 95

Leu Gly Gly His Met Arg Arg His Arg Asn Glu Ser Gly Ala Ala Gly 105 110

Gly Ala Leu Val Thr Arg Ala Leu Leu Pro Glu Pro Thr Val Thr Thr 115 120 125

Leu Lys Lys Ser Ser Ser Gly Lys Arg Val Ala Cys Leu Asp Leu Ser 130 135 140

Leu Gly Met Val Asp Asn Leu Asn Leu Lys Leu Glu Leu Gly Arg Thr 145 150 155 160

Val Tyr

<210> SEQ ID NO 23 <211> LENGTH: 922 <212> TYPE: DNA <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G484

<400> SEQUENCE: 23

attatattcc gtacaatccg atcgatttcc cggcgccaga tctcaccgcg actcgtctac 60

titt.ccgattt ggttcgtgtt gacticagtta cqattaaact atggat.ccala tggat at agt 120 cggcaaatcc aaggalagacg Cttct ctitcc aaaagctacg atgactaaaa ttataaagga 180 gatgttacca C cagatgttc gtgttgcaag agatgct caa gat cittctica ttgaatgttg 240 tgtagagttt ataaatcttg tat ctitcaga atctaatgat gtttgtaa.ca aagaggataa 300 acggacgatt gct Cotgagc atgttcticaa ggcattacag gttcttggitt ttggagaata 360 cattgaagaa gtctatgctg cg tatgagca acatalagtat gaaacaatgc aggacacaca gaggagcgtgaaatggalacc Ctggagctica aatgactgag gaggaagicag Cagctgagca acaacg tatgtttgcagaag cacgtgcaag aatgaatgga titcCt Calacc 540 tgaacatcca gaalactgacc agagalagt cc gcaaagctaa ctgaaac cqt aagggitaagt gttaggcaag aaaaaacaac atccttittaa catt.ccc.ttg taagttgcaa atgcg tatgt 660 t citctgttta tatgct citta gitatgatata tottagttag tgttt cacga totaaaaa.ca 720 cittgttgattic agatgtaatt agtaa.gcatt cottgttittg tgtttactitt gtgtc.ttgac talagcatggt gggtcagg to tacacaaagc atctgatt.cg atgacittaca ggaat cittaa 840 tgtttgtaga ttggataa at ttggtgattg gtgtaattgt ttt to Catala acacaatgca 900 at cattgttt agtgttgtta ac 922

<210> SEQ ID NO 24 <211> LENGTH: 159 <212> TYPE: PRT <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G484 polypeptide

<400> SEQUENCE: 24

Met Asp Pro Met Asp Ile Val Gly Lys Ser Lys Glu Asp Ala Ser Leu 1 5 10 15

Pro Lys Ala Thr Met Thr Lys Ile Ile Lys Glu Met Leu Pro Pro Asp 20 25 30

Val Arg Val Ala Arg Asp Ala Gln Asp Leu Leu Ile Glu Cys Val 35 40 45

Glu Phe Ile Asn Leu Val Ser Ser Glu Ser Asn Asp Val Asn Lys 50 55 60

Glu Asp Lys Arg Thr Ile Ala Pro Glu His Val Leu Ala Leu Gln 65 70 75

Val Leu Gly Phe Gly Glu Tyr Ile Glu Glu Val Ala Ala Tyr Glu 85 90 95

Gln His Lys Tyr Glu Thr Met Gln Asp Thr Gln Arg Ser Val 100 105 110

Asn Pro Gly Ala Gln Met Thr Glu Glu Glu Ala Ala Ala Glu Gln Gln 115 120 125

Arg Met Phe Ala Glu Ala Arg Ala Arg Met Asn Gly Gly Val Ser Val 130 135 140

Pro Gln Pro Glu His Pro Glu Thr Asp Gln Arg Ser Pro Gln Ser 145 150 155

<210> SEQ ID NO 25 <211> LENGTH: 786 <212> TYPE: DNA <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G674

<400> SEQUENCE: 25

atggtgttta aat cagaaaa atcaaaccgg gaaatgaaat Calaaggagaa gcaaaggaag 60 ggattatggit Caccc.gagga agatgagaag Cttaggagtic atgtc.ct caa atatggc cat 120 ggatgctgga gtactatt co tottcaagct ggattgcaga ggaatgggaa gagttgtaga 180 ttalaggtggg ttaattattt aag acctgga cittaagaagt citt tatt cac taaacaagag 240 gaaactatac ttct t t cact t catt coatg ttgggtaiaca aatggtc.t.ca gat at Caaa 300 ttcttaccag gaagaac.cga Caacgagatc aaaaactatt ggcattctaa tctaaagaag 360 ggtgta actt tgaaacaa.ca tgaaaccaca aaaaaa.catc. aalacacctitt aat Cacaaac t cacttgagg CCttgcagag ttcaactgaa agat citt citt Catct at Caa tgtcggagaa acgt.ctaatg Ctcaaacctic aagcttitt cq c caaatctog tgttcticgga atggittagat 540 catagtttgc titatggat.ca gtcacct caa aagtictagot atgttcaaaa tottgttitta ccggaagaga gaggatt cat tggaccatgt ggc cctogtt atttgggaaa cgact ctittg 660 cctgattt cq tgccaaattic agaatttittg ttggatgatg agatat catc tgagat.cgag 720 ttctgtactt cattitt caga caactttittg titcgatggtc t catcaacga gctacgacca atgtaa 786

<210> SEQ ID NO 26 <211> LENGTH: 261 <212> TYPE: PRT <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G674 polypeptide

<400> SEQUENCE: 26

Met Val Phe Lys Ser Glu Lys Ser Asn Arg Glu Met Ser Lys Glu 1 5 10 15

Lys Gln Arg Lys Gly Leu Trp Ser Pro Glu Glu Asp Glu Lys Leu Arg 25 30

Ser His Val Leu Lys Tyr Gly His Gly Cys Trp Ser Thr Ile Pro Leu 35 40 45

Gln Ala Gly Leu Gln Arg Asn Gly Lys Ser Cys Arg Leu Arg Trp Val 50 55 60

Asn Leu Arg Pro Gly Leu Lys Lys Ser Leu Phe Thr Gln Glu 65 70 80

Glu Thr Ile Leu Leu Ser Leu His Ser Met Leu Gly Asn Trp Ser 85 90 95

Gln Ile Ser Lys Phe Leu Pro Gly Arg Thr Asp Asn Glu Ile Lys Asn 105 110

Trp His Ser Asn Leu Lys Lys Gly Val Thr Leu Lys Gln His Glu 115 120 125

Thr Thr Lys His Gln Thr Pro Leu Ile Thr Asn Ser Leu Glu Ala 130 135 140

Leu Gln Ser Ser Thr Glu Arg Ser Ser Ser Ser Ile Asn Val Gly Glu 145 150 155 160

Thr Ser Asn Ala Gln Thr Ser Ser Phe Ser Pro Asn Leu Val Phe Ser 165 170 175

Glu Trp Leu Asp His Ser Leu Leu Met Asp Gln Ser Pro Gln Lys Ser 180 185 190

Ser Val Gln Asn Leu Val Leu Pro Glu Glu Arg Gly Phe Ile Gly 195 205

Pro Cys Gly Pro Arg Tyr Leu Gly Asn Asp Ser Leu Pro Asp Phe Val 210 215

Pro Asn Ser Glu Phe Leu Leu Asp Asp Glu Ile Ser Ser Glu Ile Glu 225 230 235 240

Phe Cys Thr Ser Phe Ser Asp Asn Phe Leu Phe Asp Gly Leu Ile Asn 245 250 255

Glu Leu Arg Pro Met 260

<210> SEQ ID NO 27 <211> LENGTH: 1304 <212> TYPE: DNA <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G1052

<400> SEQUENCE: 27

tgatcatcta aaactttcaa tttctgtctt gatcctcact tgaatttttt gttgtttctc 60 tcaaatcttt gatcctttcc tttgtttttc atttgacctc ttacaaaaaa atctggtgtg 120

CCattaaatc. titt attaatg gcaca actitc citc.cgaaaat cc caaccatg acgacgc.cala 18O attggcctga ottctgct co cagaaactico citt coat agc cgcaacggcg gcago.cgcag 24 O

Calaccgctgg accitcaiacala Caaaa.ccctt Catggatgga tgagtttctic gacttct cag 3OO cgacticgc.cg tgggacticac cgt.cgttcta taag.cgactic cattgctitt c cittgaac cac 360 ctitcct cogg cgt.cggaaac caccactt.cg at aggtttga cgacgagcaa. tt catgtc.ca tgttcaacga cgacgtacac aacaataa.cc. acaat Catca toatcatCaC agcatcaacg gcaatgttggg tcc cacgcgt. to atcCtcCa acaccitcCaC gccgt.ccgat cataatago c 54 O ttagcgacga cgacaacaac aaagaagcac caccgt.ccga t catgat cat Cacatggaca ataatgtagc Caatcaaaac gtaacaatta caacgaatca gacgaggtoc 660 aaagcc agtg Caagacggag ccacaagatg gaatcaaaac tcc.ggtggaa 72 O gctic cqgtaa tcqtatt cac gaccctaaaa ggg taaaaag aattittagca aat aggcaat

Cagcacagag at Caagggtg aggaaattgc aatacatlatc. agagcttgaa aggagcgitta 84 O citt cattgca gactgaagtg t cagtgttat cgc.caagagt tgcgtttittg gat catcagc 9 OO gattgcttct Caacgt.cgac aatagtgcta t caa.gcaacg aatcgcagot ttagcacaag 96.O ataagattitt caaagacgct Cat Caagaag cattgaagag agaaatagag agactitcgac aagtatat ca t caacaaag.c Ctcaagaaga tggagaataa tgtct cogat caatcto cqg 108 O ccgatat caa accotic cqtt gagaaggaac agctic ct caa tgtctaaagc tgttcgttca 114 O ctaagat citt tcttitt catg gcgaaaagat t cittgacitat aaaacct Ctt tgttgtcaaga 12 OO aattaattta tcaaagaaga tggcc tttitt tatttgatct aat Cacattt ttittaagttg 126 O tgatgaattit gcttittgatg tatctgttitt tttitt tttitt ttitt 1304

<210> SEQ ID NO 28 <211> LENGTH: 329 <212> TYPE: PRT <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G1052 polypeptide

<400> SEQUENCE: 28

Met Ala Gln Leu Pro Pro Lys Ile Pro Thr Met Thr Thr Pro Asn Trp 1 5 15

Pro Asp Phe Ser Ser Gln Lys Leu Pro Ser Ile Ala Ala Thr Ala Ala 25

Ala Ala Ala Thr Ala Gly Pro Gln Gln Gln Asn Pro Ser Trp Met Asp 35 40 45

Glu Phe Leu Asp Phe Ser Ala Thr Arg Arg Gly Thr His Arg Arg Ser 50 55 60

Ile Ser Asp Ser Ile Ala Phe Leu Glu Pro Pro Ser Ser Gly Val Gly 65 70 80

Asn His His Phe Asp Arg Phe Asp Asp Glu Gln Phe Met Ser Met Phe 85 90 95

Asn Asp Asp Val His Asn Asn Asn His Asn His His His His His Ser 105 110

Ile Asn Gly Asn Val Gly Pro Thr Arg Ser Ser Ser Asn Thr Ser Thr 115 120 125

Pro Ser Asp His Asn Ser Leu Ser Asp Asp Asp Asn Asn Glu Ala 130 135 140

Pro Pro Ser Asp His Asp His His Met Asp Asn Asn Val Ala Asn Gln 145 150 155 160

Asn Asn Ala Ala Gly Asn Asn Tyr Asn Glu Ser Asp Glu Val Gln Ser 165 170 175

Gln Thr Glu Pro Gln Asp Gly Pro Ser Ala Asn Gln Asn Ser 180 185 190

Gly Gly Ser Ser Gly Asn Arg Ile His Asp Pro Arg Val 195

Ile Leu Ala Asn Arg Gln Ser Ala Gln Arg Ser Arg Val Arg Lys Leu 210 215 220

Gln Tyr Ile Ser Glu Leu Glu Arg Ser Val Thr Ser Leu Gln Thr Glu 225 230 235 240

Val Ser Val Leu Ser Pro Arg Val Ala Phe Leu Asp His Gln Arg Leu 245 250 255

Leu Leu Asn Val Asp Asn Ser Ala Ile Lys Gln Arg Ile Ala Ala Leu 260 265 270

Ala Gln Asp Lys Ile Phe Lys Asp Ala His Gln Glu Ala Leu 285

Glu Ile Glu Arg Leu Arg Gln Val Tyr His Gln Gln Ser Leu 290 295 300

Met Glu Asn Asn Val Ser Asp Gln Ser Pro Ala Asp Ile Pro Ser 305 310 315 320

Val Glu Glu Gln Leu Leu Asn Val 325

<210> SEQ ID NO 29 <211> LENGTH: 1161 <212> TYPE: DNA <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G1328

<400> SEQUENCE: 29

aattcaatca ctatattttt ttaaaaacat ttgacttcat cgatcggtta acaattaatc 60 aaaaagatgg gacgatcacc atgttgtgag aagaagaatg gtctcaagaa aggaccatgg 120 actcctgagg aggatcaaaa gctcattgat tatatcaata tacatggtta tggaaattgg 180 agaactcttc ccaagaatgc tgggttacaa agatgtggta agagttgtcg tctccggtgg 240 accaactatc tccgaccaga tattaagcgt ggaagattct cttttgaaga agaagaaacc 300

attatt Caac ttcacagoat Catgggaaac aagtggtctg cgattgcggc tcqtttgcct 360 ggaagaacag acaacgagat Caaaaactat tggaacactic a catcagaaa aag actticta aagatgggaa tcgacccggit tacacacact ccacgtc.ttg atc.ttct cqa tat cit cotcc. attct cagct Catctatota caact citt cq CatCatcatC atcatCatca toaacaa.cat 540 atgaacatgt cgaggct cat gatgagtgat ggtaatcatc aac cattggit talacc cc gag at actcaaac tcqcaacctic t ct cit titt ca. aac Caaaacg. a CCC Caacaa. Cacacacgag 660 aacaac acgg ttalaccaaac cgaagtaaac Caataccalaa. ccggittacaa Catgcctggit 720 aatgaagaat tacaat cittg gttcc citat c atggat caat t cacgaattit c caag acctic atgccalatga agacgacggit CCaaaatt Ca ttgtcatacg atgatgattg titcgaagt cc 840 aattttgtat tagaac citta ttact cogac tittgctt cag t cittgaccac acct tott ca. 900 agc.ccgactic cgittaaactic aagttcctica act tacatca at agtag cac ttgcago acc 960 gaggatgaaa aagagagitta ttacagtgat aat at Cacta attatt cott tgatgttaat ggttitt ct co aattic caata aacaaaacgc cattggaata gagittatgta aacatgcaat 1080 cattgt attt gttatataga ttttgttaca tat CCaaaat CCaaaatact at agttittaa 1140 aataaaaaaa. aaaaaaaaaa. a. 1161

<210> SEQ ID NO 30 <211> LENGTH: 324 <212> TYPE: PRT <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G1328 polypeptide

<400> SEQUENCE: 30

Met Gly Arg Ser Pro Cys Cys Glu Lys Asn Gly Leu Lys Gly 1 5 15

Pro Trp Thr Pro Glu Glu Asp Gln Lys Leu Ile Asp Ile Asn Ile 25

His Gly Tyr Gly Asn Trp Arg Thr Leu Pro Lys Asn Gly Leu Gln 35 40

Arg Cys Gly Lys Ser Cys Arg Leu Arg Trp Thr Asn Leu Arg Pro 50 55 60

Asp Ile Arg Gly Arg Phe Ser Phe Glu Glu Glu Glu Thr Ile Ile 65 70 80

Gln Leu His Ser Ile Met Gly Asn Trp Ser Ala Ile Ala Ala Arg 85 90 95

Leu Pro Gly Arg Thr Asp Asn Glu Ile Asn Trp Asn Thr His 105 110

Ile Arg Lys Arg Leu Leu Lys Met Gly Ile Asp Pro Val Thr His Thr 115 120 125

Pro Arg Leu Asp Leu Leu Asp Ile Ser Ser Ile Leu Ser Ser Ser Ile 130 135 140

Tyr Asn Ser Ser His His His His His His His Gln Gln His Met Asn 145 150 155 160

Met Ser Arg Leu Met Met Ser Asp Gly Asn His Gln Pro Leu Val Asn 165 170 175

Pro Glu Ile Leu Lys Leu Ala Thr Ser Leu Phe Ser Asn Gln Asn His 180 185 190

Pro Asn Asn Thr His Glu Asn Asn Thr Val Asn Gln Thr Glu Val Asn 195 205

Gln Tyr Gln Thr Gly Tyr Asn Met Pro Gly Asn Glu Glu Leu Gln Ser 210 215

Trp Phe Pro Ile Met Asp Gln Phe Thr Asn Phe Gln Asp Leu Met Pro 225 230 235 240

Met Thr Thr Val Gln Asn Ser Leu Ser Tyr Asp Asp Asp Cys Ser 245 250 255

Ser Asn Phe Val Leu Glu Pro Tyr Tyr Ser Asp Phe Ala Ser Val 260 265 270

Leu Thr Thr Pro Ser Ser Ser Pro Thr Pro Leu Asn Ser Ser Ser Ser 275 280 285

Thr Tyr Ile Asn Ser Ser Thr Cys Ser Thr Glu Asp Glu Glu Ser 290 295 300

Tyr Ser Asp Asn Ile Thr Asn Tyr Ser Phe Asp Val Asn Gly Phe 305 310 315 320

Leu Gln Phe Gln

<210> SEQ ID NO 31 <211> LENGTH: 1155 <212> TYPE: DNA <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G1930

<400> SEQUENCE: 31

attcacatta ctaatctctc aagatttcac aattttcttg tgattttctc tcagtttctt 60 atttcgtttc ataacatgga tgccatgagt agcgtagacg agagctctac aactacagat 120 tccattccgg cgagaaagtc atcgtctccg gcgagtttac tatatagaat gggaagcgga 180

tacttgattic agagaacggit gtcgaagtcg aagtcgaagc cgaat Caaga 24 O aagct tcc tt cittcaagatt Calaaggtgtt gttcct caac caaatggaag atggggagct 3OO cagatttacg agaaac at ca cittgg tactt t caacgagga agacgaagca 360 gctctgctt acgacgt.cgc ggct Caccgt. titc.cgtggcc tactaattt C aaagacacga cgttctgaaga agaggttgag ttcttaaacg cgcatt.cgaa atcagagat C gtagatatgt tgagaaaa.ca CaCttacaaa. gaagagittag accalaaggaa acgtaac cqt 54 O gacggtaacg gaaaagagac gacggcgttt gctittggctt cgatggtggit tatgacgggg tittaaaacgg cggagttact gtttgagaaa acggtaacgc Caagtgacgt. cgggaaact a 660 aaccotttag ttataccalaa. acaccaag.cg gagaalacatt titc.cgttacc gttaggtaat 72 O aataacgt.ct ccgittaaagg tatgctgttg aattt cqaag acgittaacgg gaaagttgttgg aggttc.cgtt act cittattg gaatagtagt caaagttatg tgttgaccala aggttggagt 84 O agatt.cgitta aagaga agag actttgttgct ggtgatttga t cagttittaa aagat coaac 9 OO gatcaagatc aaaaattctt aaatcgaaat tctagagacg 96.O ggtcgggitta tgagattgtt tggggttgat atttctittaa. tgtag taag gaaacaacgg aggtgttaat gtcgt.cgitta aggtgtaaga agcaacgagt tttgtaataa 108 O

Caatittaa.ca acttgggaaa gaaaaaaaag citttittgatt tta atttgtc. ttcaacgtta 114 O atcttgctga gatta 1155

<210> SEQ ID NO 32 <211> LENGTH: 333 <212> TYPE: PRT <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G1930 polypeptide

<400> SEQUENCE: 32

Met Asp Ala Met Ser Val Asp Glu Ser Ser Thr Thr Thr Asp Ser 1 15

Ile Pro Ala Arg Ser Ser Ser Pro Ala Ser Leu Leu Tyr Arg Met 25 30

Gly Ser Gly Thr Ser Val Val Leu Asp Ser Glu Asn Gly Val Glu Val 35 40 45

Glu Val Glu Ala Glu Ser Arg Leu Pro Ser Ser Arg Phe Gly 50 55 60

Val Val Pro Gln Pro Asn Gly Arg Trp Gly Ala Gln Ile Glu Lys 65 70

His Gln Arg Val Trp Leu Gly Thr Phe Asn Glu Glu Asp Glu Ala Ala 85 90 95

Arg Ala Asp Val Ala Ala His Arg Phe Arg Gly Arg Asp Ala Val 100 105 110

Thr Asn Phe Asp Thr Thr Phe Glu Glu Glu Val Glu Phe Leu Asn 115 120 125

Ala His Ser Ser Glu Ile Val Asp Met Leu Arg His Thr Tyr 130 135 140

Lys Glu Glu Leu Asp Gln Arg Arg Asn Arg Asp Gly Asn Gly Lys 145 150 155 160

Glu Thr Thr Ala Phe Ala Leu Ala Ser Met Val Val Met Thr Gly Phe 165 175

Thr Ala Glu Leu Leu Phe Glu Lys Thr Val Thr Pro Ser Asp Val 180 185 190

Gly Leu Asn Arg Leu Val Ile Pro His Gln Ala Glu His 195

Phe Pro Leu Pro Leu Gly Asn Asn Asn Val Ser Val Gly Met Leu 210 215 220

Leu Asn Phe Glu Asp Val Asn Gly Val Trp Arg Phe Arg Ser 225 230 235 240

Trp Asn Ser Ser Gln Ser Val Leu Thr Gly Trp Ser Arg 245 250 255

Phe Val Glu Arg Leu Ala Gly Asp Leu Ile Ser Phe 260 265 270

Arg Ser Asn Asp Gln Asp Gln Lys Phe Phe Ile Gly Trp Ser 275 280 285

Ser Gly Leu Asp Leu Glu Thr Gly Arg Val Met Arg Leu Phe Gly Val 290 295 300

Asp Ile Ser Leu Asn Ala Val Val Val Val Lys Glu Thr Thr Glu Val 305 310 315 320

Leu Met Ser Ser Leu Arg Gln Arg Val Leu 325 330

<210> SEQ ID NO 33 <211> LENGTH: 2240 <212> TYPE: DNA <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G214

<400> SEQUENCE: 33

tgagatttct ccatttccgt agcttctggt ctcttttctt tgtttcattg atcaaaagca

<213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G214 polypeptide

<400> SEQUENCE: 34

Met Glu Thr Asn Ser Ser Gly Glu Asp Leu Val Ile Thr Arg 1 5 10 15

Pro Tyr Thr Ile Thr Gln Arg Glu Arg Trp Thr Glu Glu Glu His 25

Asn Arg Phe Ile Glu Ala Leu Arg Leu Tyr Gly Arg Ala Trp Gln 35 40 45

Ile Glu Glu His Val Ala Thr Thr Ala Val Gln Ile Arg Ser His 50 55 60

Ala Gln Phe Phe Ser Val Glu Glu Ala Glu Ala Gly 65 70

Val Ala Met Gly Gln Ala Leu Asp Ile Ala Ile Pro Pro Pro Arg Pro 85 90 95

Arg Pro Asn Asn Pro Pro Arg Thr Gly Ser Gly Thr 105 110

Ile Leu Met Ser Thr Gly Val Asn Asp Gly Glu Ser Leu Gly 115 120 125

Ser Glu Val Ser His Pro Glu Met Ala Asn Glu Asp Arg Gln Gln 130 135 140

Ser Pro Glu Glu Lys Thr Leu Gln Glu Asp Asn Ser Asp Cys 145 150 155 160

Phe Thr His Gln Tyr Leu Ser Ala Ala Ser Ser Met Asn Ser 165 170 175

Ile Glu Thr Ser Asn Ala Ser Thr Phe Arg Glu Phe Leu Pro Ser Arg 180 185 190

Glu Glu Gly Ser Gln Asn Asn Arg Val Arg Glu Ser Asn Ser Asp 195

Leu Asn Ala Ser Leu Glu Asn Gly Asn Glu Gln Gly Pro Gln Thr 210 215 220

Tyr Pro Met His Ile Pro Val Leu Val Pro Leu Gly Ser Ser Ile Thr 225 230 235 240

Ser Ser Leu Ser His Pro Pro Ser Glu Pro Asp Ser His Pro His Thr 245 250 255

Val Ala Gly Asp Tyr Gln Ser Phe Pro Asn His Ile Met Ser Thr Leu 260 265 270

Leu Gln Thr Pro Ala Leu Thr Ala Ala Thr Phe Ala Ser Ser Phe 275 285

Trp Pro Pro Asp Ser Ser Gly Gly Ser Pro Val Pro Gly Asn Ser Pro 290 295 300

Pro Asn Leu Ala Ala Met Ala Ala Ala Thr Val Ala Ala Ala Ser Ala 305 310 315

Trp Trp Ala Ala Asn Gly Leu Leu Pro Leu Cys Ala Pro Leu Ser Ser 325 330 335

Gly Gly Phe Thr Ser His Pro Pro Ser Thr Phe Gly Pro Ser Asp 340 345 350

Val Glu Tyr Thr Ala Ser Thr Leu Gln His Gly Ser Val Gln Ser 355 360 365

Arg Glu Gln Glu His Ser Glu Ala Ser Ala Arg Ser Ser Leu Asp 370 375 380

Ser Glu Asp Val Glu Asn Ser Pro Val His Glu Gln Pro