US 20040098764A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2004/0098764A1 Heard et al. (43) Pub. Date: May 20, 2004

(54) TRANSCRIPTIONAL REGULATORS OF ABIOTIC STRESS

(76) Inventors: Jacqueline E. Heard, Stonington, CT (US); Jose Luis Riechmann, Pasadena, CA (US); Robert A. Creelman, Castro Valley, CA (US); Oliver Ratcliffe, Oakland, CA (US); Roderick W. Kumimoto, San Bruno, CA (US); Neal Gutterson, Oakland, CA (US); T. Lynne Reuber, San Mateo, CA (US); Omaira Pineda, Vero Beach, FL (US); Jeffrey M. Libby, Cupertino, CA (US); Bradley K. Sherman, Berkeley, CA (US)

Correspondence Address: Jeffrey M. Libby, Ph.D., Mendel Biotechnology, Inc., 21375 Cabot Blvd., Hayward, CA 94545 (US)

(21) Appl. No.: 10/685,922

(22) Filed: Oct. 14, 2003

Related U.S. Application Data

(63) Continuation-in-part of application No. 09/810,836, filed on Mar. 16, 2001.

Publication Classification

(51) Int. Cl.: A01H 1/00; C12N 15/82; C07H 21/04; C12N 5/04

(52) U.S. Cl.: 800/289; 435/468; 435/419; 536/23.6

(57) ABSTRACT

The invention relates to plant transcription factor polypeptides, polynucleotides that encode them, homologs from a variety of plant species, and methods of using the polynucleotides and polypeptides to produce transgenic plants having advantageous properties, including improved drought and other osmotic stress tolerance, as compared to wild-type or reference plants. Sequence information related to these polynucleotides and polypeptides can also be used in bioinformatic search methods to identify related sequences and is also disclosed.

Patent Application Publication May 20, 2004 Sheet 1 of 17 US 2004/0098764 A1

FIGURE 1 [diagram labeled with the following plant orders: Fagales, Cucurbitales, Rosales, Fabales, Oxalidales, Malpighiales, Sapindales, Malvales, Brassicales, Myrtales, Geraniales, Dipsacales, Asterales, Apiales, Aquifoliales, Solanales, Lamiales, Gentianales, Garryales, Ericales, Cornales, Saxifragales, Caryophyllales, Santalales, Proteales, Ranunculales, Zingiberales, Commelinales, Poales, Arecales, Pandanales, Liliales, Dioscoreales, Asparagales, Alismatales, Acorales, Piperales, Magnoliales, Ceratophyllales, Laurales]

FIGURE 2 [content not recoverable from the extracted text]

FIGURE 3 [tree diagram with numeric node values ranging from 70 to 100 and the terminal labels G3451 soy, G3454 soy, G3452 soy, G3453 soy, G867, G1930, G9, G993, G3388 rice, G3389 rice, G3433 maize, G3390 rice, G3391 rice, G3432 maize, SGN-UNIGENE-47598, G2690, G2687]

[Figure sheets between FIGURE 3 and FIGURE 5: content not recoverable from the extracted text]

FIGURE 5

FIGURE 6

FIGURE 7

FIGURE 8

PLANT TRANSCRIPTIONAL REGULATORS OF ABIOTIC STRESS

RELATIONSHIP TO COPENDING APPLICATIONS

[0001] This application claims priority from copending U.S. patent application Ser. No. 09/810,836, filed Mar. 16, 2001; U.S. patent application Ser. No. 10/412,699, filed Apr. 10, 2003, which claims priority from U.S. patent application Ser. No. 10/171,468, filed Jun. 14, 2002, U.S. Non-provisional Application No. 09/532,591, filed Mar. 22, 2000, U.S. Non-provisional Application No. 09/533,029, filed Mar. 22, 2000, U.S. Non-provisional Application No. 09/533,392, filed Mar. 22, 2000, which in turn claimed priority from U.S. Provisional Patent Application 60/125,814, filed Mar. 23, 1999, U.S. Non-provisional Application No. 09/713,994, filed Nov. 16, 2000, which in turn claimed priority from U.S. Provisional Patent Application No. 60/166,228, filed Nov. 17, 1999; U.S. Non-provisional Application No. 09/394,519, filed Sep. 13, 1999, which in turn claimed priority from U.S. Provisional Application No. 60/101,349, filed Sep. 22, 1998, and U.S. Provisional Application No. 60/108,734, filed Nov. 17, 1998; U.S. Non-provisional Application No. 10/374,780, filed Feb. 25, 2003, which claims priority from U.S. Non-provisional Application No. 09/934,455, filed Aug. 22, 2001, which in turn claims priority from U.S. Provisional Application No. 60/227,439, filed Aug. 22, 2000; U.S. Non-provisional Application No. 10/225,068, filed Aug. 9, 2002, which claims priority from U.S. Provisional Application No. 60/336,049, filed Nov. 19, 2001 and U.S. Provisional Patent Application No. 60/310,847, filed Aug. 9, 2001; U.S. Non-provisional Application No. 10/225,066, filed Aug. 9, 2002; and U.S. Non-provisional Application No. 10/225,067, filed Aug. 9, 2002; the entire contents of which are hereby incorporated by reference.

FIELD OF THE INVENTION

[0002] The present invention relates to compositions and methods for modifying a plant phenotypically, said plant having altered sugar sensing and an altered response to abiotic stresses, including osmotic stresses, including germination in cold and heat, increased tolerance to drought and high salt stress.

BACKGROUND OF THE INVENTION

[0003] A plant's traits, such as its biochemical, developmental, or phenotypic characteristics, may be controlled through a number of cellular processes. One important way to manipulate that control is through transcription factors, proteins that influence the expression of a particular gene or sets of genes. Transformed and transgenic plants that comprise cells having altered levels of at least one selected transcription factor, for example, possess advantageous or desirable traits. Strategies for manipulating traits by altering a plant cell's transcription factor content can therefore result in plants and crops with new and/or improved commercially valuable properties.

[0004] Transcription factors can modulate gene expression, either increasing or decreasing (inducing or repressing) the rate of transcription. This modulation results in differential levels of gene expression at various developmental stages, in different tissues and cell types, and in response to different exogenous (e.g., environmental) and endogenous stimuli throughout the life cycle of the organism.

[0005] Phylogenetic relationships among organisms have been demonstrated many times, and studies from a diversity of prokaryotic and eukaryotic organisms suggest a more or less gradual evolution of biochemical and physiological mechanisms and metabolic pathways. Despite different evolutionary pressures, proteins that regulate the cell cycle in yeast, plant, nematode, fly, rat, and man have common chemical or structural features and modulate the same general cellular activity. Comparisons of Arabidopsis gene sequences with those from other organisms where the structure and/or function may be known allow researchers to draw analogies and to develop model systems for testing hypotheses. These model systems are of great importance in developing and testing plant varieties with novel traits that may have an impact upon agronomy.

[0006] Because transcription factors are key controlling elements of biological pathways, altering the expression levels of one or more transcription factors can change entire biological pathways in an organism. For example, manipulation of the levels of selected transcription factors may result in increased expression of economically useful proteins or biomolecules in plants or improvement in other agriculturally relevant characteristics. Conversely, blocked or reduced expression of a transcription factor may reduce biosynthesis of unwanted compounds or remove an undesirable trait. Therefore, manipulating transcription factor levels in a plant offers tremendous potential in agricultural biotechnology for modifying a plant's traits, including traits that improve a plant's survival and yield during periods of abiotic stress, including germination in cold and hot conditions, and osmotic stress, including drought, salt stress, and other abiotic stresses, as noted below.

[0007] Problems associated with drought. A drought is a period of abnormally dry weather that persists long enough to produce a serious hydrologic imbalance (for example, crop damage, water supply shortage, etc.). While much of the weather that we experience is brief and short-lived, drought is a more gradual phenomenon, slowly taking hold of an area and tightening its grip with time. In severe cases, drought can last for many years and can have devastating effects on agriculture and water supplies. With burgeoning population and chronic shortage of available fresh water, drought is not only the number one weather related problem in agriculture, it also ranks as one of the major natural disasters of all time, causing not only economic damage, but also loss of human lives. For example, losses from the U.S. drought of 1988 exceeded $40 billion, exceeding the losses caused by Hurricane Andrew in 1992, the Mississippi River floods of 1993, and the San Francisco earthquake in 1989. In some areas of the world, the effects of drought can be far more severe. In the Horn of Africa, the 1984-1985 drought led to a famine that killed 750,000 people.

[0008] Problems for plants caused by low water availability include mechanical stresses caused by the withdrawal of cellular water. Drought also causes plants to become more susceptible to various diseases (Simpson (1981). “The Value of Physiological Knowledge of Water Stress in Plants”, In Water Stress on Plants, (Simpson, G. M., ed.), Praeger, N.Y., pp. 235-265).

[0009] In addition to the many land regions of the world that are too arid for most if not all crop plants, overuse and over-utilization of available water is resulting in an increasing loss of agriculturally-usable land, a process which, in the extreme, results in desertification. The problem is further compounded by increasing salt accumulation in soils, as described above, which adds to the loss of available water in soils.

[0010] Problems associated with high salt levels. One in five hectares of irrigated land is damaged by salt, an important historical factor in the decline of ancient agrarian societies. This condition is only expected to worsen, further reducing the availability of arable land and crop production, since none of the top five food crops (wheat, corn, rice, potatoes, and soybean) can tolerate excessive salt.

[0011] Detrimental effects of salt on plants are a consequence of both water deficit resulting in osmotic stress (similar to drought stress) and the effects of excess sodium ions on critical biochemical processes. As with freezing and drought, high salinity causes water deficit; the presence of high salt makes it difficult for plant roots to extract water from their environment (Buchanan et al. (2000) in Biochemistry and Molecular Biology of Plants, American Society of Plant Physiologists, Rockville, Md.). Soil salinity is thus one of the more important variables that determines where a plant may thrive. In many parts of the world, sizable land areas are uncultivable due to naturally high soil salinity. To compound the problem, salination of soils that are used for agricultural production is a significant and increasing problem in regions that rely heavily on agriculture. The latter is compounded by over-utilization, over-fertilization and water shortage, typically caused by climatic change and the demands of increasing population. Salt tolerance is of particular importance early in a plant's lifecycle, since evaporation from the soil surface causes upward water movement, and salt accumulates in the upper soil layer where the seeds are placed. Thus, germination normally takes place at a salt concentration much higher than the mean salt level in the whole soil profile.

[0012] Problems associated with excessive heat. Germination of many crops is very sensitive to temperature. A transcription factor that would enhance germination in hot conditions would be useful for crops that are planted late in the season or in hot climates. Seedlings and mature plants that are exposed to excess heat may experience heat shock, which may arise in various organs, including leaves and particularly fruit, when transpiration is insufficient to overcome heat stress. Heat also damages cellular structures, including organelles and cytoskeleton, and impairs membrane function (Buchanan et al. (2000) in Biochemistry and Molecular Biology of Plants, American Society of Plant Physiologists, Rockville, Md.).

[0013] Heat shock may produce a decrease in overall protein synthesis, accompanied by expression of heat shock proteins. Heat shock proteins function as chaperones and are involved in refolding proteins denatured by heat.

[0014] Heat stress often accompanies conditions of low water availability. Heat itself is seen as an interacting stress and adds to the detrimental effects caused by water deficit conditions. Evaporative demand exhibits near exponential increases with increases in daytime temperatures and can result in high transpiration rates and low plant water potentials (Hall et al. (2000) Plant Physiol. 123:1449-1458). High-temperature damage to pollen almost always occurs in conjunction with drought stress, and rarely occurs under well-watered conditions. Thus, separating the effects of heat and drought stress on pollination is difficult. Combined stress can alter plant metabolism in novel ways; therefore, understanding the interaction between different stresses may be important for the development of strategies to enhance stress tolerance by genetic manipulation.

[0015] Problems associated with excessive chilling conditions. The term “chilling sensitivity” has been used to describe many types of physiological damage produced at low, but above freezing, temperatures. Most crops of tropical origins, such as soybean, rice, maize and cotton, are easily damaged by chilling. Typical chilling damage includes wilting, necrosis, chlorosis or leakage of ions from cell membranes. The underlying mechanisms of chilling sensitivity are not completely understood yet, but probably involve the level of membrane saturation and other physiological deficiencies. For example, photoinhibition of photosynthesis (disruption of photosynthesis due to high light intensities) often occurs under clear atmospheric conditions subsequent to cold late summer/autumn nights. For example, chilling may lead to yield losses and lower product quality through the delayed ripening of maize. Another consequence of poor growth is the rather poor ground cover of maize fields in spring, often resulting in soil erosion, increased occurrence of weeds, and reduced uptake of nutrients. A retarded uptake of mineral nitrogen could also lead to increased losses of nitrate into the ground water. By some estimates, chilling accounts for monetary losses in the United States (U.S.) second only to drought and flooding.

[0016] Desirability of altered sugar sensing. Sugars are key regulatory molecules that affect diverse processes in higher plants including germination, growth, flowering, senescence, sugar metabolism and photosynthesis. Sucrose, for example, is the major transport form of photosynthate and its flux through cells has been shown to affect gene expression and alter storage compound accumulation in seeds (source-sink relationships). Glucose-specific hexose sensing has also been described in plants and is implicated in cell division and repression of “famine” genes (photosynthetic or glyoxylate cycles).

[0017] Water deficit is a common component of many plant stresses. Water deficit occurs in plant cells when the whole plant transpiration rate exceeds the water uptake. In addition to drought, other stresses, such as salinity and low temperature, produce cellular dehydration (McCue and Hanson (1990) Trends Biotechnol. 8:358-362).

[0018] Salt and drought stress signal transduction consists of ionic and osmotic homeostasis signaling pathways. The ionic aspect of salt stress is signaled via the SOS pathway where a calcium-responsive SOS3-SOS2 protein kinase complex controls the expression and activity of ion transporters such as SOS1. The pathway regulating ion homeostasis in response to salt stress has been reviewed recently by Xiong and Zhu (2002) Plant Cell Environ. 25:131-139.

[0019] The osmotic component of salt stress involves complex plant reactions that overlap with drought and/or cold stress responses.
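The near-exponential rise in evaporative demand with daytime temperature noted in the discussion of heat stress above can be illustrated numerically. The sketch below is not taken from this disclosure: it assumes vapor pressure deficit (VPD) as a simple proxy for evaporative demand and uses the widely used Tetens approximation for saturation vapor pressure. The function names, the chosen temperatures, and the fixed 40% relative humidity are illustrative assumptions only.

```python
import math


def saturation_vapor_pressure_kpa(temp_c: float) -> float:
    """Saturation vapor pressure of air in kPa (Tetens approximation, an assumption here)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vapor_pressure_deficit_kpa(temp_c: float, relative_humidity_pct: float) -> float:
    """Vapor pressure deficit in kPa, used as a rough proxy for evaporative demand."""
    es = saturation_vapor_pressure_kpa(temp_c)
    return es * (1.0 - relative_humidity_pct / 100.0)


if __name__ == "__main__":
    # Hypothetical daytime temperatures at an assumed constant 40% relative humidity.
    for temp_c in (20, 25, 30, 35, 40):
        vpd = vapor_pressure_deficit_kpa(temp_c, 40.0)
        print(f"{temp_c} C: VPD = {vpd:.2f} kPa")
```

Under these assumptions the deficit roughly triples between 20 and 40 degrees C at constant relative humidity, which is consistent with the qualitative statement that hot conditions drive high transpiration rates and low plant water potentials.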

[0020] Common aspects of drought, cold and salt stress response have been reviewed recently by Xiong and Zhu (2002) supra. Those include:

[0021] (a) transient changes in the cytoplasmic calcium levels very early in the signaling event (Knight (2000) Int. Rev. Cytol. 195:269-324; Sanders et al. (1999) Plant Cell 11:691-706);

[0022] (b) signal transduction via mitogen-activated and/or calcium dependent protein kinases (CDPKs, see Xiong et al., 2002) and protein phosphatases (Merlot et al. (2001) Plant J. 25:295-303; Tähtiharju and Palva (2001) Plant J. 26:461-470);

[0023] (c) increases in abscisic acid levels in response to stress triggering a subset of responses (Xiong et al. (2002) supra, and references therein);

[0024] (d) inositol phosphates as signal molecules (at least for a subset of the stress responsive transcriptional changes; Xiong et al. (2001) Genes Dev. 15:1971-1984);

[0025] (e) activation of phospholipases which in turn generate a diverse array of second messenger molecules, some of which might regulate the activity of stress responsive kinases (phospholipase D functions in an ABA independent pathway, Frank et al. (2000) Plant Cell 12:111-124);

[0026] (f) induction of late embryogenesis abundant (LEA) type genes including the CRT/DRE responsive COR/RD genes (Xiong and Zhu (2002) supra);

[0027] (g) increased levels of antioxidants and compatible osmolytes such as proline and soluble sugars (Hasegawa et al. (2000) Annu. Rev. Plant Physiol. Plant Mol. Biol. 51:463-499); and

[0028] (h) accumulation of reactive oxygen species such as superoxide, hydrogen peroxide, and hydroxyl radicals (Hasegawa et al. (2000) supra).

[0029] Abscisic acid biosynthesis is regulated by osmotic stress at multiple steps. Both ABA-dependent and -independent osmotic stress signaling first modify constitutively expressed transcription factors, leading to the expression of early response transcriptional activators, which then activate downstream stress tolerance effector genes.

[0030] Based on the commonality of many aspects of cold, drought and salt stress responses, it can be concluded that genes that increase tolerance to cold or salt stress can also improve drought stress protection. In fact this has already been demonstrated for transcription factors (in the case of AtCBF/DREB1) and for other genes such as OsCDPK7 (Saijo et al. (2000) Plant J. 23:319-327), or AVP1 (a vacuolar pyrophosphatase-proton-pump, Gaxiola et al. (2001) Proc. Natl. Acad. Sci. USA 98:11444-11449).

[0031] The present invention relates to methods and compositions for producing transgenic plants with modified traits, particularly traits that address agricultural and food needs. These traits, including altered sugar sensing and tolerance to abiotic and osmotic stress (e.g., tolerance to cold, high salt concentrations and drought), may provide significant value in that they allow the plant to thrive in hostile environments, where, for example, high or low temperature, low water availability or high salinity may limit or prevent growth of non-transgenic plants.

[0032] We have identified polynucleotides encoding transcription factors, including G867, G9, G993, G1930, and their equivalogs listed in the Sequence Listing, and structurally and functionally similar sequences, developed numerous transgenic plants using these polynucleotides, and have analyzed the plants for their tolerance to abiotic stresses, including those associated with drought, excessive salt, cold and heat. In so doing, we have identified important polynucleotide and polypeptide sequences for producing commercially valuable plants and crops as well as the methods for making them and using them. Other aspects and embodiments of the invention are described below and can be derived from the teachings of this disclosure as a whole. We have identified polynucleotides encoding transcription factors, developed numerous transgenic plants using these polynucleotides, and have analyzed the plants for their tolerance to abiotic stresses, including those associated with cold or osmotic stresses such as drought and salt tolerance. In so doing, we have identified important polynucleotide and polypeptide sequences for producing commercially valuable plants and crops as well as the methods for making them and using them. Other aspects and embodiments of the invention are described below and can be derived from the teachings of this disclosure as a whole.

SUMMARY OF THE INVENTION

[0033] The invention is directed to nucleic acid sequences that may be used to transform plants and confer abiotic stress tolerance to those plants. These plants have been shown to be tolerant to such diverse abiotic stresses as heat, chilling, cold, drought, and salt stress. The nucleic acid sequences may be incorporated into recombinant polynucleotide constructs prior to their use for transforming plants. These nucleic acid sequences include SEQ ID NO:1 (G867), SEQ ID NO:3 (G9), SEQ ID NO:5 (G993), SEQ ID NO:7 (G1930), and other sequences encoding polypeptides having AP2 and B3 domains that have been shown to confer abiotic stress tolerance in plants. The polypeptides encoded by the nucleic acid sequences of the invention, including SEQ ID NOS:2, 4, 6, 8 and others, have been shown to contain a newly discovered and highly conserved subsequence of about 22 amino acid residues in length, referred to herein as the “DML motif”, and which is always present between the AP2 and B3 domains of the polypeptides of the invention. For example, the “DML motif” of SEQ ID NO:1 (SEQ ID NO:64) is a 22-amino acid residue subsequence between the AP2 and B3 domains and may be found in Table 2 and in the Sequence Listing.

[0034] The invention includes recombinant polynucleotides comprising a nucleotide sequence that is capable of hybridizing over its full length to the complement of SEQ ID NOS:1, 3, 5, 7, or orthologs of these sequences, under stringent conditions that include two wash steps of 6×SSC and 65° C., each step having a duration of 10-30 minutes. These sequences may be incorporated into expression vectors and host plant cells.

[0035] The invention also includes an isolated nucleotide sequence that hybridizes over its full length to the complement of a polynucleotide that encodes the DML motif under stringent conditions that include two wash steps of 6×SSC and 65° C., each step being 10-30 minutes in duration.

[0036] The invention also includes transgenic plants that have increased tolerance to abiotic stress. In one iteration, these transgenic plants overexpress a recombinant polynucleotide that hybridizes over its full length to the complement of SEQ ID NO:1 under stringent conditions including two wash steps of 6×SSC and 65° C. for 10-30 minutes, in which instance the transgenic plants have increased abiotic stress tolerance as compared to a non-transformed plant that does not overexpress the polypeptide.

[0037] The invention also pertains to transgenic plants that comprise a recombinant polynucleotide that encodes a polypeptide having an AP2 domain and a B3 domain and has the property of the G867 polypeptide (SEQ ID NO:2) of regulating abiotic stress tolerance in a plant when overexpressed. In this instance, the AP2 and B3 domains are sufficiently homologous to the AP2 and B3 domains of the G867 polypeptide that the polypeptide binds to a transcription regulating region comprising the motifs CAACA and CACCTG, respectively. This binding cooperatively enhances the DNA binding affinity of the polypeptide and thus confers increased abiotic stress tolerance in the transgenic plant, as compared to a non-transformed plant that does not overexpress the polypeptide.

[0038] The polypeptides of the invention contain a DML motif that is similar or identical in sequence to those found in the Sequence Listing (as a subsequence of any of the listed sequences) or listed in Table 2.

encode a polypeptide that comprises AP2 and B3 domains that are substantially identical with the AP2 and B3 domains of SEQ ID NO:2, respectively. These domains may be found in Table 1. The next step of this method is performed by inserting the polynucleotide of (i), (ii), (iii) or (iv) into an expression cassette that also includes a constitutive, inducible, or tissue-specific promoter. The expression cassette is then introduced into a plant or plant cell to overexpress the polynucleotide sequence of (i), (ii), (iii) or (iv), thus producing a transgenic plant having increased tolerance to abiotic stress. This method may also include steps that include identifying transgenic plants so produced that have increased tolerance to abiotic stress, selecting one of these transgenic plants, and crossing the transgenic plant with either itself or another plant. The seed that develops in the plants produced by this crossing may be used to grow a progeny plant, thus producing a transgenic progeny plant that also has increased tolerance to abiotic stress.

[0041] The invention is also directed to a method for increasing a plant's tolerance to abiotic stress. In this method, the steps include providing a vector with regulatory elements that are able to control expression of a polynucle
These transgenic plants otide Sequence in a target plant; and a polynucleotide may be characterized by altered responses to high Sugar Sequence that is flanked by the regulatory elements and that concentrations when grown in Petri plates, which is often encodes a polypeptide having an AP2 domain and a B3 indicative of an osmotic stress (one group of abiotic stress) domain that are sufficiently homologous to the AP2 and B3 tolerance phenotype. domain of SEQ ID NO:1 (G867 polypeptide), respectively, 0.039 The invention incorporates methods for producing that the polypeptide binds to the transcription regulating one or more transgenic plants that have increased tolerance regions comprising the motifs CAACA and CACCTG, to abiotic stress. This method includes the steps of providing respectively. This binding is necessary to provide the an expression vector comprising any of the recombinant polypeptide with the same property of SEQ ID NO:2 of polynucleotides of the invention, introducing this expression regulating abiotic StreSS tolerance in a plant; that is, the vector into a plant cell, and then allowing the plant cell to binding confers increased abiotic StreSS tolerance in the overexpress a polypeptide encoded by the recombinant transgenic plant as compared to a non-transformed plant that polynucleotide in the expression vector. The plant cells So does not overexpress the polypeptide. A target plant is then produced may then be cultured and allowed to develop into transformed with this vector, thus generating a transformed more mature plant, individuals of which may then be iden plant with increased tolerance to abiotic StreSS. The poly tified as having increased abiotic StreSS tolerance. 
The nucleotide of this aspect of the invention may include, for polypeptides of the invention that are expressed in these example: (i) SEQID NO:1; (ii) any nucleotide sequence that plants have the property of regulating abiotic StreSS tolerance encodes SEQ ID NO:2; (iii) any nucleotide sequence that in the plant, as determined by comparing the StreSS tolerance hybridizes to the nucleotide sequence of (i) or (ii) under of the plant to that of a non-transformed plant that does not stringent conditions of 6xSSC and 65 C.; or (iv) a nucle overexpress the polypeptide. Those plants that are So altered otide Sequence encoding a polypeptide comprising has AP2 and identified as having abiotic StreSS tolerance may then be and B3 domains that are substantially identical with the AP2 Selected. and B3 domains of SEO ID NO:2. 0040 Transgenic plants of the invention that have BRIEF DESCRIPTION OF THE SEQUENCE increased tolerance to abiotic StreSS may also be produced by LISTING AND FIGURES the following method. A polynucleotide (or its complement) that encodes a polypeptide having an AP2 domain and a B3 0042. The file of this patent contains at least one drawing domain is selected. In this case the AP2 domain and B3 executed in color. Copies of this patent with color draw domain is Sufficiently homologous to the corresponding ing(s) will be provided by the Patent and Trademark Office domains of SEQ ID NO:2 that the polypeptide will bind to upon request and payment of the necessary fee. a first transcription regulating region comprising the motif 0043. The Sequence Listing provides exemplary poly CAACA, and a Second transcription regulating region com nucleotide and polypeptide Sequences of the invention. The prising the motif CACCTG; respectively. This binding to the traits associated with the use of the Sequences are included transcription regulating regions conferS increased abiotic in the Examples. 
StreSS tolerance in the transgenic plant when the plant is compared to a nontransformed plant that does not overex 0044) CD-ROM 1 is a read-only memory computer preSS the polypeptide. The polynucleotides of this aspect of readable compact disc and contains a copy of the Sequence the invention either: (i) comprise SEQ ID NO:1; (ii) encode Listing in ASCII text format. The Sequence Listing is named SEQ ID NO:2; (iii) hybridize to any nucleotide sequence of “MBIOO49CIPST25.txt” and is 139 kilobytes in size. The (i) or (ii) under Stringent conditions that include two wash copies of the Sequence Listing on the CD-ROM disc are steps of 6xSSC and 65° C. for 10 to 30 minutes; or (iv) hereby incorporated by reference in their entirety. US 2004/0098764 A1 May 20, 2004
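The Summary above names two DNA motifs, CAACA and CACCTG, as the sites bound by the AP2 and B3 domains, respectively. As an illustration only, the following minimal Python sketch shows how a candidate regulatory region could be scanned for both motifs. The function names and the example sequence are hypothetical and are not taken from this disclosure, and the sketch scans only the strand it is given.

```python
AP2_SITE = "CAACA"   # motif the disclosure associates with the AP2 domain
B3_SITE = "CACCTG"   # motif the disclosure associates with the B3 domain


def find_motif_sites(dna: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    dna, motif = dna.upper(), motif.upper()
    sites = []
    start = dna.find(motif)
    while start != -1:
        sites.append(start)
        start = dna.find(motif, start + 1)  # step by one to allow overlaps
    return sites


def has_both_sites(dna: str) -> bool:
    """True if the given strand contains at least one copy of each motif."""
    return bool(find_motif_sites(dna, AP2_SITE)) and bool(find_motif_sites(dna, B3_SITE))


# Hypothetical region invented for this example, not a sequence from the listing.
example_region = "TTCAACAGGGTCACCTGAA"
ap2_positions = find_motif_sites(example_region, AP2_SITE)
b3_positions = find_motif_sites(example_region, B3_SITE)
```

A fuller search would also scan the reverse complement of the region, since a binding site may lie on either strand.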

[0045] FIG. 1 shows a conservative estimate of phylogenetic relationships among the orders of flowering plants (modified from Angiosperm Phylogeny Group (1998) Ann. Missouri Bot. Gard. 84:1-49). Those plants with a single cotyledon (monocots) are a monophyletic clade nested within at least two major lineages of dicots; the eudicots are further divided into rosids and asterids. Arabidopsis is a rosid eudicot classified within the order Brassicales; rice is a member of the monocot order Poales. FIG. 1 was adapted from Daly et al. (2001) Plant Physiol. 127:1328-1333.

[0046] FIG. 2 shows a phylogenic dendrogram depicting phylogenetic relationships of higher plant taxa, including clades containing tomato and Arabidopsis, adapted from Ku et al. (2000) Proc. Natl. Acad. Sci. 97:9121-9126; and Chase et al. (1993) Ann. Missouri Bot. Gard. 80:528-580.

[0047] FIG. 3 depicts a phylogenetic tree of several members of the RAV family, identified through BLAST analysis of proprietary (using corn, soy and rice genes) and public data sources (all plant species). This tree was generated as a Clustal X 1.81 alignment: MEGA2 tree, Maximum Parsimony, bootstrap consensus.

[0048] FIGS. 4A-4J show an alignment of AP2 transcription factors from Arabidopsis, soybean, rice and corn, showing conserved (identical or similar) residues and the AP2 domains, DML motifs, and B3 domains.

[0049] In FIG. 5, three G867-overexpressing lines and a wild-type control were germinated on media containing high (150 mM) NaCl. The overexpressing lines showed increased seedling vigor, manifested by increased expansion of the cotyledons, compared to the wild-type controls.

[0050] In FIG. 6, three G867-overexpressing lines and a wild-type control were germinated on media containing high (9.4%) sucrose. Increased seedling vigor was also noted with the overexpressors as compared to the wild-type plants, as indicated by the increased expansion of the cotyledons in the overexpressors.

[0051] FIG. 7 is a photograph of plants grown on MS media; the eight plants to the left of the vertical black line are from three different G9-overexpressing lines and have significantly more root mass and root branching than the four wild-type control plants to the right of the black line.

[0052] FIG. 8 shows that, when grown on media containing 10 µM methyl jasmonate, plants from three different G9-overexpressing lines, as seen with the eight plants to the left of the vertical black line, have more root hairs than the four wild-type control plants to the right of the black line.

DESCRIPTION OF THE INVENTION

[0053] In an important aspect, the present invention relates to polynucleotides and polypeptides, for example, for modifying phenotypes of plants, particularly those associated with osmotic stress tolerance. Throughout this disclosure, various information sources are referred to and/or are specifically incorporated. The information sources include scientific journal articles, patent documents, textbooks, and World Wide Web browser-inactive page addresses, for example. While the reference to these information sources clearly indicates that they can be used by one of skill in the art, each and every one of the information sources cited herein are specifically incorporated in their entirety, whether or not a specific mention of "incorporation by reference" is noted. The contents and teachings of each and every one of the information sources can be relied on and used to make and use embodiments of the invention.

[0054] As used herein and in the appended claims, the singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a plant" includes a plurality of such plants, and a reference to "a stress" is a reference to one or more stresses and equivalents thereof known to those skilled in the art, and so forth.

[0055] Definitions

[0056] "Nucleic acid molecule" refers to an oligonucleotide, polynucleotide or any fragment thereof. It may be DNA or RNA of genomic or synthetic origin, double-stranded or single-stranded, and combined with carbohydrate, lipids, protein, or other materials to perform a particular activity such as transformation or form a useful composition such as a peptide nucleic acid (PNA).

[0057] "Polynucleotide" is a nucleic acid molecule comprising a plurality of polymerized nucleotides, e.g., at least about 15 consecutive polymerized nucleotides, optionally at least about 30 consecutive nucleotides, at least about 50 consecutive nucleotides. A polynucleotide may be a nucleic acid, oligonucleotide, nucleotide, or any fragment thereof. In many instances, a polynucleotide comprises a nucleotide sequence encoding a polypeptide (or protein) or a domain or fragment thereof. Additionally, the polynucleotide may comprise a promoter, an intron, an enhancer region, a polyadenylation site, a translation initiation site, 5' or 3' untranslated regions, a reporter gene, a selectable marker, or the like. The polynucleotide can be single stranded or double stranded DNA or RNA. The polynucleotide optionally comprises modified bases or a modified backbone. The polynucleotide can be, e.g., genomic DNA or RNA, a transcript (such as an mRNA), a cDNA, a PCR product, a cloned DNA, a synthetic DNA or RNA, or the like. The polynucleotide can be combined with carbohydrate, lipids, protein, or other materials to perform a particular activity such as transformation or form a useful composition such as a peptide nucleic acid (PNA). The polynucleotide can comprise a sequence in either sense or antisense orientations. "Oligonucleotide" is substantially equivalent to the terms amplimer, primer, oligomer, element, target, and probe and is preferably single stranded.

[0058] "Gene" or "gene sequence" refers to the partial or complete coding sequence of a gene, its complement, and its 5' or 3' untranslated regions. A gene is also a functional unit of inheritance, and in physical terms is a particular segment or sequence of nucleotides along a molecule of DNA (or RNA, in the case of RNA viruses) involved in producing a polypeptide chain. The latter may be subjected to subsequent processing such as splicing and folding to obtain a functional protein or polypeptide. A gene may be isolated, partially isolated, or be found within an organism's genome. By way of example, a transcription factor gene encodes a transcription factor polypeptide, which may be functional or require processing to function as an initiator of transcription.

[0059] Operationally, genes may be defined by the cis-trans test, a genetic test that determines whether two mutations occur in the same gene and which may be used to

determine the limits of the genetically active unit (Rieger et al. (1976) Glossary of Genetics and Cytogenetics: Classical and Molecular, 4th ed., Springer Verlag, Berlin). A gene generally includes regions preceding ("leaders"; upstream) and following ("trailers"; downstream) the coding region. A gene may also include intervening, non-coding sequences, referred to as "introns", located between individual coding segments, referred to as "exons". Most genes have an associated promoter region, a regulatory sequence 5' of the transcription initiation codon (there are some genes that do not have an identifiable promoter). The function of a gene may also be regulated by enhancers, operators, and other regulatory elements.

[0060] A "recombinant polynucleotide" is a polynucleotide that is not in its native state, e.g., the polynucleotide comprises a nucleotide sequence not found in nature, or the polynucleotide is in a context other than that in which it is naturally found, e.g., separated from nucleotide sequences with which it typically is in proximity in nature, or adjacent (or contiguous with) nucleotide sequences with which it typically is not in proximity. For example, the sequence at issue can be cloned into a vector, or otherwise recombined with one or more additional nucleic acids.

[0061] An "isolated polynucleotide" is a polynucleotide, whether naturally occurring or recombinant, that is present outside the cell in which it is typically found in nature, whether purified or not. Optionally, an isolated polynucleotide is subject to one or more enrichment or purification procedures, e.g., cell lysis, extraction, centrifugation, precipitation, or the like.

[0062] A "polypeptide" is an amino acid sequence comprising a plurality of consecutive polymerized amino acid residues, e.g., at least about 15 consecutive polymerized amino acid residues, optionally at least about 30 consecutive polymerized amino acid residues, at least about 50 consecutive polymerized amino acid residues. In many instances, a polypeptide comprises a polymerized amino acid residue sequence that is a transcription factor or a domain or portion or fragment thereof. Additionally, the polypeptide may comprise 1) a localization domain, 2) an activation domain, 3) a repression domain, 4) an oligomerization domain, or 5) a DNA-binding domain, or the like. The polypeptide optionally comprises modified amino acid residues, naturally occurring amino acid residues not encoded by a codon, non-naturally occurring amino acid residues.

[0063] "Protein" refers to an amino acid sequence, oligopeptide, peptide, polypeptide or portions thereof whether naturally occurring or synthetic.

[0064] "Portion", as used herein, refers to any part of a protein used for any purpose, but especially for the screening of a library of molecules which specifically bind to that portion or for the production of antibodies.

[0065] A "recombinant polypeptide" is a polypeptide produced by translation of a recombinant polynucleotide. A "synthetic polypeptide" is a polypeptide created by consecutive polymerization of isolated amino acid residues using methods well known in the art. An "isolated polypeptide," whether a naturally occurring or a recombinant polypeptide, is more enriched in (or out of) a cell than the polypeptide in its natural state in a wild-type cell, e.g., more than about 5% enriched, more than about 10% enriched, or more than about 20%, or more than about 50%, or more, enriched, i.e., alternatively denoted: 105%, 110%, 120%, 150% or more, enriched relative to wild type standardized at 100%. Such an enrichment is not the result of a natural response of a wild-type plant. Alternatively, or additionally, the isolated polypeptide is separated from other cellular components with which it is typically associated, e.g., by any of the various protein purification methods herein.

[0066] "Homology" refers to sequence similarity between a reference sequence and at least a fragment of a newly sequenced clone insert or its encoded amino acid sequence.

[0067] "Hybridization complex" refers to a complex between two nucleic acid molecules by virtue of the formation of hydrogen bonds between purines and pyrimidines.

[0068] "Identity" or "similarity" refers to sequence similarity between two polynucleotide sequences or between two polypeptide sequences, with identity being a more strict comparison. The phrases "percent identity" and "% identity" refer to the percentage of sequence similarity found in a comparison of two or more polynucleotide sequences or two or more polypeptide sequences. "Sequence similarity" refers to the percent similarity in base pair sequence (as determined by any suitable method) between two or more polynucleotide sequences. Two or more sequences can be anywhere from 0-100% similar, or any integer value therebetween. Identity or similarity can be determined by comparing a position in each sequence that may be aligned for purposes of comparison. When a position in the compared sequence is occupied by the same nucleotide base or amino acid, then the molecules are identical at that position. A degree of similarity or identity between polynucleotide sequences is a function of the number of identical or matching nucleotides at positions shared by the polynucleotide sequences. A degree of identity of polypeptide sequences is a function of the number of identical amino acids at positions shared by the polypeptide sequences. A degree of homology or similarity of polypeptide sequences is a function of the number of amino acids at positions shared by the polypeptide sequences.

[0069] With regard to polypeptides, the terms "substantial identity" or "substantially identical" may refer to sequences of sufficient similarity and structure to the transcription factors in the Sequence Listing to produce similar function when expressed or overexpressed in a plant; in the present invention, this function is increased tolerance to abiotic stress. Sequences that are at least about 80% identical to the instant polypeptide sequences, including AP2 and B3 domain sequences, are considered to have "substantial identity" with the latter. Sequences having lesser degrees of identity but comparable biological activity are considered to be equivalents. The structure required to maintain proper functionality is related to the tertiary structure of the polypeptide. There are discrete domains and motifs within a transcription factor that must be present within the polypeptide to confer function and specificity. These specific structures are required so that interactive sequences will be properly oriented to retain the desired activity. "Substantial identity" may thus also be used with regard to subsequences, for example, motifs, that are of sufficient structure and similarity, being at least about 80% identical to similar motifs in other related sequences so that each confers or is required for increased tolerance to abiotic stress.
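The definitions of "identity" and "substantial identity" above describe percent identity as a function of the number of identical residues at aligned positions, with "at least about 80%" given as the threshold for substantial identity. The following Python sketch is one possible reading of that calculation, offered as an illustration only. The disclosure allows "any suitable method", so the choices here (input sequences already aligned to equal length, "-" as the gap character, gap columns excluded from the denominator) are assumptions, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared alignment columns whose residues are identical.

    Assumes both inputs come from the same pairwise alignment, so they have
    equal length and use '-' for gaps. Columns containing a gap are skipped
    (an assumption; other conventions count them as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not a shared position
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def substantially_identical(aligned_a: str, aligned_b: str,
                            threshold: float = 80.0) -> bool:
    """Apply the 'at least about 80% identical' threshold from the text."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Made-up sequences used only to exercise the functions.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```

Counting gap columns as mismatches instead would lower the score for alignments with many insertions or deletions, so the convention should be stated whenever a percentage is reported.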

[0070] The term "amino acid consensus motif" refers to the portion or subsequence of a polypeptide sequence that is substantially conserved among the polypeptide transcription factors listed in the Sequence Listing.

[0071] "Alignment" refers to a number of nucleotide or amino acid residue sequences aligned by lengthwise comparison so that components in common (i.e., nucleotide bases or amino acid residues) may be visually and readily identified. The fraction or percentage of components in common is related to the homology or identity between the sequences. Alignments such as those found in FIGS. 4A-4J may be used to identify AP2, DML and B3 domains and relatedness within these domains. An alignment may suitably be determined by means of computer programs known in the art, such as MacVector (1999) (Accelrys, Inc., San Diego, Calif.).

[0072] A "conserved domain" or "conserved region" as used herein refers to a region in heterologous polynucleotide or polypeptide sequences where there is a relatively high degree of sequence identity between the distinct sequences. AP2 binding domains and B3 domains are examples of conserved domains.

[0073] With respect to polynucleotides encoding presently disclosed transcription factors, a conserved domain is preferably at least 10 base pairs (bp) in length.

[0074] A "conserved domain", with respect to presently disclosed polypeptides, refers to a domain within a transcription factor family that exhibits a higher degree of sequence homology, such as at least 70% sequence similarity, including conservative substitutions, and more preferably at least 79% sequence identity, and even more preferably at least 81%, or at least about 86%, or at least about 87%, or at least about 89%, or at least about 91%, or at least about 95%, or at least about 98% amino acid residue sequence identity of a polypeptide of consecutive amino acid residues. A fragment or domain can be referred to as outside a conserved domain, outside a consensus sequence, or outside a consensus DNA-binding site that is known to exist or that exists for a particular transcription factor class, family, or subfamily. In this case, the fragment or domain will not include the exact amino acids of a consensus sequence or consensus DNA-binding site of a transcription factor class, family or sub-family, or the exact amino acids of a particular transcription factor consensus sequence or consensus DNA-binding site. Furthermore, a particular fragment, region, or domain of a polypeptide, or a polynucleotide encoding a polypeptide, can be "outside a conserved domain" if all the amino acids of the fragment, region, or domain fall outside of a defined conserved domain(s) for a polypeptide or protein. Sequences having lesser degrees of identity but comparable biological activity are considered to be equivalents.

[0075] As one of ordinary skill in the art recognizes, conserved domains may be identified as regions or domains of identity to a specific consensus sequence (see, for example, Riechmann et al. (2000) supra). Thus, by using alignment methods well known in the art, the conserved domains (i.e., the AP2 domains) of the AP2 plant transcription factors (Riechmann and Meyerowitz (1998) Biol. Chem. 379:633-646) may be determined.

[0076] The AP2-binding and B3 (or conserved) domains for SEQ ID NO:2, 4, 6, and 8 and numerous orthologs are listed in Table 1. Also, the polypeptides of Table 1 have AP2-binding and B3 domains specifically indicated by start and stop sites. A comparison of the regions of the polypeptides in Table 1 allows one of skill in the art to identify AP2-binding and B3 domains for any of the polypeptides listed or referred to in this disclosure.

[0077] "Complementary" refers to the natural hydrogen bonding by base pairing between purines and pyrimidines. For example, the sequence A-C-G-T (5'->3') forms hydrogen bonds with its complements A-C-G-T (5'->3') or A-C-G-U (5'->3'). Two single-stranded molecules may be considered "partially complementary" if only some of the nucleotides bond, or "completely complementary" if all of the nucleotides bond. The degree of complementarity between nucleic acid strands affects the efficiency and strength of the hybridization and amplification reactions. "Fully complementary" refers to the case where bonding occurs between every base pair and its complement in a pair of sequences, and the two sequences have the same number of nucleotides.

[0078] The terms "highly stringent" or "highly stringent condition" refer to conditions that permit hybridization of DNA strands whose sequences are highly complementary, wherein these same conditions exclude hybridization of significantly mismatched DNAs. Polynucleotide sequences capable of hybridizing under stringent conditions with the polynucleotides of the present invention may be, for example, variants of the disclosed polynucleotide sequences, including allelic or splice variants, or sequences that encode orthologs or paralogs of presently disclosed polypeptides. Nucleic acid hybridization methods are disclosed in detail by Kashima et al. (1985) Nature 313:402-404, and Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. ("Sambrook"); and by Haymes et al., "Nucleic Acid Hybridization: A Practical Approach", IRL Press, Washington, D.C. (1985), which references are incorporated herein by reference.

[0079] In general, stringency is determined by the temperature, ionic strength, and concentration of denaturing agents (e.g., formamide) used in a hybridization and washing procedure (for a more detailed description of establishing and determining stringency, see below). The degree to which two nucleic acids hybridize under various conditions of stringency is correlated with the extent of their similarity. Thus, similar nucleic acid sequences from a variety of sources, such as within a plant's genome (as in the case of paralogs) or from another plant (as in the case of orthologs) that may perform similar functions can be isolated on the basis of their ability to hybridize with known transcription factor sequences. Numerous variations are possible in the conditions and means by which nucleic acid hybridization can be performed to isolate transcription factor sequences having similarity to transcription factor sequences known in the art and are not limited to those explicitly disclosed herein. Such an approach may be used to isolate polynucleotide sequences having various degrees of similarity with disclosed transcription factor sequences, such as, for example, transcription factors having 60% identity, or more preferably greater than about 70% identity, most preferably 72% or greater identity with disclosed transcription factors.
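The definition of "complementary" above gives A-C-G-T (5'->3') as pairing with A-C-G-T or A-C-G-U read 5'->3', which follows because the strands of a duplex run antiparallel and this particular sequence is its own reverse complement. The short Python sketch below illustrates that point and the "fully complementary" condition (every base paired, equal length). It is an illustration under simple assumptions: only unambiguous DNA bases A, C, G and T are handled in the input strand, and the function names are hypothetical.

```python
# Watson-Crick pairing for an unambiguous DNA strand.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the complementary strand of a DNA sequence, written 5'->3'.

    With rna=True the partner strand is written as RNA (U in place of T).
    """
    partner = seq.upper().translate(_DNA_COMPLEMENT)[::-1]
    return partner.replace("T", "U") if rna else partner


def fully_complementary(strand_a: str, strand_b: str) -> bool:
    """True if two DNA strands (both 5'->3') pair at every position.

    Equal length is implied, because the reverse complement of strand_a
    has the same number of nucleotides as strand_a.
    """
    return strand_b.upper() == reverse_complement(strand_a)


dna_partner = reverse_complement("ACGT")
rna_partner = reverse_complement("ACGT", rna=True)
```

Most sequences are not self-complementary in this way; for example, the partner of AAGC written 5'->3' is GCTT.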

[0080] Regarding the terms "paralog" and "ortholog", homologous polynucleotide sequences and homologous polypeptide sequences may be paralogs or orthologs of the claimed polynucleotide or polypeptide sequence. Orthologs and paralogs are evolutionarily related genes that have similar sequence and similar functions. Orthologs are structurally related genes in different species that are derived by a speciation event. Paralogs are structurally related genes within a single species that are derived by a duplication event. Sequences that are sufficiently similar to one another will be appreciated by those of skill in the art and may be based upon percentage identity of the complete sequences, percentage identity of a conserved domain or sequence within the complete sequence, percentage similarity to the complete sequence, percentage similarity to a conserved domain or sequence within the complete sequence, and/or an arrangement of contiguous nucleotides or peptides particular to a conserved domain or complete sequence. Sequences that are sufficiently similar to one another will also bind in a similar manner to the same DNA binding sites of transcriptional regulatory elements using methods well known to those of skill in the art.

[0081] The term "equivalog" describes members of a set of homologous proteins that are conserved with respect to function since their last common ancestor. Related proteins are grouped into equivalog families, and otherwise into protein families with other hierarchically defined homology types. This definition is provided at the Institute for Genomic Research (TIGR) worldwide web (www) website, "tigr.org", under the heading "Terms associated with TIGRFAMs".

[0082] The term "variant", as used herein, may refer to polynucleotides or polypeptides that differ from the presently disclosed polynucleotides or polypeptides, respectively, in sequence from each other, and as set forth below.

[0083] With regard to polynucleotide variants, differences between presently disclosed polynucleotides and polynucleotide variants are limited so that the nucleotide sequences of the former and the latter are closely similar overall and, in many regions, identical. Due to the degeneracy of the genetic code, differences between the former and latter nucleotide sequences may be silent (i.e., the amino acids encoded by the polynucleotide are the same, and the variant polynucleotide sequence encodes the same amino acid sequence as the presently disclosed polynucleotide). Variant nucleotide sequences may encode different amino acid sequences, in which case such nucleotide differences will result in amino acid substitutions, additions, deletions, insertions, truncations or fusions with respect to the similar disclosed polynucleotide sequences. These variations result in polynucleotide variants encoding polypeptides that share at least one functional characteristic. The degeneracy of the genetic code also dictates that many different variant polynucleotides can encode identical and/or substantially similar polypeptides in addition to those sequences illustrated in the Sequence Listing.

[0084] Also within the scope of the invention is a variant of a transcription factor nucleic acid listed in the Sequence Listing, that is, one having a sequence that differs from one of the polynucleotide sequences in the Sequence Listing, or a complementary sequence, that encodes a functionally equivalent polypeptide (i.e., a polypeptide having some degree of equivalent or similar biological activity) but differs in sequence from the sequence in the Sequence Listing, due to degeneracy in the genetic code. Included within this definition are polymorphisms that may or may not be readily detectable using a particular oligonucleotide probe of the polynucleotide encoding polypeptide, and improper or unexpected hybridization to allelic variants, with a locus other than the normal chromosomal locus for the polynucleotide sequence encoding polypeptide.

[0085] "Allelic variant" or "polynucleotide allelic variant" refers to any of two or more alternative forms of a gene occupying the same chromosomal locus. Allelic variation arises naturally through mutation, and may result in phenotypic polymorphism within populations. Gene mutations may be "silent" or may encode polypeptides having altered amino acid sequence. "Allelic variant" and "polypeptide allelic variant" may also be used with respect to polypeptides, and in this case the terms refer to a polypeptide encoded by an allelic variant of a gene.

[0086] "Splice variant" or "polynucleotide splice variant" as used herein refers to alternative forms of RNA transcribed from a gene. Splice variation naturally occurs as a result of alternative sites being spliced within a single transcribed RNA molecule or between separately transcribed RNA molecules, and may result in several different forms of mRNA transcribed from the same gene. Thus, splice variants may encode polypeptides having different amino acid sequences, which may or may not have similar functions in the organism. "Splice variant" or "polypeptide splice variant" may also refer to a polypeptide encoded by a splice variant of a transcribed mRNA.

[0087] As used herein, "polynucleotide variants" may also refer to polynucleotide sequences that encode paralogs and orthologs of the presently disclosed polypeptide sequences. "Polypeptide variants" may refer to polypeptide sequences that are paralogs and orthologs of the presently disclosed polypeptide sequences.

[0088] Differences between presently disclosed polypeptides and polypeptide variants are limited so that the sequences of the former and the latter are closely similar overall and, in many regions, identical. Presently disclosed polypeptide sequences and similar polypeptide variants may differ in amino acid sequence by one or more substitutions, additions, deletions, fusions and truncations, which may be present in any combination. These differences may produce silent changes and result in a functionally equivalent transcription factor. Thus, it will be readily appreciated by those of skill in the art that any of a variety of polynucleotide sequences is capable of encoding the transcription factors and transcription factor homolog polypeptides of the invention. A polypeptide sequence variant may have "conservative" changes, wherein a substituted amino acid has similar structural or chemical properties. Deliberate amino acid substitutions may thus be made on the basis of similarity in polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues, as long as the functional or biological activity of the transcription factor is retained. For example, negatively charged amino acids may include aspartic acid and glutamic acid, positively charged amino acids may include lysine and arginine, and amino acids with uncharged polar head groups having similar hydrophilicity values may include leucine, isoleucine, and

valine; glycine and alanine; asparagine and glutamine; serine and threonine; and phenylalanine and tyrosine (for more detail on conservative substitutions, see Table 3). More rarely, a variant may have "non-conservative" changes, for example, replacement of a glycine with a tryptophan. Similar minor variations may also include amino acid deletions or insertions, or both. Related polypeptides may comprise, for example, additions and/or deletions of one or more N-linked or O-linked glycosylation sites, or an addition and/or a deletion of one or more cysteine residues. Guidance in determining which and how many amino acid residues may be substituted, inserted or deleted without abolishing functional or biological activity may be found using computer programs well known in the art, for example, DNASTAR software (see U.S. Pat. No. 5,840,544).

[0089] "Ligand" refers to any molecule, agent, or compound that will bind specifically to a complementary site on a nucleic acid molecule or protein. Such ligands stabilize or modulate the activity of nucleic acid molecules or proteins of the invention and may be composed of at least one of the following: inorganic and organic substances including nucleic acids, proteins, carbohydrates, fats, and lipids.

[0090] "Modulates" refers to a change in activity (biological, chemical, or immunological) or lifespan resulting from specific binding between a molecule and either a nucleic acid molecule or a protein.

[0091] The term "plant" includes whole plants, shoot vegetative organs/structures (for example, leaves, stems and tubers), roots, flowers and floral organs/structures (for example, bracts, sepals, petals, stamens, carpels, anthers and ovules), seed (including embryo, endosperm, and seed coat) and fruit (the mature ovary), plant tissue (for example, vascular tissue, ground tissue, and the like) and cells (for example, guard cells, egg cells, and the like), and progeny of same. The class of plants that can be used in the method of the invention is generally as broad as the class of higher and lower plants amenable to transformation techniques, including angiosperms (monocotyledonous and dicotyledonous plants), gymnosperms, ferns, horsetails, psilophytes, lycophytes, bryophytes, and multicellular algae. (See, for example, FIG. 1, adapted from Daly et al. (2001) Plant Physiol. 127:1328-1333; FIG. 2, adapted from Ku et al. (2000) Proc. Natl. Acad. Sci. 97:9121-9126; and see also Tudge in The Variety of Life, Oxford University Press, New York, N.Y. (2000) pp. 547-606.)

[0092] A "transgenic plant" refers to a plant that contains genetic material not found in a wild-type plant of the same species, variety or cultivar. The genetic material may include a transgene, an insertional mutagenesis event (such as by transposon or T-DNA insertional mutagenesis), an activation tagging sequence, a mutated sequence, a homologous recombination event or a sequence modified by chimeraplasty. Typically, the foreign genetic material has been introduced into the plant by human manipulation, but any method can be used as one of skill in the art recognizes.

[0093] A transgenic plant may contain an expression vector or cassette. The expression cassette typically comprises a polypeptide-encoding sequence operably linked to (i.e., under regulatory control of) appropriate inducible or constitutive regulatory sequences that allow for the expression of polypeptide. The expression cassette can be introduced into a plant by transformation or by breeding after transformation of a parent plant. A plant refers to a whole plant as well as to a plant part, such as seed, fruit, leaf, or root, plant tissue, plant cells or any other plant material, for example, a plant explant, as well as to progeny thereof, and to in vitro systems that mimic biochemical or cellular components or processes in a cell.

[0094] "Control plant" refers to a plant that serves as a standard of comparison for testing the results of a treatment or genetic alteration, or the degree of altered expression of a gene or gene product. Examples of control plants include plants that are untreated, or genetically unaltered (i.e., wild-type).

[0095] "Wild type", as used herein, refers to a cell, tissue or plant that has not been genetically modified to knock out or overexpress one or more of the presently disclosed transcription factors. Wild-type cells, tissue or plants may be used as controls to compare levels of expression and the extent and nature of trait modification with cells, tissue or plants in which transcription factor expression is altered or ectopically expressed, e.g., in that it has been knocked out or overexpressed.

[0096] "Fragment", with respect to a polynucleotide, refers to a clone or any part of a polynucleotide molecule that retains a usable, functional characteristic. Useful fragments include oligonucleotides and polynucleotides that may be used in hybridization or amplification technologies or in the regulation of replication, transcription or translation. A "polynucleotide fragment" refers to any subsequence of a polynucleotide, typically of at least about 9 consecutive nucleotides, preferably at least about 30 nucleotides, more preferably at least about 50 nucleotides, of any of the sequences provided herein. Exemplary polynucleotide fragments are the first sixty consecutive nucleotides of the transcription factor polynucleotides listed in the Sequence Listing. Exemplary fragments also include fragments that comprise a region that encodes an AP2 domain of a transcription factor.

[0097] Fragments may also include subsequences of polypeptides and protein molecules, or a subsequence of the polypeptide. Fragments may have uses in that they may have antigenic potential. In some cases, the fragment or domain is a subsequence of the polypeptide which performs at least one biological function of the intact polypeptide in substantially the same manner, or to a similar extent, as does the intact polypeptide. For example, a polypeptide fragment can comprise a recognizable structural motif or functional domain such as a DNA-binding site or domain that binds to a DNA promoter region, an activation domain, or a domain for protein-protein interactions, and may initiate transcription. Fragments can vary in size from as few as 3 amino acid residues to the full length of the intact polypeptide, but are preferably at least about 30 amino acid residues in length and more preferably at least about 60 amino acid residues in length. Exemplary polypeptide fragments are the first twenty consecutive amino acids of the transcription factor polypeptides listed in the Sequence Listing. Exemplary fragments also include fragments that comprise an AP2 binding or a B3 domain of a transcription factor, for example, amino acid residues 59-124 or amino acid residues 187-272 of G867 (SEQ ID NO:2), as noted in Table 1.
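As an illustration of the fragment definitions above, the following minimal Python sketch extracts a subsequence by residue number. It assumes that the residue ranges quoted for G867 (59-124 and 187-272) are numbered from 1 and are inclusive, which is the usual convention for sequence listings but is an assumption here. The function names and the placeholder sequence are hypothetical and are not taken from the Sequence Listing.

```python
# Illustrative sketch only: the sequence below is a made-up placeholder,
# NOT G867 (SEQ ID NO:2) or any other sequence from the Sequence Listing.

def fragment(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq, numbered from 1 and inclusive."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(
            f"range {start}-{end} is outside a sequence of length {len(seq)}"
        )
    # Convert 1-based inclusive numbering to Python's 0-based half-open slice.
    return seq[start - 1:end]


def leading_fragment(seq: str, n: int) -> str:
    """Return the first n consecutive residues (or nucleotides) of seq."""
    return fragment(seq, 1, n)


# Hypothetical 300-residue polypeptide used only to exercise the functions.
demo_protein = "ACDEFGHIKLMNPQRSTVWY" * 15

ap2_like = fragment(demo_protein, 59, 124)      # same numbering as "residues 59-124"
b3_like = fragment(demo_protein, 187, 272)      # same numbering as "residues 187-272"
first_twenty = leading_fragment(demo_protein, 20)

print(len(ap2_like), len(b3_like), len(first_twenty))
```

Under the inclusive numbering assumed here, the two quoted ranges span 66 and 86 residues respectively, both above the preferred minimum of about 60 residues stated for polypeptide fragments. The same helper applies to polynucleotides, for example `leading_fragment(dna, 60)` for the first sixty consecutive nucleotides.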

[0098] The invention also encompasses production of DNA sequences that encode transcription factors and transcription factor derivatives, or fragments thereof, entirely by synthetic chemistry. After production, the synthetic sequence may be inserted into any of the many available expression vectors and cell systems using reagents well known in the art. Moreover, synthetic chemistry may be used to introduce mutations into a sequence encoding transcription factors or any fragment thereof.

[0099] "Derivative" refers to the chemical modification of a nucleic acid molecule or amino acid sequence. Chemical modifications can include replacement of hydrogen by an alkyl, acyl, or amino group or glycosylation, pegylation, or any similar process that retains or enhances biological activity or lifespan of the molecule or sequence.

[0100] A "trait" refers to a physiological, morphological, biochemical, or physical characteristic of a plant or particular plant material or cell. In some instances, this characteristic is visible to the human eye, such as seed or plant size, or can be measured by biochemical techniques, such as detecting the protein, starch, or oil content of seed or leaves, or by observation of a metabolic or physiological process, e.g., by measuring tolerance to water deprivation or particular salt or sugar concentrations, or by the observation of the expression level of a gene or genes, for example, by employing Northern analysis, RT-PCR, microarray gene expression assays, or reporter gene expression systems, or by agricultural observations such as osmotic stress tolerance or yield. Any technique can be used to measure the amount of, comparative level of, or difference in any selected chemical compound or macromolecule in the transgenic plants, however.

[0101] "Trait modification" refers to a detectable difference in a characteristic in a plant ectopically expressing a polynucleotide or polypeptide of the present invention relative to a plant not doing so, such as a wild-type plant. In some cases, the trait modification can be evaluated quantitatively. For example, the trait modification can entail at least about a 2% increase or decrease in an observed trait (difference), at least a 5% difference, at least about a 10% difference, at least about a 20% difference, at least about a 30%, at least about a 50%, at least about a 70%, or at least about a 100%, or an even greater difference compared with a wild-type plant. It is known that there can be a natural variation in the modified trait. Therefore, the trait modification observed entails a change of the normal distribution of the trait in the plants compared with the distribution observed in wild-type plants.

[0102] The term "transcript profile" refers to the expression levels of a set of genes in a cell in a particular state, particularly by comparison with the expression levels of that same set of genes in a cell of the same type in a reference state. For example, the transcript profile of a particular transcription factor in a suspension cell is the expression levels of a set of genes in a cell repressing or overexpressing that transcription factor compared with the expression levels of that same set of genes in a suspension cell that has normal levels of that transcription factor. The transcript profile can be presented as a list of those genes whose expression level is significantly different between the two treatments, and the difference ratios. Differences and similarities between expression levels may also be evaluated and calculated using statistical and clustering methods.

[0103] "Ectopic expression or altered expression" in reference to a polynucleotide indicates that the pattern of expression in, for example, a transgenic plant or plant tissue, is different from the expression pattern in a wild-type plant or a reference plant of the same species. The pattern of expression may also be compared with a reference expression pattern in a wild-type plant of the same species. For example, the polynucleotide or polypeptide is expressed in a cell or tissue type other than a cell or tissue type in which the sequence is expressed in the wild-type plant, or by expression at a time other than at the time the sequence is expressed in the wild-type plant, or by a response to different inducible agents, such as hormones or environmental signals, or at different expression levels (either higher or lower) compared with those found in a wild-type plant. The term also refers to altered expression patterns that are produced by lowering the levels of expression to below the detection level or completely abolishing expression. The resulting expression pattern can be transient or stable, constitutive or inducible. In reference to a polypeptide, the term "ectopic expression or altered expression" further may relate to altered activity levels resulting from the interactions of the polypeptides with exogenous or endogenous modulators or from interactions with factors or as a result of the chemical modification of the polypeptides.

[0104] The term "overexpression" as used herein refers to a greater expression level of a gene in a plant, plant cell or plant tissue, compared to expression in a wild-type plant, cell or tissue, at any developmental or temporal stage for the gene. Overexpression can occur when, for example, the genes encoding one or more transcription factors are under the control of a strong expression signal, such as one of the promoters described herein (for example, the cauliflower mosaic virus 35S transcription initiation region). Overexpression may occur throughout a plant or in specific tissues of the plant, depending on the promoter used, as described below.

[0105] Overexpression may take place in plant cells normally lacking expression of polypeptides functionally equivalent or identical to the present transcription factors. Overexpression may also occur in plant cells where endogenous expression of the present transcription factors or functionally equivalent molecules normally occurs, but such normal expression is at a lower level. Overexpression thus results in a greater than normal production, or "overproduction", of the transcription factor in the plant, cell or tissue.

[0106] The term "transcription regulating region" refers to a DNA regulatory sequence that regulates expression of one or more genes in a plant when a transcription factor having one or more specific binding domains binds to the DNA regulatory sequence. Transcription factors of the present invention possess an AP2 domain, a B3 domain, or both of these binding domains. The AP2 domain of the transcription factor binds to a transcription regulating region comprising the motif CAACA, and the B3 domain of the same transcription factor binds to a transcription regulating region comprising the motif CACCTG. The transcription factors of the invention also comprise an amino acid subsequence that forms a transcription activation domain that regulates expression of one or more abiotic stress tolerance genes in a plant when the transcription factor binds to the regulating region.
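The two recognition motifs named above, CAACA for the AP2 domain and CACCTG for the B3 domain, can be located in a candidate regulatory sequence with a simple string scan. The Python sketch below is illustrative only: the function names and the example sequence are hypothetical, exact matching on both strands is an assumption, and the sketch is not the search method used in the application.

```python
# Illustrative sketch only: the motifs come from the text above; the example
# DNA sequence and all function names are hypothetical.

MOTIFS = {"AP2": "CAACA", "B3": "CACCTG"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_motif(dna: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) exact match."""
    dna = dna.upper()
    hits = []
    pos = dna.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = dna.find(motif, pos + 1)
    return hits


def scan_region(dna: str) -> dict[str, dict[str, list[int]]]:
    """Scan both strands of a region for the AP2 and B3 recognition motifs.

    'reverse' positions are indexed along the reverse-complement string,
    not along the input strand.
    """
    rc = reverse_complement(dna)
    return {
        name: {"forward": find_motif(dna, motif), "reverse": find_motif(rc, motif)}
        for name, motif in MOTIFS.items()
    }


# Hypothetical promoter fragment containing one copy of each motif.
example_region = "GGCAACATTTTTCACCTGAA"
hits = scan_region(example_region)
print(hits)
```

A region in which both motifs occur close together would be a candidate for binding by a factor carrying both domains; how close the motifs must be, and in which orientation, is not specified in this passage and is left out of the sketch.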

[0107] The term "phase change" refers to a plant's progression from embryo to adult, and, by some definitions, the transition wherein flowering plants gain reproductive competency. It is believed that phase change occurs either after a certain number of cell divisions in the shoot apex of a developing plant, or when the shoot apex achieves a particular distance from the roots. Thus, altering the timing of phase changes may affect a plant's size, which, in turn, may affect yield and biomass.

[0108] A "sample" with respect to a material containing nucleic acid molecules may comprise a bodily fluid; an extract from a cell, chromosome, organelle, or membrane isolated from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; a cell; a tissue; a tissue print; a forensic sample; and the like. In this context "substrate" refers to any rigid or semi-rigid support to which nucleic acid molecules or proteins are bound and includes membranes, filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, capillaries or other tubing, plates, polymers, and microparticles with a variety of surface forms including wells, trenches, pins, channels and pores. A substrate may also refer to a reactant in a chemical or biological reaction, or a substance acted upon (for example, by an enzyme).

[0109] "Substantially purified" refers to nucleic acid molecules or proteins that are removed from their natural environment and are isolated or separated, and are at least about 60% free, preferably about 75% free, and most preferably about 90% free, from other components with which they are naturally associated.

DETAILED DESCRIPTION

[0110] Transcription Factors Modify Expression of Endogenous Genes

[0111] A transcription factor may include, but is not limited to, any polypeptide that can activate or repress transcription of a single gene or a number of genes. As one of ordinary skill in the art recognizes, transcription factors can be identified by the presence of a region or domain of structural similarity or identity to a specific consensus sequence or the presence of a specific consensus DNA-binding site or DNA-binding site motif (see, for example, Riechmann et al. (2000) Science 290:2105-2110). The plant transcription factors may belong to the AP2 protein transcription factor family (Riechmann and Meyerowitz (1998) supra).

[0112] Generally, the transcription factors encoded by the present sequences are involved in cell differentiation and proliferation and the regulation of growth. Accordingly, one skilled in the art would recognize that by expressing the present sequences in a plant, one may change the expression of autologous genes or induce the expression of introduced genes. By affecting the expression of similar autologous sequences in a plant that have the biological activity of the present sequences, or by introducing the present sequences into a plant, one may alter a plant's phenotype to one with improved traits related to osmotic stresses. The sequences of the invention may also be used to transform a plant and introduce desirable traits not found in the wild-type cultivar or strain. Plants may then be selected for those that produce the most desirable degree of over- or under-expression of target genes of interest and coincident trait improvement.

[0113] The sequences of the present invention may be from any species, particularly plant species, in a naturally occurring form or from any source whether natural, synthetic, semi-synthetic or recombinant. The sequences of the invention may also include fragments of the present amino acid sequences. Where "amino acid sequence" is recited to refer to an amino acid sequence of a naturally occurring protein molecule, "amino acid sequence" and like terms are not meant to limit the amino acid sequence to the complete native amino acid sequence associated with the recited protein molecule.

[0114] In addition to methods for modifying a plant phenotype by employing one or more polynucleotides and polypeptides of the invention described herein, the polynucleotides and polypeptides of the invention have a variety of additional uses. These uses include their use in the recombinant production (i.e., expression) of proteins; as regulators of plant gene expression; as diagnostic probes for the presence of complementary or partially complementary nucleic acids (including for detection of natural coding nucleic acids); as substrates for further reactions, for example, mutation reactions, PCR reactions, or the like; as substrates for cloning, for example, including digestion or ligation reactions; and for identifying exogenous or endogenous modulators of the transcription factors. In many instances, a polynucleotide comprises a nucleotide sequence encoding a polypeptide (or protein) or a domain or fragment thereof. Additionally, the polynucleotide may comprise a promoter, an intron, an enhancer region, a polyadenylation site, a translation initiation site, 5' or 3' untranslated regions, a reporter gene, a selectable marker, or the like. The polynucleotide can be single stranded or double stranded DNA or RNA. The polynucleotide optionally comprises modified bases or a modified backbone. The polynucleotide can be, for example, genomic DNA or RNA, a transcript (such as an mRNA), a cDNA, a PCR product, a cloned DNA, a synthetic DNA or RNA, or the like. The polynucleotide can comprise a sequence in either sense or antisense orientations.

[0115] Expression of genes that encode transcription factors that modify expression of endogenous genes, polynucleotides, and proteins is well known in the art. In addition, transgenic plants comprising isolated polynucleotides encoding transcription factors may also modify expression of endogenous genes, polynucleotides, and proteins. Examples include Peng et al. (1997) Genes and Development 11:3194-3205, and Peng et al. (1999) Nature 400:256-261. In addition, many others have demonstrated that an Arabidopsis transcription factor expressed in an exogenous plant species elicits the same or very similar phenotypic response (see, for example, Fu et al. (2001) Plant Cell 13:1791-1802; Nandi et al. (2000) Curr. Biol. 10:215-218; Coupland (1995) Nature 377:482-483; and Weigel and Nilsson (1995) Nature 377:482-500).

[0116] In another example, Mandel et al. (1992) Cell 71:133-143, and Suzuki et al. (2001) Plant J. 28:409-418, teach that a transcription factor expressed in another plant species elicits the same or very similar phenotypic response of the endogenous sequence, as often predicted in earlier studies of Arabidopsis transcription factors in Arabidopsis (see Mandel et al. (1992) supra; Suzuki et al. (2001) supra).

[0117] Other examples include Müller et al. (2001) Plant J. 28:169-179; Kim et al. (2001) Plant J. 25:247-259; Kyozuka and Shimamoto (2002) Plant Cell Physiol. 43:130-135; Boss and Thomas (2002) Nature 416:847-850; He et al. (2000) Transgenic Res. 9:223-227; and Robson et al. (2001) Plant J. 28:619-631.

[0118] In yet another example, Gilmour et al. (1998) Plant J. 16:433-442, teach an Arabidopsis AP2 transcription factor, CBF1 (SEQ ID NO:55), which, when overexpressed in transgenic plants, increases plant freezing tolerance. Jaglo et al. (2001) Plant Physiol. 127:910-917, further identified sequences in Brassica napus which encode CBF-like genes and that transcripts for these genes accumulated rapidly in response to low temperature. Transcripts encoding CBF-like proteins were also found to accumulate rapidly in response to low temperature in wheat, as well as in tomato. An alignment of the CBF proteins from Arabidopsis, B. napus, wheat, rye, and tomato revealed the presence of conserved consecutive amino acid residues, PKK/RPAGRxKFxETRHP and DSAWR, that bracket the AP2/EREBP DNA binding domains of the proteins and distinguish them from other members of the AP2/EREBP protein family (Jaglo et al. (2001) supra).

[0119] Transcription factors mediate cellular responses and control traits through altered expression of genes containing cis-acting nucleotide sequences that are targets of the introduced transcription factor. It is well appreciated in the art that the effect of a transcription factor on cellular responses or a cellular trait is determined by the particular genes whose expression is either directly or indirectly (for example, by a cascade of transcription factor binding events and transcriptional changes) altered by transcription factor binding. In a global analysis of transcription comparing a standard condition with one in which a transcription factor is overexpressed, the resulting transcript profile associated with transcription factor overexpression is related to the trait or cellular process controlled by that transcription factor. For example, the PAP2 gene (and other genes in the MYB family) have been shown to control anthocyanin biosynthesis through regulation of the expression of genes known to be involved in the anthocyanin biosynthetic pathway (Bruce et al. (2000) Plant Cell 12:65-79; Borevitz et al. (2000) Plant Cell 12:2383-93). Further, global transcript profiles have been used successfully as diagnostic tools for specific cellular states (for example, cancerous vs. non-cancerous; Bhattacharjee et al. (2001) Proc. Natl. Acad. Sci. USA 98:13790-13795; Xu et al. (2001) Proc. Natl. Acad. Sci. USA 98:15089-15094). Consequently, it is evident to one skilled in the art that similarity of transcript profile upon overexpression of different transcription factors would indicate similarity of transcription factor function.

[0120] Polypeptides and Polynucleotides of the Invention

[0121] The present invention provides, among other things, transcription factors (TFs), and transcription factor homolog polypeptides, and isolated or recombinant polynucleotides encoding the polypeptides, or novel sequence variant polypeptides or polynucleotides encoding novel variants of transcription factors derived from the specific sequences provided here. These polypeptides and polynucleotides may be employed to modify a plant's characteristics.

[0122] The sequences of G867 and G9 were previously identified in U.S. provisional patent application 60/101,349, filed Sep. 22, 1998, at which time these sequences were identified as encoding or being transcription factors, which were defined as polypeptides having the ability to effect transcription of a target gene. It is noted that sequences that have gene-regulating activity have been determined to have specific and substantial utility (Federal Register (2001) 66(4):1095). The functions of G867 and G9 were previously disclosed in U.S. provisional patent applications 60/227,439, filed Aug. 22, 2000, and 60/166,228, filed Nov. 17, 1999, respectively. The sequence of G993 was previously identified in U.S. provisional applications 60/108,734, filed Nov. 17, 1998, and 60/125,814, filed Mar. 23, 1999. The function of G993 was implied from its homologous relationship with G867, as disclosed in U.S. non-provisional application 09/934,455, filed Aug. 22, 2001. The sequence of G1930 was previously identified in U.S. non-provisional application 09/934,455, filed Aug. 22, 2001. The functions of G1930 were previously disclosed in U.S. non-provisional patent application 09/934,455, filed Aug. 22, 2001.

[0123] In some cases, exemplary polynucleotides encoding the polypeptides of the invention were identified in the Arabidopsis thaliana GenBank database using publicly available sequence analysis programs and parameters. Sequences initially identified were then further characterized to identify sequences comprising specified sequence strings corresponding to sequence motifs present in families of known transcription factors. In addition, further exemplary polynucleotides encoding the polypeptides of the invention were identified in the plant GenBank database using publicly available sequence analysis programs and parameters. Sequences initially identified were then further characterized to identify sequences comprising specified sequence strings corresponding to sequence motifs present in families of known transcription factors. Polynucleotide sequences meeting such criteria were confirmed as transcription factors.

[0124] Additional polynucleotides of the invention were identified by screening Arabidopsis thaliana and/or other plant cDNA libraries with probes corresponding to known transcription factors under low stringency hybridization conditions. Additional sequences, including full length coding sequences, were subsequently recovered by the rapid amplification of cDNA ends (RACE) procedure, using a commercially available kit according to the manufacturer's instructions. Where necessary, multiple rounds of RACE are performed to isolate 5' and 3' ends. The full-length cDNA was then recovered by a routine end-to-end polymerase chain reaction (PCR) using primers specific to the isolated 5' and 3' ends. Exemplary sequences are provided in the Sequence Listing.

[0125] The polynucleotides of the invention can be or were ectopically expressed in overexpressor or knockout plants and the changes in the characteristic(s) or trait(s) of the plants observed. Therefore, the polynucleotides and polypeptides can be employed to improve the characteristics of plants.

[0126] The polynucleotides of the invention can be or were ectopically expressed in overexpressor plant cells and the changes in the expression levels of a number of genes, polynucleotides, and/or proteins of the plant cells observed.

Therefore, the polynucleotides and polypeptides can be employed to change expression levels of genes, polynucleotides, and/or proteins of plants.

[0127] G867, which we have determined to confer osmotic stress tolerance in plants when overexpressed, has been described in the literature as related to ABI3/VP1 (RAV1; Kagaya et al. (1999) Nucleic Acids Res. 27:470-478) based on the presence of a B3 domain (which is also found in the ABI3/VP1 family of transcription factors). The protein also contains an AP2 domain, and is therefore presently included in the AP2/ERF family of transcription factors. Both the AP2 domain transcription factors and the B3 domain transcription factors are described below.

[0128] AP2 domain transcription factors. Ohme-Takagi and Shinshi ((1995) Plant Cell 7:173-182) determined that the function of the AP2 domain is DNA binding. The AP2 region of the putative tobacco transcription factor EREBP2 is responsible for its binding to the cis-acting ethylene response DNA element referred to as the GCC-repeat. As discussed by Ohme-Takagi and Shinshi ((1995) supra), the DNA-binding or AP2 domain of EREBP2 contains no significant amino acid sequence similarities or obvious structural similarities with other known transcription factors or DNA binding motifs beyond AP2 transcription factors. Thus, the domain appears to be a novel DNA-binding motif that, to date, has only been found in plant proteins.

[0129] The RAV-like proteins, including G867, G9, G993 and G1930, form a small subgroup in the AP2/ERF family of AP2 transcription factors. This large gene family includes at least 145 transcription factors, and can be further divided in three larger subfamilies:

[0130] (a) The APETALA2 class is characterized by the presence of two AP2 DNA binding domains, and contains fourteen genes.

[0131] (b) The RAV subgroup, which includes six genes, is characterized by the presence of a B3 DNA binding domain in addition to the AP2 DNA binding domain.

[0132] (c) The AP2/ERF subfamily, which is the largest subfamily and includes 125 genes, is characterized by the presence of only one AP2 DNA binding domain, and includes genes that are involved in abiotic and biotic stress responses. This subfamily is composed of two relatively equal size subgroups, the DREB and ERF subgroups (Sakuma et al. (2002) Biochem. Biophys. Res. Comm. 290:998-1009), which are distinguished on the basis of specific residues in the AP2 DNA binding domain.

[0133] The binding characteristics of G867 (RAV1) have been characterized by Kagaya et al. ((1999) Nucleic Acids Res. 27:470-478; see below). There is no published information on the biological function of the RAV-like transcription factors.

[0134] B3 domain transcription factors. Transcription factors of the ABI3/VP1 family have been implicated in seed maturation processes. ABI3 (G621) plays an important role in the acquisition of desiccation tolerance in late embryogenesis. This process is related to dehydration tolerance as evidenced by the protective function of late embryogenesis abundant (LEA) genes such as HVA1 (Xu et al. (1996) Plant Physiol. 110:249-257; Sivamani et al. (2000) Plant Science 155:1-9). Mutants for Arabidopsis ABI3 (Ooms et al. (1993) Plant Physiol. 102:1185-1191) and the maize ortholog VP1 (Carson et al. (1997) Plant J. 12:1231-1240, and references therein) show severe defects in the attainment of desiccation tolerance. Also, 35S::ABI3 overexpression in combination with increased levels of abscisic acid results in an induction of several ABA/cold/drought-responsive genes such as RAB18 and RD29A and increased freezing tolerance in Arabidopsis (Tamminen et al. (2001) Plant J. 25:1-8). This illustrates the relatedness of desiccation and dehydration tolerance and demonstrates that the seed-specific ABI3 transcription factor does not require additional seed-specific proteins to function in vegetative tissues.

[0135] Both in Arabidopsis and maize, the B3 domain of ABI3/VP1 binds the RY/SPH motif (Ezcurra et al. (2000) Plant J. 24:57-66; Carson et al. (1997) supra) while the B2 domain interacts with the ABRE elements in a complex involving bZIP transcription factors (TRAB1 in maize, Hobo et al. (1999) Proc. Natl. Acad. Sci. USA 96:15348-15353). While in Arabidopsis the B3 domain of ABI3 is essential for abscisic acid dependent activation of late embryogenesis genes (Ezcurra et al. (2000) supra), the B3 domain of VP1 is not essential for ABA regulated gene expression in maize seed (Carson et al. (1997) supra; McCarty et al. (1989) Plant Cell 1:523-532). This difference in the regulatory network between Arabidopsis and maize can be explained by differential usage of the RY/SPH versus the ABRE element in the control of seed maturation gene expression (Ezcurra et al. (2000) supra). The RY/SPH element is a key element in gene regulation during late embryogenesis in Arabidopsis (Reidt et al. (2000) Plant J. 21:401-408) while it seems to be less important for seed maturation in maize (McCarty et al. (1989) supra).

[0136] Mutations in two other B3 domain transcription factors, FUS3 (G1014) and LEC2 (G3035), result in pleiotropic effects. In the case of fus3, these effects are mainly restricted to seed development during late embryogenesis (Keith et al. (1994) Plant Cell 6:589-600). Overexpression of LEC2 results in somatic embryo formation on the cotyledons (Stone et al. (2001) Proc. Natl. Acad. Sci. USA 98:11806-11811). The FUS3 protein can be considered as a natural truncation of the ABI3 protein (Luerssen et al. (1998) Plant J. 15:755-764); like the latter, it binds to the RY/SPH element, and can activate the expression from target promoters even in non-seed tissues (Reidt et al. (2000) supra).

[0137] Singh et al. have recently submitted a polynucleotide sequence (Accession No. CB686050) from a transgenic Brassica napus (CBF17) that has been shown to be constitutively frost resistant. The predicted polypeptide sequence has a DML motif that is 90% identical, and a B3 domain that is 95% identical, to the DML motif and B3 domain of G867, respectively. The protein predicted from this sequence does not comprise an AP2 domain.

[0138] Binding of G867 and G9 to bipartite recognition sequences by the AP2 and B3 DNA-binding domains. Kagaya et al. ((1999) supra) cloned and characterized G867 (RAV1) and G9 (RAV2) from Arabidopsis thaliana. The two transcription factors were found to contain two distinct amino acid sequence domains found only in higher plant species, the AP2 and B3 domains. The N-terminal regions of G867 and G9 were shown to be homologous to the AP2 DNA-binding domain present in the Arabidopsis APETALA2 and tobacco EREBP protein families, while

the C-terminal region exhibited homology to the B3 domain of VP1/ABI3 transcription factors. Binding site selection assays using a recombinant glutathione S-transferase fusion protein revealed that G867 bound specifically to bipartite recognition sequences composed of two unrelated motifs, 5'-CAACA-3' and 5'-CACCTG-3', separated by various spacings in two different relative orientations. Analyses using various deletion derivatives of the RAV1 fusion protein showed that the AP2 and B3-like domains of RAV1 bind autonomously to the CAACA and CACCTG motifs, respectively, and together achieve a high affinity and specificity of binding. Kagaya et al. concluded that the AP2 and B3-like domains of RAV1 are connected by a highly flexible structure enabling the two domains to bind to the CAACA and CACCTG motifs in various spacings and orientations.

[0139] The RAV-like proteins, including G867, G9, G993 and G1930, generally have both AP2 and B3 domains. Within the G867 clade, there is a high degree of conservation of the AP2 and B3 domains in all members of the clade. The proteins in the G867 clade were also found to possess a subsequence with a high degree of conservation between the AP2 and B3 domains. This subsequence was designated the DML motif. The DML motif does not appear to be present in transcription factors outside of the G867 clade (a more detailed description of the DML motif appears below, and a list of DML motif sequences may be found in Table 2).

[0140] Table 1 shows the polypeptides identified by polypeptide SEQ ID NO and Mendel Gene ID (GID) No., presented in order of similarity to G867 by AP2 domain, and includes the AP2 and B3 binding domains of the polypeptide in amino acid coordinates, the respective AP2 domain sequences, the extent of identity in percentage terms to the AP2 domain of G867, the respective B3 domains, and the extent of identity in percentage terms to the B3 domain of G867.

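TABLE 1. Gene families and binding domains

SEQ ID NO: 2; GID No.: G867; AP2 and B3 domains in AA coordinates: AP2: 59-124, B3: 187-272
AP2 domain (100% ID to AP2 domain of G867): SSKYKGVVPQPNGRWGAQIYEKHQRVWLGTFNEEDEAARAYDVAVHRFRRRDAVTNFKDVKMDEDE
B3 domain (100% ID to B3 domain of G867): LFEKAVTPSDVGKLNRLVIPKHHAEKHFPLPSSNVSVKGVLLNFEDVNGKVWRFRYSYWNSSQSYVLTKGWSRFVKEKNLRAGDVV

SEQ ID NO: 6; GID No.: G993; AP2 and B3 domains in AA coordinates: AP2: 69-134, B3: 194-286
AP2 domain (89% ID to AP2 domain of G867): SSKYKGVVPQPNGRWGAQIYEKHQRVWLGTFNEEEEAASSYDAVRRFRGRDAVTNFKSQVDGNDA
B3 domain (79% ID to B3 domain of G867): LFEKTVTPSDVGKLNRLVIPKQHAEKHFPLPAMTTAMGMNPSPTKGVLINLEDRTGKVWRFRYSYWNSSQSYVLTKGWSRFVKEKNLRAGDVV

SEQ ID NO: 42; GID No.: BZ458719; AP2 and B3 domains in AA coordinates: AP2: 42-107, B3: 172-258
AP2 domain (87% ID to AP2 domain of G867): SSKFKGVVPQPNGRWGAQIYEKHKRVWLGTFNEEEEAARVYDVAAHRFRGSDAVTNFKPDTTFRNG
B3 domain (86% ID to B3 domain of G867): LFEKTVTPSDVGKLNRLVIPKHQAEKHFPLPLTGDVSVRGTLLNFEDVNGKVWRFRYSYWNSSQSYVLTKGWSRFVKEKRLCAGDL

SEQ ID NO: 8; GID No.: G1930; AP2 and B3 domains in AA coordinates: AP2: 59-124, B3: 182-269
AP2 domain (86% ID to AP2 domain of G867): SSRFKGVVPQPNGRWGAQIYEKHQRVWLGTFNEEDEAARAYDVAAHRFRGRDAVTNFKDTTFEEEV
B3 domain (87% ID to B3 domain of G867): LFEKTVTPSDVGKLNRLVIPKHQAEKHFPLPLGNNNVSVKGMLLNFEDVNGKVWRFRYSYWNSSQSYVLTKGWSRFVKEKRLCAGDL

SEQ ID NO: 36; GID No.: G3391; AP2 and B3 domains in AA coordinates: AP2: 79-145, B3: 215-302
AP2 domain (84% ID to AP2 domain of G867): SSKFKGVVPQPNGRWGAQIYERHQRVWLGTFAGEDDAARAYDVAAQRFRGRDAVTNFRPLAEADPDA
B3 domain (83% ID to B3 domain of G867): LFDKTVTPSDVGKLNRLVIPKQHAEKHFPLQLPSAGGESKGVLLNFEDAAGKVWRFRYSYWNSSQSYVLTKGWSRFVKEKGLHADGKL

SEQ ID NO: 46; GID No.: BU025988; AP2 and B3 domains in AA coordinates: AP2: 25-90, B3: 152-236
AP2 domain (83% ID to AP2 domain of G867): SSRYKGVVPQPNGRWGAQIYEKHQRVWLGTFNDEDEAAKAYDVAVQRFRGRDAVTNIKQVDADDKE
B3 domain (81% ID to B3 domain of G867): LFQKTVTPSDVGKLNRLVIPKQHAEKHFPVQKGSNSKGVLLHFEDKGSKVWRFRYSYWNSSQSYVLTKGWSRFVKEKNLKAGDSV

SEQ ID NO: 22; GID No.: G3452; AP2 and B3 domains in AA coordinates: AP2: 51-116, B3: 171-266
AP2 domain (83% ID to AP2 domain of G867): SSKYKGVVPQPNGRWGAQIYEKHQRVWLGTFNEEDEAARAYDAALRFRGPDAVTNFKPPAASDDA
B3 domain (78% ID to B3 domain of G867): LFEKTVTPSDVGKLNRLVIPKQHAEKHFPLSGSGDESSPCVAGASAAKGMLLNFEDVGGKVWRFRYSYWNSSQSYVLTKGWSRFVKEKNLRAGDAV

SEQ ID NO: 24; GID No.: G3453; AP2 and B3 domains in AA coordinates: AP2: 57-122, B3: 177-272
AP2 domain (83% ID to AP2 domain of G867): SSKYKGVVPQPNGRWGAQIYEKHQRVWLGTFNEEDEAVRAYDIVAHRFRGRDAVTNFKPLAGADDA
B3 domain (77% ID to B3 domain of G867): LVEKTVTPSDVGKLNRLVIPKQHAEKHFPLSGSGGGALPCMAAAAGAKGMLLNFEDVGGKVWRFRYSYWNSSQSYVLTKGWSRFVKEKNLRAGDAV

SEQ ID NO: 38; GID No.: G3432; AP2 and B3 domains in AA coordinates: AP2: 75-141, B3: 212-299
AP2 domain (82% ID to AP2 domain of G867): SSRYKGVVPQPNGRWGAQIYERHQRVWLGTFAGEADAARAYDVAAQRFRGRDAVTNFRPLADADPDA
B3 domain (82% ID to B3 domain of G867): LFDKTVTPSDVGKLNRLVIPKQHAEKHFPLQLPSAGGESKGVLLNLEDAAGKVWRFRYSYWNSSQSYVLTKGWSRFVKEKGLQAGDVV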
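
The bipartite recognition sequence reported by Kagaya et al. (1999) for G867 (RAV1), a CAACA motif and a CACCTG motif at variable spacing and in either orientation, can be illustrated with a short Python sketch. It is an illustration only and not a method taken from the specification: the function names and the `max_gap` cutoff are hypothetical, and treating a reverse-strand occurrence as the second orientation is an assumption.

```python
import re

AP2_SITE = "CAACA"   # motif bound by the AP2 domain (Kagaya et al. 1999)
B3_SITE = "CACCTG"   # motif bound by the B3-like domain

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(dna, motif):
    """Return sorted (start, strand) tuples for a motif on either strand."""
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # A lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?={pattern})", dna):
            hits.append((match.start(), strand))
    return sorted(hits)


def find_bipartite_sites(dna, max_gap=10):
    """Pair every CAACA site with every CACCTG site that lies within
    max_gap bases of it (hypothetical cutoff), in either order."""
    dna = dna.upper()
    pairs = []
    for a_start, a_strand in find_sites(dna, AP2_SITE):
        a_end = a_start + len(AP2_SITE)
        for b_start, b_strand in find_sites(dna, B3_SITE):
            b_end = b_start + len(B3_SITE)
            # Number of bases between the two motifs; negative if they overlap.
            gap = b_start - a_end if b_start >= a_end else a_start - b_end
            if 0 <= gap <= max_gap:
                pairs.append({
                    "ap2_start": a_start, "ap2_strand": a_strand,
                    "b3_start": b_start, "b3_strand": b_strand,
                    "gap": gap,
                })
    return pairs


if __name__ == "__main__":
    # Made-up test sequence, not taken from the specification.
    for pair in find_bipartite_sites("GGCAACATTTCACCTGGG"):
        print(pair)
```

Positions are 0-based. Because spacing and orientation are left open in the text, the sketch reports every pairing and leaves any ranking of spacings to the caller.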


a plant when the transcription factor binds to the regulating region. As shown in Table 1, the AP2 and B3 domains of the transcription factors within the G867 clade are at least 75% (for the AP2 domain) and 69% (for the B3 domain) identical to the corresponding domains of G867, and all four of these transcription factors, which rely on the binding specificity of their conserved AP2 and B3 domains, have very similar or identical functions in plants, conferring increased abiotic, including osmotic, stress tolerance when overexpressed.

[0142] Therefore, the invention provides polynucleotides comprising: Arabidopsis SEQ ID NOs:1, 3, 5, 7, and fragments thereof, and non-Arabidopsis sequences SEQ ID NOs:11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, paralogs, orthologs, equivalogs, and fragments thereof. The invention also provides polypeptides and the polynucleotides that encode them, said polypeptides comprising: Arabidopsis SEQ ID NOs:2, 4, 6, 8, and fragments thereof; and non-Arabidopsis SEQ ID NOs:10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 53, paralogs, orthologs, equivalogs, and fragments thereof. A number of these polynucleotides have been shown to have a strong association with osmotic stress tolerance, in that plants that overexpress these sequences are more tolerant to these stresses. The invention also encompasses a complement of the polynucleotides. The polynucleotides are useful for screening libraries of molecules or compounds for specific binding and for creating transgenic plants having increased osmotic stress tolerance.

[0143] A number of the polynucleotides of the invention have been, and the remainder of the polynucleotides of the invention may be, ectopically expressed in overexpressor or knockout plants and the changes in the characteristic(s) or trait(s) of the plants observed. Therefore, the polynucleotides and polypeptides can be employed to improve the characteristics of plants.

[0144] The polynucleotides are particularly useful when they are hybridizable array elements in a microarray. Such a microarray can be employed to monitor the expression of genes that are differentially expressed in response to osmotic stresses. The microarray can be used in large scale genetic or gene expression analysis of a large number of polynucleotides, or in the diagnosis of osmotic stress before phenotypic symptoms are evident. Furthermore, the microarray can be employed to investigate cellular responses, such as cell proliferation, transformation, and the like.

[0145] When the polynucleotides of the invention are used as hybridizable array elements in a microarray, the array elements are organized in an ordered fashion so that each element is present at a specified location on the substrate. Because the array elements are at specified locations on the substrate, the hybridization patterns and intensities (which together create a unique expression profile) can be interpreted in terms of expression levels of particular genes and can be correlated with a particular stress, pathology, or treatment.

[0146] The invention also entails an agronomic composition comprising a polynucleotide of the invention in conjunction with a suitable carrier and a method for altering a plant's trait using the composition.

[0147] The invention also encompasses transcription factor polypeptides that comprise the DML motif, which, in the case of G867, is HSKSEIVDMLRKHTYNEELEQS (SEQ ID NO:64), or a motif that has 71% or greater identity to the DML motif of G867, and having substantially similar activity with that of SEQ ID NO:2.

[0148] Identification of Motifs Unique to G867 Dicot Orthologs

[0149] Arabidopsis sequences thought to be paralogous or otherwise highly related evolutionarily to G867 were aligned using Clustal X (version 1.81, June 2000). Additionally, by BLASTP analysis of proprietary and public databases with protein sequences of this set, additional sequences were identified with a high degree of sequence relatedness to G867. A number of these sequences, in addition to G867, are known to enhance abiotic stress tolerance, including G9 (SEQ ID NO:3 and 4), G993 (SEQ ID NO:5 and 6), and G1930 (SEQ ID NO:7 and 8). These sequences were then aligned again, and a neighbor-joining algorithm used to generate a phylogenetic tree, using Clustal X v1.81's phylogenetic capabilities. In this alignment, G867 and its paralogs G9, G993 and G1930 appeared in a clade along with two soybean sequences and several rice sequences. Based on the utility of the Arabidopsis sequences, as noted below, and the evolutionary history revealed by analysis of the phylogenetic tree (that the last common ancestor of the monocots and the eudicots had only one gene corresponding to transcription factors of the present invention, which functioned in abiotic stress tolerance), transcription factors of the G867 clade comprise a number of genes involved in the control of abiotic stress tolerance.

[0150] Examination of the alignment of only those sequences in the G867 clade (having monocot and dicot subnodes) indicates 1) a high degree of conservation of the AP2 domains in all members of the clade; 2) a high degree of conservation of the B3 domains in all members of the clade; and 3) a high degree of conservation of an additional motif, the DML motif, found between the AP2 and B3 domains in all members of the clade: H/R S K Xa E/G I/V V D M L R K/R H T Y Xa E/D/N E L/F Xa Q/H S/N/R/G (where Xa is any amino acid), constituting positions 135-152 in G867, SEQ ID NO:64. As a conserved motif found in G867 and its paralogs, the DML motif was used to identify additional orthologs of SEQ ID NO:2. A significant number of sequences were found that had a minimum of 71% identity to the 22 residue DML motif of G867; a number of these motifs are shown in Table 2.

[0151] Upon translation of these nucleotide sequences in a frame that provided the identified conserved motif, all the resulting protein sequences were found to have either a conserved AP2 domain before the DML motif, or a B3 domain after the DML motif (i.e., in BU024575, BQ405698, BF424857, BZ458719, AP002913, and AX654438). The protein sequences having conserved AP2 and/or B3 domains in the expected location were aligned with the previously aligned set of AP2 and B3 sequences, and a neighbor-joining algorithm was used to generate a phylogenetic tree, as described above. In this tree, the additional sequences identified through the DML motif all were found within the G867 clade identified previously, indicating that the DML motif was successfully used to identify new orthologs of G867, listed in Table 2.
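
The 22-position DML consensus stated above (H/R S K Xa E/G I/V V D M L R K/R H T Y Xa E/D/N E L/F Xa Q/H S/N/R/G) can be written as a regular expression and used to scan a translated sequence. The following Python sketch is offered as an illustration only; the names `DML_CONSENSUS` and `find_dml_motifs` are hypothetical, and the flanking residues in the example are invented.

```python
import re

# One character class per consensus position; "." stands for Xa (any residue).
DML_CONSENSUS = re.compile(
    r"[HR]SK.[EG][IV]VDMLR[KR]HTY.[EDN]E[LF].[QH][SNRG]"
)


def find_dml_motifs(protein):
    """Return (1-based start position, matched subsequence) for each
    occurrence of the DML consensus in a protein sequence."""
    return [
        (match.start() + 1, match.group())
        for match in DML_CONSENSUS.finditer(protein.upper())
    ]


if __name__ == "__main__":
    # G867 DML motif (SEQ ID NO:64) embedded in invented flanking residues.
    example = "MAAA" + "HSKSEIVDMLRKHTYNEELEQS" + "GGG"
    print(find_dml_motifs(example))
```

A strict consensus match of this kind is narrower than the 71% identity threshold used in the text, so it would be expected to miss motifs that diverge at a constrained position.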

TABLE 2. Representative ortholog sequences identified using conservation to the DML motif

SEQ ID NO:  Identifier or Accession No.   Species                                   Subsequence               Identity (%) with the DML motif of G867
2           G867                          Arabidopsis thaliana                      HSKSEIVDMLRKHTYNEELEQS    100%
28          G3455                         Glycine max                               HSKSEIVDMLRKHTYNDELEQS    95%
            BU024575                      Helianthus annuus                         HSKSEIVDMLRKHTYNDELEQS    95%
            BQ137035                      Medicago truncatula                       HSKSEIVDMLRKHTYNDELEQS    95%
            AV412541                      Lotus japonicus                           HSKSEIVDMLRKHTYNDELEQS    95%
8           G1930                         Arabidopsis thaliana                      HSKSEIVDMLRKHTYKEELDQR    90%
18          G3451                         Glycine max                               HSKPEIVDMLRKHTYNDELEQS    90%
42          BZ458719                      Brassica oleracea                         HSKYEIVDMLRKHTYKEELEQR    90%
            BU871082                      Populus balsamifera subsp. trichocarpa    HSKAEIVDMLRKHTYNDELEQS    90%
            BG524914                      Stevia rebaudiana                         HSKAEIVDMLRKHTYNDELEQS    90%
            BQ405698                      Gossypium arboreum                        HSKAEIVDMLRKHTYNDELEQS    90%
            BF424857                      Glycine max                               HSKPEIVDMLRKHTYNDELEQS    90%
            CB686050                      Brassica napus                            HSKSGIVDMLRKHTYSEELEQS    90%
46          BU025988                      Helianthus annuus                         HSESEIVDMLRKHTYNDELEQS    90%
            BQ855250                      Lactuca sativa                            HSKAEIVDMLRKHTYNDELQQS    86%
            BM878902                      Ipomoea batatas                           HSKAEIVDMLRKHTYADELEQS    86%
            BG590382                      Solanum tuberosum                         HSKAEIVDMLRKHTYLDELEQS    86%
            BG124312                      Lycopersicon esculentum                   HSKAEIVDMLRKHTYIDELEQS    86%
26          G3454                         Glycine max                               HSKPEIVDMLRKHTYNDELEHS    86%
44          BQ971511                      Helianthus annuus                         HSKSEIVDMLRKHTYNDELEQS    86%
50          CC616336                      Zea mays                                  RSKAEVVDMLRKHTYGEELAHN    83%
4           G9                            Arabidopsis thaliana                      HSKAEIVDMLRKHTYADELEQN    81%
6           G993                          Arabidopsis thaliana                      HSKAEIVDMLRKHTYADEFEQS    81%
22          G3452                         Glycine max                               HSKFEIVDMLRKHTYDDELQQS    81%
24          G3453                         Glycine max                               HSKSEIVDMLRREHTYDNELQQS   81%
            G3388, AP002913               Oryza sativa (japonica cultivar-group)    HSKAEIVDMLRKHTYADELRQG    80%
            CA004137                      Hordeum vulgare subsp. vulgare            HSKAEIVDMLRKHTYDDELRQG    80%
32          G3389                         Oryza sativa (japonica cultivar-group)    HSKAEVVDMLRKHTYDDELQQG    76%
34          G3390                         Oryza sativa (japonica cultivar-group)    RSKAEVVDMLRKHTYLEELTQN    76%
36          G3391                         Oryza sativa (japonica cultivar-group)    RSKAEVVDMLRKHTYFDELAQS    76%
40          G3433                         Zea mays                                  RSKAEVVDMLRKHTYGEELAQN    76%
            AX654438                      Oryza sativa                              HSKAEVVDMLRKHTYDDELQQG    76%
52          AAAA01000997                  Oryza sativa                              RSKAEVVDMLRKHTYLEELTQN    76%
48          BT009310                      Triticum aestivum                         RSKAEVVDMLRKHTYPDELAQY    75%
38          G3432                         Zea mays                                  RSKAEVVDMLRKHTYFDELAQN    71%

[0152] Producing Polypeptides

[0153] The polynucleotides of the invention include sequences that encode transcription factors and transcription factor homolog polypeptides and sequences complementary thereto, as well as unique fragments of coding sequence, or sequence complementary thereto. Such polynucleotides can be, for example, DNA or RNA, the latter including mRNA, cRNA, synthetic RNA, genomic DNA, cDNA, synthetic DNA, oligonucleotides, etc. The polynucleotides are either double-stranded or single-stranded, and include either, or both sense (i.e., coding) sequences and antisense (i.e., non-coding, complementary) sequences. The polynucleotides include the coding sequence of a transcription factor, or transcription factor homolog polypeptide, in isolation, in combination with additional coding sequences (e.g., a purification tag, a localization signal, as a fusion-protein, as a pre-protein, or the like), in combination with non-coding sequences (for example, introns or inteins, regulatory elements such as promoters, enhancers, terminators, and the like), and/or in a vector or host environment in which the polynucleotide encoding a transcription factor or transcription factor homolog polypeptide is an endogenous or exogenous gene.

[0154] A variety of methods exist for producing the polynucleotides of the invention. Procedures for identifying and isolating DNA clones are well known to those of skill in the art, and are described in, for example, Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology, Vol. 152, Academic Press, Inc., San Diego, Calif. ("Berger"); Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989 ("Sambrook"); and Current Protocols in Molecular Biology, Ausubel et al. eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (supplemented through 2000) ("Ausubel").

[0155] Alternatively, polynucleotides of the invention can be produced by a variety of in vitro amplification methods adapted to the present invention by appropriate selection of specific or degenerate primers. Examples of protocols sufficient to direct persons of skill through in vitro amplification methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qbeta-replicase amplification and other RNA polymerase mediated techniques (for example, NASBA), e.g., for the production of the homologous nucleic acids of the invention, are found in Berger (supra), Sambrook (supra), and Ausubel (supra), as well as Mullis et al. (1987) PCR Protocols: A Guide to Methods and Applications (Innis et al. eds), Academic Press Inc., San Diego, Calif. (1990) ("Innis"). Improved methods for cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. Improved methods for amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369:684-685 and the references cited therein, in which PCR amplicons of up to 40 kb are generated. One of skill will appreciate that essentially any RNA can be converted into a double stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, e.g., Ausubel, Sambrook and Berger, all supra.

[0156] Alternatively, polynucleotides and oligonucleotides of the invention can be assembled from fragments produced by solid-phase synthesis methods. Typically, fragments of up to approximately 100 bases are individually synthesized and then enzymatically or chemically ligated to produce a desired sequence, e.g., a polynucleotide encoding all or part of a transcription factor. For example, chemical synthesis using the phosphoramidite method is described, e.g., by Beaucage et al. (1981) Tetrahedron Letters 22:1859-1869; and Matthes et al. (1984) EMBO J. 3:801-805. According to such methods, oligonucleotides are synthesized, purified, annealed to their complementary strand, ligated and then optionally cloned into suitable vectors. And if so desired, the polynucleotides and polypeptides of the invention can be custom ordered from any of a number of commercial suppliers.

[0157] Homologous Sequences

[0158] Sequences homologous, i.e., that share significant sequence identity or similarity, to those provided in the Sequence Listing, derived from Arabidopsis thaliana or from other plants of choice, are also an aspect of the invention. Homologous sequences can be derived from any plant including monocots and dicots and in particular agriculturally important plant species, including but not limited to, crops such as soybean, wheat, corn (maize), potato, cotton, rice, rape, oilseed rape (including canola), sunflower, alfalfa, clover, sugarcane, and turf; or fruits or fruit trees, vegetables such as banana, blackberry, blueberry, strawberry, and raspberry, cantaloupe, carrot, cauliflower, coffee, cucumber, eggplant, grapes, honeydew, lettuce, mango, melon, onion, papaya, peas, peppers, pineapple, pumpkin, spinach, squash, sweet corn, tobacco, tomato, tomatillo, watermelon, rosaceous fruits (such as apple, peach, pear, cherry and plum) and brassicas (such as broccoli, cabbage, cauliflower, Brussels sprouts, and kohlrabi). Other crops, including fruits and vegetables, whose phenotype can be changed and which comprise homologous sequences include barley, rye, millet, sorghum, currant, avocado, citrus fruits such as oranges, lemons, grapefruit and tangerines, artichoke, cherries, nuts such as the walnut and peanut, endive, leek; roots such as arrowroot, beet, cassava, turnip, radish, yam, and sweet potato; and beans. The homologous sequences may also be derived from woody species, such as pine, poplar and eucalyptus, or mint or other labiates. In addition, homologous sequences may be derived from plants that are evolutionarily related to crop plants, but which may not have yet been used as crop plants. Examples include deadly nightshade (Atropa belladonna), related to tomato; jimson weed (Datura stramonium), related to peyote; and teosinte (Zea species), related to corn (maize).

[0159] Orthologs and Paralogs

[0160] Homologous sequences as described above can comprise orthologous or paralogous sequences. Several different methods are known by those of skill in the art for identifying and defining these functionally homologous sequences. Three general methods for defining orthologs and paralogs are described; an ortholog, paralog or homolog may be identified by one or more of the methods described below.

[0161] Orthologs and paralogs are evolutionarily related genes that have similar sequence and similar functions. Orthologs are structurally related genes in different species that are derived by a speciation event. Paralogs are structurally related genes within a single species that are derived by a duplication event.

[0162] Within a single plant species, gene duplication may cause two copies of a particular gene, giving rise to two or more genes with similar sequence and often similar function known as paralogs. A paralog is therefore a similar gene formed by duplication within the same species. Paralogs typically cluster together or in the same clade (a group of similar genes) when a gene family phylogeny is analyzed using programs such as CLUSTAL (Thompson et al. (1994) Nucleic Acids Res. 22:4673-4680; Higgins et al. (1996) Methods Enzymol. 266:383-402). Groups of similar genes can also be identified with pair-wise BLAST analysis (Feng and Doolittle (1987) J. Mol. Evol. 25:351-360). For example, a clade of very similar MADS domain transcription factors from Arabidopsis all share a common function in flowering time (Ratcliffe et al. (2001) Plant Physiol. 126:122-132), and a group of very similar AP2 domain transcription factors from Arabidopsis are involved in tolerance of plants to freezing (Gilmour et al. (1998) Plant J. 16:433-442). Analysis of groups of similar genes with similar function that fall within one clade can yield sub-sequences that are particular to the clade. These sub-sequences, known as consensus sequences, can not only be used to define the sequences within each clade, but define the functions of these genes; genes within a clade may contain paralogous sequences, or orthologous sequences that share the same function (see also, for example, Mount (2001), in Bioinformatics: Sequence and Genome Analysis, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., page 543).

[0163] Speciation, the production of new species from a parental species, can also give rise to two or more genes with similar sequence and similar function. These genes, termed orthologs, often have an identical function within their host plants and are often interchangeable between species without losing function. Because plants have common ancestors, many genes in any plant species will have a corresponding orthologous gene in another plant species. Once a phylogenic tree for a gene family of one species has been constructed using a program such as CLUSTAL (Thompson et al. (1994) Nucleic Acids Res. 22:4673-4680; Higgins et al. (1996) supra) potential orthologous sequences can be

placed into the phylogenetic tree and their relationship to genes from the species of interest can be determined. Orthologous sequences can also be identified by a reciprocal BLAST strategy. Once an orthologous sequence has been identified, the function of the ortholog can be deduced from the identified function of the reference sequence.

[0164] Transcription factor gene sequences are conserved across diverse eukaryotic species lines (Goodrich et al. (1993) Cell 75:519-530; Lin et al. (1991) Nature 353:569-571; Sadowski et al. (1988) Nature 335:563-564). Plants are no exception to this observation; diverse plant species possess transcription factors that have similar sequences and functions.

[0165] Orthologous genes from different organisms have highly conserved functions, and very often essentially identical functions (Lee et al. (2002) Genome Res. 12:493-502; Remm et al. (2001) J. Mol. Biol. 314:1041-1052). Paralogous genes, which have diverged through gene duplication, may retain similar functions of the encoded proteins. In such cases, paralogs can be used interchangeably with respect to certain embodiments of the instant invention (for example, transgenic expression of a coding sequence). An example of such highly related paralogs is the CBF family, with four well-defined members in Arabidopsis (SEQ ID NOs:54, 56, 58, and GenBank accession number AB015478) and at least one ortholog in Brassica napus (SEQ ID NO:60), all of which control pathways involved in both freezing and drought stress (Gilmour et al. (1998) Plant J. 16:433-442; Jaglo et al. (1998) Plant Physiol. 127:910-917).

[0166] The following references represent a small sampling of the many studies that demonstrate that conserved transcription factor genes from diverse species are likely to function similarly (i.e., regulate similar target sequences and control the same traits), and that transcription factors may be transformed into diverse species to confer or improve traits.

[0167] (1) The Arabidopsis NPR1 gene regulates systemic acquired resistance (SAR) (Cao et al. (1997) Cell 88:57-63); over-expression of NPR1 leads to enhanced resistance in Arabidopsis. When either Arabidopsis NPR1 or the rice NPR1 ortholog was overexpressed in rice (which, as a monocot, is diverse from Arabidopsis) and the plants were challenged with the rice bacterial blight pathogen Xanthomonas oryzae pv. oryzae, the transgenic plants displayed enhanced resistance (Chern et al. (2001) Plant J. 27:101-113). NPR1 acts through activation of expression of transcription factor genes, such as TGA2 (Fan and Dong (2002) Plant Cell 14:1377-1389).

[0168] (2) E2F genes are involved in transcription of plant genes for proliferating cell nuclear antigen (PCNA). Plant E2Fs share a high degree of similarity in amino acid sequence between monocots and dicots, and are even similar to the conserved domains of the animal E2Fs. Such conservation indicates a functional similarity between plant and animal E2Fs. E2F transcription factors that regulate meristem development act through common cis-elements, and regulate related (PCNA) genes (Kosugi and Ohashi (2002) Plant J. 29:45-59).

[0169] (3) The ABI5 gene (ABA insensitive 5) encodes a basic leucine zipper factor required for ABA response in the seed and vegetative tissues. Co-transformation experiments with ABI5 cDNA constructs in rice protoplasts resulted in specific transactivation of the ABA-inducible wheat, Arabidopsis, bean, and barley promoters. These results demonstrate that sequentially similar ABI5 transcription factors are key targets of a conserved ABA signaling pathway in diverse plants (Gampala et al. (2001) J. Biol. Chem. 277:1689-1694).

[0170] (4) Sequences of three Arabidopsis GAMYB-like genes were obtained on the basis of sequence similarity to GAMYB genes from barley, rice, and L. temulentum. These three Arabidopsis genes were determined to encode transcription factors (AtMYB33, AtMYB65, and AtMYB101) and could substitute for a barley GAMYB and control alpha-amylase expression (Gocal et al. (2001) Plant Physiol. 127:1682-1693).

[0171] (5) The floral control gene LEAFY from Arabidopsis can dramatically accelerate flowering in numerous dicotyledonous plants. Constitutive expression of Arabidopsis LEAFY also caused early flowering in transgenic rice (a monocot), with a heading date that was 26-34 days earlier than that of wild-type plants. These observations indicate that floral regulatory genes from Arabidopsis are useful tools for heading date improvement in cereal crops (He et al. (2000) Transgenic Res. 9:223-227).

[0172] (6) Bioactive gibberellins (GAs) are essential endogenous regulators of plant growth. GA signaling tends to be conserved across the plant kingdom. GA signaling is mediated via GAI, a nuclear member of the GRAS family of plant transcription factors. Arabidopsis GAI has been shown to function in rice to inhibit gibberellin response pathways (Fu et al. (2001) Plant Cell 13:1791-1802).

[0173] (7) The Arabidopsis gene SUPERMAN (SUP) encodes a putative transcription factor that maintains the boundary between stamens and carpels. By over-expressing Arabidopsis SUP in rice, the effect of the gene's presence on whorl boundaries was shown to be conserved. This demonstrated that SUP is a conserved regulator of floral whorl boundaries and affects cell proliferation (Nandi et al. (2000) Curr. Biol. 10:215-218).

[0174] (8) Maize, petunia and Arabidopsis myb transcription factors that regulate flavonoid biosynthesis are very genetically similar and affect the same trait in their native species; therefore, sequence and function of these myb transcription factors correlate with each other in these diverse species (Borevitz et al. (2000) Plant Cell 12:2383-2394).

[0175] (9) Wheat reduced height-1 (Rht-B1/Rht-D1) and maize dwarf-8 (d8) genes are orthologs of the Arabidopsis gibberellin insensitive (GAI) gene. Both of these genes have been used to produce dwarf grain varieties that have improved grain yield. These genes encode proteins that resemble nuclear transcription factors and contain an SH2-like domain, indicating that phosphotyrosine may participate in gibberellin signaling. Transgenic rice plants containing a mutant GAI allele from Arabidopsis have been shown to produce reduced responses to gibberellin and are dwarfed, indicating that mutant GAI orthologs could be used to increase yield in a wide range of crop species (Peng et al. (1999) Nature 400:256-261).

[0176] Transcription factors that are homologous to the listed sequences will typically share at least about 75% and 69% amino acid sequence identity in the AP2 and B3 domains, respectively. More closely related transcription factors can share at least about 79% or about 90% or about 95% or about 98% or more sequence identity with the listed sequences, or with the listed sequences but excluding or outside a known consensus sequence or consensus DNA binding site, or with the listed sequences excluding one or all conserved domains. Factors that are most closely related to the listed sequences share, e.g., at least about 85%, about 90% or about 95% or more sequence identity to the listed sequences, or to the listed sequences but excluding or outside a known consensus sequence or consensus DNA binding site or outside one or all conserved domains. At the nucleotide level, the sequences will typically share at least about 40% nucleotide sequence identity, preferably at least about 50%, about 60%, about 70% or about 80% sequence identity, and more preferably about 85%, about 90%, about 95% or about 97% or more sequence identity to one or more of the listed sequences, or to a listed sequence but excluding or outside a known consensus sequence or consensus DNA binding site, or outside one or all conserved domains. The degeneracy of the genetic code enables major variations in the nucleotide sequence of a polynucleotide while maintaining the amino acid sequence of the encoded protein. AP2 domains within the AP2 transcription factor family may exhibit a higher degree of sequence homology, such as at least 77% amino acid sequence identity including conservative substitutions, and preferably at least 80% sequence

(see Shpaer (1997) Methods Mol. Biol. 70:173-187). Also, the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences. An alternative search strategy uses MPSRCH software, which runs on a MASPAR computer. MPSRCH uses a Smith-Waterman algorithm to score sequences on a massively parallel computer. This approach improves ability to pick up distantly related matches, and is especially tolerant of small gaps and nucleotide sequence errors. Nucleic acid-encoded amino acid sequences can be used to search both protein and DNA databases.

[0179] The percentage similarity between two polypeptide sequences, e.g., sequence A and sequence B, is calculated by dividing the length of sequence A, minus the number of gap residues in sequence A, minus the number of gap residues in sequence B, into the sum of the residue matches between sequence A and sequence B, times one hundred. Gaps of low or of no similarity between the two amino acid sequences are not included in determining percentage similarity. Percent identity between polynucleotide sequences can also be counted or calculated by other methods known in the art, e.g., the Jotun Hein method (see, e.g., Hein (1990) Methods Enzymol. 183:626-645). Identity between sequences can also be determined by other methods known in the art, e.g., by varying hybridization conditions (see U.S. patent application Ser. No. 20010010913).
identity, and more preferably at least 85%, or at least about 0180 Thus, the invention provides methods for identify 86%, or at least about 87%, or at least about 88%, or at least ing a Sequence Similar or paralogous or orthologous or about 90%, or at least about 95%, or at least about 98% homologous to one or more polynucleotides as noted herein, Sequence identity. Transcription factors that are homologous or one or more target polypeptides encoded by the poly to the listed Sequences should share at least 30%, or at least nucleotides, or otherwise noted herein and may include about 60%, or at least about 75%, or at least about 80%, or linking or associating a given plant phenotype or gene at least about 90%, or at least about 95% amino acid function with a sequence. In the methods, a Sequence Sequence identity over the entire length of the polypeptide or database is provided (locally or across an internet or intra the homolog. net) and a query is made against the Sequence database using 0177 Percent identity can be determined electronically, the relevant Sequences herein and associated plant pheno e.g., by using the MEGALIGN program (DNASTAR, Inc. types or gene functions. Madison, Wis.). The MEGALIGN program can create align ments between two or more Sequences according to different 0181. In addition, one or more polynucleotide sequences methods, for example, the clustal method. (See, for example, or one or more polypeptides encoded by the polynucleotide Higgins and Sharp (1988) Gene 73:237-244.) The clustal Sequences may be used to Search against a BLOCKS (Bai algorithm groups Sequences into clusters by examining the roch et al. (1997) Nucleic Acids Res. 25:217-221), PFAM, distances between all pairs. The clusters are aligned pairwise and other databaseS which contain previously identified and and then in groups. Other alignment algorithms or programs annotated motifs, Sequences and gene functions. 
Methods may be used, including FASTA, BLAST, or ENTREZ, that Search for primary Sequence patterns with Secondary FASTA and BLAST, and which may be used to calculate structure gap penalties (Smith et al. (1992) Protein Engi percent similarity. These are available as a part of the GCG neering 5:35-51) as well as algorithms such as Basic Local Sequence analysis package (University of Wisconsin, Madi Alignment Search Tool (BLAST: Altschul (1993) J. Mol. Son, Wis.), and can be used with or without default Settings. Evol. 36:290-300; Altschul et al. (1990) supra), BLOCKS ENTREZ is available through the National Center for Bio (Henikoff and Henikoff (1991) Nucleic Acids Res. 19:6565 technology Information. In one embodiment, the percent 6572), Hidden Markov Models (HMM; Eddy (1996) Curr. identity of two sequences can be determined by the GCG Opin. Str. Biol. 6:361-365; Sonnhammer et al. (1997) Pro teins 28:405-420), and the like, can be used to manipulate program with a gap Weight of 1, e.g., each amino acid gap and analyze polynucleotide and polypeptide Sequences is weighted as if it were a single amino acid or nucleotide encoded by polynucleotides. These databases, algorithms mismatch between the two sequences (see U.S. Pat. No. and other methods are well known in the art and are 6,262,333). described in Ausubel et al. (1997; Short Protocols in 0178. Other techniques for alignment are described in Molecular Biology, John Wiley & Sons, New York, N.Y., Methods in Enzymology, vol. 266, Computer Methods for unit 7.7) and in Meyers (1995; Molecular Biology and Macromolecular Sequence Analysis (1996), ed. Doolittle, Biotechnology, Wiley VCH, New York, N.Y., p. 856-853). Academic Press, Inc., San Diego, Calif., USA. Preferably, an alignment program that permits gaps in the Sequence is 0182. A further method for identifying or confirming that utilized to align the Sequences. 
The Smith-Waterman is one Specific homologous Sequences control the same function is type of algorithm that permits gaps in Sequence alignments by comparison of the transcript profile(s) obtained upon US 2004/0098764 A1 May 20, 2004

overexpression or knockout of two or more related transcription factors. Since transcript profiles are diagnostic for specific cellular states, one skilled in the art will appreciate that genes that have a highly similar transcript profile (e.g., with greater than 50% regulated transcripts in common, more preferably with greater than 70% regulated transcripts in common, most preferably with greater than 90% regulated transcripts in common) will have highly similar functions. Fowler et al. (2002, Plant Cell, 14:1675-79) have shown that three paralogous AP2 family genes (CBF1, CBF2 and CBF3), each of which is induced upon cold treatment, and each of which can condition improved freezing tolerance, have highly similar transcript profiles. Once a transcription factor has been shown to provide a specific function, its transcript profile becomes a diagnostic tool to determine whether putative paralogs or orthologs have the same function.

[0183] Furthermore, methods using manual alignment of sequences similar or homologous to one or more polynucleotide sequences or one or more polypeptides encoded by the polynucleotide sequences may be used to identify regions of similarity and AP2 binding domains. Such manual methods are well known to those of skill in the art and can include, for example, comparisons of tertiary structure between a polypeptide sequence encoded by a polynucleotide which comprises a known function with a polypeptide sequence encoded by a polynucleotide sequence which has a function not yet determined. Such examples of tertiary structure may comprise predicted alpha helices, beta-sheets, amphipathic helices, leucine zipper motifs, zinc finger motifs, proline-rich regions, cysteine repeat motifs, and the like.

[0184] Orthologs and paralogs of presently disclosed transcription factors may be cloned using compositions provided by the present invention according to methods well known in the art. cDNAs can be cloned using mRNA from a plant cell or tissue that expresses one of the present transcription factors. Appropriate mRNA sources may be identified by interrogating Northern blots with probes designed from the present transcription factor sequences, after which a library is prepared from the mRNA obtained from a positive cell or tissue. Transcription factor-encoding cDNA is then isolated using, for example, PCR, using primers designed from a presently disclosed transcription factor gene sequence, or by probing with a partial or complete cDNA or with one or more sets of degenerate probes based on the disclosed sequences. The cDNA library may be used to transform plant cells. Expression of the cDNAs of interest is detected using, for example, methods disclosed herein such as microarrays, Northern blots, quantitative PCR, or any other technique for monitoring changes in expression. Genomic clones may be isolated using similar techniques to those.

[0185] Identifying Polynucleotides or Nucleic Acids by Hybridization

[0186] Polynucleotides homologous to the sequences illustrated in the Sequence Listing and tables can be identified, e.g., by hybridization to each other under stringent or under highly stringent conditions. Single stranded polynucleotides hybridize when they associate based on a variety of well characterized physical-chemical forces, such as hydrogen bonding, solvent exclusion, base stacking and the like. The stringency of a hybridization reflects the degree of sequence identity of the nucleic acids involved, such that the higher the stringency, the more similar are the two polynucleotide strands. Stringency is influenced by a variety of factors, including temperature, salt concentration and composition, organic and non-organic additives, solvents, etc. present in both the hybridization and wash solutions and incubations (and number thereof), as described in more detail in the references cited above.

[0187] Encompassed by the invention are polynucleotide sequences that are capable of hybridizing to the claimed polynucleotide sequences, including any of the transcription factor polynucleotides within the Sequence Listing, and fragments thereof under various conditions of stringency (See, for example, Wahl and Berger (1987) Methods Enzymol. 152:399-407; and Kimmel (1987) Methods Enzymol. 152:507-511). In addition to the nucleotide sequences in the Sequence Listing, full length cDNA, orthologs, and paralogs of the present nucleotide sequences may be identified and isolated using well-known methods. The cDNA libraries, orthologs, and paralogs of the present nucleotide sequences may be screened using hybridization methods to determine their utility as hybridization target or amplification probes.

[0188] With regard to hybridization, conditions that are highly stringent, and means for achieving them, are well known in the art. See, for example, Sambrook et al. (1989) "Molecular Cloning: A Laboratory Manual" (2nd ed., Cold Spring Harbor Laboratory); Berger and Kimmel, eds., (1987) "Guide to Molecular Cloning Techniques", In Methods in Enzymology: 152:467-469; and Anderson and Young (1985) "Quantitative Filter Hybridisation." In: Hames and Higgins, ed., Nucleic Acid Hybridisation, A Practical Approach. Oxford, IRL Press, 73-111.

[0189] Stability of DNA duplexes is affected by such factors as base composition, length, and degree of base pair mismatch. Hybridization conditions may be adjusted to allow DNAs of different sequence relatedness to hybridize. The melting temperature (Tm) is defined as the temperature when 50% of the duplex molecules have dissociated into their constituent single strands. The melting temperature of a perfectly matched duplex, where the hybridization buffer contains formamide as a denaturing agent, may be estimated by the following equations:

[0190] (I) DNA-DNA:

$$T_m(^\circ\mathrm{C}) = 81.5 + 16.6\,\log[\mathrm{Na}^+] + 0.41\,(\%\,\mathrm{G{+}C}) - 0.62\,(\%\,\text{formamide}) - 500/L$$

[0191] (II) DNA-RNA:

[0192] (III) RNA-RNA:

[0193] where L is the length of the duplex formed, [Na+] is the molar concentration of the sodium ion in the hybridization or washing solution, and % G+C is the percentage of (guanine+cytosine) bases in the hybrid. For imperfectly matched hybrids, the melting temperature is reduced by approximately 1° C. for each 1% mismatch.

[0194] Hybridization experiments are generally conducted in a buffer of pH between 6.8 and 7.4, although the rate of hybridization is nearly independent of pH at ionic strengths likely to be used in the hybridization buffer (Anderson et al.
(1985) supra). In addition, one or more of the following may be used to reduce non-specific hybridization: sonicated salmon sperm DNA or another non-complementary DNA, bovine serum albumin, sodium pyrophosphate, sodium dodecylsulfate (SDS), polyvinyl-pyrrolidone, ficoll and Denhardt's solution. Dextran sulfate and polyethylene glycol 6000 act to exclude DNA from solution, thus raising the effective probe DNA concentration and the hybridization signal within a given unit of time. In some instances, conditions of even greater stringency may be desirable or required to reduce non-specific and/or background hybridization. These conditions may be created with the use of higher temperature, lower ionic strength and higher concentration of a denaturing agent such as formamide.

[0195] Stringency conditions can be adjusted to screen for moderately similar fragments such as homologous sequences from distantly related organisms, or to highly similar fragments such as genes that duplicate functional enzymes from closely related organisms. The stringency can be adjusted either during the hybridization step or in the post-hybridization washes. Salt concentration, formamide concentration, hybridization temperature and probe lengths are variables that can be used to alter stringency (as described by the formula above). As a general guideline, high stringency is typically performed at Tm-5° C. to Tm-20° C., moderate stringency at Tm-20° C. to Tm-35° C. and low stringency at Tm-35° C. to Tm-50° C. for duplex >150 base pairs. Hybridization may be performed at low to moderate stringency (25-50° C. below Tm), followed by post-hybridization washes at increasing stringencies. Maximum rates of hybridization in solution are determined empirically to occur at Tm-25° C. for DNA-DNA duplex and Tm-15° C. for RNA-DNA duplex. Optionally, the degree of dissociation may be assessed after each wash step to determine the need for subsequent, higher stringency wash steps.

[0196] High stringency conditions may be used to select for nucleic acid sequences with high degrees of identity to the disclosed sequences. An example of stringent hybridization conditions obtained in a filter-based method such as a Southern or northern blot for hybridization of complementary nucleic acids that have more than 100 complementary residues is about 5° C. to 20° C. lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. Conditions used for hybridization may include about 0.02 M to about 0.15 M sodium chloride, about 0.5% to about 5% casein, about 0.02% SDS or about 0.1% N-laurylsarcosine, about 0.001 M to about 0.03 M sodium citrate, at hybridization temperatures between about 50° C. and about 70° C. More preferably, high stringency conditions are about 0.02 M sodium chloride, about 0.5% casein, about 0.02% SDS, about 0.001 M sodium citrate, at a temperature of about 50° C. Nucleic acid molecules that hybridize under stringent conditions will typically hybridize to a probe based on either the entire DNA molecule or selected portions, e.g., to a unique subsequence, of the DNA.

[0197] Stringent salt concentration will ordinarily be less than about 750 mM NaCl and 75 mM trisodium citrate. Increasingly stringent conditions may be obtained with less than about 500 mM NaCl and 50 mM trisodium citrate, to even greater stringency with less than about 250 mM NaCl and 25 mM trisodium citrate. Low stringency hybridization can be obtained in the absence of organic solvent, e.g., formamide, whereas high stringency hybridization may be obtained in the presence of at least about 35% formamide, and more preferably at least about 50% formamide. Stringent temperature conditions will ordinarily include temperatures of at least about 30° C., more preferably of at least about 37° C., and most preferably of at least about 42° C. with formamide present. Varying additional parameters, such as hybridization time, the concentration of detergent, e.g., sodium dodecyl sulfate (SDS) and ionic strength, are well known to those skilled in the art. Various levels of stringency are accomplished by combining these various conditions as needed.

[0198] The washing steps that follow hybridization may also vary in stringency; the post-hybridization wash steps primarily determine hybridization specificity, with the most critical factors being temperature and the ionic strength of the final wash solution. Wash stringency can be increased by decreasing salt concentration or by increasing temperature. Stringent salt concentration for the wash steps will preferably be less than about 30 mM NaCl and 3 mM trisodium citrate, and most preferably less than about 15 mM NaCl and 1.5 mM trisodium citrate.

[0199] Thus, hybridization and wash conditions that may be used to bind and remove polynucleotides with less than the desired homology to the nucleic acid sequences or their complements that encode the present transcription factors include, for example:

[0200] 6×SSC at 65° C.;

[0201] 50% formamide, 4×SSC at 42° C.; or

[0202] 0.5×SSC, 0.1% SDS at 65° C.;

[0203] with, for example, two wash steps of 10-30 minutes each. Useful variations on these conditions will be readily apparent to those skilled in the art.

[0204] A person of skill in the art would not expect substantial variation among polynucleotide species encompassed within the scope of the present invention because the highly stringent conditions set forth in the above formulae yield structurally similar polynucleotides.

[0205] If desired, one may employ wash steps of even greater stringency, including about 0.2×SSC, 0.1% SDS at 65° C. and washing twice, each wash step being about 30 min, or about 0.1×SSC, 0.1% SDS at 65° C. and washing twice for 30 min. The temperature for the wash solutions will ordinarily be at least about 25° C., and for greater stringency at least about 42° C. Hybridization stringency may be increased further by using the same conditions as in the hybridization steps, with the wash temperature raised about 3° C. to about 5° C., and stringency may be increased even further by using the same conditions except the wash temperature is raised about 6° C. to about 9° C. For identification of less closely related homologs, wash steps may be performed at a lower temperature, e.g., 50° C.

[0206] An example of a low stringency wash step employs a solution and conditions of at least 25° C. in 30 mM NaCl, 3 mM trisodium citrate, and 0.1% SDS over 30 min. Greater stringency may be obtained at 42° C. in 15 mM NaCl, with 1.5 mM trisodium citrate, and 0.1% SDS over 30 min. Even higher stringency wash conditions are obtained at 65° C.-68° C. in a solution of 15 mM NaCl, 1.5 mM trisodium citrate, and 0.1% SDS. Wash procedures will generally
employ at least two final wash Steps. Additional variations transcription factor homologs, using the methods described on these conditions will be readily apparent to those skilled above. The selected cDNAs can be confirmed by sequencing in the art (see, for example, U.S. patent application Ser. No. and enzymatic activity. 20010010913). 0211 Sequence Variations 0207 Stringency conditions can be selected such that an 0212. It will readily be appreciated by those of skill in the oligonucleotide that is perfectly complementary to the cod art, that any of a variety of polynucleotide Sequences are ing oligonucleotide hybridizes to the coding oligonucleotide capable of encoding the transcription factors and transcrip with at least about a 5-10x higher Signal to noise ratio than tion factor homolog polypeptides of the invention. Due to the ratio for hybridization of the perfectly complementary the degeneracy of the genetic code, many different poly oligonucleotide to a nucleic acid encoding a transcription nucleotides can encode identical and/or Substantially similar factor known as of the filing date of the application. It may polypeptides in addition to those Sequences illustrated in the be desirable to Select conditions for a particular assay Such Sequence Listing. Nucleic acids having a Sequence that that a higher Signal to noise ratio, that is, about 15x or more, differs from the Sequences shown in the Sequence Listing, or is obtained. Accordingly, a Subject nucleic acid will hybrid complementary Sequences, that encode functionally equiva ize to a unique coding oligonucleotide with at least a 2x or lent peptides (i.e., peptides having Some degree of equiva greater Signal to noise ratio as compared to hybridization of lent or similar biological activity) but differ in Sequence the coding oligonucleotide to a nucleic acid encoding known from the Sequence shown in the Sequence Listing due to polypeptide. 
The particular Signal will depend on the label degeneracy in the genetic code, are also within the Scope of used in the relevant assay, e.g., a fluorescent label, a calo the invention. rimetric label, a radioactive label, or the like. Labeled 0213 Altered polynucleotide sequences encoding hybridization or PCR probes for detecting related polynucle polypeptides include those Sequences with deletions, inser otide Sequences may be produced by oligolabeling, nick tions, or Substitutions of different nucleotides, resulting in a translation, end-labeling, or PCR amplification using a polynucleotide encoding a polypeptide with at least one labeled nucleotide. functional characteristic of the instant polypeptides. Included within this definition are polymorphisms which 0208 Encompassed by the invention are polynucleotide may or may not be readily detectable using a particular Sequences that are capable of hybridizing to the claimed oligonucleotide probe of the polynucleotide encoding the polynucleotide Sequences, and, in particular, to those shown instant polypeptides, and improper or unexpected hybrid in SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, ization to allelic variants, with a locus other than the normal 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, and chromosomal locus for the polynucleotide Sequence encod fragments thereof under various conditions of Stringency. ing the instant polypeptides. (See, e.g., Wahl and Berger (1987) Methods Enzymol. 152:399407; Kimmel (1987) Methods Enzymol. 152:507 0214) Allelic variant refers to any of two or more alter 511). Estimates of homology are provided by either DNA native forms of a gene occupying the Same chromosomal DNA or DNA-RNA hybridization under conditions of strin locus. Allelic variation arises naturally through mutation, gency as is well understood by those skilled in the art and may result in phenotypic polymorphism within popu (Hames and Higgins, Eds. (1985) Nucleic Acid Hybridisa lations. 
Gene mutations can be silent (i.e., no change in the tion, IRL Press, Oxford, U.K.). Stringency conditions can be encoded polypeptide) or may encode polypeptides having adjusted to Screen for moderately Similar fragments, Such as altered amino acid Sequence. The term allelic variant is also homologous Sequences from distantly related organisms, to used herein to denote a protein encoded by an allelic variant highly similar fragments, Such as genes that duplicate func of a gene. Splice variant refers to alternative forms of RNA tional enzymes from closely related organisms. Post-hybrid transcribed from a gene. Splice variation arises naturally ization washes determine Stringency conditions. through use of alternative splicing sites within a transcribed RNA molecule, or leSS commonly between Separately tran 0209) Identifying Polynucleotides or Nucleic Acids with scribed RNA molecules, and may result in several mRNAS Expression Libraries transcribed from the same gene. Splice variants may encode 0210. In addition to hybridization methods, transcription polypeptides having altered amino acid Sequence. The term factor homolog polypeptides can be obtained by Screening Splice variant is also used herein to denote a protein encoded an expression library using antibodies specific for one or by a splice variant of an mRNA transcribed from a gene. more transcription factors. With the provision herein of the 0215 Those skilled in the art would recognize that, for disclosed transcription factor, and transcription factor example, G867, SEQID NO:2, represents a single transcrip homolog nucleic acid sequences, the encoded polypeptide(s) tion factor; allelic variation and alternative Splicing may be can be expressed and purified in a heterologous expression expected to occur. Allelic variants of SEQ ID NO:1 can be System (e.g., E. 
coli) and used to raise antibodies (mono cloned by probing cDNA or genomic libraries from different clonal or polyclonal) specific for the polypeptide(s) in individual organisms according to Standard procedures. question. Antibodies can also be raised against Synthetic Allelic variants of the DNA sequence shown in SEQID NO: peptides derived from transcription factor, or transcription 1, including those containing Silent mutations and those in factor homolog, amino acid Sequences. Methods of raising which mutations result in amino acid Sequence changes, are antibodies are well known in the art and are described in within the Scope of the present invention, as are proteins Harlow and Lane (1988), Antibodies.A Laboratory Manual, which are allelic variants of SEQ ID NO:2. cDNAs gener Cold Spring Harbor Laboratory, N.Y. Such antibodies can ated from alternatively spliced mRNAS, which retain the then be used to Screen an expression library produced from properties of the transcription factor are included within the the plant from which it is desired to clone additional Scope of the present invention, as are polypeptides encoded US 2004/0098764 A1 May 20, 2004 24 by such cDNAS and mRNAs. Allelic variants and splice 0220 For example, substitutions, deletions and insertions variants of these Sequences can be cloned by probing cDNA introduced into the Sequences provided in the Sequence or genomic libraries from different individual organisms or Listing, are also envisioned by the invention. Such Sequence tissues according to Standard procedures known in the art modifications can be engineered into a Sequence by Site (see U.S. Pat. No. 6,388,064). directed mutagenesis (Wu (ed.) Methods Enzymol. (1993) 0216) Thus, in addition to the sequences set forth in the vol. 217, Academic Press) or the other methods noted below. 
Sequence Listing, the invention also encompasses related nucleic acid molecules that include allelic or splice variants of SEQ ID NO: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, and include sequences which are complementary to any of the above nucleotide sequences. Related nucleic acid molecules also include nucleotide sequences encoding a polypeptide comprising or consisting essentially of a substitution, modification, addition and/or deletion of one or more amino acid residues compared to the polypeptide as set forth in any of SEQ ID NO: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, or 53. Such related polypeptides may comprise, for example, additions and/or deletions of one or more N-linked or O-linked glycosylation sites, or an addition and/or a deletion of one or more cysteine residues.

[0217] For example, Table 3 illustrates, e.g., that the codons AGC, AGT, TCA, TCC, TCG, and TCT all encode the same amino acid: serine. Accordingly, at each position in the sequence where there is a codon encoding serine, any of the above trinucleotide sequences can be used without altering the encoded polypeptide.

TABLE 3

Amino acid                  Possible Codons
Alanine        Ala  A       GCA GCC GCG GCT
Cysteine       Cys  C       TGC TGT
Aspartic acid  Asp  D       GAC GAT
Glutamic acid  Glu  E       GAA GAG
Phenylalanine  Phe  F       TTC TTT
Glycine        Gly  G       GGA GGC GGG GGT
Histidine      His  H       CAC CAT
Isoleucine     Ile  I       ATA ATC ATT
Lysine         Lys  K       AAA AAG
Leucine        Leu  L       TTA TTG CTA CTC CTG CTT
Methionine     Met  M       ATG
Asparagine     Asn  N       AAC AAT
Proline        Pro  P       CCA CCC CCG CCT
Glutamine      Gln  Q       CAA CAG
Arginine       Arg  R       AGA AGG CGA CGC CGG CGT
Serine         Ser  S       AGC AGT TCA TCC TCG TCT
Threonine      Thr  T       ACA ACC ACG ACT
Valine         Val  V       GTA GTC GTG GTT
Tryptophan     Trp  W       TGG
Tyrosine       Tyr  Y       TAC TAT

[0218] Sequence alterations that do not change the amino acid sequence encoded by the polynucleotide are termed "silent" variations. With the exception of the codons ATG and TGG, encoding methionine and tryptophan, respectively, any of the possible codons for the same amino acid can be substituted by a variety of techniques, e.g., site-directed mutagenesis, available in the art. Accordingly, any and all such variations of a sequence selected from the above table are a feature of the invention.

[0219] In addition to silent variations, other conservative variations that alter one, or a few amino acid residues in the encoded polypeptide, can be made without altering the function of the polypeptide, these conservative variants are, likewise, a feature of the invention.

Amino acid substitutions are typically of single residues; insertions usually will be on the order of about from 1 to 10 amino acid residues; and deletions will range about from 1 to 30 residues. In preferred embodiments, deletions or insertions are made in adjacent pairs, e.g., a deletion of two residues or insertion of two residues. Substitutions, deletions, insertions or any combination thereof can be combined to arrive at a sequence. The mutations that are made in the polynucleotide encoding the transcription factor should not place the sequence out of reading frame and should not create complementary regions that could produce secondary mRNA structure. Preferably, the polypeptide encoded by the DNA performs the desired function.

[0221] Conservative substitutions are those in which at least one residue in the amino acid sequence has been removed and a different residue inserted in its place. Such substitutions generally are made in accordance with the Table 4 when it is desired to maintain the activity of the protein. Table 4 shows amino acids which can be substituted for an amino acid in a protein and which are typically regarded as conservative substitutions.

TABLE 4

Residue   Conservative Substitutions
Ala       Ser
Arg       Lys
Asn       Gln; His
Asp       Glu
Gln       Asn
Cys       Ser
Glu       Asp
Gly       Pro
His       Asn; Gln
Ile       Leu; Val
Leu       Ile; Val
Lys       Arg; Gln
Met       Leu; Ile
Phe       Met; Leu; Tyr
Ser       Thr; Gly
Thr       Ser; Val
Trp       Tyr
Tyr       Trp; Phe
Val       Ile; Leu

[0222] Similar substitutions are those in which at least one residue in the amino acid sequence has been removed and a different residue inserted in its place. Such substitutions generally are made in accordance with the Table 5 when it is desired to maintain the activity of the protein. Table 5 shows amino acids which can be substituted for an amino acid in a protein and which are typically regarded as structural and functional substitutions. For example, a residue in column 1 of Table 5 may be substituted with a residue in column 2; in addition, a residue in column 2 of Table 5 may be substituted with the residue of column 1.
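The codon degeneracy summarized in Table 3 can be illustrated with a short sketch. The code below is only an illustration under stated assumptions: the function names (`translate`, `silent_variants`) are hypothetical and do not come from the specification, the codon table restates the standard genetic code for sense codons only (stop codons are not handled), and the input is assumed to be an in-frame DNA coding sequence whose length is a multiple of three.

```python
from itertools import product

# Sense codons of the standard genetic code, keyed by one-letter amino acid
# code (the same information as Table 3). Stop codons are deliberately omitted.
CODONS = {
    "A": "GCA GCC GCG GCT", "C": "TGC TGT", "D": "GAC GAT", "E": "GAA GAG",
    "F": "TTC TTT", "G": "GGA GGC GGG GGT", "H": "CAC CAT", "I": "ATA ATC ATT",
    "K": "AAA AAG", "L": "TTA TTG CTA CTC CTG CTT", "M": "ATG", "N": "AAC AAT",
    "P": "CCA CCC CCG CCT", "Q": "CAA CAG", "R": "AGA AGG CGA CGC CGG CGT",
    "S": "AGC AGT TCA TCC TCG TCT", "T": "ACA ACC ACG ACT",
    "V": "GTA GTC GTG GTT", "W": "TGG", "Y": "TAC TAT",
}
SYNONYMS = {aa: codons.split() for aa, codons in CODONS.items()}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}


def translate(dna):
    """Translate an in-frame DNA sequence (sense codons only) to one-letter code."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def silent_variants(dna):
    """Yield every DNA sequence that encodes the same peptide as `dna`.

    Hypothetical helper: each codon is replaced by each of its synonyms,
    so the number of variants is the product of the synonym counts.
    """
    choices = [SYNONYMS[CODON_TO_AA[dna[i:i + 3]]] for i in range(0, len(dna), 3)]
    for combination in product(*choices):
        yield "".join(combination)


if __name__ == "__main__":
    # Met-Ser-Trp: only the serine codon can vary, because ATG and TGG
    # are the sole codons for methionine and tryptophan.
    for variant in silent_variants("ATGAGCTGG"):
        print(variant, translate(variant))
```

Because methionine and tryptophan each have a single codon, the example sequence has exactly as many silent variants as there are serine codons. The number of variants grows multiplicatively with sequence length, so exhaustive enumeration is practical only for short sequences.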

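As a further illustration, the conservative substitutions of Table 4 can be encoded as a lookup. This is a minimal sketch, not a method described in the specification: the names `CONSERVATIVE` and `is_conservative` are hypothetical, and the sketch assumes the table is read in one direction only (the residue in the first column replaced by a residue in the second), since Table 4 as printed is not symmetric.

```python
# Table 4 as a dictionary: original residue -> residues regarded as
# conservative replacements. Three-letter codes, as in the table.
CONSERVATIVE = {
    "Ala": {"Ser"}, "Arg": {"Lys"}, "Asn": {"Gln", "His"}, "Asp": {"Glu"},
    "Gln": {"Asn"}, "Cys": {"Ser"}, "Glu": {"Asp"}, "Gly": {"Pro"},
    "His": {"Asn", "Gln"}, "Ile": {"Leu", "Val"}, "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln"}, "Met": {"Leu", "Ile"},
    "Phe": {"Met", "Leu", "Tyr"}, "Ser": {"Thr", "Gly"},
    "Thr": {"Ser", "Val"}, "Trp": {"Tyr"}, "Tyr": {"Trp", "Phe"},
    "Val": {"Ile", "Leu"},
}


def is_conservative(original, replacement):
    """Return True if Table 4 lists `replacement` for `original`.

    Assumption: the table is applied in the listed direction only,
    so Ser -> Gly is accepted while Gly -> Ser is not.
    """
    return replacement in CONSERVATIVE.get(original, set())


if __name__ == "__main__":
    for pair in [("Ser", "Gly"), ("Gly", "Ser"), ("Ile", "Val")]:
        print(pair, is_conservative(*pair))
```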
TABLE 5

Residue   Similar Substitutions
Ala       Ser; Thr; Gly; Val; Leu; Ile
Arg       Lys; His; Gly
Asn       Gln; His; Gly; Ser; Thr
Asp       Glu; Ser; Thr
Gln       Asn; Ala
Cys       Ser; Gly
Glu       Asp
Gly       Pro; Arg
His       Asn; Gln; Tyr; Phe; Lys; Arg
Ile       Ala; Leu; Val; Gly; Met
Leu       Ala; Ile; Val; Gly; Met
Lys       Arg; His; Gln; Gly; Pro
Met       Leu; Ile; Phe
Phe       Met; Leu; Tyr; Trp; His; Val; Ala
Ser       Thr; Gly; Asp; Ala; Val; Ile; His
Thr       Ser; Val; Ala; Gly
Trp       Tyr; Phe; His
Tyr       Trp; Phe; His
Val       Ala; Ile; Leu; Gly; Thr; Ser; Glu

[0223] Substitutions that are less conservative than those in Table 5 can be selected by picking residues that differ more significantly in their effect on maintaining (a) the structure of the polypeptide backbone in the area of the substitution, for example, as a sheet or helical conformation, (b) the charge or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain. The substitutions which in general are expected to produce the greatest changes in protein properties will be those in which (a) a hydrophilic residue, e.g., seryl or threonyl, is substituted for (or by) a hydrophobic residue, e.g., leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an electropositive side chain, e.g., lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g., glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g., phenylalanine, is substituted for (or by) one not having a side chain, e.g., glycine.

[0224] Further Modifying Sequences of the Invention: Mutation/Forced Evolution

[0225] In addition to generating silent or conservative substitutions as noted above, the present invention optionally includes methods of modifying the sequences of the Sequence Listing. In the methods, nucleic acid or protein modification methods are used to alter the given sequences to produce new sequences and/or to chemically or enzymatically modify given sequences to change the properties of the nucleic acids or proteins.

[0226] Thus, in one embodiment, given nucleic acid sequences are modified, e.g., according to standard mutagenesis or artificial evolution methods to produce modified sequences. The modified sequences may be created using purified natural polynucleotides isolated from any organism or may be synthesized from purified compositions and chemicals using chemical means well known to those of skill in the art. For example, Ausubel, supra, provides additional details on mutagenesis methods. Artificial forced evolution methods are described, for example, by Stemmer (1994) Nature 370:389-391, Stemmer (1994) Proc. Natl. Acad. Sci. 91:10747-10751, and U.S. Pat. Nos. 5,811,238, 5,837,500, and 6,242,568. Methods for engineering synthetic transcription factors and other polypeptides are described, for example, by Zhang et al. (2000) J. Biol. Chem. 275:33850-33860, Liu et al. (2001) J. Biol. Chem. 276:11323-11334, and Isalan et al. (2001) Nature Biotechnol. 19:656-660. Many other mutation and evolution methods are also available and expected to be within the skill of the practitioner.

[0227] Similarly, chemical or enzymatic alteration of expressed nucleic acids and polypeptides can be performed by standard methods. For example, sequence can be modified by addition of lipids, sugars, peptides, organic or inorganic compounds, by the inclusion of modified nucleotides or amino acids, or the like. For example, protein modification techniques are illustrated in Ausubel, supra. Further details on chemical and enzymatic modifications can be found herein. These modification methods can be used to modify any given sequence, or to modify any sequence produced by the various mutation and artificial evolution modification methods noted herein.

[0228] Accordingly, the invention provides for modification of any given nucleic acid by mutation, evolution, chemical or enzymatic modification, or other available methods, as well as for the products produced by practicing such methods, e.g., using the sequences herein as a starting substrate for the various modification approaches.

[0229] For example, optimized coding sequence containing codons preferred by a particular prokaryotic or eukaryotic host can be used, e.g., to increase the rate of translation or to produce recombinant RNA transcripts having desirable properties, such as a longer half-life, as compared with transcripts produced using a non-optimized sequence. Translation stop codons can also be modified to reflect host preference. For example, preferred stop codons for Saccharomyces cerevisiae and mammals are TAA and TGA, respectively. The preferred stop codon for monocotyledonous plants is TGA, whereas insects and E. coli prefer to use TAA as the stop codon.

[0230] The polynucleotide sequences of the present invention can also be engineered in order to alter a coding sequence for a variety of reasons, including but not limited to, alterations which modify the sequence to facilitate cloning, processing and/or expression of the gene product. For example, alterations are optionally introduced using techniques which are well known in the art, e.g., site-directed mutagenesis, to insert new restriction sites, to alter glycosylation patterns, to change codon preference, to introduce splice sites, etc.

[0231] Furthermore, a fragment or domain derived from any of the polypeptides of the invention can be combined with domains derived from other transcription factors or synthetic domains to modify the biological activity of a transcription factor. For instance, a DNA-binding domain derived from a transcription factor of the invention can be combined with the activation domain of another transcription factor or with a synthetic activation domain. A transcription activation domain assists in initiating transcription from a DNA-binding site. Examples include the transcription activation region of VP16 or GAL4 (Moore et al. (1998) Proc. Natl. Acad. Sci. 95:376-381; Aoyama et al. (1995) Plant Cell 7:1773-1785), peptides derived from bacterial sequences (Ma and Ptashne (1987) Cell 51:113-119) and synthetic peptides (Giniger and Ptashne (1987) Nature 330:670-672).
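The host-specific stop codon preferences stated above can be sketched as a small lookup. This is an illustrative sketch only: the dictionary keys and the function name `set_preferred_stop` are hypothetical labels chosen for the example, and the input is assumed to be an in-frame coding sequence that already ends in one of the three standard stop codons.

```python
# Preferred stop codons by host, exactly as stated in the text above.
# The host labels used as keys are illustrative, not a controlled vocabulary.
PREFERRED_STOP = {
    "Saccharomyces cerevisiae": "TAA",
    "mammals": "TGA",
    "monocotyledonous plants": "TGA",
    "insects": "TAA",
    "E. coli": "TAA",
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def set_preferred_stop(cds, host):
    """Return `cds` with its terminal stop codon replaced by the host's preferred one.

    Hypothetical helper: the encoded polypeptide is unchanged because only
    the stop codon is swapped.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence does not end in a stop codon")
    return cds[:-3] + PREFERRED_STOP[host]


if __name__ == "__main__":
    print(set_preferred_stop("ATGGCTTAG", "mammals"))
    print(set_preferred_stop("ATGGCTTAG", "E. coli"))
```

Full codon optimization of the sense codons would additionally require a host codon-usage table, which the text does not provide and which is therefore left out of the sketch.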

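The four classes of substitution that paragraph [0223] expects to produce the greatest changes in protein properties can be expressed as a simple classifier. The sketch below is hypothetical and deliberately narrow: the residue sets contain only the examples named in that paragraph (introduced there by "e.g."), so they are not complete property classes, and the function name `radical_categories` is an assumption made for illustration.

```python
# Residue sets limited to the examples given in paragraph [0223].
HYDROPHILIC = {"Ser", "Thr"}
HYDROPHOBIC = {"Leu", "Ile", "Phe", "Val", "Ala"}
ELECTROPOSITIVE = {"Lys", "Arg", "His"}
ELECTRONEGATIVE = {"Glu", "Asp"}
BULKY = {"Phe"}
NO_SIDE_CHAIN = {"Gly"}


def _between(a, b, first, second):
    """True if one residue is in `first` and the other in `second` ("for (or by)")."""
    return (a in first and b in second) or (a in second and b in first)


def radical_categories(original, replacement):
    """Return the labels (a)-(d) of paragraph [0223] that apply to a substitution."""
    if original == replacement:
        return []
    labels = []
    if _between(original, replacement, HYDROPHILIC, HYDROPHOBIC):
        labels.append("a")
    if {"Cys", "Pro"} & {original, replacement}:
        labels.append("b")
    if _between(original, replacement, ELECTROPOSITIVE, ELECTRONEGATIVE):
        labels.append("c")
    if _between(original, replacement, BULKY, NO_SIDE_CHAIN):
        labels.append("d")
    return labels


if __name__ == "__main__":
    for pair in [("Ser", "Leu"), ("Lys", "Glu"), ("Phe", "Gly"), ("Ser", "Thr")]:
        print(pair, radical_categories(*pair))
```

An empty result means only that none of the named examples matched; it does not by itself establish that a substitution is conservative.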
[0232] Expression and Modification of Polypeptides

[0233] Typically, polynucleotide sequences of the invention are incorporated into recombinant DNA (or RNA) molecules that direct expression of polypeptides of the invention in appropriate host cells, transgenic plants, in vitro translation systems, or the like. Due to the inherent degeneracy of the genetic code, nucleic acid sequences which encode substantially the same or a functionally equivalent amino acid sequence can be substituted for any listed sequence to provide for cloning and expressing the relevant homolog.

[0234] The transgenic plants of the present invention comprising recombinant polynucleotide sequences are generally derived from parental plants, which may themselves be non-transformed (or non-transgenic) plants. These transgenic plants may either have a transcription factor gene "knocked out" (for example, with a genomic insertion by homologous recombination, an antisense or ribozyme construct) or expressed to a normal or wild-type extent. However, overexpressing transgenic "progeny" plants will exhibit greater mRNA levels, wherein the mRNA encodes a transcription factor, that is, a DNA-binding protein that is capable of binding to a DNA regulatory sequence and inducing transcription, and preferably, expression of a plant trait gene. Preferably, the mRNA expression level will be at least three-fold greater than that of the parental plant, or more preferably at least ten-fold greater mRNA levels compared to said parental plant, and most preferably at least fifty-fold greater compared to said parental plant.

[0235] Vectors, Promoters, and Expression Systems

[0236] The present invention includes recombinant constructs comprising one or more of the nucleic acid sequences herein. The constructs typically comprise a vector, such as a plasmid, a cosmid, a phage, a virus (e.g., a plant virus), a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), or the like, into which a nucleic acid sequence of the invention has been inserted, in a forward or reverse orientation. In a preferred aspect of this embodiment, the construct further comprises regulatory sequences, including, for example, a promoter, operably linked to the sequence. Large numbers of suitable vectors and promoters are known to those of skill in the art, and are commercially available.

[0237] General texts that describe molecular biological techniques useful herein, including the use and production of vectors, promoters and many other relevant topics, include Berger, Sambrook, supra and Ausubel, supra. Any of the identified sequences can be incorporated into a cassette or vector, e.g., for expression in plants. A number of expression vectors suitable for stable transformation of plant cells or for the establishment of transgenic plants have been described including those described in Weissbach and Weissbach (1989) Methods for Plant Molecular Biology, Academic Press, and Gelvin et al. (1990) Plant Molecular Biology Manual, Kluwer Academic Publishers. Specific examples include those derived from a Ti plasmid of Agrobacterium tumefaciens, as well as those disclosed by Herrera-Estrella et al. (1983) Nature 303:209, Bevan (1984) Nucleic Acids Res. 12:8711-8721, Klee (1985) Bio/Technology 3:637-642, for dicotyledonous plants.

[0238] Alternatively, non-Ti vectors can be used to transfer the DNA into monocotyledonous plants and cells by using free DNA delivery techniques. Such methods can involve, for example, the use of liposomes, electroporation, microprojectile bombardment, silicon carbide whiskers, and viruses. By using these methods transgenic plants such as wheat, rice (Christou (1991) Bio/Technology 9:957-962) and corn (Gordon-Kamm (1990) Plant Cell 2:603-618) can be produced. An immature embryo can also be a good target tissue for monocots for direct DNA delivery techniques by using the particle gun (Weeks et al. (1993) Plant Physiol. 102:1077-1084; Vasil (1993) Bio/Technology 10:667-674; Wan and Lemaux (1994) Plant Physiol. 104:37-48), and for Agrobacterium-mediated DNA transfer (Ishida et al. (1996) Nature Biotechnol. 14:745-750).

[0239] Typically, plant transformation vectors include one or more cloned plant coding sequences (genomic or cDNA) under the transcriptional control of 5' and 3' regulatory sequences and a dominant selectable marker. Such plant transformation vectors typically also contain a promoter (e.g., a regulatory region controlling inducible or constitutive, environmentally- or developmentally-regulated, or cell- or tissue-specific expression), a transcription initiation start site, an RNA processing signal (such as intron splice sites), a transcription termination site, and/or a polyadenylation signal.

[0240] A potential utility for the transcription factor polynucleotides disclosed herein is the isolation of promoter elements from these genes that can be used to program expression in plants of any genes. Each transcription factor gene disclosed herein is expressed in a unique fashion, as determined by promoter elements located upstream of the start of translation, and additionally within an intron of the transcription factor gene or downstream of the termination codon of the gene. As is well known in the art, for a significant portion of genes, the promoter sequences are located entirely in the region directly upstream of the start of translation. In such cases, typically the promoter sequences are located within 2.0 kb of the start of translation, or within 1.5 kb of the start of translation, frequently within 1.0 kb of the start of translation, and sometimes within 0.5 kb of the start of translation.

[0241] The promoter sequences can be isolated according to methods known to one skilled in the art.

[0242] Examples of constitutive plant promoters which can be useful for expressing the TF sequence include: the cauliflower mosaic virus (CaMV) 35S promoter, which confers constitutive, high-level expression in most plant tissues (see, e.g., Odell et al. (1985) Nature 313:810-812); the nopaline synthase promoter (An et al. (1988) Plant Physiol. 88:547-552); and the octopine synthase promoter (Fromm et al. (1989) Plant Cell 1:977-984).

[0243] The transcription factors of the invention may be operably linked with a specific promoter that causes the transcription factor to be expressed in response to environmental, tissue-specific or temporal signals. A variety of plant gene promoters that regulate gene expression in response to environmental, hormonal, chemical, developmental signals, and in a tissue-active manner can be used for expression of a TF sequence in plants. Choice of a promoter is based largely on the phenotype of interest and is determined by such factors as tissue (e.g., seed, fruit, root, pollen, vascular tissue, flower, carpel, etc.), inducibility (e.g., in response to drought, wounding, heat, cold, light, pathogens, etc.), timing, developmental stage, and the like. Numerous known promoters have been characterized and can favorably be employed to promote expression of a polynucleotide of the invention in a transgenic plant or cell of interest. For example, tissue specific promoters include: seed-specific promoters (such as the napin, phaseolin or DC3 promoter described in U.S. Pat. No. 5,773,697), fruit-specific promoters that are active during fruit ripening (such as the dru 1 promoter (U.S. Pat. No. 5,783,393), or the 2A1 promoter (U.S. Pat. No. 4,943,674) and the tomato polygalacturonase promoter (Bird et al. (1988) Plant Mol. Biol. 11:651-662)), root-specific promoters, such as those disclosed in U.S. Pat. Nos. 5,618,988, 5,837,848 and 5,905,186, pollen-active promoters such as PTA29, PTA26 and PTA13 (U.S. Pat. No. 5,792,929), promoters active in vascular tissue (Ringli and Keller (1998) Plant Mol. Biol. 37:977-988), flower-specific (Kaiser et al. (1995) Plant Mol. Biol. 28:231-243), pollen (Baerson et al. (1994) Plant Mol. Biol. 26:1947-1959), carpels (Ohl et al. (1990) Plant Cell 2:837-848), pollen and ovules (Baerson et al. (1993) Plant Mol. Biol. 22:255-267), auxin-inducible promoters (such as that described in Van der Kop et al. (1999) Plant Mol. Biol. 39:979-990 or Baumann et al. (1999) Plant Cell 11:323-334), cytokinin-inducible promoter (Guevara-Garcia (1998) Plant Mol. Biol. 38:743-753), promoters responsive to gibberellin (Shi et al. (1998) Plant Mol. Biol. 38:1053-1060, Willmott et al. (1998) Plant Molec. Biol. 38:817-825) and the like. Additional promoters are those that elicit expression in response to heat (Ainley et al. (1993) Plant Mol. Biol. 22:13-23), light (e.g., the pea rbcS-3A promoter, Kuhlemeier et al. (1989) Plant Cell 1:471-478, and the maize rbcS promoter, Schaffner and Sheen (1991) Plant Cell 3:997-1012); wounding (e.g., wunI, Siebertz et al. (1989) Plant Cell 1:961-968); pathogens (such as the PR-1 promoter described in Buchel et al. (1999) Plant Mol. Biol. 40:387-396, and the PDF1.2 promoter described in Manners et al. (1998) Plant Mol. Biol. 38:1071-1080), and chemicals such as methyl jasmonate or salicylic acid (Gatz (1997) Annu. Rev. Plant Physiol. Plant Mol. Biol. 48:89-108). In addition, the timing of the expression can be controlled by using promoters such as those acting at senescence (Gan and Amasino (1995) Science 270:1986-1988); or late seed development (Odell et al. (1994) Plant Physiol. 106:447-458).

[0244] Plant expression vectors can also include RNA processing signals that can be positioned within, upstream or downstream of the coding sequence. In addition, the expression vectors can include additional regulatory sequences from the 3'-untranslated region of plant genes, e.g., a 3' terminator region to increase stability of the mRNA, such as the PI-II terminator region of potato or the octopine or nopaline synthase 3' terminator regions.

[0245] Additional Expression Elements

[0246] Specific initiation signals can aid in efficient translation of coding sequences. These signals can include, e.g., the ATG initiation codon and adjacent sequences. In cases where a coding sequence, its initiation codon and upstream sequences are inserted into the appropriate expression vector, no additional translational control signals may be needed. However, in cases where only coding sequence (e.g., a mature protein coding sequence), or a portion thereof, is inserted, exogenous transcriptional control signals including the ATG initiation codon can be separately provided. The initiation codon is provided in the correct reading frame to facilitate transcription. Exogenous transcriptional elements and initiation codons can be of various origins, both natural and synthetic. The efficiency of expression can be enhanced by the inclusion of enhancers appropriate to the cell system in use.

[0247] Expression Hosts

[0248] The present invention also relates to host cells which are transduced with vectors of the invention, and the production of polypeptides of the invention (including fragments thereof) by recombinant techniques. Host cells are genetically engineered (i.e., nucleic acids are introduced, e.g., transduced, transformed or transfected) with the vectors of this invention, which may be, for example, a cloning vector or an expression vector comprising the relevant nucleic acids herein. The vector is optionally a plasmid, a viral particle, a phage, a naked nucleic acid, etc. The engineered host cells can be cultured in conventional nutrient media modified as appropriate for activating promoters, selecting transformants, or amplifying the relevant gene. The culture conditions, such as temperature, pH and the like, are those previously used with the host cell selected for expression, and will be apparent to those skilled in the art and in the references cited herein, including Sambrook, supra and Ausubel, supra.

[0249] The host cell can be a eukaryotic cell, such as a yeast cell, or a plant cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Plant protoplasts are also suitable for some applications. For example, the DNA fragments are introduced into plant tissues, cultured plant cells or plant protoplasts by standard methods including electroporation (Fromm et al. (1985) Proc. Natl. Acad. Sci. 82:5824-5828), infection by viral vectors such as cauliflower mosaic virus (CaMV) (Hohn et al. (1982) Molecular Biology of Plant Tumors, Academic Press, New York, N.Y., pp. 549-560; U.S. Pat. No. 4,407,956), high velocity ballistic penetration by small particles with the nucleic acid either within the matrix of small beads or particles, or on the surface (Klein et al. (1987) Nature 327:70-73), use of pollen as vector (WO 85/01856), or use of Agrobacterium tumefaciens or A. rhizogenes carrying a T-DNA plasmid in which DNA fragments are cloned. The T-DNA plasmid is transmitted to plant cells upon infection by Agrobacterium tumefaciens, and a portion is stably integrated into the plant genome (Horsch et al. (1984) Science 233:496-498; Fraley et al. (1983) Proc. Natl. Acad. Sci. 80:4803-4807).

[0250] The cell can include a nucleic acid of the invention that encodes a polypeptide, wherein the cell expresses a polypeptide of the invention. The cell can also include vector sequences, or the like. Furthermore, cells and transgenic plants that include any polypeptide or nucleic acid above or throughout this specification, e.g., produced by transduction of a vector of the invention, are an additional feature of the invention.

[0251] For long-term, high-yield production of recombinant proteins, stable expression can be used. Host cells transformed with a nucleotide sequence encoding a polypeptide of the invention are optionally cultured under conditions suitable for the expression and recovery of the encoded protein from cell culture. The protein or fragment thereof produced by a recombinant cell may be secreted, membrane-bound, or contained intracellularly, depending on the sequence and/or the vector used. As will be understood by those of skill in the art, expression vectors containing polynucleotides encoding mature proteins of the invention can be designed with signal sequences which direct secretion of the mature polypeptides through a prokaryotic or eukaryotic cell membrane.

[0252] Modified Amino Acid Residues

[0253] Polypeptides of the invention may contain one or more modified amino acid residues. The presence of modified amino acids may be advantageous in, for example, increasing polypeptide half-life, reducing polypeptide antigenicity or toxicity, increasing polypeptide storage stability, or the like. Amino acid residue(s) are modified, for example, co-translationally or post-translationally during recombinant production or modified by synthetic or chemical means.

[0254] Non-limiting examples of a modified amino acid residue include incorporation or other use of acetylated amino acids, glycosylated amino acids, sulfated amino acids, prenylated (e.g., farnesylated, geranylgeranylated) amino acids, PEG modified (e.g., "PEGylated") amino acids, biotinylated amino acids, carboxylated amino acids, phosphorylated amino acids, etc. References adequate to guide one of skill in the modification of amino acid residues are replete throughout the literature.

[0255] The modified amino acid residues may prevent or increase affinity of the polypeptide for another molecule, including, but not limited to, polynucleotides, proteins, carbohydrates, lipids and lipid derivatives, and other organic or synthetic compounds.

[0256] Identification of Additional Protein Factors

[0257] A transcription factor provided by the present invention can also be used to identify additional endogenous or exogenous molecules that can affect a phenotype or trait of interest. Such molecules include endogenous molecules that are acted upon at a transcriptional level by a transcription factor of the invention to modify a phenotype as desired. For example, the transcription factors can be employed to identify one or more downstream genes that are subject to a regulatory effect of the transcription factor. In one approach, a transcription factor or transcription factor homolog of the invention is expressed in a host cell, e.g., a transgenic plant cell, tissue or explant, and expression products, either RNA or protein, of likely or random targets are monitored, e.g., by hybridization to a microarray of nucleic acid probes corresponding to genes expressed in a tissue or cell type of interest, by two-dimensional gel electrophoresis of protein products, or by any other method known in the art for assessing expression of gene products at the level of RNA or protein. Alternatively, a transcription factor of the invention can be used to identify promoter sequences (such as binding sites on DNA sequences) involved in the regulation of a downstream target. After identifying a promoter sequence, interactions between the transcription factor and the promoter sequence can be modified by changing specific nucleotides in the promoter sequence or specific amino acids in the transcription factor that interact with the promoter sequence to alter a plant trait. Typically, transcription factor DNA-binding sites are identified by gel shift assays. After identifying the promoter regions, the promoter region sequences can be employed in double-stranded DNA arrays to identify molecules that affect the interactions of the transcription factors with their promoters (Bulyk et al. (1999) Nature Biotechnol. 17:573-577).

[0258] The identified transcription factors are also useful to identify proteins that modify the activity of the transcription factor. Such modification can occur by covalent modification, such as by phosphorylation, or by protein-protein (homo- or heteropolymer) interactions. Any method suitable for detecting protein-protein interactions can be employed. Among the methods that can be employed are co-immunoprecipitation, cross-linking and co-purification through gradients or chromatographic columns, and the two-hybrid yeast system.

[0259] The two-hybrid system detects protein interactions in vivo and is described in Chien et al. ((1991) Proc. Natl. Acad. Sci. 88:9578-9582) and is commercially available from Clontech (Palo Alto, Calif.). In such a system, plasmids are constructed that encode two hybrid proteins: one consists of the DNA-binding domain of a transcription activator protein fused to the TF polypeptide and the other consists of the transcription activator protein's activation domain fused to an unknown protein that is encoded by a cDNA that has been recombined into the plasmid as part of a cDNA library. The DNA-binding domain fusion plasmid and the cDNA library are transformed into a strain of the yeast Saccharomyces cerevisiae that contains a reporter gene (e.g., lacZ) whose regulatory region contains the transcription activator's binding site. Neither hybrid protein alone can activate transcription of the reporter gene. Interaction of the two hybrid proteins reconstitutes the functional activator protein and results in expression of the reporter gene, which is detected by an assay for the reporter gene product. Then, the library plasmids responsible for reporter gene expression are isolated and sequenced to identify the proteins encoded by the library plasmids. After identifying proteins that interact with the transcription factors, assays for compounds that interfere with the TF protein-protein interactions can be performed.

[0260] Subsequences

[0261] Also contemplated are uses of polynucleotides, also referred to herein as oligonucleotides, typically having at least 12 bases, preferably at least 15, more preferably at least 20, 30, or 50 bases, which hybridize under at least highly stringent (or ultra-high stringent or ultra-ultra-high stringent) conditions to a polynucleotide sequence described above. The polynucleotides may be used as probes, primers, sense and antisense agents, and the like, according to methods as noted supra.

[0262] Subsequences of the polynucleotides of the invention, including polynucleotide fragments and oligonucleotides, are useful as nucleic acid probes and primers. An oligonucleotide suitable for use as a probe or primer is at least about 15 nucleotides in length, more often at least about 18 nucleotides, often at least about 21 nucleotides, frequently at least about 30 nucleotides, or about 40 nucleotides, or more in length. A nucleic acid probe is useful in hybridization protocols, e.g., to identify additional polypeptide homologs of the invention, including protocols for microarray experiments. Primers can be annealed to a complementary target DNA strand by nucleic acid hybridization to form a hybrid between the primer and the target
DNA strand, and then extended along the target DNA strand by a DNA polymerase enzyme. Primer pairs can be used for amplification of a nucleic acid sequence, e.g., by the polymerase chain reaction (PCR) or other nucleic-acid amplification methods. See Sambrook, supra, and Ausubel, supra.

[0263] In addition, the invention includes an isolated or recombinant polypeptide including a subsequence of at least about 15 contiguous amino acids encoded by the recombinant or isolated polynucleotides of the invention. For example, such polypeptides, or domains or fragments thereof, can be used as immunogens, e.g., to produce antibodies specific for the polypeptide sequence, or as probes for detecting a sequence of interest. A subsequence can range in size from about 15 amino acids in length up to and including the full length of the polypeptide.

[0264] To be encompassed by the present invention, an expressed polypeptide which comprises such a polypeptide subsequence performs at least one biological function of the intact polypeptide in substantially the same manner, or to a similar extent, as does the intact polypeptide. For example, a polypeptide fragment can comprise a recognizable structural motif or functional domain such as a DNA binding domain that activates transcription, e.g., by binding to a specific DNA promoter region, an activation domain, or a domain for protein-protein interactions.

[0265] Production of Transgenic Plants

[0266] Modification of Traits

[0267] The polynucleotides of the invention are favorably employed to produce transgenic plants with various traits, or characteristics, that have been modified in a desirable manner, e.g., to improve the seed characteristics of a plant. For example, alteration of expression levels or patterns (e.g., spatial or temporal expression patterns) of one or more of the transcription factors (or transcription factor homologs) of the invention, as compared with the levels of the same protein found in a wild-type plant, can be used to modify a plant's traits. An illustrative example of trait modification, improved characteristics, by altering expression levels of a particular transcription factor is described further in the Examples and the Sequence Listing.

[0268] Arabidopsis as a Model System

[0269] Arabidopsis thaliana is the object of rapidly growing attention as a model for genetics and metabolism in plants. Arabidopsis has a small genome, and well-documented studies are available. It is easy to grow in large numbers and mutants defining important genetically controlled mechanisms are either available, or can readily be obtained. Various methods to introduce and express isolated homologous genes are available (see Koncz et al., eds., Methods in Arabidopsis Research (1992) World Scientific, New Jersey, N.J., in "Preface"). Because of its small size, short life cycle, obligate autogamy and high fertility, Arabidopsis is also a choice organism for the isolation of mutants and studies in morphogenetic and development pathways, and control of these pathways by transcription factors (Koncz, supra, p. 72). A number of studies introducing transcription factors into A. thaliana have demonstrated the utility of this plant for understanding the mechanisms of gene regulation and trait alteration in plants. (See, for example, Koncz, supra, and U.S. Pat. No. 6,417,428).

[0270] Arabidopsis Genes in Transgenic Plants.

[0271] Expression of genes which encode transcription factors that modify expression of endogenous genes, polynucleotides, and proteins is well known in the art. In addition, transgenic plants comprising isolated polynucleotides encoding transcription factors may also modify expression of endogenous genes, polynucleotides, and proteins. Examples include Peng et al. (1997 Genes and Development 11:3194-3205) and Peng et al. (1999 Nature 400:256-261). In addition, many others have demonstrated that an Arabidopsis transcription factor expressed in an exogenous plant species elicits the same or very similar phenotypic response. See, for example, Fu et al. (2001 Plant Cell 13:1791-1802); Nandi et al. (2000 Curr. Biol. 10:215-218); Coupland (1995 Nature 377:482-483); and Weigel and Nilsson (1995 Nature 377:482-500).

[0272] Homologous Genes Introduced into Transgenic Plants.

[0273] Homologous genes that may be derived from any plant, or from any source whether natural, synthetic, semi-synthetic or recombinant, and that share significant sequence identity or similarity to those provided by the present invention, may be introduced into plants, for example, crop plants, to confer desirable or improved traits. Consequently, transgenic plants may be produced that comprise a recombinant expression vector or cassette with a promoter operably linked to one or more sequences homologous to presently disclosed sequences. The promoter may be, for example, a plant or viral promoter.

[0274] The invention thus provides for methods for preparing transgenic plants, and for modifying plant traits. These methods include introducing into a plant a recombinant expression vector or cassette comprising a functional promoter operably linked to one or more sequences homologous to presently disclosed sequences. Plants and kits for producing these plants that result from the application of these methods are also encompassed by the present invention.

[0275] Transcription Factors of Interest for the Modification of Plant Traits

[0276] Currently, the existence of a series of maturity groups for different latitudes represents a major barrier to the introduction of new valuable traits. Any trait (e.g., drought tolerance) has to be bred into each of the different maturity groups separately, a laborious and costly exercise. The availability of a single strain, which could be grown at any latitude, would therefore greatly increase the potential for introducing new traits to crop species such as soybean and cotton.

[0277] For the specific effects, traits and utilities conferred to plants, one or more transcription factor genes of the present invention may be used to increase or decrease, or improve or prove deleterious to a given trait. For example, knocking out a transcription factor gene that naturally occurs in a plant, or suppressing the gene (with, for example, antisense suppression), may cause decreased tolerance to an osmotic stress relative to non-transformed or wild-type plants. By overexpressing this gene, the plant may experience increased tolerance to the same stress. More than one transcription factor gene may be introduced into a plant, either by transforming the plant with one or more vectors

comprising two or more transcription factors, or by Selective Plant Physiol. 119:205-212) have shown that genetic and breeding of plants to yield hybrid crosses that comprise molecular Studies may be used to show extensive interaction more than one introduced transcription factor. between OSmotic StreSS, temperature StreSS, and ABA responses in plants. These investigators analyzed the expres 0278 Genes, Traits and Utilities that Affect Plant Char sion of RD29A-LUC in response to various treatment acteristics regimes in Arabidopsis. The RD29A promoter contains both 0279 Plant transcription factors can modulate gene the ABA-responsive and the dehydration-responsive ele expression, and, in turn, be modulated by the environmental ment-also termed the C-repeat-and can be activated by experience of a plant. Significant alterations in a plant's oSmotic StreSS, low temperature, or ABA treatment; tran environment invariably result in a change in the plant's Scription of the RD29A gene in response to osmotic and cold transcription factor gene expression pattern. Altered tran stresses is mediated by both ABA-dependent and ABA Scription factor expression patterns generally result in phe independent pathways (Xiong, Ishitani, and Zhu (1999) notypic changes in the plant. Transcription factor gene Supra). LUC refers to the firefly luciferase coding sequence, product(s) in transgenic plants then differ(s) in amounts or which, in this case, was driven by the StreSS responsive proportions from that found in wild-type or non-transformed plants, and those transcription factors likely represent RD29A promoter. The results revealed both positive and polypeptides that are used to alter the response to the negative interactions, depending on the nature and duration environmental change. By way of example, it is well of the treatments. 
Low temperature StreSS was found to accepted in the art that analytical methods based on altered impair osmotic Signaling but moderate heat StreSS Strongly expression patterns may be used to Screen for phenotypic enhanced osmotic,StreSS induction, thus acting Synergisti changes in a plant far more effectively than can be achieved cally with OSmotic Signaling pathways. In this study, the authors reported that oSmotic StreSS and ABA can act Syn using traditional methods. ergistically by showing that the treatments simultaneously 0280 Sugar Sensing. induced transgene and endogenous gene expression. Similar 0281. In addition to their important role as an energy results were reported by Bostock and Quatrano (1992) Source and Structural component of the plant cell, Sugars are Plant Physiol. 98:1356-1363), who found that osmotic stress central regulatory molecules that control Several aspects of and ABA act Synergistically and induce maize Em gene plant physiology, metabolism and development (Hsieh et al. expression. Ishitani et al (1997) Plant Cell 9:1935-1949) (1998) Proc. Natl. Acad. Sci.95:13965-13970). It is thought isolated a group of Arabidopsis Single-gene mutations that that this control is achieved by regulating gene expression confer enhanced responses to both osmotic StreSS and ABA. and, in higher plants, Sugars have been shown to repress or The nature of the recovery of these mutants from osmotic activate plant genes involved in many essential processes StreSS and ABA treatment Suggested that although Separate Such as photosynthesis, glyoxylate metabolism, respiration, Signaling pathways exist for OSmotic StreSS and ABA, the Starch and Sucrose Synthesis and degradation, pathogen pathways share a number of components, these common response, Wounding response, cell cycle regulation, pigmen components may mediate Synergistic interactions between tation, flowering and Senescence. The mechanisms by which oSmotic StreSS and ABA. 
Thus, contrary to the previously SugarS control gene expression are not understood. held belief that ABA-dependent and ABA-independent 0282. Several Sugar sensing mutants have turned out to StreSS Signaling pathways act in a parallel manner, our data be allelic to abscisic acid (ABA) and ethylene mutants. ABA reveal that these pathways cross-talk and converge to acti is found in all photosynthetic organisms and acts as a key Vate StreSS gene expression. regulator of transpiration, StreSS responses, embryogenesis, 0284. Because Sugars are important signaling molecules, and seed germination. Most ABA effects are related to the compound acting as a signal of decreased water availability, the ability to control either the concentration of a Signaling whereby it triggers a reduction in water loSS, Slows growth, Sugar or how the plant perceives or responds to a signaling and mediates adaptive responses. However, ABA also influ Sugar could be used to control plant development, physiol ences plant growth and development via interactions with ogy or metabolism. For example, the flux of Sucrose (a other phytohormones. Physiological and molecular Studies disaccharide Sugar used for Systemically transporting carbon indicate that maize and Arabidopsis have almost identical and energy in most plants) has been shown to affect gene pathways with regard to ABA biosynthesis and Signal trans expression and alter Storage compound accumulation in duction. For further review, see Finkelstein and Rock Seeds. Manipulation of the Sucrose signaling pathway in ((2002) Abscisic acid biosynthesis and response (In The Seeds may therefore cause Seeds to have more protein, oil or Arabidopsis Book, Editors: Somerville and Meyerowitz carbohydrate, depending on the type of manipulation. Simi (American Society of Plant Biologists, Rockville, Md.). larly, in tubers, Sucrose is converted to Starch which is used as an energy Store. 
It is thought that Sugar Signaling path 0283) This potentially implicates G867, G9, G993 and ways may partially determine the levels of Starch Synthe G1930 in hormone signaling based on the Sucrose Sugar sized in the tubers. The manipulation of Sugar Signaling in sensing phenotype of 35S::G867, 35S::G9, 35S::G993 and tubers could lead to tubers with a higher Starch content. 35S::G1930 transgenic lines (see Example VIII, below). On the other hand, the Sucrose treatment used in these experi 0285 Thus, the presently disclosed transcription factor ments (9.4% w/v) could also be an osmotic stress. Therefore, genes that manipulate the Sugar Signal transduction pathway, one could interpret these data as an indication that these including, for example, G867, G9, G993 and G1930, along transgenic lines are more tolerant to osmotic StreSS. How with their equivalogs, may lead to altered gene expression to ever, it is well known that plant responses to ABA, OSmotic produce plants with desirable traits. In particular, manipu and other StreSS may be linked, and these different treatments lation of Sugar Signal transduction pathways could be used may even act in a Synergistic manner to increase the degree to alter Source-Sink relationships in Seeds, tubers, roots and of a response. For example, Xiong, Ishitani, and Zhu (1999) other Storage organs leading to increase in yield. US 2004/0098764 A1 May 20, 2004
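As a rough check on the osmotic interpretation raised in paragraph [0283], the 9.4% (w/v) sucrose treatment can be converted to a molar concentration and then to an approximate osmotic potential. The worked arithmetic below is a sketch under stated assumptions: the molar mass of sucrose (342.3 g/mol), ideal dilute-solution behaviour (the van 't Hoff relation with i = 1 for a non-dissociating solute), and a temperature of 298 K. The temperature and the ideal-solution treatment are assumptions made here for illustration and are not values given in this document.

```latex
\begin{align*}
9.4\%\ (\mathrm{w/v}) &= 94\ \mathrm{g\,L^{-1}} \\
c &= \frac{94\ \mathrm{g\,L^{-1}}}{342.3\ \mathrm{g\,mol^{-1}}}
   \approx 0.275\ \mathrm{mol\,L^{-1}} \\
\Psi_s &= -\,i\,c\,R\,T \\
       &\approx -(1)\,(0.275\ \mathrm{mol\,L^{-1}})\,
          (8.314\times10^{-3}\ \mathrm{L\,MPa\,mol^{-1}\,K^{-1}})\,(298\ \mathrm{K}) \\
       &\approx -0.68\ \mathrm{MPa}
\end{align*}
```

Under these assumptions the medium would impose an osmotic potential on the order of -0.7 MPa, which is consistent with the statement above that the sucrose treatment could act as an osmotic stress as well as a sugar signal.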

[0286] Abiotic stress: drought and low humidity tolerance. Exposure to dehydration invokes similar survival strategies in plants as does freezing stress (see, for example, Yelenosky (1989) Plant Physiol. 89:444-451) and drought stress induces freezing tolerance (see, for example, Siminovitch et al. (1982) Plant Physiol. 69:250-255; and Guy et al. (1992) Planta 188:265-270). In addition to the induction of cold-acclimation proteins, strategies that allow plants to survive in low water conditions may include, for example, reduced surface area, or surface oil or wax production. Modifying the expression of a number of presently disclosed transcription factor genes, such as G867, may be used to increase a plant's tolerance to low water conditions and provide the benefits of improved survival, increased yield and an extended geographic and temporal planting range.

[0287] Osmotic stress. Modification of the expression of a number of presently disclosed transcription factor genes, e.g., G867, G1930, and their equivalogs, may be used to increase germination rate or growth under adverse osmotic conditions, which could impact survival and yield of seeds and plants. Osmotic stresses may be regulated by specific molecular control mechanisms that include genes controlling water and ion movements, functional and structural stress-induced proteins, signal perception and transduction, and free radical scavenging, and many others (Wang et al. (2001) Acta Hort. (ISHS) 560:285-292). Instigators of osmotic stress include freezing, drought and high salinity, each of which is discussed in more detail below.

[0288] In many ways, freezing, high salt and drought have similar effects on plants, not the least of which is the induction of common polypeptides that respond to these different stresses. For example, freezing is similar to water deficit in that freezing reduces the amount of water available to a plant. Exposure to freezing temperatures may lead to cellular dehydration as water leaves cells and forms ice crystals in intercellular spaces (Buchanan, supra). As with high salt concentration and freezing, the problems for plants caused by low water availability include mechanical stresses caused by the withdrawal of cellular water. Thus, the incorporation of transcription factors that modify a plant's response to osmotic stress into, for example, a crop or ornamental plant, may be useful in reducing damage or loss. Specific effects caused by freezing, high salt and drought are addressed below.

[0289] Salt and Drought Tolerance

[0290] Plants are subject to a range of environmental challenges. Several of these, including salt stress, general osmotic stress, drought stress and freezing stress, have the ability to impact whole plant and cellular water availability. Not surprisingly, then, plant responses to this collection of stresses are related. In a recent review, Zhu notes that “most studies on water stress signaling have focused on salt stress primarily because plant responses to salt and drought are closely related and the mechanisms overlap" (Zhu (2002) Ann. Rev. Plant Biol. 53:247-273). Many examples of similar responses and pathways to this set of stresses have been documented. For example, the CBF transcription factors have been shown to condition resistance to salt, freezing and drought (Kasuga et al. (1999) Nature Biotech. 17:287-291). The Arabidopsis rd29B gene is induced in response to both salt and dehydration stress, a process that is mediated largely through an ABA signal transduction process (Uno et al. (2000) Proc. Natl. Acad. Sci. USA 97:11632-11637), resulting in altered activity of transcription factors that bind to an upstream element within the rd29B promoter. In Mesembryanthemum crystallinum (ice plant), Patharker and Cushman have shown that a calcium-dependent protein kinase (McCDPK1) is induced by exposure to both drought and salt stresses (Patharker and Cushman (2000) Plant J. 24:679-691). The stress-induced kinase was also shown to phosphorylate a transcription factor, presumably altering its activity, although transcript levels of the target transcription factor are not altered in response to salt or drought stress. Similarly, Saijo et al. demonstrated that a rice salt/drought-induced calmodulin-dependent protein kinase (OsCDPK7) conferred increased salt and drought tolerance to rice when overexpressed (Saijo et al. (2000) Plant J. 23:319-327).

[0291] Exposure to dehydration invokes similar survival strategies in plants as does freezing stress (see, for example, Yelenosky (1989) Plant Physiol. 89:444-451) and drought stress induces freezing tolerance (see, for example, Siminovitch et al. (1982) Plant Physiol. 69:250-255; and Guy et al. (1992) Planta 188:265-270). In addition to the induction of cold-acclimation proteins, strategies that allow plants to survive in low water conditions may include, for example, reduced surface area, or surface oil or wax production.

[0292] Consequently, one skilled in the art would expect that some pathways involved in resistance to one of these stresses, and hence regulated by an individual transcription factor, will also be involved in resistance to another of these stresses, regulated by the same or homologous transcription factors. Of course, the overall resistance pathways are related, not identical, and therefore not all transcription factors controlling resistance to one stress will control resistance to the other stresses. Nonetheless, if a transcription factor conditions resistance to one of these stresses, it would be apparent to one skilled in the art to test for resistance to these related stresses.

[0293] The genes of the sequence listing, including, for example, G867, G1930, and their equivalogs, that provide tolerance to salt may be used to engineer salt tolerant crops and trees that can flourish in soils with high saline content or under drought conditions. In particular, increased salt tolerance during the germination stage of a plant enhances survival and yield. Presently disclosed transcription factor genes that provide increased salt tolerance during germination, the seedling stage, and throughout a plant's life cycle, would find particular value for imparting survival and yield in areas where a particular crop would not normally prosper.

[0294] Root growth and vigor. Some of the genes in the Sequence Listing, e.g., G9, have been shown to increase root growth and to produce hairy roots on media containing methyl jasmonate. Thus, these genes could potentially be used to increase root growth and vigor, which might in turn allow better plant growth during periods of osmotic stress, or limited nutrient availability.

[0295] Summary of altered plant characteristics. A clade of structurally and functionally related sequences that derive from a wide range of plants, including polynucleotide SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, or 51, polynucleotides that encode polypeptide SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, or 53, fragments thereof, paralogs, orthologs, equivalogs, and fragments thereof, is provided. These sequences have been shown in laboratory and field experiments to confer altered size and abiotic stress tolerance phenotypes in plants. The invention also provides polypeptides comprising SEQ ID NOs: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, or 53, and fragments thereof, conserved domains thereof, paralogs, orthologs, equivalogs, and fragments thereof. Plants that overexpress these sequences have been observed to be more tolerant to a wide variety of abiotic stresses, including germination in heat and cold, and osmotic stresses such as drought and high salt levels. Many of the orthologs of these sequences are listed in the Sequence Listing, and due to the high degree of structural similarity to the sequences of the invention, it is expected that these sequences may also function to increase abiotic stress tolerance. The invention also encompasses the complements of the polynucleotides. The polynucleotides are useful for screening libraries of molecules or compounds for specific binding and for creating transgenic plants having increased abiotic stress tolerance.

[0296] Antisense and Co-Suppression

[0297] In addition to expression of the nucleic acids of the invention as gene replacement or plant phenotype modification nucleic acids, the nucleic acids are also useful for sense and anti-sense suppression of expression, e.g. to down-regulate expression of a nucleic acid of the invention, e.g. as a further mechanism for modulating plant phenotype. That is, the nucleic acids of the invention, or subsequences or anti-sense sequences thereof, can be used to block expression of naturally occurring homologous nucleic acids. A variety of sense and anti-sense technologies are known in the art, e.g. as set forth in Lichtenstein and Nellen (1997) Antisense Technology: A Practical Approach, IRL Press at Oxford University Press, Oxford, U.K. Antisense regulation is also described in Crowley et al. (1985) Cell 43:633-641; Rosenberg et al. (1985) Nature 313:703-706; Preiss et al. (1985) Nature 313:27-32; Melton (1985) Proc. Natl. Acad. Sci. 82:144-148; Izant and Weintraub (1985) Science 229:345-352; and Kim and Wold (1985) Cell 42:129-138. Additional methods for antisense regulation are known in the art. Antisense regulation has been used to reduce or inhibit expression of plant genes in, for example, European Patent Publication No. 271988. Antisense RNA may be used to reduce gene expression to produce a visible or biochemical phenotypic change in a plant (Smith et al. (1988) Nature 334:724-726; Smith et al. (1990) Plant Mol. Biol. 14:369-379). In general, sense or anti-sense sequences are introduced into a cell, where they are optionally amplified, e.g. by transcription. Such sequences include both simple oligonucleotide sequences and catalytic sequences such as ribozymes.

[0298] For example, a reduction or elimination of expression (i.e., a “knock-out”) of a transcription factor or transcription factor homolog polypeptide in a transgenic plant, e.g., to modify a plant trait, can be obtained by introducing an antisense construct corresponding to the polypeptide of interest as a cDNA. For antisense suppression, the transcription factor or homolog cDNA is arranged in reverse orientation (with respect to the coding sequence) relative to the promoter sequence in the expression vector. The introduced sequence need not be the full length cDNA or gene, and need not be identical to the cDNA or gene found in the plant type to be transformed. Typically, the antisense sequence need only be capable of hybridizing to the target gene or RNA of interest. Thus, where the introduced sequence is of shorter length, a higher degree of homology to the endogenous transcription factor sequence will be needed for effective antisense suppression. While antisense sequences of various lengths can be utilized, preferably, the introduced antisense sequence in the vector will be at least 30 nucleotides in length, and improved antisense suppression will typically be observed as the length of the antisense sequence increases. Preferably, the length of the antisense sequence in the vector will be greater than 100 nucleotides. Transcription of an antisense construct as described results in the production of RNA molecules that are the reverse complement of mRNA molecules transcribed from the endogenous transcription factor gene in the plant cell.

[0299] Suppression of endogenous transcription factor gene expression can also be achieved using RNA interference, or RNAi. RNAi is a post-transcriptional, targeted gene-silencing technique that uses double-stranded RNA (dsRNA) to incite degradation of messenger RNA (mRNA) containing the same sequence as the dsRNA (Constans (2002) The Scientist 16:36). Small interfering RNAs, or siRNAs, are produced in at least two steps: an endogenous ribonuclease cleaves longer dsRNA into shorter, 21-23 nucleotide-long RNAs. The siRNA segments then mediate the degradation of the target mRNA (Zamore (2001) Nature Struct. Biol. 8:746-50). RNAi has been used for gene function determination in a manner similar to antisense oligonucleotides (Constans (2002) The Scientist 16:36). Expression vectors that continually express siRNAs in transiently and stably transfected cells have been engineered to express small hairpin RNAs (shRNAs), which get processed in vivo into siRNA-like molecules capable of carrying out gene-specific silencing (Brummelkamp et al. (2002) Science 296:550-553, and Paddison et al. (2002) Genes & Dev. 16:948-958). Post-transcriptional gene silencing by double-stranded RNA is discussed in further detail by Hammond et al. (2001) Nature Rev. Gen. 2:110-119, Fire et al. (1998) Nature 391:806-811 and Timmons and Fire (1998) Nature 395:854. Vectors in which RNA encoded by a transcription factor or transcription factor homolog cDNA is over-expressed can also be used to obtain co-suppression of a corresponding endogenous gene, e.g., in the manner described in U.S. Pat. No. 5,231,020 to Jorgensen. Such co-suppression (also termed sense suppression) does not require that the entire transcription factor cDNA be introduced into the plant cells, nor does it require that the introduced sequence be exactly identical to the endogenous transcription factor gene of interest. However, as with antisense suppression, the suppressive efficiency will be enhanced as specificity of hybridization is increased, e.g., as the introduced sequence is lengthened, and/or as the sequence similarity between the introduced sequence and the endogenous transcription factor gene is increased.

[0300] Vectors expressing an untranslatable form of the transcription factor mRNA (e.g., sequences comprising one or more stop codons, or a nonsense mutation) can also be used to suppress expression of an endogenous transcription factor, thereby reducing or eliminating its activity and modifying one or more traits. Methods for producing such constructs are described in U.S. Pat. No. 5,583,021. Preferably, such constructs are made by introducing a premature stop codon into the transcription factor gene. Alternatively, a plant trait can be modified by gene silencing using double strand RNA (Sharp (1999) Genes and Development 13:139-141). Another method for abolishing the expression of a gene is by insertion mutagenesis using the T-DNA of Agrobacterium tumefaciens. After generating the insertion mutants, the mutants can be screened to identify those containing the insertion in a transcription factor or transcription factor homolog gene. Plants containing a single transgene insertion event at the desired gene can be crossed to generate homozygous plants for the mutation. Such methods are well known to those of skill in the art (see, for example, Koncz et al. (1992) Methods in Arabidopsis Research, World Scientific Publishing Co. Pte. Ltd., River Edge, N.J.).

[0301] Alternatively, a plant phenotype can be altered by eliminating an endogenous gene, such as a transcription factor or transcription factor homolog, e.g., by homologous recombination (Kempin et al. (1997) Nature 389:802-803).

[0302] A plant trait can also be modified by using the Cre-lox system (for example, as described in U.S. Pat. No. 5,658,772). A plant genome can be modified to include first and second lox sites that are then contacted with a Cre recombinase. If the lox sites are in the same orientation, the intervening DNA sequence between the two sites is excised. If the lox sites are in the opposite orientation, the intervening sequence is inverted.

[0303] The polynucleotides and polypeptides of this invention can also be expressed in a plant in the absence of an expression cassette by manipulating the activity or expression level of the endogenous gene by other means, such as, for example, by ectopically expressing a gene by T-DNA activation tagging (Ichikawa et al. (1997) Nature 390:698-701; Kakimoto et al. (1996) Science 274:982-985). This method entails transforming a plant with a gene tag containing multiple transcriptional enhancers and once the tag has inserted into the genome, expression of a flanking gene coding sequence becomes deregulated. In another example, the transcriptional machinery in a plant can be modified so as to increase transcription levels of a polynucleotide of the invention (see, e.g., PCT Publications WO 96/06166 and WO 98/53057 which describe the modification of the DNA-binding specificity of zinc finger proteins by changing particular amino acids in the DNA-binding motif).

[0304] The transgenic plant can also include the machinery necessary for expressing or altering the activity of a polypeptide encoded by an endogenous gene, for example, by altering the phosphorylation state of the polypeptide to maintain it in an activated state.

[0305] Transgenic plants (or plant cells, or plant explants, or plant tissues) incorporating the polynucleotides of the invention and/or expressing the polypeptides of the invention can be produced by a variety of well established techniques as described above. Following construction of a vector, most typically an expression cassette, including a polynucleotide, e.g., encoding a transcription factor or transcription factor homolog, of the invention, standard techniques can be used to introduce the polynucleotide into a plant, a plant cell, a plant explant or a plant tissue of interest. Optionally, the plant cell, explant or tissue can be regenerated to produce a transgenic plant.

[0306] The plant can be any higher plant, including gymnosperms, monocotyledonous and dicotyledonous plants. Suitable protocols are available for Leguminosae (alfalfa, soybean, clover, etc.), Umbelliferae (carrot, celery, parsnip), Cruciferae (cabbage, radish, rapeseed, broccoli, etc.), Cucurbitaceae (melons and cucumber), Gramineae (wheat, corn, rice, barley, millet, etc.), Solanaceae (potato, tomato, tobacco, peppers, etc.), and various other crops. See protocols described in Ammirato et al., eds., (1984) Handbook of Plant Cell Culture-Crop Species, Macmillan Publ. Co., New York, N.Y.; Shimamoto et al. (1989) Nature 338:274-276; Fromm et al. (1990) Bio/Technol. 8:833-839; and Vasil et al. (1990) Bio/Technol. 8:429-434.

[0307] Transformation and regeneration of both monocotyledonous and dicotyledonous plant cells is now routine, and the selection of the most appropriate transformation technique will be determined by the practitioner. The choice of method will vary with the type of plant to be transformed; those skilled in the art will recognize the suitability of particular methods for given plant types. Suitable methods can include, but are not limited to: electroporation of plant protoplasts; liposome-mediated transformation; polyethylene glycol (PEG) mediated transformation; transformation using viruses; micro-injection of plant cells; micro-projectile bombardment of plant cells; vacuum infiltration; and Agrobacterium tumefaciens mediated transformation. Transformation means introducing a nucleotide sequence into a plant in a manner to cause stable or transient expression of the sequence.

[0308] Successful examples of the modification of plant characteristics by transformation with cloned sequences which serve to illustrate the current knowledge in this field of technology, and which are herein incorporated by reference, include: U.S. Pat. Nos. 5,571,706; 5,677,175; 5,510,471; 5,750,386; 5,597,945; 5,589,615; 5,750,871; 5,268,526; 5,780,708; 5,538,880; 5,773,269; 5,736,369 and 5,610,042.

[0309] Following transformation, plants are preferably selected using a dominant selectable marker incorporated into the transformation vector. Typically, such a marker will confer antibiotic or herbicide resistance on the transformed plants, and selection of transformants can be accomplished by exposing the plants to appropriate concentrations of the antibiotic or herbicide.

[0310] After transformed plants are selected and grown to maturity, those plants showing a modified trait are identified. The modified trait can be any of those traits described above. Additionally, whether the modified trait is due to changes in expression levels or activity of the polypeptide or polynucleotide of the invention can be determined by analyzing mRNA expression using Northern blots, RT-PCR or microarrays, or protein expression using immunoblots or Western blots or gel shift assays.

[0311] Integrated Systems-Sequence Identity

[0312] Additionally, the present invention may be an integrated system, computer or computer readable medium that comprises an instruction set for determining the identity of one or more sequences in a database. In addition, the instruction set can be used to generate or identify sequences that meet any specified criteria. Furthermore, the instruction set may be used to associate or link certain functional benefits, such as improved characteristics, with one or more identified sequence.
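The two Cre-lox outcomes described in paragraph [0302] (excision when the lox sites share an orientation, inversion when they are opposed) can be illustrated with a small toy model. The sketch below is not taken from the cited patent or from any library: the function name `cre_recombine`, the segment labels, and the `lox>` / `lox<` orientation markers are hypothetical conventions chosen only for illustration, and the model assumes exactly two lox sites.

```python
# Toy model of Cre-mediated recombination between two lox sites.
# A genome is a list of labelled segments; "lox>" and "lox<" mark a lox
# site in forward or reverse orientation (hypothetical notation).

LOX_SITES = ("lox>", "lox<")


def flip(label):
    """Mark a segment as reversed, or un-mark it if it already was."""
    return label[:-5] if label.endswith("(rev)") else label + "(rev)"


def cre_recombine(segments):
    """Return the segment list after Cre acts on exactly two lox sites."""
    sites = [i for i, s in enumerate(segments) if s in LOX_SITES]
    if len(sites) != 2:
        raise ValueError("this sketch expects exactly two lox sites")
    i, j = sites
    if segments[i] == segments[j]:
        # Same orientation: the intervening DNA is excised and a single
        # lox site is left behind in the genome.
        return segments[: i + 1] + segments[j + 1 :]
    # Opposite orientation: the intervening DNA is inverted in place,
    # so segment order is reversed and each segment changes strand.
    inverted = [flip(s) for s in reversed(segments[i + 1 : j])]
    return segments[: i + 1] + inverted + segments[j:]


if __name__ == "__main__":
    print(cre_recombine(["promoter", "lox>", "spacer", "lox>", "gene"]))
    print(cre_recombine(["promoter", "lox>", "a", "b", "lox<", "gene"]))
```

In the first call the spacer between two same-orientation sites is removed, bringing the promoter next to the gene; in the second, the two intervening segments are reversed in order and marked as inverted.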

[0313] For example, the instruction set can include, e.g., a sequence comparison or other alignment program, e.g., an available program such as, for example, the Wisconsin Package Version 10.0, such as BLAST, FASTA, PILEUP, FINDPATTERNS or the like (GCG, Madison, Wis.). Public sequence databases such as GenBank, EMBL, Swiss-Prot and PIR or private sequence databases such as PHYTOSEQ sequence database (Incyte Genomics, Palo Alto, Calif.) can be searched.

[0314] Alignment of sequences for comparison can be conducted by the local homology algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2:482-489, by the homology alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443-453, by the search for similarity method of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. 85:2444-2448, or by computerized implementations of these algorithms. After alignment, sequence comparisons between two (or more) polynucleotides or polypeptides are typically performed by comparing sequences of the two sequences over a comparison window to identify and compare local regions of sequence similarity. The comparison window can be a segment of at least about 20 contiguous positions, usually about 50 to about 200, more usually about 100 to about 150 contiguous positions. A description of the method is provided in Ausubel et al. supra.

[0315] A variety of methods for determining sequence relationships can be used, including manual alignment and computer assisted sequence alignment and analysis. This latter approach is a preferred approach in the present invention, due to the increased throughput afforded by computer assisted methods. As noted above, a variety of computer programs for performing sequence alignment are available, or can be produced by one of skill.

[0316] One example algorithm that is suitable for determining percent sequence identity and sequence similarity is the BLAST algorithm, which is described in Altschul et al. (1990) J. Mol. Biol. 215:403-410. Software for performing BLAST analyses is publicly available, e.g., through the National Library of Medicine's National Center for Biotechnology Information (ncbi.nlm.nih; see at world wide web (www) National Institutes of Health U.S. government (gov) website). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al. supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. 89:10915-10919). Unless otherwise indicated, "sequence identity" here refers to the % sequence identity generated from a tblastx using the NCBI version of the algorithm at the default settings using gapped alignments with the filter "off" (see, for example, NIH NLM NCBI website at ncbi.nlm.nih, supra).

[0317] In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g. Karlin and Altschul (1993) Proc. Natl. Acad. Sci. 90:5873-5877). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a reference sequence (and, therefore, in this context, homologous) if the smallest sum probability in a comparison of the test nucleic acid to the reference nucleic acid is less than about 0.1, or less than about 0.01, or even less than about 0.001. An additional example of a useful sequence alignment algorithm is PILEUP. PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. The program can align, e.g., up to 300 sequences of a maximum length of 5,000 letters.

[0318] The integrated system, or computer, typically includes a user input interface allowing a user to selectively view one or more sequence records corresponding to the one or more character strings, as well as an instruction set which aligns the one or more character strings with each other or with an additional character string to identify one or more region of sequence similarity. The system may include a link of one or more character strings with a particular phenotype or gene function. Typically, the system includes a user readable output element that displays an alignment produced by the alignment instruction set.

[0319] The methods of this invention can be implemented in a localized or distributed computing environment. In a distributed environment, the methods may be implemented on a single computer comprising multiple processors or on a multiplicity of computers. The computers can be linked, e.g. through a common bus, but more preferably the computer(s) are nodes on a network. The network can be a generalized or a dedicated local or wide-area network and, in certain preferred embodiments, the computers may be components of an intra-net or an internet.

[0320] Thus, the invention provides methods for identifying a sequence similar or homologous to one or more polynucleotides as noted herein, or one or more target polypeptides encoded by the polynucleotides, or otherwise noted herein and may include linking or associating a given plant phenotype or gene function with a sequence. In the methods, a sequence database is provided (locally or across an inter or intra net) and a query is made against the sequence database using the relevant sequences herein and associated plant phenotypes or gene functions.
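
By way of illustration only, the extension of a word hit described for the BLAST algorithm above can be sketched in Python as follows. This is a simplified, ungapped sketch and not the NCBI implementation: the function names, the example sequences, and the value used for X (x_drop=8) are assumptions made for the illustration. The reward M=5 is the BLASTN default recited above, and the mismatch score is taken as a negative penalty (-4), consistent with N always being less than zero.

```python
def _extend(query, subject, q_pos, s_pos, step, seed_score, match, mismatch, x_drop):
    """Extend without gaps from (q_pos, s_pos) in direction `step` (+1 or -1).

    Returns (best_gain, best_length): the largest score gain seen and the
    number of residues that had been added when that gain was reached.
    """
    running = best = best_len = length = 0
    while 0 <= q_pos < len(query) and 0 <= s_pos < len(subject):
        running += match if query[q_pos] == subject[s_pos] else mismatch
        length += 1
        if running > best:
            best, best_len = running, length
        # Halt when the score has fallen off by X from its maximum, or when
        # the cumulative score (seed plus extension) reaches zero or below.
        if best - running >= x_drop or seed_score + running <= 0:
            break
        q_pos += step
        s_pos += step
    # Leaving the loop through its condition means a sequence end was reached.
    return best, best_len


def extend_hit(query, subject, q_start, s_start, word_len,
               match=5, mismatch=-4, x_drop=8):
    """Extend a word hit of length `word_len` in both directions.

    Coordinates in the result are half-open Python slices.
    """
    seed_score = sum(
        match if query[q_start + i] == subject[s_start + i] else mismatch
        for i in range(word_len)
    )
    right_gain, right_len = _extend(query, subject, q_start + word_len,
                                    s_start + word_len, +1, seed_score,
                                    match, mismatch, x_drop)
    left_gain, left_len = _extend(query, subject, q_start - 1, s_start - 1,
                                  -1, seed_score, match, mismatch, x_drop)
    return {
        "score": seed_score + left_gain + right_gain,
        "q_start": q_start - left_len,
        "q_end": q_start + word_len + right_len,
        "s_start": s_start - left_len,
        "s_end": s_start + word_len + right_len,
    }


# Hypothetical sequences sharing the nine-residue stretch ACGTACGTA.
query = "GGGACGTACGTAAAA"
subject = "CCCACGTACGTATTT"
hit = extend_hit(query, subject, q_start=5, s_start=5, word_len=4)
print(hit, query[hit["q_start"]:hit["q_end"]])
```

A larger X lets the extension run further past mismatches before halting, which trades speed for sensitivity in the way the parameter discussion above describes.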

[0321] Any sequence herein can be entered into the database, before or after querying the database. This provides for both expansion of the database and, if done before the querying step, for insertion of control sequences into the database. The control sequences can be detected by the query to ensure the general integrity of both the database and the query. As noted, the query can be performed using a web browser based interface. For example, the database can be a centralized public database such as those noted herein, and the querying can be done from a remote terminal or computer across an internet or intranet.

[0322] Any sequence herein can be used to identify a similar, homologous, paralogous, or orthologous sequence in another plant. This provides means for identifying endogenous sequences in other plants that may be useful to alter a trait of progeny plants, which results from crossing two plants of different strain. For example, sequences that encode an ortholog of any of the sequences herein that naturally occur in a plant with a desired trait can be identified using the sequences disclosed herein. The plant is then crossed with a second plant of the same species but which does not have the desired trait to produce progeny which can then be used in further crossing experiments to produce the desired trait in the second plant. Therefore the resulting progeny plant contains no transgenes; expression of the endogenous sequence may also be regulated by treatment with a particular chemical or other means, such as EMR. Some examples of such compounds well known in the art include: ethylene; cytokinins; phenolic compounds, which stimulate the transcription of the genes needed for infection; specific monosaccharides and acidic environments which potentiate vir gene induction; acidic polysaccharides which induce one or more chromosomal genes; and opines; other mechanisms include light or dark treatment (for a review of examples of such treatments, see, Winans (1992) Microbiol. Rev. 56:12-31; Eyal et al. (1992) Plant Mol. Biol. 19:589-599; Chrispeels et al. (2000) Plant Mol. Biol. 42:279-290; Piazza et al. (2002) Plant Physiol. 128:1077-1086).

[0323] Table 6 lists sequences discovered to be orthologous to a number of representative transcription factors of the present invention, in decreasing order of similarity to G867. The column headings include the transcription factors listed by (a) the SEQ ID NO: of the homolog (paralog or ortholog) or the nucleotide encoding the homolog; (b) the GID sequence identifier; (c) the Sequence Identifier or GenBank Accession Number; (d) the species from which the homologs (orthologs or paralogs) to the transcription factors are derived; and (e) the smallest sum probability relationship to G867 determined by BLAST analysis.

TABLE 6
Homologs of Representative Arabidopsis Transcription Factor Genes Identified using BLAST

SEQ ID NO: of Homolog or Nucleotide Encoding Homolog | GID No. | Sequence Identifier or Accession Number | Species from Which Homolog is Derived | Smallest Sum Probability to G867
1 | G867 | | Arabidopsis thaliana | 0.0
7 | G1930 | | Arabidopsis thaliana | 1.00E-132
3 | G9 | | Arabidopsis thaliana | 1.00E-115
5 | G993 | | Arabidopsis thaliana | 1.00E-115
41 | | BZ458719 | Brassica oleracea | 1.00E-113
17 | G3451 | GLYMA-28NOV01-CLUSTER19062_3 | Glycine max | 1.00E-110
25 | G3454 | | Glycine max | 1.00E-109
21 | G3452 | GLYMA-28NOV01-CLUSTER19062_7 | Glycine max | 2.00E-99
23 | G3453 | | Glycine max | 3.00E-98
 | | CB686050 | Brassica napus | 1.00E-97
43 | | BQ971511 | Helianthus annuus | 2.00E-94
45 | | BU025988 | Helianthus annuus | 3.00E-92
 | | BQ971525 | Helianthus annuus | 2.00E-92
37 | G3432 | | Zea mays | 1.00E-87
35 | | AP003450 | Oryza sativa | 9.00E-85
 | | gi18565433 | Oryza sativa (japonica cultivar-group) | 3.00E-85
33 | G3390 | AC130725 | Oryza sativa | 8.00E-84
47 | | BT009310 | Triticum aestivum | 4.00E-82
35 | G3391 | AP003450 | Oryza sativa | 1.00E-82
29 | G3388 | OSC21673.C1.p5.fg AP002913 | Oryza sativa | 2.00E-80
49 | | CC616336 | Zea mays | 2.00E-80
33 | | AC130725 | Oryza sativa (japonica cultivar-group) | 1.00E-80
 | | AC136492 | Oryza sativa (japonica cultivar-group) | 1.00E-80
51 | | AAAA01000997 | Oryza sativa (indica cultivar-group) | 1.00E-79
31 | G3389 | OSC21674.C1.p12.fg AP002913 | Oryza sativa | 1.00E-79
 | | BQ405698 | Gossypium arboreum | 2.00E-77

TABLE 6-continued
Homologs of Representative Arabidopsis Transcription Factor Genes Identified using BLAST

SEQ ID NO: of Homolog or Nucleotide Encoding Homolog | GID No. | Sequence Identifier or Accession Number | Species from Which Homolog is Derived | Smallest Sum Probability to G867
39 | G3433 | | Zea mays | 2.00E-73
27 | G3455 | GLYMA-28NOV01-CLUSTER19062_5 | Glycine max | 3.00E-70
 | | BZ015521 | Brassica oleracea | 5.00E-69
 | | BF520598 | Medicago truncatula | 2.00E-66
 | | BU994579 | Hordeum vulgare subsp. vulgare | 5.00E-64
 | | CD814840 | Brassica napus | 4.00E-64
 | | CB894555 | Medicago truncatula | 3.00E-64
 | | BF424857 | Glycine max | 2.00E-62
 | | BU871082 | Populus balsamifera subsp. trichocarpa | 2.00E-61
 | | BQ855250 | Lactuca sativa | 7.00E-61
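
As a minimal sketch of how the smallest sum probabilities of Table 6 might be handled computationally, the following Python fragment filters a handful of rows copied from the table by a probability cutoff and picks the most similar entry for each species. The function names and the choice of rows are assumptions made for this illustration; the fragment is not part of any BLAST distribution.

```python
# A few rows copied from Table 6:
# (GID or accession, species, smallest sum probability to G867)
ROWS = [
    ("G1930", "Arabidopsis thaliana", "1.00E-132"),
    ("G9", "Arabidopsis thaliana", "1.00E-115"),
    ("G3451", "Glycine max", "1.00E-110"),
    ("G3432", "Zea mays", "1.00E-87"),
    ("G3455", "Glycine max", "3.00E-70"),
    ("BQ855250", "Lactuca sativa", "7.00E-61"),
]


def homologs_below(rows, cutoff):
    """Return rows whose probability is below `cutoff`, most similar first."""
    kept = [(name, species, float(p)) for name, species, p in rows
            if float(p) < cutoff]
    return sorted(kept, key=lambda row: row[2])


def best_per_species(rows):
    """Map each species to the identifier with the smallest probability."""
    best = {}
    for name, species, p in rows:
        value = float(p)
        if species not in best or value < best[species][1]:
            best[species] = (name, value)
    return {species: name for species, (name, _) in best.items()}


# Every row listed here passes the "less than about 0.001" criterion;
# a stricter, purely illustrative cutoff keeps only the closest homologs.
print([name for name, _, _ in homologs_below(ROWS, 1e-100)])
print(best_per_species(ROWS))
```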

[0324] Molecular Modeling

[0325] Another means that may be used to confirm the utility and function of transcription factor sequences that are orthologous or paralogous to presently disclosed transcription factors is through the use of molecular modeling software. Molecular modeling is routinely used to predict polypeptide structure, and a variety of protein structure modeling programs, such as "Insight II" (Accelrys, Inc.) are commercially available for this purpose. Modeling can thus be used to predict which residues of a polypeptide can be changed without altering function (Crameri et al. (2003) U.S. Pat. No. 6,521,453). Thus, polypeptides that are sequentially similar can be shown to have a high likelihood of similar function by their structural similarity, which may, for example, be established by comparison of regions of superstructure. The relative tendencies of amino acids to form regions of superstructure (for example, helixes and β-sheets) are well established. For example, O'Neil et al. ((1990) Science 250:646-651) have discussed in detail the helix forming tendencies of amino acids. Tables of relative structure forming activity for amino acids can be used as substitution tables to predict which residues can be functionally substituted in a given region, for example, in DNA binding domains of known transcription factors and equivalogs. Homologs that are likely to be functionally similar can then be identified.

[0326] Of particular interest is the structure of a transcription factor in the region of its conserved domains, such as those identified in Table 1. Structural analyses may be performed by comparing the structure of the known transcription factor around its conserved domain with those of orthologs and paralogs. Analysis of a number of polypeptides within a transcription factor group or clade, including the functionally or sequentially similar polypeptides provided in the Sequence Listing, may also provide an understanding of structural elements required to regulate transcription within a given family.

EXAMPLES

[0327] The invention, now being generally described, will be more readily understood by reference to the following examples, which are included merely for purposes of illustration of certain aspects and embodiments of the present invention and are not intended to limit the invention. It will be recognized by one of skill in the art that a transcription factor that is associated with a particular first trait may also be associated with at least one other, unrelated and inherent second trait which was not predicted by the first trait.

[0328] The complete descriptions of the traits associated with each polynucleotide of the invention are fully disclosed in Example VIII. The complete description of the transcription factor gene family and identified AP2 binding domains and B3 domains of the polypeptide encoded by the polynucleotide is fully disclosed in Table 1.

Example I

[0329] Full Length Gene Identification and Cloning

[0330] Putative transcription factor sequences (genomic or ESTs) related to known transcription factors were identified in the Arabidopsis thaliana GenBank database using the tblastn sequence analysis program using default parameters and a P-value cutoff threshold of -4 or -5 or lower, depending on the length of the query sequence. Putative transcription factor sequence hits were then screened to identify those containing particular sequence strings. If the sequence hits contained such sequence strings, the sequences were confirmed as transcription factors.

[0331] Alternatively, Arabidopsis thaliana cDNA libraries derived from different tissues or treatments, or genomic libraries were screened to identify novel members of a transcription family using a low stringency hybridization approach. Probes were synthesized using gene specific primers in a standard PCR reaction (annealing temperature 60° C.) and labeled with 32P dCTP using the High Prime DNA Labeling Kit (Boehringer Mannheim Corp. (now Roche Diagnostics Corp., Indianapolis, Ind.)). Purified radiolabelled probes were added to filters immersed in Church hybridization medium (0.5 M NaPO4 pH 7.0, 7% SDS, 1% w/v bovine serum albumin) and hybridized overnight at 60° C. with shaking. Filters were washed two times for 45 to 60 minutes with 1×SSC, 1% SDS at 60° C.

[0332] To identify additional sequence 5' or 3' of a partial cDNA sequence in a cDNA library, 5' and 3' rapid amplification of cDNA ends (RACE) was performed using the MARATHON cDNA amplification kit (Clontech, Palo Alto, Calif.). Generally, the method entailed first isolating poly(A) mRNA, performing first and second strand cDNA synthesis to generate double stranded cDNA, blunting cDNA ends, followed by ligation of the MARATHON Adaptor to the cDNA to form a library of adaptor-ligated ds cDNA.

[0333] Gene-specific primers were designed to be used along with adaptor specific primers for both 5' and 3' RACE reactions. Nested primers, rather than single primers, were used to increase PCR specificity. Using 5' and 3' RACE reactions, 5' and 3' RACE fragments were obtained, sequenced and cloned. The process can be repeated until 5' and 3' ends of the full-length gene were identified. Then the full-length cDNA was generated by PCR using primers specific to 5' and 3' ends of the gene by end-to-end PCR.

Example II

[0334] Construction of Expression Vectors

[0335] The sequence was amplified from a genomic or cDNA library using primers specific to sequences upstream and downstream of the coding region. The expression vector was pMEN20 or pMEN65, which are both derived from pMON316 (Sanders et al. (1987) Nucleic Acids Res. 15:1543-1558) and contain the CaMV 35S promoter to express transgenes. To clone the sequence into the vector, both pMEN20 and the amplified DNA fragment were digested separately with SalI and NotI restriction enzymes at 37° C. for 2 hours. The digestion products were subject to electrophoresis in a 0.8% agarose gel and visualized by ethidium bromide staining. The DNA fragments containing the sequence and the linearized plasmid were excised and purified by using a QIAQUICK gel extraction kit (Qiagen, Valencia Calif.). The fragments of interest were ligated at a ratio of 3:1 (vector to insert). Ligation reactions using T4 DNA ligase (New England Biolabs, Beverly Mass.) were carried out at 16° C. for 16 hours. The ligated DNAs were transformed into competent cells of the E. coli strain DH5alpha by using the heat shock method. The transformations were plated on LB plates containing 50 mg/l kanamycin (Sigma Chemical Co. St. Louis Mo.). Individual colonies were grown overnight in five milliliters of LB broth containing 50 mg/l kanamycin at 37° C. Plasmid DNA was purified by using Qiaquick Mini Prep kits (Qiagen).

Example III

[0336] Transformation of Agrobacterium with the Expression Vector

[0337] After the plasmid vector containing the gene was constructed, the vector was used to transform Agrobacterium tumefaciens cells expressing the gene products. The stock of Agrobacterium tumefaciens cells for transformation was made as described by Nagel et al. (1990) FEMS Microbiol Letts. 67:325-328. Agrobacterium strain ABI was grown in 250 ml LB medium (Sigma) overnight at 28° C. with shaking until an absorbance over 1 cm at 600 nm (A600) of 0.5-1.0 was reached. Cells were harvested by centrifugation at 4,000×g for 15 min at 4° C. Cells were then resuspended in 250 µl chilled buffer (1 mM HEPES, pH adjusted to 7.0 with KOH). Cells were centrifuged again as described above and resuspended in 125 µl chilled buffer. Cells were then centrifuged and resuspended two more times in the same HEPES buffer as described above at a volume of 100 µl and 750 µl, respectively. Resuspended cells were then distributed into 40 µl aliquots, quickly frozen in liquid nitrogen, and stored at -80° C.

[0338] Agrobacterium cells were transformed with plasmids prepared as described above following the protocol described by Nagel et al. (supra). For each DNA construct to be transformed, 50-100 ng DNA (generally resuspended in 10 mM Tris-HCl, 1 mM EDTA, pH 8.0) was mixed with 40 µl of Agrobacterium cells. The DNA/cell mixture was then transferred to a chilled cuvette with a 2 mm electrode gap and subject to a 2.5 kV charge dissipated at 25 µF and 200 µF using a Gene Pulser II apparatus (Bio-Rad, Hercules, Calif.). After electroporation, cells were immediately resuspended in 1.0 ml LB and allowed to recover without antibiotic selection for 2-4 hours at 28° C. in a shaking incubator. After recovery, cells were plated onto selective medium of LB broth containing 100 µg/ml spectinomycin (Sigma) and incubated for 24-48 hours at 28° C. Single colonies were then picked and inoculated in fresh medium. The presence of the plasmid construct was verified by PCR amplification and sequence analysis.

Example IV

[0339] Transformation of Arabidopsis Plants with Agrobacterium tumefaciens with Expression Vector

[0340] After transformation of Agrobacterium tumefaciens with plasmid vectors containing the gene, single Agrobacterium colonies were identified, propagated, and used to transform Arabidopsis plants. Briefly, 500 ml cultures of LB medium containing 50 mg/l kanamycin were inoculated with the colonies and grown at 28° C. with shaking for 2 days until an optical absorbance at 600 nm wavelength over 1 cm (A600) of >2.0 is reached. Cells were then harvested by centrifugation at 4,000×g for 10 min, and resuspended in infiltration medium (1/2× Murashige and Skoog salts (Sigma), 1× Gamborg's B-5 vitamins (Sigma), 5.0% (w/v) sucrose (Sigma), 0.044 µM benzylamino purine (Sigma), 200 µl/l Silwet L-77 (Lehle Seeds)) until an A600 of 0.8 was reached.

[0341] Prior to transformation, Arabidopsis thaliana seeds (ecotype Columbia) were sown at a density of ~10 plants per 4" pot onto Pro-Mix BX potting medium (Hummert International) covered with fiberglass mesh (18 mm×16 mm). Plants were grown under continuous illumination (50-75 µE/m²/sec) at 22-23° C. with 65-70% relative humidity. After about 4 weeks, primary inflorescence stems (bolts) are cut off to encourage growth of multiple secondary bolts. After flowering of the mature secondary bolts, plants were prepared for transformation by removal of all siliques and opened flowers.

[0342] The pots were then immersed upside down in the mixture of Agrobacterium infiltration medium as described above for 30 sec, and placed on their sides to allow draining into a 1'×2' flat surface covered with plastic wrap. After 24 h, the plastic wrap was removed and pots are turned upright. The immersion procedure was repeated one week later, for a total of two immersions per pot. Seeds were then collected from each transformation pot and analyzed following the protocol described below.

Example V

[0343] Identification of Arabidopsis Primary Transformants

[0344] Seeds collected from the transformation pots were sterilized essentially as follows. Seeds were dispersed into a solution containing 0.1% (v/v) Triton X-100 (Sigma) and sterile water and washed by shaking the suspension for 20 min. The wash solution was then drained and replaced with fresh wash solution to wash the seeds for 20 min with shaking. After removal of the ethanol/detergent solution, a solution containing 0.1% (v/v) Triton X-100 and 30% (v/v) bleach (CLOROX; Clorox Corp. Oakland Calif.) was added to the seeds, and the suspension was shaken for 10 min. After removal of the bleach/detergent solution, seeds were then washed five times in sterile distilled water. The seeds were stored in the last wash water at 4° C. for 2 days in the dark before being plated onto antibiotic selection medium (1× Murashige and Skoog salts (pH adjusted to 5.7 with 1M KOH), 1× Gamborg's B-5 vitamins, 0.9% phytagar (Life Technologies), and 50 mg/l kanamycin). Seeds were germinated under continuous illumination (50-75 µE/m²/sec) at 22-23° C. After 7-10 days of growth under these conditions, kanamycin resistant primary transformants (T1 generation) were visible and obtained. These seedlings were transferred first to fresh selection plates where the seedlings continued to grow for 3-5 more days, and then to soil (Pro-Mix BX potting medium).

[0345] Primary transformants were crossed and progeny seeds (T2) collected; kanamycin resistant seedlings were selected and analyzed. The expression levels of the recombinant polynucleotides in the transformants vary from about a 5% expression level increase to at least a 100% expression level increase. Similar observations are made with respect to polypeptide level expression.

Example VI

[0346] Identification of Arabidopsis Plants with Transcription Factor Gene Knockouts

[0347] The screening of insertion mutagenized Arabidopsis collections for null mutants in a known target gene was essentially as described in Krysan et al. (1999) Plant Cell 11:2283-2290. Briefly, gene-specific primers, nested by 5-250 base pairs to each other, were designed from the 5' and 3' regions of a known target gene. Similarly, nested sets of primers were also created specific to each of the T-DNA or transposon ends (the "right" and "left" borders). All possible combinations of gene specific and T-DNA/transposon primers were used to detect by PCR an insertion event within or close to the target gene. The amplified DNA fragments were then sequenced, which allows the precise determination of the T-DNA/transposon insertion point relative to the target gene. Insertion events within the coding or intervening sequence of the genes were deconvoluted from a pool comprising a plurality of insertion events to a single unique mutant plant for functional characterization. The method is described in more detail in Yu and Adam, U.S. application Ser. No. 09/177,733 filed Oct. 23, 1998.

Example VII

[0348] Identification of Modified Phenotypes in Overexpression or Gene Knockout Plants.

[0349] Experiments were performed to identify those transformants or knockouts that exhibited modified biochemical characteristics.

[0350] Calibration of NIRS response was performed using data obtained by wet chemical analysis of a population of Arabidopsis ecotypes that were expected to represent diversity of oil and protein levels.

[0351] Experiments were performed to identify those transformants or knockouts that exhibited modified sugar sensing. For such studies, seeds from transformants were germinated on media containing 5% glucose or 9.4% sucrose which normally partially restrict hypocotyl elongation. Plants with altered sugar sensing may have either longer or shorter hypocotyls than normal plants when grown on this media. Additionally, other plant traits may be varied such as root mass.

[0352] In some instances, expression patterns of the stress induced genes may be monitored by microarray experiments. In these experiments, cDNAs are generated by PCR and resuspended at a final concentration of ~100 ng/µl in 3×SSC or 150 mM Na-phosphate (Eisen and Brown (1999) Methods Enzymol. 303:179-205). The cDNAs are spotted on microscope glass slides coated with polylysine. The prepared cDNAs are aliquoted into 384 well plates and spotted on the slides using, for example, an x-y-z gantry (OmniGrid) which may be purchased from GeneMachines (Menlo Park, Calif.) outfitted with quill type pins which may be purchased from Telechem International (Sunnyvale, Calif.). After spotting, the arrays are cured for a minimum of one week at room temperature, rehydrated and blocked following the protocol recommended by Eisen and Brown (1999; supra).

[0353] Sample total RNA (10 µg) samples are labeled using fluorescent Cy3 and Cy5 dyes. Labeled samples are resuspended in 4×SSC/0.03% SDS/4 µg salmon sperm DNA/2 µg tRNA/50 mM Na-pyrophosphate, heated at 95° C. for 2.5 minutes, spun down and placed on the array. The array is then covered with a glass coverslip and placed in a sealed chamber. The chamber is then kept in a water bath at 62° C. overnight. The arrays are washed as described in Eisen and Brown (1999, supra) and scanned on a General Scanning 3000 laser scanner. The resulting files are subsequently quantified using IMAGENE software (BioDiscovery, Los Angeles Calif.).

[0354] RT-PCR experiments may be performed to identify those genes induced after exposure to osmotic stress. Generally, the gene expression patterns from ground plant leaf tissue are examined. Reverse transcriptase PCR was conducted using gene specific primers within the coding region for each sequence identified. The primers were designed near the 3' region of each DNA binding sequence initially identified.

[0355] Total RNA from these ground leaf tissues was isolated using the CTAB extraction protocol. Once extracted, total RNA was normalized in concentration across all the tissue types to ensure that the PCR reaction for each tissue received the same amount of cDNA template using the 28S band as reference. Poly(A+) RNA was purified using a modified protocol from the Qiagen OLIGOTEX purification kit batch protocol. cDNA was synthesized using standard protocols. After the first strand cDNA synthesis, primers for Actin 2 were used to normalize the concentration of cDNA across the tissue types. Actin 2 is found to be constitutively expressed in fairly equal levels across the tissue types we are investigating.

[0356] For RT PCR, cDNA template was mixed with corresponding primers and Taq DNA polymerase. Each reaction consisted of 0.2 µl cDNA template, 2 µl 10×Tricine buffer, 2 µl 10×Tricine buffer and 16.8 µl water, 0.05 µl Primer 1, 0.05 µl Primer 2, 0.3 µl Taq DNA polymerase and 8.6 µl water.

[0357] The 96 well plate is covered with microfilm and set in the thermocycler to start the reaction cycle. By way of illustration, the reaction cycle may comprise the following steps:

[0358] Step 1: 93° C. for 3 min;

[0359] Step 2: 93° C. for 30 sec;

[0360] Step 3: 65° C. for 1 min;

[0361] Step 4: 72° C. for 2 min;

[0362] Steps 2, 3 and 4 are repeated for 28 cycles;

[0363] Step 5: 72° C. for 5 min; and

[0364] Step 6: 4° C.

[0365] To amplify more products, for example, to identify genes that have very low expression, additional steps may be performed. The following method illustrates a method that may be used in this regard. The PCR plate is placed back in the thermocycler for 8 more cycles of steps 2-4.

[0366] Step 2: 93° C. for 30 sec;

[0367] Step 3: 65° C. for 1 min;

[0368] Step 4: 72° C. for 2 min, repeated for 8 cycles; and

[0369] Step 5: 4° C.

[0370] Eight microliters of PCR product and 1.5 µl of loading dye are loaded on a 1.2% agarose gel for analysis after 28 cycles and 36 cycles. Expression levels of specific transcripts are considered low if they were only detectable after 36 cycles of PCR. Expression levels are considered medium or high depending on the levels of transcript compared with observed transcript levels for an internal control such as actin2. Transcript levels are determined in repeat experiments and compared to transcript levels in control (e.g., non-transformed) plants.

[0371] Modified phenotypes observed for particular overexpressor or knockout plants are provided. For a particular overexpressor that shows a less beneficial characteristic, it may be more useful to select a plant with a decreased expression of the particular transcription factor. For a particular knockout that shows a less beneficial characteristic, it may be more useful to select a plant with an increased expression of the particular transcription factor.

[0372] The sequences of the Sequence Listing can be used to prepare transgenic plants and plants with altered osmotic stress tolerance. The specific transgenic plants listed below are produced from the sequences of the Sequence Listing, as noted.

Example VIII

[0373] Genes that Confer Significant Improvements to Plants

[0374] Examples of genes and homologs that confer significant improvements to knockout or overexpressing plants are noted below. Experimental observations made by us with regard to specific genes whose expression has been modified in overexpressing or knock-out plants, and potential applications based on these observations, are also presented.

[0375] This example provides experimental evidence for increased biomass and abiotic stress tolerance controlled by the transcription factor polypeptides and polypeptides of the invention.

[0376] Salt stress assays are intended to find genes that confer better germination, seedling vigor or growth in high salt. Evaporation from the soil surface causes upward water movement and salt accumulation in the upper soil layer where the seeds are placed. Thus, germination normally takes place at a salt concentration much higher than the mean salt concentration in the whole soil profile. Plants differ in their tolerance to NaCl depending on their stage of development, therefore seed germination, seedling vigor, and plant growth responses are evaluated.

[0377] Osmotic stress assays (including NaCl and mannitol assays) are intended to determine if an osmotic stress phenotype is NaCl-specific or if it is a general osmotic stress related phenotype. Plants tolerant to osmotic stress could also have more tolerance to drought and/or freezing.

[0378] Drought assays are intended to find genes that mediate better plant survival after short-term, severe water deprivation. Ion leakage will be measured if needed. Osmotic stress tolerance would also support a drought tolerant phenotype.

[0379] Temperature stress assays are intended to find genes that confer better germination, seedling vigor or plant growth under temperature stress (cold, freezing and heat).

[0380] Sugar sensing assays are intended to find genes involved in sugar sensing by germinating seeds on high concentrations of sucrose and glucose and looking for degrees of hypocotyl elongation. The germination assay on mannitol controls for responses related to osmotic stress. Sugars are key regulatory molecules that affect diverse processes in higher plants including germination, growth, flowering, senescence, sugar metabolism and photosynthesis. Sucrose is the major transport form of photosynthate and its flux through cells has been shown to affect gene expression and alter storage compound accumulation in seeds (source-sink relationships). Glucose-specific hexose-sensing has also been described in plants and is implicated in cell division and repression of "famine" genes (photosynthetic or glyoxylate cycles).

[0381] Germination assays followed modifications of the same basic protocol. Sterile seeds were sown on the conditional media listed below. Plates were incubated at 22° C. under 24-hour light (120-130 µEin/m²/s) in a growth chamber. Evaluation of germination and seedling vigor was conducted 3 to 15 days after planting. The basal media was 80% Murashige-Skoog medium (MS)+vitamins.
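
By way of illustration, the RT-PCR reaction cycle recited in Example VII (Step 1 through Step 6, followed by the optional 8 additional cycles) can be written down as data and its programmed hold time totaled, as in the following Python sketch. The class and function names are assumptions made for this illustration; the totals cover programmed hold times only, ignore temperature ramping and the final 4° C. hold, and the labels returned by `expression_call` simply restate the low versus medium-or-high rule given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int
    seconds: int


INITIAL_DENATURATION = Step(93, 3 * 60)               # Step 1
CYCLE = (Step(93, 30), Step(65, 60), Step(72, 120))   # Steps 2, 3 and 4
FINAL_EXTENSION = Step(72, 5 * 60)                    # Step 5


def program_seconds(cycles, initial=True, final=True):
    """Total programmed hold time for `cycles` repeats of Steps 2-4."""
    total = cycles * sum(step.seconds for step in CYCLE)
    if initial:
        total += INITIAL_DENATURATION.seconds
    if final:
        total += FINAL_EXTENSION.seconds
    return total


def expression_call(visible_after_28, visible_after_36):
    """Restate the gel-based rule: low if only detectable after 36 cycles."""
    if visible_after_28:
        return "medium or high (compare with the actin2 control)"
    if visible_after_36:
        return "low"
    return "not detected"


first_run = program_seconds(28)                               # Steps 1-5
extra_run = program_seconds(8, initial=False, final=False)    # 8 more cycles
print(first_run / 60, "min for the first 28 cycles")
print(extra_run / 60, "min for the 8 additional cycles")
print(expression_call(visible_after_28=False, visible_after_36=True))
```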

0382 For salt and osmotic stress germination experiments, the medium was supplemented with 150 mM NaCl or 300 mM mannitol. Growth regulator sensitivity assays were performed in MS media, vitamins, and either 0.3 µM ABA, 9.4% sucrose, or 5% glucose.

0383 Temperature stress cold germination experiments were carried out at 8° C. Heat stress germination experiments were conducted at 32° C. to 37° C. for 6 hours of exposure.

0384 For stress experiments conducted with more mature plants, seeds were germinated and grown for seven days on MS+vitamins+1% sucrose at 22° C. and then transferred to chilling and heat stress conditions. The plants were either exposed to chilling stress (6 hour exposure to 4-8° C.), or heat stress (32° C. was applied for five days, after which the plants were transferred back to 22° C. for recovery and evaluated after 5 days relative to controls not exposed to the depressed or elevated temperature).

0385 Results:

0386 As noted below, G867, G9, G993, and G1930 overexpression has been shown to increase osmotic stress tolerance.

0387 G867 (Polynucleotide SEQ ID NO:1)

0388 Published Information

0389 There are six RAV-like proteins in Arabidopsis. One of them, G867, has been described in the literature as related to ABI3/VP1 (RAV1; Kagaya et al. (1999) Nucleic Acids Res. 27:470-478) based on the presence of a B3 domain (which is also found in the ABI3/VP1 family of transcription factors). G867/RAV1 belongs to a small subgroup within the AP2/EREBP family of transcription factors, whose distinguishing characteristic is that its members contain a second DNA-binding domain, in addition to the conserved AP2 domain, that is related to the B3 domain of VP1/ABI3 (Kagaya et al. (1999) supra). Analyses using various deletion derivatives of the RAV1 fusion protein showed that the two DNA-binding domains of G867, the AP2 and B3 domains, separately recognize each of two motifs that constitute a bipartite binding sequence, CAACA and CACCTG, respectively, and together cooperatively enhance the DNA-binding affinity and specificity of the transcription factor (Kagaya et al. (1999) supra). No functional data are available for G867/RAV1.

0390 Experimental Observations

0391 G867 was initially identified as a public Arabidopsis EST. G867 appears to be constitutively expressed at medium levels.

0392 G867 was first characterized using a line that contained a T-DNA insertion in the gene. The insertion in that line resides immediately downstream of the conserved AP2 domain, and would therefore be expected to result in a severe or null mutation. G867 knockout mutant plants do not show significant changes in overall plant morphology, nor has a significant difference between these plants and control plants been detected in any of the assays that have been performed so far.

0393 The function of G867 was also analyzed using transgenic plants in which this gene was expressed under the control of the 35S promoter. G867 overexpressing lines are morphologically wild-type. Increased seedling vigor, manifested by increased expansion of the cotyledons, was observed in germination assays on both high salt (150 mM salt; FIG. 5) and media containing high sucrose (9.4%; FIG. 6), as compared to wild-type controls. Subsequently, G867 overexpressing Arabidopsis plants were shown to be more tolerant of drought in a soil-based assay, as compared to wild-type plants.

0394 G867 overexpressing plants exposed to chilling conditions (6 h at 4-8° C.) were more vigorous than control plants exposed to the same chilling conditions.

0395 Several G867 overexpressing lines were found to be more sensitive to 0.3 µM ABA.

0396 Utilities

0397 Most ABA effects are related to the compound acting as a signal of decreased water availability, whereby it triggers a reduction in water loss, slows growth, and mediates adaptive responses, and thus increased ABA sensitivity is a likely indicator of an enhanced stress response. These observations, and those in salt and sucrose tolerance assays, indicate that G867 or its equivalogs can be used to increase or facilitate seed germination and seedling or plant growth under adverse conditions such as osmotic stresses, including drought and salt stress.

0398 The enhanced performance of 35S::G867 seedlings under chilling conditions indicates that the gene or its equivalogs might be applied to engineer crops that show better growth under cold conditions, which may extend a crop's planting season or range, or improve yield or performance.

0399 G9 (Polynucleotide SEQ ID NO:3)

0400 Published Information

0401 G9 was first identified in a partial cDNA clone, and the corresponding gene named RAP2.8 (Okamuro et al., 1997). It has also been named RAV2 (Kagaya et al. (1999) Nucleic Acids Res. 27:470-478). G9/RAV2/RAP2.8 belongs to a small subgroup within the AP2/EREBP family of transcription factors, whose distinguishing characteristic is that its members contain a second DNA-binding domain, in addition to the conserved AP2 domain, that is related to the B3 domain of VP1/ABI3 (Kagaya et al. (1999) supra). It has been shown that the two DNA-binding domains of RAV1 (another member of this subgroup of proteins) can separately recognize each of two motifs that constitute a bipartite binding sequence and together cooperatively enhance its DNA-binding affinity and specificity (Kagaya et al. (1999) supra). No functional data are available for G9/RAV2/RAP2.8 or RAV1.

0402 Experimental Observations

0403 The complete sequence of G9 was determined. G9 appeared to be constitutively expressed. However, overexpression of G9 caused phenotypic changes in the roots: more root growth on MS plates (FIG. 7), and hairy roots on media containing 10 µM methyl jasmonate (MeJ; FIG. 8).

0404 Increased seedling vigor, manifested by increased expansion of the cotyledons of G9 overexpressing plants, was observed in germination assays on both high salt (150 mM NaCl) and high sucrose-containing media (9.4% sucrose), as compared to wild-type controls.
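The bipartite binding sequence reported for RAV1 in the Published Information for G867 (a CAACA motif recognized by the AP2 domain and a CACCTG motif recognized by the B3 domain) lends itself to a simple sequence scan. The following is a minimal sketch only: it searches a single strand, the function names are hypothetical, the `max_gap` spacing limit is an assumption made for illustration (the text does not state a spacing), and the test sequence is invented.

```python
import re

# Motifs named in the text for the RAV1 bipartite binding sequence.
AP2_MOTIF = "CAACA"   # recognized by the AP2 domain
B3_MOTIF = "CACCTG"   # recognized by the B3 domain


def find_all(seq, motif):
    """Return 0-based start positions of every match, overlapping ones included."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq.upper())]


def bipartite_sites(seq, max_gap=10):
    """Pair each CAACA with any CACCTG starting 0..max_gap bases after it ends.

    max_gap is a hypothetical parameter; the source does not give a spacing.
    Only the given strand is scanned.
    """
    pairs = []
    b3_starts = find_all(seq, B3_MOTIF)
    for a in find_all(seq, AP2_MOTIF):
        a_end = a + len(AP2_MOTIF)
        for b in b3_starts:
            if 0 <= b - a_end <= max_gap:
                pairs.append((a, b))
    return pairs


# Invented test sequence, not taken from any real promoter.
toy_promoter = "GGTCAACATTGCACCTGAAT"
print(bipartite_sites(toy_promoter))
```

A fuller scan would also examine the reverse complement and both motif orders; those are left out here to keep the sketch short.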

0405 35S::G9 transgenic plants were more tolerant to chilling (4-8° C. for 6 h) compared to the wild-type controls in seedling growth assays.

0406 Several G9 overexpressing lines were found to be more sensitive to 0.3 µM ABA.

0407 Utilities

0408 G9 or its equivalogs could potentially be used to increase root growth/vigor, which might in turn allow better plant growth under adverse conditions (for example, limited water or nutrient availability).

0409 Most ABA effects are related to the compound acting as a signal of decreased water availability, whereby it triggers a reduction in water loss, slows growth, and mediates adaptive responses, and thus increased ABA sensitivity is a likely indicator of an enhanced stress response. These observations, coupled with the root growth results and the salt and sucrose tolerance assays, indicate that G9 or its equivalogs could potentially be used to increase or facilitate seed germination and seedling or plant growth under adverse conditions such as osmotic stresses, including drought and salt stress.

0410 The enhanced performance of 35S::G9 seedlings under chilling conditions indicates that the gene or its equivalogs might be applied to engineer crops that show better growth under cold conditions, which may extend a crop's planting season or range, or improve yield or performance.

0411 G993 (Polynucleotide SEQ ID NO:5)

0412 Published Information

0413 G993 corresponds to gene F2J7.3 (AAG12735). No information is available about the function(s) of G993.

0414 Closely Related Genes from Other Species

0415 G993 shows some sequence similarity, outside of the conserved AP2/EREBP and B3 domains, to other RAV proteins from different species, such as a putative DNA binding protein RAV2 from Oryza sativa (GenBank accession number gil2328560).

0416 Experimental Observations

0417 The function of G993 was studied using transgenic plants in which the gene was expressed under the control of the 35S promoter.

0418 Overexpression of G993 produced highly pleiotropic effects on plant development and influenced growth rate, overall plant size, branching pattern and fertility. 35S::G993 seedlings were small, developed slowly, and produced inflorescences markedly later than wild-type controls. They also showed a reduction in apical dominance and disorganized rosettes, as multiple axillary shoots developed simultaneously. Inflorescence stems were generally shorter than wild type, and produced an increased number of cauline leaf nodes leading to a leafy, bushy appearance. In addition, the seed yield of 35S::G993 plants was generally very poor, and senescence occurred later than in wild-type controls. The transformation rate attained with the G993 construct was relatively low, suggesting that high levels of G993 activity might produce lethal effects. No alterations were detected in 35S::G993 plants in the biochemical analyses that were performed.

0419 G993 is ubiquitously expressed and does not appear to be significantly induced by any of the conditions tested.

0420 Increased seedling vigor, manifested by increased expansion of the cotyledons of G993 overexpressing plants, was observed in germination assays on both high salt (150 mM) and high sucrose (9.4%) containing media, as compared to wild-type controls.

0421 In addition, several 35S::G993 transgenic lines were more tolerant to cold germination (8° C.) and numerous lines were more tolerant to chilling (4-8° C. for 6 h) compared to the wild-type controls, in germination and seedling growth assays, respectively.

0422 Utilities

0423 The salt and sucrose tolerance assays indicate that G993 or its equivalogs could potentially be used to increase or facilitate seed germination and seedling or plant growth under adverse conditions such as osmotic stresses, including drought and salt stress.

0424 The enhanced performance of 35S::G993 seedlings under cold germination and chilling conditions indicates that the gene or its equivalogs might be applied to engineer crops that show better germination and growth under cold conditions, which may extend a crop's planting season or range, or improve yield or performance.

0425 G1930 (Polynucleotide SEQ ID NO:7)

0426 Published Information

0427 G1930 was identified in the sequence of P1 clone K13N2 (gene K13N2.7, GenBank protein accession number BAA95760). No information is available about the function(s) of G1930.

0428 Closely Related Genes from Other Species

0429 G1930 shows sequence similarity, outside of the conserved AP2 and ABI3 domains, to a predicted rice protein (GenBank accession number BAB21218).

0430 Experimental Observations

0431 G1930 is ubiquitously expressed and does not appear to be induced by any of the conditions tested.

0432 The function of G1930 was studied using transgenic plants in which this gene was expressed under the control of the 35S promoter.

0433 35S::G1930 T1 plants were generally small and developed spindly inflorescences. The fertility of these plants was low and flowers often failed to open or pollinate.

0434 G1930 overexpressors were more tolerant to osmotic stress conditions. The plants responded to high NaCl (150 mM) and high sucrose (9.4%) on plates with more seedling vigor compared to wild-type control plants. In addition, an increase in the amount of chlorophylls a and b in seeds of two T2 lines was detected.

0435 In addition, several 35S::G1930 transgenic lines were more tolerant to cold germination conditions (8° C. for 6 h) and numerous G1930 transgenic lines were more tolerant to chilling (4-8° C. for 6 h) compared to the wild-type controls, in germination and seedling growth assays, respectively.

0436 Several G1930 overexpressing lines were found to be more sensitive to 0.3 µM ABA.

0437 Utilities

0438 Most ABA effects are related to the compound acting as a signal of decreased water availability, whereby it triggers a reduction in water loss, slows growth, and mediates adaptive responses, and thus increased ABA sensitivity is a likely indicator of an enhanced stress response. These observations, coupled with the root growth results and the salt and sucrose tolerance assays, indicate that G1930 or its equivalogs could potentially be used to increase or facilitate seed germination and seedling or plant growth under adverse conditions such as osmotic stresses, including drought and salt stress.

0439 The enhanced performance of 35S::G1930 seedlings under cold germination and chilling conditions indicates that the gene or its equivalogs might be applied to engineer crops that show better germination and/or growth under cold conditions, which may extend a crop's planting season or range, or improve yield or performance.

0440 Table 7 provides a summary of the data collected from one series of experiments conducted with plants overexpressing G867 or a paralog of G867. In each case the promoter used for regulating the introduced transcription factor was the cauliflower mosaic virus 35S transcription initiation region. The column headings include the transcription factors used to transform the Arabidopsis plants listed by Gene ID (GID) numbers, the corresponding polypeptide SEQ ID NO, the transformation system used in these assays, and the ratio of lines determined to have one of the enhanced abiotic stress phenotypes listed over the number of lines tested. In Table 7, "Direct promoter fusion" refers to a transgenic plant generated by transforming wild type Arabidopsis with a DNA construct in which the CaMV 35S promoter is directly linked to the transcription factor gene and used to drive expression of the latter. "SupTfn" refers to a transgenic plant generated by transforming an Arabidopsis line, containing a transactivator construct, incorporating a LexA DNA binding domain, programmed for expression using the promoter indicated in the next column, with a DNA construct in which a LexA operator promoter region is fused to the transcription factor gene.

Example IX

0441 Identification of Homologous Sequences

0442 This example describes identification of genes that are orthologous to Arabidopsis thaliana transcription factors from a computer homology search.

0443 Homologous sequences, including those of paralogs and orthologs from Arabidopsis and other plant species, were identified using database sequence search tools, such as the Basic Local Alignment Search Tool (BLAST) (Altschul et al. (1990) J. Mol. Biol. 215:403-410; and Altschul et al. (1997) Nucleic Acid Res. 25:3389-3402). The tblastx sequence analysis programs were employed using the BLOSUM-62 scoring matrix (Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. 89:10915-10919).

0444 The entire NCBI GenBank database was filtered for sequences from all plants except Arabidopsis thaliana by selecting all entries in the NCBI GenBank database associated with NCBI taxonomic ID 33090 (Viridiplantae; all plants) and excluding entries associated with taxonomic ID 3701 (Arabidopsis thaliana).

0445 These sequences are compared to sequences representing genes of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, using the Washington University TBLASTX algorithm (version 2.0a19MP) at the default settings using gapped alignments with the filter "off". For each gene of SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, individual comparisons were ordered by probability score (P-value), where the score reflects the probability that a particular alignment occurred by chance. For example, a score of 3.6E-40 is 3.6×10⁻⁴⁰. In addition to P-values, comparisons were also scored by percentage identity. Percentage identity reflects the degree to which two segments of DNA or protein are identical over a particular length. Examples of sequences so identified are presented in Table 6. The percent sequence identity among these sequences can be as low as 47%, or even lower.
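The selection and scoring steps of Example IX (keep plant entries, exclude the taxonomic ID stated in the text, order comparisons by P-value, and report percentage identity) can be illustrated with a short sketch. Everything below other than the two taxonomic IDs quoted from the text is an assumption made for illustration: the `Hit` record, its field names, the subject names, the lineage tuples and the aligned segments are all invented, and this is not the program actually used.

```python
from dataclasses import dataclass

PLANT_TAXON = 33090     # Viridiplantae, as stated in the text
EXCLUDED_TAXON = 3701   # taxonomic ID excluded in the text


@dataclass
class Hit:
    """Hypothetical record for one pairwise comparison."""
    subject_id: str
    lineage: tuple      # taxonomic IDs attached to the entry (invented field)
    p_value: float      # probability the alignment occurred by chance
    query_aln: str      # aligned query segment, '-' marks a gap
    subject_aln: str    # aligned subject segment, same length


def percent_identity(a, b):
    """Identical, non-gap positions as a percentage of the alignment length."""
    if len(a) != len(b) or not a:
        raise ValueError("aligned segments must be non-empty and equal in length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * same / len(a)


def select_and_rank(hits):
    """Keep plant entries outside the excluded taxon, lowest P-value first."""
    kept = [h for h in hits
            if PLANT_TAXON in h.lineage and EXCLUDED_TAXON not in h.lineage]
    return sorted(kept, key=lambda h: h.p_value)


# Invented toy data; none of these values come from the patent.
hits = [
    Hit("rice_entry", (33090, 4530), float("3.6E-40"), "MKRLSSG", "MKRLTSG"),
    Hit("arabidopsis_entry", (33090, 3701), 1.0e-80, "MKRLSSG", "MKRLSSG"),
    Hit("soy_entry", (33090, 3847), 2.0e-55, "MKRL-SG", "MKRLASG"),
]

for h in select_and_rank(hits):
    identity = percent_identity(h.query_aln, h.subject_aln)
    print(f"{h.subject_id}\tP={h.p_value:.1e}\tidentity={identity:.1f}%")
```

Note that `float("3.6E-40")` parses the E-notation used in the text directly, so P-values copied from a search report can be sorted numerically without further conversion.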

TABLE 7
Summary of results of physiological assays. Values are overexpressor lines showing the phenotype/number of lines tested.

GID | SEQ ID NO: | One or two component transformation system | Promoter | Improved germination in high NaCl | Improved germination in high sucrose | ABA sensitivity | Improved germination in cold | Improved germination in heat | Improved chilling tolerance
G867 | 2 | Direct promoter-fusion | 35S | 5/10 | 4/10 | 5/10 | 0/10 | 0/10 | 6/10
G867 | 2 | 2-components-supTfn | CUT1 | 1/10 | 1/10 | 0/10 | 2/10 | 1/10 | 0/10
G9 | 4 | Direct promoter-fusion | 35S | 10/10 | 6/10 | 3/10 | 0/10 | 0/10 | 6/10
G993 | 6 | Direct promoter-fusion | 35S | 6/10 | 5/10 | 0/10 | 3/10 | 0/10 | 6/10
G1930 | 8 | Direct promoter-fusion | 35S | 6/10 | 5/10 | 0/10 | 0/10 | 0/10 | 7/10
G1930 | 8 | 2-components-supTfn | 35S | 10/10 | 8/10 | 5/10 | 0/10 | 0/10 | 6/10

0446 Candidate paralogous sequences were identified among Arabidopsis transcription factors through alignment, identity, and phylogenic relationships. Paralogs of G867 determined in this manner include G9, G993 and G1930. Candidate orthologous sequences were identified from proprietary unigene sets of plant gene sequences in Zea mays, Glycine max and Oryza sativa based on significant homology to Arabidopsis transcription factors. These candidates were reciprocally compared to the set of Arabidopsis transcription factors. If the candidate showed maximal similarity in the protein domain to the eliciting transcription factor or to a paralog of the eliciting transcription factor, then it was considered to be an ortholog. Identified non-Arabidopsis sequences that were shown in this manner to be orthologous to the Arabidopsis sequences are provided in Table 6.

Example X

0447 Screen of Plant cDNA Library for Sequence Encoding a Transcription Factor DNA Binding Domain that Binds to a Transcription Factor Binding Promoter Element and Demonstration of Protein Transcription Regulation Activity.

0448 The "one-hybrid" strategy (Li and Herskowitz (1993) Science 262:1870-1874) is used to screen for plant cDNA clones encoding a polypeptide comprising a transcription factor DNA binding domain, a conserved domain. In brief, yeast strains are constructed that contain a lacZ reporter gene with either wild-type or mutant transcription factor binding promoter element sequences in place of the normal UAS (upstream activator sequence) of the GAL1 promoter. Yeast reporter strains are constructed that carry transcription factor binding promoter element sequences as UAS elements operably linked upstream (5') of a lacZ reporter gene with a minimal GAL1 promoter. The strains are transformed with a plant expression library that contains random cDNA inserts fused to the GAL4 activation domain (GAL4-ACT) and screened for blue colony formation on X-gal-treated filters (X-gal: 5-bromo-4-chloro-3-indolyl-β-D-galactoside; Invitrogen Corporation, Carlsbad Calif.). Alternatively, the strains are transformed with a cDNA polynucleotide encoding a known transcription factor DNA binding domain polypeptide sequence.

0449 Yeast strains carrying these reporter constructs produce low levels of beta-galactosidase and form white colonies on filters containing X-gal. The reporter strains carrying wild-type transcription factor binding promoter element sequences are transformed with a polynucleotide that encodes a polypeptide comprising a plant transcription factor DNA binding domain operably linked to the acidic activator domain of the yeast GAL4 transcription factor, "GAL4-ACT". The clones that contain a polynucleotide encoding a transcription factor DNA binding domain operably linked to GAL4-ACT can bind upstream of the lacZ reporter genes carrying the wild-type transcription factor binding promoter element sequence, activate transcription of the lacZ gene and result in yeast forming blue colonies on X-gal-treated filters.

0450 Upon screening about 2x10 yeast transformants, positive cDNA clones are isolated; i.e., clones that cause yeast strains carrying lacZ reporters operably linked to wild-type transcription factor binding promoter elements to form blue colonies on X-gal-treated filters. The cDNA clones do not cause a yeast strain carrying mutant-type transcription factor binding promoter elements fused to lacZ to turn blue. Thus, a polynucleotide encoding a transcription factor DNA binding domain, a conserved domain, is shown to activate transcription of a gene.

Example XI

0451 Gel Shift Assays.

0452 The presence of a transcription factor comprising a DNA binding domain which binds to a DNA transcription factor binding element is evaluated using the following gel shift assay. The transcription factor is recombinantly expressed and isolated from E. coli or isolated from plant material. Total soluble protein, including transcription factor, (40 ng) is incubated at room temperature in 10 µl of 1× binding buffer (15 mM HEPES (pH 7.9), 1 mM EDTA, 30 mM KCl, 5% glycerol, 5% bovine serum albumin, 1 mM DTT) plus 50 ng poly(dI-dC):poly(dI-dC) (Pharmacia, Piscataway N.J.) with or without 100 ng competitor DNA. After 10 minutes incubation, probe DNA comprising a DNA transcription factor binding element (1 ng) that has been ³²P-labeled by end-filling (Sambrook et al. (1989) supra) is added and the mixture incubated for an additional 10 minutes. Samples are loaded onto polyacrylamide gels (4% w/v) and fractionated by electrophoresis at 150 V for 2 h (Sambrook et al. supra). The degree of transcription factor-probe DNA binding is visualized using autoradiography. Probes and competitor DNAs are prepared from oligonucleotide inserts ligated into the BamHI site of pUC118 (Vieira et al. (1987) Methods Enzymol. 153:3-11). Orientation and concatenation number of the inserts are determined by dideoxy DNA sequence analysis (Sambrook et al. supra). Inserts are recovered after restriction digestion with EcoRI and HindIII and fractionation on polyacrylamide gels (12% w/v) (Sambrook et al. supra).

Example XII

0453 Introduction of Polynucleotides into Dicotyledonous Plants

0454 SEQ ID NO:1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, or polynucleotide sequences encoding SEQ ID NO:2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 53, paralogous, and orthologous sequences recombined into pMEN20 or pMEN65 expression vectors are transformed into a plant for the purpose of modifying plant traits. The cloning vector may be introduced into a variety of cereal plants by means well known in the art such as, for example, direct DNA transfer or Agrobacterium tumefaciens-mediated transformation. It is now routine to produce transgenic plants using most dicot plants (see Weissbach and Weissbach, (1989) supra; Gelvin et al. (1990) supra; Herrera-Estrella et al. (1983) supra; Bevan (1984) supra; and Klee (1985) supra). Methods for analysis of traits are routine in the art and examples are disclosed above.
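The reciprocal comparison used to call orthologs in Example IX (a candidate is accepted when its best match among Arabidopsis transcription factors is the eliciting transcription factor or one of that factor's paralogs) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names, candidate names, the `other_TF` label and all similarity scores are hypothetical; only the G867 paralog set (G9, G993, G1930) comes from the text.

```python
def best_hit(scores, query):
    """Return the Arabidopsis factor with the highest similarity score for query."""
    per_subject = scores[query]
    return max(per_subject, key=per_subject.get)


def is_ortholog(candidate, eliciting_tf, paralogs, cand_vs_arabidopsis):
    """Reciprocal test: the candidate's top Arabidopsis match must be the
    eliciting transcription factor or one of its paralogs."""
    top = best_hit(cand_vs_arabidopsis, candidate)
    return top == eliciting_tf or top in paralogs


# Paralogs of G867 as listed in the text.
G867_PARALOGS = {"G9", "G993", "G1930"}

# Invented similarity scores of each candidate against Arabidopsis factors.
cand_vs_arabidopsis = {
    "rice_cand_1": {"G867": 310.0, "G9": 295.0, "other_TF": 120.0},
    "maize_cand_2": {"G867": 140.0, "other_TF": 260.0},
    "soy_cand_3": {"G867": 200.0, "G993": 240.0},
}

for cand in cand_vs_arabidopsis:
    verdict = is_ortholog(cand, "G867", G867_PARALOGS, cand_vs_arabidopsis)
    print(cand, "ortholog" if verdict else "rejected")
```

In this toy data the soybean candidate is accepted even though G867 is not its top match, because its top match is a G867 paralog, which mirrors the rule stated in the text.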

Example XIII

0455 Transformation of Cereal Plants with an Expression Vector

0456 Cereal plants such as, but not limited to, corn, wheat, rice, sorghum, or barley, may also be transformed with the present polynucleotide sequences in pMEN20 or pMEN65 expression vectors for the purpose of modifying plant traits. For example, pMEN020 may be modified to replace the NptII coding region with the BAR gene of Streptomyces hygroscopicus that confers resistance to phosphinothricin. The KpnI and BglII sites of the Bar gene are removed by site-directed mutagenesis with silent codon changes.

0457 The cloning vector may be introduced into a variety of cereal plants by means well known in the art such as, for example, direct DNA transfer or Agrobacterium tumefaciens-mediated transformation. It is now routine to produce transgenic plants of most cereal crops (Vasil (1994) Plant Mol. Biol. 25:925-937) such as corn, wheat, rice, sorghum (Cassas et al. (1993) Proc. Natl. Acad. Sci. 90:11212-11216), and barley (Wan and Lemeaux (1994) Plant Physiol. 104:37-48). DNA transfer methods such as the microprojectile method can be used for corn (Fromm et al. (1990) Bio/Technol. 8:833-839; Gordon-Kamm et al. (1990) Plant Cell 2:603-618; Ishida (1990) Nature Biotechnol. 14:745-750), wheat (Vasil et al. (1992) Bio/Technol. 10:667-674; Vasil et al. (1993) Bio/Technol. 11:1553-1558; Weeks et al. (1993) Plant Physiol. 102:1077-1084), and rice (Christou (1991) Bio/Technol. 9:957-962; Hiei et al. (1994) Plant J. 6:271-282; Aldemita and Hodges (1996) Planta 199:612-617; and Hiei et al. (1997) Plant Mol. Biol. 35:205-218). For most cereal plants, embryogenic cells derived from immature scutellum tissues are the preferred cellular targets for transformation (Hiei et al. (1997) Plant Mol. Biol. 35:205-218; Vasil (1994) Plant Mol. Biol. 25:925-937).

0458 Vectors according to the present invention may be transformed into corn embryogenic cells derived from immature scutellar tissue by using microprojectile bombardment, with the A188XB73 genotype as the preferred genotype (Fromm et al. (1990) Bio/Technol. 8:833-839; Gordon-Kamm et al. (1990) Plant Cell 2:603-618). After microprojectile bombardment the tissues are selected on phosphinothricin to identify the transgenic embryogenic cells (Gordon-Kamm et al. (1990) Plant Cell 2:603-618). Transgenic plants are regenerated by standard corn regeneration techniques (Fromm et al. (1990) Bio/Technol. 8:833-839; Gordon-Kamm et al. (1990) Plant Cell 2:603-618).

0459 The plasmids prepared as described above can also be used to produce transgenic wheat and rice plants (Christou (1991) Bio/Technol. 9:957-962; Hiei et al. (1994) Plant J. 6:271-282; Aldemita and Hodges (1996) Planta 199:612-617; and Hiei et al. (1997) Plant Mol. Biol. 35:205-218) that coordinately express genes of interest by following standard transformation protocols known to those skilled in the art for rice and wheat (Vasil et al. (1992) Bio/Technol. 10:667-674; Vasil et al. (1993) Bio/Technol. 11:1553-1558; and Weeks et al. (1993) Plant Physiol. 102:1077-1084), where the bar gene is used as the selectable marker.

Example XIV

0460 Transformation of Tomato and Soy Plants

0461 Numerous protocols for the transformation of tomato and soy plants have been previously described, and are well known in the art. Gruber et al. ((1993) in Methods in Plant Molecular Biology and Biotechnology, p. 89-119, Glick and Thompson, eds., CRC Press, Inc., Boca Raton) describe several expression vectors and culture methods that may be used for cell or tissue transformation and subsequent regeneration. For soybean transformation, methods are described by Miki et al. (1993) in Methods in Plant Molecular Biology and Biotechnology, p. 67-88, Glick and Thompson, eds., CRC Press, Inc., Boca Raton; and U.S. Pat. No. 5,563,055, (Townsend and Thomas), issued Oct. 8, 1996.

0462 There are a substantial number of alternatives to Agrobacterium-mediated transformation protocols, that is, other methods for the purpose of transferring exogenous genes into soybeans or tomatoes. One such method is microprojectile-mediated transformation, in which DNA on the surface of microprojectile particles is driven into plant tissues with a biolistic device (see, for example, Sanford et al., (1987) Part. Sci. Technol. 5:27-37; Christou et al. (1992) Plant. J. 2:275-281; Sanford (1993) Methods Enzymol. 217:483-509; Klein et al. (1987) Nature 327:70-73; U.S. Pat. No. 5,015,580 (Christou et al), issued May 14, 1991; and U.S. Pat. No. 5,322,783 (Tomes et al.), issued Jun. 21, 1994).

0463 Alternatively, sonication methods (see, for example, Zhang et al. (1991) Bio/Technology 9:996-997); direct uptake of DNA into protoplasts using CaCl2 precipitation, polyvinyl alcohol or poly-L-ornithine (see, for example, Hain et al. (1985) Mol. Gen. Genet. 199:161-168; Draper et al., Plant Cell Physiol. 23:451-458 (1982)); liposome or spheroplast fusion (see, for example, Deshayes et al. (1985) EMBO J., 4:2731-2737; Christou et al. (1987) Proc. Natl. Acad. Sci. U.S.A. 84:3962-3966); and electroporation of protoplasts and whole cells and tissues (see, for example, Donn et al. (1990) in Abstracts of VIIth International Congress on Plant Cell and Tissue Culture IAPTC, A2-38:53; D'Halluin et al. (1992) Plant Cell 4:1495-1505; and Spencer et al. (1994) Plant Mol. Biol. 24:51-61) have been used to introduce foreign DNA and expression vectors into plants.

0464 After plants or plant cells are transformed (and the latter regenerated into plants) the transgenic plant thus generated may be crossed with itself ("selfing") or a plant from the same line, a non-transformed or wild-type plant, or another transformed plant from a different transgenic line of plants. Crossing provides the advantages of being able to produce new and perhaps stable transgenic varieties. Genes and the traits they confer that have been introduced into a tomato or soybean line may be moved into a distinct line of plants using traditional backcrossing techniques well known in the art. Transformation of tomato plants may be conducted using the protocols of Koornneef et al (1986) In Tomato Biotechnology. Alan R. Liss, Inc., 169-178, and in U.S. Pat. No. 6,613,962, the latter method described in brief here. Eight day old cotyledon explants are precultured for 24 hours in Petri dishes containing a feeder layer of Petunia hybrida suspension cells plated on MS medium with 2% (w/v) sucrose and 0.8% agar supplemented with 10 µM

C.-naphthalene acetic acid and 4.4 uM 6-benzylaminopurine. Example XV The explants are then infected with a diluted overnight 0469 Genes that Confer Significant Improvements to culture of Agrobacterium tumefaciens containing an expres Non-Arabidopsis Species Sion vector comprising a polynucleotide of the invention for 5-10 minutes, blotted dry on sterile filter paper and cocul 0470 The function of specific orthologs of G867 may be tured for 48 hours on the original feeder layer plates. Culture analyzed through their ectopic overexpression in plants, conditions are as described above. Overnight cultures of using the CaMV 35S or other appropriate promoter, identi Agrobacterium tumefaciens are diluted in liquid MS fied above. These genes, which include polynucleotide sequences found in the Sequence Listing, Table 6 and FIG. medium with 2% (w/v/) sucrose, pH 5.7) to an ODoo of 0.8. 3, encode members of the AP2 transcription factors, Such as 0465 Following the cocultivation, the cotyledon explants those found in Oryza sativa (SEQ ID NO:20, 30, 32,34, 36, 52, and 53), Arabidopsis thaliana (SEQ ID NO 2, 4, 6, 8), are transferred to Petri dishes with selective medium con Glycine max (SEQ ID NO:18, 22, 24, 26, 28), Zea mays sisting of MS medium supplemented with 4.56 uM Zeatin, (SEQ ID NO:38, 40, 50), Triticum aestivum (SEQ ID 67.3 uM Vancomycin, 418.9 uM cefotaxime and 171.6 uM NO:48), Brassica oleracea (SEQ ID NO:42), and Helian kanamycin Sulfate, and cultured under the culture conditions thus annuus (SEQ ID NO:44 and 46). The polynucleotide described above. The explants are subcultured every three and polypeptide Sequences derived from monocots may be weeks onto fresh medium. Emerging Shoots are dissected used to transform both monocot and dicot plants, and those from the underlying callus and transferred to glass jars with derived from dicots may be used to transform either group, Selective medium without zeatin to form roots. 
although some of these sequences will function best if the gene is transformed into a plant from the same group as that from which the sequence is derived.

The formation of roots in a medium containing kanamycin sulfate is regarded as a positive indication of a successful transformation.

[0466] Transformation of soybean plants may be conducted using the methods found in, for example, U.S. Pat. No. 5,563,055 (Townsend et al., issued Oct. 8, 1996), described in brief here. In this method soybean seed is surface sterilized by exposure to chlorine gas evolved in a glass bell jar. Seeds are germinated by plating on 1/10 strength agar solidified medium without plant growth regulators and culturing at 28° C. with a 16 hour day length. After three or four days, seed may be prepared for cocultivation. The seedcoat is removed and the elongating radicle removed 3-4 mm below the cotyledons.

[0467] Overnight cultures of Agrobacterium tumefaciens harboring the expression vector comprising a polynucleotide of the invention are grown to log phase, pooled, and concentrated by centrifugation. Inoculations are conducted in batches such that each plate of seed is treated with a newly resuspended pellet of Agrobacterium. The pellets are resuspended in 20 ml inoculation medium. The inoculum is poured into a Petri dish containing prepared seed and the cotyledonary nodes are macerated with a surgical blade. After 30 minutes the explants are transferred to plates of the same medium which has been solidified. Explants are embedded with the adaxial side up and level with the surface of the medium and cultured at 22° C. for three days under white fluorescent light. These plants may then be regenerated according to methods well established in the art, such as by moving the explants after three days to a liquid counter-selection medium (see U.S. Pat. No. 5,563,055).

[0468] The explants may then be picked, embedded and cultured in solidified selection medium. After one month on selective media transformed tissue becomes visible as green sectors of regenerating tissue against a background of bleached, less healthy tissue. Explants with green sectors are transferred to an elongation medium. Culture is continued on this medium with transfers to fresh plates every two weeks. When shoots are 0.5 cm in length they may be excised at the base and placed in a rooting medium.

[0471] Seeds of these transgenic plants are subjected to germination assays to measure sucrose sensing. Sterile monocot seeds, including, but not limited to, corn, rice, wheat, rye and sorghum, as well as dicots including, but not limited to, soybean and alfalfa, are sown on 80% MS medium plus vitamins with 9.4% sucrose; control media lack sucrose. All assay plates are then incubated at 22° C. under 24-hour light, 120-130 µEin/m²/s, in a growth chamber. Evaluation of germination and seedling vigor is then conducted three days after planting. Overexpressors of these genes may be found to be more tolerant to high sucrose by having better germination, longer radicles, and more cotyledon expansion. These results would indicate that overexpressors of G867, G9, G993 and/or G1930 orthologs are involved in sucrose-specific sugar sensing.

[0472] Plants overexpressing these orthologs may also be subjected to soil-based drought assays to identify those lines that are more tolerant to water deprivation than wild-type control plants. Generally, 35S::G867, G9, G993 and/or G1930 ortholog overexpressing plants will appear significantly larger and greener, with less wilting or desiccation, than wild-type control plants, particularly after a period of water deprivation followed by rewatering and a subsequent incubation period.

Example XVI

[0473] Identification of Orthologous and Paralogous Sequences

[0474] Orthologs to Arabidopsis genes may be identified by several methods, including hybridization, amplification, or bioinformatic analysis. This example describes how one may identify homologs to the Arabidopsis AP2 family transcription factor CBF1 (polynucleotide SEQ ID NO:54, encoded polypeptide SEQ ID NO:55), which confers tolerance to abiotic stresses (Thomashow et al. (2002) U.S. Pat. No. 6,417,428), and provides an example of confirming the function of homologous sequences. In this example, orthologs to CBF1 were found in canola (Brassica napus) using polymerase chain reaction (PCR).
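By way of illustration only, the degenerate primers used for the PCR of this example (Mol 368 and Mol 378, SEQ ID NOs: 62 and 63) can be examined computationally. The following Python sketch is not part of the disclosed method. It assumes the standard IUPAC nucleotide codes and the standard genetic code, and the function names `degeneracy` and `encoded_residues` are hypothetical. It counts how many distinct oligonucleotides each degenerate primer represents and lists the amino acids that each complete codon of Mol 368 can encode.

```python
from itertools import product
from math import prod

# IUPAC nucleotide codes used by the primers (Y, N, H, M, R) plus plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "H": "ACT", "N": "ACGT",
}

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard nuclear code applies).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

MOL_368 = "CAYCCNATHTAYMGNGGNGT"   # SEQ ID NO: 62, spaces removed
MOL_378 = "GGNARNARCATNCCYTCNGCC"  # SEQ ID NO: 63, spaces removed


def degeneracy(primer: str) -> int:
    """Number of distinct plain-base oligonucleotides the primer represents."""
    return prod(len(IUPAC[base]) for base in primer)


def encoded_residues(primer: str) -> list[set[str]]:
    """For each complete codon, the set of amino acids its expansions encode."""
    residues = []
    for i in range(0, len(primer) - len(primer) % 3, 3):
        codon = primer[i:i + 3]
        expansions = product(*(IUPAC[base] for base in codon))
        residues.append({CODON_TABLE["".join(c)] for c in expansions})
    return residues


if __name__ == "__main__":
    print("Mol 368 degeneracy:", degeneracy(MOL_368))
    print("Mol 378 degeneracy:", degeneracy(MOL_378))
    print("Mol 368 residues per codon:", encoded_residues(MOL_368))
```

Under these assumptions the MGN codon of Mol 368 covers all six arginine codons but also the two AGY serine codons, so a match to the primer does not by itself establish the His-Pro-Ile-Tyr-Arg-Gly-Val motif.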

[0475] Degenerate primers were designed for regions of the AP2 binding domain and outside of the AP2 domain (carboxyl terminal domain):

Mol 368 (reverse) 5'-CAY CCN ATH TAY MGN GGN GT-3' (SEQ ID NO: 62)
Mol 378 (forward) 5'-GGN ARN ARC ATN CCY TCN GCC-3' (SEQ ID NO: 63)

[0476] (Y: C/T, N: A/C/G/T, H: A/C/T, M: A/C, R: A/G)

[0477] Primer Mol 368 is in the AP2 binding domain of CBF1 (amino acid sequence: His-Pro-Ile-Tyr-Arg-Gly-Val) while primer Mol 378 is outside the AP2 domain (carboxyl terminal domain) (amino acid sequence: Met-Ala-Glu-Gly-Met-Leu-Leu-Pro).

[0478] The genomic DNA isolated from B. napus was PCR-amplified using these primers under the following conditions: an initial denaturation step of 2 min at 93° C.; 35 cycles of 93° C. for 1 min, 55° C. for 1 min, and 72° C. for 1 min; and a final incubation of 7 min at 72° C. at the end of cycling.

[0479] The PCR products were separated by electrophoresis on a 1.2% agarose gel, transferred to nylon membrane and hybridized with the At CBF1 probe prepared from Arabidopsis genomic DNA by PCR amplification. The hybridized products were visualized by a colorimetric detection system (Boehringer Mannheim) and the corresponding bands from a similar agarose gel were isolated using the Qiagen Extraction Kit (Qiagen). The DNA fragments were ligated into the TA clone vector from the TOPO TA Cloning Kit (Invitrogen) and transformed into E. coli strain TOP10 (Invitrogen).

[0480] Seven colonies were picked and, after plasmid DNA isolation, the inserts were sequenced on an ABI 377 machine on both the sense and antisense strands. The DNA sequence was edited by Sequencer and aligned with the At CBF1 by GCG software and NCBI blast searching.

[0481] The nucleic acid sequence and amino acid sequence of one canola ortholog identified by this process (bnCBF1; polynucleotide SEQ ID NO:60 and polypeptide SEQ ID NO:61) are shown in the Sequence Listing.

[0482] The aligned amino acid sequences show that the bnCBF1 gene has 88% identity with the Arabidopsis sequence in the AP2 domain region and 85% identity with the Arabidopsis sequence outside the AP2 domain when aligned for two insertion sequences that are outside the AP2 domain.

[0483] Similarly, paralogous sequences to Arabidopsis genes, such as CBF1, may also be identified.

[0484] Two paralogs of CBF1 from Arabidopsis thaliana, CBF2 and CBF3, have been cloned and sequenced as described below. The sequences of the DNA SEQ ID NO:54, 56 and 58 and encoded proteins SEQ ID NO:55, 57 and 59 are set forth in the Sequence Listing.

[0485] A lambda cDNA library prepared from RNA isolated from Arabidopsis thaliana ecotype Columbia (Lin and Thomashow (1992) Plant Physiol. 99:519–525) was screened for recombinant clones that carried inserts related to the CBF1 gene (Stockinger et al. (1997) Proc. Natl. Acad. Sci. 94:1035-1040). CBF1 was ³²P-radiolabeled by random priming (Sambrook et al. supra) and used to screen the library by the plaque-lift technique using standard stringent hybridization and wash conditions (Hajela et al. (1990) Plant Physiol. 93:1246-1252; Sambrook et al. supra) (6×SSPE buffer, 60° C. for hybridization and 0.1×SSPE buffer and 60° C. for washes). Twelve positively hybridizing clones were obtained and the DNA sequences of the cDNA inserts were determined. The results indicated that the clones fell into three classes. One class carried inserts corresponding to CBF1. The two other classes carried sequences corresponding to two different homologs of CBF1, designated CBF2 and CBF3. The nucleic acid sequences and predicted protein coding sequences for Arabidopsis CBF1, CBF2 and CBF3 are listed in the Sequence Listing (SEQ ID NOs:54, 56, 58 and SEQ ID NOs:55, 57, and 59, respectively). The nucleic acid sequence and predicted protein coding sequence for the Brassica napus CBF ortholog are listed in the Sequence Listing (SEQ ID NOs:60 and 61, respectively).

[0486] A comparison of the nucleic acid sequences of Arabidopsis CBF1, CBF2 and CBF3 indicates that they are 83 to 85% identical as shown in Table 8.

TABLE 8

                 Percent identity (a)
                 DNA (b)    Polypeptide
cbf1/cbf2        85         86
cbf1/cbf3        83         84
cbf2/cbf3        84         85

(a) Percent identity was determined using the Clustal algorithm from the Megalign program (DNASTAR, Inc.).
(b) Comparisons of the nucleic acid sequences of the open reading frames are shown.

[0487] Similarly, the amino acid sequences of the three CBF polypeptides range from 84 to 86% identity. An alignment of the three amino acid sequences reveals that most of the differences in amino acid sequence occur in the acidic C-terminal half of the polypeptide. This region of CBF1 serves as an activation domain in both yeast and Arabidopsis (not shown).

[0488] Residues 47 to 106 of CBF1 correspond to the AP2 domain of the protein, a DNA binding motif that, to date, has only been found in plant proteins. A comparison of the AP2 domains of CBF1, CBF2 and CBF3 indicates that there are a few differences in amino acid sequence. These differences in amino acid sequence might have an effect on DNA binding specificity.

Example XVII

[0489] Transformation of Canola with a Plasmid Containing CBF1, CBF2, or CBF3

[0490] After identifying homologous genes to CBF1, canola was transformed with a plasmid containing the Arabidopsis CBF1, CBF2, or CBF3 genes cloned into the vector pGA643 (An (1987) Methods Enzymol. 253:292). In these constructs the CBF genes were expressed constitutively under the CaMV 35S promoter. In addition, the CBF1 gene was cloned under the control of the Arabidopsis

COR15 promoter in the same vector pGA643. Each construct was transformed into Agrobacterium strain GV3101. Transformed Agrobacteria were grown for 2 days in minimal AB medium containing appropriate antibiotics.

[0491] Spring canola (B. napus cv. Westar) was transformed using the protocol of Moloney et al. ((1989) Plant Cell Reports 8:238) with some modifications as described. Briefly, seeds were sterilized and plated on half strength MS medium containing 1% sucrose. Plates were incubated at 24° C. under 60-80 µE/m²s light using a 16 hour light/8 hour dark photoperiod. Cotyledons from 4-5 day old seedlings were collected, and the petioles were cut and dipped into the Agrobacterium solution. The dipped cotyledons were placed on co-cultivation medium at a density of 20 cotyledons/plate and incubated as described above for 3 days. Explants were transferred to the same media, but containing 300 mg/l timentin (SmithKline Beecham, Pa.), and thinned to 10 cotyledons/plate. After 7 days explants were transferred to Selection/Regeneration medium. Transfers were continued every 2-3 weeks (2 or 3 times) until shoots had developed. Shoots were transferred to Shoot-Elongation medium every 2-3 weeks. Healthy looking shoots were transferred to rooting medium. Once good roots had developed, the plants were placed into moist potting soil.

[0492] The transformed plants were then analyzed for the presence of the NPTII gene/kanamycin resistance by ELISA, using the ELISA NPTII kit from 5Prime-3Prime Inc. (Boulder, Colo.). Approximately 70% of the screened plants were NPTII positive. Only those plants were further analyzed.

[0493] Northern blot analysis of the plants that were transformed with the constitutively expressing constructs showed expression of the CBF genes, and all CBF genes were capable of inducing the Brassica napus cold-regulated gene BN115 (homolog of the Arabidopsis COR15 gene). Most of the transgenic plants appear to exhibit a normal growth phenotype. As expected, the transgenic plants are more freezing tolerant than the wild-type plants. Using the electrolyte leakage of leaves test, the control showed a 50% leakage at -2 to -3° C. Spring canola transformed with either CBF1 or CBF2 showed a 50% leakage at -6 to -7° C. Spring canola transformed with CBF3 shows a 50% leakage at about -10 to -15° C. Winter canola transformed with CBF3 may show a 50% leakage at about -16 to -20° C. Furthermore, if the spring or winter canola are cold acclimated the transformed plants may exhibit a further increase in freezing tolerance of at least -2° C.

[0494] To test salinity tolerance of the transformed plants, plants were watered with 150 mM NaCl. Plants overexpressing CBF1, CBF2 or CBF3 grew better compared with plants that had not been transformed with CBF1, CBF2 or CBF3.

[0495] These results demonstrate that homologs of Arabidopsis transcription factors can be identified and shown to confer similar functions in non-Arabidopsis plant species.

Example XVIII

[0496] Cloning of Transcription Factor Promoters

[0497] Promoters are isolated from transcription factor genes that have gene expression patterns useful for a range of applications, as determined by methods well known in the art (including transcript profile analysis with cDNA or oligonucleotide microarrays, Northern blot analysis, semi-quantitative or quantitative RT-PCR). Interesting gene expression profiles are revealed by determining transcript abundance for a selected transcription factor gene after exposure of plants to a range of different experimental conditions, and in a range of different tissue or organ types, or developmental stages. Experimental conditions to which plants are exposed for this purpose include cold, heat, drought, osmotic challenge, and varied hormone concentrations (ABA, GA, auxin, cytokinin, salicylic acid, brassinosteroid). The tissue types and developmental stages include stem, root, flower, rosette leaves, cauline leaves, siliques, germinating seed, and meristematic tissue. The set of expression levels provides a pattern that is determined by the regulatory elements of the gene promoter.

[0498] Transcription factor promoters for the genes disclosed herein are obtained by cloning 1.5 kb to 2.0 kb of genomic sequence immediately upstream of the translation start codon for the coding sequence of the encoded transcription factor protein. This region includes the 5'-UTR of the transcription factor gene, which can comprise regulatory elements. The 1.5 kb to 2.0 kb region is cloned through PCR methods, using primers that include one in the 3' direction located at the translation start codon (including appropriate adaptor sequence), and one in the 5' direction located from 1.5 kb to 2.0 kb upstream of the translation start codon (including appropriate adaptor sequence). The desired fragments are PCR-amplified from Arabidopsis Col-0 genomic DNA using high-fidelity Taq DNA polymerase to minimize the incorporation of point mutation(s). The cloning primers incorporate two rare restriction sites, such as NotI and SfiI, found at low frequency throughout the Arabidopsis genome. Additional restriction sites are used in the instances where a NotI or SfiI restriction site is present within the promoter.

[0499] The 1.5-2.0 kb fragment upstream from the translation start codon, including the 5'-untranslated region of the transcription factor, is cloned in a binary transformation vector immediately upstream of a suitable reporter gene, or a transactivator gene that is capable of programming expression of a reporter gene in a second gene construct. Reporter genes used include green fluorescent protein (and related fluorescent protein color variants), beta-glucuronidase, and luciferase. Suitable transactivator genes include LexA-GAL4, along with a transactivatable reporter in a second binary plasmid (as disclosed in U.S. patent application Ser. No. 09/958,131, incorporated herein by reference). The binary plasmid(s) is transferred into Agrobacterium and the structure of the plasmid confirmed by PCR. These strains are introduced into Arabidopsis plants as described in other examples, and gene expression patterns determined according to standard methods known to one skilled in the art for monitoring GFP fluorescence, beta-glucuronidase activity, or luminescence.

[0500] All references, publications, patent documents, web pages, and other documents cited or mentioned herein are hereby incorporated by reference in their entirety for all purposes. Although the invention has been described with reference to specific embodiments and examples, it should be understood that one of ordinary skill can make various modifications without departing from the spirit of the invention. The scope of the invention is not limited to the specific embodiments and examples provided.
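As a non-limiting illustration of the promoter selection described in Example XVIII, the following Python sketch takes the sequence immediately upstream of a translation start codon and reports any internal NotI or SfiI sites, which is the situation in which additional restriction sites would be used. The recognition sequences (NotI, GCGGCCGC; SfiI, GGCCNNNNNGGCC) are the commonly published ones. The function names, the toy genomic string and the default fragment length are assumptions made for this sketch and are not taken from the disclosure.

```python
import re

NOT_I = "GCGGCCGC"             # NotI recognition sequence
SFI_I = r"GGCC[ACGT]{5}GGCC"   # SfiI recognition sequence, GGCCNNNNNGGCC


def upstream_region(genomic: str, atg_index: int, length: int = 2000) -> str:
    """Return up to `length` bases immediately 5' of the ATG at `atg_index`."""
    if genomic[atg_index:atg_index + 3].upper() != "ATG":
        raise ValueError("atg_index does not point at an ATG start codon")
    start = max(0, atg_index - length)
    return genomic[start:atg_index].upper()


def internal_sites(promoter: str) -> dict[str, list[int]]:
    """0-based positions of NotI and SfiI sites within the promoter fragment.

    Both sites read the same on either strand, so scanning one strand is
    enough. Overlapping occurrences are not reported separately.
    """
    promoter = promoter.upper()
    return {
        "NotI": [m.start() for m in re.finditer(NOT_I, promoter)],
        "SfiI": [m.start() for m in re.finditer(SFI_I, promoter)],
    }


if __name__ == "__main__":
    # Toy sequence only; a real run would use Col-0 genomic sequence.
    toy_genome = "A" * 10 + "GCGGCCGC" + "T" * 5 + "ATGAAA"
    fragment = upstream_region(toy_genome, atg_index=23, length=15)
    print(fragment)
    print(internal_sites(fragment))
```

A fragment for which both lists are empty could be cloned with NotI and SfiI adaptors as described; otherwise a different pair of rare sites would be chosen.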

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 64

<210> SEQ ID NO 1
<211> LENGTH: 1281
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G867 Predicted polypeptide sequence is paralogous to G9, G993, G1930
<400> SEQUENCE: 1

cacaacacaa acacatttct gttttctoca ttgtttcaaa ccataaaaaa aaacacagat 60
taaatggaat cgagtagcgt tatgagagt actacaagta caggttccat ctgtgaaacc 120
ccggcgataa ctccggcgaa aaagttcgtcg gtaggtaact tatacaggat gggaagcgga 180
toaagcgttg tgttagattc agagaacggc gtagaagctg aatctaggaa gcttccgtog 240
toaaaataca aaggtgtggt gccacaacca aacggaagat ggggagctca gatttacgag 300
aaacaccago gcqtgtggct cgggacattc aacgaagaag acgaagccgc togtgcctac 360
gacgtcgcgg ttcacaggtt cogtcgccgt gacgccgtoa caaatttcaa agacgtgaag 420
atggacgaag acgaggtoga tttcttgaat totcattcga aatctgagat cgttgatatg 480
ttgaggaaac atacttataa cqaagagtta gagcagagta aacggcgtog taatggtaac 540
ggaaacatga ctaggacgtt gttaacgtcg gggttgagta atgatggtot ttctacgacg 600
gggtttagat cggcggaggc actgtttgag aaagcggtaa cgccaagcga cgttgggaag 660
ctaaaccqtt tggttatacc gaaacatcac gcagagaaac attttccgtt accgtcaagt 720
aacgtttccg tgaaaggagt gttgttgaac tttgaggacg ttaacgggaa agtgtggagg 780
ttccgttact cgtattggaa cagtagtoag agttatgttt tgactaaagg ttggagcagg 840
ttcgttaagg agaagaatct acgtgctggt gacgtggtta gtttcagtag atctaacggt 900
caggatcaac agttgtacat tdggtggaag togagatccg ggtoagattt agatgcgggt 960
cgggttttga gattgttcgg agttaacatt toaccggaga gttcaagaaa cgacgtogta 1020
ggaaacaaaa gagtgaacga tactgagatg ttatcgttgg tgtgtagcaa gaagcaacgc 1080
atctttcacg cctcgtaaca actcttcttc ttttttttttc ttttgttgtt ttaataattt 1140
ttaaaaactc cattttcgtt ttctttattt gcatcggttt ctttctttctt gtttaccaaa 1200
ggttcatgag ttgtttttgttgtattgatg aactgtaaat tttatttata ggataaattt 1260
taaaaaaaaa aaaaaaaaaa a 1281
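For orientation only, the relationship between a nucleotide entry and its predicted polypeptide can be checked by conceptual translation. The Python sketch below assumes the standard genetic code and uses only the first eighteen bases of the apparent open reading frame of SEQ ID NO: 1 (the ATG in the second row of the listing as printed). The function names are hypothetical, and the sketch does not attempt to correct characters in the listing that are not valid bases.

```python
# Standard genetic code, codons enumerated in T, C, A, G order (assumption).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base, stopping at a stop codon."""
    dna = dna.upper().replace(" ", "")
    residues = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid)
    return "".join(residues)


def as_listing(one_letter: str) -> str:
    """Render one-letter residues in the three-letter style of the listing."""
    return " ".join(ONE_TO_THREE[r] for r in one_letter)


if __name__ == "__main__":
    orf_start = "atggaat cgagtagcgt t"  # start of the ORF in SEQ ID NO: 1
    print(as_listing(translate(orf_start)))
```

The residues produced for this fragment can be compared by eye with the first six residues of SEQ ID NO: 2.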

<210> SEQ ID NO 2
<211> LENGTH: 344
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G867 polypeptide Paralogous to G9, G993, G1930
<400> SEQUENCE: 2

Met Glu Ser Ser Ser Val Asp Glu Ser Thr Thr Ser Thr Gly Ser Ile
1               5                   10                  15
Cys Glu Thr Pro Ala Ile Thr Pro Ala Lys Lys Ser Ser Val Gly Asn
            20                  25                  30
Leu Tyr Arg Met Gly Ser Gly Ser Ser Val Val Leu Asp Ser Glu Asn
        35                  40                  45
Gly Val Glu Ala Glu Ser Arg Lys Leu Pro Ser Ser Lys Tyr Lys Gly
    50                  55                  60
Val Val Pro Gln Pro Asn Gly Arg Trp Gly Ala Gln Ile Tyr Glu Lys
65                  70                  75                  80
His Gln Arg Val Trp Leu Gly Thr Phe Asn Glu Glu Asp Glu Ala Ala
                85                  90                  95
Arg Ala Tyr Asp Val Ala Val His Arg Phe Arg Arg Arg Asp Ala Val
            100                 105                 110
Thr Asn Phe Lys Asp Val Lys Met Asp Glu Asp Glu Val Asp Phe Leu
        115                 120                 125
Asn Ser His Ser Lys Ser Glu Ile Val Asp Met Leu Arg Lys His Thr
    130                 135                 140
Tyr Asn Glu Glu Leu Glu Gln Ser Lys Arg Arg Arg Asn Gly Asn Gly
145                 150                 155                 160
Asn Met Thr Arg Thr Leu Leu Thr Ser Gly Leu Ser Asn Asp Gly Val
                165                 170                 175
Ser Thr Thr Gly Phe Arg Ser Ala Glu Ala Leu Phe Glu Lys Ala Val
            180                 185                 190
Thr Pro Ser Asp Val Gly Lys Leu Asn Arg Leu Val Ile Pro Lys His
        195                 200                 205
His Ala Glu Lys His Phe Pro Leu Pro Ser Ser Asn Val Ser Val Lys
    210                 215                 220
Gly Val Leu Leu Asn Phe Glu Asp Val Asn Gly Lys Val Trp Arg Phe
225                 230                 235                 240
Arg Tyr Ser Tyr Trp Asn Ser Ser Gln Ser Tyr Val Leu Thr Lys Gly
                245                 250                 255
Trp Ser Arg Phe Val Lys Glu Lys Asn Leu Arg Ala Gly Asp Val Val
            260                 265                 270
Ser Phe Ser Arg Ser Asn Gly Gln Asp Gln Gln Leu Tyr Ile Gly Trp
        275                 280                 285
Lys Ser Arg Ser Gly Ser Asp Leu Asp Ala Gly Arg Val Leu Arg Leu
    290                 295                 300
Phe Gly Val Asn Ile Ser Pro Glu Ser Ser Arg Asn Asp Val Val Gly
305                 310                 315                 320
Asn Lys Arg Val Asn Asp Thr Glu Met Leu Ser Leu Val Cys Ser Lys
                325                 330                 335
Lys Gln Arg Ile Phe His Ala Ser
            340

<210> SEQ ID NO 3
<211> LENGTH: 1246
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G9 Predicted polypeptide sequence is paralogous to G867, G993, G1930
<400> SEQUENCE: 3

gtgtttcttc tttctgctaa aaggttataa tttttgtttc ttggtttggt gagaatcttc 60
aagaaactga aacaaagaaa atggattcta gttgcataga cqagataagt toctccactt 120
cagaatcttt ctccgccacc accgccaaga agctotctcc toctoccgog goggcgttac 180
gcctctaccg gatgggaagc ggcgggagca gcqtcgtgtt ggatcccgag aacggcctag 240
agacggagtc acgaaagcta ccatcttcaa aatacaaagg togttgttcct cagoctaacg 300
gaagatgggg agctcagatc tacgagaagc accaacgagt atggctoggg actttcaacg 360
agcaagaaga agctgctcgt toctacgaca togcagottg tagattccgt ggccgcgacg 420
ccgtcgtoaa cttcaagaac gttctggaag acggcgattt agcttttctt gaagctoact 480
caaaggccga gatcgtcgac atgttgagaa acacactta cqccgacgag cttgaacaga 540
acaataaacg gcagttgttt ctctocqtcg acgctaacgg aaaacgtaac ggatcgagta 600
ctactcaaaa cqacaaagtt ttaaagacgt gtgaagttct tttcgagaag gotgttacac 660
ctagcgacgt toggaagcta aaccqtctcg tdatacctaa acaacacgcc gagaaacact 720
ttccgttacc gtcaccgtca ccggcagtga ctaaaggagt tttgatcaac ttcgaagacg 780
ttaacggtaa agtgtggagg ttccgttact catactggaa cagtagtcaa agttacgtgt 840
tgaccaaggg atggagtcga ttcgtcaagg agaagaatct tcgagccggt gatgttgtta 900
ctttcgagag atcgaccgga ctagagcggc agttatatat toattggaaa gttcggtotg 960
gtocgagaga aaacccggtt caggtggtgg ttcggctttt cqgagttgat atctttaatg 1020
tgaccaccgt gaagccaaac gacgtcgtgg ccgtttgcgg toggaaagaga totcgagatg 1080
ttgatgatat gtttgcgtta cqgtgttcca agaagcaggc gataatcaat gctttgttgac 1140
atatttcctt ttccgatttt atgctttcgt tttttaattt ttttttttgt caagttgtgt 1200
aggttgttgat toatgctagg ttgtatttag gaaaagagat aagacc 1246

<210> SEQ ID NO 4
<211> LENGTH: 352
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G9 polypeptide Paralogous to G867, G993, G1930
<400> SEQUENCE: 4

Met Asp Ser Ser Cys Ile Asp Glu Ile Ser Ser Ser Thr Ser Glu Ser
1               5                   10                  15
Phe Ser Ala Thr Thr Ala Lys Lys Leu Ser Pro Pro Pro Ala Ala Ala
            20                  25                  30
Leu Arg Leu Tyr Arg Met Gly Ser Gly Gly Ser Ser Val Val Leu Asp
        35                  40                  45
Pro Glu Asn Gly Leu Glu Thr Glu Ser Arg Lys Leu Pro Ser Ser Lys
    50                  55                  60
Tyr Lys Gly Val Val Pro Gln Pro Asn Gly Arg Trp Gly Ala Gln Ile
65                  70                  75                  80
Tyr Glu Lys His Gln Arg Val Trp Leu Gly Thr Phe Asn Glu Gln Glu
                85                  90                  95
Glu Ala Ala Arg Ser Tyr Asp Ile Ala Ala Cys Arg Phe Arg Gly Arg
            100                 105                 110
Asp Ala Val Val Asn Phe Lys Asn Val Leu Glu Asp Gly Asp Leu Ala
        115                 120                 125
Phe Leu Glu Ala His Ser Lys Ala Glu Ile Val Asp Met Leu Arg Lys
    130                 135                 140
His Thr Tyr Ala Asp Glu Leu Glu Gln Asn Asn Lys Arg Gln Leu Phe
145                 150                 155                 160
Leu Ser Val Asp Ala Asn Gly Lys Arg Asn Gly Ser Ser Thr Thr Gln
                165                 170                 175
Asn Asp Lys Val Leu Lys Thr Cys Glu Val Leu Phe Glu Lys Ala Val
            180                 185                 190
Thr Pro Ser Asp Val Gly Lys Leu Asn Arg Leu Val Ile Pro Lys Gln
        195                 200                 205
His Ala Glu Lys His Phe Pro Leu Pro Ser Pro Ser Pro Ala Val Thr
    210                 215                 220
Lys Gly Val Leu Ile Asn Phe Glu Asp Val Asn Gly Lys Val Trp Arg
225                 230                 235                 240
Phe Arg Tyr Ser Tyr Trp Asn Ser Ser Gln Ser Tyr Val Leu Thr Lys
                245                 250                 255
Gly Trp Ser Arg Phe Val Lys Glu Lys Asn Leu Arg Ala Gly Asp Val
            260                 265                 270
Val Thr Phe Glu Arg Ser Thr Gly Leu Glu Arg Gln Leu Tyr Ile Asp
        275                 280                 285
Trp Lys Val Arg Ser Gly Pro Arg Glu Asn Pro Val Gln Val Val Val
    290                 295                 300
Arg Leu Phe Gly Val Asp Ile Phe Asn Val Thr Thr Val Lys Pro Asn
305                 310                 315                 320
Asp Val Val Ala Val Cys Gly Gly Lys Arg Ser Arg Asp Val Asp Asp
                325                 330                 335
Met Phe Ala Leu Arg Cys Ser Lys Lys Gln Ala Ile Ile Asn Ala Leu
            340                 345                 350
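The paralogy noted in the headers of SEQ ID NOs: 2 and 4 can be illustrated with a very small identity calculation. The Python sketch below is a simplified, ungapped comparison and is not the Clustal/Megalign procedure used for Table 8. It assumes the two excerpts are already in register (residues 49 to 80 of G867 and residues 52 to 83 of G9, as read from the listing), and the function names are hypothetical.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Excerpts copied from the listing (assumed to be in register, no gaps).
G867_49_80 = (
    "Gly Val Glu Ala Glu Ser Arg Lys Leu Pro Ser Ser Lys Tyr Lys Gly "
    "Val Val Pro Gln Pro Asn Gly Arg Trp Gly Ala Gln Ile Tyr Glu Lys"
)
G9_52_83 = (
    "Gly Leu Glu Thr Glu Ser Arg Lys Leu Pro Ser Ser Lys Tyr Lys Gly "
    "Val Val Pro Gln Pro Asn Gly Arg Trp Gly Ala Gln Ile Tyr Glu Lys"
)


def to_one_letter(listing_text: str) -> str:
    """Convert space-separated three-letter residues to a one-letter string."""
    return "".join(THREE_TO_ONE[token] for token in listing_text.split())


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length one-letter sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (no gaps handled)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    g867 = to_one_letter(G867_49_80)
    g9 = to_one_letter(G9_52_83)
    print(g867)
    print(g9)
    print(f"{percent_identity(g867, g9):.2f}% identical over {len(g867)} residues")
```

A full comparison of the paralogs would require a gapped alignment, since the polypeptides differ in length.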

<210> SEQ ID NO 5
<211> LENGTH: 1239
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G993 Predicted polypeptide sequence is paralogous to G867, G9, G1930
<400> SEQUENCE: 5

caaatatgga atacagotgt gtagacgaca gtagtacaac gtcagaatct ctctocatct 60
ctactactcc aaagccgaca acgacgacgg agaagaaact ctcttctccg ccggcgacgt 120
cgatgcgtct ctacagaatg ggaagcggcg gaagcagogt cqttttggat tdagagaacg 180
gcqtcgagac cqagtcacgt aagcttcctt cqtcgaaata taaaggcgtt gtgcctcago 240
ctaacggaag atggggagct cagatttacg agaagcatca gcgagtttgg ctcggtactt 300
tdaacgagga agaagaagct gcgtottctt acgacatcgc cqtgaggaga ttccgcggcc 360
gcqacgccgt cactaacttc aaatctoaag ttgatggaaa cqacgccgaa toggcttttc 420
ttgacgctca ttctaaagct gagatcgtgg atatgttgag gaaacacact tacgccgatg 480
agtttgagca gagtagacgg aagtttgtta acggcgacgg aaaacgctot gogttggaga 540
cggcgacgta cqgaaacgac gotgttttga gagcgcgtga ggttttgttc gagaagactg 600
ttacgccgag cqacgtcggg aagctgaacc gtttagtgat accgaaacaa cacgcggaga 660
agcattttcc gttaccggcg atgacgacgg cqatggggat gaatccgtot cogacgaaag 720
gcqttttgat taacttggaa gatagaacag ggaaagtgtg gcggttccgt tacagttact 780
ggaacagoag toaaagttac gtgttgacca agggctdgag ccggttcgtt aaagagaaga 840
atcttcgagc cqgtgatgtg gtttgtttcg agagatcaac cqgaccagac cqgcaattgt 900
atatcoactg gaaagttccgg totagtccgg ttcagactot gottaggcta ttcggagtca 960
acattttcaa totgagtaac gagaaaccaa acgacgtcgc agtagagtgt gttggcaaga 1020
agagatctcg ggaagatgat ttgttttcgt tagggtottc caagaagcag gogattatca 1080
acatcttgttg acaaattctt tttttttggt ttttttcttcaatttgtttc tcctttttca 1140
atattttgta ttgaaatgac aagttgtaaa ttaggacaag acaagaaaaa atgacaacta 1200
gacaaaatag tttttgttta aaaaaaaaaa aaaaaaaaa 1239

<210> SEQ ID NO 6
<211> LENGTH: 361
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G993 polypeptide Paralogous to G867, G9, G1930
<400> SEQUENCE: 6

Met Glu Tyr Ser Cys Val Asp Asp Ser Ser Thr Thr Ser Glu Ser Leu
1               5                   10                  15
Ser Ile Ser Thr Thr Pro Lys Pro Thr Thr Thr Thr Glu Lys Lys Leu
            20                  25                  30
Ser Ser Pro Pro Ala Thr Ser Met Arg Leu Tyr Arg Met Gly Ser Gly
        35                  40                  45
Gly Ser Ser Val Val Leu Asp Ser Glu Asn Gly Val Glu Thr Glu Ser
    50                  55                  60
Arg Lys Leu Pro Ser Ser Lys Tyr Lys Gly Val Val Pro Gln Pro Asn
65                  70                  75                  80
Gly Arg Trp Gly Ala Gln Ile Tyr Glu Lys His Gln Arg Val Trp Leu
                85                  90                  95
Gly Thr Phe Asn Glu Glu Glu Glu Ala Ala Ser Ser Tyr Asp Ile Ala
            100                 105                 110
Val Arg Arg Phe Arg Gly Arg Asp Ala Val Thr Asn Phe Lys Ser Gln
        115                 120                 125
Val Asp Gly Asn Asp Ala Glu Ser Ala Phe Leu Asp Ala His Ser Lys
    130                 135                 140
Ala Glu Ile Val Asp Met Leu Arg Lys His Thr Tyr Ala Asp Glu Phe
145                 150                 155                 160
Glu Gln Ser Arg Arg Lys Phe Val Asn Gly Asp Gly Lys Arg Ser Gly
                165                 170                 175
Leu Glu Thr Ala Thr Tyr Gly Asn Asp Ala Val Leu Arg Ala Arg Glu
            180                 185                 190
Val Leu Phe Glu Lys Thr Val Thr Pro Ser Asp Val Gly Lys Leu Asn
        195                 200                 205
Arg Leu Val Ile Pro Lys Gln His Ala Glu Lys His Phe Pro Leu Pro
    210                 215                 220
Ala Met Thr Thr Ala Met Gly Met Asn Pro Ser Pro Thr Lys Gly Val
225                 230                 235                 240
Leu Ile Asn Leu Glu Asp Arg Thr Gly Lys Val Trp Arg Phe Arg Tyr
                245                 250                 255
Ser Tyr Trp Asn Ser Ser Gln Ser Tyr Val Leu Thr Lys Gly Trp Ser
            260                 265                 270
Arg Phe Val Lys Glu Lys Asn Leu Arg Ala Gly Asp Val Val Cys Phe
        275                 280                 285
Glu Arg Ser Thr Gly Pro Asp Arg Gln Leu Tyr Ile His Trp Lys Val
    290                 295                 300
Arg Ser Ser Pro Val Gln Thr Val Val Arg Leu Phe Gly Val Asn Ile
305                 310                 315                 320
Phe Asn Val Ser Asn Glu Lys Pro Asn Asp Val Ala Val Glu Cys Val
                325                 330                 335
Gly Lys Lys Arg Ser Arg Glu Asp Asp Leu Phe Ser Leu Gly Cys Ser
            340                 345                 350
Lys Lys Gln Ala Ile Ile Asn Ile Leu
        355                 360

<210> SEQ ID NO 7
<211> LENGTH: 1155
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G1930 Predicted polypeptide sequence is paralogous to G867, G9, G993
<400> SEQUENCE: 7

attcacatta ctaatctotc aagatttcac aattttcttg tdattttcto tcagtttctt 60
atttcgtttc ataacatgga tgccatgagt agcgtagacg agagctotac aactacagat 120
tocattccgg cqagaaagttc atcgtctocq gogagtttac tatatagaat gggaagcgga 180
acaagcgtgg tacttgatto agagaacggt gtcgaagttcg aagttcgaagc cqaatcaaga 240
aagcttcctt cttcaagatt caaaggtgtt gttcctoaac caaatggaag atggggagct 300
cagatttacg agaaacatca acgcgtgtgg cttggtactt toaacgagga agacgaagca 360
gctcgtgctt acgacgtcgc ggctoaccgt ttccgtggcc gcgatgccgt tactaatttc 420
aaagacacga cqttcgaaga agaggttgag ttcttaaacg cqcattcgaa atcagagatc 480
gtagatatgttgagaaaaca cacttacaaa gaagagttag accaaaggaa acgtaaccqt 540
gacggtaacg gaaaagagac gacggcgttt gctttggctt cqatggtggt tatgacgggg 600
tttaaaacgg cqgagttact gtttgagaaa acggtaacgc caagtgacgt cgggaaacta 660
aaccgtttag ttataccaaa acaccaagcg gagaaacatt ttccgttacc gttaggtaat 720
aataacgtct cogttaaagg tatgctgttgaatttcgaag acgttaacgg gaaagtgtgg 780
aggttccgtt actcttattg gaatagtagt caaagttato tdttgaccaa aggttggagt 840
agattogtta aagagaagag actttgttgct ggtgatttga toagttttaa aagatccaac 900
gatcaagatc aaaaattctt tatcgggtgg aaatcgaaat cogggttgga totagagacg 960
ggtogggtta toagattgtt toggggttgat atttctttaa acgccgtcgt totagtgaag 1020
gaaacaacgg aggtgttaat gtcgtcgtta aggtgtaaga agcaacgagt tttgtaataa 1080
caatttaaca acttgggaaa gaaaaaaaag ctttttgatt ttaatttcto ttcaacgtta 1140
atcttgctga gatta 1155

<210> SEQ ID NO 8
<211> LENGTH: 333
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G1930 polypeptide Paralogous to G867, G9, G993
<400> SEQUENCE: 8

Met Asp Ala Met Ser Ser Val Asp Glu Ser Ser Thr Thr Thr Asp Ser
1               5                   10                  15
Ile Pro Ala Arg Lys Ser Ser Ser Pro Ala Ser Leu Leu Tyr Arg Met
            20                  25                  30
Gly Ser Gly Thr Ser Val Val Leu Asp Ser Glu Asn Gly Val Glu Val
        35                  40                  45
Glu Val Glu Ala Glu Ser Arg Lys Leu Pro Ser Ser Arg Phe Lys Gly
    50                  55                  60
Val Val Pro Gln Pro Asn Gly Arg Trp Gly Ala Gln Ile Tyr Glu Lys
65                  70                  75                  80
His Gln Arg Val Trp Leu Gly Thr Phe Asn Glu Glu Asp Glu Ala Ala
                85                  90                  95
Arg Ala Tyr Asp Val Ala Ala His Arg Phe Arg Gly Arg Asp Ala Val
            100                 105                 110
Thr Asn Phe Lys Asp Thr Thr Phe Glu Glu Glu Val Glu Phe Leu Asn
        115                 120                 125
Ala His Ser Lys Ser Glu Ile Val Asp Met Leu Arg Lys His Thr Tyr
    130                 135                 140
Lys Glu Glu Leu Asp Gln Arg Lys Arg Asn Arg Asp Gly Asn Gly Lys
145                 150                 155                 160
Glu Thr Thr Ala Phe Ala Leu Ala Ser Met Val Val Met Thr Gly Phe
                165                 170                 175
Lys Thr Ala Glu Leu Leu Phe Glu Lys Thr Val Thr Pro Ser Asp Val
            180                 185                 190
Gly Lys Leu Asn Arg Leu Val Ile Pro Lys His Gln Ala Glu Lys His
        195                 200                 205
Phe Pro Leu Pro Leu Gly Asn Asn Asn Val Ser Val Lys Gly Met Leu
    210                 215                 220
Leu Asn Phe Glu Asp Val Asn Gly Lys Val Trp Arg Phe Arg Tyr Ser
225                 230                 235                 240
Tyr Trp Asn Ser Ser Gln Ser Tyr Val Leu Thr Lys Gly Trp Ser Arg
                245                 250                 255
Phe Val Lys Glu Lys Arg Leu Cys Ala Gly Asp Leu Ile Ser Phe Lys
            260                 265                 270
Arg Ser Asn Asp Gln Asp Gln Lys Phe Phe Ile Gly Trp Lys Ser Lys
        275                 280                 285
Ser Gly Leu Asp Leu Glu Thr Gly Arg Val Met Arg Leu Phe Gly Val
    290                 295                 300
Asp Ile Ser Leu Asn Ala Val Val Val Val Lys Glu Thr Thr Glu Val
305                 310                 315                 320
Leu Met Ser Ser Leu Arg Cys Lys Lys Gln Arg Val Leu
                325                 330

<210> SEQ ID NO 9
<211> LENGTH: 1194
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G2687
<400> SEQUENCE: 9

ctctgtctot cqtatctttc tactactctg tttcttgaat tctaatgaac aacatcgacg 60
acgcaaagac ggagacttca gtgtcttcag gttcaagcga ctctttcttg cctctcaaga 120
aacgcatgag acttgatgac gaaccagaaa acgccctagt ggtttcgtot toaccaaaga 180
cggttgttggc ttctggcaat gtcaagtaca aaggagtcgt toagcaacag aacggtoatt 240
ggggtgccca gatttacgca gaccacaaaa gqatttggct toggaactttcaaatccgctg 300
atgaagccgc cacggcttac gatagtgcat ctatcaaact cogaagcttt gacgctaact 360
cgcaccggaa cttcccttgg totacaatca ctctdaacga accagacttt caaaattgct 420
acacaacaga gactgttgttgaacatgatca gagacggttc gtaccaacac aaattcagag 480
attttctoag aatcagatct cagattgttg cqagtatcaa catcggggga ccaaaacaag 540
cccgaggaga agtgaatcaa gaatcagaca agtgtttttc ttgcacacag ctttttcaga 600
aggaattgac accgagcqat gtagggaaac taaataggct totgatacct aaaaagtatg 660
cagtgaagta tatgcctttc ataagcgctg atcaaagcga gaaagaagag ggtoaaatag 720
taggatctgt ggaagatgttg gaggttgttgt tttacgacag agcaatgaga caatggaagt 780
ttaggtattg ttactggaaa agtagccaga gctttgtctt caccagagga tggaatagtt 840
togtgaagga gaagaatcto aaggagaagg atgttattgc cttctacact togcgatgtcc 900
cgaacaatgt gaagacatta gaaggtoaaa gaaagaactt cttgatgatc gatgttcatt 960
gcttttcaga caacggttcc gtggtagctg aggaagtaag tatgacggtt catgacagtt 1020
cagtgcaagt aaagaaaaca gaaaacttgg ttagctocat gttagaagat aaagaaacca 1080
aatcagagga gaacaaagga gggtttatgc tgtttggtgt aaggatcgaa totccttagg 1140
gaatttttct ttaaaagttt cttacttcaa ctagaacttg ttttacttgt acct 1194

<210> SEQ ID NO 10
<211> LENGTH: 363
<212> TYPE: PRT
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G2687 polypeptide
<400> SEQUENCE: 10

Met Asn Asn Ile Asp Asp Ala Lys Thr Glu Thr Ser Val Ser Ser Gly
1               5                   10                  15
Ser Ser Asp Ser Phe Leu Pro Leu Lys Lys Arg Met Arg Leu Asp Asp
            20                  25                  30
Glu Pro Glu Asn Ala Leu Val Val Ser Ser Ser Pro Lys Thr Val Val
        35                  40                  45
Ala Ser Gly Asn Val Lys Tyr Lys Gly Val Val Gln Gln Gln Asn Gly
    50                  55                  60
His Trp Gly Ala Gln Ile Tyr Ala Asp His Lys Arg Ile Trp Leu Gly
65                  70                  75                  80
Thr Phe Lys Ser Ala Asp Glu Ala Ala Thr Ala Tyr Asp Ser Ala Ser
                85                  90                  95
Ile Lys Leu Arg Ser Phe Asp Ala Asn Ser His Arg Asn Phe Pro Trp
            100                 105                 110
Ser Thr Ile Thr Leu Asn Glu Pro Asp Phe Gln Asn Cys Tyr Thr Thr
        115                 120                 125
Glu Thr Val Leu Asn Met Ile Arg Asp Gly Ser Tyr Gln His Lys Phe
    130                 135                 140
Arg Asp Phe Leu Arg Ile Arg Ser Gln Ile Val Ala Ser Ile Asn Ile
145                 150                 155                 160
Gly Gly Pro Lys Gln Ala Arg Gly Glu Val Asn Gln Glu Ser Asp Lys
                165                 170                 175
Cys Phe Ser Cys Thr Gln Leu Phe Gln Lys Glu Leu Thr Pro Ser Asp
            180                 185                 190
Val Gly Lys Leu Asn Arg Leu Val Ile Pro Lys Lys Tyr Ala Val Lys
        195                 200                 205
Tyr Met Pro Phe Ile Ser Ala Asp Gln Ser Glu Lys Glu Glu Gly Glu
    210                 215                 220
Ile Val Gly Ser Val Glu Asp Val Glu Val Val Phe Tyr Asp Arg Ala
225                 230                 235                 240
Met Arg Gln Trp Lys Phe Arg Tyr Cys Tyr Trp Lys Ser Ser Gln Ser
                245                 250                 255
Phe Val Phe Thr Arg Gly Trp Asn Ser Phe Val Lys Glu Lys Asn Leu
            260                 265                 270
Lys Glu Lys Asp Val Ile Ala Phe Tyr Thr Cys Asp Val Pro Asn Asn
        275                 280                 285
Val Lys Thr Leu Glu Gly Gln Arg Lys Asn Phe Leu Met Ile Asp Val
    290                 295                 300
His Cys Phe Ser Asp Asn Gly Ser Val Val Ala Glu Glu Val Ser Met
305                 310                 315                 320
Thr Val His Asp Ser Ser Val Gln Val Lys Lys Thr Glu Asn Leu Val
                325                 330                 335
Ser Ser Met Leu Glu Asp Lys Glu Thr Lys Ser Glu Glu Asn Lys Gly
            340                 345                 350
Gly Phe Met Leu Phe Gly Val Arg Ile Glu Cys
        355                 360

<210> SEQ ID NO 11
<211> LENGTH: 1216
<212> TYPE: DNA
<213> ORGANISM: Arabidopsis thaliana
<220> FEATURE:
<223> OTHER INFORMATION: G1957
<400> SEQUENCE: 11

caagaaccat ctcgtaaatc aagatttcto caaggaaaat cagataagtc ataatggatc 60
tatccctggc tccgacaaca acaacaagtt cogaccaaga acaagacaga gaccaagaat 120
taacctccaa catcggagca agcagoagct coggtoccag cqgaaacaac aacaaccttc 180
cgatgatgat gattccacct coggagaaag aacacatgtt cqacaaagtg gtaacaccaa 240
gcqacgtogg aaaactcaac agactcgtga toccaaaca acacgctgag aggtatttcc 300
ctctagactc ctcaaacaac caaaacggca cqcttttgaa cttccaagac agaaacggca 360
agatgtggag attccgttac togtattgga actctagoca gagctacgtt atgaccaaag 420
gatggagccg tttcgtcaaa gagaaaaagc tcgatgcagg agacattgtc tctttccaac 480
gaggcatcgg agatgagtca gaaagatcca aactttacat agattggagg catagacccg 540
acatgagcct cqttcaagca catcagtttg gtaattttgg tttcaatttc aatttcccga 600
ccacttctoa atattocaac agatttcatc cattgccaga atataactoc gtoccgattc 660
accggggctt aaacatcgga aatcaccaac gttcctatta taacacccag cqtcaagagt 720
togtagggta togttatggg aatttagotg gaaggtgtta ctacacggga toaccgttgg 780
atcataggaa cattgttgga toagagccgt tdgttataga ctcagtccct gtggttcccg 840
ggagattaac tocggtgatg ttaccgccgc ttcctcogcc tocttctacg gcgggaaaga 900
gactaaggct ctttggggtg aatatggaat gtggcaatga ctataatcaa caagaagagt 960
catggttggt gccacgtggc gaaattggtg catcttcttc ttcttcttca gctctacgac 1020
taaatttatc gactgatcat gatgatgata atgatgatgg tatgatggc gatgatgatc 1080
aatttgctaa gaaagggaag tottcacttt ctctdaattt caatccatga gaagtttcat 1140
catcttcttg ttttgaatct ctctttatat tgtttccatt agtaattttt actaagggta 1200
ttagattota gctagt 1216

<210> SEQ ID NO 12 <211> LENGTH: 358 <212> TYPE: PRT <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G1957 polypeptide <400> SEQUENCE: 12 Met Asp Leu Ser Leu Ala Pro Thr Thr Thr Thr Ser Ser Asp Gln Glu 1 5 10 15 Gln Asp Arg Asp Gln Glu Leu Thr Ser Asn Ile Gly Ala Ser Ser Ser 20 25 30 Ser Gly Pro Ser Gly Asn Asn Asn Asn Leu Pro Met Met Met Ile Pro 35 40 45 Pro Pro Glu Lys Glu His Met Phe Asp Lys Val Val Thr Pro Ser Asp 50 55 60 Val Gly Lys Leu Asn Arg Leu Val Ile Pro Lys Gln His Ala Glu Arg 65 70 75 80 Tyr Phe Pro Leu Asp Ser Ser Asn Asn Gln Asn Gly Thr Leu Leu Asn 85 90 95 Phe Gln Asp Arg Asn Gly Lys Met Trp Arg Phe Arg Tyr Ser Tyr Trp 100 105 110 Asn Ser Ser Gln Ser Tyr Val Met Thr Lys Gly Trp Ser Arg Phe Val 115 120 125 Lys Glu Lys Lys Leu Asp Ala Gly Asp Ile Val Ser Phe Gln Arg Gly 130 135 140 Ile Gly Asp Glu Ser Glu Arg Ser Lys Leu Tyr Ile Asp Trp Arg His 145 150 155 160 Arg Pro Asp Met Ser Leu Val Gln Ala His Gln Phe Gly Asn Phe Gly 165 170 175 Phe Asn Phe Asn Phe Pro Thr Thr Ser Gln Tyr Ser Asn Arg Phe His 180 185 190 Pro Leu Pro Glu Tyr Asn Ser Val Pro Ile His Arg Gly Leu Asn Ile 195 200 205 Gly Asn His Gln Arg Ser Tyr Tyr Asn Thr Gln Arg Gln Glu Phe Val 210 215 220 Gly Tyr Gly Tyr Gly Asn Leu Ala Gly Arg Cys Tyr Tyr Thr Gly Ser 225 230 235 240 Pro Leu Asp His Arg Asn Ile Val Gly Ser Glu Pro Leu Val Ile Asp 245 250 255 Ser Val Pro Val Val Pro Gly Arg Leu Thr Pro Val Met Leu Pro Pro

260 265 270 Leu Pro Pro Pro Pro Ser Thr Ala Gly Lys Arg Leu Arg Leu Phe Gly 275 280 285 Val Asn Met Glu Cys Gly Asn Asp Tyr Asn Gln Gln Glu Glu Ser Trp 290 295 300 Leu Val Pro Arg Gly Glu Ile Gly Ala Ser Ser Ser Ser Ser Ser Ala 305 310 315 320 Leu Arg Leu Asn Leu Ser Thr Asp His Asp Asp Asp Asn Asp Asp Gly 325 330 335 Asp Asp Gly Asp Asp Asp Gln Phe Ala Lys Lys Gly Lys Ser Ser Leu 340 345 350

Ser Leu Asn Phe Asn Pro 355

<210> SEQ ID NO 13 <211> LENGTH: 1368 <212> TYPE: DNA <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G1010

<400 SEQUENCE: 13 attcttctitc taaaaaatct togacaactitt ttgttitttgt tittctittcto tdaattittitt 60 aaaagagaga gagctatota gctatoaaac agtaagagat atagatatag agagacagag 120 aaagatgatg atcagtgaag titaggctaaa cccactittct atttatgtat aattaggtoa 18O atcacatcac caatctocto citccaattct cotccitctoc titccaaattic tagggittittg 240 cittgitatcto accoccittitc. tcaatticcct agg gaaactg tdaattitcat caaattic cat 3OO tattttittgg to acacccitt aaagagatct gagagttcta aagatgatga cagattitatic 360 totcac gaga gatgaagatg aagaagaagc aaag.cccitta gcagaagaag aaggagc.gc.g 420 tgaagtag ca gacagaga.gc acatgttcga caaagttgtg acticcaagtg atgtcggaaa 480 actaaaccga cittgttgatcc caaagcaa.ca cqcagagaga ttctt.ccctt tag attcatc 540 ttcaaacgag aaaggtttgc titttaaactt cqaagat citc actggcaaat cittggaggitt 600 cc.gttactict tactggaaca gtagt caaag citatgtcato actaaaggitt gag cagatt 660 cgittaaagac aaaaagcttg acgc.cggaga tattgtc.tct titccaaagat gtgtcggaga 720 ttcaggaaga gatago cqtt totttattga ttggaggaga agacctaaag tocct gacca 78O toctoattitc gcc.gc.cggag citatgttccc taggittttac agcttitcctt cqaccaatta 840 cagt ctittat aatcatcago agcaacgtca toatcacagt ggtggtggitt ataattatca 9 OO toaaattic.cg agagaatttg gttatggitta citt.cgittagg toagtggat.c agaggaacaa 96.O to cit gciggct gcggtggctd atcc.gttggit gattgaatct gtgcc.ggtga tigatgcacgg 1020 gaga.gctaat caggaacttg ttggaacggc cqggaagaga citgaggcttt ttggagttga 1080 tatggaatgc ggc gaga.gc.g. gaatgaccala cagtacggag gaggaatcat catctitc.cgg 1140 tggaagtttg ccacgtggag goggtggtgg togctt catct tcc tott tot titcagct gag 1200 acttggaagc agcagtgaag atgat cactt cactaagaaa gaaagttctt cattgtctitt 1260 tgatttggat caataataat gatgatgatgaaattagttg gtattittaag aaaaaaaa.ca 1320 tacatatata attctatata tatgacaaca taatgcattg attitccitt 1368 US 2004/0098764 A1 May 20, 2004 59


<210> SEQ ID NO 14 <211> LENGTH: 310 <212> TYPE: PRT <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G1010 polypeptide <400> SEQUENCE: 14 Met Met Thr Asp Leu Ser Leu Thr Arg Asp Glu Asp Glu Glu Glu Ala 1 5 10 15 Lys Pro Leu Ala Glu Glu Glu Gly Ala Arg Glu Val Ala Asp Arg Glu 20 25 30 His Met Phe Asp Lys Val Val Thr Pro Ser Asp Val Gly Lys Leu Asn 35 40 45 Arg Leu Val Ile Pro Lys Gln His Ala Glu Arg Phe Phe Pro Leu Asp 50 55 60 Ser Ser Ser Asn Glu Lys Gly Leu Leu Leu Asn Phe Glu Asp Leu Thr 65 70 75 80 Gly Lys Ser Trp Arg Phe Arg Tyr Ser Tyr Trp Asn Ser Ser Gln Ser 85 90 95 Tyr Val Met Thr Lys Gly Trp Ser Arg Phe Val Lys Asp Lys Lys Leu 100 105 110 Asp Ala Gly Asp Ile Val Ser Phe Gln Arg Cys Val Gly Asp Ser Gly 115 120 125 Arg Asp Ser Arg Leu Phe Ile Asp Trp Arg Arg Arg Pro Lys Val Pro 130 135 140 Asp His Pro His Phe Ala Ala Gly Ala Met Phe Pro Arg Phe Tyr Ser 145 150 155 160 Phe Pro Ser Thr Asn Tyr Ser Leu Tyr Asn His Gln Gln Gln Arg His 165 170 175 His His Ser Gly Gly Gly Tyr Asn Tyr His Gln Ile Pro Arg Glu Phe 180 185 190 Gly Tyr Gly Tyr Phe Val Arg Ser Val Asp Gln Arg Asn Asn Pro Ala 195 200 205 Ala Ala Val Ala Asp Pro Leu Val Ile Glu Ser Val Pro Val Met Met 210 215 220 His Gly Arg Ala Asn Gln Glu Leu Val Gly Thr Ala Gly Lys Arg Leu 225 230 235 240 Arg Leu Phe Gly Val Asp Met Glu Cys Gly Glu Ser Gly Met Thr Asn 245 250 255 Ser Thr Glu Glu Glu Ser Ser Ser Ser Gly Gly Ser Leu Pro Arg Gly 260 265 270 Gly Gly Gly Gly Ala Ser Ser Ser Ser Phe Phe Gln Leu Arg Leu Gly 275 280 285 Ser Ser Ser Glu Asp Asp His Phe Thr Lys Lys Gly Lys Ser Ser Leu 290 295 300 Ser Phe Asp Leu Asp Gln 305 310

<210> SEQ ID NO 15 <211> LENGTH: 1065 <212> TYPE: DNA <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G2690

<400 SEQUENCE: 15 atggatatgg acgagatgag caatgtagcc aagacaacga cagagacitt.c aggctta act 60 gacitctgtct toga.gc.citcac gaaacgcatgaaaccitact g aggttacgac caccacaaaa 120 cctgccttgt coaacacgac gaaattcaaa gqagttgttc agcaa.ca.gaa cqg to attgg 18O ggtgcticaga tttacgcaga ccatcgaagg atttggcttg gaactittcaa atcc.gct cat 240 gaagcc.gctg. citgcttacga tag.cgcatcg attaa.gctitc gaagctttga tigcta actog 3OO caccggaact tcc cittgg to tdattittacc citc catgaac cqgactittca agagtgctac 360 acgacagaag citgtgttgaa catgatcaga gacggttctt atcaiacacaa gttcagagat 420 tittcto agaa toc ggtotca gattgttgcg aatat caa.ca togtoggat.c aaaacaagtic 480 ttaggaggag gagaaggtgg toaagaatcg aacaagttgtt totcgtgcac goagctttitt 540 cagaaagaac togacaccgag cqatgtaggg aaactgaata ggcttgttgat acctaagaag 600 tatgcagtga agtatatgcc titt cataagc gatgatcaaa go.gagaaaga gacgagtgaa 660 ggagtagaag atgtggaggit totcittttac gacagagcaa toaga caatig gaagtttagg 720 tattgttact g gagaagtag ccagagctitt gttcttcacca gaggatggala togtttcgtg 78O aaggagaaga atctoaagga gaaagatatt attgtc.ttitt acacttgcga tigtc.cccaac 840 aatgttgaaga cattagaagg ccaaagcaag accittcttga tigattgatgttcatcactitt 9 OO toaggcaacg gttitcgtggit toccgaggaa gtaaacaaga cggttcatga gatttctgat 96.O gaagagatga aaacagaaac cct citttacc to galaggtag aagaagaaac caaatcagag 1020 gagaagaaag gagggitttat gctgtttggt gttaggatcc aatag 1065

<210> SEQ ID NO 16 <211> LENGTH: 354 <212> TYPE: PRT <213> ORGANISM: Arabidopsis thaliana <220> FEATURE: <223> OTHER INFORMATION: G2690 polypeptide <400> SEQUENCE: 16 Met Asp Met Asp Glu Met Ser Asn Val Ala Lys Thr Thr Thr Glu Thr 1 5 10 15 Ser Gly Leu Thr Asp Ser Val Leu Ser Leu Thr Lys Arg Met Lys Pro 20 25 30 Thr Glu Val Thr Thr Thr Thr Lys Pro Ala Leu Ser Asn Thr Thr Lys 35 40 45 Phe Lys Gly Val Val Gln Gln Gln Asn Gly His Trp Gly Ala Gln Ile 50 55 60 Tyr Ala Asp His Arg Arg Ile Trp Leu Gly Thr Phe Lys Ser Ala His 65 70 75 80 Glu Ala Ala Ala Ala Tyr Asp Ser Ala Ser Ile Lys Leu Arg Ser Phe 85 90 95 Asp Ala Asn Ser His Arg Asn Phe Pro Trp Ser Asp Phe Thr Leu His 100 105 110 Glu Pro Asp Phe Gln Glu Cys Tyr Thr Thr Glu Ala Val Leu Asn Met 115 120 125 Ile Arg Asp Gly Ser Tyr Gln His Lys Phe Arg Asp Phe Leu Arg Ile 130 135 140

Arg Ser Gln Ile Val Ala Asn Ile Asn Ile Val Gly Ser Lys Gln Val 145 150 155 160 Leu Gly Gly Gly Glu Gly Gly Gln Glu Ser Asn Lys Cys Phe Ser Cys 165 170 175 Thr Gln Leu Phe Gln Lys Glu Leu Thr Pro Ser Asp Val Gly Lys Leu 180 185 190 Asn Arg Leu Val Ile Pro Lys Lys Tyr Ala Val Lys Tyr Met Pro Phe 195 200 205 Ile Ser Asp Asp Gln Ser Glu Lys Glu Thr Ser Glu Gly Val Glu Asp 210 215 220 Val Glu Val Val Phe Tyr Asp Arg Ala Met Arg Gln Trp Lys Phe Arg 225 230 235 240 Tyr Cys Tyr Trp Arg Ser Ser Gln Ser Phe Val Phe Thr Arg Gly Trp 245 250 255 Asn Gly Phe Val Lys Glu Lys Asn Leu Lys Glu Lys Asp Ile Ile Val 260 265 270 Phe Tyr Thr Cys Asp Val Pro Asn Asn Val Lys Thr Leu Glu Gly Gln 275 280 285 Ser Lys Thr Phe Leu Met Ile Asp Val His His Phe Ser Gly Asn Gly 290 295 300 Phe Val Val Pro Glu Glu Val Asn Lys Thr Val His Glu Ile Ser Asp 305 310 315 320 Glu Glu Met Lys Thr Glu Thr Leu Phe Thr Ser Lys Val Glu Glu Glu 325 330 335 Thr Lys Ser Glu Glu Lys Lys Gly Gly Phe Met Leu Phe Gly Val Arg 340 345 350

Ile Gln

<210 SEQ ID NO 17 &2 11s LENGTH 1253 &212> TYPE DNA <213> ORGANISM: Glycine max &220s FEATURE <223> OTHER INFORMATION: G3451 GLYMA-28NOWO1-CLUSTER19062. 3 Predicted polypeptide sequence is orthologous to G867, G9, G993, G1930 <400 SEQUENCE: 17 citagaatc.cg tacaatctaa toaacataac aaaaatggat gcaattagtt gcatggatga 60 gag caccacc actgagtcac totctataag totttct cog acgtocatcgt c ggagaaag.c 120 gaagccttct tcgatgatta catcgtogga galaggtttct citgtc.ccc.gc cqc.cgtoaaa 18O Cagacitatgc cqtgttggaa gcggcgc.gag cqCagtcgtg gatcCtgat g g cqgcgg cag 240 cggcgctgag gtagagtc.gc ggaaactc.cc citcgtcgaag tacaagggcg tdgtgcc.cca 3OO gcc.caacggc cqctggggtg cqcagattta cq agaag cac cagogcgtgt ggcttggaac 360 gttcaacgag galaga.cgagg C gg.cgcgtgc gtacgaCatC gcc.gc.gcagc ggttcc.gcgg 420 caaggacgcc gtcacgaact tcaag.ccgct c gocggcgcc gacgacgacg acggagaatc 480 ggagtttcto aactc.gcatt coaaacco ga gatcgtogac atgctg.cgaa agcacacgta 540 Caatgacgag Ctggag Caga gcaa.gc.gcag cc.gcggCgtc gtc.cggcggc gaggctcc.gc 600 cgcc.gc.cggc accgcaaact caattitcc.gg cqcgtgctitt actalagg cac gtgag cagot 660 attc gagaag gCtgttacgc cqagcigacgt toggaaattg aaccqtttgg toataccgaa 720 US 2004/0098764 A1 May 20, 2004

-continued gcagdacgcg gagaag cact titc.cgttaca gagctictaac gg.cgittago g c gacgacgat 78O agcggCggtg acggcgacgC C gacggcggC galagg gC gtt ttgttgaact tcgalagacgt 840 tggagggaaa gtgtgg.cggit titcgttactic gtattggaac agtag coaga gttacgt.citt 9 OO alaccalaaggit toggagc.cggit togttaagga gaagaatctgaaagctggtg acacggtttg 96.O ttitt caccgg to cactggac cqgacaag ca gctttacatc gattggaaga cqaggaatgt 1020 tgttaacaac gaggtogcgt tottcggacc ggtogg accq gttgtcgaac cqatccagat 1080 ggttcggctc tittggggitta acattttgaa act accoggit toagatact a ttgttggcaa. 1140 taacaataat gcaagtgggit gctgcaatgg caa.gagaaga gaaatggaac tottcto gtt 1200 agagtgtagc aagaaaccita agattattgg to citttgtaa cqttacgitta ggc 1253

<210> SEQ ID NO 18 <211> LENGTH: 401 <212> TYPE: PRT <213> ORGANISM: Glycine max <220> FEATURE: <223> OTHER INFORMATION: G3451 polypeptide GLYMA-28NOV01-CLUSTER 19062 3 Orthologous to G867, G9, G993, G1930 <400> SEQUENCE: 18 Met Asp Ala Ile Ser Cys Met Asp Glu Ser Thr Thr Thr Glu Ser Leu 1 5 10 15 Ser Ile Ser Leu Ser Pro Thr Ser Ser Ser Glu Lys Ala Lys Pro Ser 20 25 30 Ser Met Ile Thr Ser Ser Glu Lys Val Ser Leu Ser Pro Pro Pro Ser 35 40 45 Asn Arg Leu Cys Arg Val Gly Ser Gly Ala Ser Ala Val Val Asp Pro 50 55 60 Asp Gly Gly Gly Ser Gly Ala Glu Val Glu Ser Arg Lys Leu Pro Ser 65 70 75 80 Ser Lys Tyr Lys Gly Val Val Pro Gln Pro Asn Gly Arg Trp Gly Ala 85 90 95 Gln Ile Tyr Glu Lys His Gln Arg Val Trp Leu Gly Thr Phe Asn Glu 100 105 110 Glu Asp Glu Ala Ala Arg Ala Tyr Asp Ile Ala Ala Gln Arg Phe Arg 115 120 125 Gly Lys Asp Ala Val Thr Asn Phe Lys Pro Leu Ala Gly Ala Asp Asp 130 135 140 Asp Asp Gly Glu Ser Glu Phe Leu Asn Ser His Ser Lys Pro Glu Ile 145 150 155 160 Val Asp Met Leu Arg Lys His Thr Tyr Asn Asp Glu Leu Glu Gln Ser 165 170 175 Lys Arg Ser Arg Gly Val Val Arg Arg Arg Gly Ser Ala Ala Ala Gly 180 185 190 Thr Ala Asn Ser Ile Ser Gly Ala Cys Phe Thr Lys Ala Arg Glu Gln 195 200 205 Leu Phe Glu Lys Ala Val Thr Pro Ser Asp Val Gly Lys Leu Asn Arg 210 215 220 Leu Val Ile Pro Lys Gln His Ala Glu Lys His Phe Pro Leu Gln Ser 225 230 235 240 Ser Asn Gly Val Ser Ala Thr Thr Ile Ala Ala Val Thr Ala Thr Pro


cggcgaccac cacctgcgcc ggcgacgaga ggccaaccac aaccaca 1067

<210> SEQ ID NO 20 <211> LENGTH: 315 <212> TYPE: PRT <213> ORGANISM: Oryza sativa <220> FEATURE: <223> OTHER INFORMATION: ORYSA-22JAN02-CLUSTER46.187 2 polypeptide Orthologous to G867, G9, G993, G1930 <400> SEQUENCE: 20 Met Gly Val Val Ser Phe Ser Ser Thr Ser Ser Gly Ala Ser Thr Ala 1 5 10 15 Thr Thr Glu Ser Gly Gly Ala Val Arg Met Ser Pro Glu Pro Val Val 20 25 30 Ala Val Ala Ala Ala Ala Gln Gln Leu Pro Val Val Lys Gly Val Asp 35 40 45 Ser Ala Asp Glu Val Val Thr Ser Lys Pro Ala Ala Ala Ala Val Ala 50 55 60 Gln Gln Ser Ser Arg Tyr Lys Gly Val Val Pro Gln Pro Asn Gly Arg 65 70 75 80 Trp Gly Ala Gln Ile Tyr Glu Arg His Ala Arg Val Trp Leu Gly Thr 85 90 95 Phe Pro Asp Glu Glu Ala Ala Ala Arg Ala Tyr Asp Val Ala Ala Leu 100 105 110 Arg Tyr Arg Gly Arg Asp Ala Ala Thr Asn Phe Pro Gly Ala Ala Ala 115 120 125 Ser Ala Ala Glu Leu Ala Phe Leu Ala Ala His Ser Lys Ala Glu Ile 130 135 140 Val Asp Met Leu Arg Lys His Thr Tyr Ala Asp Glu Leu Arg Gln Gly 145 150 155 160 Leu Arg Arg Gly Arg Gly Met Gly Ala Arg Ala Gln Pro Thr Pro Ser 165 170 175 Trp Ala Arg Glu Pro Leu Phe Glu Lys Ala Val Thr Pro Ser Asp Val 180 185 190 Gly Lys Leu Asn Arg Leu Val Val Pro Lys Gln His Ala Glu Lys His 195 200 205 Phe Pro Leu Arg Arg Ala Ala Ser Ser Asp Ser Ala Ser Ala Ala Ala 210 215 220 Thr Gly Lys Gly Val Leu Leu Asn Phe Glu Asp Gly Glu Gly Lys Val 225 230 235 240 Trp Arg Phe Arg Tyr Ser Tyr Trp Asn Ser Ser Gln Ser Tyr Val Leu 245 250 255 Thr Lys Gly Trp Ser Arg Phe Val Arg Glu Lys Gly Leu Arg Ala Gly 260 265 270 Asp Thr Ile Val Phe Ser Pro Leu Gly Val Arg Pro Arg Gln Ala Ala 275 280 285 Leu His Arg Leu Gln Glu Glu Gln Arg Gly Gly Gly Gly Asp His His 290 295 300 Leu Arg Arg Arg Arg Glu Ala Asn His Asn His 305 310 315

<210> SEQ ID NO 21 <211> LENGTH: 1081

100 105 110 Ser Asp Asp Ala Glu Ser Glu Phe Leu Asn Ser His Ser Lys Phe Glu 115 120 125 Ile Val Asp Met Leu Arg Lys His Thr Tyr Asp Asp Glu Leu Gln Gln 130 135 140 Ser Thr Arg Gly Gly Arg Arg Arg Leu Asp Ala Asp Thr Ala Ser Ser 145 150 155 160 Gly Val Phe Asp Ala Lys Ala Arg Glu Gln Leu Phe Glu Lys Thr Val 165 170 175 Thr Pro Ser Asp Val Gly Lys Leu Asn Arg Leu Val Ile Pro Lys Gln 180 185 190 His Ala Glu Lys His Phe Pro Leu Ser Gly Ser Gly Asp Glu Ser Ser 195 200 205 Pro Cys Val Ala Gly Ala Ser Ala Ala Lys Gly Met Leu Leu Asn Phe 210 215 220 Glu Asp Val Gly Gly Lys Val Trp Arg Phe Arg Tyr Ser Tyr Trp Asn 225 230 235 240 Ser Ser Gln Ser Tyr Val Leu Thr Lys Gly Trp Ser Arg Phe Val Lys 245 250 255 Glu Lys Asn Leu Arg Ala Gly Asp Ala Val Gln Phe Phe Lys Ser Thr 260 265 270 Gly Pro Asp Arg Gln Leu Tyr Ile Asp Cys Lys Ala Arg Ser Gly Glu 275 280 285 Val Asn Asn Asn Ala Gly Gly Leu Phe Val Pro Ile Gly Pro Val Val 290 295 300 Glu Pro Val Gln Met Val Arg Leu Phe Gly Val Asn Leu Leu Lys Leu 305 310 315 320 Pro Val Pro Gly Ser Asp Gly Val Gly Lys Arg Lys Glu Met Glu Leu 325 330 335 Phe Ala Phe Glu Cys Cys Lys Lys Leu Lys Val Ile Gly Ala Leu 340 345 350

<210> SEQ ID NO 23 &2 11s LENGTH 1089 &212> TYPE DNA <213> ORGANISM: Glycine max &220s FEATURE <223> OTHER INFORMATION: G3453 Predicted polypeptide sequence is orthologous to G867, G9, G993, G1930 <400 SEQUENCE: 23 atggatggag gCagtgtcac agacgaalacc accacalacca gcaactcitct titcggttcc.g 60 gcqaatctat citcc.gc.cgcc totcagoctt gacggaag.cg gcgcaa.ccgc cqtcgtotac 120 ccc.gacggitt gttgcgtc.tc. c gg.cgaagcc gaatc.ccgga aac toccgtc. citcgaaatac 18O aaaggcgtgg togcc.gcaa.cc gaacggtogt togggagcto agatttacga galagcaccag 240 cgc.gtgtggc ticggcaccitt caacgaggaa gacgaagcc.g. tcagagccta cqacatcgtc 3OO gcqcatc.gct tcc.gcggcc.g. c gacgc.cgito actaact tca agcct citc.gc cqgcgc.cgac 360 gacgc.cgaag cc.gagttcct cagdacgcat to caagtcc.g. agatcgt.cga catgctocq c 420 agg cacacct acgacaacga gct coagcag agcaccc.gcg gcggcaggcg cc.gc.cgggac 480 gcc.gaalaccg C gtcgagcgg C gC gttcgac gcgaaggcgc gtgag cagot ggtc.gagaaa 540 US 2004/0098764 A1 May 20, 2004

-continued accgttacgc cqagcgacgt. c gggaagctgaac cq attag togataccalaa goagcacgc.g 600 gagaag cact titc.cgittaag cqgatcc.ggc gg.cggagcct togcc.gtgcat gg.cgg.cggct 660 gcggggg.cga aaggaatgtt gctgaactitt gaggacgttg gagggaaagt gtggcggttc 720 cgttacitcgt attggaacag tagccagagc tacgtgctta ccaaaggatg gag coggttc 78O gttaaggaga agaatctt.cg agctggtgac goggttcagt tottcaagtic gaccggactg 840 gaccggcaac tatatataga citgcaaggcg aggagtggta aggttaacaa taatgct gcc 9 OO ggitttgttta titc.cggttgg accggttgtt gagcc.ggttc agatggtacg gctttitcggg 96.O gtogac ctitt togaaactacc cqtaccc.ggit toggatggta ttggggttgg citgttgacggg 1020 aagagaaaag agatggagct gtttgcattt gaatgtagca agaagttaaa agtaattgga 1080 gctttgtaa 1089

<210> SEQ ID NO 24 <211> LENGTH: 362 <212> TYPE: PRT <213> ORGANISM: Glycine max <220> FEATURE: <223> OTHER INFORMATION: G3453 polypeptide Orthologous to G867, G9, G993, G1930 <400> SEQUENCE: 24 Met Asp Gly Gly Ser Val Thr Asp Glu Thr Thr Thr Thr Ser Asn Ser 1 5 10 15 Leu Ser Val Pro Ala Asn Leu Ser Pro Pro Pro Leu Ser Leu Asp Gly 20 25 30 Ser Gly Ala Thr Ala Val Val Tyr Pro Asp Gly Cys Cys Val Ser Gly 35 40 45 Glu Ala Glu Ser Arg Lys Leu Pro Ser Ser Lys Tyr Lys Gly Val Val 50 55 60 Pro Gln Pro Asn Gly Arg Trp Gly Ala Gln Ile Tyr Glu Lys His Gln 65 70 75 80 Arg Val Trp Leu Gly Thr Phe Asn Glu Glu Asp Glu Ala Val Arg Ala 85 90 95 Tyr Asp Ile Val Ala His Arg Phe Arg Gly Arg Asp Ala Val Thr Asn 100 105 110 Phe Lys Pro Leu Ala Gly Ala Asp Asp Ala Glu Ala Glu Phe Leu Ser 115 120 125 Thr His Ser Lys Ser Glu Ile Val Asp Met Leu Arg Arg His Thr Tyr 130 135 140 Asp Asn Glu Leu Gln Gln Ser Thr Arg Gly Gly Arg Arg Arg Arg Asp 145 150 155 160 Ala Glu Thr Ala Ser Ser Gly Ala Phe Asp Ala Lys Ala Arg Glu Gln 165 170 175 Leu Val Glu Lys Thr Val Thr Pro Ser Asp Val Gly Lys Leu Asn Arg 180 185 190 Leu Val Ile Pro Lys Gln His Ala Glu Lys His Phe Pro Leu Ser Gly 195 200 205 Ser Gly Gly Gly Ala Leu Pro Cys Met Ala Ala Ala Ala Gly Ala Lys 210 215 220 Gly Met Leu Leu Asn Phe Glu Asp Val Gly Gly Lys Val Trp Arg Phe 225 230 235 240


<210> SEQ ID NO 26 <211> LENGTH: 395 <212> TYPE: PRT <213> ORGANISM: Glycine max <220> FEATURE: <223> OTHER INFORMATION: G3454 polypeptide Orthologous to G867, G9, G993, G1930 <400> SEQUENCE: 26 Met Asp Ala Ile Ser Cys Met Asp Glu Ser Thr Thr Thr Glu Ser Leu 1 5 10 15 Ser Ile Ser Gln Ala Lys Pro Ser Ser Thr Ile Met Ser Ser Glu Lys 20 25 30 Ala Ser Pro Ser Pro Pro Pro Pro Asn Arg Leu Cys Arg Val Gly Ser 35 40 45 Gly Ala Ser Ala Val Val Asp Ser Asp Gly Gly Gly Gly Gly Gly Ser 50 55 60 Thr Glu Val Glu Ser Arg Lys Leu Pro Ser Ser Lys Tyr Lys Gly Val 65 70 75 80 Val Pro Gln Pro Asn Gly Arg Trp Gly Ser Gln Ile Tyr Glu Lys His 85 90 95 Gln Arg Val Trp Leu Gly Thr Phe Asn Glu Glu Asp Glu Ala Ala Arg 100 105 110 Ala Tyr Asp Val Ala Val Gln Arg Phe Arg Gly Lys Asp Ser Val Thr 115 120 125 Asn Phe Lys Pro Leu Ala Gly Ala Asp Asp Asp Asp Gly Glu Ser Glu 130 135 140 Phe Leu Asn Ser His Ser Lys Pro Glu Ile Val Asp Met Leu Arg Lys 145 150 155 160 His Thr Tyr Asn Asp Glu Leu Glu His Ser Lys Arg Asn Arg Gly Val 165 170 175 Val Arg Arg Arg Gly Ser Ala Ala Val Gly Thr Ala Asp Ser Ile Ser 180 185 190 Gly Ala Cys Phe Thr Asn Ala Arg Glu Gln Leu Phe Glu Lys Ala Val 195 200 205 Thr Pro Ser Asp Val Trp Lys Leu Asn Arg Leu Val Ile Pro Lys Gln 210 215 220 His Ala Glu Lys His Phe Pro Leu Gln Ser Ser Asn Gly Val Ser Ala 225 230 235 240 Thr Thr Ile Ala Ala Val Thr Ala Thr Pro Thr Ala Ala Lys Gly Val 245 250 255 Leu Leu Asn Phe Glu Asp Val Gly Gly Lys Val Trp Arg Phe Arg Tyr 260 265 270 Ser Tyr Trp Asn Ser Ser Gln Ser Tyr Val Leu Thr Lys Gly Trp Ser 275 280 285 Arg Phe Val Lys Glu Lys Asn Leu Lys Ala Gly Asp Thr Val Cys Phe 290 295 300 His Arg Ser Thr Gly Pro Asp Lys Gln Leu Tyr Ile Asp Trp Lys Thr 305 310 315 320 Arg Asn Val Val Asn Asn Glu Val Ala Leu Phe Gly Pro Val Gly Pro 325 330 335 Val Val Glu Pro Ile Gln Met Val Arg Leu Phe Gly Val Asn Ile Leu 340 345 350


-continued aag cactitcc cqctcc.gc.cg cqc gg.cgagc ticcgacitcc.g. cct cogcc.gc cqccaccggc 78O aagggcgtgc ticcitcaactt cqaggacggc gaggggaagg totgg.cgatt cogg tactcg 840 tactggaaca gcago cagag citacgtgctg accaaggggt ggagc.cgatt cqtgagggag 9 OO aagggcctico go.gc.cggcga caccatagtc ttctoccgct c gg.cgtacgg ccc.cgacaag 96.O citgctott.ca togactgcaa gaagaacaac goggcgg.cgg cqaccaccac citgc.gc.cggc 1020 gacgagaggc caaccacaag cqgcgc.cgaa ccacgc.gtog to aggct citt cqgcgtogac 1080 atcgc.cgg.cg gcgattgc.cg gaag.cgggag aggg.cggtgg agatgggg.ca agaggtott C 1140 citactgaaga ggcaatgcgt ggttcatcag cqtactcctd coctaggtgc cct gctgtta 1200 tag catcaaa toaaattcat atatagatca aatcaaatct tcttctottc catcttttitt 1260 gttgttcatc gtctgttgtt to atctitcga g 1291

<210> SEQ ID NO 30 <211> LENGTH: 365 <212> TYPE: PRT <213> ORGANISM: Oryza sativa <220> FEATURE: <223> OTHER INFORMATION: G3388 polypeptide OSC21673.C1.p5.fg GI: 12328560 Orthologous to G867, G9, G993, G1930 <400> SEQUENCE: 30 Met Gly Val Val Ser Phe Ser Ser Thr Ser Ser Gly Ala Ser Thr Ala 1 5 10 15 Thr Thr Glu Ser Gly Gly Ala Val Arg Met Ser Pro Glu Pro Val Val 20 25 30 Ala Val Ala Ala Ala Ala Gln Gln Leu Pro Val Val Lys Gly Val Asp 35 40 45 Ser Ala Asp Glu Val Val Thr Ser Arg Pro Ala Ala Ala Ala Ala Gln 50 55 60 Gln Ser Ser Arg Tyr Lys Gly Val Val Pro Gln Pro Asn Gly Arg Trp 65 70 75 80 Gly Ala Gln Ile Tyr Glu Arg His Ala Arg Val Trp Leu Gly Thr Phe 85 90 95

Pro Asp Glu Glu Ala Ala Ala Arg Ala Tyr Asp Val Ala Ala Leu Arg 100 105 110 Tyr Arg Gly Arg Asp Ala Ala Thr Asn Phe Pro Gly Ala Ala Ala Ser 115 120 125 Ala Ala Glu Leu Ala Phe Leu Ala Ala His Ser Lys Ala Glu Ile Val 130 135 140 Asp Met Leu Arg Lys His Thr Tyr Ala Asp Glu Leu Arg Gln Gly Leu 145 150 155 160 Arg Arg Gly Arg Gly Met Gly Ala Arg Ala Gln Pro Thr Pro Ser Trp 165 170 175 Ala Arg Glu Pro Leu Phe Glu Lys Ala Val Thr Pro Ser Asp Val Gly 180 185 190 Lys Leu Asn Arg Leu Val Val Pro Lys Gln His Ala Glu Lys His Phe 195 200 205 Pro Leu Arg Arg Ala Ala Ser Ser Asp Ser Ala Ser Ala Ala Ala Thr 210 215 220 Gly Lys Gly Val Leu Leu Asn Phe Glu Asp Gly Glu Gly Lys Val Trp 225 230 235 240


<213> ORGANISM: Oryza sativa <220> FEATURE: <223> OTHER INFORMATION: G3389 polypeptide BAB21211.1 OSC21674.C1.p12.fg Orthologous to G867, G9, G993, G1930 <400> SEQUENCE: 32 Met Glu Gln Glu Ala Ala Met Val Val Phe Ser Cys Asn Ser Gly Ser 1 5 10 15 Gly Gly Ser Ser Ser Thr Thr Asp Ser Lys Gln Glu Glu Glu Glu Glu 20 25 30 Glu Glu Leu Ala Ala Met Glu Glu Asp Glu Leu Ile His Val Val Gln 35 40 45 Ala Ala Glu Leu Arg Leu Pro Ser Ser Thr Thr Ala Thr Arg Pro Ser 50 55 60 Ser Arg Tyr Lys Gly Val Val Pro Gln Pro Asn Gly Arg Trp Gly Ala 65 70 75 80 Gln Ile Tyr Glu Arg His Ala Arg Val Trp Leu Gly Thr Phe Pro Asp 85 90 95 Glu Glu Ala Ala Ala Arg Ala Tyr Asp Val Ala Ala Leu Arg Phe Arg 100 105 110 Gly Arg Asp Ala Val Thr Asn Arg Ala Pro Ala Ala Glu Gly Ala Ser 115 120 125 Ala Gly Glu Leu Ala Phe Leu Ala Ala His Ser Lys Ala Glu Val Val 130 135 140 Asp Met Leu Arg Lys His Thr Tyr Asp Asp Glu Leu Gln Gln Gly Leu 145 150 155 160 Arg Arg Gly Ser Arg Ala Gln Pro Thr Pro Arg Trp Ala Arg Glu Pro 165 170 175 Leu Phe Glu Lys Ala Val Thr Pro Ser Asp Val Gly Lys Leu Asn Arg 180 185 190 Leu Val Val Pro Lys Gln Gln Ala Glu Arg His Phe Pro Phe Pro Leu 195 200 205 Arg Arg His Ser Ser Asp Ala Ala Gly Lys Gly Val Leu Leu Asn Phe 210 215 220 Glu Asp Gly Asp Gly Lys Val Trp Arg Phe Arg Tyr Ser Tyr Trp Asn 225 230 235 240 Ser Ser Gln Ser Tyr Val Leu Thr Lys Gly Trp Ser Arg Phe Val Arg 245 250 255 Glu Lys Gly Leu Arg Pro Gly Asp Thr Val Ala Phe Ser Arg Ser Ala 260 265 270 Ala Ala Trp Gly Thr Glu Lys His Leu Leu Ile Asp Cys Lys Lys Met 275 280 285 Glu Arg Asn Asn Leu Ala Thr Val Asp Asp Asp Ala Arg Val Val Val 290 295 300 Lys Leu Phe Gly Val Asp Ile Ala Gly Asp Lys Thr Arg 305 310 315

<210> SEQ ID NO 33 <211> LENGTH: 1348 <212> TYPE: DNA <213> ORGANISM: Oryza sativa <220> FEATURE: <223> OTHER INFORMATION: G3390 AC130725 Predicted polypeptide sequence is orthologous to G867, G9, G993, G1930


Thr Gly Glu Ala Glu Ala Ala Arg Ala Tyr Asp Val Ala Ala Gln Arg 100 105 110 Phe Arg Gly Arg Asp Ala Val Thr Asn Phe Arg Pro Leu Ala Glu Ser 115 120 125 Asp Pro Glu Ala Ala Val Glu Leu Arg Phe Leu Ala Ser Arg Ser Lys 130 135 140 Ala Glu Val Val Asp Met Leu Arg Lys His Thr Tyr Leu Glu Glu Leu 145 150 155 160 Thr Gln Asn Lys Arg Ala Phe Ala Ala Ile Ser Pro Pro Pro Pro Lys 165 170 175 His Pro Ala Ser Ser Pro Thr Ser Ser Ser Ala Ala Arg Glu His Leu 180 185 190 Phe Asp Lys Thr Val Thr Pro Ser Asp Val Gly Lys Leu Asn Arg Leu 195 200 205 Val Ile Pro Lys Gln His Ala Glu Lys His Phe Pro Leu Gln Leu Pro 210 215 220 Pro Pro Thr Thr Thr Ser Ser Val Ala Ala Ala Ala Asp Ala Ala Ala 225 230 235 240 Gly Gly Gly Asp Cys Lys Gly Val Leu Leu Asn Phe Glu Asp Ala Ala 245 250 255 Gly Lys Val Trp Lys Phe Arg Tyr Ser Tyr Trp Asn Ser Ser Gln Ser 260 265 270 Tyr Val Leu Thr Lys Gly Trp Ser Arg Phe Val Lys Glu Lys Gly Leu 275 280 285 His Ala Gly Asp Ala Val Gly Phe Tyr Arg Ala Ala Gly Lys Asn Ala 290 295 300 Gln Leu Phe Ile Asp Cys Lys Val Arg Ala Lys Pro Thr Thr Ala Ala 305 310 315 320

Ala Ala Ala Ala Phe Leu Ser Ala Val Ala Ala Ala Ala Ala Pro Pro 325 330 335 Pro Ala Val Lys Ala Ile Arg Leu Phe Gly Val Asp Leu Leu Thr Ala 340 345 350 Ala Ala Pro Glu Leu Gln Asp Ala Gly Gly Ala Ala Met Thr Lys Ser 355 360 365 Lys Arg Ala Met Asp Ala Met Ala Glu Ser Gln Ala His Val Val Phe 370 375 380 Lys Lys Gln Cys Ile Glu Leu Ala Leu Thr 385 390

<210 SEQ ID NO 35 &2 11s LENGTH 1338 &212> TYPE DNA <213> ORGANISM: Oryza sativa &220s FEATURE &223> OTHER INFORMATION: G3391 APO 03450 OSC 26104. C1. p13. fg Predicted polypeptide sequence is orthologous to G867, G9, G993, G1930 <400 SEQUENCE: 35 ggagagtagg agtgtgctag totgtgaggit citactgaaat ggacagotcc agctgcc togg 60 tggatgatac caa.ca.gcggc ggctogtoca cqgacaagct gaggg.cgttg gcc.gc.cgcgg 120 Cggcggagac gg.cgc.cgctg gag.cgcatgg ggagcggggc gag.cgcggtg gtggacgcgg 18O CCgagcctgg CGC ggaggcg gactCcgggt CC9ggggacg tdtgtgcggc gg.cgg.cggCg 240


-continued gccaacaa.cc cittctitccita cqcgtc.gctd. tcc.ccc.gcga cc.gc.gacggc cqc.cgc.gc.gg 660 gag caccitct tcgacaagac gg to accocc agcigacgtgg gcaa.gctgaa cc.ggctggtg 720 atcc.cgaagc agcacgc.cga gaag cacttic cc.gctgcago toccatcc.gc cqgcggc gag 78O agcaaggg.cg togctccitcaa cct ggaggac gocgcgggca aggtotggcg gttcc.gctac 840 togtactgga acagoagcca gagctacgtg citcaccalagg gctggagcc.g. citt.cgtoaag 9 OO gaga agggcc tocaa.gc.cgg cqacgtogto ggcttctacc gct cogctgc cqgcgc.cgac 96.O accalag citct tcatcg acto caagctg.cgg cccaa.cagog togtogtogc citcgacggca 1020 ggcc.cgtogc citc.cgg.cgcc ggtgg.cgaag gCC gtgcgto tctitcgg.cgt c gacctgctg 1080 acgg Caccgg ccaccgc.cgc gg.cgc.cgg.cg gaggcc.gtgg CC9ggtgcaa gaga.gcCagg 1140 gacittgggitt cqc coccgca gg.cgg.cgttcaagaa.gcago togtoggagct ggcactagtg 1200 tagattaatg citacggag cq atcgatctitt coctdgctag citagtcttitt tttitttittgc 1260 to gatcgcto aactcagatg gtag catcat 1290

<210> SEQ ID NO 38 <211> LENGTH: 389 <212> TYPE: PRT <213> ORGANISM: Zea mays <220> FEATURE: <223> OTHER INFORMATION: G3432 polypeptide Orthologous to G867, G9, G993, G1930 <400> SEQUENCE: 38 Met Asp Ser Ala Ser Ser Leu Val Asp Asp Thr Ser Ser Gly Gly Gly 1 5 10 15 Gly Gly Ala Ser Thr Asp Lys Leu Arg Ala Leu Ala Val Phe Ala Ala 20 25 30 Ala Ser Gly Thr Pro Leu Glu Arg Met Gly Ser Gly Ala Ser Ala Val 35 40 45 Val Asp Ala Ala Glu Pro Gly Ala Glu Ala Asp Ser Gly Ser Gly Ala 50 55 60 Ala Ala Val Ser Val Gly Gly Lys Leu Pro Ser Ser Arg Tyr Lys Gly 65 70 75 80 Val Val Pro Gln Pro Asn Gly Arg Trp Gly Ala Gln Ile Tyr Glu Arg 85 90 95 His Gln Arg Val Trp Leu Gly Thr Phe Ala Gly Glu Ala Asp Ala Ala 100 105 110 Arg Ala Tyr Asp Val Ala Ala Gln Arg Phe Arg Gly Arg Asp Ala Val 115 120 125 Thr Asn Phe Arg Pro Leu Ala Asp Ala Asp Pro Asp Ala Ala Ala Glu 130 135 140 Leu Arg Phe Leu Ala Ser Arg Ser Lys Ala Glu Val Val Asp Met Leu 145 150 155 160 Arg Lys His Thr Tyr Phe Asp Glu Leu Ala Gln Asn Lys Arg Ala Phe 165 170 175

Ala Ala Ala Ser Ala Ser Ala Ala Thr Ala Ser Ser Leu Ala Asn Asn 180 185 190 Pro Ser Ser Tyr Ala Ser Leu Ser Pro Ala Thr Ala Thr Ala Ala Ala 195 200 205 Arg Glu His Leu Phe Asp Lys Thr Val Thr Pro Ser Asp Val Gly Lys


-continued tactitcatcg agtaccgc.ca citgccago go cqgcgcc.gcg acgtcgatat cagottcggc 96.O gacgct gcca cc.gtgc.cggc gtggcc.gagg cc.gatagitta toggalacc.gc ggc catgaat 1020 aatgggggtg caacggtggc gtc.cgccacc atc.gc.cggcc atgacatcga ggtgg cagtg 1080 gcaccotcgg g g g cqaggag cittcaggcto titcgg gttca atgttgagtg cagoggcgac 1140 gatgcaccgg caccgg cacc toc toccgcc gaagtggagt atgtcgacgg cqacaccitag 1200

<210> SEQ ID NO 40 <211> LENGTH: 399 <212> TYPE: PRT <213> ORGANISM: Zea mays <220> FEATURE: <223> OTHER INFORMATION: G3433 polypeptide Orthologous to G867, G9, G993, G1930 <400> SEQUENCE: 40 Met Asp Ser Ala Ser Ser Leu Val Asp Asp Thr Ser Gly Ser Gly Gly 1 5 10 15 Gly Ala Cys Thr Asp Lys Leu Arg Ala Leu Ala Ala Ala Ala Ala Ser 20 25 30 Ala Ser Gly Pro Pro Pro Glu Arg Met Gly Ser Gly Ala Ser Ala Val 35 40 45 Val Asp Ala Ala Glu Pro Gly Ala Glu Ala Asp Ser Gly Ser Ala Pro 50 55 60 Ala Ser Val Ala Ala Val Ala Ala Gly Val Gly Gly Lys Leu Pro Ser 65 70 75 80 Ser Arg Tyr Lys Gly Val Val Pro Gln Pro Asn Gly Arg Trp Gly Ala 85 90 95 Gln Ile Tyr Glu Arg His Leu Arg Val Trp Leu Gly Thr Phe Thr Gly 100 105 110 Glu Ala Glu Ala Ala Arg Ala Tyr Asp Val Ala Ala Gln Arg Phe Arg 115 120 125 Gly Arg Asp Ala Val Thr Asn Phe Arg Pro Leu Ala Glu Ser Asp Leu 130 135 140 Asp Pro Asp Ala Ala Ala Glu Leu Arg Phe Leu Ala Ser Arg Ser Lys 145 150 155 160 Ala Glu Val Val Asp Met Leu Arg Lys His Thr Tyr Gly Glu Glu Leu 165 170 175 Ala Gln Asn Arg Arg Ala Phe Ala Ala Ala Ala Ala Ser Leu Ala Ser 180 185 190 Pro Gln Leu Pro Pro Ala Lys Asn Thr Ser Pro Ala Ala Ala Arg Glu 195 200 205 His Met Phe Asp Lys Val Leu Thr Pro Ser Asp Val Gly Lys Leu Asn 210 215 220 Arg Leu Val Val Pro Lys Gln His Ala Glu Arg Phe Phe Pro Ala Ala 225 230 235 240 Gly Ala Gly Ser Thr Gln Leu Cys Phe Gln Asp Arg Gly Gly Ala Leu 245 250 255 Trp Gln Phe Arg Tyr Ser Tyr Trp Gly Ser Ser Gln Ser Tyr Val Met 260 265 270 Thr Lys Gly Trp Ser Arg Phe Val Arg Ala Ala Arg Leu Ala Ala Gly 275 280 285 Asp Thr Val Thr Phe Ser Arg Ser Gly Gly Gly Arg Tyr Phe Ile Glu

290 295 300

Tyr Arg His Gln Arg Arg Arg Asp Ile Ser Phe Gly 305 310 315 320

Asp Ala Ala Thr Val Pro Ala Trp Pro Arg Pro Ile Val Ile Gly Thr 325 330 335

Ala Ala Met Asn Asn Gly Gly Ala Thr Val Ala Ser Ala Thr Ile Ala 340 345 350

Gly His Asp Ile Glu Val Ala Val Ala Pro Ser Gly Ala Arg Ser Phe 355 360 365

Arg Leu Phe Gly Phe Asn Val Glu Cys Ser Gly Asp Asp Ala Pro Ala 370 375 380

Pro Ala Pro Ala Pro Ala Glu Val Glu Tyr Val Asp Gly Asp Thr 385 390 395

<210> SEQ ID NO 41 &2 11s LENGTH 969 &212> TYPE DNA <213> ORGANISM: Brassica oleraceae &220s FEATURE &223> OTHER INFORMATION BZ. 458719 Predicted sequence is orthologous to G8 67, G993, G1930 <400 SEQUENCE: 41 caa.ccc.ggcc cqtatcctgt tocaa.cccag cctttgtact to cacacaat atacaactgt 60 tgatcc tac cgittagatct tittaaaactg atcaaatcac Cgg Cacagag totctitctict 120 ttaacgaacc tgctocaacc tittagtcaac acgtagctitt gacitactgtt ccaatac gag 18O taacggalacc tocacactitt cocgittaacg tottcgaaat tdaacagogt cccitctoacg 240 gaga.cgtc.gc cggittaacgg taacggaaaa tgtttctocq cittggtgttt cggitatcact aaacggttta gct tcc.cgiac gtcactcggc gttaccgttt totcaaacag acactcc.gc.c 360 gttittaaacc cc.gtolacaac cqtaacgtta gcaaacgc.cg totctitcc.gt gtotcc.gitta 420 ccaccqttac gtttg.cgttt cotctgctct aacticittctt tgtaagtgttg titt.ccitcaiac 480 atat caacga totcatattt cqaatgttgcg tittaagaact ttactitcgtc tocgtcacct 540 to accgittac ggaacgtogt gtctggtttg aaattagtga Cgg.cgtcaga gcc.gcggaaa 600

Cggtgagcgg cgacgtcgta gacacgc.gc.c gcttcttctt citt.cgttgaa tgtc.ccgagc 660 caaacgc.gct tgttgcttcto gtatatotga gctcc.ccatc titcc.gtttgg Ctgagggacg 720 acgc.ctittga attittgacga cqggagctitt cittgattctg citt.cgacgc.c gttctgggaa to gagaacca cgcttgaacc gctitcccatt citgtataaac togc.cggaga tgacg acttic 840 gcc.gtott.cg gcq goggtgt atcittagcc.g gagtgtggat ggaaact gta cittgttgagc 9 OO totic catcac actacticaca gctitt catat tagagaaatc acaagaaagt tgttgaaattit 96.O gagaatgaa 969

<210> SEQ ID NO 42 <211> LENGTH: 288 <212> TYPE: PRT <213> ORGANISM: Brassica oleraceae <220> FEATURE: <221> NAME/KEY: misc_feature <222> LOCATION: (288)..(288) <223> OTHER INFORMATION: Xaa can be any naturally occurring amino acid <220> FEATURE: <223> OTHER INFORMATION: BZ458719 polypeptide Orthologous to G867, G9,