trends in plant science Reviews

The WRKY superfamily of plant transcription factors

Thomas Eulgem, Paul J. Rushton, Silke Robatzek and Imre E. Somssich

The WRKY proteins are a superfamily of transcription factors with up to 100 representatives in Arabidopsis. Family members appear to be involved in the regulation of various physiological programs that are unique to plants, including pathogen defense, senescence and trichome development. In spite of the strong conservation of their DNA-binding domain, the overall structures of WRKY proteins are highly divergent and can be categorized into distinct groups, which might reflect their different functions.

One of the apparent fundamental principles of biological evolution is that the progression from ancient to advanced life forms is inseparably connected to an increase in regulatory capacity. Genome-sequencing efforts have provided evidence for a positive correlation between the proportion of genes involved in information processing and the complexity of organisms. More than 20% of the genes within the sequence available for the Arabidopsis thaliana genome appear to encode proteins that play a role in signal transduction or transcription1, whereas only 12% of the genome of the single-celled yeast Saccharomyces cerevisiae contains genes of this type2.

This increase in biological complexity coincides with the appearance or expansion of specific groups of regulator genes. One example is the nuclear-receptor-gene family, which is completely absent in yeast but highly represented in metazoan organisms3. The evolution of nuclear receptors is believed to be a key event in the development of intercellular communication, a prerequisite for the multicellularity of metazoans4. Similarly, the establishment of a complex animal body plan was driven by the amplification and divergence of ancestral genes, thereby generating a sophisticated regulatory system of functionally interconnected transcriptional regulators5.

To meet their disparate biological requirements, plants and animals have evolved unique regulatory mechanisms. This was partly achieved by combining functional domains from pre-existing factors to build new regulators, as exemplified by the MADS-box factors, which play a central role in determining floral and organ identity in plants6. In addition, completely new factors have arisen. Here we focus on the potential biological roles of WRKY (pronounced ‘worky’) proteins, a large family of transcriptional regulators that has to date only been found in plants. The abundance of information provided by the Arabidopsis sequencing projects is an ideal basis for comparative analysis of this superfamily within one plant species. Their precise regulatory functions are still largely unknown. However, these factors appear to be specific to plants, with probably up to 100 members in Arabidopsis, which suggests that they play an important role during plant evolution.

Biochemical properties of WRKY proteins
The first WRKY cDNAs were cloned from sweet potato (Ipomoea batatas; SPF1), wild oat (Avena fatua; ABF1,2), parsley (Petroselinum crispum; PcWRKY1,2,3) and Arabidopsis (ZAP1). They were isolated because the encoded proteins bind specifically to the DNA sequence motif (T)(T)TGAC(C/T), which is known as the W box7–10. It has been suggested that the cognate binding site for SPF1 differs from that of other WRKY proteins. However, the oligonucleotide used to isolate SPF1 does have a W box in the flanking sequence7.

The name of the WRKY family is derived from the most prominent feature of these proteins, the WRKY domain. This is a 60 amino acid region that is highly conserved amongst family members. The emerging picture is that these proteins are regulatory transcription factors with a binding preference for the W box, but with the potential to differentially regulate the expression of a variety of target genes. Consistent with a role as transcription factors, PcWRKY1 and WIZZ (from tobacco) have been shown to be targeted to the nucleus11,12.

The WRKY domain and the W box
The WRKY domain is defined by the conserved amino acid sequence WRKYGQK at its N-terminal end, together with a novel zinc-finger-like motif8 (Fig. 1). All characterized WRKY proteins clearly prefer to bind the same DNA motif, and the WRKY domain is their only conserved structural feature. It has therefore been assumed that the WRKY domain constitutes a DNA-binding domain. Indeed, it has recently been shown that an isolated WRKY domain has sequence-specific DNA-binding activity12. The divalent metal chelators 1,10-o-phenanthroline and EDTA abolish DNA binding in vitro, which is taken as strong support for a zinc-finger structure within the WRKY domain8,10,11. However, it has not yet been proven that zinc is actually complexed in the WRKY domain. In addition, nothing is known about the function of the WRKYGQK heptapeptide stretch, the hallmark of this superfamily.

All known WRKY proteins contain either one or two WRKY domains. They can be classified by the number of WRKY domains and by the features of their zinc-finger-like motif. WRKY proteins with two WRKY domains belong to group I, whereas most proteins with one WRKY domain belong to group II (Fig. 2). Generally, the WRKY domains of group I and group II members have the same type of finger motif. Its pattern of potential zinc ligands (C–X(4–5)–C–X(22–23)–H–X(1)–H; Fig. 1) is unique among all described zinc-finger-like motifs13. A small subset of WRKY proteins has a single finger motif that differs from that of group I and II members. Instead of a C2H2 pattern, their WRKY domains contain a C2HC motif (C–X(7)–C–X(23)–H–X(1)–C; Fig. 1). Owing to this distinction, these proteins were recently assigned to the newly defined group III. Nevertheless, experimental evidence has shown that members of all three groups bind sequence-specifically to various W box elements (R.S. Cormack et al., unpublished).

The two WRKY domains of group I members appear to be functionally distinct. As has been shown for SPF1, ZAP1 and PcWRKY1, sequence-specific binding to their target DNA sequences is mediated mainly by the C-terminal WRKY domain7,10,12. The function of the N-terminal WRKY domain remains unclear. Because protein regions outside of the C-terminal WRKY domain contribute to the overall strength of DNA

1360-1385/00/$ – see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S1360-1385(00)01600-9 May 2000, Vol. 5, No. 5 199

Group I
WRKY1 TLFDIVNDGYRWRKYGQKSVKGSPYPRSYYRCSSPG...CPVKKHVERSSHDTKLLITTYEGKHDHDMP
WRKY2 SDVDILDDGYRWRKYGQKVVKGNPNPRSYYKCTAPG...CTVRKHVERASHDLKSVITTYEGKHNHDVP
WRKY3 SEVDLLDDGYRWRKYGQKVVKGNPYPRSYYKCTTPD...CGVRKHVERAATDPKAVVTTYEGKHNHDVP
WRKY4 SEVDLLDDGYRWRKYGQKVVKGNPYPRSYYKCTTPG...CGVRKHVERAATDPKAVVTTYEGKHNHDLP
WRKY20 SEVDILDDGYRWRKYGQKVVRGNPNPRSYYKCTAHG...CPVRKHVERASHDPKAVITTYEGKHDHDVP
WRKY25 SDIDVLIDGFRWRKYGQKVVKGNTNPRSYYKCTFQG...CGVKKQVERSAADERAVLTTYEGRHNHDIP
WRKY26 SDIDILDDGYRWRKYGQKVVKGNPNPRSYYKCTFTG...CFVRKHVERAFQDPKSVITTYEGKHKHQIP
WRKY32 GDVGICGDGYRWRKYGQKMVKGNPHPRNYYRCTSAG...CPVRKHIETAVENTKAVIITYKGVHNHDMP
WRKY33 SDIDILDDGYRWRKYGQKVVKGNPNPRSYYKCTTIG...CPVRKHVERASHDMRAVITTYEGKHNHDVP
WRKY34 SDIDILDDGYRWRKYGQKVVKGNPNPRSYYKCTANG...CTVTKHVERASDDFKSVLTTYIGKHTHVVP
WRKY44 VESDSLEDGFRWRKYGQKVVGGNAYPRSYYRCTSAN...CRARKHVERASDDPRAFITTYEGKHNHHLL
WRKY45 SQVDILDDGYRWRKYGQKAVKNNPFPRSYYKCTEEG...CRVKKQVQRQWGDEGVVVTTYQGVHTHAVD
WRKY58 SEVDLLDDGYRWRKYGQKVVKGNPHPRSYYKCTTPN...CTVRKHVERASTDAKAVITTYEGKHNHDVP
WRKY10 SDEDNPNDGYRWRKYGQKVVKGNPNPRSYFKCTNIE...CRVKKHVERGADNIKLVVTTYDGIHNHPSP

Group II
(a)
WRKY18 DTSLTVKDGFQWRKYGQKVTRDNPSPRAYFRCSFAPS..CPVKKKVQRSAEDPSLLVATYEGTHNHLGP
WRKY40 KDGYQWRKYGQKVTRDNPSPRAYFKCACAPS..CSVKKKVQRSVEDQSVLVATYEGEHNHPMP
WRKY60 VSSLTVKDGYQWRKYGQKITRDNPSPRAYFRCSFSPS..CLVKKKVQRSAEDPSFLVATYEGTHNHTGP
(b)
WRKY6 SEAPMISDGCQWRKYGQKMAKGNPCPRAYYRCTMATG..CPVRKQVQRCAEDRSILITTYEGNHNHPLP
WRKY9 CETATMNDGCQWRKYGQKTAKGNPCPRAYYRCTVAPG..CPVRKQVQRCLEDMSILITTYEGTHNHPLP
WRKY31 SEAAMISDGCQWRKYGQKMAKGNPCPRAYYRCTMAGG..CPVRKQVQRCAEDRSILITTYEGNHNHPLP
WRKY36 CEDPSINDGCQWRKYGQKTAKTNPLPRAYYRCSMSSN..CPVRKQVQRCGEETSAFMTTYEGNHDHPLP
WRKY42 SEAPMLSDGCQWRKYGQKMAKGNPCPRAYYRCTMAVG..CPVRKQVQRCAEDRTILITTYEGNHNHPLP
WRKY47 HKQHEVNDGCQWRKYGQKMAKGNPCPRAYYRCTMAVG..CPVRKQVQRCAEDTTILTTTYEGNHNHPLP
WRKY61 NDGCQWRKYGQKIAKGNPCPRAYYRCTIAAS..CPVRKQVQRCSEDMSILISTYEGTHNHPLP
(c)
WRKY8 TEVDHLEDGYRWRKYGQKAVKNSPYPRSYYRCTTQK...CNVKKRVERSYQDPTVVITTYESQHNHPIP
WRKY12 SDVDVLDDGYKWRKYGQKVVKNSLHPRSYYRCTHNN...CRVKKRVERLSEDCRMVITTYEGRHNHIPS
WRKY13 SEVDVLDDGYRWRKYGXKVVKNTQHPRSYYRCTQDK...CRVKKRVERLADDPRMVITTYEGRHLHSPS
WRKY23 SEVDHLEDGYRWRKYGQKAVKNSPFPRSYYRCTTAS...CNVKKRVERSFRDPSTVVTTYEGQHTHISP
WRKY24 SDDDVLDDGYRWRKYGQKSVKHNAHPRSYYRCTYHT...CNVKKQVQRLAKDPNVVVTTYEGVHNHPCE
WRKY28 SEVDHLEDGYRWRKYGQKAVKNSPYPRSYYRCTTQK...CNVKKRVERSFQDPTVVITTYEGQHNHPIP
WRKY43 SDADILDDGYRWRKYGQKSVKNSLYPRSYYRCTQHM...CNVKKQVQRLSKETSIVETTYEGIHNHPCE
WRKY48 KSIDNLDDGYRWRKYGQKAVKNSPYPRSYYRCTTVG...CGVKKRVERSSDDPSIVMTTYEGQHTHPFP
WRKY49 NSNGMCDDGYKWRKYGQKSIKNSPNPRSYYKCTNPI...CNAKKQVERSIDESNTYIITYEGFHFHYTY
WRKY50 SEVEVLDDGFKWRKYGKKMVKNSPHPRNYYKCSVDG...CPVKKRVERDRDDPSFVITTYEGSHNHSSM
WRKY51 DVMDDGFKWRKYGKKSVKNNINKRNYYKCSSEG...CSVKKRVERDGDDAAYVITTYEGVHNHESL
WRKY56 SDDDVLDDGYRWRKYGQKSVKNNAHPRSYYRCTYHT...CNVKKQVQRLAKDPNVVVTTYEGVHNHPCE
WRKY57 SDVDNLEDGYRWRKYGQKAVKNSPFPRSYYRCTNSR...CTVKKRVERSSDDPSIVITTYEGQHCHQTI
WRKY59 DEKVALDDGYKWRKYGKKPITGSPFPRHYHKCSSPD...CNVKKKIERDTNNPDYILTTYEGRHNHPSP
(d)
WRKY7 KMADIPSDEFSWRKYGQKPIKGSPHPRGYYKCSSVRG..CPARKHVERALDDAMMLIVTYEGDHNHALV
WRKY11 KIADIPPDEYSWRKYGQKPIKGSPHPRGYYKCSTFRG..CPARKHVERALDDPAMLIVTYEGEHRHNQS
WRKY15 KMSDVPPDDYSWRKYGQKPIKGSPHPRGYYKCSSVRG..CPARKHVERAADDSSMLIVTYEGDHNHSLS
WRKY17 KIADIPPDEYSWRKYGQKPIKGSPHPRGYYKCSTFRG..CPARKHVERALDDSTMLIVTYEGEHRHHQS
WRKY21 KVADIPPDDYSWRKYGQKPIKGSPYPRGYYKCSSMRG..CPARKHVERCLEDPAMLIVTYEAEHNHPKL
WRKY39 KIADIPPDEYSWRKYGQKPIKGSPHPRGYYKCSSVRG..CPARKHVERCIDETSMLIVTYEGEHNHSRI
(e)
WRKY14 SGEVVPSDLWAWRKYGQKPIKGSPFPRGYYRCSSSKG..CSARKQVERSRTDPNMLVITYTSEHNHPWP
WRKY16 DRGSRSSDLWVWRKYGQKPIKSSPYPRSYYRCASSKG..CFARKQVERSRTDPNVSVITYISEHNHPFP
WRKY22 AAEALNSDVWAWRKYGQKPIKGSPYPRGYYRCSTSKG..CLARKQVERNRSDPKMFIVTYTAEHNHPAP
WRKY27 TQENLSSDLWAWRKYGQKPIKGSPYPRNYYRCSSSKG..CLARKQVERSNLDPNIFIVTYTGEHTHPRP
WRKY29 KEENLLSDAWAWRKYGQKPIKGSPYPRSYYRCSSSKG..CLARKQVERNPQNPEKFTITYTNEHNHELP
WRKY35 SGEVVPSDLWAWRKYGQKPIKGSPYPRGYYRCSSSKG..CSARKQVERSRTDPNMLVITYTSEHNHPWP

Group III
WRKY30 GVDRTLDDGFSWRKYGQKDILGAKFPRGYYRCTYRKSQGCEATKQVQRSDENQMLLEISYRGIHSCSQA
WRKY41 GLEGPHDDIFSWRKYGQKDILGAKFPRSYYRCTFRNTQYCWATKQVQRSDGDPTIFEVTYRGTHTCSQG
WRKY46 QENGSIDDGHCWRKYGQKEIHGSKNPRAYYRCTHRFTQDCLAVKQVQKSDTDPSLFEVKYLGNHTCNNI
WRKY53 GLEGPQDDVFSWRKYGQKDILGAKFPRSYYRCTHRSTQNCWATKQVQRSDGDATVFEVTYRGTHTCSQA
WRKY54 VEAKSSEDRYAWRKYGQKEILNTTFPRSYFRCTHKPTQGCKATKQVQKQDQDSEMFQITYIGYHTCTAN
WRKY55 NTDLPPDDNHTWRKYGQKEILGSRFPRAYYRCTHQKLYNCPAKKQVQRLNDDPFTFRVTYRGSHTCYNS
WRKY38 SPDPIYYDGYLWRKYGQKSIKKSNHQRSYYRCSYNKDHNCEARKHEQKIKDNPPVYRTTYFGHHTCKTE
WRKY52 IPAIDEGDLWTWRKYGQKDILGSRFPRGYYRCAYKFTHGCKATKQVQRSETDSNMLAITYLSEHNHPRP


Fig. 1. Comparison of WRKY domain sequences from AtWRKY proteins. Sequences encoding the peptide stretch WRKYGQK were found in genomic and EST databases using the BLAST programs tblastn and blastp37. Gaps (dots) have been inserted for optimal alignment. Residues that are highly conserved within each of the major groups are in red and potential zinc ligands are highlighted in black boxes. For each (sub)group, the position of a conserved intron is indicated by an arrowhead.
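The zinc-ligand spacings that separate the groups can be checked mechanically on aligned domain sequences such as those in Fig. 1. These are C–X(4–5)–C–X(22–23)–H–X(1)–H in groups I and II and C–X(7)–C–X(23)–H–X(1)–C in group III. The Python sketch below is only illustrative, and the function name `classify_zinc_finger` is hypothetical. It looks at ligand spacing alone, not at the number of WRKY domains or at phylogeny, which the actual classification also uses. It assumes one WRKY-domain fragment per call, with alignment gaps written as dots.

```python
import re

# Zinc-ligand spacing patterns from the text (X = any residue):
#   groups I and II: C-X(4-5)-C-X(22-23)-H-X(1)-H  (C2H2 type)
#   group III:       C-X(7)-C-X(23)-H-X(1)-C       (C2HC type)
C2H2 = re.compile(r"C.{4,5}C.{22,23}H.H")
C2HC = re.compile(r"C.{7}C.{23}H.C")


def classify_zinc_finger(domain: str) -> str:
    """Return 'C2H2', 'C2HC' or 'unclassified' for one WRKY-domain fragment.

    Alignment gaps ('.') are stripped first, so rows can be pasted
    directly from an alignment like Fig. 1.
    """
    seq = domain.replace(".", "").upper()
    if C2H2.search(seq):
        return "C2H2"
    if C2HC.search(seq):
        return "C2HC"
    return "unclassified"


if __name__ == "__main__":
    examples = {
        "WRKY1": "TLFDIVNDGYRWRKYGQKSVKGSPYPRSYYRCSSPG...CPVKKHVERSSHDTKLLITTYEGKHDHDMP",
        "WRKY30": "GVDRTLDDGFSWRKYGQKDILGAKFPRGYYRCTYRKSQGCEATKQVQRSDENQMLLEISYRGIHSCSQA",
        "WRKY52": "IPAIDEGDLWTWRKYGQKDILGSRFPRGYYRCAYKFTHGCKATKQVQRSETDSNMLAITYLSEHNHPRP",
    }
    for name, domain in examples.items():
        print(name, classify_zinc_finger(domain))
```

On these three rows the sketch reports WRKY1 as C2H2 and WRKY30 as C2HC. WRKY52 matches neither pattern: its first two cysteines are seven residues apart, as in group III, but its final ligand pair is H–X–H. This illustrates why AtWRKY52 resists a firm group assignment.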

[Figure 2: schematic diagrams of published full-length WRKY proteins (PcWRKY1, IbSPF1, AtZAP1, NtWRKY1, NtWRKY2, CsSE71, PcWRKY3, AfABF2, PcWRKY4, NtWIZZ, PcWRKY5, NtWRKY4 and NtWRKY5), arranged in groups I, II and III.]

Fig. 2. Schematic representation of published full-length WRKY proteins from parsley (Pc), sweet potato (Ib), Arabidopsis (At), tobacco (Nt), cucumber (Cs) and wild oat (Af). They are divided into three groups based on the number and type of the WRKY domains they contain. WRKY domains are black, putative basic nuclear localization signals are blue and leucine zippers are pink. Serine–threonine-rich regions are yellow, […]-rich regions are purple, proline-rich regions are green and acidic regions are red.

binding, the N-terminal domain might participate in the binding process, increasing the affinity or specificity of these proteins for their target sites. Alternatively, it might provide an interface for protein–protein interactions, a known function of some zinc-finger-like domains14. This could allow more efficient DNA binding through interactions with other DNA-associated proteins. Not unexpectedly, the single WRKY domains of group II and III family members are more similar in sequence to the C-terminal than to the N-terminal WRKY domain of group I proteins. This suggests that the C-terminal and single WRKY domains are functionally equivalent and constitute the major DNA-binding domain.

The conservation of the WRKY domain is mirrored by a remarkable conservation of the cognate cis-acting W box elements. These (T)(T)TGAC(C/T) sequence elements contain the invariant TGAC core, which is essential for function and WRKY binding9,15. They mediate transcriptional responses to pathogen-derived elicitors and are present in the promoters of many plant genes that are associated with defense16. Functional W boxes frequently cluster within short promoter stretches15–17 and can act together synergistically12. Indeed, WRKY–W box interactions have been demonstrated by numerous binding experiments, both in vitro and in vivo8–10,12,18,19 (R.S. Cormack et al., unpublished). Random binding-site selection assays have also shown that the optimal binding site for ZAP1 contains the W box motif10. Interactions of WRKY proteins with W boxes can be regulated post-translationally. In tobacco, the binding of WRKY-like DNA-binding activities to W boxes is abolished by treatment with alkaline phosphatase18 and by the protein-kinase inhibitor staurosporine20.

In spite of the stereotypic binding preferences of WRKY proteins for W boxes, their affinities for certain types or arrangements of this element can vary (R.S. Cormack et al., unpublished). Sequences flanking the invariant W box TGAC core might be partly responsible for the observed specificity. In addition, the cooperative assembly of discrete higher-order WRKY–DNA complexes at defined W box arrangements might also account for specific promoter recognition12. Overall protein structure is highly variable. Access to certain promoters would therefore be restricted to distinct family members that fit into the three-dimensional shapes of such complexes.

WRKY proteins might be part of multimeric protein–DNA complexes. Both WRKY protein-containing nuclear extracts and purified recombinant WRKYs from tobacco lose their DNA-binding activity when treated with the protein-dissociating agent deoxycholate18. Furthermore, some WRKY proteins contain potential leucine zippers (LZs), structures known to allow protein dimerization. These appear to be functional in PcWRKY4 and 5, because their deletion greatly reduces reporter-gene expression mediated by these proteins in yeast (R.S. Cormack et al., unpublished).

Transcriptional regulation
ZAP1 and PcWRKY1, 4 and 5 can activate transcription in yeast10,12 (R.S. Cormack et al., unpublished), a feature that has been confirmed for ZAP1 and PcWRKY1 in plant cells10,12. Although it has yet to be studied in detail, the primary structures of WRKY proteins have an abundance of potential transcriptional activation or repression domains (Fig. 2). A common feature of many domains affecting transcription is the predominance of certain amino acids,


Table 1. Identified members of the Arabidopsis WRKY superfamily of transcription factors

AtWRKY Group Chr. ESTa Genea BAC.ORF AtWRKY Group Chr. ESTa Genea BAC.ORF

1 I 2X92976/ZAP1 AC007211F1013.1 19 4 AL049638 F16J13.90 AI995838 AC006955 F2818 AL091613 T8E18-end 2 I 5 T44598 AB026656 MXK23 20 I 4 AL078465 T15N24.90 AA395490 21 II 2 AA586133 U93215 T06B20.6 N37131 T20410 AQ010529 F24C8-end 3 I 2 T45479 AC006284 T4M8.23 AI992739 AI099874 B77849 T29F22-end 22 II 4 T04811 AF007269 IG002N01.6 AI993164 AL080571 F1G15-end 23 II 2 F14417 AC002337 T08I13.10 AL096246 T16K23-end F14438 4 I 1 T22085 AC007576 F7A19.5 24 I 5 AB005233 MBK23 W43265 25 I 2 T42934 AC002338 T9D9.6 AA585810 AC004165 T27E13.1 5 5 H77044 AB011485 MXH1.P3 AL093076 T10P21-end H77050 26 I 5 AA585811 B09174 T30A11-end H77051 T22092 AB010697 M0J9.24 AI995170 AI995443 AA605512 27 5 AB009055 MXC20.3 6 II 1 U75592 AC000375 F19K23.22 28 II 4 AL021713 T9A21.10 AA650675 29 II 4 AL035394 F9D19.20 H77127 30 III 5 AB010696 MLE8.2 AA394951 31 II 4 AL022140 F1N20.170 AA650826 32 I 4 AL022198 F6I18.160 AI992388 33 I 2 AC004683 T19C21.4 7 II 4 N37775 AC005861 F23B24 2 AC005499 T6A23.33 T20578 AL078637 T22A6.70 34 I 4 AL022223 M3E9.130 R30038 35 II 2 AC004238 F19I3.6 AI992658 36 II 1 AC010675 T17F3.16 8 II 5 AB010698 MPL12.9 AC010852 T12P18 9 AI998645 AQ011596 F24A12TRC-end 37 B23309 F28C5-end B98122 F24A12TRB-end 38 ? 5 AB012244 MQJ16.9 10 II 1 AC002328 F20N2 39 II 3 AC011437 F7O18.30 11 II 4 R64846 AL080283 F3L17.120 40 II 1 AC011713 F23A5 T88086 41 III 4 AF080120 F2P3.16 R30283 AL049876 T22B4.50 AI998936 42 II 4 AF076243 T26N6.6 T22071 43 2 AC005397 T3F17.22 T42669 44 I 2 AC005896 F3G5.5 Z29806 45 1/3? ATU63815 AT.I.24-4 Z29805 AC010797 IGF-F28J7 12 2 AC003672 F16B22 AC011664 F5A18 13 F14100 AC011624 T18B3 II 4 AL078620 F23K16.40 46 III 2 AC006526 F11C10.9 14 1 D88748 AC007060 T5I8.10 47 II 4 AF104919 T15B16.12 T20672 48 5 AB023033 K6M13 15 II 2 Z25667 AC002391 T20D16.5 49 5 AB017070 MNL12 T04430 50 5 AC005965 T19G15 T43675 51 5 AB019236 MXK3 T21472 52 III? 
5 AB020744 K9E15 H36048 53 III 4 AL078468 T32A16 AI993841 AL035394 F9D16.280 16 II 5 AA042185 AB010693 K21C13.P1 54 III 2 AC007660 T7D17.7 AA395309 B27842 T19B7-end 55 III 2 AC007660 T7D17.8 17 2 AA712348 AC006954 F25P17.13 56 1 AC007764 F22C12 R90490 57 AL080748 F1L14-end AA067545 58 I 3 AC008261 T4P13 AI100579 59 II 2 AC007019 F7D8.22 18 II 4 U74179 AL031004 F28M20.10 60 II 2 AC006585 F27C12.8 AL049607 F11C18 61 II 1 AC011809 F6A14

aGenBank Accession no. Abbreviations: Chr., chromosome; ORF, open reading frame; question marks denote either inconclusive group assignment or inconclusive chromosomal position.
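The W box consensus (T)(T)TGAC(C/T) lends itself to a simple sequence scan, as does the tendency of functional W boxes to cluster within short promoter stretches. The sketch below makes several assumptions that do not come from the review:

- It uses the strict consensus TTGAC(C/T).
- It scans both strands by also matching the reverse complement, [A/G]GTCAA.
- It treats two or more W boxes as a cluster when consecutive sites start within a hypothetical 100-bp window.

The function names are illustrative. The scan says nothing about binding affinity, which the text notes also depends on flanking sequence and on the arrangement of sites.

```python
import re
from typing import List, Tuple

# Strict reading of the W box consensus (T)(T)TGAC(C/T): TTGAC followed by C or T.
# Its reverse complement, [AG]GTCAA, marks a W box on the opposite strand.
# Lookaheads let overlapping sites be reported.
FORWARD = re.compile(r"(?=(TTGAC[CT]))")
REVERSE = re.compile(r"(?=([AG]GTCAA))")


def find_w_boxes(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every W box on either strand, sorted by position."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in REVERSE.finditer(seq)]
    return sorted(hits)


def cluster_w_boxes(hits: List[Tuple[int, str]], window: int = 100) -> List[List[int]]:
    """Group W box start positions into clusters.

    A new cluster begins whenever the gap to the previous site exceeds
    `window` bp (an illustrative threshold, not a value from the review).
    Only clusters containing at least two W boxes are returned.
    """
    clusters: List[List[int]] = []
    current: List[int] = []
    for pos, _strand in hits:
        if current and pos - current[-1] > window:
            if len(current) > 1:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) > 1:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    # Toy promoter: TTGACC at 20, GGTCAA (minus strand) at 56, TTGACT at 362.
    promoter = "A" * 20 + "TTGACC" + "A" * 30 + "GGTCAA" + "C" * 300 + "TTGACT" + "G" * 10
    hits = find_w_boxes(promoter)
    print(hits)
    print(cluster_w_boxes(hits))
```

For the toy promoter, `find_w_boxes` returns [(20, '+'), (56, '-'), (362, '+')]. `cluster_w_boxes` groups only the first two sites, because the third lies more than 100 bp downstream.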


including […], glutamine, proline, serine, threonine and charged amino acids21,22. At least two of the seven potential ‘trans-regulatory’ domains in PcWRKY1 activate transcription in yeast12. WRKY proteins might possess both activator and repressor functions, as shown for the maize VP1 (Ref. 23), but this remains to be tested.

Complexity of the WRKY family in Arabidopsis
The large amount of genomic and cDNA sequence available from Arabidopsis yields insights into the complexity of the WRKY family in a single plant species. In total, 61 distinct ORFs potentially encoding WRKY proteins can be found in the databases to date (Table 1). Two of these proteins have been described before. AtWRKY1 is identical to ZAP1 (Ref. 10), and AtWRKY44 is defined by the ttg2 mutant (C.S. Johnson and D.R. Smyth, pers. commun.). None of the others has been described previously. We encourage the use of the designations in Table 1 in future studies. This will avoid the confusion often caused when multiple names are assigned to a given gene member within large families.

The AtWRKY genes are randomly distributed over the five chromosomes. Preliminary analyses suggest that they might all be present as single copies (I. Somssich and S. Robatzek, unpublished). Many of these putative WRKY proteins are represented by ESTs, showing that the corresponding genes are expressed. By the number and sequence of their WRKY domains, these proteins can be assigned to the three major groups. About two-thirds of the Arabidopsis genome has been sequenced to date, so the total number of WRKY genes in this species might be as high as 100.

A phylogenetic tree of the AtWRKY proteins based on their WRKY domains (Fig. 3) clearly indicates that group II splits up into five distinct subgroups (IIa–e). This refined classification is further substantiated by ten additional structural motifs that are conserved among subsets of AtWRKY family members. Each of these motifs occurs only in certain subgroups, and each subgroup seems to be best defined by combinations of such motifs. In some cases, the sequences of these motifs can reveal clues about their potential functions. Some peptide sequences might serve as nuclear localization signals24. In addition, some of the proteins contain a heptad repeat of bulky hydrophobic residues characteristic of LZs (Ref. 25). The heptad repeat occurs exclusively in members of subgroups IIa and IIb. Recent experiments have shown that the LZ region of AtWRKY6 mediates dimerization (S. Robatzek and I.E. Somssich, unpublished).

An additional common feature of the WRKY genes is an intron within the region encoding the C-terminal WRKY domain of group I members or the single WRKY domain of group II and III members. This intron position is highly conserved, lying after a specific codon N terminal to the zinc-finger-like motif (Fig. 1). Strikingly, in all the genes encoding subgroup IIa and IIb members, the intron lies exactly 16 codons further towards the C terminus.

In spite of the phylogenetic distance of their WRKY domains, members of all three groups have been shown to recognize W box elements. This indicates that W box recognition is a general feature of the entire superfamily.

A few AtWRKY proteins do not fit neatly into any one (sub)group. For example, AtWRKY10 carries only one WRKY domain but appears to be more closely related to group I (Fig. 3). This might be explained by the secondary loss of the N-terminal WRKY domain. Furthermore, based on the pattern of cysteine and histidine residues within their WRKY domains (Fig. 1), AtWRKY38 and AtWRKY52 could either belong to group III or represent members of a novel group (Fig. 3).

Biological roles of WRKY factors
One of the most challenging questions concerns the regulatory processes governed by WRKY proteins. Clues might come partly from gene expression studies. Many WRKY genes are themselves transcriptionally regulated. Their distinct expression patterns might therefore yield hints about the regulatory functions of the encoded factors under particular biological conditions. In addition, a full understanding of the biological roles of these factors will require the identification of the target genes whose expression they affect.

Expression behavior of WRKY genes
Current data point to many WRKY proteins having a regulatory function in the response to pathogen infection and other stresses. Effective plant defense against pathogenic microorganisms is associated with the concerted activation of a large variety of genes, occurring in several temporally distinct waves26. Increased levels of WRKY mRNA, protein and DNA-binding activity have been reported in response to several stimuli:

- infection with viruses19, bacteria (A. Dellagi and P. Birch, pers. commun.) or oomycetes12;
- fungal elicitors9,20 (R.S. Cormack et al., unpublished);
- signaling substances such as salicylic acid18.

In addition, WRKY gene expression has been shown to be upregulated in response to wounding11 (S. Robatzek and I.E. Somssich, unpublished) and upon local mechanical stimulation of plant protoplasts27. Induced WRKY mRNA accumulation is often extremely rapid and transient, and seems not to require de novo synthesis of regulatory factors9,11 (R.S. Cormack et al., unpublished). This immediate–early expression behaviour indicates a role for the WRKY proteins in regulating subsequently activated secondary-response genes, whose products carry out the protective and defensive reactions.

Comparative expression studies with several AtWRKY genes also suggest that certain family members have a role in the regulation of senescence (S. Robatzek and I.E. Somssich, unpublished). Transcript levels of AtWRKY4, 6, 7 and 11 are enhanced in senescent leaves. In transgenic Arabidopsis plants, an AtWRKY6 promoter–GUS reporter gene is strongly activated in senescent leaves as well as in response to infection by pathogenic bacteria. Several genes are known to be highly expressed during both leaf senescence and defense, so we might expect these two physiological processes to share common regulatory mechanisms28.

Inspection of plant databases has revealed more than 500 WRKY ESTs from various tissue sources. These include roots, leaves, inflorescences, abscission zones, seeds and vascular tissue, as well as drought- or salt-stressed or pathogen-infected tissue. Thus, WRKY genes appear to be expressed in numerous cell types and under different physiological conditions, and could therefore participate in the control of a wide variety of biological processes.

Targets of WRKY regulation
WRKY proteins generally prefer to bind W boxes, so genes containing these promoter elements are likely targets of WRKY factors. These include the WRKY genes themselves12 as well as a large variety of defense-related genes of the PR type16,18. Two further responses also appear to involve WRKY–W box interactions: the gibberellic acid-induced expression of the wild-oat α-Amy2/54 gene8, and the activation of the barley HvLox1 gene by jasmonic acid29, a defense and wound signaling molecule. Furthermore, a role has been suggested for SPF1 in the sucrose- or polygalacturonic-acid-induced expression of genes coding for sporamin and β-amylase in sweet potato7. However, as mentioned earlier, uncertainties about the


[Figure 3: unrooted parsimony tree of the AtWRKY proteins with bootstrap values, divided into group I, subgroups IIa–IIe and group III, with schematic structures of typical members of each (sub)group. Conserved motifs shown below the tree: A, [K/R]EPRVAV[Q/K]T[K/V]SEVD[I/V]L; B, NALAGSTR; C, VSSFK[K/R]VISLL; D, LSPSNLLESPxL; E, EGDLxAVVG; 1, KKRKx[K/R]xK[R/K]TV[R/I][V/K]PA; 2, EEPExKRRKxE; 3, KAKKxxQK; LZ, LREELxRVNxENKKLKEMLx2Vx6L; HARF, RTGHARFRR[A/G]P.]

Fig. 3. Phylogenetic analysis of 58 members of the AtWRKY family. Amino acid sequences from the single WRKY domain of group II and III members or the C-terminal WRKY domain of group I members were aligned using PileUp (Wisconsin Package Version 10.0, Genetics Computer Group, Madison, WI, USA). The diagram shows the most parsimonious tree constructed using PAUP 3.1.1 (Smithsonian Institution, Washington, DC, USA). A heuristic search was performed with a pre-aligned reduced data set including only representatives of each AtWRKY (sub)group (indicated by asterisks). Based on the results of additional PAUP 3.1.1 runs with extended data sets, further members of each (sub)group were added to the figure. The tree shown is unrooted and has a consistency index of 0.808. The numbers above the branches are bootstrap values from 100 replicates. AtWRKYs that consistently clustered together are grouped in blue boxes. Members of subgroups IIa and IIe are not lined up at the branches of separate subtrees but nevertheless share significant similarities in their WRKY domains. Higher-order branches on the right-hand side of the tree, representing relationships within each (sub)group, were not highly reproducible and were therefore eliminated. White extensions of branches within the blue boxes indicate that the branch leads only to one distinct AtWRKY. Family members that cannot be unequivocally assigned to a defined (sub)group are highlighted by gray boxes. Conserved primary structural features of the AtWRKY family outside the WRKY domains were identified using MEME (Ref. 38; http://www.sdsc.edu/MEME) and are shown below the tree. Schematic representations of typical members of each (sub)group are shown on the right:

- WRKY domains are indicated by black boxes.
- Motifs 1, 2 and 3 are basic stretches that might be nuclear localization sequences.
- Additional basic motifs not detected by MEME are shown by blue boxes without numbers.
- LZ indicates potential leucine zipper structures that were also predicted by the COILSCAN and COIL (Wisconsin Package Version 10.0)39 programs.
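The consensus sequences of the Fig. 3 motifs use a compact notation, for example RTGHARFRR[A/G]P (HARF) or LREELxRVNxENKKLKEMLx2Vx6L (LZ). A rough way to search other proteins for such a motif is to translate the consensus into a regular expression. The Python sketch below assumes the following notation: [K/R] means K or R, a lowercase x means any residue, and xN means N arbitrary residues. The function names are hypothetical. Exact consensus matching is much cruder than the position-specific models that MEME itself builds, so both misses and spurious hits should be expected.

```python
import re
from typing import List


def motif_to_regex(motif: str) -> str:
    """Translate a consensus such as 'RTGHARFRR[A/G]P' into a regex string.

    Assumed notation: [K/R] = K or R; x = any residue; x<n> = n arbitrary residues.
    """
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "[":
            end = motif.index("]", i)
            choices = motif[i + 1:end].split("/")
            parts.append("[" + "".join(choices) + "]")
            i = end + 1
        elif ch == "x":
            j = i + 1
            while j < len(motif) and motif[j].isdigit():
                j += 1
            count = motif[i + 1:j]
            parts.append(".{%s}" % count if count else ".")
            i = j
        else:
            parts.append(re.escape(ch))
            i += 1
    return "".join(parts)


def find_motif(motif: str, protein: str) -> List[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in regex.finditer(protein.upper())]


if __name__ == "__main__":
    print(motif_to_regex("LREELxRVNxENKKLKEMLx2Vx6L"))
    print(find_motif("RTGHARFRR[A/G]P", "MMRTGHARFRRGPSS"))
```

A regex was chosen here because the consensus strings already encode alternatives and fixed-length gaps, which map directly onto character classes and bounded wildcards.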


exact binding site of SPF1 mean that more work is required to establish its role in vivo. To date, W boxes have been described as positive cis-acting elements that upregulate transcription. However, in the case of the Arabidopsis PR1 gene, W boxes might negatively regulate both basal and salicylic acid-induced expression17. SNI1, a negative regulator of PR gene expression, was recently identified in a genetic screen for second-site suppressors of the Arabidopsis mutation npr1 (Refs 30,31). Interestingly, SNI1 is nuclear localized but contains no obvious DNA-binding domain. SNI1 might therefore act by interacting with WRKY factors bound to the W box31.

A large-scale expression profiling study (J. Dangl and R.A. Dietrich, pers. commun.) further supports the involvement of WRKY factors in regulating part of the defense program. Using a DNA microarray with 10 000 Arabidopsis ESTs, a group of 25 genes, including PR1, was identified whose expression responded coordinately to various pathogens as well as to other defense-inducing conditions. Within the first kilobase of their promoters, the only motif these genes had in common was the W box (TTGAC), present in four copies on average and often clustered. By contrast, the promoters of a control set of genes not coordinately regulated with PR1 contained, on average, fewer than two W boxes.

The only WRKY mutant so far described is transparent testa glabra 2 (ttg2), which is caused by a transposon insertion within AtWRKY44/TTG2 (C.S. Johnson and D.R. Smyth, pers. commun.). In ttg2, the number of trichomes and their branching are reduced. Anthocyanin pigmentation of the seed coat is also reduced, and mucilage is lost. This pleiotropic phenotype resembles that of ttg1, which is defective for a regulatory protein of the WD40-repeat type32. AtWRKY44/TTG2 and TTG1 might therefore act in the same regulatory cascade, controlling a common set of genes. Reverse genetics can be used extensively to obtain additional tagged WRKY mutants, and WRKY promoter–reporter gene and WRKY overexpressor lines can be generated. These approaches will allow us to gain a more comprehensive understanding of the various biological roles of WRKY proteins. Furthermore, inducible expression systems33 could be used for controlled temporal overexpression of WRKY transgenes in their own loss-of-function background. Combined with methods of large-scale gene expression profiling (e.g. differential display, DNA chips), this should facilitate the identification of defined WRKY target genes. In a similar way, the Arabidopsis NAP gene was identified as a target of the APETALA3–PISTILLATA transcription factor dimer34.

Conclusions
WRKY proteins have only recently been identified as a new family of transcription factors. In Arabidopsis, this family appears to be nearly as complex as the well-known MYB family35, but it is restricted to the plant kingdom. This suggests that WRKY genes originated concurrently with the major plant phyla. Current information suggests that WRKY factors play a key role in regulating the pathogen-induced defense program. Because plants are sessile and autotrophic, they are exposed to a wide variety of biotic and abiotic stresses. This could be one major factor in the enormous expansion of the WRKY family during evolution. In addition, the extensive metabolic changes associated with the establishment of defense responses26 or senescence36 might require an elaborate regulatory system.

WRKY proteins also seem to be involved in other plant-specific processes, such as trichome development and the biosynthesis of secondary metabolites. Thus, they appear to participate in controlling the expression of a plethora of genes. As with other large gene families, functional redundancy will complicate genetic attempts to determine the role of individual WRKY proteins. Comparative studies in lower plants (e.g. ferns, mosses and algae) could reveal whether WRKY gene diversification correlates with increasing developmental and metabolic pathway complexity. Furthermore, generating Arabidopsis knock-out lines that affect several members of individual subgroups might help to ‘wrky’ matters out.

Acknowledgements
We thank the following for providing preprints of unpublished data: Hiroshi Sano (NAIST, Japan); David R. Smyth (Monash University, Australia); Zhixiang Chen (University of Idaho, USA); Jeff Dangl (University of North Carolina, USA); Robert Dietrich (Novartis, Research Triangle, USA); and Alia Dellagi and Paul Birch (Scottish Crop Research Institute, UK). We also thank Klaus Hahlbrock for critical reading of the manuscript and continuous support.

References
1 Bevan, M. et al. (1998) Analysis of 1.9 Mb of contiguous sequence from chromosome 4 of Arabidopsis thaliana. Nature 391, 485–488
2 Mewes, H.W. et al. (1997) Overview of the yeast genome. Nature 387, 7–8
3 Clarke, N.D. and Berg, J.M. (1998) Zinc fingers in Caenorhabditis elegans: finding families and probing pathways. Science 282, 2018–2022
4 Laudet, V. et al. (1992) Evolution of the nuclear receptor gene superfamily. EMBO J. 11, 1003–1013
5 Gellon, G. and McGinnis, W. (1998) Shaping animal body plans in development and evolution by modulation of Hox expression patterns. BioEssays 20, 116–125
6 Riechmann, J.L. and Meyerowitz, E.M. (1997) MADS domain proteins in plant development. Biol. Chem. 378, 1079–1101
7 Ishiguro, S. and Nakamura, K. (1994) Characterization of a cDNA encoding a novel DNA-binding protein, SPF1, that recognizes SP8 sequences in the 5′ upstream regions of genes coding for sporamin and β-amylase from sweet potato. Mol. Gen. Genet. 244, 563–571
8 Rushton, P.J. et al. (1995) Members of a new family of DNA-binding proteins bind to a conserved cis-element in the promoters of α-Amy2 genes. Plant Mol. Biol. 29, 691–702
9 Rushton, P.J. et al. (1996) Interaction of elicitor-induced DNA binding proteins with elicitor response elements in the promoters of parsley PR1 genes. EMBO J. 15, 5690–5700
10 de Pater, S. et al. (1996) Characterization of a zinc-dependent transcriptional activator from Arabidopsis. Nucleic Acids Res. 24, 4624–4631
11 Hara, K. et al. (2000) Rapid systemic accumulation of transcripts encoding a tobacco WRKY transcription factor upon wounding. Mol. Gen. Genet. 263, 30–37
12 Eulgem, T. et al. (1999) Early nuclear events in plant defense: rapid gene activation by WRKY transcription factors. EMBO J. 18, 4689–4699
13 Berg, J.M. and Shi, Y. (1996) The galvanization of biology: a growing appreciation for the roles of zinc. Science 271, 1081–1085
14 Mackay, J.P. and Crossley, M. (1998) Zinc fingers are sticking together. Trends Biochem. Sci. 23, 1–4
15 Fukuda, Y. and Shinshi, H. (1994) Characterization of a novel cis-acting element that is responsive to a fungal elicitor in the promoter of a tobacco class I chitinase gene. Plant Mol. Biol. 24, 485–493
16 Rushton, P.J. and Somssich, I.E. (1998) Transcriptional control of plant genes responsive to pathogens. Curr. Opin. Plant Biol. 1, 311–315
17 Lebel, E. et al. (1998) Functional analysis of the regulatory sequences controlling PR-1 gene expression in Arabidopsis. Plant J. 16, 223–233
18 Yang, P. et al. (1999) A pathogen- and salicylic acid-induced WRKY DNA-binding activity recognizes the elicitor response element of tobacco class I chitinase gene promoter. Plant J. 18, 141–149
19 Wang, Z. et al. (1998) An oligo selection procedure for identification of sequence-specific DNA-binding activities associated with plant defense. Plant J. 16, 515–522

May 2000, Vol. 5, No. 5 205 trends in plant science Reviews

20 Fukuda, Y. (1997) Interaction of tobacco nuclear proteins with an elicitor-responsive element in the promoter of a basic class I chitinase gene. Plant Mol. Biol. 34, 81–87
21 Triezenberg, S.J. (1995) Structure and function of transcriptional activation domains. Curr. Opin. Genet. Dev. 5, 190–196
22 Hanna-Rose, W. and Hansen, U. (1996) Active repression mechanisms of eukaryotic transcription repressors. Trends Genet. 12, 229–234
23 Hoecker, U. et al. (1995) Integrated control of seed maturation and germination programs by activator and repressor functions of Viviparous-1 of maize. Genes Dev. 9, 2459–2469
24 Garcia-Bustos, J. et al. (1991) Nuclear protein localization. Biochim. Biophys. Acta 1071, 83–101
25 Landschulz, W.H. et al. (1988) The leucine zipper: a hypothetical structure common to a new class of DNA-binding proteins. Science 240, 1759–1764
26 Somssich, I.E. and Hahlbrock, K. (1998) Pathogen defense in plants – a paradigm of biological complexity. Trends Plant Sci. 3, 86–90
27 Gus-Mayer, S. et al. (1998) Local mechanical stimulation induces components of the pathogen defense response in parsley. Proc. Natl. Acad. Sci. U. S. A. 95, 8398–8403
28 Quirino, B.F. et al. (1999) Diverse range of gene activity during Arabidopsis thaliana leaf senescence includes pathogen-independent induction of defense-related genes. Plant Mol. Biol. 40, 267–278
29 Rouster, J. et al. (1997) Identification of a methyl-jasmonate-responsive region in the promoter of a lipoxygenase-1 gene expressed in barley grain. Plant J. 11, 513–523
30 Cao, H. et al. (1997) The Arabidopsis NPR1 gene that controls systemic acquired resistance encodes a novel protein containing ankyrin repeats. Cell 88, 57–63
31 Li, X. et al. (1999) Identification and cloning of a negative regulator of systemic acquired resistance, SNI1, through a screen for suppressors of npr1-1. Cell 98, 329–339
32 Walker, A.R. et al. (1999) The TRANSPARENT TESTA GLABRA1 locus, which regulates trichome differentiation and anthocyanin biosynthesis in Arabidopsis, encodes a WD40 repeat protein. Plant Cell 11, 1337–1349
33 Gatz, C. and Lenk, I. (1998) Promoters that respond to chemical inducers. Trends Plant Sci. 3, 352–358
34 Sablowski, R.W.M. and Meyerowitz, E.M. (1998) A homolog of NO APICAL MERISTEM is an immediate target of the floral homeotic genes APETALA3/PISTILLATA. Cell 92, 93–103
35 Martin, C. and Paz-Ares, J. (1997) MYB transcription factors in plants. Trends Genet. 13, 67–73
36 Gan, S. and Amasino, R.M. (1997) Making sense of senescence. Plant Physiol. 113, 313–319
37 Altschul, S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402
38 Bailey, T.L. and Elkan, C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology (Altmann, R., ed.), pp. 28–36, AAAI Press
39 Lupas, A. (1996) Coiled coils: new structures and new functions. Trends Biochem. Sci. 21, 375–382

Thomas Eulgem, Paul Rushton, Silke Robatzek and Imre Somssich* are at the Max-Planck-Institut für Züchtungsforschung, Abteilung Biochemie, Carl-von-Linné-Weg 10, D-50829 Köln, Germany; Thomas Eulgem is currently at the Dept of Biology, 108 Coker Hall CB#3280, University of North Carolina, Chapel Hill, NC 27599-3280, USA.
*Author for correspondence (tel +49 221 5062310; fax +49 221 5062313; e-mail [email protected]).

Plant one-carbon metabolism and its engineering
Andrew D. Hanson, Douglas A. Gage and Yair Shachar-Hill

The metabolism of one-carbon (C1) units is vital to plants. It involves unique enzymes and takes place in four subcellular compartments. Plant C1 biochemistry has remained relatively unexplored, partly because of the low abundance or the lability of many of its enzymes and intermediates. Fortunately, DNA sequence databases now make it easier to characterize known C1 enzymes and to discover new ones, to identify pathways that might carry high C1 fluxes, and to use engineering to redirect C1 fluxes and to understand their control better.

One-carbon (C1) metabolism is essential to all organisms. In plants, it supplies the C1 units needed to synthesize proteins, nucleic acids, pantothenate and many methylated molecules1. Fluxes through C1 pathways are particularly high in plants that are rich in methylated compounds such as lignin, alkaloids and betaines because methyl moieties make up several percent of their dry weight2. Transfers of C1 units are also central to the massive photorespiratory fluxes that occur in all C3 plants (Ref. 3). In spite of the fundamental significance of these roles, and the interest in the metabolic engineering of lignin2, betaines4 and photorespiration3, there is much that is not understood about the enzymes, pathways and regulatory mechanisms of plant C1 metabolism. In part this is because of the obstacles that C1 metabolism presents for classical biochemistry and genetics: its enzymes can be of low abundance and/or exist as several isoforms, mutants are lacking, and its key intermediates – C1-substituted folates – are labile and hard to quantify. Fortunately, classical approaches to C1 metabolism can now be complemented by genomics-driven approaches that exploit the fast-growing DNA sequence databases. Accordingly, this review has three aims:
• To illustrate how genomics-based approaches are advancing our knowledge of plant C1 biochemistry.
• To bring together biochemical and genomics-derived data to show which C1 pathways might operate in plants, and where they operate in the cell.
• To examine progress towards engineering C1 metabolism.

Nucleotide sequence information – from genomes, cDNAs and ESTs – can be used to complement biochemical approaches in several ways. Because most enzymes of C1 metabolism are highly

206 May 2000, Vol. 5, No. 5 1360 - 1385/00/$ – see front matter © 2000 Elsevier Science Ltd. All rights reserved. PII: S1360-1385(00)01599-5