Two Distinct Classes of Arabidopsis Leucine Zipper Proteins Compete for the G-Box-Like Element TGACGTGG
Total Page:16
File Type:pdf, Size:1020Kb
The Plant Cell, Vol. 4, 1309-1319, October 1992 O 1992 American Society of Plant Physiologists TGAl and G-Box Binding Factors: Two Distinct Classes of Arabidopsis Leucine Zipper Proteins Compete for the G-Box-Like Element TGACGTGG Ulrike Schindler, Holger Beckmann, and Anthony R. Cashmore’ Department of Biology, Plant Science Institute, University of Pennsylvania, Philadelphia, Pennsylvania 19104-6018 Regulatory elements containing the sequence ACGT are found in several plant promoters and are recognized by various basiclleucine zipper (bZIP) proteins. The Arabidopsis G-box binding factor 1 (GBFl), initially identified by its ability to bind to the palindromic G-box (CCACGTGG), also interacts with the TGACGT motif if this hexamer sequence is followed by either the dinucleotide GG-as found in the Hex motif of the wheat histone 3 promoter-or GT. Here we describe the isolation of an Arabidopsis bZlP protein, denoted TGAl, that also recognizes ACGT-containingsequences. However, TGAl differs from members of the GBF family in the spectrum of base pair permutations flanking the ACGT sequence that are required for DNA binding. TGAl primarily requires a TGACG motif and preferentially binds to those pentamers that are followed by a T residue. We show that although both TGAl and GBFl bind to the Hex motif (TGACGTGG), this binding can be distinguished on the basis of their specific DNA-protein contacts. Furthermore, TGAl also differs from members of the GBF family in that it apparently does not form heterodimers with any member of this family. INTRODUCTION Transcriptional regulation of gene expression is mediated by Interestingly, these two G-box-like elements (TGACGTGG the concerted action of sequence-specific transcription fac- and TGACGTGT) encompass the sequence TGACGTfound in tors that interact with regulatory elements residing in the the cauliflower mosaic virus (CaMV) 35s promoter (the as-1 promoter regions of the corresponding gene. One group of tran- element), the enhancers of the nopaline and octopine synthase scription factors is defined by a basidleucine zipper (bZIP) motif genes (the nos and ocs elements, respectively), and the wheat (Landschulz et al., 1988; McKnight, 1991). This bipartite DNA histone 3 promoter (the hexamer or Hex element) (Bouchez binding structure consists of a region enriched in basic amino et al., 1989; Katagiri et al., 1989; Tabata et al., 1989,1991; Singh acids (basic region) adjacent to a leucine zipper that is charac- et al., 1990). Only the TGACGT motif found in the wheat his- terized by several leucine residues regularly spaced at seven- fone 3 promoter fulfills the binding site requirements for GBFI, amino acid intervals (Vinson et al., 1989). Whereas the basic because in this specific context the TGACGT motif is followed region directly contacts the DNA, the leucine zipper mediates by two G residues (positions +3 and +4) (Schindler et al., homodimerizationand heterodimerizationof protein monomers 1992b). through a parallel interaction of the hydrophobic dimerization Severa1 plant bZlP proteins have been shown to interact with interfaces of two a-helices, resulting in a coiled-coilstructure TGACG-related motifs and/or the G-box (Katagiri et al., 1989; (OShea et al., 1989, 1991; Hu et al., 1990; Rasmussen et al., Tabata et al., 1989, 1991; Guiltinan et al., 1990; Singh et al., 1991). 1990; Oedaet al., 1991; Weisshaar et al., 1991; Schmidt et al., The Arabidopsis bZlP family of G-box binding factors (GBF1, 1992; Ueda et al., 1992). Because all these bZlP proteins bind GBF2, and GBF3) interact with the palindromic G-box motif to DNA motifs carrying the ACGT core sequence, they consti- (CCACGTGG) found in several plant promoters (Schindler et tute a broad class of ACGT binding proteins (Tabata et al., 1991; al., 1992a). For ease of reference, we have numbered the in- Weisshaar et al., 1991; Armstrong et al., 1992). However, pro- dividual base pairs encompassing the G-box from -4 to +4 teins belonging to this group that bind to the as-l site have (numbering from 5’to 3’). We have demonstrated that the DNA DNA binding site requirements distinct from those proteins in- binding specificity of GBF1 is strongly influenced by the na- teracting with the G-box (Tabata et al., 1991). As well as defining ture of the nucleotides (positions -4, -3, +3, and +4) flanking individual classes of bZlP proteins according to their DNA bind- the ACGT core sequence. For example, sequences that carry ing specificity, such proteins may also be classified according a TG at positions -4 and -3 and a T or G residue at position to their heterodimerization characteristics. These criteria have +4 are as efficiently bound by GBFl as the palindromic G-box. been used by Cao et al. (1991) in describing the properties of the mammalian ClEBP family of bZlP proteins. To further explore the question of whether Arabidopsis en- To whom correspondence should be addressed. codes distinct classes of bZlP proteins with overlapping binding 1310 The Plant Cell specificities, we isolated a cDNA encoding a member Of the 1 CCTTCCRAATCGCACTATTGGGGAARTGGCTTCTTCTCTTTTTTCCGGTGTAGGTTTAAGG 59 TTTTAGATTTGAAGGCGGATGGTGAGGAGTTTGTGTTGATGRAATCGTGGGTGATTTGRAGTT bZlP class of proteins that interactedwith the as-1 and the Hex 119 AAATACCTTTAAGTTCTTTTAGCCTTAAGTTCTTTTAGCCTGTCATTACAATATATGTTT 179 motif (as found in the wheat histone 3 promoter), but not with 23 9 299 TTTGGCG4AGARACATCRGTTCACTATTTGCTTTACGTGTTATGCTTGTTTGACTTT the G-box. Both the deduced amino acid sequence and the 359 GTGTGATCTAACTTTTGGTTGAAAACTAATGCTRATGCTCATCTTTTGTTCTTTGGAATGTCT~GA expression pattern are very similar to the tobacco protein 419 TCTGCTTGATTTTGAATGCTTTGTCACATTGTTT~ACAATTTGTTCTTTGTTTTCAGTT 419 GAGGAARRCATGAATTCGACATCGACACATTTTGTGCCACCGAGAAGAGTTGGTATATAC TGAla; consequently, we have designated this Arabidopsis MNSTSTHFVPPRRVGIY 17 protein TGAl. Although TGA1, like GBF1, binds to the Hex se- 539 GAACCTGTCCATCAATTCGGTATGTGGGGGGAGAGTTTCRGCAATATTAGCAATGGG quence, the protein-DNA contacts for thess proteins are quite EPVHQFGMWGESFKSNISNG 37 distinct. Using the random binding site selection assay, we 599 ACTATGMCACACCAAACCACATRATRATACCGAATAATCAG~CTAGACAACAACGTG further demonstrated that, in contrast to GBF1, TGAl did not TMNTPNHIIIPNNQKLDNNV 57 require a specific base pair combination following the TGACG 659 TCAGAGGATACTTCCCATGGRRCAGCAGGAACTCCTCACATGTTCGATCAAGAAGCTTCA SEUTSHGTAGTPHMFDQEAS 77 sequence. Furthermore, TGAl did not productively het- 719 ACGTCTAGACATCCCGATRRGATACARAGACGGCTTGCTCRRARCCGCGAGGCTGCTAGG erodimerize with members of the GBF family. The results T S RHP D K IQIRRLAQNREAAR 97 presented here clearly establish criteria (DNA contacts and 779 AAAAGTCGCTTGCGCAAGAAGGCTTATGTTCAGCAACTGGAAACRAGCAGGTTGAAGCTA dimerization properties) that help discriminate between pro- IK S R L R K K]A Y V O Q@E T S R L KO117 teins belonging to the class of TGACG binding proteins and 839 ATTCAATTAGAGCAAGAACTCGATCGTGCTAGACAACAGGGATTCTATGTAGGAAACGGA those belonging to the GBF family. IQLEQEODRARQQOFYVGNG137 899 ATAGATACTAATTCTCTCGGTTTTTCGGRRACCATGAATCCAGGGATTGCTGCATTTGAA IUTNSLGFSETMNPGIAAFE157 959 ATGG~~ATATGGACATTGGGTTGAAGAACAGRRCAGACAGATATGTGAACTMGRRCAGTT RESULTS MEYGHWVEEQNRQICELRTV177 1019 TTACACGGACACATTAACGATATCGAGCTTCGTTCGCTAGTCGRRARCGCCATGAAACAT LHGHINDIELRSLVENAMKH197 lsolation of an Arabidopsis cDNA Encoding TGAl 1073 TACTTTGAGCTTTTCCGGATGRAATCGTCGTCTGCTGCC~GCCGATGTCTTCTTCGTCATG YFELFRMKSSAAKADVFFVM217 To isolate Arabidopsis cDNAs encoding proteins that interacted 1139 TCAGGGATGTGGAGAACTTCAGCAGAACGATTCTTCTTATGGATTGGCGG~TTTCGACCC SGMWRTSAERFFLWIGGFRP231 with TGACG-related motifs, we screened an Arabidopsis cDNA 1199 TCCGATCTTCTCRAGGTTCTTTTGCCACATTTTGATGTCTTGACGGATCMCRACTTCTA library under low-stringency hybridization conditions using a SDLLKVLLPHFDVLTDQQLL257 DNA fragment derived from the tobacco TGAla sequence 1159 GATGTATGCRATCTAARACAATCGTG~CAGCAGRAGGGTATG (Katagiri et al., 1989). Six positively hybridizing clones were DVCNLKQSCQQAEDALTQGM277 isolated. All of these cDNAs were shown to be derived from 1219 GAGRAGCTGCRACACACCTTGCGGACCGTTGCAGCGGGACAACTCGGTGMGGAAGTTAC the same mRNA species. The DNA and the deduced amino EKLQHTLRTVAAGQLGEGSY2 97 acid sequences of the longest of the six cDNAs are shown 1379 ATTCCTCAGGTGRATTCTGCTATGGATAGATTAGRAGCTTTGGTCAGTTTCGTAAATCAG IPQVNSAMDRLEALVSFVNQ317 in Figure 1. The protein encoded by the Arabidopsis cDNA 1439 GCTGATCACTTGAGACATGAARCATTGCAACAAA?G?A?CGGATATTGACAACGCGACAA is 63% identical to the tobacco protein TGAla, 60% identical ADHLRHETLQQMYRILTTRQ337 to the tobacco protein PG13 (Fromm et al., 1991), and 37% 1499 GCGGCTCGAGGATTATTAGCTCTTGGTGAGTATTTTCAACGGCTTAGAGCCTTGAGCTCA identical to the wheat protein HBP-lb (Tabata et ai., 1991); the AARGLLALGEYFQRLRALSS357 protein differs by three of 30 amino acids from the partia1 se- i559 AGTTGGGCRRCTCGACATCGTGMCCRACGTAGGTTTGAGTTATTTTGTMCAACCAAAT quence recently reported (Kawata et al., 1992) for a bZlP protein SWATRHREPT' 367 from Arabidopsis (Landsberg). 1619 GAAGAAAATGGAARGACCTCAAAAATGAAGAATGAGTGCATCTGRRARCAGAGGACTACT RNA gel blot analysis demonstrated that the mRNA corre- sponding to the Arabidopsis cDNA is elevated in roots and 1799 TTTGTACTTTTTAGCTTTTGAAAGAGGCAAGTTT dark-grown leaf tissue (data not shown). Similar results have Figure 1. DNA Sequence of the cDNA Encoding TGAl and Predicted been observed for the tobacco TGAla mRNA (Katagiri et al., Amino Acid Sequence. 1989). Because of the similarity in the expression patterns and Amino acids representing the basic region are boxed; the regularly the strong