Journal of Experimental Botany, Vol. 66, No. 13 pp. 3775–3789, 2015 doi:10.1093/jxb/erv173 Advance Access publication 23 April 2015 This paper is available online free of all access charges (see http://jxb.oxfordjournals.org/open_access.html for further details)

RESEARCH PAPER Characterization of the cis elements in the proximal promoter regions of the anthocyanin pathway reveals a common regulatory logic that governs pathway regulation

Zhixin Zhu1,2, Hailong Wang1,2, Yiting Wang1,2, Shan Guan1, Fang Wang1,2, Jingyu Tang1,2, Ruijuan Zhang1,2, Lulu Xie1,2 and Yingqing Lu1,2,*
1 State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, 20 Nan Xin Cun, Beijing 100093, China
2 University of Chinese Academy of Sciences, Beijing 100049, China

* To whom correspondence should be addressed. E-mail: [email protected]

Received 19 January 2015; Revised 7 March 2015; Accepted 16 March 2015

Abstract

Cellular activities such as compound synthesis often require the transcriptional activation of an entire pathway; however, the molecular mechanisms underlying pathway activation have rarely been explained. Here, the cis regulatory architecture of the anthocyanin pathway genes targeted by the transcription factor (TF) complex including MYB, bHLH, and WDR was systematically analysed in one species and the findings extended to others. In Ipomoea purpurea, the IpMYB1-IpbHLH2-IpWDR1 (IpMBW) complex was found to be orthologous to the PAP1-GL3-TTG1 (AtPGT) complex of Arabidopsis thaliana, and interacted with a 7-bp MYB-recognizing element (MRE) and a 6-bp bHLH-recognizing element (BRE) at the proximal promoter region of the pathway genes. There was little transcription of the gene in the absence of the MRE or BRE. The cis elements identified experimentally converged on two syntaxes, ANCNNCC for MREs and CACN(A/C/T)(G/T) for BREs, and our bioinformatic analysis showed that these were present within anthocyanin gene promoters in at least 35 species, including both gymnosperms and angiosperms. For the anthocyanin pathway, IpMBW and AtPGT recognized the interspecific promoters of both early and later genes. In A. thaliana, the seed-specific TF complex (TT2, TT8, and TTG1) may regulate all the anthocyanin pathway genes, in addition to the proanthocyanidin-specific BAN. When multiple TF complexes in the anthocyanin pathway were compared, the cis architecture played a role larger than the TF complex in determining the variation in promoter activity. Collectively, a cis logic common to the pathway gene promoters was found, and this logic is essential for the trans factors to regulate the pathway.

Key words: Anthocyanin pathway, Arabidopsis, bHLH, cis element, Ipomoea, MBW complex, MYB, promoter activity, promoter architecture, regulatory module, trans factor, transcription initiation, WDR.

© The Author 2015. Published by Oxford University Press on behalf of the Society for Experimental Biology. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

Anthocyanins are widely synthesized in seed plants to provide colouration, protection under various circumstances, and components for cellular activities. The synthesis of anthocyanins is the end product of genetically well characterized enzymes (Winkel-Shirley, 2001). These enzymes also comprise the backbone of the flavonoid synthesis network, supporting the network to metabolize arrays of secondary compounds in different species (Vogt, 2010). Although the regulation of the anthocyanin pathway genes has been known to be under the control of a transcription factor (TF) complex that consists of MYB, bHLH, and WD-repeat (WDR) (Broun, 2004; Koes et al., 2005; Ramsay and Glover, 2005; Xu et al., 2015), little consensus has been reached about the cis elements recognized by the complex at the pathway level. The MYB-bHLH-WDR (MBW) complex represents a mode of gene regulation not only for the anthocyanin pathway but also proanthocyanidin (PA) synthesis and epidermal cell differentiation (reviewed by Feller et al., 2011). Systematic examination of the cis structures involved in the regulation of pathway genes is important for understanding these biological processes.

The involvement of MYB and bHLH in flavonoid gene regulation was initially found in Zea mays, with mutants of c1 (Paz-Ares et al., 1987) and Lc (Ludwig et al., 1989), respectively; the necessity for both MYB and bHLH in the activation of the anthocyanin genes was soon established in this species (Goff et al., 1990; Bodeau and Walbot, 1992). Still in maize, details followed that the N-terminus of B (a bHLH) directly interacted with the R2R3 domain MYB C1 (Goff et al., 1992), and that the C-terminus of C1 was responsible for transcriptional activation (Sainz et al., 1997). Interestingly, C1 and Lc were both required for restoring the phenotype of an11 mutant in Petunia (Quattrocchio et al., 1993), and AN11 was later found to be a WDR regulating the anthocyanin pathway (deVetten et al., 1997). The participation of WDR in the pathway was further validated with Arabidopsis TTG1 (Walker et al., 1999) and Zea PAC1 (Carey et al., 2004).

In Arabidopsis, characterization of MBWs led to the identification of PAP1-GL3/EGL3-TTG1 transiently expressed in the seedlings for anthocyanin synthesis (Zhang et al., 2003; Gonzalez et al., 2008) and TT2-TT8-TTG1 in developing siliques for PA synthesis (Nesi et al., 2001; Baudry et al., 2004; Lepiniec et al., 2006). At the same time, anthocyanin regulatory genes MYB1, bHLH2, and WDR1 were reported from Ipomoea mutants (Chang et al., 2005; Morita et al., 2006; Park et al., 2007), complementing previously well characterized AN2-AN1-AN11 in the Petunia hybrid Vilmorin (Beld et al., 1989; Quattrocchio et al., 1993; deVetten et al., 1997). Nonetheless, the collaborative action of the Ipomoea TFs requires more evidence. Reported components of the anthocyanin MBW complexes appear to form their own clades, with MYBs from the subgroup 6 (Dubos et al., 2010), bHLHs (subfamily IIIf) (Pires and Dolan, 2010), and recently WDRs (subgroup 19) (Li et al., 2014).

Binding of MYB and bHLH to promoters of anthocyanin structural genes was first reported with maize C1 and B (Roth et al., 1991). Details showed that from –123 to –88 bp of the A1 promoter was critical for C1/B activation (Tuerck and Fromm, 1994); a region within –224 bp of the Bz2 promoter was adequately regulated by R (a B homologue) and C1 (Bodeau and Walbot, 1996). For the MYB part, C1 could bind to variable sites in the maize a1 gene promoter (Sainz et al., 1997), and a 16 bp-long consensus motif was identified from a2 and other C1-binding genes (Lesnick and Chandler, 1998); recently, the consensus was narrowed down to ANCNACC by site mutagenesis tests with anthocyanin MYB1s in coloured Ipomoea petals and magnolia tepels (Wang et al., 2013). For the bHLH part, after reported binding of CG-1 (Staiger et al., 1991) and human c-MYC (Blackwell et al., 1990) to CACGTG, the G-box was recently shown to bind to maize R (Kong et al., 2012), petunia AN1, and Ipomoea bHLH2 (Wang et al., 2013). EMSA tests on anthocyanin gene promoter fragments involving Ipomoea CHS-D, Zea 3GT, and Gerbera DFR led to a tentative consensus of bHLH-recognized elements in the form of CACNN(G/T) (Wang et al., 2013). All these analyses agreed in that the binding of MYB and bHLH occurred at locations within 200 bp upstream of the translation start site. This feature of cis locations was also reported in bean (Loake et al., 1992), grape (Gollop et al., 2002), African daisy (Elomaa et al., 2003), and apple (Espley et al., 2009), indicating that the short functional promoters are common to the anthocyanin genes. The focus of these studies was usually on one or a few promoters. Analysis of TF–promoter interactions at the level of pathway has been limited to several Arabidopsis promoters activated by maize TFs (Hartmann et al., 2005). How conspecific pathway genes interact in the 5′-noncoding region remains to be explored. Obviously, conspecific TF interactions are most relevant to understanding pathway regulation.

In documented MBW complexes, specific residues were identified on maize C1 for its interaction with R (Grotewold et al., 2000), while protein interaction was also observed between Arabidopsis GL3 and TTG1 (Payne et al., 2000); however, little interaction was observed between MYB and WDR (Zhang et al., 2003). Meanwhile, transcription of Arabidopsis TT8 and petunia AN1 require the presence of both MYB and WDR (Baudry et al., 2006; Albert et al., 2014). For transcription of MYB and WDR, however, the necessity for the presence of MBW remains unclear, except in one case of a MYB mutant (Espley et al., 2009). With all that is known about MBW regulation, however, little can be said about the conspecific interactions between the trans and cis components. Although anthocyanin TFs have been found to be conserved in both monocots and dicots (e.g. Quattrocchio et al., 1993; Hartmann et al., 2005), the molecular mechanism for observed trans-specific regulation is unclear. While the lack of identified cis components and the range of a gene’s promoter previously prevented systematic and quantitative analysis of the regulatory mechanism, accumulating data indicate that transcription in metazoa typically occurs within the proximal promoter region (Lenhard et al., 2012), which appears true also for the anthocyanin genes. With a recently reported method (Wang et al., 2013), the cis element(s) may be pinpointed to an anthocyanin promoter through both bioinformatic and experimental approaches, and quantitative evaluation of contributions of TFs and cis elements to gene regulation is now achievable.

In order to elucidate the molecular mechanism(s) of anthocyanin pathway regulation in the proximal regions, we analysed interactions of TFs and cis motifs of conspecific genes at the pathway level in I. purpurea Roth (common morning glory) and made parallel examinations of the homologous genes in Arabidopsis thaliana (L.) Heynh. as a reference.

The I. purpurea genes included those encoding chalcone synthase D (CHS-D), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), flavonoid 3′-hydroxylase (F3′H), dihydroflavonol reductase B (DFR-B), anthocyanidin synthase (ANS), UDP-glucose:flavonoid 3-O-glucosyltransferase (3GT), UDP-glucose:anthocyanidin 3-O-glucoside-2′-O-glucosyltransferase (3GGT), and the trans factors MYB1, bHLH2, and WDR1. Transcriptional activities of MBW complexes were subsequently compared between I. purpurea and Arabidopsis on interspecific and conspecific promoters. A bioinformatic analysis of homologues in multiple species was used to explore the generality of cis patterns (initially identified in I. purpurea). These extensive analyses revealed the features of the MYB- and bHLH-recognizing elements (MREs and BREs, respectively), patterns of MREs and BREs on the proximal promoters, and one cis logic for pathway regulation.

Materials and methods

Plant materials

Seeds of I. purpurea taken from a nationwide collection (Lu et al., 2009) were grown in the botanical garden of IB-CAS, Beijing. These plants provided petal tissue for anthocyanin gene cloning and quantitative analysis of gene expression, and leaf tissue for promoter amplification and genotyping. Seeds of Arabidopsis (tt3 mutant and the Columbia accession) were obtained from the Arabidopsis information resource (TAIR) and those of Petunia hybrida (R27) were a kind gift from Dr Quattrocchio (Free University, The Netherlands). All seeds were germinated in the growth chamber.

qRT-PCRs

RNA samples were extracted in TRIzol (Life Technologies) from petals of I. purpurea (III6-4 and III6-9). Each plant was sampled at 2-h intervals starting 60 h before flower opening. RNAs were reverse transcribed and quantifications of transcript copies followed the protocol reported by Lu et al. (2012). About 1–10 ng cDNA was used in each qPCR reaction, and the running conditions were optimized for efficiency with each gene taken (Supplementary Table S1).

Gene cloning and sequencing

Standard PCRs were applied to leaf genomic DNAs to amplify five gene promoters (IpCHS-D, IpF3′H, IpDFR-B, IpbHLH2, and IpWDR1) based on primers (Supplementary Table S1) whose designs followed the available sequences of conspecific or closely related targets (Supplementary Table S2). The promoter sequence of IpF3H was obtained by inverse PCR (Ochman et al., 1988). Promoter sequences of IpMYB1, IpCHI, IpANS, Ip3GT, and Ip3GGT were isolated by tail-PCRs (Liu and Whittier, 1995). Homologue promoters from Arabidopsis were PCR-amplified via primers targeting the Columbia accession (Supplementary Table S1). Ipomoea TFs were reverse transcribed from RNAs extracted from fresh petals of I. purpurea. Arabidopsis TFs were similarly amplified from seedlings of the Columbia accession with the appropriate primers (Supplementary Table S1).

Dual luciferase assays

The promoters were ligated into vectors to prepare for dual luciferase assays as described by Wang et al. (2013). The complete coding regions of TFs were integrated into pJIT163 (Guerineau et al., 1992) in frame with appropriate restriction sites (Supplementary Table S1) to replace its reporter gene, resulting in effector vectors. The reporter vectors were adapted from pJIT163-eGFP with the CaMV 35S promoters replaced by the proximal promoters and GFP replaced by the firefly luciferase (LUC) gene. The reference vector was also based on pJIT163-eGFP, with its GFP replaced by the renilla luciferase (RUC) gene (Wang et al., 2013). The vectors were introduced into leaf cells of the Ipomoea nil wdr1 mutant (seeds available on request) by particle bombardment (Bio-Rad PDS-1000/He equipment). A construct mixture (0.5 μg effector, 0.5 μg reporter, and 0.1 μg reference for each shot) was prepared for the plasmid cocktail, which was coated with 50 mg ml–1 microparticles (Bio-Rad) at 2.0 μl per 1.0 μg plasmid DNA. The complex was then mixed with 2.5 M CaCl2 and 0.1 M spermidine in a ratio of 1:5:2. Other details followed Wang et al. (2013).

Yeast two-hybrid assays

The coding sequences of IpMYB1, IpbHLH2, and IpWDR1 were integrated in frame into vectors pAD-GAL4 2.1 and pBD-GAL4 Cam of HybriZAP-2.1 (Stratagene), respectively, according to the manufacturer’s instructions. Constructs were introduced in pairs into the YRG-2 yeast strain (Stratagene). The transformants were plated onto SD media minus histidine, leucine, and tryptophan for identifying positive interactions. The strengths of TF interactions were then measured by β-galactosidase activities according to the protocol of the Matchmaker system (Clontech).

Electrophoretic mobility shift assays

A complete coding sequence of MYB was integrated in frame into pMAL-c2G (New England Biolabs) to express a tagged protein consisting of maltose-binding protein (MBP) and MYB. A partial coding region of bHLH was similarly expressed. The vector-transformed Escherichia coli strain BL21 (DE3) was incubated at 37°C in LB solution and treated as described (Wang et al., 2013). Protein quantification followed the Bradford method using the bovine gamma globulin (Bio-Rad) as the standard. Probes were quantified by the Picogreen quantification module of Rotor Gene 3000 (Corbett Research) with λDNA (Life Technologies) as the reference. Details for binding reactions and gel documentation followed Wang et al. (2013).

Transgenic experiments

Homozygous Arabidopsis tt3 mutants were grown in the growth chamber for transgenic tests of the IpDFR-B promoter series. The coding regions of IpDFR-B were first subcloned into the SN1301 binary vector for the construction of SN1301-35S::DFR-Bcoding. Varying promoter regions (IpDFRB-1559, IpDFRB-191, and IpDFRB-166) independently replaced the 35S promoter to generate series of SN1301-proDFR::DFR-Bcoding vectors. These vectors were singly introduced into Agrobacterium strain GV3101 via the standard freeze-thaw protocol. Transformation of Arabidopsis followed a modified flower dip protocol (Logemann et al., 2006), and the selection of transformants was on MS medium containing 50 mg l–1 hygromycin and 50 mg l–1 carbenicillin. DNAs were subsequently extracted from these plants and scored for the presence of the transgene by PCRs and DNA sequencing.

Phenotypic complementation was observed in the T2 generation. Seeds were sterilized and cultivated on 1/2MS medium containing 5% sucrose, kept at 4°C in darkness for 2 days, and transferred to constant light conditions (100 μmol m–2 s–1, 24°C) for observations. For 4–5-week old seedlings, treatment by dehydration (up to a week) and a low temperature (10–16°C) also induced anthocyanin accumulation. Anthocyanins extracted from representative seedlings were further examined by high performance liquid chromatography using cyanidin-3-O-glucoside (Extrasynthese) as the standard.

Bioinformatic analysis

Multiple blastp searches in the NCBI database (ftp://ftp.ncbi.nih.gov/genbank/, up to 10 October 2012) identified protein homologues for each of the anthocyanin genes. The results were trimmed for redundancy following the protocol reported by Wang et al. (2013). For each of the promoters, a sliding window (6-bp width) was initially adapted to search for four or more bases matching CACNNG; if this condition was met, the second sliding window of 7 bp was taken to examine whether or not there were four or more bases matching ANCNNCC in the next 100 bases. When both searches returned hits, the non-coding sequences were recorded and the candidate cis elements were uppercased by WxW_Align.gene, and the frequency of each case was counted by WxW_Align.PosD (Supplementary File 1). We ended up with 159 sequences satisfying the search criteria, which provided the basis for mapping the natural distribution of known cis elements on anthocyanin gene promoters. The sequences included in the peak of the distribution were further characterized by two motifs identified via MEME software following the parameters of Wang et al. (2013).

Statistical analysis

Standard t-tests were performed on comparisons between averages of experiments. Two-way ANOVAs were carried out after log transformation of the promoter activity. Pearson correlation coefficients were calculated with the significance levels corrected for multiple comparisons by the Dunn-Šidák method (taking the experiment-wise error rate α = 0.05). All tests were computed using SAS (ver. 9.0).

Results

IpMYB1, IpbHLH2, and IpWDR1 collaboratively regulate the anthocyanin pathway genes

Although the regulatory genes IpMYB1, IpbHLH2, and IpWDR1 are expressed in the corolla of I. purpurea (Chang et al., 2005; Morita et al., 2006; Park et al., 2007; Guan and Lu, 2013), whether or not they regulate the anthocyanin pathway as a complex remains to be confirmed. If they do, the expression of the anthocyanin pathway genes is expected to be correlated. We quantified the pair-wise Pearson correlation coefficients among transcript levels of ten anthocyanin genes via reversely transcribed quantitative PCRs (RT-qPCRs) in developing petals of two wild-type I. purpurea individuals (Supplementary Table S3). The transcript levels of the enzyme-coding (structural) genes (IpCHS-D, IpCHI, IpF3H, IpDFR-B, and IpANS) were significantly correlated with those of regulatory genes (IpMYB1, IpbHLH2, and IpWDR1) and among themselves (except between IpDFR-B and Ip3GT). The expression of IpF3′H was highly correlated with that of IpbHLH2 and IpWDR1, but not with that of IpMYB1. The quantitative data largely agreed with the perception that IpMYB1, IpbHLH2, and IpWDR1 regulated pathway transcription collaboratively.

The features of the TFs were examined in two further experiments. One showed the subcellular localization of the three TFs. The vectors with fluorescent-tagged TFs were introduced into the onion epidermis (Supplementary Figure S1A), and the in vivo images indicated that IpbHLH2 and IpWDR1 were localized in both the nucleus and the cytoplasm, whereas IpMYB1 was mostly in the nucleus. This pattern agrees with the documented subcellular sites of their homologues (Ness et al., 1989; Matus et al., 2010). The other experiment described protein interactions via yeast two-hybrids. Quantifications of β-galactosidase activities suggested significant interactions between IpMYB1 and IpbHLH2 as well as between IpbHLH2 and IpWDR1 (Supplementary Figure S1B), as expected.

The TF validations paved the way for identification of the conspecific structural genes of the pathway. For I. purpurea, some pathway genes (IpCHS-D, IpF3′H, and IpDFR-B) were characterized through mutants or mutant complementation (Habu et al., 1998; Hoshino et al., 2003; Zufall and Rausher, 2004), while others, including IpCHI, IpF3H, IpANS, and Ip3GT, were largely identified via gene expression and homology to genes identified in other model species (Tiffin et al., 1998; Durbin et al., 2000; Clegg and Durbin, 2003; Lu et al., 2009). Assuming that they were the bona fide genes of the anthocyanin pathway, we collected their 5′-noncoding sequences upstream of the translation start sites (Supplementary Table S2); similar information was also gathered for IpMYB1, IpbHLH2, and IpWDR1 to evaluate the TF regulatory capacity on the expression of each. By single and combinatory tests of the TFs as effectors in dual luciferase assays (Fig. 1A), we detected how IpMYB1, IpbHLH2, and IpWDR1 (IpMBW) regulated the reporter gene via the 5′-noncoding regions of the pathway genes. We found that seven (IpCHS-D, IpCHI, IpF3H, IpF3′H, IpDFR-B, IpANS, and IpbHLH2) of the ten promoter constructs could be activated strongly by IpMBW; however, the constructs hosting the IpWDR1 5′-noncoding regions responded only at low levels to IpMBW, and little promoter activity was found on those hosting various lengths of Ip3GT and IpMYB1 promoter sequences (Supplementary Figure S1C). The tests also showed that IpMYB1 could, alone, initiate a low level of transcription on the seven promoters (particularly F3H). By contrast, IpbHLH2 or IpWDR1 alone failed to activate observable promoter activities in the same setting. Only the collective presence of all three TF effectors generated considerable promoter activities (Fig. 1B), suggesting that the TFs acted as an IpMBW complex in transcriptional activation of the main anthocyanin pathway genes in I. purpurea. The activation of IpMBW on the promoter of IpbHLH2 resembles that of TT8 in Arabidopsis (Baudry et al., 2006).

Two syntaxes emerge from identification of cis elements in pathways

The central aim of our study was to understand how the TFs interacted with the cis elements at the target gene promoters. Our identification of cis elements was extended from IpCHS-Dpro (Wang et al., 2013) to all other gene promoters including IpCHIpro, IpF3Hpro, IpF3′Hpro, IpDFR-Bpro, IpANSpro, Ip3GGTpro, IpWDR1pro, and IpbHLH2pro. Both sequential deletion tests (Supplementary Figure S1C) on the cloned 5′-noncoding regions of the anthocyanin genes and site-by-site tests on the predicted motifs (concentrated on the proximal promoter regions) were adapted. In the initial case of IpDFR-B, three experimental approaches were taken for site-specific analysis, including dual luciferase assays performed

Fig. 1. Collaborative regulation of IpMYB1, IpbHLH2, and IpWDR1 on the conspecific genes of the anthocyanin pathway of I. purpurea. (A) Construct details for dual luciferase assays. Effectors and reporters were based on the same plasmid pJIT163 (black line). The promoter region of each construct is shown by an arrowed rectangle, while the coding region is shown by the plain rectangle. The firefly luciferase (LUC) was taken as the reporter gene while renilla luciferase (RUC) was the reference for internal control. (B) Results of transient expression assays expressed as the ratios of LUC and RUC for trials on different combinations of TFs. The 5′-noncoding regions are shown using bars for the ten genes tested. The negative controls were the reporters only. Effectors were added as shown on the x-axis for each reporter, with the error bars based on four replicates. This figure is available in colour at JXB online.

on leaves of the wdr1 mutant of I. nil, gene transformations of the tt3 mutant of Arabidopsis, and electrophoretic mobility shift assays (EMSAs) with IpMYB1 and IpbHLH2. Starting by mutating predicted motifs (Fig. 2A), we examined the effects of the mutations in both dual luciferase assays (Fig. 2B) and transgenic plants (Fig. 2C). The presence of multiple MREs was evident for IpDFR-B-191pro as mutations at one of the MREs did not much affect the transcriptional intensity, and only mutations at multiple motifs led to a significant reduction in promoter activity (Fig. 2B). The initially suspected effect of the ‘GC interval’ overlapped with that of pDFR-BMRE2, and the two were better explained by the modified pDFR-BMRE2* (Fig. 2D). The function of pDFR-BMRE2* was further supported by the EMSA assay with the probe (DFRBpro) hosting a suspected motif (GGGGGTT or GGGGGGT, a reverse complement of MRE) as mutations at the site (DFRB-M4) demolished the binding capacity of IpMYB1 (Fig. 3A). In the transgenic results, promoter activity as little as 2% could still result in anthocyanin accumulation in the seedlings of Arabidopsis (e.g. pDFR-B-166MRE3; Fig. 2C); the promoter construct IpDFRB-1559pro::IpDFRB-coding caused a higher anthocyanin accumulation than that of 35S::DFRB-coding in the tt3 background (Fig. 2C), suggesting that the 35S promoter did not lead to excessive gene expression in vivo. This was relevant to the dual luciferase assay where the effectors were driven by 35S promoters.
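The statement that GGGGGTT and GGGGGGT are reverse complements of an MRE can be checked mechanically against the two syntaxes reported in this paper. The following is a minimal sketch, not the authors' software: the function names are hypothetical, and it assumes that N stands for any of A, C, G, or T and that a site counts as a match if it fits the syntax on either strand.

```python
import re
from typing import Optional

# Syntaxes as reported in the text; N = any base, parentheses list allowed bases.
MRE_SYNTAX = "ANCNNCC"
BRE_SYNTAX = "CACN(A/C/T)(G/T)"


def syntax_to_regex(syntax: str) -> str:
    """Translate a syntax such as 'CACN(A/C/T)(G/T)' into a regex string."""
    parts = []
    i = 0
    while i < len(syntax):
        ch = syntax[i]
        if ch == "(":
            # A parenthesised group like (A/C/T) becomes the class [ACT].
            close = syntax.index(")", i)
            parts.append("[" + syntax[i + 1:close].replace("/", "") + "]")
            i = close + 1
        else:
            parts.append("[ACGT]" if ch == "N" else ch)
            i += 1
    return "".join(parts)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def matching_strand(site: str, syntax: str) -> Optional[str]:
    """Return '+' if the site fits the syntax as written, '-' if only its
    reverse complement fits, and None if neither strand fits."""
    pattern = re.compile(syntax_to_regex(syntax))
    site = site.upper()
    if pattern.fullmatch(site):
        return "+"
    if pattern.fullmatch(reverse_complement(site)):
        return "-"
    return None


if __name__ == "__main__":
    for site in ("GGGGGTT", "GGGGGGT"):
        print(site, reverse_complement(site), matching_strand(site, MRE_SYNTAX))
    for site in ("CACGTG", "CACGTT"):
        print(site, matching_strand(site, BRE_SYNTAX))
```

Under these assumptions both DFR-B motifs fit ANCNNCC only on the reverse strand (as AACCCCC and ACCCCCC), while the canonical G-box CACGTG and the CACGTT variant fit the BRE syntax as written.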

Fig. 2. Analysis of cis elements on IpDFR-Bpro. (A) Sequence features of pDFR-B-191, pDFR-B-166, and pDFR-B-136. Predicted cis motifs on pDFR-B-191 are in bold and underlined, and the mutated sites shown in lower case. The nucleotides that are shaded and in bold mark the beginnings of pDFR-B-191, pDFR-B-166, and pDFR-B-136. (B) Results of dual luciferase assays. Effectors were IpMYB1, IpbHLH2, and IpWDR1 (0.5 µg each). The black bar for pDFRB-191 was set as 100%. Mutation types (BRE in grey) are checked boxes. The error bars indicate the standard errors based on four independent trials. The tests highlighted by triangles were further examined by the following transgenic trials. (C) Phenotypes of wild type, dfr mutant (tt3), and transgenic seedlings of Arabidopsis. The upper panel shows 2–3 day old seedlings. Some mature leaves were induced by 10–18°C and dehydration. The seeds of each line are shown in the lower panel. The schematic boxes follow the same notation as in (B). (D) A summary of cis motifs on IpDFR-B-191pro. The cis elements BRE and MRE3 were confirmed on IpDFR-B-166pro in the mutation tests of transient expression shown in (B) and the additional transgenic in vivo test of BRE shown in (C). MRE1 and MRE3 are identical in sequence and weak in effect as seen in (B). Effects of the previous MRE2 and GC interval in (B) were similar whether alone or in combination, and are thus best explained by a unified MRE2*, which was also supported in the subsequent EMSA test with the IpDFRB-m4 probe. The underlined motifs are possible additional targets of the TFs in vivo. This figure is available in colour at JXB online.
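The two-window promoter scan described under Bioinformatic analysis (a 6-bp window scored against CACNNG, followed by a 7-bp window scored against ANCNNCC within the next 100 bases) can be sketched as follows. This is an illustrative reading, not the WxW_Align code from Supplementary File 1. It assumes that "four or more bases matching" means the four specified (non-N) positions of each query, that the gap is counted from the end of the BRE window to the start of the MRE window (1 to 100 bases), and that only the given strand is scanned; all function names are hypothetical.

```python
from typing import List, Tuple

BRE_QUERY = "CACNNG"   # 6-bp window
MRE_QUERY = "ANCNNCC"  # 7-bp window


def fixed_matches(window: str, query: str) -> int:
    """Count positions where a specified (non-N) query base equals the window base."""
    return sum(1 for w, q in zip(window, query) if q != "N" and w == q)


def window_hits(seq: str, query: str, min_fixed: int = 4) -> List[int]:
    """Return 0-based start positions of windows with at least min_fixed matches."""
    k = len(query)
    return [
        i
        for i in range(len(seq) - k + 1)
        if fixed_matches(seq[i:i + k], query) >= min_fixed
    ]


def bre_mre_pairs(
    seq: str, max_gap: int = 100, min_fixed: int = 4
) -> List[Tuple[int, int, int]]:
    """Return (BRE start, MRE start, gap) for each BRE-like window that is
    followed by an MRE-like window within 1..max_gap bases (assumed reading)."""
    seq = seq.upper()
    mre_starts = window_hits(seq, MRE_QUERY, min_fixed)
    pairs = []
    for bre_start in window_hits(seq, BRE_QUERY, min_fixed):
        bre_end = bre_start + len(BRE_QUERY)
        for mre_start in mre_starts:
            gap = mre_start - bre_end
            if 1 <= gap <= max_gap:
                pairs.append((bre_start, mre_start, gap))
    return pairs


if __name__ == "__main__":
    # Toy sequence (not from the paper): a G-box, six spacer bases, an MRE-like site.
    toy = "TTTCACGTGAAAAAAACCAGCCTT"
    print(bre_mre_pairs(toy))
```

Tabulating the third element of each tuple over many promoters would give a distance distribution of the kind summarized in the text, although the exact distance convention used by the authors is not stated here.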

Though effective in validating cis elements, the time commitment for the transgenic tests and common occurrence of multiple cis elements prevented the transgenic approach being used on a large scale. We identified cis elements for other anthocyanin genes mainly through mutation tests by means of EMSAs and transient expressions. Locating and testing cis elements on those genes were guided partially by the previous results on MREs and BREs (Wang et al., 2013) and partially by our ongoing accumulating data. The EMSA results were presented in both DNA- and protein-staining to reduce false

Fig. 3. Results of EMSAs with MYBs and bHLHs. (A) I. purpurea probes based on the 5′-noncoding regions of the anthocyanin genes and their binding results with IpMYB1 in EMSAs. The probes harboured candidate cis motifs (underlined) and their mutations (in bold). The gels are shown in two parts, with the upper panel for the DNA binding and the lower panel for protein binding of the same gel photographed under a different light filter (Wang et al., 2013). The lane numbers correspond to those of probes used in the binding reactions; the probe lane contains probe 1 (20 pmol) only; the protein lane has MBP-IpMYB1 (2 µg) only. (B) Results of EMSAs with probes based on the 5′-noncoding region of IpWDR1. Three likely cis motifs (underlined) were tested with IpMYB1; only the second probe showed positive binding as indicated by the arrow. The arrangement shown follows that described in (A). (C) Arabidopsis probes based on AtCHIpro and their binding results with AtPAP1. The probes and EMSA results are shown as in (A). (D) EMSAs of bHLHs of I. purpurea (IpbHLH2) and Petunia (PhAN1) with probes based on the 5′-noncoding regions of Arabidopsis DFR and CHI genes. The experimental conditions followed (A), and the mutations to the G-box are indicated in bold.

positives. Effective cis-containing probes that showed positive binding to MBP-tagged IpMYB1 were identified for all tested anthocyanin genes including IpCHS-D, IpCHI, IpF3H, IpF3′H, IpANS, Ip3GT, IpMYB1, IpbHLH2, and IpWDR1 (Fig. 3A, B). To know whether or not TF orthologues could recognize similar cis elements, we expressed MBP-tagged PAP1 of Arabidopsis to detect its binding capacity to the predicted cis elements of AtCHIpro. The results suggested that the MYB orthologues recognized the cis elements of similar structure (Fig. 3C). Likewise, IpbHLH2 shared the same recognition pattern with PhAN1 for the G-box present at the promoters of AtDFR and AtCHI (Fig. 3D), as with the known binding capacity of AtGL3 to the G-box (Shangguan et al., 2008). Further EMSA tests with IpbHLH2 indicated that its BREs could change from canonical CACGTG to CACGTT (Supplementary Figure S2).

Complementing the EMSAs but in a cellular environment, dual luciferase assays showed that mutations correctly targeting MREs and BREs could reduce promoter activities (Supplementary Figure S3). Except in one case (IpbHLH2pro), where accidental mutations to pbHLHMRE2 caused an exceedingly high promoter activity, all other introduced mutations were mostly on targets, judging from their reduced promoter activities. To make sure that the TFs were not overexpressed to bias the detection of effective motifs, we compared 10-fold levels of particle bombardment in the assays for two genes including IpCHSDpro and IpF3′Hpro (Supplementary Figure S4A, B). Although lowered TF dosages led to reduced promoter activities (Supplementary Figure S4C), the patterns of the relative effects of mutated motifs were largely unchanged (Supplementary Figure S4D). To further make sure that the results of mutation tests were not biased by using leaves of I. purpurea, we compared the same promoters and effectors in the leaves of cultivated rice seedlings in transient assays. The results showed essentially the same patterns for the mutation effects (Supplementary Figure S4E, F) despite change in promoter strength. These experiments effectively validated applications of transient assays to assessing mutation effect of cis motifs.

Collectively, transient and EMSA results led to identification of the major cis elements over the anthocyanin pathway (Fig. 4), including combined identification of MRE and BRE on the Ip3GGT promoter. These valid motifs indicated that the orientation of the cis elements could change, as could the numbers of MREs and BREs. The presence of both MRE and BRE is essential for the promoters to respond to IpMBW. In addition, variants of some of the cis elements identified were also functional, as tested on IpCHSD-337pro (Supplementary Figure S5). Putting all the information on cis motifs together, we found they converged on ANCNNCC for MREs and CACN(A/T/C)(G/T) for BREs (Table 1). The two syntaxes representing nine anthocyanin pathway genes indicate a cis logic shared by the pathway. As the two syntaxes all fall in the proximal promoter regions typically within 350 bp from the translation start site (Supplementary Figure S1C), the pathway genes appear to have a tight motif organization in the proximal region of a promoter in common.

Generality and layout pattern of cis elements on the anthocyanin gene promoters

We searched for the syntaxes in multiple species to test for their generality. Searches using the blastp command led to the identification of available coding sequences of anthocyanin genes in the NCBI database after taking those of Ipomoea genes as templates (Fig. 5A). A collection of 299 anthocyanin gene promoter sequences was retrieved. A fixed cis layout pattern (CACNNG(N1–100)ANCNNCC) was taken as the search criterion for these sequences, as the pathway-based identification of cis motifs suggested a common layout of BRE and MRE (5′→3′) on the anthocyanin genes (Fig. 4). From the collected promoter sequences, 159 promoters of 35 species (53%) were found exhibiting the pair of cis elements within 100 bp of each other. The species included Ginkgo biloba, monocots and dicots (Supplementary Table S4). Interestingly, the distance between the cis elements showed a peak at 6-bp apart (Fig. 5B). The sequences giving rise to the peak showed two significant motifs agreeing with the syntaxes identified in Ipomoea and Arabidopsis (Fig. 5C). As IpCHSDpro hosts the BRE and MRE at this preferred distance of 6-bp apart (Fig. 4A), the biological significance of the distance can be evaluated in transient expression assays. By artificially increasing the between-cis distance, we found that the distance affected the promoter activity negatively. Lengthening the distance (up to 60 bp) significantly weakened the promoter strength; nevertheless, about 12% of the promoter activity still remained even after the distance was increased further to 80-bp apart (Fig. 5D).

The regulatory specificities of MBW complexes

Since both trans and cis components are vital for the regulatory module formed at the 5′-noncoding region, we evaluated the specificities of the trans factors in governing the transcription of flavonoid genes. This analysis included proximal promoters of ten flavonoid genes (CHS, CHI, F3H, F3′H, DFR, ANS, 3GT, AtBAN, Ip3GGT, and IpbHLH2) and four sets of MBW complex from two species (Arabidopsis and I. purpurea). Three TF complexes [AtTT2-AtTT8-AtTTG1 (AtTTT), AtPAP1-AtGL3-AtTTG1 (AtPGT), and AtPAP1-AtEGL3-AtTTG1 (AtPET)] were from Arabidopsis, and one (IpMBW) from I. purpurea. We found that AtPGT, AtPET, and IpMBW could positively regulate all tested genes except AtF3′H, AtBAN and Ip3GT. In comparison, AtTTT also activated the same set of genes (except IpbHLH2), in addition to AtBAN (Fig. 6), congruent with its role in PA synthesis; the activation intensity was lower than AtPGT and AtPET for the same constructs. While all TF complexes were capable of regulating CHS, CHI, and F3H, none could activate AtF3′H or Ip3GT. A close search of the 876-bp long 5′-noncoding region of AtF3′H indicated that MREs and BREs were both absent within the region about 500-bp upstream of the translation start site (Supplementary Figure S6), despite typical BRE and MRE sequences being located at –659 and –584 from the translation start site, respectively. Both Ip3GT and IpMYB1 were found lacking BREs in their proximal promoters (Supplementary Figure S6), which

Fig. 4. Proximal promoter architectures of the I. purpurea anthocyanin pathway genes. (A–I) Test results of the 5′-noncoding regions of the anthocyanin genes from IpCHS-D to IpWDR1. Each gene is displayed by its dual luciferase results on the left and features of the proximal promoter sequence on the right (see Fig. 3 and Supplementary Figure S3 for mutation details). The proximal regions are marked at the sites of experimentally confirmed BREs (in blue and bold) and MREs (in red and bold), along with the expected TATA box (in italic and bold). The mutated MREs and BREs shown on the x-axis on the left correspond to those confirmed cis elements on the right. The orientations of the cis elements are cartooned by the inset figure with arrows representing the cis motifs for each gene. The arrows are coloured in accordance with those of MREs and BREs; those above the line and pointing to the translation start site are oriented following the syntax, and the ones below the line are in the complementary layout. The pink arrows are for the weak MREs. The angled arrow shows the start site for the transcription of the gene. The box indicates the coding region. The numbers above the sequences show the relative positions from the translation start site, and those after the gene names the lengths of the regions.
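The orientation convention described in this caption (motifs that follow the syntax versus motifs in the complementary layout) can be illustrated with a short strand-aware scan. The sketch below is only an illustration under stated assumptions: it translates the two syntaxes reported in this paper, ANCNNCC for MREs and CACN(A/C/T)(G/T) for BREs, into regular expressions and reports candidate sites on both strands. The function names and the toy sequence are hypothetical, and a sequence match does not imply binding, which the paper establishes experimentally.

```python
import re

# IUPAC nucleotide codes -> regex character classes (only the codes used here).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "H": "[ACT]", "K": "[GT]"}

# Syntaxes from the paper, written with IUPAC codes:
# MRE = ANCNNCC, BRE = CACN(A/C/T)(G/T) = CACNHK.
SYNTAX = {"MRE": "ANCNNCC", "BRE": "CACNHK"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def to_regex(syntax):
    """Compile a syntax into a lookahead regex so overlapping sites are all found."""
    return re.compile("(?=(" + "".join(IUPAC[c] for c in syntax) + "))")


def scan(seq):
    """Return sorted (start, element, strand, site) tuples.

    start is 0-based on the given (forward) strand; strand "-" means the site
    reads as the syntax on the reverse complement (hypothetical convention).
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = []
    for name, syntax in SYNTAX.items():
        pattern = to_regex(syntax)
        for m in pattern.finditer(seq):
            hits.append((m.start(), name, "+", m.group(1)))
        for m in pattern.finditer(rc):
            # Convert the start on the reverse strand to a forward coordinate.
            start = len(seq) - m.start() - len(syntax)
            hits.append((start, name, "-", m.group(1)))
    return sorted(hits)


# Toy sequence (not from the paper): a forward BRE and a complementary-layout MRE.
toy = "TTCACGTTAAAAGGTAGCTTT"
for hit in scan(toy):
    print(hit)
```

Note that a palindromic site such as CACGTG matches the BRE syntax on both strands, so it would be reported twice by this sketch; whether to merge such hits is a design choice left open here.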

Table 1. Experimentally confirmed cis elements on anthocyanin gene promoters

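Method | Species | Gene | BRE | BRE location | MRE | MRE location
Luciferase assay and EMSA | I. purpurea | CHS-D | CACGTG | 205–200 | AGCTACC; ACCCACC | 193–187; 321–315
Luciferase assay and EMSA | I. purpurea | CHI | CACGAG | 143–148 | ACCTGCC; AACAGCC | 167–161; 153–159
Luciferase assay and EMSA | I. purpurea | F3H | CACGTT | 135–130 | AACTACC; ACCAAAC (weak); ACCAACC | 160–166; 97–91; 87–81
Luciferase assay and EMSA | I. purpurea | F3′H | CACCCG* | 106–101 | ACCCACC; ACCTACC | 86–80; 70–64
Luciferase assay and EMSA | I. purpurea | DFR-B | CACGTG | 152–147 | ACCAAAC (weak); ACCCCCC, ATCAACC; ACCAAAC (weak) | 187–181; 170–180; 133–127
Luciferase assay and EMSA | I. purpurea | ANS | CACGTG | 155–150 | AACCACC; ATCCGCC | 265–271; 117–123
Luciferase assay and EMSA | I. purpurea | 3GGT | CACTTG | 275–280, 177–182 | ACCAACC | 212–218
Luciferase assay and EMSA | I. purpurea | bHLH2 | CACGTG | 321–316 | ACCAACC | 226–220
Luciferase assay and EMSA | I. purpurea | WDR1 | | | ACCATCC | 257–263
EMSA | I. purpurea | 3GT | | | ACCTAAC | 105–99
EMSA | Z. mays | Bz (3GT) | | | ACCTGCC | Wang et al., 2013
EMSA | Z. mays | A1 (DFR) | | | ACCTACC |
EMSA | Malus domestica | MYB10 | | | AGCTACC |
EMSA | Gerbera hybrida | DFR2 | | | AACTTAC |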

Luciferase assay I. purpurea CHSD-337pro cacgtt agcaacc mutation cacgcg agcgacc mutation Consensus CACN(A/C/T)(G/T) ANCNNCC (ANCNNAC weak)

The numbers for location are relative to the translation start site of the gene. Sequences underlined are ones in the reverse or/and complementary form on the promoters. Those in lower case were tested on mutated IpCHSD-337pro. *, Little binding to IpbHLH2 in EMSA.

explained their low promoter activities in the presence of IpMBW (Fig. 1B). A common pattern in the TF complexes such as IpMBW, AtPGT, and AtPET was that they all demonstrated strong power of activation towards the later enzyme genes of the anthocyanin pathway (DFR, ANS, and At3GT). In contrast, AtTTT could activate these genes at a relatively low level (Fig. 6).

After excluding AtF3′H and the more-or-less redundant AtPET (relative to AtPGT), we analysed the transient expression data in two-way ANOVAs. The analysis suggests that promoter identity is the leading factor for proximal promoter activity on the anthocyanin pathway, and the interaction between the TF complex and promoter (TF × promoter) plays a role as significant as the TF complex itself (Table 2). When the analysis was performed again with the transient expression data of only Arabidopsis or Ipomoea, the pattern remained the same. Proximal promoters are hence considered the major force governing gene transcription in the flavonoid network, irrespective of species.

Discussion

Our analysis has provided a bilateral view of pathway regulation, with the molecular mechanism discussed in both promoter architecture and TF–cis interactions featuring the MBW complex. The promoter architecture was found following a common logic for the anthocyanin pathway, including the orientation of the BRE and MRE within the proximal promoter and the semi-conservative composition of cis elements for MYB and bHLH. The syntaxes for both BRE and MRE contain highly variable nucleotide sites, which may lead to segmented summaries for the binding sites of the complex. For instance, for 85 R2R3 MYBs of the Arabidopsis genome, three categories of MYB binding site (MBS) were suggested (Romero et al., 1998). MBSI (CNGTTR, or T/CAACNG) partially overlaps with MBSII [GTTWGTTR, or T/CAAC(T/A)A(C/A)C], while MBSIIG [GKTWGGTR, or C/TACC(T/A)A(C/A)C] differs from MBSII by one nucleotide. The latter two also agree largely with the AC-rich element [A/CCC(T/A)A(C/A)C/G] proposed by Hartmann et al. (2005). Our summary of MREs for MYB1-like homologues in the syntax of ANCNN(C/A)C appears to unify these categories in both length and content, providing a basis for future expanded inquiries on R2R3 MYBs in the plant genome. In this study, however, we mainly focused on the anthocyanin pathway, showing that a systematic approach could enable a deeper understanding of the regulatory mechanism in the context of the related flavonoid branches.

MBW regulates the pathway genes as one unit

Identification of cis elements in I. purpurea pathways led to a mechanistic understanding of how TFs regulate groups of genes. In both Arabidopsis and Ipomoea, transcription of the

enzyme-coding genes of the anthocyanin pathway were activated as one unit, just like the pathway regulation found in maize (Carey et al., 2004). Since AtPGT and AtPET activate the same set of genes as IpMBW does, their orthology can be established in functioning as bona fide regulators of the anthocyanin pathway. Interestingly, the PA pathway in Arabidopsis appears to be regulated together with the anthocyanin pathway, as shown by the wide-ranging regulation capacity of AtTTT on all the anthocyanin pathway genes (with the exception of AtF3′H but including At3GT) and AtBAN. The result agrees with reported regulation of AtTTT on AtCHS (Appelhagen et al., 2011), AtDFR, AtANS, AtBAN (Baudry et al., 2004; Xu et al., 2014a), and additionally AtCHI and AtF3H (Xu et al., 2014b). Against this background, all tested

Fig. 5. A common layout for the BRE and MRE on the anthocyanin gene promoters across species. (A) Results of a bioinformatic survey. The column for putative promoters lists the numbers of 5′-noncoding sequences with their coding regions sharing >40% amino acid identity with those of I. purpurea anthocyanin genes. Of these sequences, the number of promoter sequences having the specific arrangement of BREs and MREs [from 5′ to 3′ as CACNNG(N1–100)ANCNNCC] is listed as hits for each gene. (B) Frequency distribution of the distance (bp) between BRE and MRE candidates detected on the 159 promoter sequences. The cis candidates were within 100 bp of each other (Supplementary Table S4). The sequences are 200–2000 bp long, covering the 5′-noncoding region upstream of the translation start site. (C) The cis motifs reported by MEME for the peak in (B). The peak consists of 34 sequences. (D) Effect of between-cis distance on promoter activity. The distance alterations were carried out on IpCHSD-337pro. All five constructs had the same cis elements (in bold, MRE to the right and BRE to the left) but different spacer lengths to separate the distances between the cis elements. The effect of changing the distance was analysed in the transient expression system driven by the IpMBW complex. The activity of the native promoter estimated in LUC/RUC was set as 100%. The error bars are from three replicates. This figure is available in colour at JXB online.
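The layout criterion used in the survey of Fig. 5, CACNNG(N1–100)ANCNNCC read 5′ to 3′, can be written as a regular expression, and the spacer length between the two elements then gives the distance tallied in panel (B). The following is a minimal sketch under stated assumptions, not the Perl scripts of Supplementary File S1: the function names and toy sequences are hypothetical, only the given strand is searched, and each BRE candidate is paired with the nearest downstream MRE candidate.

```python
import re
from collections import Counter

# BRE candidate CACNNG, a spacer of 1-100 nt, then MRE candidate ANCNNCC.
# The spacer is non-greedy so the nearest downstream MRE candidate is paired;
# the lookahead lets every BRE candidate be examined, even overlapping ones.
LAYOUT = re.compile(
    r"(?=(CAC[ACGT]{2}G)([ACGT]{1,100}?)(A[ACGT]C[ACGT]{2}CC))"
)


def layout_hits(promoter):
    """Return (bre_start, bre, spacer_length, mre) for each paired BRE candidate."""
    promoter = promoter.upper()
    return [(m.start(), m.group(1), len(m.group(2)), m.group(3))
            for m in LAYOUT.finditer(promoter)]


def spacing_distribution(promoters):
    """Count spacer lengths over a collection of promoter sequences."""
    counts = Counter()
    for seq in promoters:
        for _, _, spacer, _ in layout_hits(seq):
            counts[spacer] += 1
    return counts


# Toy sequences (not from the paper): spacers of 6 nt and 3 nt, and one without an MRE.
toy_promoters = [
    "GGCACGTGTTGTTGAGCTACCAA",
    "CACGAGTTTACCAACC",
    "CACGTGTT",
]
print(spacing_distribution(toy_promoters))
```

On real data the same tally would be run over the retrieved 5′-noncoding sequences; whether to count every BRE and MRE pairing or only the nearest pair changes the histogram and is an assumption of this sketch.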

Fig. 6. Comparisons of the flavonoid network regulation between Arabidopsis and Ipomoea. (A) The anthocyanin pathway backbone and the major branches. The arrows and associated TFs indicate the known regulated segments of the network after this work and that of Stracke et al. (2007). Pathway regulation by four MBW complexes on the flavonoid genes of A. thaliana (B) and I. purpurea (C) was measured in three replicates in dual luciferase assays. The 5′-noncoding regions of the genes are labelled along the x-axis. The normalized promoter activities (in LUC/RUC) are indicated by the y-axis.
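The two-way ANOVA applied to these promoter activities (Table 2) can be cross-checked from its own summary values. The sketch below assumes a balanced full-factorial design (8 promoters × 3 TF complexes × 3 replicates, n = 72, as stated in the note to Table 2), in which each F-value is the effect mean square divided by the error mean square and the sums of squares add up. The variable names are illustrative, and because the reported values are rounded only approximate agreement is expected.

```python
# Values as reported in Table 2: source -> (df, mean square, F-value)
table2 = {
    "Promoter": (7, 1.58, 101.19),
    "TF complex": (2, 0.43, 27.20),
    "Promoter x TF complex": (14, 0.46, 29.47),
}
n_promoters, n_complexes, n_replicates = 8, 3, 3
n = n_promoters * n_complexes * n_replicates  # 72 observations

# Degrees of freedom implied by a balanced two-way design with interaction
df_expected = {
    "Promoter": n_promoters - 1,
    "TF complex": n_complexes - 1,
    "Promoter x TF complex": (n_promoters - 1) * (n_complexes - 1),
}
df_model = sum(df for df, _, _ in table2.values())
df_error = (n - 1) - df_model

# Each row gives an estimate of the error mean square: MS_error = MS / F
ms_error = [ms / f for _, ms, f in table2.values()]
ms_error_mean = sum(ms_error) / len(ms_error)

# With additive sums of squares, R^2 = SS_model / (SS_model + SS_error)
ss_model = sum(df * ms for df, ms, _ in table2.values())
ss_error = df_error * ms_error_mean
r_squared = ss_model / (ss_model + ss_error)

print(df_error, [round(x, 4) for x in ms_error], round(r_squared, 2))
```

Under these assumptions the three rows imply nearly the same error mean square, and the implied coefficient of determination agrees with the R² value given in the note to Table 2.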

Table 2. Analysis by two-way ANOVA on pathway promoter activities of flavonoid genes

Source | df | Mean square | F-value | Pr > F | Power
Promoter | 7 | 1.58 | 101.19 | < 0.0001 | 0.999
TF complex | 2 | 0.43 | 27.20 | < 0.0001 | 0.999
Promoter × TF complex | 14 | 0.46 | 29.47 | < 0.0001 | 0.999

Eight proximal promoters included AtBAN, At3GT, AtANS, AtDFR, AtF3H, AtCHI, AtCHS, and IpCHS-D. Three TF complexes were from Arabidopsis (TT2-TT8-TTG1, PAP1-GL3-TTG1) and I. purpurea (IpMYB1-bHLH2-WDR1). Each promoter activity (log transformed) was measured by three replicates of its transient expression (R² = 0.96, n = 72). The F-value was from F-test on the effect of each variable (source).

MBWs failed to activate AtF3′H and Ip3GT in the transient expressions, consistent with their lack of the appropriate cis elements in the proximal promoters. These genes are either regulated independently of MBW or need extra regulatory components beyond the proximal regions. For instance, the grape UFGT was considered to be regulated independently of the other genes (Boss et al., 1996), and ZmA1 required the presence of R-interacting factor 1 to be activated by its conspecific MBW (Kong et al., 2012). Regardless, the cases of AtF3′H and Ip3GT suggest that both an MRE and a BRE are required for the MBW complex to activate the promoter of a target gene after the local chromatin structure is cleared. Most pathway genes identified so far do contain the cis regulatory region with the necessary footings for the MBW complex.

Parallel redundancy of cis elements and multiple TFs feature combinatory regulation

The pathway genes show multiple copies of each type of cis element (particularly those of MREs). At least three functional MREs were experimentally confirmed on each of IpCHI-226pro, IpF3H-211pro, IpF3′H-171pro, and IpDFRB-191pro (Fig. 3; Fig. 4; Supplementary Figure S3). For instance, if one of the MREs was mutated on IpDFRB-191pro, no phenotypic change could be found on the seedlings transformed with the mutated construct. Although refractory in experimental validations of the cis elements involved, the redundancy in binding sites may provide dynamics as well as intricacy to the gene transcription. A full occupancy of the cis sites in a promoter region, for instance, may result in a higher level of gene transcription than is the case for partial occupancy of the cis sites.

The multiple cis elements also mirror functional redundancy of TFs. Noticeably, AtTTT activates the same set of anthocyanin pathway genes as AtPGT and AtPET do, but possibly at a lower capacity (Fig. 6). This may explain why the transgenic Arabidopsis plants showed accumulation of anthocyanins but no obvious PA accumulation in the cases of the short promoter sequences (Fig. 2C). The level of PA synthesis in these cases was probably too low to be visible. Given the restricted yet high expression of AtTT2 in developing seeds (Nesi et al., 2001) and a low expression of AtPAP1 in developing siliques (Matsui et al., 2004), the extensive control from AtCHS to AtBAN by the AtTTT complex on the flavonoid network allows independent PA synthesis in the seed coat. Likewise, flavonol synthesis appears to use the same principle, where AtCHS, AtCHI, and AtF3H are regulated by AtMYB11, AtMYB12, and AtMYB111, which also target the flavonol synthase 1 gene (Stracke et al., 2007). The regulatory redundancy of the same genes may offer insurance to different functional branches of the network. For TF redundancy, maize B alleles can substitute for R in inducing the expression of Zm3GT (Goff et al., 1990). Also, the triple mutant of AtMYB11, AtMYB12, and AtMYB111 still accumulates anthocyanin (Stracke et al., 2007), but AtPAP1 has little to do with PA accumulation (Gonzalez et al., 2008). The redundancy in both cis and trans factors in pathways enables a cell-specific regulation of the same pathway by different arrays of regulatory proteins.

Promoter architecture is the most important factor for pathway regulation

Promoter architecture is believed to take a central position in transcription (Spitz and Furlong, 2012), but detailed studies have been scarce. A promoter’s architecture is characterized here by the distribution of all cis elements in the region, including the binding sites of TFs and the basal transcription apparatus (such as the TATA box), numbers of cis elements, spatial distance between cis elements, and the relative orientations of cis elements. These factors may significantly affect a promoter’s activity. For the anthocyanin pathway genes, the cis regulatory sites summarized by the syntax identified for MREs (ANCNNCC) overlap with several previously postulated motifs. For example, cctacc(n)7ct, described as an H-box (Loake et al., 1992), shows an overlap of NCNNCC, and the consensus (AC)ACC(TA)A(AC)C described by Sablowski et al. (1994) is also mostly congruent with the MRE syntax here; in the case of TTGACTGGnGGnTGCG for C1 binding (Lesnick and Chandler, 1998), GGnGGnT is clearly the reverse complement of ANCNNCC. The syntax also includes ANCNACC, shown in a recent consensus summary (Wang et al., 2013). We have shown here that both orientations of the syntaxes are valid on the anthocyanin genes (Table 1). The intermittently conserved sites within the MRE syntax have greatly increased the difficulty of cis element predictions. Even when all the existing data support the syntax for MREs, for instance, it still provides guidance rather than a total replacement for experimental validation, since a fraction of cases that obey the structure of the syntax have been proven nonfunctional, at least to IpMBW (Supplementary Figure S5). The same is also true for the syntax of BREs. It matches well with the previous G-box (Staiger et al., 1991; Loake et al., 1992) but not the E-box (CANNTG). Certain cases satisfying the syntax such as CACTTG were proven invalid for IpbHLH2 (Supplementary Figure S2). The mutation tests of Bodeau and Walbot (1996) suggested that the R-motif in the form of CACGAC and CACGAG was effective on the Bz2 (GST) promoter, suggesting that the BRE syntax holds for both monocots and dicots.

The proximity of BRE and MRE in the 5′-noncoding region was revealed by both the promoter survey (Fig. 5) and identification of the cis motifs over the anthocyanin pathway (Fig. 4). The influence of cis spacing on promoter activity was previously indicated in the rbcS promoter (Lopez-Ochoa et al., 2007), and further analysed here to show its relevance to transcriptional intensity. Locations of most MREs on the anthocyanin genes were found close to the translation start site; BREs, on the other hand, are largely located 5′ to the first MRE (Fig. 4). This order of BRE and MRE from 5′ to 3′ in the noncoding region is not only broadly found on the anthocyanin gene promoters (> 53%) but also present in the promoter of Rd22, an abscisic acid-mediated dehydration-responsive gene (Abe et al., 2003). The occurrence of the cis orientation in different molecular systems driven by MYB and bHLH implies an antiquity of the cis arrangement for genes regulated by bHLH and MYB.

We have shown that promoter architecture varied greatly among genes, facilitating variations of gene expression across pathways that are regulated by the same set of TFs. The regulatory mechanism of the anthocyanin pathway is mostly about the cooperation of the cis-regulatory elements and the cognate TF complex, taking the same logic of TF–DNA interactions across the pathway. The findings have led to greater understanding of the regulatory mechanism of the anthocyanin pathway and the opportunity for more quantitative analysis of TF–DNA interactions. It will be interesting to see how other TFs affect pathway gene regulation.

Supplementary material

Supplementary data can be found at JXB online.
Supplementary Table S1. List of primers used in the experiments in this study.
Supplementary Table S2. Accessions of genes and promoter sequences.
Supplementary Table S3. Correlation coefficients of transcript levels by RT-qPCRs.
Supplementary Table S4. List of 159 sequences of the anthocyanin genes at the 5′-noncoding region.
Supplementary Figure S1. Characterization of Ipomoea TFs and promoter activities of the target genes.
Supplementary Figure S2. Binding capacity of IpbHLH2 in EMSAs.
Supplementary Figure S3. Analysis of cis elements on the anthocyanin proximal promoters of I. purpurea.
Supplementary Figure S4. Comparisons of TF dosages in dual luciferase assays.
Supplementary Figure S5. Mutations on CHSD-337pro showing DNA binding spectrums of IpMYB1 and IpbHLH2.

Supplementary Figure S6. 5′-noncoding sequences of flavonoid genes of two species (A. thaliana and I. purpurea).
Supplementary File S1. Bioinformatic survey with Perl scripts and instructions.

Funding

The study was supported by National Science Foundation of China (91331116, 31070263) and Chinese Academy of Sciences (KSCX2-YW-N-043).

Acknowledgements

We thank anonymous reviewers for constructive comments, Dr Yan Wang for technical assistance during the early phase of the study, the group of Dr Kang Chong for access to equipment, and members of the ecological and evolutionary research group for discussions.

References

Abe H, Urao T, Ito T, Seki M, Shinozaki K, Yamaguchi-Shinozaki K. 2003. Arabidopsis AtMYC2 (bHLH) and AtMYB2 (MYB) function as transcriptional activators in abscisic acid signaling. The Plant Cell 15, 63–78.
Albert NW, Davies KM, Lewis DH, Zhang HB, Montefiori M, Brendolise C, Boase MR, Ngo H, Jameson PE, Schwinn KE. 2014. A conserved network of transcriptional activators and repressors regulates anthocyanin pigmentation in eudicots. The Plant Cell 26, 962–980.
Appelhagen I, Lu GH, Huep G, Schmelzer E, Weisshaar B, Sagasser M. 2011. TRANSPARENT TESTA1 interacts with R2R3-MYB factors and affects early and late steps of flavonoid biosynthesis in the endothelium of Arabidopsis thaliana seeds. The Plant Journal 67, 406–419.
Baudry A, Caboche M, Lepiniec L. 2006. TT8 controls its own expression in a feedback regulation involving TTG1 and homologous MYB and bHLH factors, allowing a strong and cell-specific accumulation of flavonoids in Arabidopsis thaliana. The Plant Journal 46, 768–779.
Baudry A, Heim MA, Dubreucq B, Caboche M, Weisshaar B, Lepiniec L. 2004. TT2, TT8, and TTG1 synergistically specify the expression of BANYULS and proanthocyanidin biosynthesis in Arabidopsis thaliana. The Plant Journal 39, 366–380.
Beld M, Martin C, Huits H, Stuitje AR, Gerats AGM. 1989. Flavonoid synthesis in Petunia hybrida - partial characterization of dihydroflavonol-4-reductase genes. Plant Molecular Biology 13, 491–502.
Blackwell TK, Kretzner L, Blackwood EM, Eisenman RN, Weintraub H. 1990. Sequence-specific DNA-binding by the c-myc protein. Science 250, 1149–1151.
Bodeau JP, Walbot V. 1992. Regulated transcription of the maize Bronze 2 promoter in electroporated protoplasts requires the C1 and R gene-products. Molecular and General Genetics 233, 379–387.
Bodeau JP, Walbot V. 1996. Structure and regulation of the maize Bronze2 promoter. Plant Molecular Biology 32, 599–609.
Boss PK, Davies C, Robinson SP. 1996. Analysis of the expression of anthocyanin pathway genes in developing Vitis vinifera L cv Shiraz grape berries and the implications for pathway regulation. Plant Physiology 111, 1059–1066.
Broun P. 2004. Transcription factors as tools for metabolic engineering in plants. Current Opinion in Plant Biology 7, 202–209.
Carey CC, Strahle JT, Selinger DA, Chandler VL. 2004. Mutations in the pale aleurone color1 regulatory gene of the Zea mays anthocyanin pathway have distinct phenotypes relative to the functionally similar TRANSPARENT TESTA GLABRA1 gene in Arabidopsis thaliana. The Plant Cell 16, 450–464.
Chang SM, Lu YQ, Rausher MD. 2005. Neutral evolution of the nonbinding region of the anthocyanin regulatory gene Ipmyb1 in Ipomoea. Genetics 170, 1967–1978.
Clegg MT, Durbin ML. 2003. Tracing floral adaptations from ecology to molecules. Nature Reviews Genetics 4, 206–215.
deVetten N, Quattrocchio F, Mol J, Koes R. 1997. The an11 locus controlling flower pigmentation in petunia encodes a novel WD-repeat protein conserved in yeast, plants, and animals. Genes and Development 11, 1422–1434.
Dubos C, Stracke R, Grotewold E, Weisshaar B, Martin C, Lepiniec L. 2010. MYB transcription factors in Arabidopsis. Trends in Plant Science 15, 573–581.
Durbin ML, McCaig B, Clegg MT. 2000. Molecular evolution of the chalcone synthase multigene family in the morning glory genome. Plant Molecular Biology 42, 79–92.
Elomaa P, Uimari A, Mehto M, Albert VA, Laitinen RAE, Teeri TH. 2003. Activation of anthocyanin biosynthesis in Gerbera hybrida (Asteraceae) suggests conserved protein-protein and protein-promoter interactions between the anciently diverged monocots and eudicots. Plant Physiology 133, 1831–1842.
Espley RV, Brendolise C, Chagne D, et al. 2009. Multiple repeats of a promoter segment causes transcription factor autoregulation in red apples. The Plant Cell 21, 168–183.
Feller A, Machemer K, Braun EL, Grotewold E. 2011. Evolutionary and comparative analysis of MYB and bHLH plant transcription factors. The Plant Journal 66, 94–116.
Goff SA, Cone KC, Chandler VL. 1992. Functional-analysis of the transcriptional activator encoded by the maize B gene - evidence for a direct functional interaction between two classes of regulatory proteins. Genes and Development 6, 864–875.
Goff SA, Klein TM, Roth BA, Fromm ME, Cone KC, Radicella JP, Chandler VL. 1990. Transactivation of anthocyanin biosynthetic genes following transfer of B regulatory genes into maize tissues. The EMBO Journal 9, 2517–2522.
Gollop R, Even S, Colova-Tsolova V, Perl A. 2002. Expression of the grape dihydroflavonol reductase gene and analysis of its promoter region. Journal of Experimental Botany 53, 1397–1409.
Gonzalez A, Zhao M, Leavitt JM, Lloyd AM. 2008. Regulation of the anthocyanin biosynthetic pathway by the TTG1/bHLH/Myb transcriptional complex in Arabidopsis seedlings. The Plant Journal 53, 814–827.
Grotewold E, Sainz MB, Tagliani L, Hernandez JM, Bowen B, Chandler VL. 2000. Identification of the residues in the Myb domain of maize C1 that specify the interaction with the bHLH cofactor R. Proceedings of the National Academy of Sciences, USA 97, 13579–13584.
Guan S, Lu YQ. 2013. Dissecting organ-specific transcriptomes through RNA-sequencing. Plant Methods 9, 42.
Guerineau F, Lucy A, Mullineaux P. 1992. Effect of two consensus sequences preceding the translation initiator codon on gene-expression in plant-protoplasts. Plant Molecular Biology 18, 815–818.
Habu Y, Hisatomi Y, Iida S. 1998. Molecular characterization of the mutable flaked allele for flower variegation in the common morning glory. The Plant Journal 16, 371–376.
Hartmann U, Sagasser M, Mehrtens F, Stracke R, Weisshaar B. 2005. Differential combinatorial interactions of cis-acting elements recognized by R2R3-MYB, BZIP, and BHLH factors control light-responsive and tissue-specific activation of phenylpropanoid biosynthesis genes. Plant Molecular Biology 57, 155–171.
Hoshino A, Morita Y, Choi JD, Saito N, Toki K, Tanaka Y, Iida S. 2003. Spontaneous mutations of the flavonoid 3′-hydroxylase gene conferring reddish flowers in the three morning glory species. Plant and Cell Physiology 44, 990–1001.
Koes R, Verweij W, Quattrocchio F. 2005. Flavonoids: a colorful model for the regulation and evolution of biochemical pathways. Trends in Plant Science 10, 236–242.
Kong Q, Pattanaik S, Feller A, Werkman JR, Chai CL, Wang YQ, Grotewold E, Yuan L. 2012. Regulatory switch enforced by basic helix-loop-helix and ACT-domain mediated dimerizations of the maize transcription factor R. Proceedings of the National Academy of Sciences, USA 109, E2091–E2097.
Lenhard B, Sandelin A, Carninci P. 2012. REGULATORY ELEMENTS Metazoan promoters: emerging characteristics and insights into transcriptional regulation. Nature Reviews Genetics 13, 233–245.

Lepiniec L, Debeaujon I, Routaboul JM, Baudry A, Pourcel L, Nesi N, Caboche M. 2006. Genetics and biochemistry of seed flavonoids. Annual Review of Plant Biology 57, 405–430.
Lesnick ML, Chandler VL. 1998. Activation of the maize anthocyanin gene a2 is mediated by an element conserved in many anthocyanin promoters. Plant Physiology 117, 437–445.
Li Q, Zhao PP, Li J, Zhang CJ, Wang LN, Ren ZH. 2014. Genome-wide analysis of the WD-repeat protein family in cucumber and Arabidopsis. Molecular Genetics and Genomics 289, 103–124.
Liu YG, Whittier RF. 1995. Thermal asymmetric interlaced PCR - automatable amplification and sequencing of insert end fragments from P1 and YAC clones for walking. Genomics 25, 674–681.
Loake GJ, Faktor O, Lamb CJ, Dixon RA. 1992. Combination of H-box [CCTACC(N)7CT] and G-box (CACGTG) cis elements is necessary for feedforward stimulation of a chalcone synthase promoter by the phenylpropanoid-pathway intermediate p-coumaric acid. Proceedings of the National Academy of Sciences, USA 89, 9230–9234.
Logemann E, Birkenbihl RP, Ulker B, Somssich IE. 2006. An improved method for preparing Agrobacterium cells that simplifies the Arabidopsis transformation protocol. Plant Methods 2, 16.
Lopez-Ochoa L, Acevedo-Hernandez G, Martinez-Hernandez A, Arguello-Astorga G, Herrera-Estrella L. 2007. Structural relationships between diverse cis-acting elements are critical for the functional properties of a rbcS minimal light regulatory unit. Journal of Experimental Botany 58, 4397–4406.
Lu YQ, Du J, Tang JY, Wang F, Zhang J, Huang JX, Liang WF, Wang LS. 2009. Environmental regulation of floral anthocyanin synthesis in Ipomoea purpurea. Molecular Ecology 18, 3857–3871.
Lu YQ, Xie LL, Chen JN. 2012. A novel procedure for absolute real-time quantification of gene expression patterns. Plant Methods 8, 9.
Ludwig SR, Habera LF, Dellaporta SL, Wessler SR. 1989. Lc, a member of the maize R gene family responsible for tissue-specific anthocyanin production, encodes a protein similar to transcriptional activators and contains the myc-homology region. Proceedings of the National Academy of Sciences, USA 86, 7092–7096.
Matsui K, Tanaka H, Ohme-Takagi M. 2004. Suppression of the biosynthesis of proanthocyanidin in Arabidopsis by a chimeric PAP1 repressor. Plant Biotechnology Journal 2, 487–493.
Matus JT, Poupin MJ, Canon P, Bordeu E, Alcalde JA, Arce-Johnson P. 2010. Isolation of WDR and bHLH genes related to flavonoid synthesis in grapevine (Vitis vinifera L.). Plant Molecular Biology 72, 607–620.
Morita Y, Saitoh M, Hoshino A, Nitasaka E, Iida S. 2006. Isolation of cDNAs for R2R3-MYB, bHLH and WDR transcriptional regulators and identification of c and ca mutations conferring white flowers in the Japanese morning glory. Plant and Cell Physiology 47, 457–470.
Nesi N, Jond C, Debeaujon I, Caboche M, Lepiniec L. 2001. The Arabidopsis TT2 gene encodes an R2R3 MYB domain protein that acts as a key determinant for proanthocyanidin accumulation in developing seed. The Plant Cell 13, 2099–2114.
Ness SA, Marknell A, Graf T. 1989. The v-myb oncogene product binds to and activates the promyelocyte-specific mim-1 gene. Cell 59, 1115–1125.
Ochman H, Gerber AS, Hartl DL. 1988. Genetic applications of an inverse polymerase chain-reaction. Genetics 120, 621–623.
Park KI, Ishikawa N, Morita Y, Choi JD, Hoshino A, Iida S. 2007. A bHLH regulatory gene in the common morning glory, Ipomoea purpurea, controls anthocyanin biosynthesis in flowers, proanthocyanidin and phytomelanin pigmentation in seeds, and seed trichome formation. The Plant Journal 49, 641–654.
Payne CT, Zhang F, Lloyd AM. 2000. GL3 encodes a bHLH protein that regulates trichome development in Arabidopsis through interaction with GL1 and TTG1. Genetics 156, 1349–1362.
Pazares J, Ghosal D, Wienand U, Peterson PA, Saedler H. 1987. The regulatory c1 locus of Zea mays encodes a protein with homology to myb protooncogene products and with structural similarities to transcriptional activators. The EMBO Journal 6, 3553–3558.
Pires N, Dolan L. 2010. Origin and diversification of basic-Helix-Loop-Helix proteins in plants. Molecular Biology and Evolution 27, 862–874.
Quattrocchio F, Wing JF, Leppen HTC, Mol JNM, Koes RE. 1993. Regulatory genes-controlling anthocyanin pigmentation are functionally conserved among plant-species and have distinct sets of target genes. The Plant Cell 5, 1497–1512.
Ramsay NA, Glover BJ. 2005. MYB-bHLH-WD40 protein complex and the evolution of cellular diversity. Trends in Plant Science 10, 63–70.
Romero I, Fuertes A, Benito MJ, Malpica JM, Leyva A, Paz-Ares J. 1998. More than 80 R2R3-MYB regulatory genes in the genome of Arabidopsis thaliana. The Plant Journal 14, 273–284.
Roth BA, Goff SA, Klein TM, Fromm ME. 1991. C1 dependent and R dependent expression of the maize Bz1 gene requires sequences with homology to mammalian myb and myc binding-sites. The Plant Cell 3, 317–325.
Sablowski RWM, Moyano E, Culianezmacia FA, Schuch W, Martin C, Bevan M. 1994. A flower-specific myb protein activates transcription of phenylpropanoid biosynthetic genes. The EMBO Journal 13, 128–137.
Sainz MB, Grotewold E, Chandler VL. 1997. Evidence for direct activation of an anthocyanin promoter by the maize C1 protein and comparison of DNA binding by related Myb domain proteins. The Plant Cell 9, 611–625.
Shangguan XX, Xu B, Yu ZX, Wang LJ, Chen XY. 2008. Promoter of a cotton fibre MYB gene functional in trichomes of Arabidopsis and glandular trichomes of tobacco. Journal of Experimental Botany 59, 3533–3542.
Spitz F, Furlong EEM. 2012. Transcription factors: from enhancer binding to developmental control. Nature Reviews Genetics 13, 613–626.
Staiger D, Becker F, Schell J, Koncz C, Palme K. 1991. Purification of tobacco nuclear proteins binding to a CACGTG motif of the chalcone synthase promoter by DNA affinity-chromatography. European Journal of Biochemistry 199, 519–527.
Stracke R, Ishihara H, Barsch GHA, Mehrtens F, Niehaus K, Weisshaar B. 2007. Differential regulation of closely related R2R3-MYB transcription factors controls flavonol accumulation in different parts of the Arabidopsis thaliana seedling. The Plant Journal 50, 660–677.
Tiffin P, Miller RE, Rausher MD. 1998. Control of expression patterns of anthocyanin structural genes by two loci in the common morning glory. Genes and Genetic Systems 73, 105–110.
Tuerck JA, Fromm ME. 1994. Elements of the maize a1 promoter required for transactivation by the anthocyanin b/c1 or phlobaphene p regulatory genes. The Plant Cell 6, 1655–1663.
Vogt T. 2010. Phenylpropanoid biosynthesis. Molecular Plant 3, 2–20.
Walker AR, Davison PA, Bolognesi-Winfield AC, James CM, Srinivasan N, Blundell TL, Esch JJ, Marks MD, Gray JC. 1999. The TRANSPARENT TESTA GLABRA1 locus, which regulates trichome differentiation and anthocyanin biosynthesis in Arabidopsis, encodes a WD40 repeat protein. The Plant Cell 11, 1337–1349.
Wang HL, Guan S, Zhu ZX, Wang Y, Lu YQ. 2013. A valid strategy for precise identifications of transcription factor binding sites in combinatorial regulation using bioinformatic and experimental approaches. Plant Methods 9, 34.
Winkel-Shirley B. 2001. It takes a garden. How work on diverse plant species has contributed to an understanding of flavonoid metabolism. Plant Physiology 127, 1399–1404.
Xu WJ, Grain D, Bobet S, Le Gourrierec J, Thevenin J, Kelemen Z, Lepiniec L, Dubos C. 2014a. Complexity and robustness of the flavonoid transcriptional regulatory network revealed by comprehensive analyses of MYB-bHLH-WDR complexes and their targets in Arabidopsis seed. New Phytologist 202, 132–144.
Xu WJ, Lepiniec L, Dubos C. 2014b. New insights toward the transcriptional engineering of proanthocyanidin biosynthesis. Plant Signaling and Behavior 9, e28736.
Xu WJ, Lepiniec L, Dubos C. 2015. Transcriptional control of flavonoid biosynthesis by MYB–bHLH–WDR complexes. Trends in Plant Science 20, 176–185.
Zhang F, Gonzalez A, Zhao MZ, Payne CT, Lloyd A. 2003. A network of redundant bHLH proteins functions in all TTG1-dependent pathways of Arabidopsis. Development 130, 4859–4869.
Zufall RA, Rausher MD. 2004. Genetic changes associated with floral adaptation restrict future evolutionary potential. Nature 428, 847–850.