Cis-Regulatory Code of Stress-Responsive Transcription in Arabidopsis Thaliana
Total Page:16
File Type:pdf, Size:1020Kb
Cis-regulatory code of stress-responsive transcription in Arabidopsis thaliana Cheng Zoua, Kelian Suna, Joshua D. Mackalusoa,b,c, Alexander E. Seddona, Rong Jinb, Michael F. Thomashowd,e, and Shin-Han Shiua,1 Departments of aPlant Biology, bComputer Science and Engineering, cBiochemistry and Molecular Biology, and eCrop and Soil Sciences, and dDepartment of Energy Plant Research Laboratory, Michigan State University, East Lansing, MI 48824 Edited by Philip Benfey, Duke University, Durham, NC, and approved July 19, 2011 (received for review March 3, 2011) Environmental stress leads to dramatic transcriptional reprogram- essential for stress-responsive transcription (e.g., refs. 3 and 4). ming, which is central to plant survival. Although substantial The importance of a few CRE combinations has also been dem- knowledge has accumulated on how a few plant cis-regulatory onstrated (17–19), indicating that stress-responsive genes are elements (CREs) function in stress regulation, many more CREs re- regulated by multiple transcription factors. In addition, the roles of main to be discovered. In addition, the plant stress cis-regulatory plant CRE copy number and location in transcriptional regulation code, i.e., how CREs work independently and/or in concert to spec- of plant stress have also been studied for a few CREs (5, 20–22). ify stress-responsive transcription, is mostly unknown. On the basis Although these pioneering studies have clearly demonstrated the of gene expression patterns under multiple stresses, we identified existence of stress cis-regulatory codes in plants, there are few a large number of putative CREs (pCREs) in Arabidopsis thaliana examples and a global description of CRE-based stress regulatory with characteristics of authentic cis-elements. Surprisingly, biotic rules is not available. cis and abiotic responses are mostly mediated by two distinct pCRE To globally decipher -regulatory code, detailed knowledge of cis CREs is required. However, many experimentally verified (re- superfamilies. In addition, we uncovered -regulatory codes spec- “ ” ifying how pCRE presence and absence, combinatorial relationships, ferred to as known ) plant CREs in public depositories, such as the Arabidopsis Gene Regulatory Information Server (23) and location, and copy number can be used to predict stress-responsive Cis expression. Expression prediction models based on pCRE combina- Plant -Acting Regulatory DNA Elements database (24), are derived from multiple plant species and some are highly similar or tions perform significantly better than those based on simply pCRE identical. Furthermore, known plant CREs are mostly available in presence and absence, location, and copy number. Furthermore, the form of consensus sequences with little information on binding instead of a few master combinatorial rules for each stress condi- site degeneracy. Thus, to complement our current knowledge of tion, many rules were discovered, and each appears to control only plant CREs, we first identified putative CREs (pCREs) involved a small subset of stress-responsive genes. Given there are very few in stress response through analysis of A. thaliana stress expression documented interactions between plant CREs, the combinatorial fi data. We then demonstrated that these pCREs exhibit charac- rules we have uncovered signi cantly contribute to a better under- teristics of authentic cis-elements. Finally, the presence, combi- cis standing of the -regulatory logic underlying plant stress response nation, copy number, and location of these pCREs were used to and provide prioritized targets for experimentation. establish the cis-regulatory code of stress-responsive gene ex- pression in A. thaliana. machine learning | motif discovery | transcription factor binding site Results and Discussion nvironmental stress, both abiotic and biotic, is the key con- Multiple pCREs Implicated in Regulating Abiotic and Biotic Stress- Estraint to plant productivity (1). Under stressful environments, Responsive Transcription Belong to Two Motif Superfamilies. The plants undergo significant physiological and/or morphological assumption that genes with similar expression patterns are likely alterations (2). Such plastic responses are particularly relevant for coregulated and have the same CREs has been applied to plants, which need to respond to ambient conditions due to their identify plant CREs (25, 26). Therefore, we set out to identify sessile nature (2). At the molecular level, one of the most imme- pCREs involved in regulating stress-responsive transcription on diate responses to stress is the extensive reprogramming of tem- the basis of coexpression patterns of A. thaliana genes under 16 poral and spatial transcription. In the past two decades, substantial abiotic/biotic stress conditions (SI Methods and Table S1). Each progress has been made in understanding how this reprogramming A. thaliana gene was categorized as up-regulated, down-regu- occurs via the interaction of transcription factors with a handful of lated, or not changed under each condition, and a motif dis- cis-regulatory elements (CREs). Examples of these well-charac- covery pipeline was used to identify 1,215 pCREs from putative terized CREs include abscisic acid-responsive element (ABRE) promoter regions (Fig. 1A, SI Methods, and Dataset S1). The (3, 4), dehydration-responsive element (DRE) (5), C-repeat (6), sites where these pCREs were mapped are preferentially found and W-box (7). In Arabidopsis thaliana, there are an estimated in the promoters of stress-responsive genes (up- and/or down- 1,346–2,290 putative transcription factor genes (8, 9), and many regulated) compared with promoters of genes without significant are likely involved in regulating stress-responsive transcription. changes under stress (Fig. 1B and Dataset S2 A and B). Among However, the corresponding CREs of most of these transcription these pCREs, 346 are highly similar [Pearson’s correlation co- factors are not known. In addition, it is not clear to what degree the efficient (PCC) ≥ 0.9] (SI Methods) to 52 known CREs (Dataset known CREs can explain expression changes in response to stress. On the basis of knowledge of transcription factors and CREs, transcriptional regulatory models in yeast and mammals have been Author contributions: C.Z., R.J., M.F.T., and S.-H.S. designed research; C.Z., K.S., J.D.M., established that can explain the transcription profiles of different A.E.S., and S.-H.S. performed research; R.J. contributed new reagents/analytic tools; C.Z., developmental stages and environmental conditions (10–12). One J.D.M., and A.E.S. analyzed data; and C.Z., M.F.T., and S.-H.S. wrote the paper. approach to building transcriptional regulatory models is to con- The authors declare no conflict of interest. sider the presence or absence of CREs and their combinatorial This article is a PNAS Direct Submission. relationships, copy number, and/or location relative to their reg- Freely available online through the PNAS open access option. ulatory targets. Such models, or cis-regulatory codes, have been 1To whom correspondence should be addressed. E-mail: [email protected]. applied to explain gene expression in yeast (13, 14), humans (15), This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. and fruit flies (16). In plants, a number of CREs are known to be 1073/pnas.1103202108/-/DCSupplemental. 14992–14997 | PNAS | September 6, 2011 | vol. 108 | no. 36 www.pnas.org/cgi/doi/10.1073/pnas.1103202108 Downloaded by guest on October 2, 2021 Abiotic Biotic pCREs A B C PCC AtGenExpress 1 dataset: 15 stress conditions Cold OsmoticSalt WoundingGenotoxicDroughtHeat UV-B DC3000Flg22GST-NPPHrcC-HrpZP.infestansPsphavrRpm α 0.8 Up only Fuzzy k-means Cluster size Dn only 0.6 clustering < 60 genes Up & Dn Up + Dn 0.4 Co-expression clusters: β 0.2 2,693 0 De novo motif Discovery Length 6-15 bp Motif set I 111,254 PWMs pCRE PCC Salt3 genes with D & without pCRE 2.0 Enrichment p-value < 5% of test random PWMs 0.0 0.97 Motif set II 0.97 14,078 pCREs IC 0.90 Motif clustering PCC ≤ 0.9 0.86 & merging 0.85 Putative CREs 1,215 E 0.92 Motif UPGMA Branch length 0.82 clustering ≤ 0.24 0.80 pCRE families 250 0.80 0.79 Fig. 1. Putative CRE (pCRE) identification pipeline and pCREs relevant to differential gene expression under various stress conditions. (A) pCRE identification pipeline. P value, Fisher’s exact test; PCC, Pearson’s correlation coefficient; PWM, position weight matrix. (B) pCREs significantly enriched in the promoters of differentially regulated genes for each condition and treatment duration (latter not labeled). The pCREs were ordered according to results of complete linkage clustering of the enrichment patterns across conditions. Up only, pCREs enriched among up-regulated genes; down (Dn) only, pCREs enriched in down-regulated genes; Up & Dn, pCRE enriched in both; Up + Dn, pCRE enrichment only when up- and down-regulated genes are jointly considered. Dotted rectangles: two major pCRE clusters α and β.(C) PCC matrix indicating the degrees of similarity between pCRE PWMs. The ordering of pCREs is the same as in B. The gray areas indicate the correspondence of the α- and β-clusters. (D) Sequence logos of example pCREs and their similarities (PCC) to ABRE (Left and Middle). IC: information content. (Right) Presence of example pCREs in the promoters of salt-3h up-regulated genes. Each column represents one gene and whether its promoter contains the pCRE in question (yellow) or not (blue). Only genes containing one or more example pCREs are shown. (E) Sequence logos and similarities of example pCREs to W-box (Left and Middle). (Right) Presence of example pCRE sites in the promoters of flagellin-1h up-regulated genes. S2C). To obtain a lower-bound estimate of how many distinct α or the β superfamily are likely involved in regulating distinct but CREs may be represented by these 1,215 motifs, we collapsed overlapping sets of target genes and may be bound preferentially the pCREs into 250 motif families (Dataset S2D), using a strin- by distinct transcription regulators.