Neuron, Vol. 7, 177-181, August, 1991, Copyright © 1991 by Cell Press

Review

Mechanisms for Diversity in Gene Expression Patterns

Kevin Struhl
Department of Biological Chemistry and Molecular Pharmacology
Harvard Medical School
Boston, Massachusetts 02115

The human central nervous system contains about 10¹² cells whose actions define the world as we know it. Although the number of classically defined cell types is rather small, the regulatory complexity displayed by individual genes indicates that many, and perhaps nearly all, of the cells in the central nervous system are distinct with respect to which genes are expressed. In addition to this cellular specificity, gene regulatory patterns are constantly changing throughout development and in response to extracellular signals. As a result, transcriptional regulatory patterns in the central nervous system are extraordinarily complex. As a rough estimate, probably between 10⁴ and 10⁵ genes are expressed, many of them in unique and unexpectedly complicated cellular patterns (Bier et al., 1989; McKay and Hockfield, 1982; Sutcliffe, 1988).

This enormous diversity and flexibility in gene expression patterns is accomplished with a relatively small number of transcription factors. There are undoubtedly hundreds of transcription factors and possibly as many as a few thousand, but more than this seems very unlikely. It is evident, therefore, that a “one regulatory protein per gene” model, such as frequently applies to prokaryotic organisms, is grossly inadequate. Instead, combinatorial action of transcriptional regulatory proteins is necessary for multicellular organisms to generate the requisite diversity in gene expression patterns. This review will discuss the molecular mechanisms involved in generating diversity.

Combinatorial Activation of Transcription

The most important mechanism for achieving diversity, combinatorial activation, relies on the basic properties of the eukaryotic transcriptional machinery. For protein-coding genes, this machinery consists of RNA polymerase II and several auxiliary factors including TFIID, which binds to the conserved TATA element found in most eukaryotic promoters (Sawadogo and Sentenac, 1990). By itself, RNA polymerase II is transcriptionally inactive on normal DNA templates. However, after binding of TFIID to the TATA element and subsequent assembly of the other factors into an active transcription complex, RNA polymerase II can initiate synthesis at a site 25-30 bp downstream of the TATA element. However, this “basic transcriptional machinery” is not sufficient to promote transcription in vivo because promoters containing only the TATA element and initiation region are essentially inactive. Thus, the eukaryotic RNA polymerase II transcriptional machinery is qualitatively different from prokaryotic RNA polymerase holoenzymes that are sufficient for efficient transcriptional initiation.

In eukaryotic organisms, gene expression requires activator proteins that bind to specific promoter sequences and stimulate the basic transcriptional machinery (Mitchell and Tjian, 1989; Ptashne, 1988; Struhl, 1989). Thus, a first-order description of a particular transcriptional regulatory pattern is simply a matter of which specific activator proteins can interact at the promoter. In this view, a set of genes can be coordinately regulated if their promoters contain related DNA sequence elements that can interact with a common activator protein. In general, the related promoter elements are not identical, but strongly resemble a consensus sequence, which is often functionally optimal. This ability to interact efficiently with a range of related sequences allows for regulatory and evolutionary flexibility. However, the number of distinct DNA-binding specificities is far too limited to account for the diversity of transcriptional regulatory patterns. More importantly, generating complex expression patterns would be impossible if a single activator protein were sufficient to enhance transcription, because all genes containing a common promoter element would be coordinately expressed in a given cell type.

The fundamental aspect of the RNA polymerase II machinery that addresses the diversity problem is that efficient transcription requires the combinatorial action of activator proteins. A single activator protein bound at one site in the promoter typically confers a very low level of gene expression. In contrast, transcription is stimulated much more efficiently (factors of 5-1000) by the combination of multiple activator proteins bound at distinct promoter sites. Most importantly, such transcriptional synergy is frequently observed even when the multiple binding sites are recognized by distinct, and even evolutionarily distant, proteins. As an example of such promiscuity, the combination of the mammalian glucocorticoid receptor and the GAL4 protein is much more effective than either protein alone. Although the mechanism(s) of synergistic and promiscuous activation remains to be elucidated, the requirement for multiple activator proteins at a promoter permits a very large number of possible combinations, each of which might be biologically distinct.

The regulatory flexibility due to transcriptional synergy is greatly enhanced by the ability of activator proteins to function bidirectionally at long and variable distances either upstream or downstream from the mRNA initiation site. Such action at a distance is believed to reflect interactions between distantly bound proteins that are brought into close proximity by looping out of the intervening DNA.

Figure 1. Generation of a Complex Expression Pattern for a Hypothetical Gene Containing Seven Enhancer Elements Upstream of the TATA Element and mRNA Initiation Site
Each of the eight cells (A-H) contains a particular array of activator proteins (1-7) bound directly to the promoter region (closed boxes, with distinctive shadings indicating individual members of a multiprotein family as in 2 and 7) or indirectly via a protein-protein interaction (between X and 7 in cell G); protein 6 is cell type specific, whereas protein 5 is nearly ubiquitous. As a simple arbitrary rule, any three activator proteins can stimulate the basic transcriptional machinery unless a repressor protein is also bound to the promoter (at site 7 in cell F). Four activator proteins permit higher expression levels (cell E), and two activator proteins are insufficient (cell C) unless there is a synergistic protein-protein interaction (cell H).
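The arbitrary rule stated in the Figure 1 legend can be made concrete with a minimal Python sketch. The function name `expression_level` and its three output labels ('off', 'on', 'high') are illustrative. Two further assumptions are not in the legend: that a bound repressor blocks expression whatever the activator count, and that more than four activators behave like four. The second half of the block counts how the 2^7 possible occupancy states of the seven hypothetical sites fall into each class when repressors and synergistic pairs are ignored.

```python
from itertools import combinations


def expression_level(n_activators, repressor_bound=False, synergistic_pair=False):
    """Classify expression under the arbitrary rule of the Figure 1 legend.

    Assumptions (not in the legend): a bound repressor blocks expression
    regardless of activator count, and more than four activators behave
    like four.
    """
    if repressor_bound:
        return "off"
    if n_activators >= 4:
        return "high"
    if n_activators == 3:
        return "on"
    if n_activators == 2 and synergistic_pair:
        return "on"  # two activators suffice only with a synergistic interaction
    return "off"


# Enumerate every subset of the seven hypothetical sites (2**7 = 128 states),
# ignoring repressors and synergistic pairs, and tally the outcomes.
sites = range(1, 8)
tally = {"off": 0, "on": 0, "high": 0}
for k in range(len(sites) + 1):
    for occupied in combinations(sites, k):
        tally[expression_level(len(occupied))] += 1

print(tally)  # {'off': 29, 'on': 35, 'high': 64}
```

Under these assumptions, 99 of the 128 occupancy states permit some expression, which illustrates how a single promoter could respond to many distinct combinations of bound proteins.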

In general, an activator protein becomes less efficient when bound at increasing distances from the initiation site, but individual proteins display considerable variability (in this regard, the common distinction between “promoter” and “enhancer” binding proteins is artificial). Whatever the precise molecular mechanisms involved, the important principle is that a promoter can be subject to the action of numerous proteins whose target sequences can be spread out over a large chromosomal region. Indeed, there are already examples in flies and mammals in which sequences 30-50 kb from the initiation site play an important regulatory role (Grosveld et al., 1987; Karch et al., 1990). Protein-binding sites are often tightly clustered into enhancers that can be moved as a functionally autonomous unit, but such a genomic organization is not essential.

Given these properties, an enormous diversity of transcriptional regulatory patterns can be generated (see Figure 1). In simple cases, dedicated promoters responding to a single activator protein can be arranged by having multiple copies of a common binding site. More typically, genes whose promoters contain multiple distinct sites could be efficiently expressed only when certain developmental or environmental conditions are met simultaneously. Redundant promoters that contain more elements than necessary can permit expression under several different, but specific, circumstances. For the most complex expression patterns, such as observed for genes that determine cell fate or that are responsible for the synthesis of neurotransmitters, numerous protein-binding sites are scattered over large regions of DNA such that a wide variety of different protein combinations can activate transcription. Genetic experiments in Drosophila often reveal that individual elements are required for strikingly discrete portions of the overall pattern. Finally, the principle of combinatorial activation results in regulatory networks in which sets of genes are coordinately controlled by specific environmental or developmental signals, yet the individual genes can be members of many different sets.

Families of Transcription Factors

Although the combinatorial activation process clearly generates an impressive amount of diversity, it is limited by the number of distinct DNA-binding specificities of the activator proteins. The number of possible recognition sequences is limited by the small number of base pairs (typically 6-8) that are involved in high affinity protein-DNA interactions. Moreover, it is very likely that the inherent chemistries of nucleotides and amino acids severely restrict which DNA sequences can serve as protein-binding sites. This restriction is compounded by structural and evolutionary constraints on the number of DNA-binding motifs (e.g., helix-turn-helix, zinc finger, bZIP, and helix-loop-helix).

Multiprotein families of transcription factors that recognize related DNA sequences constitute an important diversity mechanism for overcoming some of the above constraints. Examples of such families are homeodomain proteins that control key developmental decisions (Levine and Hoey, 1988); steroid hormone receptors (Evans and Hollenberg, 1988); the AP-1 and ATF/CREB proteins, which utilize the bZIP structural motif (Curran and Franza, 1988; Hai et al., 1989); and the helix-loop-helix proteins such as MyoD, E12/E47, and achaete-scute (Murre et al., 1989). Such protein families are likely to regulate a core group of genes in a variety of cell types and developmental stages or in response to extracellular signals. However, the individual proteins in each of these families, though structurally and functionally related, do not necessarily have identical DNA-binding specificities. Thus, the precise DNA sequence of a promoter element can determine which particular members of a multiprotein family will regulate the expression of the gene. Moreover, the spectra of genes affected by individual family members could differ dramatically, especially in the common situation in vivo in which the relatively large number of potential binding sites are competing for the limited amounts of protein.

Heterodimer formation between individual members of a protein family can provide an additional diversity mechanism that increases the number of DNA-binding transcription factors. Such dimerization interactions can be mediated by the leucine zipper (Landschulz et al., 1988) or helix-loop-helix (Murre et al., 1989) motifs, and the resulting heterodimers can be functionally distinct from the parental homodimers with respect to their DNA-binding or their transcriptional activation properties (i.e., inherent strength or regulated activity). In a given cell type or under particular environmental circumstances, the constellation of transcription factors of a particular family will depend on the amounts of the individual proteins present and on the relative strengths of the dimerization interactions.

Protein-Protein Interactions

The diversity mechanisms described above make the simplifying assumption that a particular pattern of gene expression reflects synergistic activation that occurs in the absence of direct interactions between the specific transcription factors bound at the promoter. However, such protein-protein interactions clearly occur, and they can result in dramatic transcriptional effects. Although relatively few such protein-protein interactions have been characterized in detail at the present time, it is very likely that they serve as an important source of regulatory diversity.

Protein-protein interactions between transcription factors can influence gene regulation by a variety of distinct molecular mechanisms. First, a DNA-binding protein with low transcriptional activity can be converted to a potent activator by interacting with a separate non-DNA-binding protein that contains a strong acidic activation region. For example, the acidic activation domain of herpesvirus VP16 interacts with the homeodomain of Oct-1 (Stern et al., 1989). Second, interactions between two proteins can result in the cooperative binding of both proteins to target DNA sequences in the promoter under conditions in which neither protein can bind alone. Cooperative DNA binding can involve two molecules of the same protein, as is the case for steroid hormone receptors (Schmid et al., 1989; Tsai et al., 1989), or two distinct protein species, as in the MCM1/α2 interaction (Keleher et al., 1988). As initially described for developmental decisions of bacteriophage λ (Johnson et al., 1981), and as is likely for the initial response to the bicoid gradient morphogen in early Drosophila embryos (Driever et al., 1989; Struhl et al., 1989), cooperative DNA binding provides a means by which the level of gene expression is extremely sensitive to small changes in protein concentration. Third, interactions between two DNA-binding transcription factors can either augment or inhibit gene expression, as observed for the steroid hormone receptors and the AP-1 protein family (Diamond et al., 1990; Schüle et al., 1990; Yang-Yen et al., 1990) as well as for yeast MCM1 and the cell type regulators α1 and α2 (Bender and Sprague, 1987; Keleher et al., 1988). Fourth, heteromeric protein complexes such as CTFI (Chodosh et al., 1988) and HAP2/3/4 (Olesen and Guarente, 1990) can be necessary for a single DNA-binding event, whereas complexes such as a1/α2 can have DNA sequence specificities that differ from either of the individual components (Goutte and Johnson, 1988). Whatever the particular molecular mechanism, the regulatory combinations mediated by protein-protein interactions add a new level of diversity beyond combinatorial activation and multiprotein families.

Modification of Protein Activity

Although much of the diversity in multicellular organisms depends simply upon which transcription factors are present in the various cell types, variations in the activities of the proteins also make a major contribution. Differences in protein activity can occur at the level of DNA binding, inherent transcriptional activation potential, or protein-protein interactions; hence they amplify all the diversity mechanisms described above. One standard means by which protein activity can be altered is by phosphorylation or by other covalent modifications. In the case of phosphorylation, the major protein kinases are activated by second messengers (cAMP, inositol phosphates, diacylglycerol, and calcium) that are generated by signal transduction pathways; however, other protein kinases are almost certainly involved as well. Another classic way to affect protein activity is by allosteric interaction with small molecules (e.g., hormones, amino acids, and cAMP). Both of these mechanisms for altering the activity of specific transcription factors are utilized extensively in prokaryotic organisms, and they provide the major basis for modulating gene expression in response to extracellular signals.

Eukaryotic cells have a novel way to modify protein activity effectively, namely, regulation of nuclear localization. In the case of NF-κB, the protein is translocated to the nucleus only under particular conditions that inactivate a specific inhibitor protein (IκB) which otherwise sequesters NF-κB in the cytoplasm (Baeuerle and Baltimore, 1988). Other members of the NF-κB family, the rel oncoprotein and the dorsal morphogen of Drosophila, presumably function in a similar manner (Gilmore, 1990). A different mechanism for regulating nuclear localization is exemplified by the
glucocorticoid receptor, which in the absence of hormone is excluded from the nucleus by virtue of an interaction with a heat shock protein (Picard et al., 1990).

Negative Regulation

By counterbalancing the actions of activator proteins, transcriptional repressors provide another fundamental mechanism for achieving diversity. Repressors inhibit gene expression by a variety of molecular mechanisms, including competitive DNA binding to coincident or overlapping promoter elements, inactivation of a bound activator protein, or direct repression (silencing) of the basic transcriptional machinery (Levine and Manley, 1989). Regardless of the particular molecular mechanism, repressors contribute to diversity by using the basic principles of combinatorial action, multiprotein families, heterodimerization, protein-protein interactions, and modification of protein activity. Moreover, multiprotein families often include both activators and repressors, and protein-protein interactions can have synergistic or antagonistic consequences for gene expression.

Summary

Despite the relatively low number of transcriptional regulatory proteins, the number of possible combinations that act in particular cell types at specific times and in response to appropriate extracellular stimuli is enormous. In considering the regulatory patterns of a particular gene, the critical determinants of diversity are the specific promoter sequences that govern the potential DNA-binding proteins which function either directly or indirectly in association with other proteins; constellations of proteins in the nucleus and their transcriptional activities; and synergistic or antagonistic protein-protein interactions. Although some of these regulatory principles operate in prokaryotes, the combinatorial nature of the transcriptional activation process, the existence of multiprotein families, and the prevalence of heteromeric protein complexes are characteristic of eukaryotic cells and are essential for the extraordinary complexity of gene expression patterns in multicellular organisms.

References

Baeuerle, P. A., and Baltimore, D. (1988). IκB: a specific inhibitor of the NF-κB transcription factor. Science 242, 540-546.

Bender, A., and Sprague, G. F., Jr. (1987). MATα1 protein, a yeast transcription activator, binds synergistically with a second protein to a set of cell-type-specific genes. Cell 50, 681-691.

Bier, E., Vaessin, H., Shepherd, S., Lee, K., McCall, K., Barbel, S., Ackerman, L., Carretto, R., Uemura, T., Grell, E., Jan, L. Y., and Jan, Y. N. (1989). Searching for pattern and mutation in the Drosophila genome with a P-lacZ vector. Genes Dev. 3, 1273-1287.

Chodosh, L. A., Baldwin, A. S., Carthew, R. W., and Sharp, P. A. (1988). Human CCAAT-binding proteins have heterologous subunits. Cell 53, 11-24.

Curran, T., and Franza, B. R., Jr. (1988). Fos and Jun: the AP-1 connection. Cell 55, 395-397.

Diamond, M., Miller, J. N., Yoshinaga, S. K., and Yamamoto, K. R. (1990). c-jun and c-fos levels specify positive or negative glucocorticoid regulation from a composite GRE. Science 249, 1266-1272.

Driever, W., Thoma, G., and Nüsslein-Volhard, C. (1989). Determination of spatial domains of zygotic gene expression in the Drosophila embryo by the affinity of binding sites for the bicoid morphogen. Nature 340, 363-367.

Evans, R. M., and Hollenberg, S. M. (1988). Zinc fingers: gilt by association. Cell 52, 1-3.

Gilmore, T. D. (1990). NF-κB, KBF1, dorsal, and related matters. Cell 62, 841-843.

Goutte, C., and Johnson, A. D. (1988). a1 protein alters the DNA binding specificity of α2 repressor. Cell 52, 875-882.

Grosveld, F., van Assendelft, G. B., Greaves, D. R., and Kollias, G. (1987). Position-independent, high-level expression of the human β-globin gene in transgenic mice. Cell 51, 975-985.

Hai, T., Liu, F., Coukos, W. J., and Green, M. R. (1989). Transcription factor ATF cDNA clones: an extensive family of leucine zipper proteins able to selectively form DNA-binding heterodimers. Genes Dev. 3, 2083-2090.

Johnson, A. D., Poteete, A. R., Lauer, G., Sauer, R. T., Ackers, G. K., and Ptashne, M. (1981). λ repressor and cro: components of an efficient molecular switch. Nature 294, 217-223.

Karch, F., Bender, W., and Weiffenbach, B. (1990). abdA expression in Drosophila embryos. Genes Dev. 4, 1573-1587.

Keleher, C. A., Goutte, C., and Johnson, A. D. (1988). The yeast cell-type-specific repressor α2 acts cooperatively with a non-cell-type-specific protein. Cell 53, 927-936.

Landschulz, W. H., Johnson, P. F., and McKnight, S. L. (1988). The leucine zipper: a hypothetical structure common to a new class of DNA binding proteins. Science 240, 1759-1764.

Levine, M., and Hoey, T. (1988). Homeobox proteins as sequence-specific transcription factors. Cell 55, 537-540.

Levine, M., and Manley, J. L. (1989). Transcriptional repression of eukaryotic promoters. Cell 59, 405-408.

McKay, R. D. G., and Hockfield, S. J. (1982). Monoclonal antibodies distinguish antigenically discrete neuronal types in the vertebrate central nervous system. Proc. Natl. Acad. Sci. USA 79, 6747-6751.

Mitchell, P., and Tjian, R. (1989). Transcriptional regulation in mammalian cells by sequence-specific DNA binding proteins. Science 245, 371-378.

Murre, C., McCaw, P. S., Vaessin, H., Caudy, M., Jan, L. Y., Jan, Y. N., Cabrera, C. V., Buskin, J. N., Hauschka, S. D., Lassar, A. B., Weintraub, H., and Baltimore, D. (1989). Interactions between heterologous helix-loop-helix proteins generate complexes that bind specifically to a common DNA sequence. Cell 58, 537-544.

Olesen, J. T., and Guarente, L. (1990). The HAP2 subunit of yeast CCAAT transcriptional activator contains adjacent domains for subunit association and DNA recognition: model for the HAP2/3/4 complex. Genes Dev. 4, 1714-1729.

Picard, D., Khursheed, B., Garabedian, M. J., Fortin, M. G., Lindquist, S., and Yamamoto, K. R. (1990). Reduced levels of hsp90 compromise steroid receptor action in vivo. Nature 348, 166-168.

Ptashne, M. (1988). How eukaryotic transcriptional activators work. Nature 335, 683-689.

Sawadogo, M., and Sentenac, A. (1990). RNA polymerase B (II) and general transcription factors. Annu. Rev. Biochem. 59, 711-754.

Schmid, W., Strähle, U., Schütz, G., Schmitt, J., and Stunnenberg, H. (1989). Glucocorticoid receptor binds cooperatively to adjacent recognition sites. EMBO J. 8, 2257-2263.

Schüle, R., Rangarajan, P., Kliewer, S., Ransone, L. J., Bolado, J., Yang, N., Verma, I. M., and Evans, R. M. (1990). Functional antagonism between oncoprotein c-Jun and the glucocorticoid receptor. Cell 62, 1217-1226.

Stern, S., Tanaka, M., and Herr, W. (1989). The Oct-1 homeodomain directs formation of a multiprotein-DNA complex with the HSV transactivator VP16. Nature 341, 624-630.

Struhl, G., Struhl, K., and Macdonald, P. M. (1989). The gradient morphogen bicoid is a concentration-dependent transcriptional activator. Cell 57, 1259-1273.

Struhl, K. (1989). Molecular mechanisms of transcriptional regulation in yeast. Annu. Rev. Biochem. 58, 1051-1077.

Sutcliffe, J. G. (1988). Messenger RNA in the mammalian central nervous system. Annu. Rev. Neurosci. 11, 157-198.

Tsai, S. Y., Tsai, M.-J., and O’Malley, B. W. (1989). Cooperative binding of steroid hormone receptors contributes to transcriptional synergism at target enhancer elements. Cell 57, 443-448.

Yang-Yen, H.-F., Chambard, J.-C., Sun, Y.-L., Smeal, T., Schmidt, T. J., Drouin, J., and Karin, M. (1990). Transcriptional interference between c-Jun and the glucocorticoid receptor: mutual inhibition of DNA binding due to direct protein-protein interaction. Cell 62, 1205-1215.