12 A Primer on Gene Regulation

Goal
To understand the principles of gene regulation.

Objectives
After this chapter, you should be able to
• distinguish between negative and positive control.
• calculate Keq for repressor binding to DNA.
• explain the lac AND gate.

Genes are the units of nucleotide sequence in DNA that specify a protein or non-coding RNA. The full complement of genes in the genome of Escherichia coli is about 5,000, and that in the human genome is about 20,000. These genes are expressed via their transcription into RNA and, in the case of protein-coding genes, subsequent translation into protein. Importantly, not all genes are expressed at the same time. In bacteria, some genes are expressed at a constant rate, but others are turned ON (transcribed) or OFF in response to cues from the environment. In multicellular organisms, such as the embryo of an animal, genes are turned ON in a cell- or tissue-specific manner at the right time and in the right place in response to developmental cues. Gene regulation is a vastly complicated and fascinating subject encompassing an extraordinary range of molecular mechanisms. This chapter is intended as a primer for introducing the classic example of the operon in bacteria and the concepts of negative and positive control.

Genes involved in lactose metabolism are grouped in a single transcription unit, the lactose operon

The subject of gene regulation derives from the seminal discoveries on the lactose (lac) operon of E. coli made by François Jacob and Jacques Monod while they were working at the Institut Pasteur in Paris and for which they shared the Nobel Prize in Physiology or Medicine in 1965. An operon is two or more genes that are co-transcribed from a common promoter as part of a single transcription unit. Thus, an operon is transcribed as a single transcript that contains the coding sequences for two or more proteins.

Figure 1 β-galactosidase cleaves lactose, producing glucose that can fuel cellular metabolism
Glucose metabolism (cellular enzymes): glucose + 6 O2 → 6 CO2 + 6 H2O + energy
Lactose metabolism (β-galactosidase): lactose + H2O → galactose + glucose

(The grouping of genes into operons is common in bacteria but rare in eukaryotes.) The lactose or lac operon contains three genes, lacZ, lacY, and lacA, but we will only be concerned with the most promoter-proximal gene, lacZ, which encodes β-galactosidase. β-galactosidase is an enzyme that enables E. coli to metabolize the sugar lactose. The preferred carbon and energy source for E. coli is glucose, but E. coli will instead metabolize lactose if no glucose is present in the growth medium. Lactose is a disaccharide composed of the sugars galactose and glucose. β-galactosidase cleaves the glycosidic bond (a β-glycosidic bond that links the 1 position of galactose to the 4 position of glucose) that connects galactose and glucose, thereby releasing free glucose and free galactose, which another cellular enzyme converts into glucose (Figure 1).

If E. coli is growing on its preferred carbon source, glucose, then it would be wasteful to produce β-galactosidase. On the other hand, if the growth medium contains lactose and not glucose, then production of β-galactosidase is essential for growth and viability. How does E. coli cope with these conflicting requirements? The answer is that transcription of the operon is subject to a regulatory mechanism that turns ON the operon when lactose is present. (Shortly, we will come to the interesting circumstance when glucose and lactose are present simultaneously.)

The lac repressor negatively regulates the lac operon

How does lactose turn ON transcription of the lac operon? Transcription is controlled by a regulatory protein known as the lactose operon repressor or LacI. The gene for LacI is located just upstream of the lac operon and is transcribed from its own separate promoter. The repressor is a tetramer of four LacI subunits (i.e., it has quaternary structure). The LacI tetramer binds to a nucleotide sequence known as the operator, which overlaps with the promoter for the operon; by binding, LacI blocks RNA polymerase from accessing the promoter and hence blocks transcription (Figure 2). LacI is therefore a paradigmatic example of negative regulation, in which the binding of a regulatory protein to DNA represses transcription. (We will come to positive regulation presently.)

Figure 2 Expression of the lac operon is negatively regulated by LacI
Labeled are the repressor (LacI), the operator, the promoter, the transcription start site (+1), and the genes encoded by the lac operon (lacZ, lacY, and lacA), with upstream to the left and downstream to the right.

How does the lac operon escape repression to turn on the synthesis of β-galactosidase when lactose is present in the growth medium instead of glucose? The answer is that lactose acts as an inducer that binds to LacI, preventing the repressor from binding to the operator (Figure 3). Because LacI forms a tetramer, the inducer has four binding sites on the repressor. The inducer turns ON (derepresses) the operon by preventing the binding of the repressor to the operator and allowing RNA polymerase to bind. (Actually, the inducer is not lactose per se but rather a slightly modified form of lactose called allolactose. When lactose enters the cell, some of it is converted to allolactose by β-galactosidase. The two differ only in that the 1 position of galactose is linked to the 4 position of glucose in lactose and to the 6 position of glucose in allolactose. That the inducer is allolactose and not lactose is an oddity of nature that need not concern us further in what follows.)

Figure 3 Inducer triggers the dissociation of the repressor from the operator
Notice that the repressor exists in two conformations, as indicated in the cartoon by circular and rectangular shapes. With the repressor in its high-affinity conformation bound to operator DNA, transcription is repressed; with the repressor bound to inducer and released from operator DNA, transcription is not repressed.

Figure 4 Inducer shifts the equilibrium between the lac repressor’s high-affinity and low-affinity DNA binding conformations towards the low-affinity conformation
The equilibrium between the high-affinity and low-affinity conformations is labeled Keq < 1; inducer binds the low-affinity conformation.

How exactly does the inducer remove the repressor from the operator? The inducer’s effect is another example of Le Châtelier’s principle (Figure 4). The repressor exists in two conformations: a conformation in which it has high affinity for DNA and a conformation in which it has low affinity for DNA. The two conformations are in equilibrium, with the high-affinity conformation being strongly favored. The inducer, however, only binds to the low-affinity conformation. Therefore, when lactose is present, the inducer binds to the low-affinity conformation and removes it from the high-affinity/low-affinity equilibrium. In order for the ratio of low-affinity to high-affinity repressor to remain equal to the equilibrium constant, there must be a net conversion of high-affinity repressor to the low-affinity conformation. This depletes the amount of repressor in the high-affinity conformation. Taken as a whole, the presence of inducer perturbs the equilibrium between low-affinity and high-affinity conformations, decreasing the amount of high-affinity repressor and ultimately decreasing the amount of repressor bound to DNA.

Let’s look more closely at how the repressor prevents RNA polymerase from binding to the promoter. When RNA polymerase binds to the promoter, it physically contacts a stretch of DNA that extends upstream to roughly position −40 relative to the start site of transcription (recall that the sigma factor contacts the −35 and −10 sequences) and downstream to roughly position +20. Meanwhile, the stretch of DNA contacted by the repressor, the operator, overlaps with the downstream region of the promoter, covering the transcription start site and extending past the end of the promoter (Figure 5). Thus, when the repressor binds to the operator, it physically occludes RNA polymerase.
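Returning to the conformational equilibrium of Figure 4, the Le Châtelier argument can be made concrete with a small numerical sketch. It is only an illustration under stated assumptions: the repressor is treated as having a single inducer site (the real tetramer has four), and the values of `K_CONF` (the low/high conformational ratio without inducer) and `KD_INDUCER` are hypothetical numbers chosen for readability, not measurements from this chapter.

```python
def fraction_high_affinity(k_conf, inducer, kd_inducer):
    """Fraction of repressor in the high-affinity (DNA-binding) conformation.

    k_conf     : [low]/[high] at equilibrium without inducer (< 1, as in Figure 4)
    inducer    : free inducer concentration (M)
    kd_inducer : dissociation constant of inducer for the low-affinity form (M)

    Simplifying assumption: one inducer site per repressor.
    """
    # Free low-affinity repressor stays in ratio k_conf with the high-affinity
    # form; inducer-bound low-affinity repressor adds to the low-affinity pool.
    low_total_per_high = k_conf * (1.0 + inducer / kd_inducer)
    return 1.0 / (1.0 + low_total_per_high)


K_CONF = 0.01       # hypothetical: high-affinity form strongly favored
KD_INDUCER = 1e-6   # hypothetical inducer dissociation constant (M)

for inducer in (0.0, 1e-6, 1e-4, 1e-2):
    frac = fraction_high_affinity(K_CONF, inducer, KD_INDUCER)
    print(f"[inducer] = {inducer:.0e} M -> {100 * frac:5.1f}% high-affinity")
```

As the inducer concentration rises, the computed fraction of high-affinity repressor falls, which is the depletion described in the text.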

Figure 5 Binding of the repressor to the operator occludes RNA polymerase
Shown are the DNA binding sites for RNA polymerase, the repressor, and CAP, which is introduced below.

5’ AATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACATTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCACAC 3’
3’ TTACACTCAATCGAGTGAGTAATCCGTGGGGTCCGAAATGTAAATACGAAGGCCGAGCATACAACACACCTTAACACTCGCCTATTGTTAAAGTGTG 5’

Labeled features: CAP site, −35 sequence, −10 sequence, transcription start site (+1), DNA covered by RNA polymerase, and DNA covered by repressor.

Figure 6 The operator is composed of two palindromic “half-sites”
Shown is a surface representation of the repressor. The repressor exists in the cell as a tetramer composed of four polypeptide chains; however, only the two polypeptide chains that contact the operator are shown (cyan and green). The lac operator DNA is also shown. On the bottom is a diagram of the two palindromic “half-sites” of the operator. Dashes indicate bases that are not identical between the half-sites.

lac operator DNA:
5’ GGAATTGTGAGCGGATAACAATTTC 3’
3’ CCTTAACACTCGCCTATTGTTAAAG 5’

Half-sites:
5’ AATTGT-A-C 3’
3’ C-A-TGTTAA 5’

The sequence of bases that makes up the operator is present in two copies in an inverted repeat (or head-to-head) orientation, meaning that the operator is a palindrome (Figure 6). Because of its symmetry, the operator can be divided into two “half-sites.” Two of the four polypeptide subunits of the tetrameric repressor contact the operator, with one subunit contacting one half-site and the other subunit contacting the other half-site. The binding of proteins with quaternary structure to repeated sequences in DNA is a common theme among DNA-binding proteins in both bacteria and eukaryotes.
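The inverted-repeat symmetry of the operator can be checked directly by comparing the top strand of Figure 6 with its own reverse complement. The sketch below is illustrative only; the helper names `reverse_complement` and `symmetry_mask` are made up for this example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Top strand of the lac operator as given in Figure 6 (5' to 3').
OPERATOR = "GGAATTGTGAGCGGATAACAATTTC"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def symmetry_mask(seq):
    """Keep bases that match the reverse complement; mark the rest with '-'."""
    rc = reverse_complement(seq)
    return "".join(b if b == r else "-" for b, r in zip(seq, rc))


mask = symmetry_mask(OPERATOR)
print(OPERATOR)
print(mask)
print(mask.count("-"), "of", len(OPERATOR), "positions break the symmetry")
```

A perfect palindrome would produce no dashes. The lac operator is an imperfect palindrome, and the dashes within each half-site correspond to the non-identical bases marked in Figure 6.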

The equilibrium binding constant describes the affinity of the repressor for the operator

How do we measure the affinity of the repressor for the operator, that is, its equilibrium binding constant? Like a typical equilibrium constant, the equilibrium binding constant can be expressed as the concentration of the product (the repressor-operator complex) divided by the product of the reactant concentrations (free repressor and free operator), as shown in Figure 7. The larger the binding constant, the higher the repressor’s affinity for DNA. In other words, the conformation of the lac repressor that has a high affinity for the operator has a higher binding constant than the conformation that has a low affinity for the operator.

Figure 7 The equilibrium binding constant (Keq) for the formation of the repressor-operator complex

Repressor (“R”) + Operator (“O”) ⇌ Repressor-Operator Complex (“R-O”)

$$K_{eq} = \frac{[\text{R-O}]}{[\text{R}]\,[\text{O}]}$$

Figure 8 Electrophoretic mobility shift assays are used to experimentally determine the equilibrium binding constant
Lanes contain increasing repressor concentrations, [R]: 5×10⁻¹⁵, 1×10⁻¹⁴, 5×10⁻¹⁴, 1×10⁻¹³, 5×10⁻¹³, 1×10⁻¹², and 5×10⁻¹²; [O] is the same in each lane. DNA migrates from the negative pole (top) towards the positive pole (bottom), with operator bound by repressor running above free operator DNA. The repressor concentration at which 50% of the DNA is bound is marked.

A simple experimental procedure for measuring the binding constant is the electrophoretic mobility shift assay (Figure 8). A segment of DNA containing an operator site is subjected to electrophoresis through a gel in the presence of an electric field in which the negative pole is at one end (top end in the figure) of the gel and the positive pole at the other. The DNA is applied to the end with the negative pole. The DNA molecules, being negatively charged, migrate through the gel, away from the negative pole and towards the positive pole at the bottom of the gel. This procedure is carried out in the presence of increasing concentrations of repressor. Free DNA molecules migrate with the highest mobility, whereas DNA molecules that are bound by repressor migrate more slowly through the gel owing to their cargo of repressor protein. This experiment results in a gel that shows the proportion of DNA that is bound to the repressor at each concentration of repressor. A plot of the data yields a simple saturation curve in which DNA binding increases with increasing repressor concentration, as shown in Figure 9. A control experiment (not shown in the figures) employs a DNA segment that lacks an operator and for which the repressor therefore has only low affinity.

The binding constant (Keq) can be calculated from the saturation curve by determining the repressor concentration at which half of the DNA is bound to repressor. At this concentration, the concentration of repressor bound to DNA, [R-O], is equal to the concentration of unbound operator, [O]. Thus, in the equation for the binding constant, [R-O] and [O] cancel out, leaving Keq equal to the inverse of the repressor concentration (1/[R]). This yields an equilibrium binding constant for repressor binding of 10¹³ M⁻¹. (Strictly speaking, this is an approximation, because the concentration of free repressor molecules when 50% of the operator DNA is bound is slightly less than the original repressor concentration used to set up the electrophoretic mobility shift experiment. This is because some of the repressor molecules are bound to the operator. Since the concentration of repressor in these experiments is much larger than the concentration of operator DNA, the discrepancy is negligible.)

Figure 9 The equilibrium binding constant is the inverse of the [R] at which 50% of the operator DNA is bound by repressor
The percentage of operator DNA that is bound by repressor (vertical axis, 0 to 100%) is plotted versus the repressor concentration, [R] (horizontal axis, in M). These data represent the data shown in Figure 8. The value of [R] at which 50% of the operator is bound by repressor, 1 × 10⁻¹³ M, is indicated. Because [R-O] = [O] when 50% of the operator DNA is bound by the repressor, these two values cancel out in the equation:

$$K_{eq} = \frac{[\text{R-O}]}{[\text{R}]\,[\text{O}]} = \frac{1}{[\text{R}]} = \frac{1}{1 \times 10^{-13}\ \text{M}} = 1 \times 10^{13}\ \text{M}^{-1}$$

The equilibrium binding constant as defined here is an association constant. Binding affinity is often reported instead as a dissociation constant, which is simply the inverse of the association constant or, in this case, 10⁻¹³ M.
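The half-saturation calculation can be sketched in a few lines of code. The relation used for the fraction of operator bound, Keq[R]/(1 + Keq[R]), follows from rearranging Keq = [R-O]/([R][O]); the function names are made up for this illustration, and the code adopts the same approximation as the text, namely that free repressor is roughly equal to the total repressor added.

```python
def fraction_bound(keq, free_repressor):
    """Fraction of operator DNA bound by repressor.

    From Keq = [R-O] / ([R][O]):
        [R-O] / ([O] + [R-O]) = Keq*[R] / (1 + Keq*[R])
    """
    x = keq * free_repressor
    return x / (1.0 + x)


def keq_from_half_saturation(r_half):
    """Keq is the inverse of the [R] at which half of the operator is bound."""
    return 1.0 / r_half


R_HALF = 1e-13  # M, read from the saturation curve in Figure 9
keq = keq_from_half_saturation(R_HALF)
print(f"Keq = {keq:.1e} per M")

# Repressor concentrations of the gel lanes in Figure 8, taken to be in M
# as on the axis of Figure 9.
for r in (5e-15, 1e-14, 5e-14, 1e-13, 5e-13, 1e-12, 5e-12):
    print(f"[R] = {r:.0e} M -> {100 * fraction_bound(keq, r):5.1f}% bound")
```

The loop traces out a saturation curve of the kind plotted in Figure 9, passing through 50% bound at [R] = 1/Keq.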

The lac operon is subject to both positive and negative control

Although the lac operon was and is a paradigmatic example of negative control, it later emerged that it is also a classic example of positive regulation. Many genes, indeed most, are subject to positive control. That is, their expression depends on an activator, which is a DNA-binding protein that turns ON transcription by binding to DNA (as opposed to blocking transcription as in negative regulation). In addition to being subject to negative control by repressor binding to the operator at the downstream end of the promoter, the lac operon is subject to positive control by an activator called CAP. CAP binds to a site just upstream of the promoter such that both CAP and RNA polymerase can sit side-by-side on the DNA. This is in contrast to the repressor, whose binding site overlaps with the binding site for RNA polymerase.

Why does RNA polymerase require the assistance of CAP to bind to the promoter in the presence of inducer? If inducer is present, then, as we have seen, the LacI repressor is not bound to the operator and hence RNA polymerase should be able to bind to the promoter and initiate transcription. The answer is that the lac promoter is a poor match to the −35 and −10 consensus sequences. As you will recall, the ideal −35 and −10 sequences are 5’-TTGACA-3’ and 5’-TATAAT-3’, respectively. The promoter for the lac operon differs from these ideal sequences at three positions, as shown in Figure 10. Hence, the lac promoter is an intrinsically weak promoter to which RNA polymerase only weakly binds. This is the basis for positive control; an activator compensates for the promoter’s poor match to the consensus sequence by helping to facilitate the binding of RNA polymerase.

Figure 10 The lac operon requires positive regulation because its promoter sequence deviates from the consensus

               −35       −10
consensus      TTGACA    TATAAT
lac promoter   TTTACA    TATGTT
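The comparison between the lac promoter and the consensus can be verified position by position. The short sketch below simply counts the differences between the −35 and −10 sequences given in the text; the function name `mismatches` is made up for this illustration.

```python
# -35 and -10 hexamers as given in the text and Figure 10 (5' to 3').
CONSENSUS = {"-35": "TTGACA", "-10": "TATAAT"}
LAC_PROMOTER = {"-35": "TTTACA", "-10": "TATGTT"}


def mismatches(reference, query):
    """Return the 1-based positions at which two equal-length sequences differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(reference, query)) if a != b]


total = 0
for element in CONSENSUS:
    positions = mismatches(CONSENSUS[element], LAC_PROMOTER[element])
    total += len(positions)
    print(f"{element} element: differs from consensus at position(s) {positions}")

print("Total mismatches:", total)
```

The tally should agree with the three deviations noted in the text: one in the −35 sequence and two in the −10 sequence.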
How does CAP facilitate the binding of RNA polymerase? It does so by directly contacting the RNA polymerase, and the favorable free energy from this protein-protein interaction helps to stabilize the binding of RNA polymerase to the otherwise weak promoter (Box 1). Situations such as these, in which an activator stabilizes the binding of RNA polymerase to DNA, are often referred to as recruiting RNA polymerase.

Box 1 CAP recruits RNA polymerase

What is the nature of the contact site between CAP and RNA polymerase? The cartoon of Figure 11 shows that RNA polymerase is a heteromeric complex consisting of subunits known as α, β, and β’ in addition to the sigma (σ) subunit, which contacts the −35 and −10 sequences. The α subunit has two domains, an N-terminal domain (NTD) and a C-terminal domain (CTD). CAP, which binds to DNA as a dimer, makes contact with the RNA polymerase in the C-terminal domain of the α subunit, which protrudes from the back side of the RNA polymerase.

Figure 11 CAP enhances RNA polymerase’s ability to bind to the promoter
RNA polymerase is shown in shades of purple; CAP is shown in green. Labeled are CAP, the α NTD, α CTD, β, β’, and σ subunits of RNA polymerase, the CAP site, the −35 and −10 sequences, and the region transcribed into mRNA.

Figure 12 A dimer of CAP∙cAMP bound to DNA contacts the C-terminal domain of the α subunit of RNA polymerase
Shown is a CAP∙cAMP dimer (green and cyan) bound to its contact site on RNA polymerase (α CTD, red) and to the CAP binding site on DNA.

Binding of CAP to DNA depends on a cyclic nucleotide

Just as the affinity of the LacI repressor for DNA is governed by a small molecule, the inducer allolactose, the ability of CAP to adhere to its binding site is strongly influenced by a small molecule, 3’,5’-cyclic adenosine monophosphate (cAMP) (Figure 13). Whereas the lactose inducer lowers the affinity of the LacI repressor for its operator, cAMP stimulates the binding of CAP to its binding site in DNA.

Figure 13 3’,5’-cyclic adenosine monophosphate (cAMP)

What is the meaning of subjecting the lac operon to positive control by a complex of CAP and cAMP? The answer is that the concentration of cAMP in the cell is not constant. Rather, it varies in a manner that is influenced by the carbon source. If the cells are growing on glucose, then the levels of cAMP in the cell are low. But if the cell is growing on a carbon source other than glucose (e.g., lactose), then the levels of the cyclic nucleotide are high. Thus, subjecting the lac operon to positive control by CAP∙cAMP ties expression of the lac operon to whether or not the cells are growing on glucose. If cells are growing on glucose, the preferred carbon source for E. coli, then cAMP levels will be low, and the lac operon will be OFF whether or not lactose is present. If, on the other hand, the only carbon source is lactose, then cAMP levels will be high, enabling CAP to bind to its binding site and allowing the lac operon to be ON.

The lac operon is subject to an AND gate

As we have seen, the lac operon is subject to both positive and negative control. When lactose is present, the LacI repressor dissociates from the operator. But the presence of lactose is not the only condition that must be met in order for the lac operon to be expressed. The operon is ON if, and only if, two conditions are met: glucose is absent and lactose is present (Figure 14). Thus, if E. coli is growing on glucose alone or on a mixture of glucose and lactose, then the operon is OFF. The lac operon is therefore said to be subject to the logic of an AND gate, borrowing the term from computer science. From the cell’s perspective, the AND gate is exquisitely sensible. E. coli does not wastefully express the lac operon when its favored food source, glucose, is available, nor does it express the operon when both glucose and lactose are absent and the cells are growing on some other carbon source (e.g., maltose).
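The AND-gate logic described above can be written out as a small truth table. This is a deliberately simplified Boolean sketch: real expression levels are graded rather than strictly ON or OFF, and the function names are made up for this illustration.

```python
def repressor_on_operator(lactose_present):
    """LacI stays bound to the operator unless lactose supplies the inducer."""
    return not lactose_present


def cap_on_dna(glucose_present):
    """cAMP is high, and CAP binds its site, only when glucose is absent."""
    return not glucose_present


def lac_operon_on(glucose_present, lactose_present):
    """ON only if CAP is bound AND the repressor is off the operator."""
    return cap_on_dna(glucose_present) and not repressor_on_operator(lactose_present)


print("glucose  lactose  transcription")
for glucose in (True, False):
    for lactose in (False, True):
        state = "ON" if lac_operon_on(glucose, lactose) else "OFF"
        print(f"{'Yes' if glucose else 'No':<8} {'Yes' if lactose else 'No':<8} {state}")
```

Only the combination of no glucose and lactose present yields ON, matching the four conditions summarized in Figure 14.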

Figure 14 The lac operon is regulated by an AND gate

glucose   lactose   Transcription
Yes       No        OFF
Yes       Yes       OFF
No        No        OFF
No        Yes       ON

Summary

The lac operon in E. coli is a three-gene transcription unit that includes the gene for β-galactosidase, an enzyme that converts the disaccharide lactose into galactose and glucose. The lac operon is subject to negative control by the LacI repressor. The repressor binds to an operator site that overlaps with the promoter for the operon, thereby occluding RNA polymerase and blocking transcription. Repression is relieved by the presence of lactose, from which the inducer is derived. The repressor, a tetramer, exists in an equilibrium between a conformation with a high affinity for the operator and a conformation with a low affinity. When lactose is present, the inducer binds to the low-affinity conformation of the repressor, draining it from the equilibrium and decreasing the amount of repressor that is bound to DNA. A simple technique for measuring the binding of repressor to operator is the electrophoretic mobility shift assay, which takes advantage of the impaired mobility in an electric field of DNA molecules to which repressor is bound. The equilibrium binding constant of the repressor for its operator can be derived from the experimentally determined concentration of repressor at which half of the operator DNA is bound by using the equation Keq = [R-O]/([R][O]). The lac operon is also subject to positive regulation by CAP and cAMP, which compensates for the poor match to consensus of the −35 and −10 sequences of the operon’s promoter. A dimer of CAP∙cAMP binds to a site just upstream of the promoter, contacting RNA polymerase and stabilizing its binding to the promoter (that is, recruiting the RNA polymerase). Because cAMP levels are depressed under conditions of growth on glucose, the operon is OFF when cells are grown on their preferred carbon source whether or not lactose is present. Thus, the lac operon is subject to an AND gate in which two conditions must be met in order for the operon to be expressed: the absence of glucose and the presence of lactose.