st8703.qxd 07/03/2000 09:38 Page 739

Research Article

Combining structure-based design with phage display to create new Cys₂His₂ dimers

Scot A Wolfe, Elizabeth I Ramm and Carl O Pabo*

Address: Howard Hughes Medical Institute and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

*Corresponding author.
E-mail: [email protected]

Key words: dimerization, DNA recognition, leucine zippers, phage display, structure-based design, zinc fingers

Received: 13 March 2000
Accepted: 28 April 2000
Published: 21 June 2000

Structure 2000, 8:739–750

0969-2126/00/$ – see front matter
© 2000 Elsevier Science Ltd. All rights reserved.

Background: Several strategies have been reported for the design and selection of novel DNA-binding proteins. Most of these studies have used Cys₂His₂ zinc finger proteins as a framework, and have focused on constructs that bind DNA in a manner similar to Zif268, with neighboring fingers connected by a canonical (Krüppel-type) linker. This linker does not seem ideal for larger constructs because only modest improvements in affinity are observed when more than three fingers are connected in this manner. Two strategies have been described that allow the productive assembly of more than three canonically linked fingers on a DNA site: connecting sets of fingers using linkers (covalent), or assembling sets of fingers using dimerization domains (non-covalent).

Results: Using a combination of structure-based design and phage display, we have developed a new dimerization system for Cys₂His₂ zinc fingers that allows the assembly of more than three fingers on a desired target site. Zinc finger constructs employing this new dimerization system have high affinity and good specificity for their target sites both in vitro and in vivo. Constructs that recognize an asymmetric binding site as heterodimers can be obtained through substitutions in the zinc finger and dimerization regions.

Conclusions: Our modular zinc finger dimerization system allows more than three Cys₂His₂ zinc fingers to be productively assembled on a DNA-binding site. Dimerization may offer certain advantages over covalent linkage for the recognition of large DNA sequences. Our results also illustrate the power of combining structure-based design with phage display in a strategy that assimilates the best features of each method.

Introduction

Cys₂His₂ zinc fingers function as recognition domains that facilitate protein–DNA, protein–RNA, and protein–protein interactions [1]. Each zinc finger is a minidomain of approximately 30 amino acids that folds into a ββα motif in the presence of zinc. Tandem repeats of fingers are usually employed for DNA or RNA recognition, and DNA recognition is mediated by the α helix of each finger. The α helix inserts into the major groove such that the i, i + 3 ridge of the helix is aligned with neighboring bases along one strand of the DNA [2]. Fingers that bind in a canonical manner (i.e., like Zif268) recognize overlapping four-base-pair subsites using amino acids at positions –1, 2, 3 and 6 of the α helix (Figure 1) [1]. Binding sites for individual fingers overlap because the amino acid at position 2 in the recognition helix from one finger often contacts the same base pair as the amino acid at position 6 in the recognition helix from the neighboring N-terminal finger [3].

Cys₂His₂ zinc finger proteins have proven to be a versatile motif for the recognition of a variety of different DNA sequences. Using the structure of Zif268 [4] as a framework, zinc fingers with novel specificities have been created both by design [5–8] and by selection via phage display [9–12]. Recurring patterns observed in the DNA contacts by zinc fingers have led to proposals for a zinc finger ‘recognition code’, but recognition is sufficiently complicated that selection via phage display remains the most reliable method for the generation of fingers with novel specificities [1,13]. The assembly of designed or selected fingers into functional proteins also is complicated by the partial overlap in the recognition sites of neighboring fingers. Methods that help overcome this problem with multifinger proteins have recently been described [14–16].

Many linkers found between DNA-binding zinc fingers conform to the consensus sequence TGEKP (in single-letter amino acid code; Figure 1b). These canonical linkers are well ordered in the structures of zinc finger–DNA complexes [3,4,17,18], and mutagenesis studies indicate that they play an important role in DNA recognition [19,20]. Because Zif268 (and to a lesser extent SP1) served as the prototype for the design and selection of fingers with novel specificities, TGEKP linkers have been used almost exclusively in the development of

740 Structure 2000, Vol 8 No 7

multifinger proteins with novel specificities. Several groups have noted that as the number of TGEKP-linked fingers increases from one to two to three, there is an accompanying increase in DNA-binding affinity. (Roughly speaking, the affinity increases from millimolar to micromolar to nanomolar.) However, the attachment of additional fingers (four or more in total) using the TGEKP linker provides only modest additional improvement in affinity [21–23]. The accompanying structural and energetic problems arising from the presence of four or more fingers are not entirely clear, but these difficulties may originate from the distortion of the DNA that zinc fingers elicit upon binding in the major groove [24]. (In the resulting complex, DNA assumes a variant B-form conformation with about 11 base pairs per turn and an enlarged major groove.)

Figure 1

(a) [Schematic of the contacts between fingers 1–3 of Zif268 and the bases of the DNA site, drawn from helix positions –1, 2, 3 and 6 of each finger.]

(b)
Finger 1   PYACPVESCDRRFSRSDELTRHIRIHTGQK
Finger 2   PFQCRI--CMRNFSRSDHLTTHIRTHTGEK
Finger 3   PFACDI--CGRKFARSDERKRHTKIHLRQK
(Helix positions –1 to 9 are numbered above the alignment. Color key: DNA recognition; zinc coordination; canonical linker.)

Wild-type Zif268 recognizes its binding site using three Cys₂His₂ zinc fingers. (a) Overall arrangement seen in the 1.6 Å structure of Zif268 bound to DNA [3]. Each finger inserts its α helix into the major groove, and DNA recognition is mediated by amino acids at positions –1, 2, 3 and 6 from this helix, with the majority of contacts made to one strand of the DNA (primary strand). Interactions that are mediated by hydrogen bonds between the fingers and DNA are indicated by arrows in the panel on the right; hydrophobic interactions are indicated by open circles. (b) Amino acid sequence of the three Cys₂His₂ zinc fingers from Zif268 [43]. Adjacent fingers in this structure are connected by ‘canonical’ linkers (TGQKP and TGEKP, respectively; indicated in avocado) [1]. The residues from each finger involved in defining DNA specificity are indicated in blue; residues involved in zinc coordination are indicated in purple. The secondary structure elements are indicated above the sequence with arrows indicating β strands and the cylinder denoting α helices.

Given the complexity of the human genome (~3 × 10⁹ base pairs), zinc finger proteins with a recognition sequence larger than ten base pairs (i.e., proteins with more than three fingers) may be required to obtain efficacy as a gene therapy construct. Two approaches have been used to make proteins that specifically recognize longer sites. These involve either insertion of a flexible linker between two sets of canonically linked fingers [25], or attachment of a dimerization domain that allows cooperative assembly of the zinc finger proteins on the DNA [26]. Femtomolar dissociation constants have been observed for a six-finger construct (268//NRE) that has two three-finger proteins connected by a flexible linker [25]. Improvements in affinity and specificity observed with this six-finger construct are encouraging, but they are still far below the values that would be anticipated, based on the chelate effect, for a construct containing two linked three-finger proteins.

Many naturally occurring transcription factors use dimerization to enhance their affinity and specificity, and heterodimerization is sometimes used in natural regulatory systems to provide an additional level of transcriptional control. In principle, dimerization of zinc fingers on DNA could offer a number of advantages for the assembly of four or more fingers [27,28]. Cooperative recognition of a binding site provides a sharper transition between the bound and unbound states as a function of protein concentration, thus providing tighter control of the ‘on’ and ‘off’ regulatory states. In the presence of a large number of non-cognate binding sites, cooperative binding to the DNA may also provide faster equilibration rates than possible with a covalent fusion of the same number of fingers.

Our group recently described a prototype dimerization system for zinc fingers in which the coiled-coil dimerization domain of Gal4 is fused via a long linker to the C terminus of two fingers from Zif268 [26]. This fusion protein (ZFGD1) bound its DNA sequence with a reasonable affinity (Kd = 10⁻¹⁸ M²) and can accommodate zinc fingers with different sequence specificities, thus allowing recognition of asymmetric binding sites. Building on our experience with ZFGD1, we developed an improved dimerization system for zinc fingers. Computer modeling (structure-based design) suggested that it should be possible to create a pseudo-continuous α helix joining a leucine zipper dimerization domain directly to the DNA recognition helix of a zinc finger. This fusion should give a much more rigid dimer interface than with Gal4 and, consequently, help to define a preferred half-site spacing for the complex. This new design also juxtaposes the binding sites for each monomer, allowing recognition of a contiguous DNA sequence. (The ZFGD1 construct, like Gal4 itself, contained an unrecognized 13-base-pair region at the center of its site.)

In this paper, we describe the creation of a dramatically improved dimerization system for zinc fingers that is fully

Research Article Design and selection of zinc finger dimers Wolfe, Ramm and Pabo 741

modular: the leucine zipper region of this fusion can be changed to alter its dimerization properties and the zinc fingers can be substituted to modulate its DNA specificity. These dimeric zinc finger proteins recognize DNA cooperatively and have good specificity and affinity for the desired target sites both in vitro and in vivo. Given the potential advantages of dimerization for the recognition of large DNA target sites, dimeric zinc finger proteins provide a promising alternative platform for the creation of constructs that may be used for gene therapy.

Results

Design and DNA-binding properties of the Zif23–GCN4 chimera

Structure-based design [29] was used to create an initial model for the fusion between two zinc fingers from Zif268 and a leucine zipper dimerization domain. Our first designs used the GCN4 leucine zipper, which can form homodimers and which shares a relevant base-specific interaction (arginine ↔ guanine) with that of finger 3 from Zif268. Initial juxtaposition of the structures of the GCN4 [30,31] and Zif268 [3,4] protein–DNA complexes proceeded by overlaying the basic recognition helix of GCN4 and finger 3 of Zif268 using the arginine ↔ guanine interaction to establish the registers of the recognition helices. This model led us to believe that a direct fusion between the leucine zipper of GCN4 and the recognition helix from finger 3 of Zif268 could generate a continuous helix without altering the orientation of the zinc fingers in the major groove.

A number of models of a dimeric Zif268–GCN4 complex were then constructed, using the structure-based design approach to predict the DNA half-site spacing that this chimera would prefer. The best alignment of the recognition helices occurred with GCN4 bound to the pseudo-dyad site (TGACTCA) in an arrangement that created a one-base-pair overlap between the half-sites recognized by each monomer (Figure 2a). It is noteworthy that a corresponding one-base-pair overlap is observed in the wild-type GCN4 structure on the pseudo-symmetric site.

Figure 2

(a) [Model image. Labels: GCN4 = leucine zipper; Zif268 = fingers 2 and 3.]

(b)
Zif268 finger 3   ~PFACDICGRKFARSDERKRHTKIHLRQK~
GCN4              ~SDPAALKRARNTEAARRSRARKLQRMKQLEDKVEEL~
(Helix positions –1 to 9 of finger 3 are numbered above the alignment.)

(c)
Zif23–GCN4        ~PFACDICGRKFARSDERKRHRKLQHMKQLEDKVEEL~
(Regions labeled Zif268, linker and GCN4. Color key: DNA recognition; coiled-coil interaction; zinc coordination.)

Structure-based design of the Zif23–GCN4 chimera. (a) Model of the Zif23–GCN4 chimera bound to a DNA site with a one base pair overlap between the recognition sites of neighboring monomers. This model is a fusion of the DNA-bound structures of Zif268 [3] and GCN4 [30]. (The base pair recognized by both of the neighboring monomers is indicated in orange.) (b) Alignment of the amino acid sequences of finger 3 of Zif268 and the DNA-binding domain of GCN4 based on the best superposition of these structures. This structural superposition aligns the final arginine in each structure that makes a base-specific contact (boxed). The secondary structure elements are indicated next to each sequence: arrows indicate β strands and cylinders denote α helices. Tildes at the ends indicate that each displayed sequence is only a segment of the entire gene. (c) Sequence of the region that fuses fingers 2 and 3 from Zif268 to the leucine zipper from GCN4 (Zif23–GCN4).

A fusion protein containing fingers 2 and 3 of Zif268 and the leucine zipper of GCN4 was created based on this model. The final, shared arginine ↔ guanine interaction provides the fundamental ‘crossover point’ for the fusion of the two sequences (Figure 2b). Thus our fusion protein (Zif23–GCN4) has the Zif268 sequence preceding this shared arginine (boxed in Figure 2b) and generally has GCN4 sequences following this arginine (Figure 2c). It was necessary to include two histidine residues in the subsequent region to complete the zinc coordination site for finger 3. We placed one histidine immediately after the common arginine (thus matching its position in Zif268) and placed the other histidine five residues further along in the sequence. A four-residue spacing between the histidines was chosen to encourage a conformation similar to an α helix in this region (as observed in the Tramtrack structure [32]). Note that a 3₁₀ helix is typically observed in fingers (like those in Zif268) that have only three intervening residues between the histidines [1]. We expected that this four-residue spacing would be more conducive to the generation of a fully continuous α helix at this junction in Zif23–GCN4.

Zif23–GCN4 was expressed and purified, and dissociation constants were measured for sites with several different half-site spacings (Figure 3). In these oligonucleotides, the six-base-pair binding sites for each Zif23–GCN4 monomer are arranged in an inverted (i.e., symmetric) orientation and are spaced so that they overlap by zero, one or two base pairs. (These are hereafter referred to as the 0,


Figure 3

(a) [Gel image: lanes with increasing [Zif23–GCN4]; bands labeled protein–DNA complex and free DNA.]

(b)
Binding site   Dissociation constant (M²)   Relative affinity
–2             3.50 × 10⁻¹⁹ ± 1.10           1
–1             1.26 × 10⁻¹⁷ ± 1.08           36
0              9.18 × 10⁻¹⁸ ± 8.81           26
Half           1.63 × 10⁻¹⁶ ± 1.54           466

[Scatchard plot: θ/[P]² (M⁻²) versus θ.]

Binding site sequences:
–2 site     -CCCACGCGTGGG-
            -GGGTGCGCACCC-
–1 site     -CCCACGGCGTGGG-
            -GGGTGCCGCACCC-
0 site      -CCCACGCGCGTGGG-
            -GGGTGCGCGCACCC-
Half site   -TTATAGCGTGGG-
            -AATATCGCACCC-

Dissociation constants for the binding of Zif23–GCN4 to oligonucleotides containing various half-site spacings. (a) Binding of Zif23–GCN4 to the –2 site as a function of protein concentration. Each lane represents a 1.5-fold change in protein concentration. The mobility of the Zif23–GCN4 shift relative to Zif268, which contains three zinc fingers, is consistent with formation of a dimeric complex (data not shown). (b) Binding constants and representative Scatchard analysis of Zif23–GCN4 for various binding sites. The electrophoretic mobility shift assay data indicate that the binding of Zif23–GCN4 to the various sites is cooperative. The sequences of the binding sites are given at the bottom of the panel, and the half-sites recognized by the zinc fingers in each Zif23 monomer (5′-GCGTGG-3′) are underlined. Note that the –1 site only has pseudo-dyad symmetry given that the second half-site has an imperfect repeat (5′-CCGTGG-3′).
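As an illustrative aside, not part of the original analysis, the second-order binding behavior summarized in Figure 3b can be sketched numerically. The sketch assumes a simple all-or-none model, 2P + D ⇌ P₂D with Kd = [P]²[D]/[P₂D] in M², assumes that free protein is approximately equal to total protein, and assumes a temperature of 298.15 K for the free-energy conversion; all function and variable names are hypothetical. Under this model θ/[P]² = (1 − θ)/Kd, which is why a Scatchard-type plot of θ/[P]² against θ is expected to be linear for cooperative dimer binding.

```python
import math

# Apparent dissociation constants (M^2) for Zif23-GCN4, copied from Figure 3b.
KD_M2 = {"-2": 3.50e-19, "-1": 1.26e-17, "0": 9.18e-18, "half": 1.63e-16}

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)
ASSUMED_TEMP_K = 298.15      # assumption: temperature is not given in this section


def fraction_bound(protein_m, kd_m2):
    """Fraction of DNA bound (theta) for 2P + D <-> P2D, free protein ~ total."""
    p_sq = protein_m ** 2
    return p_sq / (kd_m2 + p_sq)


def scatchard_y(theta, kd_m2):
    """theta/[P]^2 predicted by the same model; linear in theta, slope -1/Kd."""
    return (1.0 - theta) / kd_m2


def half_saturation_conc(kd_m2):
    """Protein concentration (M) giving theta = 0.5, i.e. sqrt(Kd)."""
    return math.sqrt(kd_m2)


def relative_affinity(site, reference="-2"):
    """Kd for a site divided by Kd for the reference site."""
    return KD_M2[site] / KD_M2[reference]


def ddg_kcal_per_mol(fold_change, temp_k=ASSUMED_TEMP_K):
    """Free-energy difference (kcal/mol) for a given fold change in Kd."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(fold_change)


if __name__ == "__main__":
    for site, kd in KD_M2.items():
        fold = relative_affinity(site)
        print(
            f"{site:>4} site: half-saturation {half_saturation_conc(kd):.2e} M, "
            f"relative affinity {fold:.0f}, ddG {ddg_kcal_per_mol(fold):.2f} kcal/mol"
        )
```

With the tabulated values, this model places half-saturation of the –2 site near 0.6 nM protein, and the 36-fold preference over the –1 site corresponds to roughly 2.1 kcal/mol at the assumed temperature.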

–1 and –2 sites respectively.) As a control, we used an oligonucleotide containing only one half-site. Scatchard analysis of the data reveals a second order dependence on protein concentration for complex formation, indicating that Zif23–GCN4 binds cooperatively as a dimer (Figure 3b). Zif23–GCN4 displayed the highest affinity for the –2 site, and preferred this site by more than an order of magnitude over the 0 and –1 sites. This observed preference may reflect the fact that two complete half-sites exist within the –2 sequence, whereas one half-site is incomplete in the –1 sequence. The leucine zipper dimerization motif must also exert some control over the preferred spacing because the 0 site (which also contains two complete half-sites) is not bound as tightly as the –2 site.

Preferred recognition of the –2 site is intriguing because this creates a situation in which the recognition helices overlap near the center of the site. This would allow recognition of both strands of the DNA over the central dinucleotide step. Modeling of Zif23–GCN4 bound to DNA with a two-base-pair overlap suggests that the major groove can accommodate both of the C-terminal fingers but they are in close proximity (data not shown). In the absence of the leucine zipper, the Zif23 fingers bind anti-cooperatively to the –2 site. The affinity of the second set of fingers is reduced about 40-fold (data not shown), indicating that there is a modest penalty (~2.2 kcal/mol) to binding of the second monomer.

Optimization of Zif23–GCN4 by phage display

Phage display was used to select Zif23–GCN4 variants, with changes in the region where the two proteins were fused, that have higher affinity and improved specificity. For these selection experiments, Zif23–GCN4 was fused to the gene III protein of M13 filamentous phage [33–35]. Several libraries were constructed. Each randomized the four positions between the two histidines, and these libraries contained either two, three or four randomized positions between the final histidine and the first leucine involved in the GCN4 coiled-coil interaction (Figure 4). A reduced codon set was used to generate the libraries such that only 24 codons were needed to encode 19 possible amino acids (cysteine was excluded). This reduced codon set allowed us to search these libraries more efficiently by decreasing disparities in representation of certain amino acids. Each of our three libraries contained approximately 2 × 10⁹ members. This allowed us to thoroughly search the two smaller libraries, although we recognize that only a fraction of the possible sequence space encoded by the largest library was examined.

Phage from these three libraries were pooled, and selections were performed separately on the –2, –1, or the 0 sites. Sheared calf-thymus DNA (1 mg/ml) was included in the binding reactions to act as a non-specific competitor. To encourage the selection of clones with improved specificity for the –1 and 0 sites, we added a tenfold


excess of the unbiotinylated –2 site as a competitor. The progress of the selections was monitored after each round by measuring the retention efficiency of the phage pool. At several stages, the remaining diversity in each phage pool was determined by sequencing a set of clones, and the binding specificity of each pool was tested with several different sites. At various stages in this process, the stringency of the selection was increased by: increasing the NaCl concentration in the binding reactions and washes; increasing the temperature at which the selections were performed from room temperature to 37°C; decreasing the concentration of biotinylated binding site present; and changing the phage/DNA ratios to force the phage to compete for limiting DNA [36]. By adjusting these conditions, we were able to obtain a consensus for the selections on the –1 and 0 sites. At this point the phage selected under similar conditions on the –2 site only had a weak consensus. Consequently, four additional rounds of off-rate selection were performed to obtain the most stable protein–DNA complexes. This was sufficient to yield a consensus for the phage selected against the –2 site.

Figure 4

Zif23–GCN4   ~PFACDICGRKFARSDERKRHRKLQHMKQLEDKVEELL~
(Regions labeled Zif268, linker and GCN4.)

Library   Randomized region        Theoretical library size   Constructed library size
(4+2)     H X X X X H X X L        1.9 × 10⁷                  ~2 × 10⁹
(4+3)     H X X X X H X X X L      4.5 × 10⁹                  ~2 × 10⁹
(4+4)     H X X X X H X X X X L    1.1 × 10¹¹                 ~2 × 10⁹

X = All amino acids except cysteine. (Color key: DNA recognition; coiled-coil interaction; zinc coordination.)

Library design for the selection by phage display of Zif23–GCN4 fusion proteins with improved affinity and specificity. Three different libraries were constructed that contained four randomized codons (X) between the two histidines involved in zinc coordination (purple) and then either two, three or four randomized codons between the final histidine and the first leucine in the coiled-coil heptad repeat (red amino acids). The randomized sections encode all amino acids except cysteine.

Figure 5

Unique sequences from 0 site selection
Clone #   Sequence
1         H HDKVH –VKRL
2         H HDRVH –LKKL
4         H HDKVH –LRKL
5         H HDRVH –IKRL
6         H KRKLH YVAFL
7         H HDKVH –LKRL
11        H HDKVH –IRRL
16        H KRTMH YVQYL
17        H RRKLH YVRFL
18        H HDKVH –LRRL
19        H HDRVH –IRKL
20        H TRRVH –VIRL
Set A     H HD+VH –Φ++L
Set B     H +RK(L,M)H YVx(F,Y)L

Unique sequences from –1 site selection
Clone #     Sequence
1           H RFNVH –IKKL
2           H RFNVH –FKKL
3           H RDNVH –FKRL
10          H RFNVH –VRGL
12          H EGNVH –RKKL
15          H RYNVH –FKML
17          H RFNVH –QKQL
18          H RYNVH –QKFL
19          H RFNVH –VKKL
20          H RFNVH –FRKL
Consensus   H RFNVH –xKKL

Unique sequences from –2 site selection
Clone #   Sequence
2         H RDIQH ILPIL
4         H RDIQH MLPIL
6         H RDIQH LIPRL
8         H RDIQH LIPTL
9         H RDIQH ILPLL
17        H RDIQH IIPYL
A         H RDIQH YIPHL
B         H KNIQH DLPAL

Summary
–2 Consensus   H RDIQH (I,L)(I,L)PxL
–1 Consensus   H RFNVH –xKKL
0 Set A        H HD+VH –Φ++L
0 Set B        H +RK(L,M)H YVx(F,Y)L
GCN4           ARKLQR–MKQL
Zif268         H TKI–H LRQKD

Sequences of the clones selected by phage display on the 0, –1 and –2 sites. This figure shows all unique sequences recovered from the selection against the 0 site (top), –1 site (middle), and –2 site (bottom). The invariant amino acids are colored according to their structural role (purple, zinc coordination; red, coiled-coil interaction). The qualitative consensus sequence for each selection is shown at the bottom of each panel. + indicates that lysine and arginine are about equally preferred; Φ indicates that isoleucine, leucine and valine are about equally preferred; x indicates that no clear preference was observed at that position. Two different sets of sequences were recovered in the selections against the 0 site: set A (black letters) with three amino acids between the final histidine and the first leucine, and set B (green letters) with four amino acids. The consensus sequences from all of the selections, as well as the sequences of GCN4 and Zif268 in the region of the overlap based on the original design, are listed together in the bottom panel. The consensus sequences display a number of similarities to each other and to the sequences of the parent proteins (see text). The clone(s) chosen from each pool for the assessment of their affinity and specificity are boxed.

Sequences of clones from the final phage pools and their corresponding consensus sequences are shown in Figure 5. It is interesting that all of the selected proteins contain either three or four residues between the invariant histidine and leucine. Clones containing two residues in this region were observed at intermediate stages of the selections, but apparently these complexes were unable to tolerate the increased stringency of selection conditions during later rounds. Another common feature in these clones is the presence of a hydrophobic amino acid three residues prior to the invariant leucine. In the GCN4 leucine zipper, this position represents the first amino acid involved in coiled-coil packing (position ‘a’ in the heptad repeat). The hydrophobic residue obtained at this


Figure 6

(a) Zif23–GCN4(–1)
Binding site   Dissociation constant (M²)   Relative affinity
–2             4.04 × 10⁻¹⁸ ± 1.09           18
–1             2.27 × 10⁻¹⁹ ± 2.28           1
0              5.17 × 10⁻¹⁸ ± 1.02           23
half           1.97 × 10⁻¹⁴ ± 0.55           8.3 × 10⁴

[Scatchard plot: θ/[P]² (M⁻²) versus θ.]

(b) Zif23–GCN4(–2)
Binding site      Dissociation constant (M²)   Relative affinity
–2                1.41 × 10⁻²¹ ± 0.56           1
–1                2.18 × 10⁻¹⁹ ± 0.28           154
0                 1.67 × 10⁻¹⁸ ± 0.69           1184
half              1.82 × 10⁻¹⁴ ± 0.41           1.3 × 10⁷
G•C swap          1.81 × 10⁻¹⁹ ± 0.23           128
Double G•C swap   5.60 × 10⁻¹⁸ ± 1.73           3971

[Scatchard plot: θ/[P]² (M⁻²) versus θ.]

(c)
Protein          Kd non-specific (KdNS)    Kd specific (KdS)          Specificity ratio KdNS/KdS   Specificity comparison to Zif23–GCN4(–2)
Zif23–GCN4(–2)   7.00 × 10⁻¹⁵ ± 4.2 M²     1.41 × 10⁻²¹ M²            5.0 × 10⁶                    1
Zif23–GCN4       4.64 × 10⁻¹⁴ ± 1.6 M²     3.50 × 10⁻¹⁹ M²            1.3 × 10⁵                    0.026
Zif268           1.56 × 10⁻⁵ ± 0.9 M       1.05 × 10⁻¹⁰ ± 0.39 M      1.5 × 10⁵                    0.030

Affinity and specificity of individual clones selected on the –1 or –2 sites. Dissociation constants and representative Scatchard analysis for the binding of (a) Zif23–GCN4(–1) clone number 20 or (b) Zif23–GCN4(–2) clone number 2 to various DNA sequences. Binding reactions for both proteins show second order dependence on protein concentration. The reported dissociation constants (and standard deviations) represent the average of three or more experiments. Relative affinity represents the dissociation constant of the protein for that site divided by the dissociation constant of the protein for the site it was selected to recognize. The G•C swap and double G•C swap sites contain the –2 site sequence with a single or double mutation that inverts the central G•C base pairs. (c) Comparison of the specific and non-specific dissociation constants for Zif23–GCN4(–2), Zif23–GCN4 and Zif268. The specificity of these proteins can be compared by inspecting the ratio of these two constants (KdNS/KdS). The specificity ratio determined for Zif268 is similar to the ratio that was previously reported [25].
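The specificity ratios in panel (c) follow from simple division, which the short sketch below reproduces as an illustration. The constants are copied from Figure 6c; the dictionary layout and function names are hypothetical. The sketch rounds each ratio to two significant figures before comparing proteins, on the assumption that the fold differences quoted for the optimized dimer were derived from the rounded ratios shown in the figure. Using unrounded ratios gives slightly different values, for example about 37-fold rather than 38-fold for the original Zif23–GCN4 construct.

```python
# Dissociation constants copied from Figure 6c. Units are M^2 for the dimeric
# constructs and M for monomeric Zif268; each ratio is therefore dimensionless.
FIG_6C = {
    "Zif23-GCN4(-2)": {"nonspecific": 7.00e-15, "specific": 1.41e-21},
    "Zif23-GCN4": {"nonspecific": 4.64e-14, "specific": 3.50e-19},
    "Zif268": {"nonspecific": 1.56e-5, "specific": 1.05e-10},
}


def round_sig2(value):
    """Round to two significant figures, matching how the figure reports ratios."""
    return float(f"{value:.1e}")


def specificity_ratio(protein):
    """Non-specific Kd divided by specific Kd for one protein."""
    constants = FIG_6C[protein]
    return constants["nonspecific"] / constants["specific"]


def fold_more_specific(protein, than):
    """How many times larger one rounded specificity ratio is than another."""
    return round_sig2(specificity_ratio(protein)) / round_sig2(specificity_ratio(than))


if __name__ == "__main__":
    for name in FIG_6C:
        print(f"{name}: specificity ratio {round_sig2(specificity_ratio(name)):.1e}")
    for other in ("Zif23-GCN4", "Zif268"):
        fold = fold_more_specific("Zif23-GCN4(-2)", other)
        print(f"Zif23-GCN4(-2) vs {other}: about {fold:.0f}-fold")
```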

position presumably preserves this interaction. In general, we also find that the clones from the 0 and –1 site selections are positively charged in the randomized region (net charge of +2 or +3), whereas sequences obtained in this region from the –2 selections are net neutral. It also is interesting that there are some similarities between the consensus sequences of this region and the parent GCN4 sequence. This is especially striking for the –2 sequence, which displays similarities to the GCN4 sequence at four of seven positions (either an identical amino acid or a conservative substitution).

In vitro and in vivo studies of selected clones

Representative proteins were expressed and purified, and DNA-binding studies were performed to determine their specificity, affinity and the degree of cooperativity in their binding. We used one clone from the –2 pool, one clone from the –1 pool, and one clone from each consensus sequence obtained for the 0 pool (Figure 5). The Zif23–GCN4(–2) and Zif23–GCN4(–1) clones bind cooperatively to their respective target sites (Figure 6). The clones obtained from the selection on the 0 site formed higher order aggregates upon binding DNA (data not shown), and both were only marginally specific for the 0 site, so they were not studied further.

Binding studies on Zif23–GCN4(–1) indicate that this protein is specific for the –1 site with a dissociation constant of 2.27 × 10⁻¹⁹ M² (Figure 6a). This protein displays a 20-fold preference for the –1 site over the 0 or –2 sites, and also shows a 10⁵-fold preference for the –1 site over DNA containing only one half-site. This degree of specificity is encouraging, especially when we consider that a symmetric protein is recognizing a pseudo-symmetric site.

Binding studies using Zif23–GCN4(–2) were complicated by the extreme stability of the corresponding protein–DNA complex. Under our binding conditions, the half-life of the complex between Zif23–GCN4(–2) and the –2 site is about 400 hours. This rendered equilibrium measurements of this binding constant difficult. Consequently the binding constant for the –2 site was determined both by using a gel-shift assay and by measuring the on- and off-rates for this interaction. Under our conditions, binding of Zif23–GCN4(–2) to the –2 site appeared to plateau after 16 hours, and this time appeared fully sufficient to reach


Figure 7

(a) [Bar chart: activation of various reporters by Zif23–GCN4(–2). Vertical axis: normalized luciferase activity (0–30). Reporters: –2 site, half site, ZFHD1 site, –2 TATA site, pGL2 control.]

(b) [Bar chart: activation by various constructs of a reporter containing the –2 site. Vertical axis: normalized luciferase activity (0–30). Expression vectors: Zif23–GCN4(–2), ZFHD1, Zif268, Rc–CMV2 control.]

Zif23–GCN4(–2) fused to the VP16 activation domain activates a promoter containing the –2 site in vivo. The normalized luciferase activity for each combination of expression and reporter vectors is indicated as a vertical bar. Each value represents the average of three to five experiments with the standard error of the mean indicated as a vertical line near the top of each bar. (a) Normalized luciferase activity observed from various reporters when transfected into 293 cells with the Zif23–GCN4(–2)–VP16 expression vector. The identity of the binding sites in each reporter is indicated below each column. The pGL2 control refers to the parent reporter vector without any binding sites inserted. (b) Normalized luciferase activity observed from the –2 site reporter when transfected into 293 cells with the various expression vectors. The identity of each expression vector is indicated below each column. The Rc–CMV2 control refers to the parent expression vector without a gene inserted downstream of the CMV promoter.

equilibrium at the other sites (data not shown). Using a 16 hour incubation, a dissociation constant of 1.4 × 10⁻²¹ M² was measured for the –2 site (Figure 6b). A somewhat smaller value (2.4 × 10⁻²⁴ M²) was calculated for the dissociation constant after measuring the on- and off-rates for this interaction. For subsequent comparison with the dissociation constants of other complexes (see below) we will use the more conservative of the two values.

Zif23–GCN4(–2) shows a strong preference for the –2 site over the other DNA sequences. Thus the dissociation constant for binding this site is 154-fold smaller than for binding the –1 site, 1200-fold smaller than for the 0 site, and 10⁷-fold smaller than for an oligonucleotide containing only one half-site. The Zif23–GCN4(–2) dimer also has impressive specificity for the base pairs in the region where the fingers overlap at the center of the site. Inversion of one of the central G•C base pairs (G•C swap) results in a 128-fold increase in the dissociation constant, and inversion of both G•C base pairs (double G•C swap) results in a 4000-fold increase. These binding constants for the optimized protein represent a significant improvement in affinity over the prototype (Zif23–GCN4) and a marked increase in specificity, especially against DNA containing only one half-site.

Nonspecific dissociation constants were determined for the original Zif23–GCN4 construct, for the optimized Zif23–GCN4(–2) protein, and for wild-type Zif268 so that we could directly compare the specificities of these three proteins. Nonspecific dissociation constants were determined using competition experiments that measure the effect of excess nonspecific DNA (sheared calf thymus DNA) on formation of the specific complex [14,25,37]. The ratio of the nonspecific and specific dissociation constants provides one measure of the specificity of each protein (Figure 6c), and this may be relevant in evaluating their potential utility for gene therapy. We find that the Zif23–GCN4(–2) dimer is significantly more specific than the original Zif23–GCN4 construct (38-fold) and wild-type Zif268 (33-fold).

To determine whether the Zif23–GCN4(–2) construct could function effectively in vivo, we added a VP16 activation domain and tested this new construct in transient transfection assays. These studies were performed in 293 cells with various binding sites cloned into the promoter region of a luciferase reporter. The Zif23–GCN4(–2)–VP16 construct provided a 20-fold increase in luciferase activity when the reporter contained three repeats of the –2 site; no activation was observed for reporters containing other binding sites (Figure 7a). Furthermore, we found that activation of the –2 reporter was specific to the Zif23–GCN4(–2)–VP16 construct, as this reporter was not activated by expression vectors containing other DNA-binding domains fused to a VP16 activation domain (Figure 7b).

Modularity of the Zif23–GCN4(–2) system

In an attempt to test and extend the versatility of this zinc finger/leucine zipper system, we substituted the zinc fingers and the leucine zippers to create variants that would recognize asymmetric sites. Our two new constructs

746 Structure 2000, Vol 8 No 7

Figure 8 (a) Finger 2 Finger 1 Zif23ÐcJun: 65321Ð1 65321Ð1 RKEDSR TTHDSR The modularity of the Zif23–GCN4(–2) dimerization system is demonstrated by substituting domains to create a functional heterodimer. (a) Predicted protein–DNA interactions for the 5′ TTATAGCGTGGG3′ TATA23–cFos and Zif23–cJun constructs when binding the –2 3′ AATATCGCACCC5′ Zif–TATA site as a dimer. Fingers 1 and 2 of Zif23–cJun correspond to fingers 2 and 3 of Zif268; fingers 1 and 2 of TATA23–cFos

QQASNA TLHTTR correspond to fingers 2 and 3 of the TATA zinc finger protein Ð112356 Ð112356 :TATA23ÐcFos previously selected by phage display [14]. Predicted interactions Finger 1 Finger 2 between the fingers and DNA that would be mediated by hydrogen Increasing [protein] bonds are indicated by arrows; hydrophobic interactions are indicated TATA23ÐcFos- by filled circles [13]. (b–d) Comparison of the affinity of the (b) Zif23ÐcJun protein–DNA complexes of the TATA23–cFos and Zif23–cJun DNA complex proteins with the –2 Zif–TATA site. These binding reactions contain: (b) both TATA23–cFos and Zif23–cJun; (c) only Zif23–cJun; or (d) Free only TATA23–cFos. Aligned wells from the three different gels have DNA identical concentrations of zinc finger-leucine zipper proteins, and each Zif23ÐcJun well differs in protein concentration by a twofold dilution. (If only one DNA protein is present in the reaction its concentration is equivalent to two (c) Complex times the concentration of either of the proteins in the heterodimer shifts.) (e) Luciferase activity measured from reporters containing the –2 TATA site, the –2 Zif–TATA site, or the –2 Zif site when co- Free DNA transfected with both the TATA23–cFos–VP16 and Zif23–cJun–VP16 expression vectors. Vertical bars indicate the average luciferase activity from three to five experiments with the standard error of the mean (d) indicated as a vertical line above each bar.

Free DNA asymmetric binding site (–2 Zif–TATA; Figure 8a). By (e) itself, TATA23–cFos did not bind to this site, and 120 Zif23–cJun bound weakly, but a mixture of the two pro- teins formed a stable complex with an apparent dissocia- 100 tion constant of 4.3 nM (Figure 8b). Binding of these 80 proteins to the DNA was cooperative, and the mobility of the shifted complex was consistent with the formation of a 60 dimer. This complex is not as stable as Zif23–GCN4(–2) at its optimal site, and this may reflect the fact that the 40 TATA fingers do not bind as tightly as the Zif268 fingers 20 [14]. The ability of these proteins to form dimers in vitro

Normalized luciferase activity Normalized luciferase reflected the known properties of the leucine zippers used 0 Ð2 TATA Ð2 Zif/TATA Ð2 Zif in these constructs. Thus we found that Zif23–cJun could site site site form homodimers on the –2 Zif site with an apparent dis-

TATA23ÐcFos + Zif23ÐcJun Structure sociation constant of 0.5 nM. However, cFos (unlike cJun) does not homodimerize efficiently, and we found that comprised a Zif268–cJun fusion and TATA–cFos fusion TATA23–cFos did not form complexes on a dyad repeat that were designed to form heterodimers. The cFos and of its finger recognition sequences that contained a two- cJun leucine zippers were chosen because they have a base-pair overlap (–2 TATA site). strong (100-fold) preference for the formation of hetero- dimers over homodimers [38,39]. The Zif23–cJun fusion To determine whether TATA23–cFos and Zif23–cJun contained fingers 2 and 3 of Zif268, a junction sequence could function in vivo, we constructed fusions of these from the Zif23–GCN4(–2) selection, and the cJun leucine proteins to the VP16 activation domain and tested these zipper. The TATA23–cFos fusion was analogous but con- constructs in transient transfection assays (Figure 8c). tained the leucine zipper of cFos and fingers 2 and 3 of a Three different luciferase reporters were used in these zinc finger protein that had been selected [14] to recog- experiments with three repeats of either the –2 Zif site, nize a TATA binding sequence. These proteins recognize the –2 TATA site, or the –2 Zif–TATA site. Co-transfect- compatible sequences (5′-GC-3′) within the two-base-pair ing the TATA23–cFos–VP16 and Zif23–cJun–VP16 con- overlap at the center of the site (Figure 8a). structs activated a luciferase reporter containing the –2 Zif–TATA site. Heterodimer formation seems to be pre- The DNA affinity and specificity of these constructs were ferred: these constructs failed to activate reporters con- tested for an oligonucleotide containing the appropriate taining either the –2 Zif or the –2 TATA sites. However, st8703.qxd 07/03/2000 09:38 Page 747

Research Article Design and selection of zinc finger dimers Wolfe, Ramm and Pabo 747

cJun can facilitate homodimerization in the absence of cFos: transfection of the Zif23–cJun–VP16 without TATA23–cFos–VP16 results in activation only of the reporter containing the –2 Zif sites (data not shown).

Discussion
When used together, structure-based design and phage display provide an extremely powerful strategy for generating novel proteins. As we have shown, computer-aided design can readily provide an overall framework for the creation of new proteins, and phage display can then be used to optimize critical residues in the region where the proteins are joined (or at a new active site). This strategy takes advantage of the respective strengths of both design and selection, and we have tested this hybrid approach by using it to create a versatile and powerful new dimerization system for zinc fingers.

The specificity and affinity of our new zinc finger/leucine zipper hybrids suggest that they may provide a useful alternative to covalently linked fingers as such proteins are tested for potential applications in gene therapy. Binding parameters with our dimeric constructs (which contain four zinc fingers) are comparable to those obtained with covalently linked constructs containing six fingers. Zif23–GCN4(–2) forms protein–DNA complexes that have similar stability (off-rate) to two sets of three-finger proteins connected by a flexible linker (268//NRE) [25] while employing two fewer fingers. Furthermore, Zif23–GCN4(–2) is 33-fold more specific than Zif268 when employing one additional finger, and Zif23–GCN4(–2) has a specificity ratio similar to 268//NRE. (268//NRE is 73-fold more specific than Zif268 but requires three additional fingers.) Moreover, given the other potential advantages of dimeric DNA recognition over covalent linkage (sharper transition between on and off states, faster rate of equilibration [27], and the ability to straddle both DNA strands) our modular dimerization system compares very favorably with strategies using covalent linkage of more than three zinc fingers.

The overlap of the two recognition helices near the center of the –2 site also raises interesting questions about the limits of specificity that can be achieved with zinc finger proteins. Modeling of this complex suggests that one of the helices will be near each of the DNA strands (i.e., each zinc finger will track along its primary strand), and it is possible that phage display will allow selection of dimers with exquisite specificity in this region. Excellent specificity is already observed with our Zif23–GCN4(–2) construct: inverting a G•C base pair near the center of the site causes a 128-fold reduction in affinity. This already compares favorably with the 118-fold change in affinity that is observed when a corresponding change is made in the binding site for the wild-type Zif268 protein [40]. Given that the recognition helices overlap near the center of the binding site, it is possible that phage display can be used to select variants with even higher levels of discrimination (perhaps up to 10³- or 10⁴-fold if we can obtain good specificity on each strand). This might allow discrimination between two alleles that differ by only a single base pair mutation or single nucleotide polymorphism [41,42], and proteins with this level of specificity could provide a mechanism (through transcriptional repression) for specifically disabling dominant negative alleles.

Biological implications
Cys2His2 zinc finger proteins have the potential to function as the DNA-binding domains of novel transcription factors designed for the treatment of human disease via gene therapy. Using a combination of structure-based design and phage display, we have created a dimerization system that allows the efficient assembly of more than three fingers simultaneously on the DNA. The majority of work in the development of novel DNA-binding domains based on zinc fingers has focused on fingers that are covalently linked by a Krüppel-type TGEKP linker. Although these linkers are ideal for the connection of up to three fingers, attaching additional fingers using a TGEKP linker provides relatively little improvement in affinity. This limit on the use of canonical linkers restricts the effectiveness of zinc fingers in their recognition of larger binding sites (> 10 base pairs). When considered in the context of specifying a single site in the human genome, where each 10-base-pair site may occur thousands of times, this level of specificity is probably suboptimal. Our modular zinc finger dimerization system allows the assembly of two sets of fingers on a desired target site. These dimeric zinc fingers bind with high specificity and affinity, and can function in vivo. A dimeric construct that contains just four zinc fingers forms a complex that is as stable (based on a comparison of koff) and almost as specific as the best covalently linked six-finger protein reported to date [25]. Given the other potential advantages of dimeric DNA recognition when compared with a covalently linked system of a similar size, our modular zinc finger dimerization system provides a promising method for the recognition of larger binding sites (> 10 base pairs).

Materials and methods
Computer modeling
Computer modeling was carried out using the Insight II package (v 2.3.5 Biosym Technologies) and the Protein Data Bank (PDB) files for Zif268–DNA complex (1AAY) [3] and GCN4–DNA complexes (1YSA–AP1 site [30]; 1DGC–ATF/CREB site [31]). Alignments between the complexes were obtained by minimizing the root mean squared difference between the base pairs and phosphates when the structures were overlaid in various registers. Complexes were then visually inspected to qualitatively assess the connection between the GCN4 and Zif268 structures. This strategy was employed for both DNA-bound structures of GCN4 (i.e. for GCN4 complexes involving the pseudo-dyad [30] and perfect dyad sites [31]). These alignments were used to design the initial Zif23–GCN4 prototype.

Gene construction, expression and purification of Zif23–GCN4
Based on computer modeling of the Zif23–GCN4 complex, we fused fingers 2 and 3 of Zif268 (residues 364–412 [43]) to the leucine zipper of GCN4 (residues 250–281 [44,45]). The junction between the final Zif268 residue and the first amino acid involved in GCN4 coiled-coil formation was bridged by a five-residue segment (RKLQH), four of these residues from GCN4, followed by a histidine, which is required for the final coordination site within the finger (Figure 2). This sequence was cloned in frame into pGEX–2T (Pharmacia) between the Bam HI and Eco RI sites, and the resulting vector was transformed into BL21DE3 pLysS cells (Novagen). GST-fusion protein was expressed and purified, and GST was removed by digestion as previously described [13]. The digested protein was further purified by C18 Sep-pack (Waters) and reverse-phase high performance liquid chromatography (HPLC) as described below.

Selection of improved Zif23–GCN4 dimers via phage display
The construct for phage display was arranged so that Zif23–GCN4 was attached as an N-terminal fusion to gene III of M13 filamentous phage. This construct also contained an N-terminal pelB export sequence and an amber codon between Zif23–GCN4 and gene III [14,35]. Bbs I cloning sites were introduced into the vector so that the junction between the final zinc finger and the leucine zipper could easily be replaced with a synthetic oligonucleotide containing randomized codons. Three different libraries were constructed. These contained four randomized codons between the two histidines and also contained two, three or four randomized codons between the final histidine and the first leucine involved in coiled-coil interactions (Figure 4). Randomized codons were synthesized using a two column mixing scheme based on a split and pool randomization approach [46]. Out of the possible 20 amino acids, 19 were encoded within each randomized codon (cysteine was excluded). On one column a subset of codons was obtained using A, C, G and T at position one, A, C and T at position two, and T at position three. A second column used the coding scheme A, C, G and T; A, G and T; G for each codon. After a randomized codon was completed, DNA synthesis was paused, the resin from the two columns was mixed and redistributed between two columns, and synthesis of the next randomized codon was resumed. Complementary oligonucleotides were annealed to the constant ends of these randomized oligonucleotides, the duplexes were ligated into the parent plasmid, and then transformed by electroporation into XL1-blue cells (Stratagene). We also prepared a second set of three libraries which were fully analogous but that contained a truncation of the final six amino acids of the GCN4 leucine zipper to reduce its dimerization constant. The first set of libraries was used for selections against the 0, –1 and –2 sites, whereas the second set of libraries (studied in parallel) was used only for selections on the –2 site. Phage display was carried out as reported previously [13,14,35], with some exceptions: selections were performed outside of the anaerobic chamber and without any special precautions to maintain anaerobic conditions [13]. The double-stranded biotinylated binding sites used for the selections were of the following sequence (B = Biotin, Glen Research): 0 site 5′-CTCATACCCACGCGCGTGGGAAATCCATCB-3′; –1 site 5′-CTCATACCCACGCCGTGGGAATTCCATCB-3′; –2 site 5′-CTCATACCCACGCGTGGGAATTCCATCB-3′. During the first round, phage from each library were selected independently against the desired binding site before combining them for subsequent rounds of selection. These binding reactions were carried out with 3.3 × 10¹² phage from each library and 50 nM of the appropriate binding site at room temperature for 75 min (total reaction volume = 1 ml) in 1 × PSB buffer (25 mM potassium phosphate, pH 7.8, 60 mM potassium glutamate, 60 mM potassium acetate, 2 mM MgCl2, 20 µM ZnSO4, 100 µg/ml Ac-BSA, 5% (v/v) glycerol, 0.5% (v/v) Triton X-100, 1 mg/ml sheared calf thymus DNA). Samples were then applied to streptavidin coated Dynabeads (M–280, Dynal) for 30 min. Supernatant was removed, the beads were washed four times with 750 µl wash buffer (1 × PSB without Ac-BSA), and then eluted for 1 h in a buffer containing 15 mM Hepes, pH 7.8, 4 M NaCl, 5 mM MgCl2, 10 mM ZnCl2, 25 mM EDTA, 5% glycerol (v/v), 0.1 mg/ml Ac-BSA. Eluted phage were used to infect XL1-blue cells for library amplification. The stringency of the selection in subsequent rounds of the selection was increased as described in the text. The diversity remaining in the phage pools was ascertained by sequencing individual clones [13]. A total of ten rounds of selection were performed against the 0 site and against the –1 site to obtain a consensus. Four additional rounds of off-rate selection were required to achieve a consensus within the phage pools selected on the –2 site. These off-rate selections were performed by equilibrating the binding reaction for four hours at 37°C, then adding 2 µM –2 site without biotin and equilibrating for another 10 h before mixing with Dynabeads. The subsequent washes and elution were carried out as described for the previous rounds.

Expression and purification of selected Zif23–GCN4 proteins
Representative fusion protein sequences (indicated by the boxes in Figure 5) were cloned into pET–21D (Novagen), and these plasmids were transformed into BL21DE3 pLysS cells (Novagen). Upon induction these proteins were expressed in inclusion bodies, and this greatly simplified their purification. Cells were lysed using the freeze-thaw method, centrifuged, and the cellular debris were resuspended and heated to 75°C for 30 min in reduction buffer (6.2 M guanidine hydrochloride, 20 mM Tris pH 8.0, 150 mM DTT). After centrifugation to remove any insoluble debris, trifluoroacetic acid (TFA) was added (0.1% v/v), and the sample was applied to a pre-equilibrated C18 Sep-pack cartridge. The column was washed three times with water containing 0.22% TFA, and then the protein was eluted with an acetonitrile step gradient containing 0.1% TFA. The fractions containing protein (as determined by SDS–PAGE) were lyophilized. These fractions were then further purified by HPLC reverse-phase chromatography using a C4 reverse-phase column (Vydac) on a Hewlett Packard 1100. Purified protein was lyophilized, resuspended in water, aliquoted, and stored at –80°C.

Construction, expression and purification of the TATA23–cFos and Zif23–cJun proteins
The TATA23–cFos and Zif23–cJun proteins were constructed as described in the text. Both constructs used the linker sequence from clone number 2 of the Zif23–GCN4(–2) selections (RDIQHILPI) to join the zinc fingers to the leucine zipper. The TATA23–cFos construct contained residues 165–199 of cFos [47]. The Zif23–cJun construct contained residues 288–323 of cJun [48]. These proteins were expressed and purified as described for the proteins selected by phage display.

Dissociation constant and on/off rate determination
Dissociation constants were determined as previously reported [14] with the following exceptions: first, mobility shift assays were performed in 0.5 × TBE; second, dissociation constants were calculated using the equation:

$$\frac{\theta}{[P]^2} = \frac{1}{K_d} - \frac{\theta}{K_d} \qquad (1)$$

in which θ equals the fraction of DNA bound and P equals the free protein concentration; third, the Zif23–GCN4 dissociation constant binding reactions contained 2 µg/ml Ac-BSA; fourth, ³²P-labeled oligonucleotides had the following sequences: site 0, 5′-TTTTCATACGCCCACGCGCGTGGGCGATACAAAA-3′; site –1, 5′-TTTTCATACGCCCACGGCGTGGGCGATACAAAA-3′; site –2, 5′-TTTTCATACGCCCACGCGTGGGCGATACAAAA-3′; G•C swap, 5′-TTTTCATACGCCCACCCGTGGGCGATACAAAA-3′; double G•C swap, 5′-TTTTCATACGCCCACCGGTGGGCGATACAAAA-3′; and half-site, 5′-TTTTGGTTTTATAGCGTGGGCGTACGAAAA-3′; and were labeled using Klenow exo to fill in the overhangs (New England Biolabs). On and off-rates were calculated as previously described [25] except that the on-rate constant was calculated as a function of the protein concentration squared.
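The relation between the fraction of DNA bound and a dissociation constant expressed in M² can be illustrated with a short calculation. The sketch below assumes that equation (1) reads θ/[P]² = (1 − θ)/Kd, which is equivalent to θ = [P]²/(Kd + [P]²) for two monomers binding together. The function names, the protein concentrations and the noise-free synthetic data are hypothetical and chosen only for illustration; this is not the fitting procedure used in the study.

```python
import math

def fraction_bound(p_free, kd):
    """Fraction of DNA bound when two monomers bind together (Kd in M^2)."""
    return p_free ** 2 / (kd + p_free ** 2)

def fit_dimer_kd(p_free, theta):
    """Estimate Kd from a straight-line fit of theta/[P]^2 against theta.

    If theta/[P]^2 = 1/Kd - theta/Kd, the slope of that line is -1/Kd.
    """
    xs = list(theta)
    ys = [t / p ** 2 for t, p in zip(theta, p_free)]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return -sxx / sxy  # equals -1 / slope

# Hypothetical, noise-free data. The Kd is the value quoted in the text for
# the -2 site; the twofold dilution series of free protein is an assumption.
KD_TRUE = 1.4e-21  # M^2
protein = [5e-12 * 2 ** i for i in range(8)]  # M
theta = [fraction_bound(p, KD_TRUE) for p in protein]

kd_est = fit_dimer_kd(protein, theta)
half_saturation = math.sqrt(kd_est)  # free protein giving theta = 0.5

print(f"fitted Kd = {kd_est:.3e} M^2")
print(f"free protein at half-maximal binding = {half_saturation:.3e} M")
```

Because the dissociation constant carries units of M², the free protein concentration at half-maximal binding is its square root, which is a convenient way to compare a dimer's affinity with dissociation constants reported in M.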

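The two-column codon randomization described under "Selection of improved Zif23–GCN4 dimers via phage display" can be checked by enumeration. The sketch below assumes the standard genetic code and reads the two schemes as (A, C, G, T; A, C, T; T) and (A, C, G, T; A, G, T; G). The variable and function names are hypothetical and introduced only for this illustration.

```python
from itertools import product

# Standard genetic code, indexed in T, C, A, G order at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Bases allowed at positions one, two and three on each synthesis column,
# as read from the methods text.
COLUMN_1 = ("ACGT", "ACT", "T")
COLUMN_2 = ("ACGT", "AGT", "G")

def codons(scheme):
    """All codons that one column can produce."""
    return ["".join(bases) for bases in product(*scheme)]

def encoded(scheme):
    """Set of one-letter residues (or '*' for stop) encoded by one column."""
    return {CODON_TABLE[codon] for codon in codons(scheme)}

all_codons = codons(COLUMN_1) + codons(COLUMN_2)
amino_acids = (encoded(COLUMN_1) | encoded(COLUMN_2)) - {"*"}
missing = set("ACDEFGHIKLMNPQRSTVWY") - amino_acids
stop_codons = [c for c in all_codons if CODON_TABLE[c] == "*"]

print(f"{len(all_codons)} codons encode {len(amino_acids)} amino acids")
print("not encoded:", sorted(missing))
print("stop codons present:", stop_codons)
```

Under these assumptions the 24 codons cover 19 amino acids, with cysteine the only one absent, which agrees with the description in the text. The second column also produces TAG (the amber codon); how that codon was treated in the libraries is not stated here.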

Non-specific dissociation constants were determined as previously described [14], except that the reactions were allowed to equilibrate for 36 h, and data for calculation of the non-specific dissociation constants for Zif23–GCN4 and Zif23–GCN4(–2) was fit to the equation:

$$2N_t\,(K_d^{S})^{0.5}\left(\frac{\theta}{1-\theta}\right) = K_d^{NS}\left(\frac{\theta_0}{1-\theta_0}\right)^{0.5} - K_d^{NS}\left(\frac{\theta}{1-\theta}\right)^{0.5} \qquad (2)$$

In this equation θ0 is the fraction of DNA bound in the absence of non-specific competitor (sheared calf thymus DNA, Gibco-BRL), Nt is the concentration of non-specific competitor, and KdS and KdNS are the specific and non-specific dissociation constants, respectively.

In vivo experiments
In vivo experiments were performed on 293 cells by transient transfection using the Transfast (Promega) cationic lipid as a carrier. Transfections were performed following the suggested protocol in the Transfast manual. Briefly, transfections were done in six well plates (Falcon), with each well receiving 0.5 µg of expression vector (pRc/CMV2, Invitrogen), 0.5 µg of reporter vector (pGL2, Promega), and 12 ng of pCMVβ (Clontech). Expression constructs encoded the hemagglutinin epitope tag MYPYDVPDYA; the nuclear localization signal from SV40 large T-antigen PPPKKKRKV [49]; the DNA-binding domain and dimerization domains of interest; and the activation domain corresponding to residues 399–479 of VP16 [50]. As controls, corresponding constructs were prepared containing the three fingers of Zif268 (residues 333–420 [43]) or ZFHD1, the zinc finger/homeodomain fusion protein [29]. Reporters used to assay for activation contained three copies of the corresponding binding sites inserted between the Kpn I and Xho I restriction sites. Sequences of these sites were as follows: –2 Zif site 5′-CCCACGCGTGGGTAACGCCCACGCGTGGGCGATTCCCACGCGTGGG-3′; half site (also referred to as the –2 Zif/TATA site) 5′-TTATAGCGTGGGTAACGTTATAGCGTGGGCGATTTTATAGCGTGGG-3′; ZFHD1 site 5′-CGCCCATAATTACAGGTCCGCCCATAATTAGAGTCCGCCCATAATTA-3′; –2 TATA site 5′-TTATAGCTATAATAACGTTATAGCTATAACGATTTTATAGCTATAA-3′. Transfected cells were allowed to grow for two days before assaying for β-galactosidase and luciferase activity using the Dual-Light system (Tropix). Assays were measured on an Optocomp I luminometer (MGM Instruments, Inc.). To normalize for well-to-well differences in transfection efficiency, the observed luciferase activity was divided by the observed β-galactosidase activity. The reported values for the Zif23–GCN4(–2) homodimer represent the average of five experiments done in duplicate; the TATA23–cFos and Zif23–cJun results represent the average of three experiments done in duplicate. Experiments performed on different days were normalized by setting the normalized luciferase activity observed for the Zif23–GCN4(–2) expression vector on the half site to 1.

Acknowledgements
We would like to thank B Wang, JK Joung, J Miller, J Pomerantz, R Scarr, M Dobles, W Karazi, and M Chytil for their helpful advice and discussions. SAW was supported by the Howard Hughes Medical Institute and as a Special Fellow of the Leukemia and Lymphoma Society; COP and EIR were supported by the Howard Hughes Medical Institute.

References
1. Wolfe, S.A., Nekludova, L. & Pabo, C.O. (2000). DNA recognition by Cys2His2 zinc finger proteins. Annu. Rev. Biophys. Biomol. Struct. 29, 183-211.
2. Pabo, C.O. & Nekludova, L. (2000). Geometric analysis and comparison of protein–DNA interfaces: why is there no simple code for recognition? J. Mol. Biol., in press.
3. Elrod-Erickson, M., Rould, M.A., Nekludova, L. & Pabo, C.O. (1996). Zif268 protein–DNA complex refined at 1.6 Å: a model system for understanding zinc finger–DNA interactions. Structure 4, 1171-1180.
4. Pavletich, N.P. & Pabo, C.O. (1991). Zinc finger–DNA recognition: crystal structure of a Zif268–DNA complex at 2.1 Å. Science 252, 809-817.
5. Nardelli, J., Gibson, T.J., Vesque, C. & Charnay, P. (1991). Base sequence discrimination by zinc-finger DNA-binding domains. Nature 349, 175-178.
6. Desjarlais, J.R. & Berg, J.M. (1992). Toward rules relating zinc finger protein sequences and DNA binding site preferences. Proc. Natl Acad. Sci. USA 89, 7345-7349.
7. Desjarlais, J.R. & Berg, J.M. (1992). Redesigning the DNA-binding specificity of a zinc finger protein: a data base-guided approach. Proteins 12, 101-104.
8. Desjarlais, J.R. & Berg, J.M. (1994). Length-encoded multiplex binding site determination: application to zinc finger proteins. Proc. Natl Acad. Sci. USA 91, 11099-11103.
9. Rebar, E.J. & Pabo, C.O. (1994). Zinc finger phage: affinity selection of fingers with new DNA-binding specificities. Science 263, 671-673.
10. Choo, Y. & Klug, A. (1994). Toward a code for the interactions of zinc fingers with DNA: selection of randomized fingers displayed on phage. Proc. Natl Acad. Sci. USA 91, 11163-11167.
11. Jamieson, A.C., Kim, S.H. & Wells, J.A. (1994). In vitro selection of zinc fingers with altered DNA-binding specificity. Biochemistry 33, 5689-5695.
12. Wu, H., Yang, W.P. & Barbas, C.F., III (1995). Building zinc fingers by selection: toward a therapeutic application. Proc. Natl Acad. Sci. USA 92, 344-348.
13. Wolfe, S.A., Greisman, H.A., Ramm, E.I. & Pabo, C.O. (1999). Analysis of zinc fingers optimized via phage display: evaluating the utility of a recognition code. J. Mol. Biol. 285, 1917-1934.
14. Greisman, H.A. & Pabo, C.O. (1997). Sequential optimization strategy yields high-affinity zinc finger proteins for diverse DNA target sites. Science 275, 657-661.
15. Isalan, M., Klug, A. & Choo, Y. (1998). Comprehensive DNA recognition through concerted interactions from adjacent zinc fingers. Biochemistry 37, 12026-12033.
16. Segal, D.J., Dreier, B., Beerli, R.R. & Barbas, C.F., III (1999). Toward controlling gene expression at will: selection and design of zinc finger domains recognizing each of the 5′-GNN-3′ DNA target sequences. Proc. Natl Acad. Sci. USA 96, 2758-2763.
17. Foster, M.P., Wuttke, D.S., Radhakrishnan, I., Case, D.A., Gottesfeld, J.M. & Wright, P.E. (1997). Domain packing and dynamics in the DNA complex of the N-terminal zinc fingers of TFIIIA. Nat. Struct. Biol. 4, 605-608.
18. Wuttke, D.S., Foster, M.P., Case, D.A., Gottesfeld, J.M. & Wright, P.E. (1997). Solution structure of the first three zinc fingers of TFIIIA bound to the cognate DNA sequence: determinants of affinity and sequence specificity. J. Mol. Biol. 273, 183-206.
19. Choo, Y. & Klug, A. (1993). A role in DNA binding for the linker sequences of the first three zinc fingers of TFIIIA. Nucleic Acids Res. 21, 3341-3346.
20. Cook, W.J., et al., & Denis, C.L. (1994). Mutations in the zinc-finger region of the yeast regulatory protein ADR1 affect both DNA binding and transcriptional activation. J. Biol. Chem. 269, 9374-9379.
21. Shi, Y. (1995). Molecular Mechanisms of Zinc Finger Protein–Nucleic Acid Interactions. PhD thesis, Johns Hopkins University, Baltimore, MD.
22. Rebar, E.J. (1997). Selection Studies of Zinc Finger–DNA Recognition. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA.
23. Liu, Q., Segal, D.J., Ghiara, J.B. & Barbas, C.F., III (1997). Design of polydactyl zinc-finger proteins for unique addressing within complex genomes. Proc. Natl Acad. Sci. USA 94, 5525-5530.
24. Nekludova, L. & Pabo, C.O. (1994). Distinctive DNA conformation with enlarged major groove is found in Zn-finger–DNA and other protein–DNA complexes. Proc. Natl Acad. Sci. USA 91, 6948-6952.
25. Kim, J.S. & Pabo, C.O. (1998). Getting a handhold on DNA: design of poly-zinc finger proteins with femtomolar dissociation constants. Proc. Natl Acad. Sci. USA 95, 2812-2817.
26. Pomerantz, J.L., Wolfe, S.A. & Pabo, C.O. (1998). Structure-based design of a dimeric zinc finger protein. Biochemistry 37, 965-970.
27. Ptashne, M. (1992). A Genetic Switch. 2nd Edition. Blackwell Scientific Publications & Cell Press, Cambridge, MA.
28. Klemm, J.D., Schreiber, S.L. & Crabtree, G.R. (1998). Dimerization as a regulatory mechanism in signal transduction. Annu. Rev. Immunol. 16, 569-592.
29. Pomerantz, J.L., Sharp, P.A. & Pabo, C.O. (1995). Structure-based design of transcription factors. Science 267, 93-96.
30. Ellenberger, T.E., Brandl, C.J., Struhl, K. & Harrison, S.C. (1992). The GCN4 basic region leucine zipper binds DNA as a dimer of uninterrupted α helices: crystal structure of the protein–DNA complex. Cell 71, 1223-1237.
31. Konig, P. & Richmond, T. (1993). The X-ray structure of the GCN4-bZip bound to ATF-CREB site DNA shows the complex depends on DNA flexibility. J. Mol. Biol. 233, 139-154.
32. Fairall, L., Schwabe, J.W.R., Chapman, L., Finch, J.T. & Rhodes, D. (1993). The crystal structure of a two zinc-finger peptide reveals an extension to the rules for zinc finger/DNA recognition. Nature 366, 483-487.
33. Smith, G.P. (1985). Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. Science 228, 1315-1317.
34. Bass, S., Greene, R. & Wells, J.A. (1990). Hormone phage: an enrichment method for variant proteins with altered binding properties. Proteins 8, 309-314.
35. Rebar, E.J., Greisman, H.A. & Pabo, C.O. (1996). Phage display methods for selecting zinc finger proteins with novel DNA-binding specificities. Meth. Enzymol. 267, 129-149.
36. Wang, B. & Pabo, C. (1999). Dimerization of zinc fingers mediated by peptides evolved in vitro from random sequences. Proc. Natl Acad. Sci. USA 96, 9568-9573.
37. Lin, S.Y. & Riggs, A.D. (1972). Lac repressor binding to non-operator DNA: detailed studies and a comparison of equilibrium and rate competition methods. J. Mol. Biol. 72, 671-690.
38. O'Shea, E., Rutkowski, R., Stafford, W.F., III & Kim, P. (1989). Preferential heterodimer formation by isolated leucine zippers from Fos and Jun. Science 245, 646-648.
39. O'Shea, E.K., Rutkowski, R. & Kim, P.S. (1992). Mechanism of specificity in the Fos–Jun oncoprotein heterodimer. Cell 68, 699-708.
40. Elrod-Erickson, M. & Pabo, C.O. (1999). Binding studies with mutants of Zif268. J. Biol. Chem. 274, 19281-19285.
41. Basilion, J.P. et al., & Housman, D.E. (1999). Selective killing of cancer cells based on loss of heterozygosity and normal variation in the human genome: a new paradigm for anticancer drug therapy. Mol. Pharmacol. 56, 359-369.
42. Housman, D. & Ledley, F.D. (1998). Why pharmacogenomics? Why now? Nat. Biotechnol. 16, 492-493.
43. Christy, B.A., Lau, L.F. & Nathans, D. (1988). A gene activated in mouse 3T3 cells by serum growth factors encodes a protein with ‘zinc finger’ sequences. Proc. Natl Acad. Sci. USA 85, 7857-7861.
44. Thireos, G., Penn, M.D. & Greer, H. (1984). 5′ Untranslated sequences are required for the translational control of a yeast regulatory gene. Proc. Natl Acad. Sci. USA 81, 5096-5100.
45. Hinnebusch, A.G. (1984). Evidence for translational regulation of the activator of general amino acid control in yeast. Proc. Natl Acad. Sci. USA 81, 6442-6446.
46. Furka, A., Sebestyen, F., Asgedom, M. & Dibo, G. (1991). General method for rapid synthesis of multicomponent peptide mixtures. Int. J. Pept. Protein Res. 37, 487-493.
47. van Straaten, F., Muller, R., Curran, T., Van Beveren, C. & Verma, I.M. (1983). Complete nucleotide sequence of a human c-onc gene: deduced amino acid sequence of the human c-Fos protein. Proc. Natl Acad. Sci. USA 80, 3183-3187.
48. Bohmann, D., Bos, T.J., Admon, A., Nishimura, T., Vogt, P.K. & Tjian, R. (1987). Human proto-oncogene c-Jun encodes a DNA-binding protein with structural and functional properties of AP-1. Science 238, 1386-1392.
49. Kalderon, D., Roberts, B.L., Richardson, W.D. & Smith, A.E. (1984). A short amino acid sequence able to specify nuclear location. Cell 39, 499-509.
50. Pellett, P.E., McKnight, J.L., Jenkins, F.J. & Roizman, B. (1985). Nucleotide sequence and predicted amino acid sequence of a protein encoded in a small herpes simplex virus DNA fragment capable of trans-inducing α genes. Proc. Natl Acad. Sci. USA 82, 5870-5874.

Because Structure with Folding & Design operates a ‘Continuous Publication System’ for Research Papers, this paper has been published on the internet before being printed (accessed from http://biomednet.com/cbiology/str). For further information, see the explanation on the contents page.