MOLECULAR AND CELLULAR BIOLOGY, Apr. 1994, p. 2755-2766 Vol. 14, No. 4 0270-7306/94/$04.00+0 Copyright © 1994, American Society for Microbiology

The DNA-Binding Specificity of the Hepatocyte Nuclear Factor 3/forkhead Domain Is Influenced by Amino Acid Residues Adjacent to the Recognition Helix

DAVID G. OVERDIER, ANNA PORCELLA, AND ROBERT H. COSTA* Department of Biochemistry, College of Medicine, The University of Illinois at Chicago, Chicago, Illinois 60612-7334 Received 19 November 1993/Returned for modification 4 January 1994/Accepted 19 January 1994

Three distinct hepatocyte nuclear factor 3 (HNF-3) proteins (HNF-3α, -3β, and -3γ) are known to regulate the transcription of liver-specific genes. The HNF-3 proteins bind to DNA as a monomer through a modified helix-turn-helix, known as the winged helix motif, which is also utilized by a number of developmental regulators, including the Drosophila homeotic forkhead (fkh) protein. We have previously described the isolation, from rodent tissue, of an extensive family of tissue-specific HNF-3/fkh homolog (HFH) genes sharing homology in their winged helix motifs. In this report, we have determined the preferred DNA-binding consensus sequence for the HNF-3β protein as well as for two divergent family members, HFH-1 and HFH-2. We show that these HNF-3/fkh proteins bind to distinct DNA sites and that the specificity of protein recognition is dependent on subtle nucleotide alterations in the site. The HNF-3, HFH-1, and HFH-2 consensus binding sequences were also used to search DNA regulatory regions to identify potential target genes. Furthermore, an analysis of the DNA-binding properties of a series of HFH-1/HNF-3β protein chimeras has allowed us to identify a 20-amino-acid region, located adjacent to the DNA recognition helix, which contributes to DNA-binding specificity. These sequences are not involved in base-specific contacts and include residues which diverge within the HNF-3/fkh family. Replacement of this 20-amino-acid region in HNF-3β with corresponding residues from HFH-1 enabled the HNF-3β recognition helix to bind only HFH-1-specific DNA-binding sites. We propose a model in which this 20-amino-acid flanking region influences the DNA-binding properties of the recognition helix.
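The consensus search mentioned in the abstract can be illustrated with a minimal sketch. It is not the motif program used in this study; it simply assumes that each consensus (as reported in Results) is written with standard IUPAC ambiguity codes and scanned against both strands of a sequence. The function names and the example sequences are hypothetical and serve only as illustration.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "K": "[GT]",
    "V": "[ACG]", "N": "[ACGT]",
}

# Consensus sequences from Results, rewritten in IUPAC code.
HNF3_CONSENSUS = "VAWTRTTKRYTY"  # (A/C/G)A(A/T)TRTT(G/T)RYTY
HFH1_CONSENSUS = "AWTGTTTAKWT"   # A(A/T)TGTTTA(G/T)(A/T)T


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(consensus, seq):
    """Return (strand, 0-based start on that strand, site) for every match."""
    body = "".join(IUPAC[base] for base in consensus)
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile(f"(?=({body}))")
    hits = []
    for strand, target in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(target):
            hits.append((strand, match.start(), match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence, not taken from GenBank or from this paper.
    toy = "GGC" + "AATGTTTAGTT" + "CCG"
    print(find_sites(HFH1_CONSENSUS, toy))
```

Scanning the reverse complement of the target, rather than of the consensus, keeps the code table limited to the ambiguity codes that actually occur in the consensus sequences.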

* Corresponding author. Mailing address: Department of Biochemistry (M/C 536), The University of Illinois at Chicago, College of Medicine, 1819 West Polk St., Chicago, IL 60612-7334. Phone: (312) 996-0474. Fax: (312) 413-0364.

Deciphering mechanisms which lead to transcriptional regulation of a distinct array of genes in a particular cell type is critical for understanding cellular commitment during mammalian embryogenesis. Differential expression of protein-encoding genes occurs at the point of transcriptional initiation and involves the assembly of several well-characterized basal factors with TATA-binding protein and RNA polymerase II at the initiation site of the promoter region (17). Promoter and enhancer regions are also composed of multiple DNA sites that interact with sequence-specific transcription factors which are believed to enhance the recruitment of basal factors to the initiation complex. Tissue-restricted gene expression thus relies upon the recognition of multiple cis-acting DNA sequences by cell-specific nuclear factors that potentiate or, in some instances, repress transcriptional initiation (23, 28, 32). Because transcription factors play a central role in regulating cellular differentiation, the analysis of their molecular structure and expression patterns has been fruitful in elucidating regulatory pathways involved in establishing tissue-specific gene transcription.

The functional analysis of a number of transcription factors has demonstrated that they are modular in structure, consisting of independently functioning protein domains (11, 18-20). The transcription factor domains include those involved in specific DNA recognition (DNA binding), in formation of homodimeric or heterodimeric proteins (dimerization), and in stimulation of RNA polymerase II initiation (activation) (11, 23, 41, 48). Each transcription factor draws specificity from its DNA-binding domain and thus is able to recognize only promoters containing the cognate DNA-binding sites. The isolation of conserved family members by low-stringency hybridization with DNA-binding-domain probes has facilitated the isolation of related transcription factors which have demonstrated varied developmental stage- and tissue-specific expression patterns (14, 31, 38, 52). The manner in which transcription factor families have evolved to share homology within their DNA-binding domains is clearly advantageous for organisms. The generation of sequence divergence within functional DNA-binding structures confers heterogeneity in DNA site recognition, which enables different target promoter regulation without the evolution of an unrelated DNA-binding motif (10, 38, 52). A general grouping of the DNA-binding motifs in known transcription factor families would include those with helix-turn-helix (e.g., homeodomain), zinc finger (e.g., steroid and thyroid hormone receptor superfamily), b-Zip (e.g., C/EBP, c-Jun, and c-Fos), helix-loop-helix (e.g., MyoD and myogenin), and RelA (e.g., NF-κB) (2, 10, 23, 38, 41, 44, 62).

The hepatocyte nuclear factor 3 (HNF-3)/forkhead (fkh) family of tissue-specific and developmental gene regulators uses a 100-amino-acid winged helix motif, which consists of a modified helix-turn-helix, for monomeric recognition of specific DNA target sites (5). The first members of this family included three distinct hepatocyte transcription factors, HNF-3α, -3β, and -3γ, two of which (HNF-3α and -3β) are expressed in the lung and are essential participants in liver- and lung-specific gene transcription (3, 7, 22, 27, 29, 30, 36, 43, 50, 51, 56). Mouse embryonic expression studies with HNF-3α and -3β clones suggest that they also play a role in specifying neural axis
formation during gastrulation as well as facilitating endoderm differentiation (55). Following the isolation of the HNF-3 cDNA, conservation within the DNA-binding and activation domains of HNF-3 and the Drosophila homeotic fkh protein was noted (30, 47, 60, 61). Subsequent to this discovery, several other HNF-3/fkh family members which participate in determination events during embryogenesis have been identified. These homologs include the Drosophila sloppy paired (Slp1 and Slp2) and fkh domain (FD1 to FD5) genes (15, 16); Xenopus laevis fkh domain genes XFKH1, Pintallavis, and cXFD1 to cXFD3 (8, 26, 53); zebrafish Axial (58); and Caenorhabditis elegans lin-31 (40). A number of mammalian HNF-3/fkh family members have also been isolated from a variety of different cell types and include the brain-specific BF-1 and its avian oncogene homolog, qin (35, 59); those derived from hemopoietic lineage, ILF, HTLF, H3, H8, and 5-3 (21, 33, 34); those derived from mesoderm/mesenchyme origin, MFH-1 and MF-1 to MF-3 (42, 55); and mouse fkh-1 to fkh-6 genes (24).

In a previous report, we described the isolation of nine additional HNF-3/fkh homologs (HFH) from a diverse panel of rat tissue cDNAs, using both low-stringency hybridization and PCR amplification (6). The HFH genes possess diverse adult expression patterns and exhibited heterogeneity in their winged helix DNA-binding domains (47 to 68% identity). In this study, we determine the DNA recognition sequence for three divergent HNF-3/fkh family members by using bacterial fusion proteins containing the HFH-1, HFH-2, and HNF-3β DNA-binding domains for site selection. We show that the HNF-3/fkh proteins possess distinct DNA-binding properties which are distinguished by subtle nucleotide changes in their DNA recognition sites. Although the HNF-3/fkh family members described herein bind to different DNA sites, the residues involved in base-specific contacts in DNA recognition helix 3 are conserved (5, 6). Analysis of a series of chimeric HFH-1/HNF-3β DNA-binding-domain fusion proteins has enabled us to identify a 20-amino-acid region which is capable of altering DNA sites recognized by helix 3. Interestingly, these amino acid residues were not involved in base-specific contacts and included helix 2, the turn, and the N-terminal portion of recognition helix 3. We propose a model in which this region may play a structural role in providing distinct DNA-binding specificities.

MATERIALS AND METHODS

Construction of fusion proteins and chimeras. Glutathione S-transferase (GST) fusion proteins were synthesized by using the pGEX2T expression vector (Pharmacia) that was modified by ligating an XbaI linker into the SmaI site (54). The DNA-binding domains were generated by PCR amplification of the appropriate HNF-3/fkh template by using a sense primer which introduced a BamHI site in frame with the GST-coding region and an antisense primer creating an XbaI site at the 3' end. The PCR products were digested with BamHI and XbaI, gel purified, and then ligated into the corresponding sites of plasmid pGEX2T. PCR product amplification from the HFH-1 cDNA template (6) was accomplished with the sense primer 5'-CTCGGATCC AGT GGC GGG GCG AGC ACC (43 amino acids N terminal of the DNA-binding domain) and the antisense primer 5'-CTCTCTAGA GGT CCC CGC AGG TCC GGG (20 amino acids C terminal of the DNA-binding domain), using the manufacturer's conditions for Vent DNA polymerase (New England BioLabs). The HFH-2 GST fusion was made in a similar manner, using the PCR product generated from the HFH-2 genomic DNA template (6) with the sense primer 5'-CTCGGATCC GAC GGT TGC AAG GGC GGC, which was located 34 amino acids N terminal of the motif, and the antisense oligonucleotide 5'-CTCTCTAGA GCC AGG CTG TAG GCT CCG, which was situated 22 amino acids C terminal of the motif. The HNF-3β-GST fusion proteins contained amino acid sequence 144 to 279 or 95 to 279 (used for DNA site selection) and were made by PCR amplifications from previously described HNF-3β expression constructs (47). The chimeras HFH-1.WRNS and HNF-3.WQNS were created by introducing a unique EcoRI site via PCR-mediated mutagenesis into the central portion of the HNF-3β and HFH-1 DNA-binding-domain coding region. The EcoRI site maintained the integrity of the amino acid sequence and provided a means by which to fuse the two DNA-binding domains. In the case of HFH-1.WRNS, a BamHI-EcoRI PCR product was made from HFH-1 template with a 5' flanking primer and antisense mutagenesis primer 5'-ACGGAATTC CGC CAG CCC GTG TAG CT, and an EcoRI-XbaI PCR product was made from HNF-3β template with the sense mutagenesis primer 5'-GCAGAATTC CAT CCG TCA TTC TCT C and a downstream flanking primer. Equimolar amounts of the digested PCR products were ligated together, and the ligation product was redigested with BamHI and XbaI, purified again, and then ligated into pGEX2T (54). The chimera HNF-3.WQNS was constructed with the BamHI-EcoRI PCR product made from the HNF-3β template with 5' flanking primer and antisense mutagenesis primer 5'-GTGGAATTC TGC CAG CGC TGC TGG T, and the EcoRI-XbaI PCR product was made from the HFH-1 template with sense mutagenesis primer 5'-GCGGAATTC CGT GCG CCA CAA CCT C and the downstream flanking primer as for HFH-1.WRNS. HFH-1.AEIY and HNF-3.SEIN were constructed by using unique restriction sites in the HNF-3 domain (BglII) and in the HFH-1 domain (Sau3AI) as fusion points. The HFH-1.IN-WR chimera was made by ligating the Sau3AI-EcoRI fragment from the HFH-1.WRNS chimera into the BglII-EcoRI sites of an EcoRI-containing HNF-3β DNA-binding-domain GST expression vector. The DNA sequence of the junctions and identity were verified by dideoxy sequencing using the T7 DNA polymerase Sequenase (United States Biochemicals).

Preparation of bacterial and nuclear extracts from tissues. GST fusion proteins were isolated and affinity purified essentially as described by Smith and Johnson (57). Escherichia coli DH5α cells containing the GST expression vectors were grown in LB containing 100 μg of ampicillin per ml and 0.4% glucose to an optical density of 0.6 (A600), then isopropyl-β-D-thiogalactopyranoside was added to 1 mM, and the mixture was further incubated for 3 to 5 h. Cells were resuspended in TEN (50 mM Tris [pH 8.0], 1 mM EDTA, 100 mM NaCl) containing 1 mg of lysozyme per ml, 0.5% Nonidet P-40 (NP-40), 1 mM dithiothreitol (DTT), and protease inhibitor cocktail (leupeptin [1 μg/ml], pepstatin A [1 μg/ml], and phenylmethylsulfonyl fluoride [0.5 mM]), then lysed by a 15-min incubation at room temperature, and subjected to three freeze-thaw cycles. Cellular debris was removed by ultracentrifugation, and the GST fusion proteins were affinity purified by incubating the bacterial extract with glutathione-agarose (1 ml of 50% slurry) for 10 min and then washing the extract three times with several volumes of TEN plus 0.5% NP-40. The GST fusion proteins were eluted from the resin with 2.5 mg of glutathione per ml in a buffer consisting of 50 mM Tris (pH 8.0), 20% glycerol, 1 mM DTT, and the protease inhibitors.

Nuclear extracts were prepared from rat liver, kidney, and lung tissue essentially as described previously, using a final concentration of 0.45 M KCl to extract proteins from isolated nuclei (7, 29).
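As a quick consistency check on the cloning strategy described under Construction of fusion proteins and chimeras, the following sketch locates the restriction site that each primer is said to introduce. The primer sequences are copied from the text with spaces removed; the dictionary keys are hypothetical labels, and only primers whose printed sequence is unambiguous are included. The recognition sequences for BamHI, XbaI, and EcoRI are the standard ones.

```python
# Standard recognition sequences of the enzymes named in the text.
SITES = {"BamHI": "GGATCC", "XbaI": "TCTAGA", "EcoRI": "GAATTC"}

# Primer sequences from the text (5' to 3'), paired with the enzyme whose
# site the text says they introduce. The labels are illustrative only.
PRIMERS = {
    "HFH-1 sense": ("CTCGGATCCAGTGGCGGGGCGAGCACC", "BamHI"),
    "HFH-1 antisense": ("CTCTCTAGAGGTCCCCGCAGGTCCGGG", "XbaI"),
    "HFH-2 sense": ("CTCGGATCCGACGGTTGCAAGGGCGGC", "BamHI"),
    "HFH-2 antisense": ("CTCTCTAGAGCCAGGCTGTAGGCTCCG", "XbaI"),
    "HNF-3.WQNS antisense mutagenesis": ("GTGGAATTCTGCCAGCGCTGCTGGT", "EcoRI"),
    "HNF-3.WQNS sense mutagenesis": ("GCGGAATTCCGTGCGCCACAACCTC", "EcoRI"),
}


def site_offset(primer, enzyme):
    """Return the 0-based offset of the enzyme site in the primer, or -1."""
    return primer.find(SITES[enzyme])


def check_primers(primers):
    """Map each primer label to the offset of its expected restriction site."""
    return {name: site_offset(seq, enzyme) for name, (seq, enzyme) in primers.items()}


if __name__ == "__main__":
    for name, offset in check_primers(PRIMERS).items():
        print(f"{name}: site at offset {offset}")
```

Each primer carries its site after a three-nucleotide 5' clamp, which is consistent with the nine-nucleotide prefix followed by codon triplets in the sequences as printed.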

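The design of the site-selection oligonucleotide used in the SAAB protocol described next can be checked with a short sketch. It verifies that primer 1 is the reverse complement of the 3' constant arm, that primer 2 equals the 5' constant arm, and it counts the sequences encoded by the (N)12 core. The estimate of copies per sequence assumes an average mass of about 650 g/mol per base pair, which is an approximation not stated in the text; the helper names and the example clone are hypothetical.

```python
import re

FIVE_PRIME_ARM = "GTGAAGCTTGCACGT"   # constant 5' arm, contains HindIII (AAGCTT)
THREE_PRIME_ARM = "CGAGCGTCTAGACTC"  # constant 3' arm, contains XbaI (TCTAGA)
N_RANDOM = 12                        # length of the degenerate (N)12 core
PRIMER_1 = "GAGTCTAGACGCTCG"
PRIMER_2 = "GTGAAGCTTGCACGT"


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def library_size(n_random=N_RANDOM):
    """Number of distinct sequences encoded by n_random degenerate positions."""
    return 4 ** n_random


def copies_per_sequence(mass_ug, avg_bp_mass=650.0):
    """Approximate copies of each library member in mass_ug of duplex DNA.

    avg_bp_mass (g/mol per base pair) is an assumed textbook approximation.
    """
    length_bp = len(FIVE_PRIME_ARM) + N_RANDOM + len(THREE_PRIME_ARM)
    moles = (mass_ug * 1e-6) / (length_bp * avg_bp_mass)
    return moles * 6.022e23 / library_size()


def extract_variable_region(clone_seq):
    """Return the 12-nucleotide core between the constant arms, or None."""
    pattern = f"{FIVE_PRIME_ARM}([ACGT]{{{N_RANDOM}}}){THREE_PRIME_ARM}"
    match = re.search(pattern, clone_seq)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(reverse_complement(THREE_PRIME_ARM) == PRIMER_1)
    print(PRIMER_2 == FIVE_PRIME_ARM)
    print(library_size())
    print(round(copies_per_sequence(0.25)))
    # Hypothetical clone insert, used only to illustrate the extraction step.
    print(extract_variable_region(FIVE_PRIME_ARM + "CAATGTTTGCTC" + THREE_PRIME_ARM))
```

Under these assumptions, the 0.25 μg of duplex used per binding reaction would contain each of the roughly 1.7 × 10^7 possible 12-mers several hundred thousand times, so the starting pool is not limiting for selection.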
SAAB protocol and gel shift assays. Affinity-purified HFH-1-, HFH-2-, and HNF-3β-GST fusion proteins were used to isolate binding sites from a partially degenerate oligonucleotide, using a modification of the sequential selection and amplification of binding sites (SAAB) protocol (1). To remove free glutathione from affinity-purified GST fusion proteins, we dialyzed them against 1× phosphate-buffered saline, 10% glycerol, 1 mM DTT, and protease inhibitor cocktail. The degenerate oligonucleotide used in the site selection was 5'-GTGAAGCTTGCACGT(N)12CGAGCGTCTAGACTC (containing HindIII and XbaI sites), and the amplification primers were primer 1 (5'-GAGTCTAGACGCTCG) and primer 2 (5'-GTGAAGCTTGCACGT). The oligonucleotide was made double stranded by hybridization with primer 1, which was elongated with Klenow fragment of DNA polymerase I, and then the double-stranded DNA product was purified on a 9% acrylamide gel (54). The binding reaction between 0.25 μg of the double-stranded degenerate oligonucleotide and 5 μg of dialyzed fusion protein was carried out in 100 μl of 1× gel shift buffer (20 mM N-2-hydroxyethylpiperazine-N'-2-ethanesulfonic acid [HEPES; pH 7.9], 40 mM KCl, 2 mM MgCl2, 1 mM DTT, 5% Ficoll) containing 2 μg of poly(dI-dC) · poly(dI-dC) and 0.2 mg of bovine serum albumin (BSA) per ml. The protein-DNA complexes were bound to glutathione-agarose (10-min incubation) that had been equilibrated with low-salt buffer (LSB; 50 mM Tris [pH 8.0], 10 mM EDTA, 0.2 mg of BSA per ml, 0.1% NP-40) and then washed twice with LSB followed by an additional LSB wash lacking detergent. The specific DNA was eluted by four successive rounds of heating at 100°C for 2 min in high-salt buffer (1 M NaCl, 50 mM Tris [pH 8.0], 10 mM EDTA). After ethanol precipitation, the DNA was resuspended in 30 μl of Tris-EDTA buffer, of which one-third was used in the PCR with primers 1 and 2. The PCR was performed as recommended by the manufacturer for Vent polymerase (New England BioLabs) except that 100 μCi of [α-32P]dATP was included, and the PCR was performed with 30 cycles of 2 min at 95°C, 3 min at 37°C, and 3 min at 72°C. The PCR product was purified by polyacrylamide gel electrophoresis (54), and approximately one-third of this was used in the subsequent binding reaction. After four cycles of protein selection, amplification, and purification, the DNA pool was digested with HindIII and XbaI, gel purified, and then cloned into the corresponding sites of pGEM-1 (Promega), and the DNA sequence was determined from independent clones.

Gel shift assays (12, 13) were performed as described previously (7, 46, 47) except that in several instances, the SAAB DNA sites were radioactively labeled as described above during PCR using primers specific to pGEM-1 vector, and the labeled PCR products were purified from low-melting-point agarose gels (54). The binding reaction was performed in 20 μl of 1× gel shift buffer containing 20 ng of affinity-purified HNF-3/fkh fusion proteins, 1 ng of labeled DNA, and 2 μg of poly(dI-dC) · poly(dI-dC). Competitions included a 100- or 200-fold molar excess of DNA in the binding reactions as indicated. Fusion proteins were diluted in 1× gel shift buffer containing 20% glycerol and 100 μg of BSA per ml. Gel shift assays with liver, lung, and kidney nuclear extracts were performed essentially as described previously (6, 46, 47). The 20-μl binding reaction was performed in 1× gel shift buffer containing 1 ng of labeled double-stranded oligonucleotide, 4 μg of denatured sonicated salmon sperm DNA, 4 μg of poly(dI-dC) · poly(dI-dC), and approximately 4 μg of tissue nuclear extract.

Computer searches of GenBank with HNF-3/fkh consensus binding sites. Promoter and enhancer sequences were extracted from GenBank to construct a smaller data base of 602 entries containing DNA regulatory regions. This new data base was searched with the HFH-1, HFH-2, and HNF-3 consensus sequences, using the motif program available on the University of Illinois at Chicago VAX mainframe computer.

RESULTS

The HFH-1 and HFH-2 proteins do not bind to HNF-3 DNA recognition sites. The HNF-3α, HNF-3β, and HNF-3γ proteins share over 90% amino acid identity in their DNA-binding domains and possess similar DNA recognition properties (30, 46). Because the HFH-1 and HFH-2 DNA-binding domains show considerable sequence divergence from the HNF-3 motif (50 and 64%, respectively), we wanted to determine whether the differences confer altered DNA-binding specificity (Fig. 1). We constructed GST bacterial fusion proteins with the DNA-binding domain from either HFH-1, HFH-2, or HNF-3β protein and purified them to homogeneity via glutathione-agarose affinity chromatography. A strong-affinity HNF-3 site (HNF-3S) derived from the transthyretin (TTR) promoter (7) and a weak-affinity HNF-3 site (HNF-3W) obtained from the HNF-3β promoter (46) were tested for binding with these fusion proteins by gel shift assays (12, 13). The HNF-3β protein formed a significant protein-DNA complex with the HNF-3S site, which was effectively competed for by itself but competed for to a lesser extent by the HNF-3W oligonucleotide (Fig. 2; compare S and W). The HNF-3 protein bound to the HNF-3W site with reduced affinity, as exhibited by diminished complex formation in the gel shift assay (Fig. 2). In contrast, neither the HFH-1 protein nor the HFH-2 protein bound to these sites with high affinity, suggesting that their DNA recognition properties differed from those of HNF-3β (Fig. 2, HFH-1 and HFH-2).

Subtle nucleotide changes at the termini of the HFH-1 consensus binding site allow for distinct HFH-1 protein recognition. To define the recognition sites of the HFH-1 and HFH-2 proteins, we used a modification of the SAAB protocol (1). The affinity-purified GST-HFH fusion proteins were used to isolate high-affinity binding sites from a pool of partially degenerate oligonucleotides by a process of repetitive protein selection and PCR amplification (see Materials and Methods). The nucleotides recognized by the HNF-3/fkh family members were derived from the DNA sequence of the variable portion of the SAAB sites from independent clones. Comparison of the SAAB sequences allowed us to deduce a DNA-binding consensus sequence for the HFH-1 protein [A(A/T)TGTTTA(G/T)(A/T)T], which contained a TGTTTA core sequence (Fig. 3A). To determine whether the HFH-1 selected sites were recognized by the other two HNF-3/fkh family members, the HFH-1 recognition sequences were radioactively labeled during PCR amplification and were used in gel shift assays (see Materials and Methods). This analysis revealed three types of HFH-1-binding sites which exhibited differential recognition specificity for the HNF-3β and HFH-2 proteins (Fig. 3A and 4A and data not shown). The top group of HFH-1 SAAB sequences bound all three proteins (Fig. 3A and 4A, HFH-1 #3), the middle group of sequences did not bind the HNF-3 protein (Fig. 3A and Table 1), while the bottom group of HFH-1 sites (Fig. 3A) were preferentially recognized by the HFH-1 protein (Fig. 4A, HFH-1 #25). These results suggest that subtle nucleotide changes at the 3' and 5' ends of the consensus binding site sequence allow for discrimination between different HNF-3/fkh family members.

HFH-2 protein recognizes three consensus sites, a subset of which binds either the HFH-1 or HNF-3 protein. Comparison of the HFH-2 SAAB DNA sites revealed that the HFH-2
protein recognized several related consensus sequences (Fig. 3B). We obtained several classes of HFH-2 protein recognition sites that were grouped according to their DNA binding properties to the other two family members (Fig. 3B). The HFH-2 consensus 1 [G(A/T)(A/T)TGTTTA(A/T)(G/T)C] sequences were found to be related to the subset of high-affinity HFH-1 sites containing the TGTTTA motif, the first five of which were also recognized by the HNF-3β protein (Fig. 3B). The next class of sequences was specific for the HFH-2 protein and consisted of the HFH-2 consensus 2 and HFH-2 #5 sites (Fig. 3B). The bottom portion of the HFH-2 consensus 2 sites contained central GTTG or ATTA motifs and was selectively recognized by HFH-2 (Fig. 3B and 4A, HFH-2 #5; also data not shown). The HFH-2 #12 sequence (GTTA motif) was preferentially recognized by HFH-2 protein but also displayed reduced affinity for the HFH-1 protein (Fig. 4A, HFH-2 #12). The HFH-2 consensus 3 sequence [Fig. 3B, NA(A/T)T(G/A)TTTG(C/T)YT] represents the final class of binding sites which is recognized by both the HNF-3 and the HFH-2 proteins but not by the HFH-1 protein (Fig. 4A, HFH-2 #10). It is interesting to note that the HFH-2 #61 (consensus 2) and HFH-2 #10 (consensus 3) sites differ slightly at their 3' nucleotide positions, and this difference is sufficient to allow HNF-3 protein recognition of the latter site (Table 1). The binding specificities of HFH-1, HFH-2, and HNF-3 proteins are therefore partially overlapping, and subtle nucleotide variation provides discriminating protein recognition.

[Fig. 1 alignment rows, with identity to HNF-3α and mRNA expression: HNF-3α (Lu, Li); HNF-3γ (95%; Li, Te); HNF-3β (93%; Lu, Li); HFH-2 (64%; Ubiq.); HFH-1 (50%; Lu, Ki). Helices 1 to 3 and divergent regions 1 to 4 are marked above the alignment.]

FIG. 1. Sequence alignment of the HNF-3, HFH-1, and HFH-2 winged helix DNA-binding domains. The amino acid sequence (one-letter code) of the HNF-3α DNA-binding domain is compared with the HNF-3β, HNF-3γ, HFH-1, and HFH-2 sequences, which are presented in decreasing order of homology (6, 30). Hatched bars indicate positions of the divergent regions (1 to 4), the helices (1 to 3), and the wing structures (1 and 2) on the sequence (5). Conserved amino acid residues are highlighted by shaded region (white dots designate identity, and white letters indicate conservative substitutions), and the mRNA tissue expression pattern is indicated (6). Abbreviations: Lu, lung; Li, liver; Te, testis; Ki, kidney; Ubiq., ubiquitous expression.

FIG. 2. HFH-1 and HFH-2 proteins do not bind to known HNF-3 recognition sites. The strong-affinity HNF-3S site or the weak-affinity HNF-3W site was used for protein complex formation with affinity-purified GST fusion proteins containing the HNF-3β, HFH-1, or HFH-2 DNA-binding domain, and protein-DNA complexes were resolved by gel shift assay (12, 13). The HNF-3S and the HNF-3W sites are derived from the TTR promoter (-111 to -88) and the HNF-3β promoter (-97 to -82), respectively, and their sequences are shown in Table 1 (7, 46). Competition lanes included a 100-fold molar excess of either the HNF-3S (S) or the HNF-3W (W) oligonucleotide.

Selective binding of HFH-1 and HFH-2 SAAB sites with nuclear extracts from tissues. We sought to determine whether the DNA-binding specificity exhibited by HNF-3/fkh bacterial fusion proteins would be comparable to that obtained with tissue nuclear extracts. We used the HNF-3S and HNF-3W sites (Fig. 2) as controls for gel shift assays with either lung, liver, or kidney nuclear protein extracts (Fig. 4B). As previously reported, the HNF-3S and HNF-3W sites produced three closely migrating protein-DNA complexes with liver nuclear extracts (HNF-3α, β, and γ), two complexes with lung nuclear extracts (HNF-3α and β), and no visible protein-DNA complexes with kidney nuclear extracts (7, 29, 46). In a manner consistent with the fusion protein studies, the HFH-1 #25 site did not form appreciable HNF-3 protein complexes with liver or lung nuclear extracts but did form other protein-DNA complexes with lung extracts (Fig. 4B, HFH-1 #25). We also used HNF-3-specific antibodies with gel shift assays to confirm that none of the HFH-1 #25 DNA-protein complexes in the lung was due to HNF-3 recognition (data not shown). Furthermore, these HFH-1 #25 protein-DNA complexes were not competed for by high-affinity HFH-2-specific sites (HFH-2 #7 and #5; data not shown). We presume that one or several of
the HFH-1 #25 complexes may be the result of HFH-1 protein recognition, but the use of HFH-1-specific antisera will be necessary to confirm this result.

Because the HFH-2 #12 site demonstrated high affinity for the HFH-2 protein, we examined its binding specificity with tissue nuclear extracts (Fig. 4C). Consistent with the lack of HNF-3β fusion protein recognition, the HFH-2 #12 oligonucleotide was not bound by HNF-3 proteins in liver and lung nuclear extracts but did form an abundant protein-DNA complex with kidney nuclear extracts (Fig. 4C). Interestingly, a similar kidney protein-DNA complex was observed with the HFH-1 #3 probe, which binds all three HNF-3/fkh family members, and with probes which selectively bound the HFH-2 protein (Fig. 4C and data not shown). We suggest that this kidney complex may be the result of HFH-2 protein binding and that the putative HFH-2-binding activity is enriched in kidney extracts. The use of HFH-2-specific antisera will be critical in verifying this prediction. Finally, in support of the fusion protein binding studies, the HFH-1 #3 probe is recognized by authentic HNF-3α and β proteins in liver and lung extracts as verified by HNF-3 antisera (Fig. 4C and data not shown). These results support our observation that the binding specificity exhibited with HNF-3/fkh bacterial fusion proteins reflects that obtained with tissue nuclear extracts.

FIG. 3. Alignment of DNA-binding sites recognized by the HFH-1, HFH-2, and HNF-3 proteins. The DNA sequence from the variable region of the SAAB sites derived from independent clones (named by arbitrary numbers) was aligned to develop a consensus sequence (white letters indicate mismatches with consensus). The number of occurrences of a particular sequence is shown in parentheses. Abbreviations for nucleotides: N, any base; W, A or T; K, G or T; Y, C or T; R, G or A; V, A, C, or G. The binding sites are grouped according to binding specificity with the other family members. (A) DNA-binding sites selected by the HFH-1 protein and consensus (CONS) sequence. Top sequences are recognized by HFH-1, HNF-3β, and HFH-2; middle sequences are recognized by HFH-1 and HFH-2; and bottom sequences bind only HFH-1 protein. (B) DNA-binding sites selected by the HFH-2 protein and three consensus sequences. For HFH-2 consensus 1, top sequences are recognized by HFH-1, HFH-2, and HNF-3β proteins; bottom sequences bind only HFH-1 and HFH-2; and the HFH-2 #11 site diverges from consensus 1 and is selective for the HFH-2 protein. The HFH-2 consensus 2 bottom sequences and the HFH-2 #5 site are selective for the HFH-2 protein, while the HFH-2 #9 and HFH-2 #12 sites are also weak-affinity sites for HFH-1. HFH-2 consensus 3 sites are recognized by both HFH-2 and HNF-3β proteins. (C) The HNF-3-binding consensus sequence was derived from endogenous HNF-3-binding sites and SAAB sites selected by HFH-1, HFH-2, and HNF-3β which are recognized by HNF-3. HNF-3 sites are derived from the following promoters and enhancers: TTR, α1-antitrypsin (α1-AT), and α-fetoprotein (AFP) (7), HNF-3β (H3B) (46), HNF-1 (27), albumin (ALB) (36), apolipoprotein B (APOB) (3), tyrosine aminotransferase (TAT) (43, 51), phosphoenolpyruvate carboxykinase (PEPCK) (22), aldolase B (ALDB) (50), and lung Clara cell secretory protein CC10 (56). Note that the HNF-3 #1 (two consecutive G residues) and HNF-3 #2 sites possess weak affinity for the HNF-3 proteins.

The HNF-3β protein recognizes a precise consensus sequence. Because the HNF-3β protein bound to several previously uncharacterized recognition sites, we performed SAAB selection to develop an HNF-3 consensus sequence. HNF-3 #1, #2, and #4 sites were isolated multiple times and were recognized only by the HNF-3β and HFH-2 proteins (Fig. 5A). The SAAB HNF-3 sites resemble a subset of the HFH-2 consensus 3 sequences, which are recognized by both HFH-2 and HNF-3β proteins (Fig. 3B and C). HNF-3-binding sites from endogenous promoters and those identified herein were compared to determine the HNF-3 consensus sequence (A/C/G)A(A/T)TRTT(G/T)RYTY (Fig. 3C). Reduced HNF-3-binding affinity was observed for sequences illustrating divergence at either end of the consensus (e.g., HNF-3W.TTR and HNF-3W.H3B [7, 46]).

The binding affinities of the HNF-3 and HFH-2 proteins for the HNF-3 SAAB oligonucleotides were determined by cross-competition studies using the gel shift assay (Fig. 5B; 100-fold molar excess). The HNF-3 #1 and HNF-3 #2 sites displayed lower binding affinity for the HNF-3β protein because they were more effectively competed for by HNF-3 #4 and HNF-3S oligonucleotides (Fig. 5B). Reciprocal competition experiments demonstrated that the HNF-3 #1 or HNF-3 #2 site competed poorly for HNF-3β protein complex formation with the HNF-3 #4 site (Fig. 5B, HNF-3 #4, lanes 1 and 2). In contrast, competition with itself or with the HNF-3S oligonu-
cleotide abrogated complex formation with the HNF-3β fusion protein equivalently, suggesting that the HNF-3 #4 site possessed strong affinity for the HNF-3β protein (Fig. 5B, HNF-3 #4, lanes 4 and 5). The HNF-3 #1 (AAATATTGGITT11 ) and HNF-3 #2 (AAACATTTACTT) sites displayed reduced HNF-3β protein-binding affinity because they contained nucleotide changes which were not observed in other bona fide HNF-3-binding sequences (Fig. 3C). Competition studies with the HFH-2 protein and the HNF-3 SAAB sites also demonstrated higher HFH-2 protein affinity for the HNF-3 #4 sequence (Fig. 5B). As anticipated (Fig. 2), we observed that the HNF-3S site poorly competed for HFH-2 protein recognition of the HNF-3 SAAB sites (Fig. 5B). This result suggests that the HFH-2 and HNF-3β proteins recognize these sequences via different nucleotides.

FIG. 4. The HFH-1 and HFH-2 SAAB sites display selective binding specificity with HNF-3/fkh fusion proteins and tissue nuclear extracts. (A) HFH-1, HFH-2, and HNF-3β DNA-binding domain GST fusion proteins exhibit differential binding with the SAAB sites. The indicated oligonucleotides or PCR-amplified site (HFH-2 #10) were radioactively labeled and used for gel shift assays with affinity-purified HFH-1, HFH-2, or HNF-3β fusion protein (20 ng) as described in Materials and Methods. An adjacent homologous competition lane (200-fold molar excess) is included for comparison (lanes +). (B and C) Tissue nuclear extracts display differential binding activity with the HFH-1 and HFH-2 SAAB sites. The indicated SAAB oligonucleotides were end labeled and used for gel shift assays with either liver, lung, or kidney nuclear extracts. The HNF-3W panel was exposed twice as long. Competitions included a 200-fold molar excess of homologous oligonucleotide (+).

When we tested the HNF-3 SAAB sites with liver and lung nuclear extracts, only the high-affinity HNF-3 #4 and HFH-2 #4 (similar to HFH-2 #10; Fig. 4A) sites formed abundant HNF-3 protein-DNA complexes (Fig. 5C). In contrast, the weak-affinity HNF-3 #1 and #2 sites were poorly bound by HNF-3 proteins, using equivalent amounts of these tissue-derived extracts (Fig. 5C). The HNF-3 #4 site also formed a putative HFH-2 protein-DNA complex in kidney extracts, which had been observed with other high-affinity HFH-2 sites (compare Fig. 4C and 5C). Finally, the HNF-3 #2 sequence did produce a slowly migrating complex with kidney extracts, while no discernible complexes were observed with the HFH-2 #4 and HNF-3 #1 probes (Fig. 5C). Therefore, SAAB sites possessing low binding affinity may not bind HNF-3/fkh family members in tissue-derived protein extracts.

Sequences adjacent to recognition helix are involved in DNA-binding specificity. The HNF-3γ protein-DNA structure has recently been determined by X-ray crystallography (5). HNF-3 binds to DNA as a monomer through the winged helix motif, which consists of a helix-turn-helix structure at the N terminus and C-terminal DNA-interactive protein loops termed wings 1 and 2 (W1 and W2). The three N-terminal helices and two C-terminal wings are not separable domains because they associate with each other via hydrophobic interactions and through the β sheets comprising W1 (5) (Fig. 1, 6,

and 7). Nonspecific DNA backbone contacts are made by the helices and the wings on both strands of the DNA. Two base-specific contacts in the major groove are made on one DNA strand by the recognition helix 3 (residues N and H), and a third base interaction is water mediated (Fig. 6A, residues F and N) (5). One additional base-specific minor groove contact is made by an arginine residue in W2 (LLRRR) (5). Although we have shown that HFH-1, HFH-2, and HNF-3 proteins demonstrate distinct binding site recognition, the residues involved in base-specific contacts in helix 3 and W2 structures are conserved among family members (Fig. 1 and 6) (5, 6). It thus appears that the mechanism of DNA-binding specificity by the winged helix motif may involve additional residues not involved in base-specific contacts.

TABLE 1. Summary of binding affinities of HFH-1, HFH-2, and HNF-3 proteins for SAAB DNA sites

Site(a)       DNA-binding sequence(b)   Binding affinity(c)
                                        HNF-3   HFH-1   HFH-2
HNF-3 #4      CAATGTTTGTTT              +++     -       ++
HNF-3W.H3B    CCCTGTTTGTTT              +       -       -
HNF-3S.TTR    GATTATTGACTT              +++     -       -
HFH-1 #3      AATTGTTTATTT              +++     +++     ++
HFH-1 #25     CGTTGTTTATGT              ±       +++     +
HFH-1 #2      GAATGTTTAGAT              -       +++     ++
HFH-2 #11     GTTTGTTTAGGA              -       -       +++
HFH-2 #5      TATTTTATTATT              -       -       +++
HFH-2 #12     TATTGTTATTTT              -       ++      +++
HFH-2 #7      GTTTGTTGTTTT              -       -       +++
HFH-2 #61     AAATATTATTTC              -       -       +++
HFH-2 #10     AAATATTAGCTT              ++      -       ++

(a) Identified by the protein used for selection and the corresponding colony or clone number and grouped for comparison.
(b) Important nucleotide changes which affect binding specificity are underlined.
(c) Equivalent amounts of the HNF-3/fkh fusion protein were used to evaluate the amount of complex formation by gel shift assay along with various competitor DNAs. +++, strong; ++, moderate; +, weak; ±, barely detectable; -, undetectable.

To identify those amino acid sequences responsible for distinct DNA site recognition, we constructed a series of chimeric proteins between the HFH-1 and HNF-3β DNA-binding domains. We have shown that the HFH-1 and HNF-3β proteins possess distinct DNA-binding properties (Fig. 2 and 4). The HNF-3β protein binds to the HNF-3S site but not to the HFH-2 #12 site, whereas the HFH-1 protein displays opposite binding properties to these sites. Both proteins recognize the HFH-1 #3 site (Fig. 4 and 6). We made an HFH-1/HNF-3β protein chimera consisting of the first 49 N-terminal amino acids of the HFH-1 DNA-binding domain fused to the carboxyl terminus of the HNF-3β motif (Fig. 6, HFH-1.WRNS). The HFH-1.WRNS chimera retained the HNF-3β recognition helix and the W2 amino acid residues involved in base-specific contacts while replacing the amino terminus with HFH-1 sequences. The reciprocal HNF-3β/HFH-1 chimera, replacing the HFH-1 49 N-terminal residues with the corresponding HNF-3β sequences, was also made (Fig. 6, HNF-3.WQNS). Two additional HFH-1/HNF-3β chimeras that swapped the first 29 N-terminal amino acids of the DNA-binding domains were constructed (Fig. 6, HFH-1.AEIY and HNF-3.SEIN).

Each protein chimera was compared with the wild-type protein for DNA-binding specificity, using the gel shift assay. Interestingly, the HFH-1.WRNS chimera displayed HFH-1-binding properties because it bound to the HFH-2 #12 and HFH-1 #3 sites but not to the HNF-3S site (Fig. 6). This result suggests that DNA-binding specificity resides in the N-terminal half of the DNA-binding domain and that the sequences there are capable of altering binding specificity of the recognition helix. The reciprocal HNF-3β/HFH-1 chimera bound to the HNF-3 site, thereby causing the HFH-1 recognition helix to switch binding specificity (Fig. 6, HNF-3.WQNS). The HNF-3.WQNS chimera displayed a small increase in complex formation with the HFH-2 #12 site compared with HNF-3β protein, which could be due to the contribution from the HFH-1 W2 sequence (Fig. 6B). In contrast, chimeras which exchanged the first 29 N-terminal amino acids of the DNA-binding domain did not alter the DNA-binding properties of the corresponding helix 3, indicating that these N-terminal sequences do not influence DNA site recognition (Fig. 6, HFH-1.AEIY and HNF-3.SEIN).

The DNA-binding properties of HFH-1/HNF-3β chimeras suggested that a region consisting of 20 amino acid residues juxtaposed to the recognition helix affected DNA-binding specificity (Fig. 6A). To confirm this result, we created a DNA-binding domain chimera (HFH-1.IN-WR) that replaced this 20-amino-acid region in HNF-3β (residues YQ to WQ) with the corresponding sequences from HFH-1 (residues NE to WR). The HFH-1.IN-WR chimera possessed DNA-binding specificity which was identical to that of the HFH-1 protein, although its binding affinity was somewhat reduced compared with that of the wild-type HFH-1 protein (Fig. 6; compare HNF-3S and HFH-2 #12 probes). The 20-amino-acid region from HFH-1 is therefore capable of causing the HNF-3β recognition helix residues to bind HFH-1-specific DNA sites. The HFH-1 sequences involved in altering binding specificity included sequences which are not involved in base-specific contacts on DNA (Fig. 6 and 7) and are the most variable among the family (Fig. 1, regions 2 and 3). Substitution of the three QRR residues at the N terminus of HNF-3β helix 3 with the corresponding YTG residues from HFH-1 was not sufficient to alter DNA-binding specificity (data not shown). It is therefore likely that the binding specificity resides in helix 2, the turn, and four N-terminal amino acids of helix 3 (Fig. 7, shaded region). It thus appears that this 20-amino-acid region plays a structural role which influences the DNA-binding properties of the recognition helix and that the sequence variation in regions 2 and 3 confers altered DNA recognition properties to each of the HNF-3/fkh family members.

DISCUSSION

Subtle nucleotide changes provide distinct HFH-1-, HFH-2-, and HNF-3-binding sites. We have used bacterial GST fusion proteins containing the DNA-binding domain of the HFH-1, HFH-2, and HNF-3 proteins for DNA site selection from partially degenerate oligonucleotides, using sequential rounds of protein binding and PCR amplification. Alignment of the degenerate portion of the protein-selected DNA sites allowed us to derive a consensus binding sequence for the HFH-1, HFH-2, and HNF-3 proteins. We demonstrated that each protein will recognize distinct DNA sites that contain slight differences in the core sequence. Furthermore, subtle nucleotide changes at either the 3' or 5' ends of binding sites containing similar core sequences also dictate differential HNF-3/fkh protein recognition (e.g., Table 1, HFH-1 #25 and HFH-1 #3). The Drosophila homeodomain proteins, for example, recognize an obligatory TAAT core motif with nucleotide differences at the 3' side distinguishing binding between various family domains (4, 9, 37). The HNF-3/fkh proteins, therefore, display a feature in common with other transcription

FIG. 5. Weak-affinity SAAB HNF-3 sites do not form abundant HNF-3 complexes in liver and lung extracts. (A) HNF-3 SAAB sites bind HNF-3 and HFH-2 protein but not HFH-1 protein. The indicated SAAB HNF-3 sites were labeled during PCR amplification and used for gel shift assays with 20 ng of affinity-purified HFH-1-, HFH-2-, and HNF-3β-GST fusion protein (lane + includes homologous competitor). (B) The HNF-3 #4 sequence binds the HFH-2 and HNF-3β proteins with higher affinity than the other SAAB sites. The indicated oligonucleotide probes were used for gel shifts with HNF-3β- and HFH-2-GST fusion proteins. Competitor lanes include a 100-fold molar excess of HNF-3 #1 (lane 1), HNF-3 #2 (lane 2), HNF-3 #4 (lane 4), and HNF-3S.TTR (lane S). (C) Strong-affinity sites display abundant complexes in tissue nuclear extracts. The indicated labeled oligonucleotide probes were used for gel shift with liver, lung, and kidney extracts. Only the HNF-3 #4 and HFH-2 #4 sites form abundant HNF-3 complexes in liver and lung extracts. The strong-affinity HNF-3 #4 site forms the putative HFH-2 complex with kidney nuclear extracts.
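The distinction between sites that fit the HNF-3 consensus and sites that diverge from it can be illustrated with a short script. The sketch below is not part of the original analysis: it assumes the standard IUPAC nucleotide codes, writes the consensus (A/C/G)A(A/T)TRTT(G/T)RYTY as VAWTRTTKRYTY, and uses 12-nucleotide site sequences copied from Table 1. The function names are hypothetical. A strict consensus match is only a crude filter and says nothing about measured binding affinity.

```python
import re

# Standard IUPAC nucleotide codes, limited to those used in the consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "K": "[GT]", "V": "[ACG]",
}

# HNF-3 consensus as given in the text (Fig. 3C).
HNF3_CONSENSUS = "VAWTRTTKRYTY"


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[code] for code in consensus))


def matches_consensus(site, consensus=HNF3_CONSENSUS):
    """Return True if the whole site fits the consensus at every position."""
    return consensus_to_regex(consensus).fullmatch(site) is not None


def mismatch_positions(site, consensus=HNF3_CONSENSUS):
    """Return the 1-based positions at which the site departs from the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return [
        i + 1
        for i, (base, code) in enumerate(zip(site, consensus))
        if re.fullmatch(IUPAC[code], base) is None
    ]


# Site sequences copied from Table 1.
SITES = {
    "HNF-3 #4": "CAATGTTTGTTT",
    "HNF-3S.TTR": "GATTATTGACTT",
    "HNF-3W.H3B": "CCCTGTTTGTTT",
    "HFH-2 #12": "TATTGTTATTTT",
}

for name, seq in SITES.items():
    print(name, matches_consensus(seq), mismatch_positions(seq))
```

Under these assumptions, HNF-3 #4 and HNF-3S.TTR fit the consensus at all 12 positions, whereas HNF-3W.H3B departs from it at positions 2 and 3 and HFH-2 #12 at positions 1, 8, and 9.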

factor families in which a subset of their binding sites is recognized by more than one family member.

Comparing endogenous HNF-3 sites and those derived from the SAAB protocol enabled us to define an HNF-3 consensus sequence: VAWTRTTKRYTY (Fig. 3C). HNF-3-binding sequences which diverge at 5' and 3' nucleotide positions display reduced HNF-3-binding affinity and may alter protein recognition properties (Table 1, HNF-3 #4 and HNF-3W.H3B). In addition, HNF-3-binding sites with nucleotide alterations not observed in other authentic recognition sites form weak HNF-3 protein-DNA complexes with lung and liver extracts (Fig. 5, HNF-3 #1 and #2). We did observe that the HNF-3S.TTR oligonucleotide used in the HNF-3γ cocrystal structure (5) differed significantly from the derived HNF-3 consensus; beginning at position 4, G residues were substituted (GGTTGACTTAGTC) in the first two positions to stabilize the duplex (Fig. 3C, HNF-3S.TTR; GATTATTGACTT). We therefore tested binding to this altered HNF-3 site and demonstrated that it binds the HNF-3 proteins with reduced affinity (data not shown). Taken together, these data support the deduced HNF-3 consensus and demonstrate that nucleotides at both ends of the recognition site have a profound effect on binding specificity.

The HFH-2 protein displayed the most flexible DNA-binding properties, requiring the compilation of three separate consensus sequences. The HFH-2 consensus 1 sequences consist of HFH-1 high-affinity sites containing the TGTTTA motif, a subset of which does not bind the HNF-3 proteins. The HFH-2 consensus 2 sites (TATTAT and TGTTGT motifs) are specific for the HFH-2 protein, whereas sequences containing the TTGTTATT core also bind the HFH-1 protein weakly. The final HFH-2 consensus 3 sites (TRTTTGY motif) bind both HFH-2 and HNF-3 proteins, but the HNF-3S oligonucleotide does not effectively compete for HFH-2 binding (Fig. 5). This result suggests that binding of the consensus 3 sites by HFH-2 and HNF-3 proteins occurs through the recognition of different nucleotide sequences. In addition to the base-specific contacts, nucleotides surrounding these residues or those involved in DNA backbone interaction may also influence DNA recognition properties. The structural analysis of various HNF-3/fkh family members, in the form of protein-DNA complexes with several different cognate sites, will allow the identification of amino acid residues involved in DNA recognition.

The existence of binding sites recognized by multiple family members may allow a promoter region to be acted upon by different tissue-restricted HNF-3/fkh proteins. For example, a portion of the HFH-1 sites also bind the HNF-3 and HFH-2 proteins with high affinity (Fig. 3A). Although many of the endogenous HNF-3 sites appear to be specific for HNF-3 recognition, a subset of the HNF-3 sites may bind other family members (3, 7, 22, 27, 29, 30, 36, 43, 50, 51, 56). The HNF-3

[Figure 6 image. (A) Schematic of the HFH-1 and HNF-3β DNA-binding domains and the chimeras HFH-1.WRNS, HNF-3.WQNS, HFH-1.AEIY, HNF-3.SEIN, and HFH-1.IN-WR, with DNA site recognition scored for the HFH-1 #3, HNF-3S, and HFH-2 #12 sites, and an alignment of the HFH-1 and HNF-3β sequences spanning helix 2 and helix 3 (HNF-3β: LTLSEIYQWIMDLFPFYRQNQQRWQNSIRHSLSFN). (B) Gel shift assays with the HFH-1 #3, HNF-3S, and HFH-2 #12 probes, with and without competitor.]

FIG. 6. DNA-binding specificity resides in N-terminal sequences adjacent to the recognition helix. (A) Shown schematically are the GST-HFH-1 and GST-HNF-3β DNA-binding-domain fusion proteins, the indicated HFH-1/HNF-3β protein chimeras, and their DNA recognition properties to three different DNA sites as determined by gel shift assay (shown in panel B). A homologous DNA competition lane was included (lanes +); binding activity is indicated by +, whereas low or undetectable activity is designated by ± or -, respectively. The residue numbers (144 to 279) refer to amino acid sequence from HNF-3β (30); HFH-1 residue numbers (114 to 278) are relative to the HNF-3β DNA-binding domain. The first chimera consists of the first 49 N-terminal amino acids of the HFH-1 DNA-binding domain fused to the remaining C-terminal portion of the HNF-3β motif (recognition helix from HNF-3β; HFH-1.WRNS) and the reciprocal chimeric DNA-binding-domain construct (HNF-3.WQNS). The next chimera consists of the first 29 N-terminal amino acids of the HFH-1 DNA-binding protein fused to the remaining C-terminal portion of the HNF-3β motif (HFH-1.AEIY) and the reciprocal chimeric construct (HNF-3.SEIN). The final chimera replaces the 20-amino-acid region in the HNF-3β DNA-binding domain (defined by the first four chimeras) with the corresponding sequences from the HFH-1 DNA-binding domain (HFH-1.IN-WR). The HFH-1 and HNF-3β sequences encompassing helix 2 and helix 3 (recognition helix) are compared (identical and conserved residues indicated by | and :, respectively), and indicated on the sequence is the 20-amino-acid region which is responsible for altering DNA-binding specificity. Shown on the sequence is the location of helix 2 with the residue involved in ribose (R) contact, divergent regions 2 and 3 (bar), and the recognition helix with residues involved in phosphate (P) and base-specific (B) contacts in the major groove (5). Nucleotide recognition by amino acids F and N is water mediated.
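The way the four junction chimeras delimit the 20-amino-acid region can be worked through explicitly. The following is a minimal sketch rather than the authors' procedure: it assumes that residues are numbered from 1 at the start of the DNA-binding domain, that the two domains are aligned without gaps, and it records for each chimera only the binding specificity reported in the text. The class, the function names, and the 60-residue placeholder sequences are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chimera:
    name: str
    n_donor: str      # protein supplying the N-terminal segment
    c_donor: str      # protein supplying the remainder, including helix 3
    junction: int     # number of N-terminal residues taken from n_donor
    specificity: str  # binding specificity reported in the text


# Outcomes as described for the gel shift assays (Fig. 6).
CHIMERAS = [
    Chimera("HFH-1.WRNS", "HFH-1", "HNF-3", 49, "HFH-1"),
    Chimera("HNF-3.WQNS", "HNF-3", "HFH-1", 49, "HNF-3"),
    Chimera("HFH-1.AEIY", "HFH-1", "HNF-3", 29, "HNF-3"),
    Chimera("HNF-3.SEIN", "HNF-3", "HFH-1", 29, "HFH-1"),
]


def specificity_region(chimeras):
    """Bracket the residues (1-based, inclusive) that carry binding specificity."""
    # Longest N-terminal swap that leaves specificity with the C-terminal donor.
    lower = max(c.junction for c in chimeras if c.specificity == c.c_donor)
    # Shortest N-terminal swap that transfers specificity to the N-terminal donor.
    upper = min(c.junction for c in chimeras if c.specificity == c.n_donor)
    return lower + 1, upper


def swap_region(host, donor, start, end):
    """Replace residues start..end (1-based, inclusive) of host with those of donor."""
    if len(host) != len(donor):
        raise ValueError("aligned domains must have the same length")
    return host[:start - 1] + donor[start - 1:end] + host[end:]


start, end = specificity_region(CHIMERAS)
print(start, end, end - start + 1)  # 30 49 20

# Placeholder aligned domains ("a" for HFH-1 residues, "b" for HNF-3 residues).
hfh1_domain = "a" * 60
hnf3_domain = "b" * 60
in_wr_like = swap_region(hnf3_domain, hfh1_domain, start, end)
print(in_wr_like)
```

With these inputs the bracket is residues 30 to 49, a 20-residue stretch, and the final swap corresponds in layout to the HFH-1.IN-WR construct: an HNF-3 domain carrying only that stretch from HFH-1.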

recognition site present in the HNF-1 promoter binds the BF-1 protein (27, 59) and resembles the class of HFH-1 sites containing the TGTTTA motif that binds multiple family members (Fig. 3C). The HNF-1 transcription factor is also expressed in the kidney, and perhaps the HFH-2 protein interacts with this promoter region via HNF-3-binding-site recognition. The acquisition of particular binding sites in DNA regulatory regions may allow for recognition of different HNF-3/fkh family members in appropriate tissues. The ability of HFH proteins to distinguish between subtle nucleotide

changes will therefore enable the DNA target site to dictate tissue-specific interactions.

Computer-assisted searches of the GenBank data base with the HNF-3/fkh consensus binding sequences have provided a number of putative human and rodent target genes. We found HNF-3 recognition sites in the enhancer and promoter regions of numerous hepatocyte-enriched genes. These include human transferrin, α1-acid glycoprotein, α2-macroglobulin, cholesterol 7α-hydroxylase, insulin growth factor I, hepatitis B virus, rat retinol-binding protein, amylase (liver and parotid specific), ornithine transcarbamylase, carbamylphosphate synthetase I, mouse metallothionein I, and chicken apolipoprotein AI genes. In addition, HNF-3-binding sites were found in the regulatory regions of the androgen, estrogen, and glucocorticoid receptors, interphotoreceptor retinoid-binding protein, urokinase plasminogen activator, and α2 type V and type I collagen. Furthermore, HFH-2 consensus 1 and consensus 2 binding sequences were found in the regulatory regions of a diverse panel of putative target genes, including the estrogen, progesterone, and retinoic acid (β) receptor, Oct 3 (retinoic acid-repressible enhancer), lactoferrin, thrombospondin 2, topoisomerase II, and histone H2A.Z genes, PRAD1/cyclin D1 proto-oncogene, and D2 dopamine receptor, acetylcholinesterase, myosin light-chain, calmodulin III, and Hox 3 and Hox B2/Hox2.8 genes.

Sequences adjacent to the recognition helix 3 influence DNA-binding specificity. The winged helix motif consists of a compact structure of N-terminal helices and C-terminal wing structures which are involved in presentation of the recognition helix 3 to the major groove. Analysis of a series of DNA-binding-domain chimeras between HFH-1 and HNF-3 allowed us to identify a 20-amino-acid region located adjacent to helix 3 which altered DNA-binding specificity. Replacement of this 20-amino-acid region in HNF-3 with corresponding sequences from HFH-1 alters the HNF-3 recognition helix and W2 so that they bind to DNA sites which are specific for HFH-1. We believe that this is a general feature of the winged helix motif because we have observed that similar HFH-2/HNF-3β chimeras, containing the HNF-3β recognition helix and the adjacent 20 amino acids from HFH-2, exhibited HFH-2 DNA-binding properties (data not shown). This region includes helix 2, the four N-terminal residues of helix 3, and the turn connecting the two helices (Fig. 7) and displays the greatest amino acid sequence variability between the different HNF-3/fkh family members (Fig. 1, regions 2 and 3). Furthermore, we have replaced three amino acids at the N terminus of the HNF-3β recognition helix with corresponding sequences from HFH-1, yet these substitutions are not sufficient to alter binding specificity (data not shown). These results suggest that the 20-amino-acid region plays a structural role which influences the presentation of the recognition helix in the major groove. Our model proposes that the amino acid sequence variation in regions 2 and 3 of different winged helix proteins elicits diverse DNA-binding specificity by altering the base-specific contacts made by the recognition helix. Altering the presentation of the recognition helix may influence the contacts made by identical residues or may allow alternative residues in the recognition helix to participate in binding. This model is consistent with the observation that each of the HFH proteins binds to distinct DNA target sites and that the residues in regions 2 and 3, but not the recognition helix, diverge among the family members (Fig. 3 and Table 1). This constitutes a working model for DNA recognition by the winged helix proteins which is consistent with the chimera binding data presented herein.

FIG. 7. Winged helix motif-DNA complex depicting region involved in DNA-binding specificity. Schematically shown is the structure of the winged helix motif bound to DNA derived from X-ray cocrystal structure reported by Clark et al. (5). The amino (N) and carboxyl (C) termini, α helices (H1, H2, H3), turn (T), β sheets (S1, S2, and S3), and wings 1 and 2 (W1 and W2) are indicated. Base-specific contacts are made by the recognition helix (helix 3) in the major groove and by W2 in the minor groove (5). The darkest area depicts the region of the winged helix motif involved in altering binding specificity as determined by analysis of DNA-binding-domain chimeras (Fig. 6). This region encompasses helix 2, four N-terminal residues of helix 3, and the turn connecting the helices and does not include helix 3 residues involved in base-specific contacts (5). The third N-terminal residue (arginine) of helix 3 makes a phosphate group contact on the DNA backbone (5) (see Fig. 6).

The HNF-3 structure is similar to the helix-turn-helix protein histone H5, but the winged helix differs in that it contains an additional DNA-interactive protein loop. Histone H5 protein associates with the linker region spanning nucleosome cores and does not possess sequence-specific recognition properties (49). In support of its structural resemblance to histone H5, the HNF-3 protein is involved in organizing the nucleosome pattern of the albumin enhancer region in hepatocytes (39). The homeodomain motif also resembles the winged helix structure but differs in that the N-terminal arm of the homeodomain makes several base-specific contacts in the minor groove (25, 45). Analysis of chimeras between these two different homeodomain proteins suggests that DNA-binding specificity is dictated by the N-terminal arm of the homeodomain and the carboxyl portion of the recognition helix 3, including a small C-terminal extension (4, 9). Sequence divergence in the C-terminal extension and the N-terminal arm is therefore presumed to provide different recognition properties to the homeodomain motif. In contrast, the winged helix motif utilizes sequences central to the helix-turn-helix structure to provide varied binding properties to a conserved recognition helix. Although the homeodomain and winged helix motif possess helix-turn-helix structures, they use different structural features to generate diverse DNA recognition properties within their respective families. The use of the HNF-3 crystal structure to interpret our genetic analysis of chimeric winged helix domains has provided a specialized structural basis for diverse DNA-binding specificities. Future genetic and structural studies with HNF-3/fkh proteins will verify our model by determining mechanisms of varied DNA recognition by this transcription factor family.

ACKNOWLEDGMENTS

We thank Robert Storti, Pradip Raychaudhuri, and members of our laboratory Derek Clevidence, Xiaobing Qian, Uzma Samadani, and Richard Peterson for critically reviewing the manuscript and for helpful discussions. We also thank the members of the Scientific Computer Workstation at the Research Resources Center (UIC) for aiding in the GenBank searches of HNF-3/fkh consensus binding sites.

This work was supported by Public Health Service grant GM43241 from the National Institute of General Medical Sciences and in part by grants from the Council for Tobacco Research (2822) and American Heart Association (93006540). R.H.C. is an Established Investigator of the American Heart Association/Bristol-Myers Squibb.

REFERENCES

1. Blackwell, T. K., and H. Weintraub. 1990. Differences and similarities in DNA-binding preferences of MyoD and E2A protein complexes revealed by binding site selection. Science 250:1104-1110.
2. Blank, V., P. Kourilsky, and A. Israel. 1992. NF-κB and related proteins: Rel/dorsal homologies meet ankyrin-like repeats. Trends Biochem. Sci. 17:135-140.
3. Brooks, A. R., B. D. Blackhart, K. Haubold, and B. Levy-Wilson. 1991. Characterization of tissue-specific enhancer elements in the second intron of the human apolipoprotein B gene. J. Biol. Chem. 266:7848-7859.
4. Catron, K. M., N. Iler, and C. Abate. 1993. Nucleotides flanking a conserved TAAT core dictate the DNA binding specificity of three murine homeodomain proteins. Mol. Cell. Biol. 13:2354-2365.
5. Clark, K. L., E. D. Halay, E. Lai, and S. K. Burley. 1993. Cocrystal structure of the HNF-3/fork head DNA-recognition motif resembles histone H5. Nature (London) 364:412-420.
6. Clevidence, D. E., D. G. Overdier, W. Tao, X. Qian, L. Pani, E. Lai, and R. H. Costa. 1993. Identification of nine tissue-specific transcription factors of the hepatocyte nuclear factor 3/forkhead DNA-binding domain family. Proc. Natl. Acad. Sci. USA 90:3948-3952.
7. Costa, R. H., D. R. Grayson, and J. E. Darnell, Jr. 1989. Multiple hepatocyte-enriched nuclear factors function in the regulation of transthyretin and α1-antitrypsin. Mol. Cell. Biol. 9:1415-1425.
8. Dirksen, M. L., and M. Jamrich. 1992. A novel, activin-inducible, blastopore lip-specific gene of Xenopus laevis contains a fork head DNA-binding domain. Genes Dev. 6:599-608.
9. Ekker, S. C., D. P. von Kessler, and P. A. Beachy. 1992. Differential DNA sequence recognition is a determinant of specificity in homeotic gene action. EMBO J. 11:4059-4072.
10. Evans, R. 1988. The steroid and thyroid hormone receptor superfamily. Science 240:889-893.
11. Frankel, A. D., and P. S. Kim. 1991. Modular structure of transcription factors: implications for gene regulation. Cell 65:717-719.
12. Fried, M., and D. M. Crothers. 1981. Equilibria and kinetics of lac repressor operator interactions by polyacrylamide gel electrophoresis. Nucleic Acids Res. 9:6505-6525.
13. Garner, M. M., and A. Revzin. 1981. A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the E. coli lactose operon regulatory system. Nucleic Acids Res. 9:3047-3060.
14. Graham, A., N. Papalopulu, and R. Krumlauf. 1989. The murine and Drosophila homeobox gene complexes have common features of organization and expression. Cell 57:367-378.
15. Grossniklaus, U., R. K. Pearson, and W. J. Gehring. 1992. The Drosophila sloppy paired locus encodes two proteins involved in segmentation that show homology to mammalian transcription factors. Genes Dev. 6:1030-1051.
16. Hacker, U., U. Grossniklaus, W. J. Gehring, and H. Jackle. 1992. Developmentally regulated Drosophila gene family encoding fork head domain. Proc. Natl. Acad. Sci. USA 89:8754-8758.
17. Hernandez, N. 1993. TBP, a universal eukaryotic transcription factor? Genes Dev. 7:1291-1308.
18. Hollenberg, S. M., and R. Evans. 1988. Multiple and cooperative trans-activation domains of the human glucocorticoid receptor. Cell 55:899-906.
19. Hope, I. A., and K. Struhl. 1986. Functional dissection of a eukaryotic transcriptional activator protein, GCN4 of yeast. Cell 46:885-894.
20. Hope, I. A., S. Mahadevan, and K. Struhl. 1988. Structural and functional characterization of the short acidic transcriptional activation region of yeast GCN4 protein. Nature (London) 333:635-640.
21. Hromas, R., J. Moore, T. Johnston, C. Socha, and M. Klemsz. 1993. Drosophila Forkhead homologues are expressed in a lineage-restricted manner in human hematopoietic cells. Blood 81:2854-2859.
22. Ip, Y. T., D. Poon, D. Stone, D. K. Granner, and R. Chalkley. 1990. Interaction of a liver-specific factor with an enhancer 4.8 kilobases upstream of the phosphoenolpyruvate carboxykinase gene. Mol. Cell. Biol. 10:3770-3781.
23. Johnson, P. F., and S. L. McKnight. 1989. Eukaryotic transcriptional regulatory proteins. Annu. Rev. Biochem. 58:799-839.
24. Kaestner, K. H., K. Lee, J. Schlondorff, H. Hiemisch, A. P. Monaghan, and G. Schutz. 1993. Six members of the mouse forkhead gene family are developmentally regulated. Proc. Natl. Acad. Sci. USA 90:7628-7631.
25. Kissinger, C. R., B. Liu, E. Martin-Blanco, T. B. Kornberg, and C. O. Pabo. 1990. Crystal structure of an engrailed homeodomain-DNA complex at 2.8 Å resolution: a framework for understanding homeodomain-DNA interactions. Cell 63:579-590.
26. Knochel, S., J. Lef, J. Clement, B. Klocke, S. Hille, M. Koster, and W. Knochel. 1992. Activin A induced expression of a fork head related gene in posterior chordamesoderm (notochord) of Xenopus laevis embryos. Mech. Dev. 38:157-165.
27. Kuo, C. J., P. B. Conley, L. Chen, F. M. Sladek, J. E. Darnell, Jr., and G. R. Crabtree. 1992. A transcriptional hierarchy involved in mammalian cell-type specification. Nature (London) 355:457-461.
28. Lai, E., and J. E. Darnell, Jr. 1991. Transcriptional control in hepatocytes: a window on development. Trends Biochem. Sci. 16:427-430.
29. Lai, E., V. R. Prezioso, E. Smith, O. Litvin, R. H. Costa, and J. E. Darnell, Jr. 1990. HNF-3A, a hepatocyte-enriched transcription factor of novel structure is regulated transcriptionally. Genes Dev. 4:1427-1436.
30. Lai, E., V. R. Prezioso, W. Tao, W. Chen, and J. E. Darnell, Jr. 1991. Hepatocyte nuclear factor 3A belongs to a gene family in mammals that is homologous to the Drosophila homeotic gene fork head. Genes Dev. 5:416-427.
31. Levine, M., and T. Hoey. 1988. Homeobox proteins as sequence-specific transcription factors. Cell 55:537-540.
32. Levine, M., and J. L. Manley. 1989. Transcriptional repression of eukaryotic promoters. Cell 59:405-408.
33. Li, C., C. Lai, D. S. Sigman, and R. B. Gaynor. 1991. Cloning of a cellular factor, interleukin binding factor, that binds to NFAT-like motifs in the human immunodeficiency virus long terminal repeat. Proc. Natl. Acad. Sci. USA 88:7739-7743.
34. Li, C., A. J. Lusis, R. Sparkes, S. Tran, and R. Gaynor. 1992. Characterization and chromosomal mapping of the gene encoding the cellular DNA binding protein HTLF. Genomics 13:658-664.
35. Li, J., and P. K. Vogt. 1993. The retroviral oncogene qin belongs to the transcription factor family that includes the homeotic gene fork head. Proc. Natl. Acad. Sci. USA 90:4490-4494.
36. Liu, J., C. M. DiPersio, and K. S. Zaret. 1991. Extracellular signals that regulate liver transcription factors during hepatic differentiation in vitro. Mol. Cell. Biol. 11:773-784.
37. Mann, R. S., and D. S. Hogness. 1990. Functional dissection of ultrabithorax proteins in D. melanogaster. Cell 60:597-610.
38. McGinnis, W., and R. Krumlauf. 1992. Homeobox genes and axial

patterning. Cell 68:283-302. C/EBP, NFY, HNF3, and HNF-1 in the rat aldolase B gene 39. McPherson, C. E., E.-Y. Shim, D. S. Friedman, and K. S. Zaret. promoter. Nucleic Acids Res. 19:6145-6153. 1993. An active tissue-specific enhancer and bound transcription 51. Rigaud, G., J. Roux, R. Pictet, and T. Grange. 1991. In vivo factors existing in a precisely positioned nucleosome array. Cell footprinting of rat TAT gene: dynamic interplay between the 75:387-398. glucocorticoid receptor and a liver-specific factor. Cell 67:977-986. 40. Miller, L. M., M. E. Gallegos, B. A. Morisseau, and S. K. Kim. 52. Rosenfeld, M. G. 1991. POU-domain transcription factors: pou- 1993. Lin-31, a caenorhabditis elegans HNF-3/fork head transcrip- er-ful developmental regulators. Genes Dev. 5:897-907. tion factor homolog specifies three alternative cell fates in vulval 53. Ruiz i Altaba, A., and T. M. Jessell. 1992. Pintallavis, a gene development. Genes Dev. 7:933-947. expressed in the organizer and midline cells of frog embryos: 41. Mitchell, P. J., and R. Tjian. 1989. Transcriptional regulation in involvement in the development of the neural axis. Development mammalian cells by sequence-specific DNA binding proteins. 116:81-93. Science 245:371-378. 54. Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular 42. Miura, N., A. Wanaka, M. Tohyama, and K. Tanaka. 1993. cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Labo- MFH-1, a new member of the fork head domain family is ratory Press, Cold Spring Harbor, N.Y. expressed in developing mesenchyme. FEBS Lett. 326:171-176. 55. Sasaki, H., and B. L. M. Hogan. 1993. Differential expression of 43. Nitsch, D., and G. Schutz. 1993. The distal enhancer implicated in multiple fork head related genes during gastrulation and axial the developmental regulation of the tyrosine aminotransferase pattern formation. Development 118:47-59. gene is bound by liver-specific and ubiquitous factors. Mol. Cell. 56. Sawaya, P. L., B. R. Stripp, J. A. 
Whitsett, and D. S. Luse. 1993. Biol. 13:4494-4504. The lung-specific CC10 gene is regulated by transcription factors 44. Olson, E. N. 1990. MyoD family: a paradigm for development? for the AP-1, octamer, and hepatocyte nuclear factor 3 families. Genes Dev. 4:1454-1461. Mol. Cell. Biol. 13:3860-3871. 45. Otting, G., Y. Q. Qian, M. Billeter, M. Muller, M. Affolter, W. J. 57. Smith, D. B., and K. S. Johnson. 1988. Single-step purification of Gehring, and K. Wuthrich. 1989. Protein-DNA contacts in the polypeptides expressed in E. Coli as fusions with glutathione structure of a homeodomain-DNA complex determined by nu- S-transferase. Gene 67:31-40. clear magnetic resonance spectroscopy in solution. EMBO J. 58. Strahle, U., P. Blader, D. Henrique, and P. W. Ingham. 1993. 9:3085-3092. Axial, a zebrafish gene expressed along the developing body axis, 46. Pani, L., D. Clevidence, X. Qian, and R. H. Costa. 1992. The shows altered expression in cyclops mutant embryos. Genes Dev. restricted promoter activity of the liver transcription factor hepa- 7:1436-1446. tocyte nuclear factor 31 involves a cell-specific factor and positive 59. Tao, W., and E. Lai. 1992. Telencephalon-restricted expression of autoactivation. Mol. Cell. Biol. 12:552-562. BF-1, a new member of the HNF-3/forkhead gene family, in the 47. Pani, L., D. Overdier, A. Porcella, X. Qian, E. Lai, and R. H. developing rat brain. Neuron 8:957-966. Costa. 1992. Hepatocyte nuclear factor 31B contains two transcrip- 60. Weigel, D., and H. Jackie. 1990. Fork head: a new eukaryotic DNA tional activation domains one of which is conserved with the binding motif? Cell 63:455-456. Drosophila fork head protein. Mol. Cell. Biol. 12:3723-3732. 61. Weigel, D., G. Jurgens, F. Kuttner, E. Seifert, and H. Jackle. 1989. 48. Ptashne, M., and A. A. F. Gann. 1990. Activators and targets. The homeotic gene fork head encodes a nuclear protein and is Nature (London) 346:329-331. expressed in the terminal regions of the drosophila embryo. 
Cell 49. Ramakrishnan, V., J. T. Finch, V. Graziano, P. L. Lee, and R. M. 57:645-658. Sweet. 1993. Crystal structure of globular domain of histone H5 62. Weintraub, H., R. Davis, S. Tapscott, M. Thayer, M. Krause, R. and its implications for nucleosome binding. Nature (London) Benezra, T. K. Blackwell, D. Turner, R. Rupp, S. Hollenberg, Y. 362:219-223. Zhuang, and A. Lassar. 1991. The myoD gene family: nodal point 50. Raymondjean, M., A.-L. Pichard, C. Gregori, F. Ginot, and A. during specification of the muscle cell lineage. Science 251:761- Kahn. 1991. Interplay of an original combination of factors: 766.