Crystal Structure of a Human TATA Box-Binding Protein/TATA Element Complex DIMITAR B
Total Page:16
File Type:pdf, Size:1020Kb
Proc. Natl. Acad. Sci. USA Vol. 93, pp. 4862-4867, May 1996 Cell Biology Crystal structure of a human TATA box-binding protein/TATA element complex DIMITAR B. NIKOLOV*, HUA CHEN*t, ELAINE D. HALAY*t, ALEXANDER HOFFMANNt§, ROBERT G. ROEDERt, AND STEPHEN K. BURLEY*t¶ *Laboratories of Molecular Biophysics, tLaboratory of Biochemistry and Molecular Biology, and tHoward Hughes Medical Institute, The Rockefeller University, 1230 York Avenue, New York, NY 10021 Contributed by Robert G. Roeder, January 5, 1996 ABSTRACT The TATA box-binding protein (TBP) is re- Recently, we reported the x-ray crystal structures of TBP quired by all three eukaryotic RNA polymerases for correct isoform 2 (TBP2) fromArabidopsis thaliana (10, 11), a complex initiation of transcription of ribosomal, messenger, small of TBP2 with the TATA element of the adenovirus major late nuclear, and transfer RNAs. The cocrystal structure of the promoter (AdMLP) (12, 13), and a ternary complex of TBP2 C-terminal/core region of human TBP complexed with the with human TFIIB and the AdMLP TATA box (14). TBP2 TATA element ofthe adenovirus major late promoter has been displays a novel, two-domain DNA-binding fold, and the determined at 1.9 A resolution. Structural and functional protein's approximate twofold intramolecular symmetry gen- analyses of the protein-DNA complex are presented, with a erates a saddle-shaped structure dominated by a curved anti- detailed comparison to our 1.9-A-resolution structure ofAra- parallel (3-sheet, forming a concave DNA binding surface. bidopsis thaliana TBP2 bound to the same TATA box. TATA element recognition proceeds via an induced-fit mech- anism, involving subtle conformational changes at the level of the protein and an unprecedented DNA distortion. The seat of Eukaryotes have three distinct RNA polymerases (forms I, II, the saddle presents a large convex surface to which other and III) that catalyze transcription of nuclear genes (1). proteins bind during transcription initiation. TFIIB recognizes Despite their structural complexity, these multisubunit en- the preformed TBP-DNA complex via protein-protein and zymes require sets of auxiliary proteins known as general protein-DNA interactions. The N-terminal portion of TFIIB transcription initiation factors to initiate transcription from forms the downstream surface of the ternary complex, where corresponding class I, II, and III nuclear gene promoters (2-5). it could fix the transcription start site. The remaining surfaces TATA box-binding protein (TBP), first identified as a com- of TBP and the TFIIB can interact with TBP-associated ponent of the class II initiation factor TFIID, participates in factors, other class II initiation factors, and transcriptional transcription by all three nuclear RNA polymerases (reviewed activators and coactivators. in ref. 6). TBP has a phylogenetically conserved, 180-residue We now report the crystal structure of human C-terminal/ C-terminal portion, containing two direct repeats flanking a core TBP (hTBPc) recognizing the AdMLP TATA element. highly basic segment known as the basic repeat. The C-termi- The 1.9-A-resolution structure permits detailed structural and nal/core region of TBP binds to the TATA consensus se- functional analyses of the protein-DNA interactions and pro- quence (TATAa/tAa/t) with high affinity and slow off rate, vides further insights into how TBPs from different organisms recognizing minor groove determinants and inducing a dra- recognize and direct transcription initiation from TATA- matic DNA deformation. The N-terminal portion of TBP containing promoters. varies in length, shows little or no conservation among differ- ent organisms, and is largely unnecessary for transcription in MATERIALS AND METHODS certain yeast strains. The role of TBP in transcription initiation and its regulation Crystallization. hTBPc residues 155-335 (15) fused with an are best understood for genes transcribed by RNA polymerase additional 20 N-terminal amino acids (MGSSHHHHHHSS- II (reviewed in refs. 3 and 4). In this setting, TBP is tightly GLVPRGSH) was overexpressed in Escherichia coli [BL21- associated with other polypeptides known as TBP-associated (DE3)pLysS] using the T7 RNA polymerase system (16). Cells factors (reviewed in ref. 6). This multiprotein complex grown at 30°C to an absorbance of 0.7 at 595 nm and induced (TFIID) is a general initiation factor (7) that binds to the with 0.5 mM isopropyl P3-D-thiogalactopyranoside for 3 h were TATA element, coordinating accretion of class II initiation harvested by low-speed centrifugation. Lysis was performed by factors (TFIIA, -B, -D, -E, -F, -G, -H, and -I) and RNA three cycles of freeze-thaw and DNA digestion was performed polymerase II (pol II) into a functional preinitiation complex by DNase I. The soluble fraction was applied directly to a Ni2+ (PIC) (reviewed in refs. 3 and 8). Although incapable of ion affinity column and washed with a buffer containing mimicking TFIID in vivo (at least in higher eukaryotes; increasing amounts of imidazole. hTBPc with an estimated reviewed in ref. 6), recombinant TBP alone is competent for homogeneity of 95% was then eluted from the resin with a PIC assembly and basal transcription in the presence of the buffer containing 100 mM EDTA. After overnight dialysis, other general class II factors (9). TBP engages in physical and removal of the histidine-containing N-terminal sequence was functional interactions with the general initiation factors TFIIA and TFIIB, the C terminus of the large subunit of pol Abbreviations: AdMLP, adenovirus major late promoter; PIC, preini- tiation complex; TBP, TATA box-binding protein; hTBPc, human core II, some negative cofactors (NC1, NC2, DR1) that inhibit PIC TATA box-binding protein; TBP2, TATA box-binding protein isoform formation, some transcriptional activators, and an initiator- 2; pol II, RNA polymerase II; rmsd, rms deviation. binding factor (TFII-I) that may be important for transcription Data deposition: The atomic coordinates and structure factor ampli- initiation from TATA-less promoters. tudes have been deposited in the Protein Data Bank, Chemistry Department, Brookhaven National Laboratory, Upton, NY 11973 (CDW).1 The publication costs of this article were defrayed in part by page charge §Present address: Department of Biology, Massachusetts Institute of payment. This article must therefore be hereby marked "advertisement" in Technology, Cambridge, MA 02139. accordance with 18 U.S.C. §1734 solely to indicate this fact. 5To whom reprint requests should be sent at the * address. 4862 Downloaded by guest on September 29, 2021 Cell Biology: Nikolov et al. Proc. Natl. Acad. Sci. USA 93 (1996) 4863 A 155 160 170 180 190 200 210 220 230 240 III rr rrr r w r r r r r rr b b w pi p w p b b p w b bw w SGIVPQLQNIVSTVNLGCKLDLKTIALR-ARNAEYNPKRFAAVIMRIREPRTTALIFSSGKMVCTGAKSEEQSRLAARKYARVVQKLGFPAKF Si HI S2 53S4 S5 H2 LjDFKIQNMVGSCDVKFPIRLEGLVLTHQQFSSYE PELFPGLIYRMIKPRIVLLIFVSGKVVLTGAKVRAEIYEAFENIYPILKGFRKTT b b ib p w b b p w b bw p rr rrr r r r r r r r III rI 28 250 260 270 2 290 300 310 320 330 B 1 3 5 7 5'C T G C T A T AAA A G G C T G G A C G A T A T T T T C C G A C 5r 1 3' 5 7' FIG. 1. (A) Sequence of the 180-amino acid conserved C-terminal region of human TBP with the two direct repeats aligned one above the other. a-Helix and 13-strand regions are shaded and labeled (' denotes second direct repeat), and the two "stirrups" are boxed. Residues interacting with DNA are labeled according to interaction type: b, base; p, phosphate; r, ribose; i, intercalating phenylalanine; w, water mediated. Protein numbering is that of full-length human TBP (15) and corresponds to the A. thaliana numbering plus 136 and Saccharomyces cerevisiae numbering plus 94. A full conversion scheme can be found in ref. 10. (B) Sequence of 16-base-pair oligonucleotide used for cocrystallization, with the 8-base-pair TATA box of the AdMLP shaded. DNA numbering scheme is identical to that used in ref. 13. Position 1 of the TATA box corresponds to base pair -31 in the AdMLP, and ' denotes bottom strand. performed by trypsin cleavage of the peptide bond between R were transiently "cryoprotected" in a 10% glycerol/25% eth- and G in the above sequence to yield hTBPc-(155-335) plus 4 ylene glycol/1 mM MgCl2/5 mM Tris HCl, pH 8.5, mixture N-terminal amino acids-namely, GSHM. A final cation- before freezing on a platform created by a loop of ophthal- exchange chromatography step to remove the protease and any mological suture material (10-0 Ethilon, Ethicon). Oscillation remaining contaminants yielded material of homogeneity es- photographs were integrated, scaled and merged using DENZO timated to be 98% or greater. Mass spectrometry documented and SCALEPACK. Molecular replacement (MR) was carried out that the TBP used for crystallization was neither modified nor with X-PLOR (17) using the structure of the TBP2-DNA further proteolyzed during expression and purification. The complex (12, 13). Patterson correlation refinement of the measured molecular mass was 20,829 ± 5, compared with rotation function search maxima yielded a solution with cor- 20,825 calculated from the predicted amino acid sequence. relation coefficient 0.216. A translation search in X-Y-Z, gave Further control experiments were performed to ensure that a final MR solution 14.2cr above the mean peak level. Rigid the protein used for the crystallographic study was biochem- body refinement of the search model, followed by positional ically active and indistinguishable from recombinant yeast and simulated annealing refinement with X-PLOR, gave Rfactor TBP. Gel-retardation analyses, performed as described (11), = 0.26 at 3.0 A resolution. At this point, (2IFobservedI - confirmed that the purified protein had full TBP DNA-binding IFcalculatedl) and (lFobservedi - lFcalculatedl) difference maps were activity (data not shown).