Structural analysis and dimerization potential of the human TAF5 subunit of TFIID

Suparna Bhattacharya, Shinako Takada, and Raymond H. Jacobson*

Department of Biochemistry and Molecular Biology, University of Texas M. D. Anderson Cancer Center and the Program in and Development at the University of Texas, Graduate School in Biochemical Sciences, 1515 Holcombe Boulevard, Unit 1000, Houston, TX 77030

Communicated by Brian W. Matthews, University of Oregon, Eugene, OR, November 21, 2006 (received for review October 12, 2006)

TFIID is an essential factor required for RNA polymerase II transcription but remains poorly understood because of its intrinsic complexity. Human TAF5, a 100-kDa subunit of general transcription factor TFIID, is essential and plays a critical role in assembling the 1.2 MDa TFIID complex. We report here a structural analysis of TAF5. Our structure at 2.2-Å resolution of the TAF5-NTD2 domain reveals an α-helical domain with distant structural similarity to RNA polymerase II CTD interacting factors. The TAF5-NTD2 domain contains several conserved clefts likely to be critical for TFIID complex assembly. Our biochemical analysis of the human TAF5 protein demonstrates the ability of the N-terminal half of the TAF5 gene to form a flexible, extended dimer, a key property required for the assembly of the TFIID complex.

initiation complex | transcription | x-ray crystallography | protein–protein interaction

BIOCHEMISTRY

Author contributions: S.B. and R.H.J. designed research; S.B., S.T., and R.H.J. performed research; S.B., S.T., and R.H.J. contributed new reagents/analytic tools; R.H.J. analyzed data; and S.B. and R.H.J. wrote the paper.

The authors declare no conflict of interest.

Abbreviations: TBP, TATA box-binding protein; TAF, TBP-associated factor; TF, transcription factor; MAD, multiwavelength anomalous dispersion; NCS, noncrystallographic symmetry; CID, RNA polymerase CTD-interacting domain; DLS, dynamic light scattering; SEC, size exclusion chromatography; SLS, static light scattering.

Data deposition: The atomic coordinates and structure factors have been deposited in the Protein Data Bank, www.rcsb.org (PDB ID code 2NXP).

*To whom correspondence should be addressed. E-mail: [email protected].

This article contains supporting information online at www.pnas.org/cgi/content/full/0610297104/DC1.

© 2007 by The National Academy of Sciences of the USA

The general transcription factor TFIID plays a central role in the recognition of core promoter elements and is used for accurate transcription initiation by RNA Pol II for a large class of genes. Recent studies in yeast indicate that the majority of genes present are TFIID dependent (1). TFIID is a multiprotein complex composed of the TATA box-binding protein (TBP) and 14 other TBP-associated factors (TAFs), which have been highly conserved during eukaryotic evolution (2). TFIID is the only general transcription factor with specific TATA box binding activity and has been shown to initiate recruitment of the other general transcription factors (TFIIA, TFIIB, TFIIE, TFIIF, TFIIH) along with RNA Pol II into a functional preinitiation complex (PIC) that forms at the start site of Pol II genes.

TAF subunits seem to serve multiple functions within the TFIID holocomplex. For instance, hTAF6 and hTAF9 have been reported to interact with the downstream promoter element, whereas TAF1 and TAF2 have been shown to bind to the initiator. The specific contacts with the promoter DNA by TFIID support basal transcription from promoters containing these elements and have revealed the role of several TAFs in transcription (3–6). TAF1, the largest subunit of TFIID, is known to harbor multiple enzymatic activities (7–9) and is also involved in chromatin transactions by using its double bromodomains to contact acetylated histone tails (10, 11). Furthermore, many of the TAFs have been shown to be involved in interactions with gene specific activators and other general transcription factors either to stabilize the preinitiation complex (12) or to induce structural changes in them (13). TAFs are not restricted to the TFIID complex but can also be found in the yeast SAGA (Spt-Ada-GCN5-acetyltransferase complex), human STAGA (Spt3-TAF9-Ada-GCN5-acetyltransferase complex), PCAF (p300/CBP-associated factor), or TFTC (TBP-free TAFII-containing complex) (14–16). The TAFs within these multiprotein complexes are involved in extensive protein-protein interactions, and various studies indicate the emerging role of TAFs as cofactors with important functional properties required for transcription (17, 18).

Low-resolution three-dimensional structures of the yeast and human TFIID complexes from electron microscopy and digital analysis have provided visualizations of the TFIID complex (19–21). These studies showed that TFIID forms a three lobed, asymmetric structure. Further immunomapping studies have begun to reveal the relative positions of different TAFs within the complexes and point to key structural roles for the TAF5 and TAF1 subunits in forming the characteristic three lobed molecular assembly.

TAF5, a 100-kDa polypeptide in humans, is a key TAF subunit that seems to play a major role in forming the scaffold critical for TFIID complex formation. TAF5 proteins from all eukaryotes contain three highly conserved sequence motifs in their N- and C-terminal regions. The C-terminal portion of TAF5 contains six WD40 repeats that are likely to form a closed beta propeller structure. Beta propeller domains have more generally been shown to have functions in mediating protein-protein interactions among a variety of different proteins and might be important for TAF–TAF interactions (22, 23). Unlike the WD40 motifs present in the C terminus of TAF5, the N-terminal region of the TAF5 sequence contains two conserved motifs about which little is known. The EM studies had implicated the N-terminal portion of TAF5 as playing a role in dimerization and forming a scaffold upon which other TFIID subunits could assemble. To further clarify the structural organization and function of TAF5, we have determined the crystal structure to a resolution of 2.2 Å of the larger of the two N-terminal conserved motifs (hTAFII100) (residues 189–343) and characterized the relationship between this domain and the smaller, most N-terminal conserved region of the TAF5 sequence (TAF5-NTD1/LisH domain residues 90–124).

Our studies show that the N-terminal half of the TAF5 sequence is capable of forming a dimeric assembly in the absence of other TAFs. Our crystal structure shows that the second conserved TAF5 motif (TAF5-NTD2) adopts a mostly α-helical domain of novel structure that forms a calcium dependent dimer both in the crystal lattice and in solution. However, direct measurement of Ca²⁺ binding revealed only weak affinity of this domain for Ca²⁺. Subsequent studies by using fragments encompassing both TAF5-NTD1 and TAF5-NTD2 revealed that the small LisH homology domain at the N terminus of TAF5 cooperates with TAF5-NTD2 to induce dimer formation of hTAF5 in a calcium independent manner. Our studies are

www.pnas.org/cgi/doi/10.1073/pnas.0610297104 PNAS | January 23, 2007 | vol. 104 | no. 4 | 1189–1194

[Fig. 1 schematic: residue markers 1, 91, 124, 194, 340, 460, 739, 800; domains from N to C terminus: NTD1/LisH, NTD2, WD40.]

Fig. 1. Primary sequence organization of hTAF5. Schematic diagram of the domain structure present in human TAF5 where NTD1/LisH corresponds to LIS1 homology domain (29), NTD2 corresponds to the α-helical domain reported here, and WD40 repeats are predicted to form a closed β-propeller structure.

consistent with previous EM analyses that suggested that the N terminus of TAF5 might play a critical role in organizing the three-lobed TFIID structure. We have also identified several surface pockets that may serve as binding sites for other TAFs or transcriptional regulators.

Results

The primary sequence of human TAF5 is shown schematically in Fig. 1. All TAF5 proteins contain two evolutionarily conserved motifs in the N-terminal half (TAF5-NTD1 and TAF5-NTD2, for the human TAF5 residues 90–124 and 194–340, respectively). The N-terminal regions of TAF5 proteins from Drosophila and humans share 31% sequence identity, whereas yeast and human TAF5 proteins share 22% sequence identity. Several subfragments of the TAF5 N terminus were generated, purified, and subjected to limited proteolysis. Digestion of a fragment containing residues 1–500 of the human TAF5 protein produced a proteolytically resistant fragment with an observed mass by SDS/PAGE of ≈19 kDa. Liquid chromatography (LC)-MS unambiguously identified this tryptic fragment as the TAF5-NTD2 region (194–340).

A construct corresponding to TAF5-NTD2 (residues 189–343) was expressed, purified, and crystallized as described in Materials and Methods. A multiwavelength anomalous dispersion (MAD) experiment was performed by using SeMet-labeled crystals diffracting to 2.2-Å resolution (see Materials and Methods). Noncrystallographic symmetry (NCS) averaging over eight crystallographically independent copies coupled with phase extension resulted in a high quality experimental electron density map. The final 2.2-Å resolution model consists of residues 195–343 of the human TAF5 sequence and includes a total of 150 of 156 expected residues. Missing residues are localized to the N termini of all four noncrystallographically related dimers. The resulting model has been refined to 2.2 Å with Rwork = 21.2% and Rfree = 26.2% [see supporting information (SI) Table 1]. All of the molecules in the asymmetric unit are essentially identical, superimposing with rms deviations <0.4 Å.

The TAF5-NTD2 domain adopts a novel fold that is mostly helical, consisting of seven α-helices, three short 3₁₀ helices, and two short β strands (Fig. 2). Four of the α-helices (α2–α5) form a four-helix bundle, adopting a right-handed superhelical arrangement that forms the core of the TAF5-NTD2 domain structure. Following the last helix of the four-helix bundle (α5), the first of three short 3₁₀ helices (η1) breaks the right-handed superhelical pattern, crossing over to the left. After a tight turn of the chain, the second, longer 3₁₀ helix (η2) is oriented parallel to the principal axis of the four-helix bundle, essentially adding a fifth helix to the bundle motif. A third, short 3₁₀ helix (η3) follows and connects to the first of the two short β strands (β1) present in the TAF5-NTD2 domain. The axes of the remaining helices (α6 and α7) make a 60° angle, forming a triangular spiral that is completed by a 2-strand parallel sheet formed from the β1 and β2 strands. Together, helices α6 and α7 form a "helical sheet" with helix α1 and the first 3₁₀ helix (η1). The "helical sheet" is localized to one side of the four-helix bundle, where it forms an extended structure that covers an entire face of the core bundle formed from helices α2, α3, α4, and α5.

Fig. 2. Structure of the hTAF5-NTD2 domain. (a) Diagram of the hTAF5-NTD2. (b) Top view of the α-helical domain of the hTAF5-NTD2 showing the arrangement of the other helices around the central helix (α3) in the crystal structure. All of the helices and the β strands are labeled. Two views in a and b are related by rotation of 90° around a horizontal axis. (c) Topology diagram of the secondary structural elements of the hTAF5-NTD2 domain to show the arrangement of the helical bundle (front view) and the helical sheet (back view) of the crystal structure.

A key feature of the TAF5-NTD2 motif is the presence of a long central helix (α3) that forms the core of the domain. This helix is bent by ≈30° halfway through its length. The C-terminal end of the core helix is mostly buried by surrounding bundle helices, whereas the N-terminal end is partially exposed to the solvent, forming part of a cleft that is a putative binding site for other TAF subunits.

Similarity searches by using SSM and DALI were carried out for the TAF5-NTD2 structure (24, 25). Neither SSM nor DALI identified any proteins with overall similarity to the TAF5-NTD2 topology. The closest matches obtained identified structures that matched only the TAF5-NTD2 4-helix bundle and that contained right-handed superhelices. The similarity searches suggest that the TAF5-NTD2 domain may be distantly related to the CID domains (RNA polymerase CTD-interacting domains). Superposition of α-helices α4–α7 from the CID-containing protein PCF11, a S. cerevisiae polyadenylation factor (26), onto helices α2–α5 of the TAF5-NTD2 results in an RMS deviation of ≈2.0 Å over 53 α carbons. Although the core helices of the PCF11 and TAF5-NTD2 domains align well, the remainder of the structures are quite different. The PCF11 domain is formed from seven helices all arranged in the right-handed superhelix, whereas the TAF5-NTD2 domain contains only four helices in the superhelix to which an irregular helical sheet has been added. Furthermore, the CTD phosphopeptide binding pocket of the PCF11 domain is located on the opposite side from the most apparent binding cleft present in the TAF5-NTD2. However, in TAF5-NTD2, a large patch of conserved residues (CID Homology Patch) corresponding to the CID interaction site shows sequence conservation within TAF5 sequences but not between TAF5 and CID proteins (see below).

To identify possible binding sites for other TAFs, we looked for clefts or pockets present in the structure that exhibited a high degree of sequence conservation among TAF5 family members.
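The idea of scoring sequence conservation position by position can be illustrated with a minimal sketch. This is not the scoring scheme used by ESPript or ProFunc; it simply reports, for each alignment column, the fraction of sequences that share the most common residue. The function name `column_conservation` and the four-residue toy alignment are hypothetical and are not real TAF5 sequences.

```python
from collections import Counter


def column_conservation(alignment):
    """Return one score per alignment column: the fraction of sequences
    that carry the most common residue in that column.

    Gaps ("-") never count as the conserved residue, but they do count in
    the denominator, so gapped columns score lower.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Hypothetical toy alignment for illustration only (NOT real TAF5 sequences).
toy_alignment = ["KYFA", "KYLA", "KF-A", "RYFA"]
scores = column_conservation(toy_alignment)
for position, score in enumerate(scores, start=1):
    print(f"column {position}: {score:.2f}")
```

In a real analysis the per-column scores would then be mapped onto the residues lining each surface cleft, so that a cleft can be ranked by the share of highly conserved positions it contains.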

Two programs, ProFunc (27) and ESPript (28), were used to consider both primary sequence and structural information for a diverse set of TAF5 sequences (see SI Fig. 7). The surface calculations from ProFunc identified four major clefts with a cutoff volume greater than ≈500 Å³. ProFunc and ESPript analysis identified the largest and third largest clefts as containing the highest percentage of conserved residues. Four clefts were identified, and two of these seem to exhibit significant sequence conservation (Fig. 3).

Fig. 3. Conserved surface features of the TAF5-NTD2 domain. The surface conservation score was calculated with ESPript (28) based on the sequence alignment in SI Fig. 7 and was mapped on the hTAF5-NTD2 molecular surface by PYMOL. The shade of red in the figure indicates the degree of conservation within TAF5 family members (white, no conservation, to dark red, absolute conservation). The scale bar in the figure indicates the degree of residue conservation from 0.0 to 1.0. (a) Conserved cleft 1, the largest involuted cleft on the TAF5-NTD2 domain with conserved residues. Helix α3 is pointed out of the plane of the page. (b) Conserved cleft 2. (c) Conserved residue distributions on the surface of hTAF5-NTD2 corresponding to the CID homology surface.

The largest conserved cleft (cleft 1) has a volume of 1,054 Å³ and is localized to the N terminus of the core helix, where it forms an extended groove that runs across the top of the four-helix bundle (Fig. 3a). This cleft is 25 Å in length, which could accommodate a relatively long polypeptide chain (7–8 residues) with extended secondary structure. The side walls of cleft 1 are formed by residues donated by the α2, α3, and α4 helices, the loop connecting the α4 and α5 helices, and an additional loop connecting the third 3₁₀ helix (η3) and first β strand (β1), whereas the base of the pocket is mostly formed by the helices α2, α3 (central helix), and α4. Residues forming cleft 1 make up the largest conserved region on the hTAF5-NTD2 surface and form a likely TAF–TAF interaction surface. The most highly conserved residues within cleft 1 are localized to the bottom of the channel. Forming the central portion of the putative binding site are several aromatic residues. Tyr-218, Phe-240, and Phe-267 form the upper half of the pocket, whereas Tyr-241, Pro-242, Phe-244, Phe-305, and Leu-307 form the lower half of the pocket. Lys-222, Arg-233, and Lys-266 are located at one end of the putative TAF-binding site, and Lys-304 and Arg-301 are found at the other end of the groove. These charged residues are mostly conserved in TAF5 family proteins. Thus the entry and exit points of cleft 1 contain positively charged residues, whereas the inner surface of the cleft is highly hydrophobic in nature. In the crystal lattice cleft 1 is partially obscured by a noncrystallographic dimer interface in which acidic residues bordering the channel facilitate dimerization in the presence of Ca²⁺ ions.

Cleft 2 (584 Å³) is much smaller than cleft 1 and includes residues from helices α3, α4, and α5 and the loop connecting the α4 and α5 helices. Cleft 2 is shown in Fig. 3b. The bottom of cleft 2 is formed from residues contributed by the central core helix α3 and helix α4. Highly conserved residues Tyr-248, Val-252, Lys-261, Phe-264, and Leu-279 form the floor of the cleft. Although the ProFunc calculation identified cleft 2 as being distinct from cleft 1, examination of the relative positions of the two features reveals that they are located next to each other on the surface. Together clefts 1 and 2 form a semicontinuous stripe of conserved residues that wraps around 50–60% of the circumference of the TAF5-NTD2 domain.

Although not identified by ProFunc as a cleft, there is a relatively high degree of sequence conservation present on the surface of the TAF5-NTD2 domain on the face of the molecule equivalent to the PCF11 CID interaction surface (CID homology surface; Fig. 3c). The CID homology surface is formed from the core four-helix bundle of the TAF5-NTD2 domain and lies adjacent to cleft 1. Like the interaction surface from PCF11, the TAF5-NTD2 CID homology surface is formed from both polar and nonpolar residues. This surface feature contains one of the largest patches of conserved residues among TAF5 molecules localized to the surface of the TAF5-NTD2 domain. An overlay of the core four-helix bundle of TAF5-NTD2 onto the PCF11 CID domain is shown in Fig. 4. Strikingly, the position where the CTD peptide binds to PCF11 corresponds to a large patch of conserved residues present on TAF5-NTD2, suggesting that this region of the TAF5 molecule might also function as an interaction surface.

Fig. 4. Structural similarity between TAF5-NTD2 and CID domains. Shown is a stereo diagram of the core helical bundle (α2–α5) from hTAF5-NTD2 superimposed onto the core helices from the PCF11 structure responsible for CTD interaction. hTAF5-NTD2 is shown in red, whereas PCF11 is shown in blue.

Examination of the initial NCS averaged maps revealed clear electron density for a bound ion bridging the monomer chains leading to dimerization. The bound ion was coordinated by interactions with residues from helix α5 (side chains of Asp-277, Asp-278, and the main chain carbonyl oxygen of Tyr-274) and the side chain of Asp-230 located at the N terminus of helix α3 in a monomer related by a noncrystallographic 2-fold axis of symmetry (see SI Fig. 8). From the components of the crystallization mother liquor [0.1 M Tris·HCl (pH 7.1), 0.6 M calcium chloride, 20% PEG 4000], it was determined that the most probable interacting species was Ca²⁺. This identity was subsequently confirmed through buffer dependent gel filtration and dynamic light scattering (DLS) measurements.

The 2-fold symmetric interface present in the TAF5-NTD2 dimers buries a surface accessible area of ≈600 Å². The interface involves the residues from the N-terminal portion of the core helix (α3) and a part of helix α5. Only one hydrogen bond is formed across the interface between the side chains of Arg-308

and Tyr-274. The absence of extensive hydrophobic interactions and the lack of numerous hydrogen bonds across the interface suggested that the dimer formation was mainly induced by the presence of Ca²⁺ ions in the crystallization buffer. The relatively limited buried surface area also suggested a weak intrinsic dimerization affinity in the absence of Ca²⁺ ions. Given our observation of a dimeric form of TAF5-NTD2 in our crystals and the presence of a bound Ca²⁺ ion, we were interested in determining whether there was any potential role of Ca²⁺ ions in inducing dimerization of TAF5 and stabilizing the TFIID complex.

Because the crystallization buffer contained >0.3 M CaCl₂, gel filtration in the presence or absence of 0.3 M CaCl₂ was carried out. In the absence of CaCl₂, TAF5-NTD2 behaved as a monomer of 18 kDa with no detectable dimer present. However, in the presence of 300 mM CaCl₂, a shift in the elution volume of the TAF5-NTD2 peak was observed corresponding to an apparent molecular weight of ≈36 kDa (see SI Fig. 9a).

DLS measurements were carried out for the hTAF5-NTD2 fragment over a range of 0–300 mM CaCl₂. In the absence of Ca²⁺, the RH value was at its minimum (2.0 nm), corresponding to a monomer of 18 kDa assuming a globular protein. With increasing CaCl₂, the RH and molecular mass increased to a maximum value (2.9 nm, 38 kDa) above concentrations of 50 mM Ca²⁺. The estimated molecular weight was plotted vs. divalent ion concentration (see SI Fig. 9b), yielding a mean KD value of ≈16 mM at 10 mg/ml and ≈20 mM at 4.8 mg/ml of TAF5-NTD2 concentration. Similar measurements were performed in the presence of Mg²⁺ or Zn²⁺, but no binding could be detected at any concentration of Mg²⁺. Zn²⁺ ions were found to strongly precipitate the TAF5-NTD2 domain, so measurements for Zn²⁺ affinity could not be made. Similar measurements were also performed for the corresponding domain of the Drosophila TAF5 protein; however, no calcium binding could be detected (data not shown).

To ascertain the effect of calcium concentration upon transcriptional activity, in vitro transcription reactions were performed by using HeLa nuclear extracts in the presence or absence of 0.2 mM CaCl₂. Transcriptional activity was strongly inhibited at concentrations of calcium chloride well below what would be necessary to stabilize the TAF5-NTD2 calcium dependent dimer (SI Fig. 9c).

Taken together, the calcium induced dimer of hTAF5-NTD2 seemed unlikely to be biologically relevant. However, because the immunomapping studies of the TFIID complex strongly hinted at the dimerization potential of the TAF5 molecule, we carried out additional analyses by using a longer fragment of the TAF5 N terminus (residues 90–343) that included the additional small conserved sequence motif of ≈35 aa with homology to the LisH domain (TAF5-NTD1). LisH motifs have previously been demonstrated to facilitate dimerization in other systems (29–31). Consequently, we were interested in whether this motif would play a similar role in the TAF5 protein.

DLS measurements were carried out with the TAF5-NTD1-NTD2 fragment to estimate the native molecular weight in solution. DLS measurements for the longer fragment yielded RH = 4.4 nm and an average molecular mass of 100 kDa in the absence of any added Ca²⁺ or other divalent ions. Based upon a monomer weight of 28 kDa, the RH value suggested a trimeric or tetrameric association state or an extended dimer. Subsequently, static light scattering (SLS) analysis coupled with size exclusion chromatography (SEC) demonstrated that the hTAF5-NTD1-NTD2 fragment molecular weight was 55.7 ± 2.8 kDa, pointing to an extended dimer (Fig. 5). Additional analyses by using DLS did not show any RH dependence on calcium ion concentration, nor could any calcium binding be detected by isothermal titration calorimetry. These results suggested that the dimeric association of the longer TAF5 fragments was incompatible with the calcium dependent mode of association observed in the TAF5-NTD2 crystal structure. However, our SLS analysis clearly demonstrated the presence of an extended TAF5 dimer that is important for TFIID assembly.

[Fig. 5 plot: molar mass (g/mol) vs. retention time (min) for LisH-hTAF5, with molar mass, Rayleigh ratio, and refractive index traces; mass = 55.7 kDa.]

Fig. 5. Biophysical characterization of the association state of TAF5 fragments. Shown is an estimation of native molecular mass by SEC-LS. The molecular mass from the SEC-LS study was estimated to be 55.7 ± 2.8 kDa, confirming the presence of the dimeric form of the hTAF5-NTD1-NTD2 fragment in solution in buffer containing no divalent cations near physiologic ionic strength.

Discussion

Detailed analyses presented by Leurent et al. (20, 21) describing electron microscopy of the yeast TFIID complex provided important insights into TAF protein/protein interaction networks. Of particular interest was the finding that TAF5, a 90-kDa subunit in yeast and 100-kDa subunit in humans, seemed to localize to all three lobes of the TFIID complex. Antibodies to the extreme N-terminal end of TAF5 labeled only the C lobe of the complex, whereas antibodies directed toward either ProtA or HA epitopes fused to the C terminus of TAF5 labeled both the A and B lobes. The authors postulated that the TAF5 molecule might dimerize via some motif present at the N-terminal end of the TAF5 subunit and act as an organizing element facilitating the assembly of the other 13 TAF and TBP subunits of the complex, although the dimerization of TAF5 could not be proven at the EM study's resolution.

Our results characterizing the N-terminal region of the human TAF5 subunit confirm a role for TAF5 in organizing the TFIID complex by dimerization and also reveal a surprising distant evolutionary relationship between the TAF5-NTD2 domain and RNA polymerase II CTD interacting factors. Our structural analysis of the TAF5-NTD2 fragment reveals a mostly helical domain with novel topology that contains conserved regions that seem to be likely points of contact for TAF/TAF interactions. Surprisingly, in our crystals, we found that the TAF5-NTD2 domain exhibited higher order structure in the form of calcium ion mediated dimers. Our analysis suggested that calcium was not a key player in TAF5 assembly but instead predicted that TAF5 dimerizes via its NTD1 region (LisH homology).

It is intriguing that the TAF5-NTD2 domain utilizes a fold related to the phosphopeptide binding motifs of the CTD interacting domains, which are found in other RNA polymerase II interacting factors. Even though the residues that would contact the CTD phosphopeptide are not conserved between TAF5 and CID factors, the TAF5 residues on the corresponding face of the domain seem to be well conserved among TAF5 family members. It remains to be seen whether the TAF5-NTD2 domain might also recognize some non-CTD, posttranslationally modified sequence via this interface.

What might the arrangement of the TAF5 subunit within the TFIID complex look like? Fig. 6 shows a hypothetical model of

the TAF5 chain. From the observed discrepancy between molecular weights calculated from DLS and from the SLS studies of the hTAF5-NTD1-NTD2 fragment, it seems that the N terminus of TAF5 forms a flexible, elongated dimer. Such a model is consistent with our limited proteolysis experiments, which failed to isolate a combined NTD1/NTD2 fragment. Based upon this idea, we have built a rough model to give an estimate of the relative positions and sizes of the NTD1, NTD2, and C-terminal domains within the TFIID complex. The LisH dimerization domain structure from Lis1 was used to model residues 91–124 of the TAF5 sequence. Allowing 30 Å between the NTD1 and NTD2 domains, the α-helical NTD2 domain seems well positioned to occupy junctions between the A, B, and C lobes of the TFIID structure. The flexible nature of the TAF5 molecule may allow the three lobes of the TFIID complex to change their relative position to facilitate core promoter recognition.

[Fig. 6 schematic labels: TAF5 dimer axis; lobes A, B, and C; TAF5 NTD1 (91–124) homology model; TAF5 NTD2 (199–343); TAF5 WD40 beta propeller (450–740); TBP; numbered TAF positions; scale 150 Å.]

Fig. 6. Hypothetical model for the organization of the TAF5-NTD1 and -NTD2 domains within TFIID. Fig. 6 is adapted from figure 6 of Leurent et al. (21), which roughly mapped the relative TAF positions by immunomapping studies. The TAF5-NTD1 domain has been modeled by using the LisH domain from Lis1 (31), whereas the NTD2 domain has been determined in this study. The 30-Å linker between the C terminus of the LisH region of the NTD1 domain and the N terminus of the NTD2 domain is a rough estimate of the distance between the domains spanned by 70 residues in the human sequence and is presumably flexible.

Which TAFs might interact with the NTD2 region of TAF5 remains an unanswered question; however, based upon the low-resolution models from the EM studies, it is possible that interaction partners for this portion of TAF5 might be asymmetric and could include a portion of TAF1, TAF2, TAF3, or even other histone fold TAFs. As more information becomes available describing the molecular organization of TFIID, it will be very interesting to see how the readily apparent interaction surfaces present in the symmetric TAF5 dimer can specifically interact in an asymmetric manner.

Materials and Methods

Protein Expression and Purification. Deletion fragments of different sizes were generated for hTAF5 by PCR to produce GST fusion expression clones by using the Invitrogen (Carlsbad, CA) Gateway system. Expression constructs were overexpressed in Escherichia coli BL21 (DE3) cells at 30°C by using autoinduction (32) and affinity purified over glutathione-Sepharose. The N-terminal fragment (amino acids 1–500) was found to be highly soluble, yielding a 5 mg/liter fusion protein. After affinity purification, the GST tag was removed by TEV protease. The free hTAF5 fragment was separated from the cleaved GST by anion-exchange column chromatography.

Limited Proteolysis and Mass Spectrometry. Limited proteolysis of the purified N-terminal fragment (amino acids 1–500) of hTAF5 was carried out by incubating 50 μl of 5 mg/ml protein with Promega sequencing grade trypsin [enzyme:protein ratio, 1:100, 1:500, 1:1,000 (wt/wt)] at room temperature for 60 min. Progress of the partial digest was followed by SDS/PAGE and mass spectrometry on an Agilent LC-MSD Trap SL mass spectrometer. Mass spectrometry confirmed the identity of the stable 18-kDa domain as a tryptic fragment mapping to amino acids 194–340 of the hTAF5 sequence, a highly conserved sequence motif present in all TAF5 sequences.

The recombinant fragment of hTAF5-NTD2 (aa 189–343) was purified as mentioned above. The protein was concentrated to 11 mg/ml for crystallization. A selenomethionine (SeMet) substituted hTAF5-NTD fragment was also generated by growing E. coli strain BL21 (DE3) in PASM-5052 medium (32) supplemented with 100 μg/ml SeMet per liter of culture. SeMet incorporation at ≈80% efficiency was confirmed by mass spectrometry. A longer fragment encompassing residues 90–343 of human TAF5 was also purified for biophysical characterization as above.

Crystallization and Data Collection. Crystals of the hTAF5-NTD and SeMet substituted hTAF5-NTD were obtained by the hanging-drop vapor-diffusion method at 22°C by using equal volumes of protein and well solution over a reservoir solution containing 0.1 M Tris·HCl (pH 7.1), 0.6 M calcium chloride, and 20% PEG 4000.

For data collection, crystals of hTAF5-NTD were cryoprotected by briefly soaking in mother liquor containing 12% (volume/volume) ethylene glycol and flash-cooled in liquid nitrogen. MAD data were collected to 2.2-Å resolution at beamline 8.3.1 of the Advanced Light Source (ALS), Berkeley, CA. The diffraction data were processed and scaled with the CCP4 set of crystallographic programs (33) by using shell scripts generated automatically by the program ELVES (34). The hTAF5-NTD crystallizes in space group P2₁ with typical unit cell parameters of a = 93.6 Å, b = 61.8 Å, c = 133.3 Å, β = 105.2°. Each asymmetric unit contained four dimers of the hTAF5-NTD2 domain.

Structure Determination and Refinement. Two MAD data sets were collected to 2.2 Å at the wavelengths of 1.019 Å and 0.9797 Å near the selenium absorption edge. By using real space vector verification (RSPS) (35), 12 of the 32 expected selenium sites in the asymmetric unit were initially located and used to generate preliminary phases calculated to 3.0 Å. Iterative cycles of solvent flattening, histogram matching phase refinement, and anomalous difference Fourier analysis allowed the identification of an additional 17 selenium sites. Phases were generated from the 29 selenium sites by using MLPHARE (36) followed by solvent flattening and histogram matching with DM (37) [density modification (DM) overall figure of merit (FOM) = 0.638 to 3.0 Å]. Transformation matrices based upon real space correlation function searches carried out with O revealed eight independent copies of TAF5-NTD2. NCS averaging and density modification with DM (37) allowed extension of the phases to the limit of the data (2.2 Å) with a substantially improved figure of merit (DM overall FOM = 0.83 to 2.2 Å). Iterative model building and refinement were performed by using O (38) and REFMAC (39). Data collection and refinement statistics are summarized in SI Table 1. Figures were generated by using PYMOL (40). The stereochemical quality of the final model was assessed by Procheck. Of the residues, 95.5% are in the allowed region of the Ramachandran plot, and 4.5% of the residues are in the additionally allowed region. The PDB accession code is 2NXP.

DLS. DLS measurements were performed by using a DynaPro instrument (Protein Solutions) with Ca²⁺ concentration varied from 5 to 300 mM. hTAF5-NTD2 at 5 mg/ml and 10 mg/ml was

Bhattacharya et al. PNAS ͉ January 23, 2007 ͉ vol. 104 ͉ no. 4 ͉ 1193 Downloaded by guest on September 29, 2021 centrifuged at 4°C for 20 min at a speed of 13,200 ϫ g. Fifteen In Vitro Transcription. In vitro transcription was carried by using microliters of protein solution at each concentration of calcium HeLa cell nuclear extracts and a Sp1-TATA box template ion was transferred to a quartz cuvette and allowed to equilibrate plasmid, which contained three GC boxes and a TATA box to 4°C before taking an average of twenty readings. Regular- from the adenovirus E1B promoter upstream of the chloram- ization fitting was used to fit the distribution of molecular species phenicol acetyltransferase (CAT) transcript. Transcription of differing hydrodynamic radii (RH) within the solution. Pre- reactions were performed by using two different conditions, dicted molecular weights based upon hydrodynamic radii were either with standard conditions as described (41) or in the plotted versus calcium ion concentration. same mixture but supplemented with 0.2 mM CaCl2. The transcripts were detected by primer extension by using a SLS. SLS measurements were performed to estimate the Mr 32P-labeled CAT primer. independent of shape. Protein was injected onto a Shodex size exclusion chromatography column equilibrated with PBS buffer We thank Drs. P. Anthony Weil, Brian Matthews, Richard Brennan, coupled to a light scattering detector (DAWN-Heleos) and Dagmar Truckses, Xiaoping Wang, and Yanming Feng for thoughtful refractive index detector (Optilab rEX) (Wyatt Technology). review of, and comments on, our manuscript. We also thank Drs. James Data were collected at every 0.5 s at a flow rate of 1 ml/min. Data Holton and George Meig of Beamline 8.3.1 at the Advanced Light analysis was carried out by the program ASTRA, yielding Source, Berkeley, CA. This work was supported by National Institutes of molecular mass and mass distribution of the sample. 
Health Grant GM069769 and the Burroughs Welcome Fund.
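
As a rough cross-check of the crystallographic numbers reported in Materials and Methods, the following Python sketch recomputes a few derived quantities from the stated values: the monoclinic unit-cell volume, the photon energies of the two MAD wavelengths, and the selenium-site bookkeeping. It is an illustration only and not part of the original analysis. The function names are hypothetical, the space group is taken to be P2₁ (two symmetry-related positions per cell), and the rounded value of hc is an assumption of this sketch.

```python
import math

# Values quoted in the data-collection and structure-determination text.
CELL_A, CELL_B, CELL_C = 93.6, 61.8, 133.3   # cell edges, angstroms
CELL_BETA_DEG = 105.2                         # monoclinic angle, degrees
MAD_WAVELENGTHS = (1.019, 0.9797)             # angstroms
COPIES_PER_ASU = 8                            # four dimers per asymmetric unit
EXPECTED_SE_SITES = 32
LOCATED_SE_SITES = (12, 17)                   # initial search, then difference Fouriers

# ASSUMPTION: space group P2_1, which has two general positions per unit cell.
SYMMETRY_POSITIONS = 2

# ASSUMPTION: h*c in keV*angstrom, rounded; used only for unit conversion.
HC_KEV_ANGSTROM = 12.398


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume of a monoclinic cell (alpha = gamma = 90 deg): a*b*c*sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def photon_energy_kev(wavelength_angstrom):
    """Photon energy in keV for an X-ray wavelength in angstroms."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def volume_per_chain(cell_volume, symmetry_positions, copies_per_asu):
    """Cell volume available to each protein chain, in cubic angstroms."""
    return cell_volume / (symmetry_positions * copies_per_asu)


if __name__ == "__main__":
    volume = monoclinic_cell_volume(CELL_A, CELL_B, CELL_C, CELL_BETA_DEG)
    print(f"unit-cell volume: {volume:.0f} cubic angstroms")
    per_chain = volume_per_chain(volume, SYMMETRY_POSITIONS, COPIES_PER_ASU)
    print(f"volume per chain: {per_chain:.0f} cubic angstroms")
    for wavelength in MAD_WAVELENGTHS:
        print(f"{wavelength} A corresponds to {photon_energy_kev(wavelength):.3f} keV")
    print(f"selenium sites per chain: {EXPECTED_SE_SITES // COPIES_PER_ASU}")
    print(f"selenium sites located: {sum(LOCATED_SE_SITES)} of {EXPECTED_SE_SITES}")
```

Dividing the per-chain volume by the chain mass in daltons would give a Matthews coefficient. That step is left out here because the mass of the NTD2 fragment is not stated in this section.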

1. Lee TI, Causton HC, Holstege FC, Shen WC, Hannett N, Jennings EG, Winston F, Green MR, Young RA (2000) Nature 405:701–704.
2. Tora L (2002) Genes Dev 16:673–675.
3. Chalkley GE, Verrijzer CP (1999) EMBO J 18:4835–4845.
4. Verrijzer CP, Yokomori K, Chen JL, Tjian R (1994) Science 264:933–941.
5. Verrijzer CP, Chen JL, Yokomori K, Tjian R (1995) Cell 81:1115–1125.
6. Burke TW, Kadonaga JT (1997) Genes Dev 11:3020–3031.
7. Imhof A, Yang XJ, Ogryzko VV, Nakatani Y, Wolffe AP, Ge H (1997) Curr Biol 7:689–692.
8. Dikstein R, Ruppert S, Tjian R (1996) Cell 84:781–790.
9. Mizzen CA, Yang XJ, Kokubo T, Brownell JE, Bannister AJ, Owen-Hughes T, Workman J, Wang L, Berger SL, Kouzarides T, et al. (1996) Cell 87:1261–1270.
10. Kouzarides T (2000) EMBO J 19:1176–1179.
11. Jacobson RH, Ladurner AG, King DS, Tjian R (2000) Science 288:1422–1425.
12. Roeder RG (1996) Trends Biochem Sci 21:327–335.
13. Oelgeschlager T, Chiang CM, Roeder RG (1996) Nature 382:735–738.
14. Bell B, Tora L (1999) Exp Cell Res 246:11–19.
15. Struhl K, Moqtaderi Z (1998) Cell 94:1–4.
16. Brown CE, Lechner T, Howe L, Workman JL (2000) Trends Biochem Sci 25:15–19.
17. Chen BS, Hampsey M (2002) Curr Biol 12:R620–R622.
18. Albright SR, Tjian R (2000) Gene 242:1–13.
19. Grob P, Cruse MJ, Inouye C, Peris M, Penczek PA, Tjian R, Nogales E (2006) Structure (London) 14:511–520.
20. Leurent C, Sanders S, Ruhlmann C, Mallouh V, Weil PA, Kirschner DB, Tora L, Schultz P (2002) EMBO J 21:3424–3433.
21. Leurent C, Sanders SL, Demeny MA, Garbett KA, Ruhlmann C, Weil PA, Tora L, Schultz P (2004) EMBO J 23:719–727.
22. Dubrovskaya V, Lavigne AC, Davidson I, Acker J, Staub A, Tora L (1996) EMBO J 15:3702–3712.
23. Tao Y, Guermah M, Martinez E, Oelgeschlager T, Hasegawa S, Takada R, Yamamoto T, Horikoshi M, Roeder RG (1997) J Biol Chem 272:6714–6721.
24. Holm L, Sander C (1993) J Mol Biol 233:123–138.
25. Krissinel E, Henrick K (2004) Acta Crystallogr D 60:2256–2268.
26. Meinhart A, Cramer P (2004) Nature 430:223–226.
27. Laskowski RA, Watson JD, Thornton JM (2005) Nucleic Acids Res 33:W89–W93.
28. Gouet P, Courcelle E, Stuart DI, Metoz F (1999) Bioinformatics 15:305–308.
29. Emes RD, Ponting CP (2001) Hum Mol Genet 10:2813–2820.
30. Gerlitz G, Darhin E, Giorgio G, Franco B, Reiner O (2005) Cell Cycle 4:1632–1640.
31. Kim MH, Cooper DR, Oleksy A, Devedjiev Y, Derewenda U, Reiner O, Otlewski J, Derewenda ZS (2004) Structure (London) 12:987–998.
32. Studier FW (2005) Protein Expr Purif 41:207–234.
33. Project CC (1994) Acta Crystallogr D 50:760–763.
34. Holton J, Alber T (2004) Proc Natl Acad Sci USA 101:1537–1542.
35. Knight SD (2000) Acta Crystallogr D 56:42–47.
36. Otwinowski Z (1991) Proc CCP4 Study Weekend, eds Wolf W, Evans PR, Leslie AGW (Daresbury Laboratory, Warrington, UK).
37. Cowtan KD, Main P (1996) Acta Crystallogr D 52:43–48.
38. Jones TA, Zou JY, Cowan SW, Kjeldgaard M (1991) Acta Crystallogr A 47:110–119.
39. Murshudov GN, Vagin AA, Dodson EJ (1997) Acta Crystallogr D 53:240–255.
40. DeLano WL (2002) The PyMol Molecular Graphics System (DeLano Scientific, San Carlos, CA).
41. Tokusumi Y, Zhou S, Takada S (2004) J Virol 78:10856–10864.

Bhattacharya et al. | www.pnas.org/cgi/doi/10.1073/pnas.0610297104