The THAP domain of THAP1 is a large C2CH module with zinc-dependent sequence-specific DNA-binding activity

Thomas Clouaire*†, Myriam Roussigne*‡, Vincent Ecochard*‡, Catherine Mathe*, François Amalric*, and Jean-Philippe Girard*§

*Laboratoire de Biologie Vasculaire, Equipe Labellisée “La Ligue 2003,” Institut de Pharmacologie et de Biologie Structurale, Centre National de la Recherche Scientifique Unité Mixte de Recherche 5089, 205 Route de Narbonne, 31077 Toulouse, France; and †Endocube, Prologue Biotech, Rue Pierre et Marie Curie, BP700 31319 Labège Cedex, France

GENETICS

Edited by H. Robert Horvitz, Massachusetts Institute of Technology, Cambridge, MA, and approved March 23, 2005 (received for review September 16, 2004)

We have recently described an evolutionarily conserved motif, designated the THAP domain, which defines a previously uncharacterized family of cellular factors (THAP proteins). The THAP domain exhibits similarities to the site-specific DNA-binding domain of Drosophila P element transposase, including a putative metal-coordinating C2CH signature (CX2–4CX35–53CX2H). In this article, we report a comprehensive list of ≈100 distinct THAP proteins in model animal organisms, including human nuclear proapoptotic factors THAP1 and DAP4/THAP0, transcriptional repressor THAP7, zebrafish orthologue of cell cycle regulator E2F6, and Caenorhabditis elegans chromatin-associated protein HIM-17 and cell-cycle regulators LIN-36 and LIN-15B. In addition, we demonstrate the biochemical function of the THAP domain as a zinc-dependent sequence-specific DNA-binding domain belonging to the zinc-finger superfamily. In vitro binding-site selection allowed us to identify an 11-nucleotide consensus DNA-binding sequence specifically recognized by the THAP domain of human THAP1. Mutations of single nucleotide positions in this sequence abrogated THAP-domain binding. Experiments with the zinc chelator 1,10-o-phenanthroline revealed that the THAP domain is a zinc-dependent DNA-binding domain. Site-directed mutagenesis of single cysteine or histidine residues supported a role for the C2CH motif in zinc coordination and DNA-binding activity. The four other conserved residues (P, W, F, and P), which define the THAP consensus sequence, were also found to be required for DNA binding. Together with previous genetic data obtained in C. elegans, our results suggest that cellular THAP proteins may function as zinc-dependent sequence-specific DNA-binding factors with roles in proliferation, apoptosis, cell cycle, chromosome segregation, chromatin modification, and transcriptional regulation.

protein motif | zinc finger | Caenorhabditis elegans | cell cycle

This paper was submitted directly (Track II) to the PNAS office.

Abbreviations: RRL, rabbit reticulocyte lysate; SELEX, systematic evolution of ligands by exponential enrichment; THABS, THAP-domain-binding sequence.

‡M.R. and V.E. contributed equally to this work.

§To whom correspondence should be addressed. E-mail: [email protected].

© 2005 by The National Academy of Sciences of the USA

We have recently described an evolutionarily conserved ≈90-residue protein motif, designated the THAP domain, which defines a previously uncharacterized family of cellular factors, the THAP proteins (1, 2). This motif is characterized by a putative metal-coordinating C2CH module (CX2–4CX35–53CX2H) and four additional invariant residues, P26, W36, F58, and P78, in human THAP1 (Fig. 1). The THAP domain was found to be restricted to animals and is present in both vertebrates (from zebrafish to humans) and invertebrates (e.g., fly and worm) (1). Interestingly, the THAP-motif signature was identified (1) in the site-specific DNA-binding domain of Drosophila melanogaster P element transposase (3). This finding suggested that the THAP domain may constitute an example of a DNA-binding domain shared between cellular proteins and transposases from mobile genomic parasites and that the THAP proteins may correspond to a previously uncharacterized family of cellular DNA-binding proteins (1).

In humans, the THAP family comprises 12 distinct members, including nuclear proapoptotic factor THAP1 (2), death-associated protein DAP4/THAP0 (4, 5), transcriptional repressor THAP7 (6), and 9 other human proteins (1). Both THAP1 and DAP4/THAP0 appear to function in nuclear apoptotic pathways. THAP1 interacts and colocalizes within promyelocytic leukemia nuclear bodies with the proapoptotic leucine-zipper protein Par-4 and potentiates both serum withdrawal- and TNFα-induced apoptosis (2). DAP4/THAP0 was initially identified in a screen for genes involved in IFNγ-induced apoptosis in HeLa cells (4) and, more recently, as a nuclear partner of MST1 (5), a proapoptotic kinase that phosphorylates histone H2B during apoptosis (7). Recently, THAP7 has been shown to function as a transcriptional repressor that binds to hypoacetylated histone H4 tails and may also induce the hypoacetylation of histone H3 by recruiting corepressor NcoR and histone deacetylase HDAC3 to chromatin (6).

Although orthologous relationships with the human THAP proteins were not obvious, analysis of the D. melanogaster (1) and Caenorhabditis elegans (8) THAP families revealed several interesting features. Two of the predicted Drosophila THAP proteins were found to contain more than one THAP domain, the double-THAP protein CG14860 and the multi-THAP protein CG10631, which is predicted to contain 27 THAP domains occurring as internal repeats (1). A third multi-THAP protein, designated HIM-17, has recently been described in C. elegans and plays a critical role in chromosome segregation during meiosis by linking chromatin modification and competence for initiation of meiotic recombination by double-strand breaks (8). A substantial fraction of HIM-17 was found to comprise six internal repeats of the THAP domain (Fig. 1), including two divergent C2CH modules lacking the invariant W, F, or P residues. Similar divergent or consensus THAP domains were also identified in other C. elegans proteins (8), including LIN-36, LIN-15A, and LIN-15B, three proteins initially characterized for their role in vulval development (9–12). Remarkably, LIN-36, LIN-15A, LIN-15B, and HIM-17 have all been found to interact genetically with LIN-35/Rb, the sole C. elegans retinoblastoma homolog (8, 9, 12, 13). In addition, LIN-36 and LIN-15B have been found to function as inhibitors of the G1-to-S-phase cell-cycle transition (13), and LIN-36 has also been shown to function redundantly with FZR-1, the C. elegans homolog of APC regulator Cdh1, in the global control of cell proliferation (14).

Although the THAP motif has been well defined and several THAP proteins have been functionally characterized, the biochemical role of the THAP domain in cellular THAP proteins has not yet been described. In this study, using the THAP domain of human THAP1 as a prototype, we show that the THAP domain is a zinc-dependent sequence-specific DNA-

www.pnas.org/cgi/doi/10.1073/pnas.0406882102 | PNAS | May 10, 2005 | vol. 102 | no. 19 | 6907–6912

Fig. 1. THAP proteins in model animal organisms. (A) Identification of a consensus THAP domain in the zebrafish orthologue of cell-cycle transcription factor E2F6. Shown is CLUSTALW multiple alignment of zebrafish E2F6 with human E2F6 and the THAP domain of human THAP1. Conserved residues are boxed. Black boxes indicate identical residues, whereas boxes shaded in gray show similar amino acids. The consensus THAP motif, defined by the C2CH signature and four other invariant residues (P26, W36, F58, and P78 in human THAP1), is shown above the alignment. (B) Primary structures of C. elegans THAP proteins with known functions. THAP domains are shown in black. Divergent THAP domains containing the C2CH signature are shown in gray. Known protein motifs are indicated. The transcriptional-corepressor function of CtBP has not yet been confirmed in C. elegans.
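The C2CH signature given in the text (CX2–4CX35–53CX2H) is a spacing pattern, so it can be written as a regular expression. The sketch below is only an illustration, assuming the subscripts denote numbers of arbitrary residues; the function name `find_c2ch` is hypothetical, and the pattern ignores the four invariant residues (P, W, F, and P), so a hit is a candidate signature rather than evidence of a THAP domain.

```python
import re

# C, then 2-4 residues, C, then 35-53 residues, C, then 2 residues, H.
# Wrapped in a lookahead so that overlapping candidates are all reported.
_C2CH = re.compile(r"(?=(C.{2,4}C.{35,53}C.{2}H))")


def find_c2ch(protein_seq):
    """Return 0-based (start, end) spans of candidate C2CH signatures."""
    seq = protein_seq.upper()
    return [(m.start(1), m.end(1)) for m in _C2CH.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence (not a real protein) with Cys at 1-based positions
    # 5, 10, and 54 and His at 57, the spacing reported for human THAP1.
    toy = "AAAA" + "C" + "AAAA" + "C" + "A" * 43 + "C" + "AA" + "H" + "AAA"
    print(find_c2ch(toy))
```

The toy sequence reproduces only the residue spacing of the C5, C10, C54, and H57 positions named in the paper; it is filler, not the THAP1 sequence.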

binding module, and we demonstrate that its DNA-binding activity absolutely requires the C2CH signature and the four other conserved residues (P, W, F, and P) of the THAP motif.

Materials and Methods

Plasmid Constructions. The THAP domain of human THAP1 (amino acids 1–90) was amplified by PCR using as a template pGADT7-THAP1 (2) with primers 5′-GCGCATATGGTGCAGTCCTGCTCCGCCTACGGC-3′ and 5′-GCGCTCGAGTTTCTTGTCATGTGGCTCAGTACAAAG-3′. The PCR product was digested with NdeI and XhoI and cloned in-frame with a carboxyl-terminal His tag into plasmid pET-21c (Novagen). Similarly, the THAP1 ORF was amplified by PCR to generate pCDNA3-THAP1 or pCDNA3.1-THAP1 (THAP1-Myc). The THAP1-C5A, THAP1-C10A, THAP1-C54A, THAP1-H57A, THAP1-P26A, THAP1-W36A, THAP1-F58A, and THAP1-P78A single-point mutants were obtained by PCR using specific primers containing the corresponding mutations and cloned as EcoRI–XbaI fragments in pCDNA3 expression vector.

Protein Expression and Purification. pET-21c-THAP-domain recombinant protein was produced in Escherichia coli strain BL-21 pLysS according to the supplier’s instructions (Novagen). The cells were lysed by sonication in buffer A [50 mM sodium phosphate (pH 7.5)/300 mM NaCl/0.1% 2-mercaptoethanol/10 mM imidazole], and the lysate was cleared by centrifugation. The supernatant was loaded onto a Ni nitrilotriacetate (NTA) agarose column (Amersham Pharmacia Biotech) equilibrated in buffer A. After washing, the protein was eluted with a linear gradient of imidazole in buffer A. Fractions containing the THAP domain were pooled, concentrated with YM-3 filter devices (Amicon), and applied to a Superdex 75 gel-filtration column (Amersham Pharmacia Biotech) equilibrated in buffer B [50 mM Tris·HCl (pH 7.5)/150 mM NaCl/1 mM DTT]. Fractions containing the THAP domain were pooled and stored at 4°C or frozen at −80°C in 20% glycerol. The purity of the sample was assessed by SDS/PAGE, and the protein concentration was determined with Bradford protein assay (Bio-Rad).

Systematic Evolution of Ligands by Exponential Enrichment (SELEX) Assay. DNA-binding specificity of the THAP domain from human THAP1 was determined by SELEX, essentially as described in ref. 15. The following 62-bp oligonucleotide was synthesized: 5′-TGGGCACTATTTATATCAACN25AATGTCGTTGGTGGCCC-3′ (where N is any nucleotide) along with primers complementary to each end, F-HindIII 5′-ACCGCAAGCTTGGGCACTATTTATATCAAC-3′ and R-XbaI 5′-GGTCTAGAGGGCCACCAACGCATT-3′. A pool of double-stranded 80-bp degenerate oligonucleotides was amplified by PCR using the F-HindIII and R-XbaI primers. Recombinant THAP domain (≈250 ng) was incubated with Ni-NTA magnetic beads (Qiagen) in NT2 buffer [20 mM Tris·HCl (pH 7.5)/100 mM NaCl/0.05% Nonidet P-40] for 30 min at 4°C, and the beads were washed twice with 500 μl of NT2 buffer. The immobilized THAP domain was incubated with the random pool of oligonucleotides (2–5 μg) in 100 μl of binding buffer [20 mM Tris·HCl (pH 7.5)/100 mM NaCl/0.05% Nonidet P-40/0.5 mM EDTA/100 μg/ml BSA/20–50 μg of poly(dI-dC)] for 10 min at room temperature. The beads were then washed six times with 500 μl of NT2 buffer, the protein–DNA complexes were extracted with phenol/chloroform, and DNA was precipitated with ethanol with glycogen as a carrier. About 20% of the recovered DNA was amplified by PCR (15–20 cycles) and used for the next round of selection. After 12 rounds of selection by the THAP domain, selected double-stranded oligonucleotides were digested with XbaI and HindIII, cloned into the pBluescript II KS E. coli vector (Stratagene), and sequenced.

EMSA. EMSAs were performed with purified recombinant THAP domain of human THAP1, in vitro translated THAP1, or THAP1-Myc proteins synthesized in rabbit reticulocyte lysate (RRL) with the TNT-T7 kit (Promega). The THAP-domain-binding sequence (THABS) probes, 25-bp (5′-AGCAAGTAAGGGCAACTACTTCAT-3′) and 36-bp (5′-TATCAACTGTGGGCAAACTACGGGCAACAGGTAATG-3′), were used in the assays. After annealing of the complementary oligonucleotides, double-stranded probes were purified on 12% polyacrylamide gels, 32P-end-labeled, and quantified by Cerenkov counting. Purified THAP domain (≈20 ng) was incubated with 30,000 cpm of the appropriate probe (≈2 ng). Binding reactions were carried out for 10 min at room temperature in 20 μl of binding buffer [20 mM Tris·HCl (pH 7.5)/100 mM KCl/0.1% Nonidet P-40/100 μg/ml BSA/2.5 mM DTT/5% glycerol/10 μg/ml poly(dI-dC)]. For in vitro translated proteins, 3 μl of RRLs expressing THAP1 or THAP1-Myc were incubated in 20 μl of binding buffer containing 150 mM KCl, 50 μg/ml of poly(dI-dC), and 50 μg/ml salmon sperm DNA. Electrophoresis was performed on 6% or 8% (29:1) polyacrylamide gels containing 5% glycerol. The gels were run in 0.25× TBE [1× TBE = 90 mM Tris/64.6 mM boric acid/2.5 mM EDTA (pH 8.3)] at 150 V at 4°C, dried, and exposed on a PhosphorImager screen (Molecular Dynamics) or autoradiographed. For competitive EMSA, unlabeled oligonucleotides were added to the reaction mixture just before the addition of the probe. Supershift experiments were performed by using 1 μg of anti-Myc (Sigma) or isotype-control mouse monoclonal antibodies. For metal-chelation experiments, the proteins were preincubated with EDTA or 1,10-o-phenanthroline (Sigma) for 20 min at room temperature. Metal salts (Sigma), as indicated, were added at 100 μM final concentration. The reactions were allowed to equilibrate for 10 min at room temperature before the addition of the probe.

In Silico Sequence Analysis. GenBank nucleotide, protein, EST, and genome databases at the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/) were searched with both the nucleotide and amino acid sequences of human THAP proteins (1), by using the programs BLASTN, TBLASTN, and BLASTP (16). CLUSTALW (17) was used to carry out the alignment of zebrafish E2F6 (locus CAH68904) with human E2F6 (locus NP_001943) and the THAP domain of human THAP1 (locus NP_060575). The oligonucleotide sequences recovered by SELEX were analyzed by using the motif discovery program MEME (Multiple Em for Motif Elicitation; http://meme.sdsc.edu/meme/website/meme.html), and the logo representation of the THABS consensus motif was generated with WEBLOGO (18).

Results

A Comprehensive List of THAP Proteins in Model Organisms. We performed extensive searches of sequence databases in an effort to extend our previous work (1) and build up a comprehensive list of proteins containing THAP domains in model organisms. We identified ≈100 distinct THAP-protein sequences in model animal organisms Homo sapiens, Mus musculus, Gallus gallus (chicken), Xenopus laevis, Danio rerio (zebrafish), D. melanogaster, and C. elegans (Table 1 and see Table 2 and Fig. 6, which are published as supporting information on the PNAS web site). Surprisingly, we found that five of the human THAP genes (THAP5, THAP6, THAP8, THAP9, and THAP10) did not appear to have orthologues in the M. musculus genome, which encoded only seven distinct THAP proteins. In contrast, we identified 23 distinct THAP family members in the tetraploid frog X. laevis, including two multi-THAP proteins predicted to contain two THAP domains at their amino termini. We also found >30 distinct THAP family members (32 distinct genes) in the zebrafish D. rerio, likely because of the large-scale duplication that occurred in this species (19). Interestingly, one of these THAP proteins (locus CAH68904) was found to correspond to the zebrafish orthologue of cell-cycle transcription factor E2F6 (Fig. 1A). The THAP–E2F6 fusion gene was validated by several overlapping ESTs (GenBank accession nos. CK029739, CF348112, and CO813444), and a similar THAP–E2F6 fusion protein was also identified in other fish species, including Tetraodon nigroviridis (accession no. CAG01476) and Gasterosteus aculeatus (accession no. CD497753, partial cDNA).

Table 1. The THAP family in model animal organisms

Model organism          THAP proteins   Putative orthologues
H. sapiens              12              THAP 0–THAP 11
M. musculus             7               THAP 0, 1, 2, 3, 4, 7, and 11
G. gallus (chicken)     6               THAP 0, 1, 4, 5, and 7
X. laevis               23              THAP 1, 4, 7, and 11
D. rerio (zebrafish)    32              THAP 0, 1, 7, 9, and 11
D. melanogaster         9
C. elegans              8

See Table 2 and Fig. 6 for complete THAP-domain sequences, alignment, and accession numbers.

Analysis of the C. elegans THAP family (Fig. 1B) gave additional interesting clues to the biological roles of THAP proteins in vivo. In addition to the recently described chromatin-associated protein HIM-17 and cell-cycle regulators LIN-15B and LIN-36 (8, 13), we identified five other proteins exhibiting consensus THAP motifs in C. elegans, including CDC-14B, an isoform of cell-cycle regulator tyrosine phosphatase CDC-14 (20), which contains both consensus and divergent THAP motifs at its carboxyl terminus. The C. elegans THAP family also included three previously uncharacterized proteins (Table 2 and Fig. 6) as well as the orthologue of CtBP, a well conserved transcriptional corepressor for homeodomain, nuclear hormone receptor, and C2H2 zinc-finger proteins (21), which does not exhibit a THAP domain in other animal species. Finally, LIN-15A and seven other predicted C. elegans proteins that contained divergent THAP domains similar to those found in HIM-17, LIN-15B, and CDC-14B were also identified (Table 2 and Fig. 1).

The THAP Domain Is a Sequence-Specific DNA-Binding Domain. To determine whether the THAP domain possesses sequence-specific DNA-binding activity and to identify a consensus target-binding site, we used a PCR-based approach, SELEX (15). For that purpose, we used as a prototype the THAP domain of human THAP1, recently characterized in our laboratory (2). We first expressed and purified the THAP domain in E. coli, then used this protein to select a preferential binding site from a degenerate pool of 25-bp oligonucleotides flanked by conserved sequences to facilitate amplification and cloning. After 4, 8, and 12 cycles, we evaluated the DNA–protein interaction by EMSA (Fig. 2A). High-affinity binding sequences selected after 12 cycles were cloned and sequenced. Analysis of 25 selected sequences with MEME identified an 11-bp consensus motif (Fig. 2B) that was found twice in most selected sequences, as either direct or inverted repeats. The consensus sequence, which we refer to as THABS, comprised a core “GGCA” motif. Competitive EMSA experiments with unlabeled THABS or mutant THABS oligonucleotides demonstrated that purified recombinant THAP domain binds the THABS target sequence specifically (Fig. 2C). These results were confirmed and further extended by using in vitro translated full-length THAP1 protein, which provided an independent source of THAP domain. Recognition of the THABS by the THAP domain was observed in the context of the whole THAP1 protein (Fig. 2D). This binding was specific, and the DNA–protein complex formed between the THABS and a Myc-tagged THAP1 protein was supershifted by an anti-Myc antibody (Fig. 2D) but not by an unrelated control antibody (data not shown).

DNA-Binding-Site Specificity of the THAP Domain of Human THAP1. To further characterize the DNA-binding-site specificity of the THAP domain, we performed scanning mutagenesis of a THABS consensus oligonucleotide obtained from the SELEX experiment. Labeled oligonucleotides bearing substitutions in each of the bases comprising the THABS were incubated with recombinant THAP domain. These analyses revealed that the core GGCA motif of the THABS is essential for recognition by

Fig. 3. Mutations of single nucleotide positions in the THABS abrogate recognition by the THAP domain of human THAP1. (A) The GGCA core motif of the THABS is required for THAP-domain–THABS interaction. EMSAs were performed by using recombinant THAP domain and labeled oligonucleotides bearing mutations in the GGCA core motif of the THABS. (B) Scanning mutagenesis of the THABS sequence reveals that bases upstream of the GGCA core motif modulate the strength and affinity of THAP-domain–THABS interaction. EMSAs were performed by using recombinant THAP domain and DNA targets bearing single-point mutations in the THABS sequence.
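As a rough illustration of what the mutagenesis implies for locating candidate sites, the sketch below scans both strands of a DNA string for the GGCA core preceded by G or T. The pattern `[GT]GGCA` is a simplification assumed here from the positions reported as strictly required (position 6 and the GGCA core); it is not the position-specific matrix returned by MEME, and the function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumed minimal pattern: G or T at THABS position 6, then the GGCA core.
# The lookahead lets overlapping candidates be reported.
_CORE = re.compile(r"(?=([GT]GGCA))")
_CORE_LEN = 5


def reverse_complement(dna):
    """Reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_core_sites(dna):
    """Return (strand, start) pairs; start is the 0-based index, on the
    given (forward) strand, of the leftmost base covered by the match."""
    seq = dna.upper()
    hits = [("+", m.start(1)) for m in _CORE.finditer(seq)]
    rc = reverse_complement(seq)
    for m in _CORE.finditer(rc):
        # Map a reverse-strand match back to forward-strand coordinates.
        hits.append(("-", len(seq) - m.start(1) - _CORE_LEN))
    return hits


if __name__ == "__main__":
    thabs = "AGTAAGGGCAA"      # THABS competitor sequence from Fig. 2C
    mut_thabs = "AGTAATTTCAA"  # mutTHABS competitor sequence from Fig. 2C
    print(find_core_sites(thabs))
    print(find_core_sites(mut_thabs))
```

A match only flags a candidate; the paper reports that flanking bases modulate binding strength, which a fixed pattern of this kind does not capture.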

the THAP domain (Fig. 3A). In addition, a G or a T at position 6 was also strictly required (Fig. 3B). In contrast, single base substitutions at other nucleotide positions upstream of the core motif did not abrogate DNA binding but, rather, were found to modulate the strength and affinity of THAP-domain–THABS interaction (Fig. 3B). Scanning mutagenesis results were generally in good agreement with the percentage occurrence of each base at each position in the sequences identified by SELEX (Fig. 2B). For instance, very low binding was observed with mutant oligonucleotides bearing T or G in positions 4 or 5 of the THABS, respectively (Fig. 3B), and these bases never occurred at these positions in the SELEX sequences (Fig. 2B). A notable exception was the T at position 3, which was found in all of the oligonucleotides obtained from the SELEX experiments but did not appear to be absolutely required for THAP-domain binding.

Fig. 2. The THAP domain is a sequence-specific DNA-binding domain. (A) Identification of THAP-domain DNA-target sequence by SELEX. DNA recovered after 0, 4, 8, and 12 rounds of selection was labeled and incubated with increasing amounts of recombinant THAP domain (1, 6, and 60 ng, respectively). Resulting protein–DNA complexes were analyzed in EMSA. (B) Identification of a consensus THABS. The oligonucleotide sequences recovered after 12 rounds of selection were analyzed by using the motif-discovery program MEME. The position-specific probability matrix returned by MEME is given. (C) Specificity of THAP-domain–THABS interaction. Recombinant THAP domain was incubated with THABS probe in the absence or presence of increasing amounts (50-, 150-, and 250-fold molar excess) of either THABS (AGTAAGGGCAA) or mutTHABS (AGTAATTTCAA) unlabeled competitor. The protein–DNA complexes were analyzed in EMSA. (D) Binding of full-length THAP1 to the THABS. In vitro translated THAP1 or THAP1-Myc was incubated with labeled THABS probe in the absence or presence of a 200-fold molar excess of either THABS or mutTHABS unlabeled competitor, and protein–DNA complexes were analyzed in EMSA. For the supershift experiment, THAP1-Myc was incubated with the THABS probe in the presence of anti-Myc mAb. RRL, THABS probe incubated with unprogrammed RRL; black arrowhead, THAP1–THABS DNA complex; white arrowhead, THAP1-Myc–THABS DNA complex; *, nonspecific complex.

The THAP Domain Is a Zinc-Dependent DNA-Binding Domain. A notable feature of the THAP domain is the C2CH motif (CX2–4CX35–53CX2H), which may constitute a metal-coordinating module, as in previously described zinc fingers. However, to our knowledge, a large C2CH module such as the one found in the THAP domain (up to 53 residues of spacing between the C2 and CH coordinating residues) has not previously been described in the zinc-finger superfamily. Therefore, we investigated whether the THAP domain is a zinc-dependent DNA-binding domain. EMSA was performed after incubation of the THAP domain with metal-chelating agents 1,10-o-phenanthroline and EDTA. Increasing amounts of 1,10-o-phenanthroline (up to 5 mM) and, to a lesser extent, EDTA (up to 50 mM) gradually inhibited DNA-binding activity (Fig. 4A), indicating that the THAP domain requires divalent metal ions for its functional activity. To examine the role of zinc in THAP-domain activity, we added back zinc, calcium, magnesium, or iron to binding reactions after preincubation of the THAP domain with 5 mM 1,10-o-phenanthroline. Significant THAP-domain DNA-binding activity was restored by the addition of zinc but not
calcium, magnesium, or iron (Fig. 4B), demonstrating that the THAP domain possesses zinc-dependent DNA-binding activity.

Fig. 4. The THAP domain is a zinc-dependent DNA-binding domain. (A) Inhibition of THAP-domain DNA-binding activity by metal-chelating agents EDTA and 1,10-o-phenanthroline. THAP-domain–THABS DNA complexes were analyzed by EMSA. Lane 0, THABS probe alone; UT, THABS probe incubated with untreated THAP domain; MeOH, methanol vehicle alone. (B) Role of zinc in THAP-domain DNA-binding activity. Recombinant THAP domain was treated with 5 mM 1,10-o-phenanthroline, and the chloride of Zn2+, Ca2+, Mg2+, or Fe2+ was subsequently added to the binding reactions before analysis of THAP-domain–DNA complexes (arrowhead) by EMSA.

The C2CH Motif and the Four Other Conserved Residues of the THAP Domain Are Required for Sequence-Specific DNA Binding. To investigate the role of the putative metal-coordinating residues of the C2CH module in THAP-domain functional activity, we generated four single-point mutants in the C2CH signature. The three cysteines and the single histidine residues were replaced by alanines to generate THAP1 mutants C5A, C10A, C54A, and H57A, respectively. Mutagenesis of the putative metal-coordinating residues of the C2CH module did not appear to affect translation or stability of the mutant proteins because in vitro translation of the four mutant proteins in RRL revealed expression levels similar to those of the wild-type THAP1 protein (Fig. 5A). However, the four THAP1 mutants lost DNA-binding activity and failed to recognize the THABS in EMSAs (Fig. 5B). Mutagenesis of single cysteine or histidine residues in the C2CH motif had the same effect as treatment of the wild-type THAP1 protein with the zinc chelator 1,10-o-phenanthroline (Fig. 5B), supporting an important role for the C2CH motif in zinc coordination and DNA-binding activity of the THAP domain. The absolute conservation in all THAP domains of four additional residues (P26, W36, F58, and P78 in human THAP1) suggested a possible requirement for these residues in DNA-binding function. Four additional THAP1 mutants (P26A, W36A, F58A, and P78A) were therefore generated and expressed in RRL (Fig. 5A). These mutants lost DNA-binding activity and failed to recognize the THABS in EMSAs (Fig. 5C). These results favor an important role for both the C2CH motif and the four other conserved residues of the THAP domain in DNA-binding activity.

Fig. 5. The C2CH signature and the four other invariant residues of the THAP domain are essential for DNA binding. (A) Generation of THAP1 mutants in the consensus THAP motif. Wild-type THAP1 (wt) and single-point mutants THAP1-C5A, THAP1-C10A, THAP1-C54A, THAP1-H57A, THAP1-P26A, THAP1-W36A, THAP1-F58A, and THAP1-P78A were translated in vitro in RRL in the presence of 35S-labeled methionine and analyzed by SDS/PAGE and autoradiography. Molecular mass markers are shown on the left (kDa). (B) Mutation of single cysteine or histidine residues of the C2CH signature abrogates THAP-domain DNA-binding activity. EMSAs were performed with the THABS probe and THAP1 wild-type (wt) or mutant proteins (C5A, C10A, C54A, and H57A). For comparison, wild-type THAP1 was incubated with 5 mM 1,10-o-phenanthroline or methanol vehicle alone (MeOH). RRL, unprogrammed RRL; arrowhead, THAP1–THABS DNA complex; *, nonspecific complexes. (C) Mutation of the four other conserved residues of the THAP domain (P, W, F, and P) abrogates DNA binding. EMSAs were performed with the THABS probe and THAP1 wild-type (wt) or mutant proteins (P26A, W36A, F58A, and P78A).

Discussion

The THAP domain is an evolutionarily conserved protein motif restricted to animals (1), which defines a previously uncharacterized family of cellular factors, the THAP proteins, with >100 distinct members in the animal kingdom (Tables 1 and 2 and Fig. 6). In this article, we demonstrate the biochemical function of the THAP domain in cellular THAP proteins as a zinc-dependent sequence-specific DNA-binding domain belonging to the zinc-finger superfamily. We reported in ref. 1 that the site-specific DNA-binding domain of Drosophila P element transposase (3) corresponded to a consensus THAP domain. Our present data show that, although the THAP domains of human THAP1 and P element transposase share <25% sequence identity, the biochemical function of the domain, sequence-specific DNA binding, has been conserved between the two proteins. These results emphasize the evolutionary and functional relationships between cellular THAP proteins and P element transposase and are in agreement with the previous observation that one of the human THAP proteins, THAP9, appears to be an ancient descendant of P element transposase (1, 22). The THAP domain is therefore another example of a DNA-binding domain shared between cellular proteins and transposases from mobile genomic parasites. Previous examples include the DNA-binding domain of centromere protein CENP-B (23), which is homologous to that of Drosophila pogo transposase, human tigger pogo-like transposases, and the BED finger (24), an atypical zinc-finger DNA-binding domain found in both cellular chromatin-boundary element-binding proteins BEAF/DREF and AC1/Hobo-like transposases from animals, plants, and fungi.

We demonstrate that the DNA-binding activity of the THAP domain of human THAP1 is zinc-dependent and that the four putative metal-coordinating residues of the C2CH module are essential for functional activity. This DNA binding suggests that the THAP domain belongs to the zinc-finger superfamily. Classical zinc fingers have been defined as small, functional, independently folded domains that require coordination of one or more zinc ions to stabilize their structure (25, 26). Indeed, the C2H2-type zinc finger, which defines the most abundant class of DNA-binding proteins in the human genome, is a compact, ≈30-aa DNA-binding domain repeated in multiple copies in the proteins (25, 26). Similarly, the zinc-coordinating module of the C4-type zinc finger found in the GATA family of transcription factors covers only the first 30 residues of the 60-aa DNA-binding domain (27). Therefore, the size of the THAP domain (≈90 aa) and the spacing between the C2 and CH residues of the large C2CH module (up to 53 aa) are atypical in the zinc-finger superfamily. In addition, the sizes of the DNA target sequences recognized by the THAP domain, 11 nucleotides for human THAP1 (Figs. 2 and 3) and 10 nucleotides for Drosophila P element transposase (28), are considerably larger than those recognized by classical C2H2 zinc fingers, which typically recognize only 3–4 nucleotides (25). Despite these differences, we believe that the THAP domain, which is a functional, independently folded domain that requires coordination of zinc for its DNA-binding activity, should be classified as a zinc-coordinating DNA-binding domain belonging to the zinc-finger superfamily. With ≈100 distinct THAP-domain protein sequences identified in
model animal organisms (Tables 1 and 2 and Fig. 6), including 12 human proteins (1), the THAP domain may define one of the most abundant classes of zinc-coordinating DNA-binding proteins in the animal kingdom, after the C2H2 zinc-finger proteins and the nuclear hormone receptors.

The four other residues strictly conserved in all THAP-domain sequences were also found to be required for DNA binding (Fig. 5C). Therefore, mutation of any of the eight residues that define the THAP motif abrogates DNA-binding activity. These results are important because they link our in vitro data on THAP-domain–DNA interactions to in vivo genetic data previously obtained in C. elegans, because equivalent mutations have been identified in lin-36 and him-17: Of seven single-point mutations identified in the LIN-36 protein, four mutations were found in the THAP domain, including two independent mutations in the last P residue of the THAP motif (12); two single-point mutations were identified in the HIM-17 protein, and both were found in the THAP domains, including mutation of the second C of the C2CH motif in THAP domain 6 (8). The identification of single-point mutants in the THAP domains of LIN-36 and HIM-17 supports the possibility that these THAP domains are functional and likely to exhibit DNA-binding activity in vivo.

The consensus THABS motif recognized by the THAP domain of human THAP1 (Fig. 2B) does not share significant homology with the A+T-rich motif recognized by P element transposase (28). Together with the observation that distinct THAP-domain sequences within a single species exhibit <50% identity between each other, this finding suggests that each THAP domain may possess its own specific DNA-binding site. However, we cannot exclude, at this stage, the possibility that some THAP domains may lack sequence specificity. Similarly, the divergent THAP motifs found in HIM-17 and other C. elegans proteins (8) may have lost DNA-binding activity and function instead as protein–protein-interaction modules, as shown for some C2H2 zinc fingers in ref. 29. Finally, it remains possible that a single THAP domain may function in both protein–protein and protein–DNA interactions, as suggested in ref. 6 for human THAP7.

Genetic data obtained in C. elegans indicate that cellular THAP proteins may be involved in chromatin modification. The multi-THAP C. elegans protein HIM-17 has been shown to be associated with chromatin and required for proper accumulation of histone H3 methylation at lysine-9 on meiotic prophase chromosomes (8), suggesting that HIM-17 recruits chromatin-modifying and/or -remodeling complexes essential for chromatin modification during meiosis. The link between THAP proteins and chromatin modification/remodeling is further reinforced by the observation that five distinct C. elegans proteins containing consensus and/or divergent [...] in mammalian cells (30). Interestingly, human THAP7 has recently been found to associate with chromatin and to function as a histone-tail-binding protein that represses transcription through recruitment of corepressor NcoR and histone deacetylase HDAC3 (6). Similarly, nuclear proapoptotic factors THAP1 and THAP0 may also function in chromatin modification and/or transcriptional repression.

The genetic interactions of C. elegans THAP proteins with LIN-35/Rb (8, 9, 13, 20) and our observation that the zebrafish orthologue of cell-cycle transcription factor E2F6 contains a THAP domain (Fig. 1) suggest important roles for THAP proteins in cell proliferation and/or cell-cycle progression. Evidence for such a role has already been provided for LIN-36 (14) and LIN-15B, which have been shown to negatively regulate G1 progression (13). A third C. elegans THAP protein, the CDC-14B isoform of the cell-cycle regulator tyrosine phosphatase CDC-14, may also function with LIN-35/Rb in G1 regulation because C. elegans CDC-14 has been shown to inhibit G1–S transition by modulating nuclear levels of CDK inhibitor CKI-1 (20). Alternatively, the CDC-14B isoform, which contains the DNA-binding THAP motif, could mediate the effects of CDC-14 in the protection of the C. elegans genome against DNA damage (31). CDC-14 isoforms containing THAP domains have not yet been described in other animal species. However, it is well known that two proteins, which have homologues in another organism fused into a single protein chain, often show interaction between each other (32). Therefore, in other organisms, CDC-14 could be targeted to DNA and/or chromatin by direct interaction with a cellular THAP protein. A similar scenario may apply to C. elegans CtBP and zebrafish E2F6, which contain consensus THAP domains at their amino termini (Fig. 1) that are not observed in homologues from other animal species.

In summary, the results presented in this article show that the THAP domain of human THAP1 is a zinc-dependent sequence-specific DNA-binding domain. Together with previous data obtained in humans and C. elegans, these findings suggest that cellular THAP proteins may function as sequence-specific DNA-binding factors with roles in cell proliferation, apoptosis, cell cycle, chromosome segregation, chromatin modification, and transcriptional regulation.

We thank Laurence Nieto [Institut de Pharmacologie et de Biologie Structurale (IPBS)–Centre National de la Recherche Scientifique (CNRS)] and Sophia Kossida (Endocube) for help with EMSAs and initial characterization of THAP-family protein sequences, respectively, and members of the Laboratory of Vascular Biology (IPBS–CNRS) for stimulating discussions. We are grateful to Corinne Cayrol (IPBS–CNRS) for the identification of THAP–E2F6 fusion proteins.
This work THAP domains (LIN-36, LIN-15B, HIM-17, CDC-14B, and LIN- was supported by grants from Ligue Nationale Contre le Cancer (Equipe 15A) have been shown to interact genetically with LIN-35͞Rb (8, Labellise´e ‘‘La Ligue 2003’’) and Ministe`rede la Recherche Actions 9, 13, 20), a known component of chromatin-remodeling complexes Concerte´es Incitatives ‘‘Jeunes Chercheurs.’’

1. Roussigne, M., Kossida, S., Lavigne, A. C., Clouaire, T., Ecochard, V., Glories, A., Amalric, F. & Girard, J. P. (2003) Trends Biochem. Sci. 28, 66–69.
2. Roussigne, M., Cayrol, C., Clouaire, T., Amalric, F. & Girard, J. P. (2003) Oncogene 22, 2432–2442.
3. Lee, C. C., Beall, E. L. & Rio, D. C. (1998) EMBO J. 17, 4166–4174.
4. Deiss, L. P., Feinstein, E., Berissi, H., Cohen, O. & Kimchi, A. (1995) Genes Dev. 9, 15–30.
5. Lin, Y., Khokhlatchev, A., Figeys, D. & Avruch, J. (2002) J. Biol. Chem. 277, 47991–48001.
6. Macfarlan, T., Kutney, S., Altman, B., Montross, R., Yu, J. & Chakravarti, D. (2005) J. Biol. Chem. 280, 7346–7358.
7. Cheung, W. L., Ajiro, K., Samejima, K., Kloc, M., Cheung, P., Mizzen, C. A., Beeser, A., Etkin, L. D., Chernoff, J., Earnshaw, W. C. & Allis, C. D. (2003) Cell 113, 507–517.
8. Reddy, K. C. & Villeneuve, A. M. (2004) Cell 118, 439–452.
9. Ferguson, E. L. & Horvitz, H. R. (1989) Genetics 123, 109–121.
10. Clark, S. G., Lu, X. & Horvitz, H. R. (1994) Genetics 137, 987–997.
11. Huang, L. S., Tzou, P. & Sternberg, P. W. (1994) Mol. Biol. Cell 5, 395–411.
12. Thomas, J. H. & Horvitz, H. R. (1999) Development (Cambridge, U.K.) 126, 3449–3459.
13. Boxem, M. & van den Heuvel, S. (2002) Curr. Biol. 12, 906–911.
14. Fay, D. S., Keenan, S. & Han, M. (2002) Genes Dev. 16, 503–517.
15. Bouvet, P. (2001) Methods Mol. Biol. 148, 603–610.
16. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403–410.
17. Higgins, D. G., Thompson, J. D. & Gibson, T. J. (1996) Methods Enzymol. 266, 383–402.
18. Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. (2004) Genome Res. 14, 1188–1190.
19. Taylor, J. S., Braasch, I., Frickey, T., Meyer, A. & Van de Peer, Y. (2003) Genome Res. 13, 382–390.
20. Saito, R. M., Perreault, A., Peach, B., Satterlee, J. S. & van den Heuvel, S. (2004) Nat. Cell Biol. 6, 777–783.
21. Chinnadurai, G. (2002) Mol. Cell 9, 213–224.
22. Hammer, S. E., Strehl, S. & Hagemann, S. (2005) Mol. Biol. Evol. 22, 833–844.
23. Smit, A. F. & Riggs, A. D. (1996) Proc. Natl. Acad. Sci. USA 93, 1443–1448.
24. Aravind, L. (2000) Trends Biochem. Sci. 25, 421–423.
25. Pavletich, N. P. & Pabo, C. O. (1991) Science 252, 809–817.
26. Laity, J. H., Lee, B. M. & Wright, P. E. (2001) Curr. Opin. Struct. Biol. 11, 39–46.
27. Omichinski, J. G., Clore, G. M., Schaad, O., Felsenfeld, G., Trainor, C., Appella, E., Stahl, S. J. & Gronenborn, A. M. (1993) Science 261, 438–446.
28. Kaufman, P. D., Doll, R. F. & Rio, D. C. (1989) Cell 59, 359–371.
29. Sun, L., Liu, A. & Georgopoulos, K. (1996) EMBO J. 15, 5358–5369.
30. Zhang, H. S., Gavin, M., Dahiya, A., Postigo, A. A., Ma, D., Luo, R. X., Harbour, J. W. & Dean, D. C. (2000) Cell 101, 79–89.
31. Pothof, J., van Haaften, G., Thijssen, K., Kamath, R. S., Fraser, A. G., Ahringer, J., Plasterk, R. H. & Tijsterman, M. (2003) Genes Dev. 17, 443–448.
32. Marcotte, E. M., Pellegrini, M., Ng, H. L., Rice, D. W., Yeates, T. O. & Eisenberg, D. (1999) Science 285, 751–753.

6912 | www.pnas.org/cgi/doi/10.1073/pnas.0406882102 Clouaire et al.