a r t i c l e s

Human mitochondrial transcription factor A induces a U-turn structure in the light strand promoter

Anna Rubio-Cosials1, Jasmin F Sidow1,5, Nereida Jiménez-Menéndez1,2, Pablo Fernández-Millán1, Julio Montoya3, Howard T Jacobs4, Miquel Coll1,2, Pau Bernadó2 & Maria Solà1

Human mitochondrial transcription factor A, TFAM, is essential for mitochondrial DNA packaging and maintenance and also has a crucial role in transcription. Crystallographic analysis of TFAM in complex with an oligonucleotide containing the mitochondrial light strand promoter (LSP) revealed two high-mobility group (HMG) domains that, through different DNA recognition properties, intercalate residues at two inverted DNA motifs. This induced an overall DNA bend of ~180°, stabilized by the interdomain linker. This U-turn allows the TFAM C-terminal tail, which recruits the transcription machinery, to approach the initiation site, despite contacting a distant DNA sequence. We also ascertained that structured protein regions contacting DNA in the crystal were highly flexible in solution in the absence of DNA. Our data suggest that TFAM bends LSP to create an optimal DNA arrangement for transcriptional initiation while facilitating DNA compaction elsewhere in the genome.

Human mitochondrial DNA (mtDNA) is a circular, double-stranded (ds) 16.5-kilobase (kb) molecule that encodes two ribosomal RNAs (rRNAs) and 22 transfer RNAs (tRNAs), as well as 13 of the ~80 subunits participating in oxidative phosphorylation. The two mtDNA strands, termed heavy (H) and light (L), are transcribed into genome-length polycistronic transcripts from two respective promoters, HSP2 and LSP, which are located in the control region of the genome1,2. The H strand is further transcribed from an additional promoter, HSP1 (ref. 1), which generates a shorter transcript comprising two rRNAs and two tRNAs3. The mtDNA is organized in nucleoprotein entities, termed nucleoids, which contain from two to ten copies of the genome4,5, plus proteins that regulate DNA transactions6.

The architecture, packaging, copy number and maintenance of mtDNA depend upon the mitochondrial-specific, nuclear-encoded mitochondrial transcription factor A (TFAM or mtTFA)7–10. In mouse, Tfam is an essential gene11, and in some cell types, TFAM is present in amounts sufficient to coat the entire genome12,13. Footprinting studies in organello have revealed phased binding at regular intervals in the control region14. TFAM bends and packs mtDNA through nonspecific binding8,9,13, as shown by atomic force microscopy studies, which showed that recombinant human TFAM induces bending and compaction of nonspecific DNA into nucleoprotein structures reminiscent of mitochondrial nucleoids7. Other studies revealed that TFAM preferentially binds cruciform DNA15 and oxidized mtDNA16 and is involved in base-excision DNA repair17.

To determine the binding sites of TFAM to DNA, footprinting studies were conducted with the protein in the presence or absence of the mtRNA polymerase (MTRPOL), which forms a complex with transcription factor B2 (TFB2M)18. TFB2M is a bona fide transcription factor that forms a transient complex with MTRPOL, melts the DNA at the promoter and interacts with the priming substrate19. Notably, it has a paralog, the ‘transcription factor’ B1 (TFB1M), and both are ancestrally related to methyltransferases20, but they have different functions, as TFB1M is a ribosome small-subunit dimethylase21. Footprinting assays revealed that TFAM weakly protects HSP1 over the 23 base pairs (bp) between positions −35 and −13 upstream of the transcription initiation site2. Footprinting studies with LSP, recombinant TFAM, MTRPOL and TFB2M revealed protection by TFAM of 23 bp from −38 to −15 at LSP (ref. 22). Additional analyses showed that the MTRPOL–TFB2M complex covers the region immediately downstream of the TFAM-binding site, viz. from bp −14 to +10, with TFB2M putatively positioned close to TFAM (ref. 19). This is consistent with earlier suggestions that TFB2M contacts the C-terminal tail of TFAM, thus facilitating binding of MTRPOL–TFB2M to the LSP promoter and bridging the transcription machinery to the preinitiation complex18,23,24. LSP and HSP1 are located close together, each containing two partially conserved segments of 10 and 12 bp separated by 6 bp. In vitro studies have shown that, in the absence of TFAM, the MTRPOL–TFB2M complex initiates transcription from HSP1 on the joint LSP–HSP1 template25. Addition of small amounts of TFAM shifts the balance of initiation toward LSP, whereas a large excess of TFAM restores initiation at HSP1 (ref. 25). When overexpressed, TFAM appears to trigger a paradoxical suppression of transcription, combined with effects on mtDNA replication, which have been

© 2011 Nature America, Inc. All rights reserved.

1Department of Structural Biology, Molecular Biology Institute of Barcelona (CSIC), Barcelona, Spain. 2Structural Biology Program, Institute for Research in Biomedicine, Barcelona, Spain. 3Departamento de Bioquímica y Biología Molecular y Celular, Universidad de Zaragoza–CIBER de Enfermedades Raras (CIBERER), Zaragoza, Spain. 4Institute of Biomedical Technology and Tampere University Hospital, University of Tampere, Tampere, Finland. 5Present address: Department of Bioanalytics, R&D Protein Analytics, Biologics Research, Pharma Research and Early Development (pRED) Penzberg; Roche Diagnostics GmbH, Penzberg, Germany. Correspondence should be addressed to M.S. ([email protected]). Received 10 June; accepted 13 September; published online 30 October 2011; doi:10.1038/nsmb.2160

nature structural & molecular biology VOLUME 18 NUMBER 11 NOVEMBER 2011 1281 a r t i c l e s

suggested to be due to increased mtDNA compaction from nonspecific binding2,18,25–27. TFAM may therefore have a key role in the interplay between different mtDNA transactions in vivo.

TFAM is a high-mobility group protein of type B (HMGB), which contains two tandem HMG-box domains, HMG1 and HMG2, separated by a linker and followed by a C-terminal tail that confers specific recognition for LSP and mediates interaction with transcription factor TFB2M (refs. 10,23,24,28,29). In general, HMGB proteins may contain a single HMG domain, like the sequence-specific transcription factors LEF-1, TCF-1 and SRY; or the non–sequence-specific proteins Drosophila HMG-D and yeast NHP6A and NHP6B. Other HMG-box representatives contain two HMG domains, like the chromatin-binding proteins HMGB1, HMGB2 and yeast ABF2p; or even more domains, like human transcription factor UBF; all these domains are similar at the sequence and structural levels30. In particular, TFAM is a protein containing two HMG-box domains with sequence-specific DNA-binding capability that also participates in events that involve nonspecific DNA binding. Structural analyses of HMGB domains have revealed that they have an L shape, the short L-arm consisting of two short antiparallel α-helices (helix 1 and 2) and the long L-arm comprising an elongated segment of about six to seven residues from the N terminus of the domain, packed against a C-terminal α-helix (helix 3). The structures solved in complex with DNA revealed that contacts occur through the internal, concave surface of the L, into which DNA fits with pronounced bending, varying from 61°, as in rat HMG1 protein bound to modified DNA31, to 117° for mouse transcription factor LEF-1 (ref. 31). DNA bending is stabilized by several polar and nonpolar interactions and, importantly, by a characteristic intercalation of nonpolar residues from either or both helices 1 and 2, which disrupt base-pair stacking. Such an L-shape was earlier reported for the HMG2 domain of TFAM (ref. 29). An HMG-box structure is also predicted for HMG1 (ref. 32), but the details and overall organization of the full-length protein, as well as the molecular basis of its interaction with DNA, are unknown.

The importance of understanding TFAM function has been highlighted by recent reports showing that it influences various pathological states in the mouse33,34. In addition, it can apparently operate as either a tumor promoter35 or suppressor36. To shed light on its function, we analyzed the crystal structure of full-length TFAM in complex with a double-stranded oligonucleotide encompassing the LSP sequence, and we assessed the flexibility of the free protein in solution by small-angle X-ray scattering (SAXS).

RESULTS

Both HMG boxes of TFAM bend the LSP-22 sequence

Full-length mature TFAM did not crystallize without DNA, and the best crystals were obtained with an LSP oligonucleotide of 22 bp (LSP-22, Fig. 1a–d and Table 1), comprising the sequence fully protected in previous footprint assays2,22. Structure determination by experimental methods, which included X-ray data from crystals modified by seleno-methionine (SeMet-TFAM–LSP-22) or bromouracil (TFAM–LSP-22Br), gave rise to electron density maps that allowed unambiguous protein and DNA sequence assignment (Supplementary Fig. 1). A detailed analysis of protein-DNA interactions showed that the structures from crystals grown in the presence of SeMet-TFAM or LSP-22Br did not deviate substantially from that of the native protein bound to LSP. TFAM is an all-α modular protein, comprising two HMG-box domains (HMG1 and 2), each of which spans ~75 residues (residues 44–120 and 153–225, respectively, Fig. 1c, Supplementary Fig. 1d,e). These are connected by a linker of helical conformation (residues 124–152) and are followed by a C-terminal tail in almost extended conformation (residues 226–237). In both HMG domains the three helices fold into an L-shaped arrangement typical of HMG boxes (see above and Fig. 2a), where the L-‘inner’ surface contacts the DNA minor groove by polar and nonpolar interactions, with specific residues partially or fully intercalating between bases (see below). The two HMG boxes have a head-to-head orientation, with the short L-arms, comprising helices 1 and 2, oriented toward the central twofold axis of the double HMG-domain structure (Fig. 1c). In contrast, a ‘tail-to-tail’ orientation, with the two short L-arms oriented outward with respect to the central axis, was found for the synthetic didomain protein SRY.B, designed to create a stable complex of a nonspecific HMG box bound to DNA37. TFAM HMG1 contacts LSP region 1, which comprises the sequence T1A2A3C4A5G6 (numbered as in Fig. 1a), whereas HMG2 contacts LSP region 2, of sequence C14C15A16A17C18T19A20A21. In these regions the protein markedly flattens and widens the DNA minor groove by separating the phosphate backbones of the two strands (Fig. 1c).

Overall, the DNA undergoes two sharp ~90° kinks, each caused by one HMG box (Fig. 1d), giving rise to an overall U-turn of ~180°. Within the kinked regions the three base steps at sequence A3C4A5G6 in LSP region 1 and C14C15A16A17 in region 2 show high positive rolls (with maximums of 50° and 60°, Fig. 1d) due to a sharp bend toward the major groove, whereas the twist value of the central steps (DNA steps are indicated by a slash, /) C4A5 / T18G19, and C15A16 / T7G8 decreases (with minima of 10°) because of flattening of the minor groove (Fig. 1d). On the inside face of the bends, at the DNA major groove of both regions 1 and 2, the highly rolled base pairs undergo regular Watson-Crick interactions and bifurcated interstrand bonds38. In addition, water molecules are coordinated by the base atoms (Figs. 2 and 3a,b). On the outside face of the bends, at the minor groove contacting the HMG boxes, the oxygen atoms from the phosphate backbone are stabilized by polar contacts and positive charges provided by residues from the three helices of either HMG box, most notably the highly conserved Trp88 and Trp189 from HMG1 and HMG2, respectively (Fig. 2b and 3a,b; Supplementary Fig. 2a,b).

The HMG boxes recognize DNA by different means

The two HMG domains share the same fold (r.m.s. deviation 0.95 Å), but they are not identical. Notably, helix 1 of HMG2 is one turn shorter than its HMG1 counterpart, a feature that is not modified upon DNA binding (Fig. 2a,c) and that differentiates this domain from any other reported HMG-box domain. This shorter HMG2 helix 1 is found in TFAM of all metazoans analyzed (Supplementary Fig. 2). Despite this structural difference between HMG1 and HMG2, they both intercalate nonpolar residues to the DNA, though by different means.

In HMG1, Leu58 from helix 1 intercalates between bases A3 and C4 (Figs. 1 and 2b), thus hampering their stacking and strongly contributing to the high positive roll of the corresponding base-pair step (Fig. 1d). This distortion is stabilized by a highly conserved neighboring residue, Tyr57 of helix 1 (Fig. 3a, Supplementary Fig. 2a), which partially intercalates between bases T20 and G19 of chain D (Figs. 1, 2 and 3a) and makes a hydrogen bond with the latter guanine. Helix 2 also contributes to DNA distortion by partially intercalating Thr77, Thr78 and Ile81 (Figs. 1a, 2b and 3a). HMG2 interacts differently with DNA region 2 (Figs. 1a, 2b and 3b). At the position equivalent to HMG1-box Leu58, a polar residue, Asn163, does not intercalate but makes hydrogen bonds with both chain D T6 and T7 of DNA region 2, inducing a shear to A17–T6. The side chain of the preceding residue, Tyr162, highly conserved and part of the HMG2 hydrophobic core, showed alternate orientations in the


[Figure 1 image, panels a–d. Panel d plots roll angle and twist angle (°) against the LSP-22 base steps.]

Figure 1 TFAM–LSP-22 complex. (a) Crystallized light strand promoter (LSP-22) coding sequence (dark blue, chain C in the PDB), with mtDNA numbering, and complementary sequence (cyan, chain D). Orange, green and yellow boxes symbolize HMG1, HMG2 and linker (L) domains contacting DNA (contacted base pairs in brackets). TFAM residues intercalate (pink background) or contact (domain color-code) the highly rolled base pairs, which like the inverted motif are framed (black and pink outlined boxes, respectively), and central low-twisted steps are arrowed. Dashed lines represent interstrand bifurcated hydrogen bonds. (b) Alignment of rolled regions 1 and 2. The alignments are based on structurally equivalent residues (left) or on intercalating leucines (right). (c) Top, representation of TFAM domains along the sequence, with intercalating and last-traced residues indicated. Bottom, ribbon plot of the TFAM–LSP-22 crystal structure. Domains HMG1 (in orange) and HMG2 (green); respective helices 1 to 3, the central linker (yellow), flanking segments (gray), and N- and C-terminal ends are indicated. The intercalating and kink-stabilizing residues are shown in sticks. In B-DNA, the distance across the minor groove between phosphates of complementary strands spaced at four base pairs is 11.7 Å. In DNA region 1, this distance is 22.4 Å between T7 (chain C) and T20 (chain D), whereas in region 2 it is 21.9 Å, between A17 and G10. (d) Two views of LSP-22 representation, showing its U-turn shape. Intercalation sites are arrowed; deviation from a straight dsDNA axis is depicted on top. The right panel shows the roll and twist angles, and intercalated steps along LSP-22.

unbound form, which was suggested to reflect a lower stability of the hydrophobic core29,32. In the present structure, Tyr162 has a single conformation that makes a hydrogen bond with chain C A16, in the same fashion as the topologically equivalent residue in HMG1, Tyr57, does with chain D G19. The most marked distortion within DNA region 2 is conferred by a hydrophobic residue from HMG2 helix 2, Leu182, which intercalates between C15–G8 and A16–T7 (Figs. 1 and 3b). In contrast, its topological counterpart in HMG1, Ile81, only partially intercalates (see above). Leu182 also has alternate conformations in the DNA-free HMG2 structure29, whereas in our structure one of the two conformers is selected for base intercalation during DNA binding (Fig. 2c). Additionally, partially intercalating residues and hydrogen bond interactions that stabilize the distortion at DNA region 2 are shown in Figure 2b. Note that the highly conserved Tyr200, found in different orientations in the unbound HMG2 structure29,32, shows a single orientation in which the aromatic ring contributes to the HMG2 hydrophobic core and the hydroxyl group points toward a surface not facing the DNA.

Previous sequence and structural analyses of HMG-box domains identified that the type (polar or nonpolar) of (partially) intercalating residue at positions equivalent to HMG1 Leu58 (or Asn163 in HMG2) in helix 1 (site ‘X’ (ref. 37), Supplementary Fig. 1e), and Thr77–Thr78 (or HMG2 Pro178–Gln179) in helix 2 (site ‘Y’, Supplementary Fig. 1e), dictated the specific versus nonspecific DNA-binding mode, and were defined as ‘specificity determinants’ (refs. 30,39 and references therein). This position in helix 1 was defined as the primary


‘intercalation wedge’40. In nonspecific HMG domains, nonpolar residues, chiefly Met or Phe, are found at the primary intercalation wedge, which is preceded by an aromatic residue (typically Phe or Tyr)30. At site Y, nonspecific HMGs likewise have a nonpolar residue30. In contrast, the DNA-specific boxes may harbor a polar residue in either of the two positions at the primary wedge30, although all structurally characterized complexes with cognate DNA show Phe–Met or Phe–Ile doublets, for example Sox2 (PDB 1O4X)39, LEF1 (PDB 2LEF)41 and SRY (PDB 1J46)42, whereas a polar residue (usually Asp) is found at site Y, engaged in specific DNA recognition30. In TFAM HMG1, Tyr57–Leu58 constitutes the hydrophobic wedge at site X and Thr77 is found at site Y (Supplementary Fig. 1e). Together, these residues are consistent with a specific DNA-recognition nature for domain HMG1. In contrast, in HMG2 neither of these sites intercalates the dsDNA. Instead, intercalation is done by Leu182, four positions ahead of site Y. Structural analysis of HMGB domains suggested classifying this position as a ‘specificity determinant’ (ref. 39 and references therein), but phylogenetic covariation analysis of several HMGB sequences30 did not show a substantial correlation with DNA mode recognition. The TFAM HMG2 domain is the first case where an intercalating wedge is found at this position.

TFAM intercalates residues at a DNA inverted motif

As explained above, HMG1 and HMG2 intercalate residues from different helices into the DNA (Fig. 2a, Supplementary Fig. 1e): HMG1 intercalates Leu58 from helix 1 between the first two base pairs in the A3↓C4A5G6 sequence, whereas HMG2 does so with Leu182 from helix 2 between the second and the third base pairs of the C14C15↓A16A17 sequence. Therefore, each residue disrupts the stacking at a different DNA step of the respective highly rolled region (Fig. 1a,b). Accordingly, superimposition of the two HMG boxes (schematically represented in Fig. 1b, left panel) shows an overall spatial coincidence of the amino acids contacting the DNA (for example, Tyr57 structurally aligns with Tyr162). However, it shows a shift in the intercalating residues Leu58 and Leu182, and intercalated DNA steps. If the schematic alignment is done based on the intercalating residues (Fig. 1b, right panel), then DNA sequence A2A3↓C4 aligns with C15↓A16A17. If the latter is inverted to A17A16↓C15, it matches the former as both show the AAC pattern and are intercalated at the second step. This defines an inverted motif, AA↓C–10 bp–C↓AA, that follows the symmetry of the HMG boxes (Fig. 1c). The alignment of LSP with HSP1 and other 28-bp TFAM-binding sites, termed X and Y (ref. 2), shows that the number of base pairs between the trinucleotide inverse repeats is systematically 10 bp. This suggests that the mode of TFAM binding to these sequences could be topologically similar. However, whereas the first AAC is identical among the four binding sites, the second trinucleotide is less conserved. Moreover, the nucleotide content of the intervening 10 bp varies considerably. We suggest that, whereas the HMG boxes intercalate similarly at the inverse repeats, these different sequence contexts could account for the differences in protein binding and DNA bending between LSP and other sites.

The linker adds contacts to the DNA

In the structure, the linker connecting HMG1 and HMG2 compensates for the repulsion of the backbone phosphates brought closer by the DNA U-turn (Fig. 1c). In particular, a set of positive residues at the C-terminal end of the linker makes three types of contact with the DNA. Two of these contacts are made by the highly conserved residues Arg140 and Lys146 (Fig. 3c and Supplementary Fig. 2). Arg140 faces the major groove of DNA region 1, and Lys146 fronts the same groove at region 2. These residues interpose their positive side chain ends between the oxygens of two phosphates from complementary strands, thereby stabilizing the DNA kink at the major-groove side (Figs. 1c and 3c).

The third contact of the linker is made at a segment of the minor groove between DNA regions 1 and 2, comprising A16 to G13 (chain D) and A9 to C11 (chain C), which adopts canonical B-DNA conformation (Fig. 1c). The tetramethylene side chains of Lys139 and Lys147, together with the nearby electronegative phosphate backbones, lock the side chain of Met143, which points straight to bases G13 and T14 from DNA chain D (Fig. 3c). The two lysines and neighboring residues make additional contacts, mostly electrostatic and involving conserved residues (Fig. 3c, Supplementary Fig. 2), which stabilize the interaction without any specific recognition of the base atoms. In summary, the linker fits into the minor groove and stabilizes the two kinks. By passing perpendicularly over the DNA, it connects the two HMG domains at either side of the double helix, thus contributing to the overall U-turn shape of the dsDNA. In contrast, in artificial didomain protein SRY.B (ref. 38), the basic linker is docked loosely along the minor groove between the HMG boxes, stabilizing the overall interaction but without contributing to DNA bending, which, in this case, is only 101°.

Previous biophysical studies by fluorescence anisotropy, a sensitive technique for detection of macromolecular interactions in solution, demonstrated the contribution of the linker in DNA binding32. In these studies, the weak affinity of HMG2 for DNA was markedly stimulated by addition of residues from the linker to the construct, which included Lys146 and Lys147 (ref. 32). Activity studies with the yeast TFAM homolog Abf2p showed that replacement of its HMG2 box by a sequence including TFAM linker, HMG2 and the C-terminal tail, had a stronger effect on specific DNA binding and transcription activation than a construct

Table 1 Crystallographic data processing and refinement statistics

                              TFAM-Br in complex with LSP-22
Data collection
  Space group                 P2₁2₁2
  Cell dimensions
    a, b, c (Å)               113.9, 117.2, 56.53
    α, β, γ (°)               90
  Resolution (Å)              40.84–2.45 (2.58–2.45)
  Rsym (%)*                   8.0 (46.7)
  I / σI                      17.5 (4.2)
  Completeness (%)            99.8 (100)
  Redundancy                  7 (7.4)
Refinement
  Resolution (Å)              40.84–2.45
  No. reflections             28,483
  Rwork / Rfree               18.2 / 22.8
  No. atoms
    Protein                   3,238
    DNA                       1,797
    Water                     185
  B-factors(a)
    Protein                   42.5
    Ligand/ion                40.8
    Water                     40.2
  R.m.s. deviations
    Bond lengths (Å)          0.012
    Bond angles (°)           1.4

(a) B-factors as obtained after Refmac5/Phenix refinement including TLS.
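The inverted motif described in the text, AA↓C–10 bp–C↓AA, can be expressed as a simple pattern search. The following is a minimal Python sketch, not the authors' procedure. The LSP-22 coding-strand sequence is an assumption reconstructed from the base-pair labels of Figure 1, and the function name `find_inverted_motifs` is hypothetical.

```python
import re

# LSP-22 coding strand (chain C, 5'->3'). Assumption: reconstructed from the
# base-pair labels of Figure 1 (T1-A22 ... C22-G1), not quoted from the text.
LSP22_CODING = "TAACAGTCACCCCCCAACTAAC"

# Inverted motif from the text: AA|C, a 10-bp spacer, then C|AA.
# The lookahead allows overlapping candidate sites to be reported.
INVERTED_MOTIF = re.compile(r"(?=(AAC[ACGT]{10}CAA))")


def find_inverted_motifs(seq):
    """Return a list of (first_step, second_step) pairs for each motif hit.

    Each step is a pair of 1-based positions flanking the intercalation
    arrow: A|C in the leading AAC and C|A in the trailing CAA.
    """
    hits = []
    for match in INVERTED_MOTIF.finditer(seq.upper()):
        start = match.start() + 1                # 1-based position of first A
        first_step = (start + 1, start + 2)      # A|C of the leading AAC
        second_step = (start + 13, start + 14)   # C|A of the trailing CAA
        hits.append((first_step, second_step))
    return hits


if __name__ == "__main__":
    for first, second in find_inverted_motifs(LSP22_CODING):
        print("intercalation steps:", first, second)
```

On the assumed LSP-22 sequence, the single hit corresponds to the A3/C4 and C15/A16 steps, where the text places Leu58 and Leu182. The same search could in principle be run on HSP1 or the X and Y sites, whose sequences are not given here; because the text notes that the second trinucleotide is less conserved, a strict `CAA` match may miss such sites.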


[Figure 2 image, panels a–d.]

Figure 2 Comparison of HMG1 and HMG2 boxes and their contacts with DNA. (a) Superposition of HMG1 (orange) onto HMG2 (green). The HMG1-contacting DNA is shown as gray sticks. The L-shaped long and short arms, helices 1 to 3, and N- and C-terminal ends are indicated. Note the shorter length of helix 1 in HMG2. Leu58 and Leu182, located in different helices, are framed. (b) Scheme of the protein-DNA contacts. Residues are framed according to the domain color-code (see Fig. 1); contacts with phosphates or bases are shown by red or black arrows, and (partially) intercalating residues are above the contacted bases. The inverted AAC motif and bases forming interstrand bonds are framed in pink and blue outlined boxes, respectively. (c) Structural superimposition of HMG2 in complex with DNA (green) onto the unbound HMG2 (violet; PDB 3FGH (ref. 29)). The HMG2 overall shape does not substantially change upon DNA binding. The inset shows the two conformations of Leu182 in the violet unbound structure and the one selected upon DNA binding as found in the TFAM–LSP-22 complex (in green). (d) Close-up view of the C-terminal tail (in gray) packing against HMG2 helix 3 (in green) and the linker (yellow). Side chains participating in the hydrophobic core or in DNA interactions mentioned in the text are shown as sticks; oxygen and nitrogen atoms are in red and blue, respectively; salt bridges and hydrogen bonds are shown as dotted lines. The position of Lys237 is indicated.

lacking the linker28, pointing to the importance of the helical linker in contacting the DNA and influencing the affinity of HMG2 for DNA.

The C-terminal tail

The C-terminal tail is required for specific recognition of the DNA28,29 and for interaction with transcription factor B from the transcription initiation complex23. This tail is packed antiparallel to helix 3 up to Arg232, where the guanidinium group of the latter bridges to the side chain of Glu219 from helix 3 and to the DNA 3′-phosphate group of A21 (chain C) (Fig. 2d). Notably, and crucial for the interpretation of the structure (see below), the tail contacts the DNA on the other side of LSP from the transcription initiation site. However, by virtue of the U-turn imposed on the DNA, the C-terminal tail is nevertheless brought into close proximity with the sequence immediately upstream of the initiation site. The importance of Arg232 was highlighted by previous transcription activation and footprinting analyses, which showed that excision of the C-terminal residues Arg232–Cys246 impaired LSP binding and transcription activation. The same deleterious effect was caused by the single-point mutation R232C (ref. 28). From position Arg232 to the last residue traced, Lys237, the C-terminal tail passes over the phosphates without making any specific interaction with bases, possibly because the DNA that crystallized does not include the sequence required. In mammals the C-terminal tail is conserved up to Lys236 (which is fully conserved; Supplementary Fig. 2). The last residue traced in the structure, Lys237, is exposed to the solvent (Fig. 2d), and the residues Gln238–Cys246 were not visible because of crystallographic disorder, which could be due to intrinsic flexibility, a feature compatible with the availability of this last segment for interaction with other proteins. These last residues show low conservation, possibly reflecting species-specific adaptation to the targeted molecule(s). Whereas the length of the C-terminal tail is conserved among mammals, markedly longer tails (by ~20 amino acids) are found in Xenopus laevis, Salmo salar and Anopheles darlingi (Supplementary Fig. 2). In contrast, the putative TFAM ortholog in Caenorhabditis elegans has only four residues after the last helix of HMG2, suggesting that this protein might function in DNA packing but not in transcription initiation43.

Particular regions of TFAM are intrinsically flexible

Previous studies based on UV CD spectropolarimetry showed an increase in the α-helix content of TFAM upon DNA binding32. The crystal structure shows that TFAM and LSP intimately intertwine, indicating that both molecules structurally rearrange upon binding, by mutual induced fitting. To characterize TFAM structurally in its free state we conducted SAXS analysis of the protein in solution. The analysis of the scattering curve unambiguously showed the presence of a particle with an apparent molecular mass of 24 kDa, in good agreement with a TFAM monomer (25.6 kDa) and ruling out the presence of TFAM multimers. This particle presents a radius of gyration (Rg) with a value of 32.0 ± 0.3 Å, larger than expected for a 24-kDa protein (about 18 Å, according to the Flory equation Rg = 3 × N^0.33). The corresponding pairwise distribution function, p(r), of the curve, which reflects the distribution of intraparticle distances, shows a smooth decrease toward a large maximum (Dmax) of 135 ± 5 Å (Supplementary Fig. 3a). These features indicate that unbound TFAM is highly flexible and that conformations of different dimensions coexist in solution. Another very informative analysis of the data is the Kratky representation of the experimental SAXS curve (Supplementary Fig. 3b), which yielded a profile corresponding to


a globular protein (implying folded domains) mixed with a flatter profile typical for unfolded proteins, substantiating a partial structural disorder for TFAM (ref. 44).

To describe the SAXS curve as an ensemble of coexisting TFAM conformations we subsequently applied the ensemble optimization method (EOM, see Online Methods). We generated a pool of 10,000 structures based on the crystallographic protein coordinates, in which complete conformational freedom was allowed for in the linker and in the C-terminal tail. These calculations identified a subensemble of 50 conformations that, collectively, were in perfect agreement with the SAXS curve (χ² = 0.59) (Fig. 4a, left panel). The corresponding Rg distribution was broad and similar to the one obtained from the initial

[Figure 3. Panels: (a) HMG1–DNA, region 1; (b) HMG2–DNA, region 2; (c) linker.]

Figure 3 Close-up views of three TFAM areas contacting LSP-22 (see top scheme for reference). (a) Contacts between HMG1 and DNA region 1; (b) between HMG2 and DNA region 2; and (c) between the helical linker and the LSP minor groove. In all left panels, side chains intercalating (Leu58 in a, Leu182 in b), half-intercalating, hydrogen-bonding or salt-bridging to DNA are shown as sticks and colored according to the domain color code of Figure 1. Water molecules and oxygen atoms are represented in red; nitrogen, sulfur and phosphate atoms are in dark blue, green and gray, respectively; polar interactions are shown as black dashed lines, except for DNA interstrand hydrogen bonds, which are shown in cyan and involve adenines A3 or A5 (chain C) with G19 (chain D), or A16 or C14 (chain C) with G8 (chain D); LSP-22 sequence is shown as in Figure 1a. The middle panels show electrostatic potential surfaces (blue, positive; red, negative) mapped on the TFAM Connolly surface. The right panels depict residue conservation across metazoan TFAM molecules (see Supplementary Fig. 2a). Identical residues are shown in red; higher to lower similarity values gradually vary from dark orange (90 to 99%) to light yellow (30 to 50%); lower values are in white.
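As a rough numerical check on the SAXS-derived size of unbound TFAM, the sketch below compares the measured Rg (32.0 Å) with the value expected for a compact chain. It assumes that the Flory relation takes the form Rg ≈ 3 × N^0.33 Å, with N the number of residues, and that N = 204 (residues 43–246 of the mature construct described in the Online Methods). The function and variable names are illustrative only and do not come from the original analysis.

```python
def flory_rg(n_residues, r0=3.0, nu=0.33):
    """Return a compact-chain Rg estimate in angstroms, Rg = r0 * N**nu.

    r0 and nu are assumed values for a folded, globular protein.
    """
    return r0 * n_residues ** nu


# Assumption: the SAXS sample spans residues 43-246 of mature TFAM.
n_residues = 246 - 43 + 1

measured_rg = 32.0  # angstroms, value reported from the SAXS curve
compact_rg = flory_rg(n_residues)
expansion = measured_rg / compact_rg

print(f"N = {n_residues} residues")
print(f"compact-chain estimate: {compact_rg:.1f} A")
print(f"measured / estimate:    {expansion:.2f}")
```

Under these assumptions the compact-chain estimate comes out near 17 Å, so the measured value is almost twice as large, in line with the extended, flexible particle inferred from the SAXS data.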


pool of random conformations (Fig. 4a, right panel), indicating both the impossibility of describing TFAM in a single conformation and the high plasticity of both the linker segment and the C-terminal tail (Fig. 4b). Accordingly, the interdomain distance distribution of the selected subensemble was also similar to that obtained for the initial pool (Supplementary Fig. 3c). The possibility that the TFAM linker formed a stable α-helix was explored by generating a new pool of 2,000 conformations in which HMG1, HMG2 and the helical linker were assumed to be rigid bodies linked by flexible hinges. The computed SAXS profile of this model did not agree with the experimental data (χ² = 2.93). In summary, unbound TFAM has two nonstructured regions, the linker and the C-terminal tail.

In conclusion, the SAXS analysis unambiguously shows that TFAM is a monomer in the experimental conditions tested. In addition, it shows that the protein is intrinsically highly flexible and that flexibility is not evenly distributed along the sequence but affects the linker and the C-terminal tail. Comparison with the crystal structure shows that the linker folds into an α-helix upon DNA fitting and binding, and supports a model where intertwining between the two macromolecules takes place, by TFAM adopting a fixed structure.

[Figure 4a axes: left panel, log I(s) (relative) versus s (Å⁻¹); right panel, Rg (Å). Legend: experimental curve, pool, EOM selected.]

Figure 4 SAXS analysis of unbound TFAM. (a) Left panel, experimental scattering-intensity curve (black line) represented in a logarithmic scale as a function of the momentum transfer, s = 4π sin(θ) λ⁻¹ (2θ, scattering angle; λ = 1.5 Å, X-ray wavelength). The fitted EOM (ensemble optimization method, see Online Methods) curve (red curve) describes the complete s-range. Right panel, radius of gyration (Rg) distributions of both the subensemble of conformations selected by EOM (red curve) and that of the starting 10,000 conformations of the pool (in black). (b) Molecular representation of a subensemble of 50 models that describes the data, superimposed by their HMG2 domains (green surface); both side and bottom views (left and right panels, respectively) show that the HMG1 domain (orange ribbon) can be found in a wide range of orientations.

Figure 5 Working model for the role of TFAM in transcriptional activation at LSP. (a) TFAM presents two HMG box domains (labeled) that move freely with respect to one another (arrows). Below the DNA double helix, the binding sites for TFAM (in black), transcription factor B2 (TFB2M, yellow, based on ref. 19) and mitochondrial RNA polymerase (MTRPOL, in black; ref. 19) at the light strand promoter (LSP) are indicated. (b) TFAM HMG1 is the first in contacting the DNA minor groove and induces a first kink to the DNA. (c) The linker segment contacts the DNA minor groove while adopting a helical conformation, which stabilizes the first DNA kink. (d) Binding of the linker leads domain HMG2 to bind the minor groove on the opposite side of the double helix, introducing a second DNA kink, causing a DNA U-turn that positions the TFAM C-terminal tail (gray coil) close to the 5′ end of the LSP. The red arrow indicates the hypothesis that the TFAM C terminus may interact with transcription factor B of the transcription machinery to initiate transcription.

DISCUSSION
Working model for DNA recognition, binding and bending
The crystal structure of TFAM bound to LSP-22 shows an intertwined molecular arrangement that cannot result from a direct contact between rigid molecules. Molecular intertwining can be explained by a partial interaction that stimulates a conformational change, leading progressively to a full contact. In this case the conformational change is induced by each molecule to the other. A similar case was previously found for bacterial integration host factor and its target DNA, which also give rise to a DNA U-turn (ref. 45). Based on our SAXS data, unbound TFAM has two folded HMG boxes linked by an unfolded segment (Fig. 5a). This, in addition to the much higher affinity of HMG1 for DNA than that of HMG2 (refs. 29 and 32), makes simultaneous binding of the two boxes to two separate DNA regions highly unlikely and suggests that HMG1 binds first (Fig. 5b). This would induce a DNA bend, of about 70–110°, as predicted from previous structures of HMGBs bound to DNA (for example, PDB 1CKT (ref. 31), 1J5N (ref. 30) and 1J46, see ref. 42). This structure would then be stabilized by the linker, which contacts the minor groove by adopting an α-helical conformation (Fig. 5c). This arrangement would place HMG2 on the opposite side of the double helix and close to it,


thus conferring a high probability for it to contact DNA despite its low intrinsic affinity and induce a second bend to the DNA. By successive intercalation of residues into the DNA AAC motifs, the two HMG domains cooperatively induce an overall U-turn of ~180° stabilized by the linker (Fig. 5d). Importantly, we infer the key contribution of TFAM at LSP to be in bending the DNA and bringing the C-terminal tail close to the transcription initiation start site for TFB2M to enforce specific melting.

Another important activity of TFAM is DNA packaging, which was deduced from experiments in which the protein induced negative supercoiling in relaxed plasmids (refs. 8 and 46). Supercoiling results from variation in twist (T, number of helical turns in the DNA double helix) and writhe (W, number of crosses over the double helix), whose sum gives the linking number (L). Our results show that, despite TFAM inducing a strong unwinding at two specific base-pair triplets, it results in an overall increment of twist of only 11 bp per DNA turn. Such a modest overall unwinding would be insufficient to explain the negative supercoiling observed (ref. 8). Therefore, in agreement with the aforementioned studies, we posit that DNA supercoiling is due to an increase in writhe, resulting from an accumulation of sharp kinks generated by TFAM binding.

Several studies suggest that TFAM binds to LSP or to nonspecific sequences as a dimer and cooperatively (refs. 7, 29 and 32). Based on the crystal structure, a second molecule would not fit on LSP-22, even if it contained up to 30 bp. TFAM is in monomer-dimer equilibrium in solution (refs. 7, 29 and 32), and cooperative binding on LSP is conceivable through protein-protein interactions stimulated by a previously formed 1:1 protein–LSP complex (for example, if the protein–DNA complex stabilized a protein-protein interaction surface). After dimerization, the additional HMGs would allow formation of DNA loops in addition to DNA bends, thus agreeing with atomic force microscopy studies that demonstrated the ability of TFAM cooperatively to introduce bends and loops into linearized and circular plasmids (ref. 7). Cooperativity on large DNA molecules would arise from binding of successive monomers, generating a more favorable substrate for binding of the next protein by virtue of the structural distortions created on the DNA.

The proposed binding model accounts for the dual function of TFAM in transcriptional initiation and DNA compaction. Generation of highly bent structures like U-turns, within fragments as short as 22 bp, provides a tool for the required compaction, as has been postulated for integration host factor in bacterial DNA structures such as the relaxosome (ref. 47). Binding of TFAM to mtDNA, initiated by HMG1, is progressive. This raises the additional possibility that the linker and/or the HMG2 domain, together with the C-terminal tail, may participate elsewhere in the genome in the recruitment of other proteins, such as members of the MTERF family with otherwise nonspecific binding properties (ref. 48).

Methods
Methods and any associated references are available in the online version of the paper at http://www.nature.com/nsmb/.

Accession codes. Protein Data Bank: coordinates and structure factors have been deposited for human TFAM–LSP-22Br with the accession code 3TQ6.

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

Acknowledgments
We thank C. Silva and J. Colom for technical support. This study was supported by the Ministerio de Ciencia e Innovación (grants BFU2006-09593 to M.S., BFU2009-07134 to M.S., BFU2008-02372 to M.C., CSD2006-00023), Generalitat de Catalunya (SGR2009-1366 to M.S., SGR2009-1309 to M.C., SGR2009-1352 to P.B.), the European Union (FP7-HEALTH-2010-261460 to M.S., FP7-BioNMR-2010-261863 to P.B.), and Instituto de Salud Carlos III-FIS-PI 10/00662. The Centro de Investigación Biomédica en Red de Enfermedades Raras is an initiative of the Instituto de Salud Carlos III. A.R.-C., J.F.S., N.J.-M. and P.F.-M. hold or held fellowships from Consejo Superior de Investigaciones Científicas, MICINN and Cusanswerk-Bischöfliche Studienförderung. H.T.J. is supported by Academy of Finland, Tampere University Hospital Medical Research Fund and Sigrid Juselius Foundation. We also thank the European Molecular Biology Laboratory (EMBL)-Grenoble and EMBL-Hamburg Outstations, the European Synchrotron Radiation Facility in Grenoble and the Automated Crystallography Platform (Barcelona Science Park) for their support.

AUTHOR CONTRIBUTIONS
A.R.-C. and J.F.S. contributed to cloning, protein production and crystallization; A.R.-C., N.J.-M., P.F.-M. and P.B. conducted the SAXS studies; A.R.-C. and M.S. contributed to X-ray structure solution; A.R.-C. and M.S. contributed to figure preparation. Together with the rest of authors, M.C., J.M. and H.T.J. participated in manuscript writing, provision of materials and infrastructure, and discussion. M.S. designed and supervised the project.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Published online at http://www.nature.com/nsmb/.
Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Brandon, M.C. et al. MITOMAP: a human mitochondrial genome database—2004 update. Nucleic Acids Res. 33 (Database issue), D611–3 (2005).
2. Fisher, R.P., Topper, J.N. & Clayton, D.A. Promoter selection in human mitochondria involves binding of a transcription factor to orientation-independent upstream regulatory elements. Cell 50, 247–258 (1987).
3. Montoya, J., Gaines, G.L. & Attardi, G. The pattern of transcription of the human mitochondrial rRNA reveals two overlapping transcription units. Cell 34, 151–159 (1983).
4. Legros, F., Malka, F., Frachon, P., Lombes, A. & Rojo, M. Organization and dynamics of human mitochondrial DNA. J. Cell Sci. 117, 2653–2662 (2004).
5. Iborra, F.J., Kimura, H. & Cook, P.R. The functional organization of mitochondrial genomes in human cells. BMC Biol. 2, 9 (2004).
6. Bogenhagen, D.F., Rousseau, D. & Burke, S. The layered structure of human mitochondrial DNA nucleoids. J. Biol. Chem. 283, 3665–3675 (2008).
7. Kaufman, B.A. et al. The mitochondrial transcription factor TFAM coordinates the assembly of multiple DNA molecules into nucleoid-like structures. Mol. Biol. Cell 18, 3225–3236 (2007).
8. Fisher, R.P., Lisowsky, T., Parisi, M.A. & Clayton, D.A. DNA wrapping and bending by a mitochondrial high mobility group-like transcriptional activator protein. J. Biol. Chem. 267, 3358–3367 (1992).
9. Ekstrand, M.I. et al. Mitochondrial transcription factor A regulates mtDNA copy number in mammals. Hum. Mol. Genet. 13, 935–944 (2004).
10. Kanki, T. et al. Architectural role of mitochondrial transcription factor A in maintenance of human mitochondrial DNA. Mol. Cell. Biol. 24, 9823–9834 (2004).
11. Larsson, N.G. et al. Mitochondrial transcription factor A is necessary for mtDNA maintenance and embryogenesis in mice. Nat. Genet. 18, 231–236 (1998).
12. Takamatsu, C. et al. Regulation of mitochondrial D-loops by transcription factor A and single-stranded DNA-binding protein. EMBO Rep. 3, 451–456 (2002).
13. Alam, T.I. et al. Human mitochondrial DNA is packaged with TFAM. Nucleic Acids Res. 31, 1640–1645 (2003).
14. Ghivizzani, S.C., Madsen, C.S., Nelen, M.R., Ammini, C.V. & Hauswirth, W.W. In organello footprint analysis of human mitochondrial DNA: human mitochondrial transcription factor A interactions at the origin of replication. Mol. Cell. Biol. 14, 7717–7730 (1994).
15. Ohno, T., Umeda, S., Hamasaki, N. & Kang, D. Binding of human mitochondrial transcription factor A, an HMG box protein, to a four-way DNA junction. Biochem. Biophys. Res. Commun. 271, 492–498 (2000).
16. Yoshida, Y. et al. Human mitochondrial transcription factor A binds preferentially to oxidatively damaged DNA. Biochem. Biophys. Res. Commun. 295, 945–951 (2002).
17. Canugovi, C. et al. The mitochondrial transcription factor A functions in mitochondrial base excision repair. DNA Repair (Amst.) 9, 1080–1089 (2010).
18. Falkenberg, M. et al. Mitochondrial transcription factors B1 and B2 activate transcription of human mtDNA. Nat. Genet. 31, 289–294 (2002).
19. Sologub, M., Litonin, D., Anikin, M., Mustaev, A. & Temiakov, D. TFB2 is a transient component of the catalytic site of the human mitochondrial RNA polymerase. Cell 139, 934–944 (2009).
20. Cotney, J. & Shadel, G.S. Evidence for an early duplication event in the evolution of the mitochondrial transcription factor B family and maintenance of rRNA methyltransferase activity in human mtTFB1 and mtTFB2. J. Mol. Evol. 63, 707–717 (2006).


21. Seidel-Rogol, B.L., McCulloch, V. & Shadel, G.S. Human mitochondrial transcription factor B1 methylates ribosomal RNA at a conserved stem-loop. Nat. Genet. 33, 23–24 (2003).
22. Gaspari, M., Falkenberg, M., Larsson, N.G. & Gustafsson, C.M. The mitochondrial RNA polymerase contributes critically to promoter specificity in mammalian cells. EMBO J. 23, 4606–4614 (2004).
23. McCulloch, V. & Shadel, G.S. Human mitochondrial transcription factor B1 interacts with the C-terminal activation region of h-mtTFA and stimulates transcription independently of its RNA methyltransferase activity. Mol. Cell. Biol. 23, 5816–5824 (2003).
24. Garstka, H.L. et al. Import of mitochondrial transcription factor A (TFAM) into rat liver mitochondria stimulates transcription of mitochondrial DNA. Nucleic Acids Res. 31, 5039–5047 (2003).
25. Shutt, T.E., Lodeiro, M.F., Cotney, J., Cameron, C.E. & Shadel, G.S. Core human mitochondrial transcription apparatus is a regulated two-component system in vitro. Proc. Natl. Acad. Sci. USA 107, 12133–12138 (2010).
26. Maniura-Weber, K., Goffart, S., Garstka, H.L., Montoya, J. & Wiesner, R.J. Transient overexpression of mitochondrial transcription factor A (TFAM) is sufficient to stimulate mitochondrial DNA transcription, but not sufficient to increase mtDNA copy number in cultured cells. Nucleic Acids Res. 32, 6015–6027 (2004).
27. Pohjoismäki, J.L. et al. Alterations to the expression level of mitochondrial transcription factor A, TFAM, modify the mode of mitochondrial DNA replication in cultured human cells. Nucleic Acids Res. 34, 5815–5828 (2006).
28. Dairaghi, D.J., Shadel, G.S. & Clayton, D.A. Addition of a 29 residue carboxyl-terminal tail converts a simple HMG box-containing protein into a transcriptional activator. J. Mol. Biol. 249, 11–28 (1995).
29. Gangelhoff, T.A., Mungalachetty, P.S., Nix, J.C. & Churchill, M.E. Structural analysis and DNA binding of the HMG domains of the human mitochondrial transcription factor A. Nucleic Acids Res. 37, 3153–3164 (2009).
30. Masse, J.E. et al. The S. cerevisiae architectural HMGB protein NHP6A complexed with DNA: DNA and protein conformational changes upon binding. J. Mol. Biol. 323, 263–284 (2002).
31. Ohndorf, U.M., Rould, M.A., He, Q., Pabo, C.O. & Lippard, S.J. Basis for recognition of cisplatin-modified DNA by high-mobility-group proteins. Nature 399, 708–712 (1999).
32. Wong, T.S. et al. Biophysical characterizations of human mitochondrial transcription factor A and its binding to tumor suppressor p53. Nucleic Acids Res. 37, 6765–6783 (2009).
33. Hokari, M. et al. Overexpression of mitochondrial transcription factor A (TFAM) ameliorates delayed neuronal death due to transient forebrain ischemia in mice. Neuropathology 30, 401–407 (2010).
34. Gauthier, B.R. et al. PDX1 deficiency causes mitochondrial dysfunction and defective insulin secretion through TFAM suppression. Cell Metab. 10, 110–118 (2009).
35. Weinberg, F. et al. Mitochondrial metabolism and ROS generation are essential for Kras-mediated tumorigenicity. Proc. Natl. Acad. Sci. USA 107, 8788–8793 (2010).
36. Guo, J. et al. Frequent truncating mutation of TFAM induces mitochondrial DNA depletion and apoptotic resistance in microsatellite-unstable colorectal cancer. Cancer Res. 71, 2978–2987 (2011).
37. Stott, K., Tang, G.S., Lee, K.B. & Thomas, J.O. Structure of a complex of tandem HMG boxes and DNA. J. Mol. Biol. 360, 90–104 (2006).
38. Coll, M., Frederick, C.A., Wang, A.H. & Rich, A. A bifurcated hydrogen-bonded conformation in the d(A.T) base pairs of the DNA dodecamer d(CGCAAATTTGCG) and its complex with distamycin. Proc. Natl. Acad. Sci. USA 84, 8385–8389 (1987).
39. Williams, D.C. Jr., Cai, M. & Clore, G.M. Molecular basis for synergistic transcriptional activation by Oct1 and Sox2 revealed from the solution structure of the 42-kDa Oct1.Sox2.Hoxb1-DNA ternary transcription factor complex. J. Biol. Chem. 279, 1449–1457 (2004).
40. Klass, J. et al. The role of intercalating residues in chromosomal high-mobility-group protein DNA binding, bending and specificity. Nucleic Acids Res. 31, 2852–2864 (2003).
41. Love, J.J. et al. Structural basis for DNA bending by the architectural transcription factor LEF-1. Nature 376, 791–795 (1995).
42. Murphy, E.C., Zhurkin, V.B., Louis, J.M., Cornilescu, G. & Clore, G.M. Structural basis for SRY-dependent 46-X,Y sex reversal: modulation of DNA bending by a naturally occurring point mutation. J. Mol. Biol. 312, 481–499 (2001).
43. Sumitani, M., Kasashima, K., Matsugi, J. & Endo, H. Biochemical properties of Caenorhabditis elegans HMG-5, a regulator of mitochondrial DNA. J. Biochem. 149, 581–589 (2011).
44. Bernadó, P. Effect of interdomain dynamics on the structure determination of modular proteins by small-angle scattering. Eur. Biophys. J. 39, 769–780 (2010).
45. Sugimura, S. & Crothers, D.M. Stepwise binding and bending of DNA by Escherichia coli integration host factor. Proc. Natl. Acad. Sci. USA 103, 18510–18514 (2006).
46. Ohgaki, K. et al. The C-terminal tail of mitochondrial transcription factor A markedly strengthens its general binding to DNA. J. Biochem. 141, 201–211 (2007).
47. Gomis-Rüth, F.X. & Coll, M. Cut and move: protein machinery for DNA processing in bacterial conjugation. Curr. Opin. Struct. Biol. 16, 744–752 (2006).
48. Hyvärinen, A.K., Pohjoismäki, J.L., Holt, I.J. & Jacobs, H.T. Overexpression of MTERFD1 or MTERFD3 impairs the completion of mitochondrial DNA replication. Mol. Biol. Rep. 38, 1321–1328 (2011).

49. Konarev, P.V., Volkov, V.V., Sokolova, A.V., Koch, M.H.J. & Svergun, D.I. PRIMUS: a Windows PC-based system for small-angle scattering data analysis. J. Appl. Crystallogr. 36, 1277–1282 (2003).
50. Bernadó, P., Mylonas, E., Petoukhov, M.V., Blackledge, M. & Svergun, D.I. Structural characterization of flexible proteins using small-angle X-ray scattering. J. Am. Chem. Soc. 129, 5656–5664 (2007).
51. Svergun, D.I., Barberato, C. & Koch, M.H.J. CRYSOL—a program to evaluate X-ray solution scattering of biological macromolecules from atomic coordinates. J. Appl. Crystallogr. 28, 768–773 (1995).
52. Kabsch, W. (ed.). Chapter 25.2.9: XDS 730–734 (Kluwer Academic Publishers (for The International Union of Crystallography), 2001).
53. CCP4. The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D Biol. Crystallogr. 50, 760–763 (1994).
54. Sheldrick, G.M. A short history of SHELX. Acta Crystallogr. A 64, 112–122 (2008).
55. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr. 60, 2126–2132 (2004).
56. Lu, X.J. & Olson, W.K. 3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures. Nat. Protoc. 3, 1213–1227 (2008).
57. Gouet, P., Courcelle, E., Stuart, D.I. & Metoz, F. ESPript: analysis of multiple sequence alignments in PostScript. Bioinformatics 15, 305–308 (1999).

ONLINE METHODS
Protein production and complex preparation. Human mature TFAM (residues 43–246; UniProt Q00059, Supplementary Fig. 1d) was cloned into pET28b (Novagen) and produced in Escherichia coli BL21(DE3) pLysS cells, which were lysed in buffer A (750 mM NaCl, 20 mM imidazole, 50 mM HEPES, 1 mM DTT, pH 7.5). The protein was purified by Ni-NTA-affinity chromatography (HisTrap HP; GE Healthcare) in buffer A, with gradient elution up to 500 mM imidazole, and by size exclusion chromatography (HiLoad 26/60 Superdex 75 pg, GE Healthcare) in buffer A without imidazole. A selenomethionine (SeMet) (Sigma-Aldrich) protein derivative (SeMet-TFAM) was produced by adding SeMet and the other 19 amino acids to minimal medium, and purified as the native protein. Complexes of native TFAM and SeMet-TFAM were obtained with complementary blunt-end oligonucleotides (Biomers) of 22 bp (LSP-22; 5′-TAACAGTCACCCCCCAACTAAC-3′). An additional complex was obtained by modifying LSP-22 with T19 replaced with bromouracil (TFAM–LSP-22Br). dsDNA samples incubated with increasing amounts of TFAM were subjected to electrophoretic mobility shift assay by native 5% (v/v) PAGE (Supplementary Fig. 1a).

SAXS measurements. SAXS data of TFAM at 2.3, 5.0 and 9.7 mg ml⁻¹ were measured at beamline X33 of the Deutsches Elektronensynchrotron (DESY, Hamburg). The curves were merged and analyzed with PRIMUS (ref. 49). The forward scattering (I(0)) and the radius of gyration (Rg) were evaluated with Guinier’s approximation. The maximum particle dimension (Dmax) and the distance distribution function (p(r)) were calculated from the scattering pattern. The molecular weight of the protein was estimated by comparison of I(0) with that of a reference solution of BSA. The conformational variability of TFAM was characterized with EOM (ref. 50) using the crystal structure HMG domains as rigid bodies, whereas linker and C-terminal tail adopted random conformations. For the EOM analysis, standard parameters for the genetic algorithm and for CRYSOL (ref. 51) were used.

Structure analysis. Orthorhombic (P2₁2₁2) crystals with two complexes per asymmetric unit were obtained by vapor diffusion at 20 °C for TFAM–LSP-22Br (resolution 2.4 Å) and SeMet-TFAM–LSP-22 (resolution 2.7 Å) with 37.5% (v/v) PEG1000, 300 mM sodium chloride, 0.1 M Na,K-phosphate, pH 6.2 as reservoir solution. Diffraction data for SeMet-TFAM–LSP-22 were collected at the Se-absorption peak at beamline ID14-4 of the European Synchrotron Radiation Facility (ESRF, Grenoble, France) and treated with XDS (ref. 52) and SCALA (ref. 53). Diffraction data of TFAM–LSP-22Br were measured at ESRF ID23-EH2 at a high-remote energy for bromine (wavelength 0.87260 Å). The structure of SeMet-TFAM–LSP-22 was solved by single-wavelength anomalous diffraction with SHELXD and SHELXE (ref. 54) using anisotropy-corrected data. Electron density modification under two-fold averaging generated an electron density map that enabled tracing of almost the complete asymmetric unit, with Coot (ref. 55). An anomalous Fourier synthesis with the TFAM–LSP-22Br data phased with the partial SeMet-TFAM–LSP-22 model showed an anomalous peak (>5σ) at the expected positions for bromine, confirming the DNA sequence (Supplementary Fig. 1b). As these crystals diffracted best, the model was refined against these data with BUSTER (Global Phasing) and completed (see Table 1). The final structure includes residues Ser44–Lys237 (chain A) and Ser44–Arg233 (chain B), and the full sequence for LSP-22 DNA molecules (T1–C22, chains C and E and G1–A22, chains D and F). Final superposition of the coordinates of the SeMet-TFAM–LSP-22 and TFAM–LSP-22Br complexes revealed that bromine did not introduce substantial differences into the overall structure. Accordingly, the Results and Discussion sections refer to the latter structure, which, based on MolProbity (http://molprobity.biochem.duke.edu/) shows 376 residues in favored Ramachandran regions and all residues (380) in allowed regions.

Miscellaneous. DNA parameters were analyzed with 3DNA (ref. 56) and CURVES (http://www.ibpc.fr/UPR9080/Curonline.html). Figures were generated with PyMOL (DeLano Scientific), ESPript (ref. 57) and 3DDART (http://haddock.chem.uu.nl/services/3DDART/); sequence alignments were generated with ClustalW (http://www.ebi.ac.uk/Tools/msa/clustalw2/).
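The Guinier evaluation of I(0) and Rg mentioned under SAXS measurements can be illustrated with a minimal sketch. It assumes the standard Guinier approximation, ln I(s) ≈ ln I(0) − (Rg² s²)/3, which holds only at small s (commonly taken as s·Rg below about 1.3, and often less for flexible particles), and fits a straight line to ln I versus s². This is not the PRIMUS implementation; the function name and the synthetic curve are hypothetical, chosen so that the fit should recover Rg = 32 Å.

```python
import math


def guinier_fit(s_values, intensities):
    """Fit ln I = ln I0 - (Rg^2 / 3) * s^2 by ordinary least squares.

    Returns (Rg, I0). Only points inside the Guinier range should be passed.
    """
    xs = [s * s for s in s_values]
    ys = [math.log(i) for i in intensities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope of the Guinier plot is -Rg^2 / 3.
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free curve (hypothetical values), with s in inverse angstroms.
true_rg, true_i0 = 32.0, 100.0
s_grid = [0.005 + 0.002 * k for k in range(15)]  # largest s * Rg is about 1.06
curve = [true_i0 * math.exp(-((true_rg * s) ** 2) / 3.0) for s in s_grid]

rg_fit, i0_fit = guinier_fit(s_grid, curve)
print(f"Rg = {rg_fit:.2f} A, I(0) = {i0_fit:.2f}")
```

With real data the fitted range matters: including points beyond the Guinier regime biases Rg, so the upper s limit is usually adjusted until the residuals of the linear fit show no systematic curvature.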

nature structural & molecular biology doi:10.1038/nsmb.2160