doi:10.1016/S0022-2836(03)00473-X J. Mol. Biol. (2003) 330, 503–511

The Crystal Structure of AF1521 a from Archaeoglobus fulgidus with Homology to the Non-histone Domain of MacroH2A

Mark D. Allen, Ashley M. Buckle, Suzanne C. Cordell, Jan Lo¨ we and Mark Bycroft*

MRC Centre for Protein MacroH2A is an unusual histone H2A variant that has an extensive Engineering and MRC C-terminal tail that comprises approximately two thirds of the protein. Laboratory of Molecular The C-terminal non-histone domain of macroH2A is also found in a Biology, Hills Road, Cambridge number of other and has been termed the macro domain. Here CB2 2QH, UK we report the crystal structure to 1.7 A˚ of AF1521, a protein consisting of a stand-alone macro domain from Archaeoglobus fulgidus. The structure has a mixed a/b fold that closely resembles the N-terminal DNA binding domain of the Escherichia coli leucine aminopeptidase PepA. The structure also shows some similarity to members of the P-loop family of nucleotide hydrolases. q 2003 Elsevier Science Ltd. All rights reserved Keywords: histone macroH2A; crystal structure; P-loop nucleotide *Corresponding author hydrolases; RNA binding; X-chromosome inactivation

Introduction histone.4 It has an amino-terminal domain very similar in sequence to histone H2A followed by a The basic component of is the nucleo- large C-terminal non-histone domain which some, a complex of approximately 150 base-pairs accounts for about two thirds of the protein. The of DNA wrapped around an octomer of the core contains two genes that code for histones H2A, H2B H3 and H4.1 The core histones macroH2A histones. The MACROH2A1 gene contain a central histone fold domain, which is encodes two subtypes MACROH2A1.1 and flanked by N- and C-terminal tails. Histone tails MACROH2A1.2 produced by alternative splicing. are the targets of several forms of covalent modi- A second gene codes for MACROH2A2.5,6 These fication including acetylation, methylation, ubiqui- proteins appear to have a role in transcriptional tination, ADP-ribosylation and phosphorylation. silencing and are enriched in the chromatin of These modifications play a central role in regulat- inactive X-chromsomes.7,8 In female one ing gene expression.2 They can influence nucleo- of the X-chromosomes is transcriptionally some stability and thereby affect the accessibility inactivated to equalise the expression of X-linked of DNA to the transcriptional machinery and can genes between XY males and XX females.9 The also provide docking sites for regulatory proteins. inactivated X-chromosome has several features In addition to the basic core histones a number of associated with gene silencing. Its DNA is hyper- histone variants have been identified. These methylated within CpG islands and its chromatin proteins have a central domain similar to the core is condensed and characterised by hypoacetylation histones but vary at their N and C termini. Their of histones H3 and H4 and hypermethylation of incorporation into nucleosomes offers an alterna- histone H3. The inactivation process can be separ- tive means of modulating chromatin structure and ated into several distinct stages. The initiation and function and variant histones have been implicated establishment of inactivation requires the in the regulation of processes such as transcription expression of a large non-coding RNA Xist, which and DNA repair.3 MacroH2A is the largest variant coats, in cis, the chromosome that is to be inactivated.10,11 Once inactivation has been estab- Abbreviations used: Appr-100p, ADP-ribose 100- lished Xist is not required for the maintenance of phosphate. silencing. Xist is believed to function by recruiting E-mail address of the corresponding author: proteins to the chromosome that modify [email protected] chromatin.12 The role of macroH2A histones in

0022-2836/03/$ - see front matter q 2003 Elsevier Science Ltd. All rights reserved 504 Structure of a Macro Domain

X-inactivation is unclear. Their incorporation into NADþ dependent reaction that removes the 20 the chromatin of the inactivated X-chromosome is phosphate group from the 30,50-phosphodiester-20- a late event that depends on Xist accumulation13 phosphomonoester linkage that is produced and it seems that these proteins play some role in during tRNA splicing in yeast.18 Whether this the establishment of the inactivated state. activity is biologically significant and what The C-terminal non-histone domain of relevance it has to the function of macro domains macroH2A is also found alone or in multiple in other proteins has not been established. This copies in a number of otherwise unrelated result does, however, suggest that this family of proteins14 and has been termed the macro domain15 proteins may act as ADP-ribose phosphoesterases. (Figure 1). One of these proteins, BAL, has been To further our understanding of this family of shown to be a highly expressed risk factor in some proteins we have determined the structure of aggressive lymphomas.16 Macro domains are also AF1521, a macro domain containing protein from found in the non-structural proteins of several the thermophile Archaeoglobus fulgidus. The struc- types of ssRNA viruses including rubella virus ture reveals homology to the P-loop family of and hepatitis E virus. The presence of the macro nucleotide hydrolases. The AF1521 structure is domain in macroH2A histones and in proteins con- also very similar to the N-terminal domain of taining DNA and RNA binding domains would Escherichia coli PepA, a domain known to bind to suggest a role in nucleic acid recognition. In other DNA demonstrating that proteins with this fold proteins, however, it is linked to domains that are are also capable of interacting with nucleic acids. known to have a role in protein–protein (WWE) or protein–lipid (Sec14) interaction. The macro domain is also found on its own in a Results and Discussion family of largely uncharacterised proteins from bacteria, and . The wide distri- Overall structure bution of this protein family suggests that it is involved in an important and ubiquitous cellular The structure of A. fulgidus AF1521 was solved process. The only functional information available using phases derived from a three-wavelength on this group of proteins has come from a genome MAD data set collected from a single crystal of wide screen of yeast proteins for biochemical seleno-methionine-substituted protein (Table 1). activities related to tRNA splicing.17 In this study The structure of the seleno-methionine derivative YBR022Wp, a protein consisting of an isolated was then used to determine the structure of the macro domain, was shown to have ADP-ribose non-substituted protein. The final model of the 100-phosphate (Appr-100p) processing activity. native protein containing all 192 residues of Appr-100p is produced as a by product of the AF1521 was refined against 1.7 A˚ resolution data

Figure 1. A schematic representation of proteins containing macro domains. H2A, histone H2A domain; DUF144, a domain described in the database41 involved in phage tail assembly; SNF2, SNF2 helicase like domain; AAA, AAA ATPase domain; SEC14, lipid-binding domain found in SEC14p and other proteins; HLH, helix-loop-helix DNA binding domain; ZF, zinc finger; RRM, RNA recognition motif; WWE, a protein–protein interaction domain con- taining conserved tryptophan and glutamate residues; PARP, poly-ADP-ribose polymerase domain. This domain was detected using the program Superfamily.42 The Figure was adapted from the SMART database.43 Structure of a Macro Domain 505

Table 1. Summary of statistics for data collection Peak Inflection High-energy remote Native

Space group P21212 P21212 P21212 C2 Wavelength (A˚ ) 0.9797 0.9799 0.9393 1.00 Resolution range (A˚ ) 33.5–2.5 33.5–2.5 33.5–2.5 17.1–1.7 Unique reflections 7251 7251 7251 40692 I=sIa 21.0(8.5) 21.6(8.5) 23.6(10.9) 17.6(5.0) b Rm 0.060 0.058 0.055 0.047 Multiplicityc 7.0 6.9 7.0 3.4 The first three columns refer to the Se derivative and the fourth column refers to the native protein. a Signal to noise ratio of intensities, highest resolution bin in brackets. b l l Rm : ShSi Iðh; iÞ 2 IðhÞ =ShSiIðh; iÞ where Iðh; iÞ are symmetry related intensities and IðhÞ is the mean intensity of the reflection with unique index h: c Multiplicity for unique reflections, for MAD datasets I(þ) and I(2) are kept separate. Correlation coefficients of anomalous differences at different wavelengths for the MAD experiment: Peak versus Inflection, 0.55; Peak versus High-energy remote, 0.70; Inflection versus High-energy remote, 0.46.

(Table 2) and has an Rwork of 20.3% ðRfree ¼ 22:7%Þ: and insertions are restricted to the loops between AF1521 is a compact, single-domain protein with a elements of secondary structure and it is likely mixed a/b structure. It contains a single seven- that the majority of macro domains have a struc- stranded mixed b-sheet with strand order 1276354 ture very similar to that of AF1521. It has been and five a-helices (Figure 2). The two strands at noted that the residues at the N terminus of the the edge of the b-sheet are antiparallel to the macro domain of macroH2A1 could form a leucine others. The b-sheet is sandwiched between four zipper.4 In AF1521 these residues form strands 1 helices, three packing onto one face and one onto and 2 of the b-sheet and a 310 helix. Given the the other. The remaining a-helix packs onto the degree of sequence similarity between macroH2A1 edge of the sheet. and AF1521 it is likely that the equivalent regions A structure-based alignment of representative of macroH2A1 adopt a similar structure. macro domains is shown in Figure 3. Several macro domain only proteins are significantly A putative active site shorter than AF1521. Based on this alignment it would appear that some of these proteins lack The AF1521 structure has a deep cleft in the either the first strand of the b-sheet or the protein surface, a feature typical of an enzyme C-terminal a-helix. Both of these elements of active site (Figure 4). AF1521 is a member of a secondary structure are at the periphery of the family of proteins conserved in bacteria, archaea fold and their absence would not be expected to and eukaryotes that have a high degree of produce a substantial alteration in the structure. sequence similarity (Figure 3). Most of the con- Based on the sequence alignment it would also served residues are located within this cleft. Many appear that in some proteins helices 2 and 3 are of them are solvent exposed, suggesting that they shorter than in AF1521 and that most macro are functionally important and this region of the domains lack the 310 helix in the turn between protein is a good candidate for the active site. Part strands 4 and 5. With these exceptions deletions of this cleft is occupied by a molecule of MES buffer. The residues surrounding the buffer molecule are particularly highly conserved Table 2. Refinement statistics suggesting that it is mimicking a natural ligand. Resolution range (A˚ ) 50.0–1.7 These include a glycine rich sequence that forms No. reflections (working/ 38,653/2039 part of the loop between strand 3 and helix 1, two test set) asparagine residues in strand 3 and residues in No. protein residues A, 192; B, 192 No. water molecules 279 the loop between strand 6 and helix 6 (Figure 4). No. MES molecules 2 Interestingly the alternative splicing of the R-factor; R-freea 0.203, 0.227 MACROH2A1 mRNA alters the protein sequence B-averageb (A˚ 2) 21.5 between the end of strand 2 and the start of helix c ˚ Geometry bonds/angles 0.005 A, 1.1858 1(Figure 3). As this region of the protein forms Ramachandrand 93.8%/0.6% (1 residue/chain) PDB IDe 1HJZ part of the putative active site the two isoforms are likely to have significantly different properties. a 5% of reflections were randomly selected for determination of the free R-factor; prior to any refinement. b AF1521 is a divergent member of the P-loop- Temperature factors averaged for all atoms. c RMS deviations from ideal geometry for bond lengths and containing nucleotide triphosphate restraint angles.40 hydrolase superfamily d Percentage of residues in the “most favoured region” of the Ramachandran plot and percentage of outliers (PROCHECK48). e The central portion of AF1521, including the Protein Data Bank identifiers for co-ordinates. putative active site, shows some structural 506 Structure of a Macro Domain

Figure 2. Overall structure of AF1521. The Figure was prepared using the program PyMOL.44

similarity to members of the P-loop-containing b-sheet and a a-helix that typically adopts the superfamily of nucleotide triphosphate hydrolases sequence pattern GxxxGK[ST] sometimes referred including the hexamerisation domain of N-ethyl- to as the Walker A motif.24 In AF1521 the equival- 19 20 maleimide-Sensitive Fusion Protein, RecA, ent loop is between the end of strand 2 and a 310 helicase core domain21 and F1 ATPase.22 Given helix and contains a highly conserved GDIT motif. that a macro domain containing protein has been In AF1521 this region probably interacts with the shown to hydrolyse Appr-1”p the structural simi- diphosphate group of the substrate and the larity to this protein family is of particular interest. differences in structure and sequence in this part The strongest similarity (Dali Z-score 5.2) is with of the protein presumably reflects this change in CobA, an ATP:co(I)rrinoid adenosyltransferase specificity. CobA and other P-loop containing which transfers an adenosyl moiety from MgATP nucleotide triphosphate hydrolases bind to sub- to a variety of co(I)rrinoid substrates23 (Figure 5). strates using a pocket formed by loops and helices Both CobA and AF1521 contain five strands of at the C-terminal end of the b-sheet (Figure 5). parallel b-sheet hydrogen bonded in the order The equivalent part of the AF1521 structure con- 32451. Although the loops and helices connecting tains the highly conserved MES binding pocket them differ these strands can be can be super- suggesting that the buffer molecule is occupying a imposed with a root mean squared deviation of substrate-. The group of conserved 1.3 A˚ for 30 pairs of aligned Ca atoms. The P-loop- residues at the end of strand 5 in AF1521 is in the containing nucleotide triphosphate hydrolases are same position as the Walker B motif of the P-loop an important and well-studied family of proteins containing nucleotide triphosphate hydrolases. and the residues involved in ligand binding and This sequence motif is involved in catalysis and it catalysis have been characterized in detail. The is likely that the corresponding residues in the active site in these proteins is in the same part of AF1521 family have a similar role. Cleavage of the the structure as the putative active site identified phosphodiester bond of the substrate therefore for AF1521. All of the blocks of amino acids con- probably occurs at the junction between the MES served within the AF5121 family correspond to binding pocket and the rest of the active site cleft. residues that are known to be functionally important in the P-loop-containing nucleotide AF1521 has the same fold as the DNA binding triphosphate hydrolases. The combination of struc- domain of PepA tural similarity and correspondence of active sites taken in aggregate strongly suggest that AF1521 is When the AF1521 structure was searched against evolutionarily related to the P-loop-containing known structures using the program Dali25 the nucleotide triphosphate hydrolases. The available strongest structural similarity detected was to the biochemical data suggest that macro domains N-terminal domain of leucine aminopeptidase hydrolyse ADP-ribose derivatives rather than ðZ ¼ 10:2Þ:26 Leucine aminopeptidases catalyse the nucleotide triphosphates. The P-loop containing hydrolysis of amino acids from the amino terminus nucleotide triphosphate hydrolases all interact of polypeptide chains and are widely distributed with the triphosphate moiety of their substrates in plants, animals and bacteria. The AF1521 struc- using a P-loop situated between a strand of parallel ture is most similar to the N-terminal domain of Structure of a Macro Domain 507

Figure 3. Structure based sequence alignment of selected macro domains. Residues conserved in the AF1521 family are highlighted. The protein name is followed by the species abbreviation. The limits of the domain are given in parenthesis to the right of the alignment followed by the gene bank GI identification number. The residues altered by the alternative splicing of the MACROH2A1 gene are underlined. YMDB_ECOLI and LRP 16 are AF1521 orthologues from E. coli and mouse. Q96F23 is a human protein that is a member of a second family of macro domain only proteins found in eukaryotes. GDAP245 is a human protein with a N-terminal macro domain and a C-terminal domain similar in sequence to the lipid-binding domain of the S. cerevisiae protein Sec14p.46 Similar proteins are found in Drosophila melanogaster and Arabidopsis thaliana. Q9CXF7 is a mouse protein with a N-terminal SNF2 like helicase domain and a C-terminal macro domain. Related proteins are found in Homo sapiens and in a variety of plants.47 The other proteins are referred to in the text.

E. coli PepA aminopeptidase.27 Although there is logous family in the SCOP database.28 The amino- no significant sequence similarity between the two peptidase active site is located entirely within the proteins they can be superimposed with a root C-terminal domain. PepA also functions as a mean squared deviation of 1.6 A˚ for 106 pairs of DNA-binding protein in Xer site-specific recombi- aligned Ca atoms (Figure 6). This degree of struc- nation and in transcriptional control of the carAB tural similarity indicates that these proteins are operon. Mutagenesis studies have shown that the descended from a common ancestor and would be N-terminal domain is mainly responsible for DNA classified as being members of the same homo- binding.29 It seems likely that the N-terminal 508 Structure of a Macro Domain

Figure 4. (a) Surface represen- tation of the structure of AF1521 showing the putative active site. The residues conserved in related proteins are coloured in blue. The bound MES buffer molecule is rep- resented by spheres. (b) The same view with the protein molecule shown as a ribbon. The Figure was prepared using the program PyMOL.44

Figure 5. Comparison of the structure of AF1521 and CobA. The ATP and hydroxycobalamin mol- ecules bound to CobA are coloured blue and yellow, respectively. The residues that interact with the phos- phate groups of ATP in CobA, and those that are proposed to interact with the phosphate groups of nucleotide substrates in AF1521 are coloured red. The Figure was pre- pared using the program PyMOL.44

Figure 6. Structural homology of AF1521 and the N-terminal domain of PepA. Stereoview of the superimposed folds of AF1521 and the N-terminal domain of PepA. The regions that can be superimposed within 1.6 A˚ are coloured red. domain of PepA is derived from a fusion between the AF1521 family, particularly for residues that a protein with peptidase activity and a macro are likely to be involved in catalysis or substrate domain containing protein that subsequently binding, suggests that these proteins are all ortho- evolved a DNA binding activity. logous that carry out the same ubiquitous cellular function. Sequence variation in macro domains The sequences of the macro domains of eukaryotic and viral proteins, however, diverge The high degree of sequence similarity within significantly from that of AF1521 particularly in Structure of a Macro Domain 509 the putative active site suggesting that they do not been established this is a possibility worth retain the function of the parent protein family exploring. (Figure 3). This sequence variation may reflect a change in substrate specificity, as the residues that are probably involved in substrate binding are Conclusions especially variable. In mammalian proteins the macro domain is often found in association with Both structural and biochemical evidence now poly-ADP-ribose polymerase domains (Figure 1) strongly suggest that AF1521 and related proteins and it is possible that the macro domains in these are divergent members of the P-loop containing proteins are regulating protein–ADP superfamily of phosphoesterases that act on ADP ribosylation.15 This post-translational modification ribose derivatives. The structural similarity to a is catalyzed by ADP-ribosyl transferases that attach known DNA binding domain demonstrates that ADP-ribose to target proteins using NADþ as a this fold is also capable of interacting with nucleic precursor. Protein–ADP ribosylation has been acids. The ability of this scaffold to fulfil both a shown to play a role in DNA repair and replica- catalytic and binding role may explain the wide tion, modulation of chromatin structure, and distribution of the macro domain. The structure .30 The sequences of some macro described here can now be used to design experi- domains, however, vary so much from that of the ments to help to further elucidate the role of this AF1521 family that it is not clear if they still retain important module. catalytic activity. It is possible that in these pro- teins, as is the case with the PepA N-terminal domain, the macro domain functions as a nucleic Materials and Methods acid binding module. The gene coding for AF1521 protein was amplified Implications for the role of macroH2A histones from A. fulgidus DNA using PCR amplification and in chromosome silencing cloned into a pRSETA (Invitrogen) derivative that enables proteins to be expressed with a thrombin The recruitment of macroH2A to the inactivated cleavable N-terminal hexa-histidine tag. The plasmid 34 X-chromosome is dependent on the accumulation was transformed into C41(DE3) E. coli cells. Cells were of the non-coding RNA Xist and it has been grown at 37 8C in L-broth to mid log phase and induced suggested that there may be a direct interaction with 1 mM IPTG. The temperature was then reduced to 25 8C and the cells were grown for a further 16 hours. between the protein and the RNA mediated by 14 Cells were lysed by sonication and the fusion protein the histone’s C-terminal macro domain. The find- was purified using a Ni-NTA Superflow affinity column. ing that a related protein can bind to nucleic acids Following cleavage with thrombin (four hours at 30 8C) supports this proposal. The role of macroH2A AF1521 was purified by gel filtration using a Superdex could simply be to consolidate the binding of Xist 75 HR column (Amersham Pharmacia) equilibrated in to the inactive X-chromosome. Alternatively the 50 mM potassium phosphate buffer (pH 8.0), 50 mM macro domain could also be acting as a DNA bind- NaCl. The protein eluted from this column as a mono- ing domain and may alter chromatin structure by mer. The seleno-methionine-substituted protein was pre- interacting with linker DNA performing a role pared in an identical manner except that the cells were analogous to that of the N-terminal domain of grown in M9 minimal media supplemented with seleno-methionine. PepA. It is also possible that the macro domain of macroH2A is involved in the chemical modifi- cation of chromatin. There is growing evidence Crystallization and data collection that poly-ADP-ribosylation has a role in transcrip- tional regulation via the modulation of chromatin Native crystals were grown using the sitting drop structure.31 Poly-ADP-ribosylation of linker vapour diffusion technique using 18% (w/v) PEG 5000 MME, 0.1 M MES pH 6.5, 0.2 M ammonium sulphate, histones has, for example, been shown to induce and 25 mM CdCl2 as the crystallization solution. Drops chromatin decondensation and protect CpG composed of 5 ml of protein at 10 mg/ml and 5 mlof dinucleotides from full methylation in genomic crystallization solution were equilibrated for a minimum DNA.32 The macro domain of macroH2A could of three days at 17 8C. SeMet-substituted crystals were therefore be acting as an ADP-ribose phospho- grown in the same manner as for the native protein but esterase. Poly-ADP-ribosylation is normally with 18% PEG 5000 MME, 0.1 M MES pH 6.5, 0.2 M reversible. The enzyme poly-ADP-ribose glyco- ammonium sulphate, and 25 mM MnCl2 for the crystal- hydrolase can remove poly-ADP-ribose chains lization solution, and additionally 5 mM DTT in the leaving the protein free for future modification.33 drop. It is possible that the macro domain of macroH2A Crystals were flash-cooled in liquid nitrogen in the cleaves poly-ADP-ribose chains in a way that mother liquor with 20% PEG 200. The native protein crystals belong to space group C2 and have cell blocks subsequent modification. This could then dimensions a ¼ 119:93 A ; b ¼ 55:72 A ; c ¼ 62:74 A ; act as an epigenetic marker that helps to maintain b ¼ 115.158. The SeMet-substituted crystals belong to the inactive state even in the absence of the variant space group P21212 and have cell dimensions a ¼ histone. Although no link between X-chromosome 62:85 A ; b ¼ 56:1 A ; c ¼ 56:1 A : The native dataset was inactivation and poly-ADP-ribosylation has yet collected at beamline 14-112, Daresbury, UK and the 510 Structure of a Macro Domain

MAD datasets were collected at ID14-4 ESRF, Grenoble. & Jaenisch, R. (1999). Conditional deletion of Xist X-ray diffraction data were indexed and integrated disrupts histone macroH2A localization but not using the MOSFLM package35 and were further maintenance of X inactivation. Nature Genet. 22, processed using the CCP4 package. 323–324. 14. Pehrson, J. R. & Fuji, R. N. (1998). Evolutionary con- Structural determination and refinement servation of histone macroH2A subtypes and domains. Nucl. Acids Res. 26, 2837–2842. An initial 2.5 A˚ MAD density map was generated by 15. Aravind, L. (2001). The WWE domain: a common locating five selenium sites in the data sets Peak, Inflec- interaction module in protein ubiquitination and tion, High-energy remote using the program SOLVE,36 ADP ribosylation. Trends Biochem. Sci. 26, 273–275. which was also used to calculate phases. RESOLVE37 16. Aguiar, R. C. T., Yakushijin, Y., Kharbanda, S., Salgia, was used for solvent flattening, assuming a 48% solvent R., Fletcher, J. A. & Shipp, M. A. (2000). BAL is a content. The structure was built using MAIN38 and novel risk-related gene in diffuse large B-cell refined using CNS39 with Engh and Huber stereo- lymphomas that enhances cellular migration. Blood, chemical parameters.40 CNS was used for molecular 96, 4328–4334. replacement to determine the structure of the native 17. Martzen, M. R., McCraith, S. M., Spinelli, S. L., protein in the C2 crystals. Details of the final model are Torres, F. M., Fields, S., Grayhack, E. J. & Phizicky, summarized in Table 2. E. M. (1999). A biochemical genomics approach for identifying genes by the activity of their products. Science, 286, 1153–1155. 18. Culver, G. M., McCraith, S. M., Zillmann, M., References Kierzek, R., Michaud, N., Lareau, R. D. et al. (1993). An NAD derivative produced during transfer-RNA 1. Luger, K., Mader, A. W., Richmond, R. K., Sargent, 00 00 D. F. & Richmond, T. J. (1997). Crystal structure of splicing—ADP-ribose 1 -2 cyclic phosphate. Science, the nucleosome core particle at 2.8 Angstrom 261, 206–208. resolution. Nature, 389, 251–260. 19. Lenzen, C. U., Steinmann, D., Whiteheart, S. W. & 2. Gamble, M. J. & Freedman, L. P. (2002). A coactivator Weis, W. I. (1998). Crystal structure of the hexameri- code for transcription. Trends Biochem. Sci. 27, zation domain of N-ethylmaleimide-sensitive fusion 165–167. protein. Cell, 94, 525–536. 3. Ausio, J., Abbott, D. W., Wang, X. Y. & Moore, S. C. 20. Story, R. M. & Steitz, T. A. (1992). Structure of the (2001). Histone variants and histone modifications: a RecA protein–ADP complex. Nature, 355, 374–376. structural perspective. Biochem. Cell Biol. 79, 693–708. 21. Story, R. M., Li, H. & Abelson, J. N. (2001). Crystal 4. Pehrson, J. R. & Fried, V. A. (1992). MacroH2A, a core structure of a DEAD box protein from the hyper- histone containing a large nonhistone region. Science, thermophile Methanococcus jannaschii. Proc. Natl 257, 1398–1400. Acad. Sci. USA, 98, 1465–1470. 5. Costanzi, C. & Pehrson, J. R. (2001). MACROH2A2, a 22. Abrahams, J. P., Leslie, A. G. W., Lutter, R. & Walker, new member of the MACROH2A core histone J. E. (1994). Structure at 2.8-Angstrom resolution of family. J. Biol. Chem. 276, 21776–21784. F1-ATPase from bovine heart-mitochondria. Nature, 6. Chadwick, B. P. & Willard, H. F. (2001). Histone H2A 370, 621–628. variants and the inactive X chromosome: identi- 23. Bauer, C. B., Fonseca, M. V., Holden, H. M., Thoden, fication of a second macroH2A variant. Hum. Mol. J. B., Thompson, T. B., Escalante-Semerena, J. C. & Genet. 10, 1101–1113. Rayment, I. (2001). Three-dimensional structure of 7. Costanzi, C. & Pehrson, J. R. (1998). Histone ATP: corrinoid adenosyltransferase from Salmonella macroH2A1 is concentrated in the inactive X typhimurium in its free state, complexed with chromosome of female mammals. Nature, 393, MgATP, or complexed with hydroxycobalamin and 599–601. MgATP. Biochemistry, 40, 361–374. 8. Perche, P. Y., Vourc’h, G., Konecny, L., Souchier, C., 24. Walker, J. E., Saraste, M., Runswick, M. J. & Gay, N. J. Robert-Nicoud, M., Dimitrov, S. & Khochbin, S. (1982). Distantly related sequences in the alpha-sub- (2000). Higher concentrations of histone macroH2A units and beta-subunits of ATP synthase, myosin, in the Barr body are correlated with higher nucleo- kinases and other ATP-requiring enzymes and a some density. Curr. Biol. 10, 1531–1534. common nucleotide binding fold. EMBO J. 1, 9. Boumil, R. M. & Lee, J. T. (2001). Forty years of 945–951. decoding the silence in X-chromosome inactivation. 25. Holm, L. & Sander, C. (1995). Dali—a network tool Hum. Mol. Genet. 10, 2225–2232. for protein-structure comparison. Trends Biochem. 10. Brockdorff, N., Ashworth, A., Kay, G. F., McCabe, Sci. 20, 478–480. V. M., Norris, D. P., Cooper, P. J. et al. (1992). The 26. Burley, S. K., David, P. R., Taylor, A. & Lipscomb, product of the mouse Xist gene is a 15 Kb inactive W. N. (1990). Molecular-structure of leucine amino- X-specific transcript containing no conserved ORF peptidase at 2.7-A˚ resolution. Proc. Natl Acad. Sci. and located in the nucleus. Cell, 71, 515–526. USA, 87, 6878–6882. 11. Brown, C. J., Hendrich, B. D., Rupert, J. L., 27. Strater, N., Sherratt, D. J. & Colloms, S. D. (1999). Lafreniere, R. G., Xing, Y., Lawrence, J. & Willard, X-ray structure of aminopeptidase A from Escherichia H. F. (1992). The human Xist gene—analysis of a coli and a model for the nucleoprotein complex in 17 Kb inactive X-specific RNA that contains con- Xer site-specific recombination. EMBO J. 18, served repeats and is highly localized within the 4513–4522. nucleus. Cell, 71, 527–542. 28. Lo Conte, L., Brenner, S. E., Hubbard, T. J. P., 12. Cohen, D. E. & Lee, J. T. (2002). X-chromosome Chothia, C. & Murzin, A. G. (2002). SCOP database inactivation and the search for chromosome-wide in 2002: refinements accommodate structural silencers. Curr. Opin. Genet. Dev. 12, 219–224. genomics. Nucl. Acids Res. 30, 264–267. 13. Csankovszki, G., Panning, B., Bates, B., Pehrson, J. R. 29. Charlier, D., Kholti, A., Huysveld, N., Gigot, D., Structure of a Macro Domain 511

Maes, D., Thia-Toong, T. L. & Glansdorff, N. (2000). 39. Brunger, A. T., Adams, P. D., Clore, G. M., DeLano, Mutational analysis of Escherichia coli PepA, a multi- W. L., Gros, P., Grosse-Kunstleve, R. W. et al. (1998). functional DNA-binding aminopeptidase. J. Mol. Crystallography & NMR system: a new software Biol. 302, 411–426. suite for macromolecular structure determination. 30. D’Amours, D., Desnoyers, S., D’Silva, I. & Poirier, Acta Crystallog. sect. D, 54, 905–921. G. G. (1999). Poly(ADP-ribosyl)ation reactions in the 40. Engh, R. A. & Huber, R. (1991). Accurate bond and regulation of nuclear functions. Biochem. J. 342, angle parameters for X-ray protein-structure refine- 249–268. ment. Acta Crystallog. sect. A, 47, 392–400. 31. Tulin, A., Stewart, D. & Spradling, A. C. (2002). The 41. Bateman, A., Birney, E., Cerruti, L., Durbin, R., Drosophila heterochromatic gene encoding poly- Etwiller, L., Eddy, S. R. et al. (2002). The Pfam protein (ADP-ribose) polymerase (PARP) is required to families database. Nucl. Acids Res. 30, 276–280. modulate chromatin structure during development. 42. Gough, J., Karplus, K., Hughey, R. & Chothia, C. Genes Dev. 16, 2108–2119. (2001). Assignment of homology to genome 32. De Capoa, A., Febbo, F. R., Giovannelli, F., Niveleau, sequences using a library of Hidden Markov Models A., Zardo, G., Marenzi, S. & Caiafa, P. (1999). that represent all proteins of known structure. J. Mol. Reduced levels of poly(ADP-ribosyl)ation result in Biol. 313, 903–919. chromatin compaction and hypermethylation as 43. Letunic, I., Goodstadt, L., Dickens, N. J., Doerks, T., shown by cell-by-cell computer-assisted quantitative Schultz, J., Mott, R. et al. (2002). Recent improve- analysis. FASEB J. 13, 89–93. ments to the SMART domain-based sequence anno- 33. Davidovic, L., Vodenicharov, M., Affar, E. B. & tation resource. Nucl. Acids Res. 30, 242–244. Poirier, G. G. (2001). Importance of poly(ADP-ribose) 44. DeLano, W. L. (2002). The PyMOL User’s Manual, glycohydrolase in the control of poly(ADP-ribose) DeLano Scientific, San Carlos. metabolism. Expt. Cell Res. 268, 7–13. 45. Liu, H., Nakagawa, T., Kanematsu, T., Uchida, T. & 34. Miroux, B. & Walker, J. E. (1996). Over-production of Tsuji, S. (1999). Isolation of 10 differentially proteins in Escherichia coli: mutant hosts that allow expressed cDNAs in differentiated Neuro2a cells synthesis of some membrane proteins and globular induced through controlled expression of the GD3 proteins at high levels. J. Mol. Biol. 260, 289–298. synthase gene. J. Neurochem. 72, 1781–1790. 35. Leslie, A. G. W. (1998). MOSFLM, 6.0 edit., 46. Aravind, L., Neuwald, A. F. & Ponting, C. P. (1999). Cambridge, UK. Sec14p-like domains in NF1 and Dbl-like proteins 36. Terwilliger, T. C. & Berendzen, J. (1999). Automated indicate lipid regulation of Ras and Rho signaling. MAD and MIR structure solution. Acta Crystallog. Curr. Biol. 9, 195–197. sect. D, 55, 849–861. 47. Yan, L., Echenique, V., Busso, C., SanMiguel, P., 37. Terwilliger, T. C. (2000). Maximum-likelihood Ramakrishna, W., Bennetzen, J. L. et al. (2002). Cereal density modification. Acta Crystallog. sect. D, 56, genes similar to SW define a new subfamily that 965–972. includes human and mouse genes. Mol. Genet. 38. Turk, D. (1992). Weiterentwicklung eines Programms Genomics, 268, 488–499. fu¨ r Moleku¨ lgrafik und Elektrondichte-Manipulation 48. Laskowski, R. A., Macarthur, M. W., Moss, D. S. & und seine Anwendung auf verschiedene Protein- Thornton, J. M. (1993). PROCHECK—a program to Strukturaufkla¨rungen. PhD Thesis, Technische check the stereochemical quality of protein struc- Universita¨tMu¨ nchen, Mu¨ nchen tures. J. Appl. Crystallog. 26, 283–291.

Edited by R. Huber

(Received 2 December 2002; received in revised form 5 April 2003; accepted 7 April 2003)