ANALYSIS OF GENOMIC INFORMATION
www.sciencemag.org SCIENCE VOL 291 16 FEBRUARY 2001 1279

Apoptotic Molecular Machinery: Vastly Increased Complexity in Vertebrates Revealed by Genome Comparisons

L. Aravind,1 Vishva M. Dixit,2 Eugene V. Koonin1*

1National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA. 2Department of Molecular Oncology, Genentech Inc., 1 DNA Way, South San Francisco, CA 94080, USA.
*To whom correspondence should be addressed. E-mail: [email protected]

A comparison of the proteins encoded in the recently (nearly) completed human genome to those from the fly and nematode genomes reveals a major increase in the complexity of the apoptotic molecular machinery in vertebrates, in terms of both the number of proteins involved and their domain architecture. Several components of the apoptotic system are shared by humans and flies, to the exclusion of nematodes, which seems to support the existence of a coelomate clade in animal evolution. A considerable repertoire of apoptotic domains was detected in Actinomycetes and Cyanobacteria, which suggests a major contribution of horizontal gene transfer to the early evolution of apoptosis.

Comparison of genome sequences (or, more precisely, of the protein sequences encoded in genomes) is a potentially powerful tool for identifying the components of functional systems and reconstructing their evolution. Such comparisons allow researchers to transfer information from well-studied model organisms to poorly characterized ones and to draw functional and evolutionary inferences from the presence, absence, and relative abundance of genes coding for different types of proteins in the compared genomes. Programmed cell death (apoptosis) is one of the central cellular processes in development, the stress response, aging, and disease in multicellular eukaryotes (1). Comparative analysis of the components of the apoptotic machinery has shown that many of the protein domains that perform critical roles in this system were already present in the common ancestor of animals, plants, and fungi (2). From the functions of the extant proteins containing these conserved domains, it can be extrapolated that they participated in ancestral signaling pathways, including those for pathogen and stress responses. The evolution of the programmed cell death system from such signaling pathways was probably driven by general kin selection during the emergence of multicellularity. Here, we briefly discuss the results of a comparative analysis of the nearly complete protein sets of Homo sapiens, Drosophila melanogaster, and Caenorhabditis elegans (3). It is only with the near-completion of the human genome sequence that such a comparison is poised to present an accurate picture of the relationships between the programmed cell death systems in vertebrates and invertebrates, and the results show a strikingly increased complexity of the apoptosis machinery in the former.

The evolutionary engineering of the apoptotic system followed the same pattern as seen in other signal transduction and regulatory systems, particularly in eukaryotes, namely the formation of a wide variety of protein domain architectures from a relatively small set of ancient conserved domains (4). Therefore, we applied a domain-centered approach to the comparative study of this system in animals. First, the occurrences of the individual domains in apoptotic proteins were enumerated as accurately as possible by using a sensitive sequence analysis method based on the information contained in the multiple alignments of the corresponding protein sequences (4). Second, the domain architectures of the apoptotic proteins identified in humans, flies, and nematodes (and, if applicable, other organisms) were systematically compared. A panoply of proteins with functions in almost every basic cellular process has been directly or indirectly linked to apoptosis, which is not too surprising because programmed cell death is a complicated series of events involving various cellular subsystems. Nevertheless, here we restrict the discussion to the central participants of cell death signaling and execution and their homologs that might shed light on the origin and evolution of apoptotic mechanisms. Complete lists of the Gene Identifiers (GI numbers) for all detected components of the apoptotic machinery, including a brief annotation of the domain architecture for each of them, are available at ftp://ncbi.nlm.nih.gov/pub/koonin/PCD.

Examination of the number of occurrences of several domains that perform central functions in apoptosis shows a marked expansion in vertebrates relative to insects and nematodes (Table 1). The growth in the number of these domains detectable in humans was noticed in all functional categories of proteins that contribute to programmed death. It was particularly striking among the extracellular components of the apoptotic system (ligands and receptors), the intracellular adaptor domains that transfer the signal from the receptors to the executors of apoptosis (such as caspases), the BCL2 family of apoptosis regulators, and the NACHT family of nucleoside triphosphatases (NTPases).

In vertebrates, several secreted ligands, primarily members of the tumor necrosis factor (TNF) family, directly induce apoptosis (5). A single, previously undetected member of the TNF family was identified in Drosophila, which suggests that this family was already present before the divergence of the coelomates (6). The TNF family proteins function through specific receptors (TNFRs) that contain multiple repeats of an extracellular cysteine-rich domain and an intracellular Death domain (DD). Predicted receptors with a single copy of the cysteine-rich domain are present in Drosophila, C. elegans, and plants (6). However, none of them has the same architecture as the vertebrate TNFRs, and accordingly these proteins cannot be considered TNFR orthologs (direct evolutionary counterparts).

The transmission of the external cell death stimuli to the executors of apoptosis, such as the caspases, is largely mediated by several specialized adaptor domains. The most prominent apoptotic adaptors are the CARD, DED, pyrin, and Death domains. These share a common fold with six α helices and probably evolved from a common ancestor before the divergence of the extant animal lineages (2, 7). A conspicuous expansion in the number of distinct proteins that contain these domains, particularly the CARD domain, is seen in humans (Table 1). Furthermore, a previously undetected version of the α-helical adaptor module, the pyrin domain, was predicted during the present protein sequence analysis and appears to be vertebrate-specific. The pyrin domain was identified in pyrin, Asc (a CARD-domain protein), the interferon-induced protein 16, the AIM2 protein, a caspase from zebrafish, and some uncharacterized proteins that also contain the NACHT NTPase domain (8).

The BCL2 family proteins are conserved in all animals and have been implicated in alteration of mitochondrial permeability, resulting in leakage of cytochrome c and the triggering of apoptosis (9). The antiapoptotic members of this family interact with the Ced4-like apoptotic adenosine triphosphatases (AP-ATPases) and inhibit their function in caspase activation (10). The proapoptotic family members (for example, BAK) apparently interact antagonistically with the antiapoptotic forms. Consistent with the experimental data, Ced-9 from C. elegans and the poriferan BCL2 homolog cluster with the antiapoptotic members of the family in phylogenetic analyses, whereas the fly BCL2s cluster with the proapoptotic versions (11). Thus, the differentiation between the proapoptotic and antiapoptotic members of the BCL2 family might have been established in the coelomates, whereas the ancestral form apparently functioned only in the antiapoptotic capacity. The vertebrates show a proliferation of both these versions, with extreme sequence divergence observed in several proapoptotic members such as Bid and Mil1 (10).

Another group of proapoptotic proteins has recently attracted considerable attention (12). These are the so-called BH3-only proteins, which share only a region of limited sequence similarity (the BH3 motif) with the BCL2 family proteins. The BH3-only proteins interact with antiapoptotic members of the BCL2 family via an amphipathic helix formed by the BH3 motif and inactivate them. The trend toward diversification in vertebrates seems to hold among the BH3-only proteins, because several proteins of this group have been identified in mammals, as opposed to only one found thus far in C. elegans (EGL-1) (12). However, there is no statistically significant similarity between diverse BH3 proteins, nor is the motif itself prominent enough to allow reliable sequence-based predictions in genome-wide searches. It remains unclear whether all reported occurrences of the BH3 motif are functionally relevant and whether the BH3-only proteins share a common ancestry.

The NACHT NTPases appear to be a sister group of the well-characterized AP-ATPases, such as Apaf1 and Ced4. They have been identified as participants in diverse regulatory interactions, including activation of the transcription factor NFκB and apoptosis regulation (CARD4/NOD1 and NAIP), transcription regulation (CIITA), and telomerase function (TP1) (13). All animals encode a TP1 ortholog that probably regulates the telomerase function and may not be involved in apoptosis (14). In addition, analysis of the human protein set shows a major, previously undetected expansion of the NACHT NTPase family. All newly detected members are closely related to CIITA, NAIP, and CARD4 rather than to TP1. In mice, a locus containing several highly conserved NAIP paralogs affects the survival of the pathogenic bacterium Legionella in macrophages (15). Thus, the additional human NACHT NTPases might function in the regulation of immune response and related apoptotic processes.

Table 1. Domains and proteins involved in apoptosis and related pathways. The number of detected proteins containing each domain is indicated for each organism. H indicates the presence of homologous domains, but not orthologs.

| Protein/domain family | Vertebrates (human) | Arthropods (Drosophila) | Nematodes (C. elegans) | Others |
|---|---|---|---|---|
| Receptors | | | | |
| TNFR | 8 | H | H | H (plants) |
| IL1-like | 8 | 0 | 0 | 0 |
| Toll-like | 10 | 8 | 0 | |
| Ligands | | | | |
| TNF | 17 | 1 | 0 | 0 |
| Cysteine knots | TGF-like: 12; NGF-like: 3 | TGF-like: 3; Spaetzle-like: 3; NGF-like: 1 | 0 | 0 |
| Adaptors (six–α-helix domains) | | | | |
| Death | 30 | 9 | 6 | 0 |
| DED | 7 | 1 | 0 | 0 |
| CARD | 20 | 1 | 2 | 0 |
| Pyrin | 8 | 0 | 0 | 0 |
| Adaptors (other) | | | | |
| TIR | 22 | 10 | 1 | Arabidopsis (~135); Streptomyces (4); 1 each in a number of other bacteria |
| MATH (TRAF-like) | 6 | 3 | 1 | Dictyostelium (3); MATH domains found in all other eukaryotes |
| BCL2 family | 11 | 2 | 1 | |
| Enzymes | | | | |
| Caspases | Classic caspase: 14 (one inactive); paracaspase: 1 | Classic caspase: 7 | Classic caspase: 4; paracaspase: 1 | Arabidopsis (metacaspase: ≥10); yeast, Plasmodium, Leishmania (metacaspase: ≥1); Anabaena (metacaspase: ≥6); one metacaspase in several other bacteria; Dictyostelium (paracaspase: ≥1) |
| A20 | 3 | 1 | 1 | |
| Kinases | IKK: 4; DAP: 1; NIK: 1; IRAK: 4 | IKK: 2; NIK: 1; IRAK: 1 | DAP: 1; IRAK: 1 | H |
| NTPases | | | | |
| AP-ATPase | 1 | 1 | 1 | Arabidopsis (~173); Streptomyces (8); M. tuberculosis (6); 1 each in a number of other prokaryotes |
| NACHT | 18 (NAIP-like: 17; TP1-like: 1) | 2 (TP1-like: 1; a distinct form: 1) | 1 (TP1-like) | Streptomyces (3); Synechocystis (2); Anabaena (6) |
| D-GTPase | 2 | 1 | 2 | Arabidopsis (1) |
| Nuclear factors | | | | |
| NFκB | 5 | 3 | H | H (plants, fungi) |
| NFAT | 6 | 1 | H | H (plants, fungi) |
| P53 | 3 | 1 | 0 | 0 |
| E2F | 8 | 2 | 3 | Arabidopsis (6) |
| DP1 | 5 | 1 | 1 | Arabidopsis (2) |
| STAT | 6 | 1 | 4 | Dictyostelium (1) |
| RB | 3 | 1 | 1 | Arabidopsis (1) |
| CAD | 5 | 4 | 0 | 0 |
| BIR | 8 | 4 | 2 | Yeast (1) |

The increase in the number of proteins containing apoptosis-associated domains is accompanied by diversification of their domain architectures. This includes the emergence of a considerable number of lineage-specific architectures, particularly in vertebrates (Figs. 1 and 2). Together with the numerical expansion, this amounts to a major increase in the complexity of the apoptotic system (Fig. 2). The diversification of domain architectures and the increase in overall complexity are particularly remarkable in the case of the DD-fold adaptor domains. These contribute to many domain combinations unique to the vertebrates, in addition to those formed by the vertebrate-specific offshoot of this fold, the pyrin domain (Figs. 1 and 2). The NACHT NTPases also show diversification
of the domain architectures in vertebrates through the addition of a variety of domains, such as BIR repeats, CARD, and pyrin, to an ancestral core that consists of a NACHT domain and leucine-rich repeats (Fig. 1).

Fig. 1. Domain architectures of apoptotic proteins and their advent in evolution. The evolutionary tree for the major divisions of life is shown under the assumptions of an archaeal-eukaryotic clade (28) and a coelomate clade (see text). Each box shows the domain architectures of proteins that belong to one of two groups. Some are specific to a particular lineage (for example, vertebrates). Others are shared by the two lineages coming out of a given internal node (for example, vertebrates and arthropods) and are therefore inferred to have been present in their common ancestor. The direction of probable horizontal transfer of genes encoding homologs of apoptotic proteins is tentatively shown by an arrow pointing from bacteria to eukaryotes. E.B., early branching (eukaryotes). Domain name abbreviations: De, Death effector domain; D, Death domain; C, Card domain; P, pyrin domain; Ig, immunoglobulin domain; Tig, immunoglobulin domain (as in NFκB); P53F, P53 fold all–β-strand domain; a20, A20-like Zn-finger; OTA20, OTU-A20–like predicted protease domain; S, SAM domain; R, RING finger; Cr, cysteine-rich domain; Math, meprin-associated Traf homology domain; Tir, Toll–interleukin 1 (IL1) domain; ank, ankyrin repeat domain; TSP, thrombospondin domain; wd, WD40 propeller domain; zu5, zona pellucida Unc-5 domain; P84, conserved domain in the human P84 protein; c4, C4 “little” finger domain; CAD, common domain found in CAD and ICAD; UB, ubiquitin domain; UBC-E2, ubiquitin-conjugating E2 enzyme; tnfr, cysteine-rich domain in TNFR; Spry, SplA-ryanodine receptor domain; B, B-box domain; lrr, leucine-rich domain; arm, armadillo repeat; tpr, tetratricopeptide repeat; lsd1, plant hypersensitive response protein LSD1-like Zn-finger domain.

In addition to the expansion and diversification of proteins containing evolutionarily conserved domains, several proteins with no detectable homologs outside the respective lineages have been implicated in apoptosis. These include the proapoptotic protein SMAC (Diablo), which is specific to vertebrates (16). They also include three small Drosophila proteins with a similar hydrophobic NH2-terminal peptide: Reaper, Grim, and Hid (17). These appear to have evolved largely from compositionally biased, nonglobular, or predominantly α-helical proteins through selection of specific peptides for interactions with other proteins.

From the wealth of genomic information, the evolution of the cell death pathways in animals can now be reconstructed in some detail (Fig. 1). As noticed previously, several of the key domains of this system were apparently present in the common ancestor of the eukaryotic crown group (2, 18). These ancient homologs of apoptotic proteins include enzymes such as the caspases [which were probably represented by an ancestral form resembling the extant plant and fungal metacaspases (19)], the predicted A20-like protease (20), AP-ATPase, NACHT NTPase, and the previously undetected apoptotic guanosine triphosphatase (AP-GTPase) (21). They also include adaptors such as TIR, BIR, and MATH, and nuclear factors such as E2F, Rb, and signal transducers and activators of transcription (STATs). Some of these appear to have an ancient function related to mitosis and cell cycle regulation rather than to apoptosis (22). One example is the BIR-domain protein survivin, which probably has been recruited for its antiapoptotic function only in the coelomates. Others, including proteins containing the TIR, caspase, and AP-ATPase domains, possibly interacted to form one or more pathways related to apoptosis and involved in pathogen or stress response. Several of these domains participate in the ubiquitin signaling system: the MATH domain [present in the COOH-terminal ubiquitin hydrolase and the TNFR-associated factors (TRAFs)], the RING finger (the other domain of the TRAFs), and possibly A20-like proteases (20). This suggests that these proteins, as well as the ubiquitin-containing protein kinase IKK, have been recruited for their functions in apoptosis from the ubiquitin-based pathways.

An apoptotic system resembling the core of the extant one appears to have emerged concomitantly with the origin of the metazoans. This event apparently was marked by the rapid divergence of the caspase-paracaspase protease family from a metacaspase-like ancestor, followed by the divergence of classical caspases and paracaspases (19). Another key early event in animal evolution was the emergence of the six–α-helical adaptor domain, which was soon followed by its diversification. The earliest split was probably between the DD and CARD domains, which are the only two domains of this class that apparently are present in all animals (Table 1). The direct apoptotic function of these domains in early animals remains to be ascertained. Even if they originally played a different role, DD and the previously undetected ZU5 domain (23) are present in the netrin receptors (Unc-5) (24) and ankyrins in all animals. This suggests that these domains were already used in cytoskeleton- and receptor-mediated signaling. Other components of the apoptotic system and related molecules that were probably present in the common ancestor of all animals include the BCL2 family proteins, certain adaptors such as TRAFs and Tollip, the A20-like protease, the AP-ATPase, and the IRAK and DAP protein kinases. The conservation of all these apoptotic components in animals suggests that a relatively simple, but (in its main features of execution and regulation) complete, molecular machinery for programmed cell death had evolved before the divergence of the major animal lineages.

Nematodes and arthropods share some orthologs to the exclusion of vertebrates. Vertebrates and arthropods, however, share more orthologs of the apoptotic system components to the exclusion of nematodes, and, notably, more domain architectures (Fig. 1). The group of apoptosis-related proteins specifically shared by vertebrates and arthropods includes the transcription factors NFAT and NFκB, together with the signaling cascade associated with NFκB. These transcription factors apparently have evolved from ancestral immunoglobulin (Ig) domain–containing transcription factors such as OLF-1 (SPT23) or Su(H). The cascade minimally consists of the Toll-like receptors, adaptors (MYD88 and FADD), and protein kinases including NIK (NFκB-inducing kinase) and two paralogs of IKK (Fig. 1). The presence of TNF but the apparent absence of a TNFR in the common ancestor of insects and vertebrates suggests
differences in the upstream portion of this apoptotic pathway. Another group of apoptosis-associated proteins that are shared by vertebrates and arthropods to the exclusion of nematodes is the CAD family, whose members regulate “post mortem” DNA degradation (25).

Fig. 2. Protein complexity plot for apoptotic domains. The “complexity quotient” of a given protein domain was defined as the product of two values: the number of different types of domains with which it co-occurs in proteins, and the average number of domains detected in these proteins (4). The complexity quotient is plotted against the total number of proteins that contain the respective domain in the protein set from a given organism. This plot allows a simultaneous assessment of the numerical and architectural contributions to the complexity of a functional system. The data points for the three animals are color-coded as indicated. The average values over all domains for each of the three organisms are also shown. The data points are for the apoptotic domains from Table 1; the points for selected individual domains are labeled (for abbreviations, see Fig. 1).

The most straightforward interpretation of these observations, with implications beyond apoptosis, is that the domains and domain architectures present in vertebrates and insects but not in nematodes are indeed shared derived characters (synapomorphies) of the coelomate clade. This is compatible with the traditional view of animal evolution. It is not compatible with the currently popular ecdysozoa model, which argues for a clade of molting animals including arthropods and nematodes (26). However, an alternative explanation cannot be entirely ruled out (27): coordinated loss, in the nematode lineage, of multiple genes coding for proteins involved in several apoptotic pathways.

As mentioned above, the prevailing theme in the evolution of the apoptosis-associated domains in the vertebrate lineage is the growth of complexity that is detectable across the entire range of the apoptosis-associated proteins and domains (Table 1, Figs. 1 and 2). In some cases, such as the BCL2 family, this is achieved primarily through duplication with limited diversification. On other occasions, two other processes may be equally important: the emergence of new domains (such as the pyrin domain) through a more radical modification of preexisting ones, and the reorganization of protein domain architectures (for example, in NACHT NTPases). To a large extent, the innovations in apoptotic and related cytokine signaling in vertebrates could have been linked to the evolution of the vertebrate immune system, with its several new cell types that require highly specialized regulatory pathways.

Perhaps the greatest mystery in the evolution of apoptosis is the presence of homologs of several components of the eukaryotic apoptotic machinery in bacteria. At least two bacterial lineages, Actinomycetes and Cyanobacteria, encode a considerable repertoire of apoptosis-associated domains, including AP-ATPases, metacaspase-like proteases, NACHT NTPases, and TIR domains (Fig. 1). Some of the bacterial AP-ATPases are involved in transcription regulation and signaling (2), whereas the functions of the rest of these proteins remain unclear. However, it is almost certain that they are functionally connected, given the fusions of the metacaspase-like domain and the TIR domain with AP-ATPases (Fig. 1). The apoptosis-associated domains are present in the crown-group eukaryotes and in specific divisions of developmentally complex bacteria. By contrast, they are (thus far) completely absent in archaea and in other bacteria. This distribution suggests a history of concerted horizontal gene transfer. The direction of this transfer, however, is uncertain. Acquisition of the corresponding genes from bacteria by early eukaryotes seems more likely, because the bacterial lineages probably had been fully established by the time of the emergence of the crown-group eukaryotes. Nevertheless, the opposite model of a relatively late dissemination from eukaryotes to the bacteria cannot be dismissed.

The principal conclusion from the comparison of the apoptotic system components and their homologs encoded in the sequenced eukaryotic genomes is the major increase in complexity in vertebrates relative to insects and nematodes. This is manifest in two ways. The first is a numerical increase of apoptosis-related proteins (due to gene duplication). The second is domain accretion, which leads to increasingly elaborate domain architectures within orthologous protein sets (Fig. 2).

What, if anything, is the unique contribution of the (nearly) complete genome sequences to our understanding of this system? At a qualitative level, most of the observations discussed here and the above conclusions do not depend on such sequences and, in fact, have been considered previously. However, only the genome sequences allow for a reasonably accurate quantitative comparison of the complexity of functional systems, including the apoptotic machinery, in different organisms. They also allow for a reasonably confident reconstruction of the ancestral systems. Moreover, the expansion of certain protein and domain families, such as the NACHT NTPases and the pyrin domain in vertebrates, became apparent only from the analysis of the nearly complete sequence of the human genome. And, of course, any statement that a particular protein or domain is lineage-specific rests on two foundations. "Lineage-specific" here means missing in other lineages, as with the vertebrate-specific pyrin domain. The first foundation is the completeness of a representative genome sequence(s) from each of the lineages. The second is the assumption that these sequences accurately reflect the gene complement of the entire lineage.

With the completion of several eukaryotic genomes, the study of the functional systems of these organisms, including apoptosis, is entering the postgenomic era. However, to understand the origin and evolution of apoptosis at a more satisfactory level, we need more genomes from diverse branches of life. Additional genome sequences of complex bacteria (such as Myxococcus, Cyanobacteria, and Actinomycetes), early-branching eukaryotes, and diverse animals such as primitive chordates will help to piece together the details of various steps in the evolution of cell death.

References and Notes
1. M. O. Hengartner, Nature 407, 770 (2000); T. Rich, R. L. Allen, A. H. Wyllie, Nature 407, 777 (2000); P. Meier, A. Finch, G. Evan, Nature 407, 796 (2000); J. Yuan, B. A. Yankner, Nature 407, 802 (2000).
2. L. Aravind, V. M. Dixit, E. V. Koonin, Trends Biochem. Sci. 24, 47 (1999).
3. A preliminary version of the human Integrated Protein Index [International Human Genome Sequencing Consortium, Nature 409, 860 (2001)] was used for this analysis. The analyzed protein set is available at ftp://ncbi.nlm.nih.gov/pub/koonin/PCD. The C. elegans predicted protein set was from the WormPep38 database [C. elegans Sequencing Consortium, Science 282, 2012 (1998); www.sanger.ac.uk/Projects/C_elegans/wormpep]. The D. melanogaster predicted protein set was from the Genome Division of the Entrez retrieval system [ftp://ncbi.nlm.nih.gov/genbank/genomes/D_melanogaster/Scaffolds/; M. D. Adams et al., Science 287, 2185 (2000)].
4. For the purpose of this discussion, we define a domain as a distinct portion of protein sequence that shows detectable evolutionary conservation and is, to some extent, evolutionarily independent. That is, it appears in proteins with two or more domain arrangements, and possibly also as a stand-alone protein. Very often, but not always, domains defined in this fashion correspond to experimentally identified structural domains when the latter are known. A domain architecture of a protein is defined as a unique linear combination of domains. For domain detection, domain-specific, multiple alignment–based sequence profiles were constructed. These profiles were run against the nonredundant protein sequence database (National Center for Biotechnology Information, NIH, Bethesda) or against protein sets from individual genomes using the PSI-BLAST program [S. F. Altschul et al., Nucleic Acids Res. 25, 3389 (1997); S. A. Chervitz et al., Science 282, 2022 (1998)]. Typically, multiple profiles were generated for a given domain to ensure complete recovery of the respective proteins [L. Aravind, E. V. Koonin, J. Mol. Biol. 287, 1023 (1999)]. The library of profiles used for the detection of apoptosis-associated domains is available at ftp://ncbi.nlm.nih.gov/pub/koonin/PCD.
5. P. C. Rath, B. B. Aggarwal, J. Clin. Immunol. 19, 350 (1999).
6. The CG12919 protein was identified here as the previously undetected TNF ortholog in Drosophila by using the TNF domain profile. The Drosophila protein CG6531 and C. elegans protein T02C5.1 are predicted receptors containing an extracellular cysteine-rich domain homologous to that of TNFR.
7. K. Hofmann, Cell. Mol. Life Sci. 55, 1113 (1999).
8. The pyrin domain was initially identified in the database searches with the ASC1, AIM2, and pyrin protein sequences used as queries in PSI-BLAST–dependent profile searches. Further iterations of the database search showed moderate but statistically significant similarity between pyrin and DD. Secondary structure prediction [B. Rost, C. Sander, Proteins 19, 55 (1994)] and threading [B. Rost, R. Schneider, C. Sander, J. Mol. Biol. 270, 471 (1997)] strongly supported the presence of six α helices in the pyrin domain, indicating that it probably forms a DD-like fold. This domain has been independently described as the “pyrin-like motif,” although the fold assignment has not been reported (T. Hlaing et al., J. Biol. Chem., in press).
9. R. J. Lutz, Biochem. Soc. Trans. 28, 51 (2000).
10. A. M. Chinnaiyan, Neoplasia 1, 5 (1999); J. M. McDonnell et al., Cell 96, 625 (1999).
11. A sequence alignment of the Bcl-2 family members was constructed using ClustalW [J. D. Thompson, D. G. Higgins, T. J. Gibson, Nucleic Acids Res. 22, 4673 (1994)]. This was followed by a phylogenetic analysis performed using the neighbor-joining method as implemented in the PHYLIP package [J. Felsenstein, Methods Enzymol. 266, 418 (1996)].
12. D. C. S. Huang, A. Strasser, Cell 103, 839 (2000); S. W. Fesik, Cell 103, 273 (2000).
13. The acronym NACHT comes from the four functionally characterized NTPases that were originally identified as members of this family: NAIP, CIITA, HET-E, and TP1. Members of this NTPase family are most
www.sciencemag.org SCIENCE VOL 291 16 FEBRUARY 2001 1283 A NALYSIS OF G ENOMIC I NFORMATION

likely GTPases, as indicated by the activity of CIITA and HET-E [E. V. Koonin, L. Aravind, Trends Biochem. Sci. 25, 223 (2000)].
14. T. L. Beattie, W. Zhou, M. O. Robinson, L. Harrington, Curr. Biol. 8, 177 (1998).
15. E. Diez, Z. Yaraghi, A. MacKenzie, P. Gros, J. Immunol. 164, 1470 (2000).
16. A. M. Verhagen et al., Cell 102, 43 (2000).
17. L. Goyal, K. McCall, J. Agapite, E. Hartwieg, H. Steller, EMBO J. 19, 589 (2000).
18. The eukaryotic crown group is the assemblage of relatively late-diverging, major eukaryotic taxa whose exact order of radiation is difficult to determine with confidence. The crown group includes the multicellular eukaryotes (animals, fungi, and plants) and some unicellular eukaryotic lineages such as slime molds and Acanthamoebae [A. H. Knoll, Science 256, 622 (1992); S. Kumar, A. Rzhetsky, J. Mol. Evol. 42, 183 (1996)].
19. The sister group of the classic animal caspase family of thiol proteases are the paracaspases that thus far have been identified only in animals and Dictyostelium; together, these two families constitute the sister group of the metacaspases that have been detected in plants, protists, and bacteria [A. G. Uren et al., Mol. Cell 6, 961 (2000)]. On the basis of conserved structural features, Uren et al. showed that the paracaspases and metacaspases are specifically related to the caspases, to the exclusion of other members of the caspase-gingipain fold [A. Eichinger et al., EMBO J. 18, 5453 (1999)].
20. The A20 protein is a regulator of apoptosis that appears to be involved in the NF-κB pathway and interactions with the TRAFs [R. Beyaert, K. Heyninck, S. Van Huffel, Biochem. Pharmacol. 60, 1143 (2000)]. A20 belongs to a distinct family of predicted thiol proteases that is conserved in all crown-group eukaryotes and many viruses. None of the members of this family has a known biochemical function, but they share two conserved motifs with the cysteine proteases of arteriviruses, which led to the prediction of the protease activity. A20 and another protein of this family, cezanne, contain a specialized finger module that is also found in some proteins of the ubiquitin pathway. Together with a fusion of an A20-like protease domain with a ubiquitin hydrolase that has been detected in C. elegans, this suggests a functional connection between these predicted proteases and the ubiquitin system [K. S. Makarova, L. Aravind, E. V. Koonin, Trends Biochem. Sci. 25, 50 (2000)]. An additional connection between apoptosis and the ubiquitin system is indicated by the demonstration that, similar to other RING fingers, the one in TRAF6 is an E3-like ubiquitin ligase pathway [L. Deng et al., Cell 103, 351 (2000)].
21. The AP-GTPase is a previously undetected predicted GTPase typified by the COOH-terminal domain of the conserved apoptosis regulator, the DAP protein kinase [B. Inbal et al., Nature 390, 180 (1997)]. This predicted GTPase family appears to be the sister group of the RAS/ARF family GTPases, but differs from them in having a divergent P-loop motif and a THXD instead of the NKXD signature motif. Additional AP-GTPases are found in plants and animals as multidomain proteins that also contain ankyrin, Lrr, and kinase domains. This domain architecture suggests that AP-GTPases participate in GTP-dependent assembly of signaling complexes.
22. A. G. Uren et al., Proc. Natl. Acad. Sci. U.S.A. 96, 10170 (1999).
23. The ZU5 domain is a previously undetected conserved domain that is present in receptors (such as netrin receptors and vertebrate zona pellucida proteins) and cytoskeletal proteins (such as ankyrins) and is predicted to be involved in anchoring receptors to the cytoskeleton.
24. S. L. Ackerman, B. B. Knowles, Genomics 52, 205 (1998).
25. H. Sakahira, M. Enari, S. Nagata, Nature 391, 96 (1998).
26. A. M. Aguinaldo et al., Nature 387, 489 (1997).
27. L. Aravind, H. Watanabe, D. J. Lipman, E. V. Koonin, Proc. Natl. Acad. Sci. U.S.A. 97, 11319 (2000).
28. J. R. Brown, W. F. Doolittle, Microbiol. Mol. Biol. Rev. 61, 456 (1997).
29. We thank E. Birney and A. Bateman (The Sanger Center, Hinxton, UK) for kindly providing the preliminary version of the Integrated Protein Index and A. Uren for critical reading of the manuscript and useful comments. The release of the unpublished WormPep data set by The Sanger Center is acknowledged and greatly appreciated.

25 October 2000; accepted 18 January 2001

Human DNA Repair Genes

Richard D. Wood,1* Michael Mitchell,2 John Sgouros,2 Tomas Lindahl1

1Imperial Cancer Research Fund, Clare Hall Laboratories, Blanche Lane, South Mimms, Herts EN6 3LD, UK. 2Imperial Cancer Research Fund, 44 Lincoln's Inn Fields, London WC2A 3PX, UK.
*Present address: University of Pittsburgh Cancer Institute, S867 Scaife Hall, 3550 Terrace Street, Pittsburgh, PA 15261, USA.

Cellular DNA is subjected to continual attack, both by reactive species inside cells and by environmental agents. Toxic and mutagenic consequences are minimized by distinct pathways of repair, and 130 known human DNA repair genes are described here. Notable features presently include four enzymes that can remove uracil from DNA, seven recombination genes related to RAD51, and many recently discovered DNA polymerases that bypass damage, but only one system to remove the main DNA lesions induced by ultraviolet light. More human DNA repair genes will be found by comparison with model organisms and as common folds in three-dimensional protein structures are determined. Modulation of DNA repair should lead to clinical applications including improvement of radiotherapy and treatment with anticancer drugs and an advanced understanding of the cellular aging process.

The human genome, like other genomes, encodes information to protect its own integrity (1). DNA repair enzymes continuously monitor chromosomes to correct damaged nucleotide residues generated by exposure to carcinogens and cytotoxic compounds. The damage is partly a consequence of environmental agents such as ultraviolet (UV) light from the sun, inhaled cigarette smoke, or incompletely defined dietary factors. However, a large proportion of DNA alterations are caused unavoidably by endogenous weak mutagens including water, reactive oxygen species, and metabolites that can act as alkylating agents. Very slow turnover of DNA consequently occurs even in cells that do not proliferate. Genome instability caused by the great variety of DNA-damaging agents would be an overwhelming problem for cells and organisms if it were not for DNA repair.

On the basis of searches of the current draft of the human genome sequence (2), we compiled a comprehensive list of DNA repair genes (Table 1). This inventory focuses on genes whose products have been functionally linked to the recognition and repair of damaged DNA as well as those showing strong sequence homology to repair genes in other organisms. Readers desiring further information on specific genes should consult the primary references and links available through the accession numbers. Recent review articles on the evolutionary relationships of DNA repair genes (3) and common sequence motifs in DNA repair genes (4) may also be helpful.

The functions required for the three distinct forms of excision repair are described separately. These are base excision repair (BER), nucleotide excision repair (NER), and mismatch repair (MMR). Additional sections discuss direct reversal of DNA damage, recombination and rejoining pathways for repair of DNA strand breaks, and DNA polymerases that can bypass DNA damage.

The BER proteins excise and replace damaged DNA bases, mainly those arising from endogenous oxidative and hydrolytic decay of DNA (1). DNA glycosylases initiate this process by releasing the modified base. This is followed by cleavage of the sugar-phosphate chain, excision of the abasic residue, and local DNA synthesis and ligation. Cell nuclei and mitochondria contain several related but nonidentical DNA glycosylases obtained through alternative splicing of transcripts. Three different nuclear DNA glycosylases counteract oxidative damage, and a fourth mainly excises alkylated purines. Remarkably, four of the eight identified DNA glycosylases can remove uracil from DNA. Each of them has a specialized function, however. UNG, which is homologous to the Escherichia coli Ung enzyme, is associated with DNA replication forks and corrects uracil misincorporated opposite adenine. SMUG1, which is unique to higher eukaryotes, probably removes the uracil that arises in DNA by deamination of cytosine. MBD4 excises uracil and thymine specifical-
