<<

Oncogene (1997) 15, 2965 ± 2974  1997 Stockton Press All rights reserved 0950 ± 9232/97 $12.00

Rel/NF-kB transcription factors and IkB inhibitors: Evolution from a unique common ancestor

Christelle Huguet1, Pascale Crepieux2 and Vincent Laudet2

1Laboratoire de reÂgulation des processus invasifs, de l'angiogeneÁse et de l'apoptose, EP560; 2Laboratoire des meÂcanismes du deÂveloppement et de la canceÂrisation, UMR319, IFR3 Institut de Biologie de Lille, 1 rue du Professeur Calmette, 59021 Lille, France

From the sequences of Rel/NF-kB and IkB , we 1995; Verma et al., 1995). Some of the Rel/NF-kB constructed an alignment of their Rel Homology Domain proteins, namely cRel, RelA, RelB/IRel, Dorsal and (RHD) and repeat domain. Using this align- Dif (the `ready-to-use' members), contain a transacti- ment, we performed tree reconstruction with both vating domain in their C-terminal half (for references distance matrix and parsimony analysis and estimated see Table 1). The others, the p50 and p52 subunits, are the branching robustness using bootstrap resampling proteolytically maturated from the precursor proteins methods. We de®ned four subfamilies of Rel/NF-kB NF-kB1 (p105) and NF-kB2 (p100) respectively and transcription factors: (i) cRel, RelA, RelB, Dorsal and are devoid of transactivating domain (Table 1). Relish, Dif; (ii) NF-kB1 and NF-kB2; (iii) Relish and (iv) NF- a gene structurally related to NF-kB genes, has been AT factors, the most divergent members. Subfamilies I recently cloned in (Dushay et al., 1996). and II are clustered together whereas Relish diverged Members of the Rel-NF-kB family bind DNA as homo earlier than other Rel/NF-kB proteins. Three subfamilies or heterodimers and recognize the consensus decameric of IkB inhibitors were also de®ned: (i) NF-kB1 and NF- sequence 5'-GGGPuNNPyPyCC-3' which is named kB kB2; (ii) close to subfamily I, the short IkB proteins site. Di€erent kB sites among gene promoters are IkBa,IkBband Bcl-3; (iii) Relish that diverged earlier bound with di€erent anities by the various dimers than other IkB inhibitors. Our de®nition of groups and which therefore bind certain target genes preferentially subfamilies ®ts to structural and functional features of (Chytil and Verdine, 1996; Miyamoto and Verma, the Rel/NF-kB and IkB proteins. We also showed that 1995; MuÈ ller et al., 1996). In addition, depending on ankyrin repeats of NF-kB1, NF-kB2 and Relish are the dimers, the transcription of the target gene is short IkB-speci®c ankyrin motifs. These proteins de®ning activated or repressed; for example RelA/p50, RelA/ a link between Rel/NF-kB and IkB families, we propose cRel, RelB/p50, RelB/p52 are potent transcriptional that all these factors evolved from a common ancestral activators, whereas p50/p50 or p52/p52 homodimers RHD-ankyrin structure within a unique superfamily, generally repress transcription (Siebenlist et al., 1994; explaining the speci®cities of interaction between the Verma et al., 1995). di€erent Rel/NF-kB dimers and the various IkB The various Rel/NF-kB dimers are submitted to inhibitors. di€erent regulations by inhibitors of the IkB family (Miyamoto and Verma, 1995). The IkB family is a Keywords: ankyrin repeat ; homology subfamily of ankyrin repeat proteins including IkBa, domain; NF-AT; molecular phylogeny IkBb,IkBg, Bcl-3, NF-kB1 and NF-kB2. The NF-kB precursor proteins present ankyrin repeats in their C- terminal half thereby exhibiting IkB inhibitory func- tions (Siebenlist et al., 1994). The IkB proteins bind the Introduction various Rel/NF-kB dimers with di€erent anities, and retain them in the cytoplasm (Miyamoto and Verma, Transcription factors represent key molecules that 1995). Activation of the Rel/NF-kB proteins requires convert signals received by the cell into activation or phosphorylation and subsequent degradation of the repression of genes that lead to cell responses such as IkB inhibitors, thus allowing the transcription complex proliferation, di€erentiation or apoptosis. The Rel/NF- to translocate into the nucleus (Finco and Baldwin, kB transcription factors are involved in the immune 1995; Thanos and Maniatis, 1995; Verma et al., 1995). response, i.e. in immunoreceptor gene expression, Once in the nucleus, the Rel/NF-kB transcription cytokine and growth factor regulation, and also in factors will regulate the transcription of various target embryonic development and programmed cell death genes including their own genes in an autoregulatory (Baeuerle and Henkel, 1994; Huguet et al., 1994; loop, thus reconstituting the stocks of Rel/NF-kB Kabrun et al., 1990; Kopp and Ghosh, 1995). These proteins in the cytoplasm. Moreover, they will activate Rel/NF-kB proteins share a 300 N-terminal the transcription of their IkB inhibitors leading to the domain, named the Rel Homology Domain (RHD), production of new IkB proteins. The newly synthesized which harbors a DNA-binding region, a dimerization IkBa will terminate the activation of Rel/NF-kB domain, a nuclear localization signal and a region of factors by migrating into the nucleus and displacing interaction with inhibitory proteins of the IkB family the Rel/NF-kB dimers from the DNA. On the (Chytil and Verdine, 1996; Miyamoto and Verma, contrary, the newly synthesized IkBb will continue the activation by protecting the Rel/NF-kB dimers Correspondence: V Laudet from the inhibition of the newly synthesized IkBa Received 24 April 1997; revised 1 August 1997; accepted 1 August (Baeuerle and Baltimore, 1996; Gilmore et al., 1996; 1997 Miyamoto and Verma, 1995). Rel/NF-kBandIkBevolution CHuguetet al 2966 Table 1 Sequences used in this study. For each sequences, the Genbank code and the accession number, when di€erent, were given as well as the reference. The sequences are given in the order of the distance trees (Figure 1 left panel) Gene species Databank accession Reference Rel/NF-kB Family Homo cRel HSRNAREL/X75042 (Brownell et al., 1989) Mus cRel MMCRELM/X15842 (Grumont and Gerondakis, 1989) Gallus cRel GSP68REL/X52193 (Capobianco et al., 1990) Meleagris cRel TKYREL09/K02455 (Wilhemsen et al., 1984) Viridae Rel ACRREL/K00555/X02759 (Wilhemsen et al., 1984) Xenopus Xrel2 XLREL2POR/Z49252 (Tannahill and Wardle, 1995) Homo RelA HUMP65NFKB/M62399 (Ruben et al., 1991) Mus RelA MUSNFKP65/M61909 (Nolan et al., 1991) Xenopus Xrel1 XELREL/M60785 (Kao and Hopwood, 1991) Gallus RelA CHKNFKB2/D13721 (Ikeda et al., 1993) Homo Irel HUMIRELA/M83221 (Ruben et al., 1992a) Mus RelB MUSRELB/M83380 (Ryseck et al., 1992) Xenopus RelB XLRELB/D63332 (Suzuki et al., 1995) Drosophila Dorsal DRODORSAL/M23702 (Steward, 1987) Drosophila Dif DRODIF/L29015 (Ip et al., 1993) Homo NF-kB1 HUMNFKB34/M55643/M37492 (Kieran et al., 1990) Mus NF-kB1 MUSTFDBS/M57999/M37732 (Ghosh et al., 1990) Gallus NF-kB1 CHKP105A/M86930 (Capobianco et al., 1992) Homo NF-kB2 S76638 (Bours et al., 1992) Gallus NF-kB2 U00111 (Sif and Gilmore, 1993) Drosophila Relish U62005 (Dushay et al., 1996) NF-AT Family Homo NF-AT3 HUMNFAT3A/L41066 (Hoey et al., 1995) Homo NF-AT4 HUMHFAT4A/L41067 (Hoey et al., 1995) Homo NF-ATx HSU14510/U14510 (Masuda et al., 1995) Homo NF-ATc HSU08015/U08015 (Northrop et al., 1994) Mus NF-ATp MMU02079/U02079 (McCa€rey et al., 1993) IkB Family Homo NF-kB1 HUMNFKB34/M55643/M37492 (Kieran et al., 1990) Mus NF-kB1 MUSTFDBS/M57999/M37732 (Ghosh et al., 1990) Gallus NF-kB1 CHKP105A/M86930 (Capobianco et al., 1992) Homo NF-kB2 S76638 (Bours et al., 1992) Gallus NF-kB2 U00111 (Sif and Gilmore, 1993) Drosophila Relish U62005 (Dushay et al., 1996) Homo IkBa HUMMAD3A/M69043 (Haskill et al., 1991) Sus IkBa SDECI6A/Z21968 (De Martin et al., 1993) Rattus IkBa RRRLIF1/X63594 (Tewari et al., 1992) Gallus IkBa CHKPP40 (Davis et al., 1991) Drosophila Cactus DROCACTUS/L04964 (Geisler et al., 1992) Mus IkBb MMU19799/U19799 (Thompson et al., 1995) Homo Bcl-3 HUMBCL3AA/M31732 (Ohno et al., 1990) Ankyrin Family Homo Ankyrin HUMANK/M28880 (Lambert et al., 1990) Mus Ankyrin 1 MMANK1AA/X69063 (Birkenmeier et al., 1993) Drosophila Ankyrin DROANKY/L35601 (Dubreuil and Yu, 1994)

A large set of experimental data recently illustrated Results how sophisticated the relationships between the Rel/ NF-kB transcription factors and their IkB inhibitors Alignment of sequence of the Rel/NF-kB and IkB are: autoregulatory loops, proteins displaying both proteins the roles of and inhibitor, various combinations of transcriptional and inhibitory part- In order to produce a complete list of all known full ners undergoing di€erent regulations. In fact, the length sequences of the members of both the Rel/NF- combination of the Rel/NF-kB and IkB families kB and IkB families (Table 1), we performed several constitutes a powerful and tightly controlled path- searches in the Genbank and EMBL databases using way to respond to various stimuli and to mediate the FASTA program. Only complete cDNA sequences speci®c cell response. Therefore, it is important to were kept. Therefore, since we only found a 3' ¯anking understand how these two families of genes evolved region for the mouse Bcl-3 gene in the data banks and diversi®ed to reach such a level of speci®city and (accession number M90397) despite its cloning cited by autoregulation. To address this question, we studied Ito et al. (1994), it has not been included in this study. in parallel the molecular evolution of the Rel/NF-kB Thirty-seven sequences were retained in this analysis: and IkB proteins. This phylogenetical analysis 25 in the Rel/NF-kB family and 14 in the IkB family; allowed to de®ne the structure of the putative the NF-kB1, NF-kB2 and Relish sequences belong to ancestor genes and to propose an evolutionary both families. Three ankyrin genes (erythroid ankyrin model that clusters both families in a unique Rel/ or ankyrin 1) were chosen as an outgroup for the IkB NF-kB/IkB superfamily. family. Rel/NF-kBandIkBevolution C Huguet et al 2967 The complete alignment of the Rel/NF-kB sequences ®nally the Dorsal/Dif group. These clusters are is 1534 amino acid long with some gaps, essentially due supported by high bootstrap values from 80 ± 100%. to the NF-AT genes and to a sequence insertion of cRel, RelA and RelB genes represent typical paralogue approximately 40 amino acids in the N-terminal part of genes descended from the duplication of a unique the RHD of NF-kB1. This sequence does not ancestral gene. Inside the groups, genes are homo- participate in DNA-binding and forms an a-helix logues of di€erent species and we con®rmed in this structure longer than that found in c-Rel or RelA study previous data showing that Xrel1 represents a (Ghosh et al., 1995; MuÈ ller et al., 1995). The IkB Xenopus homologue of RelA and that IRel is a human sequence alignment is 2620 amino acid long with homologue of mouse RelB and of the recently cloned numerous gaps, mostly located between the ankyrin Xenopus RelB. We also show that Xrel2 can be repeat motifs since there are di€erences in the number identi®ed as a Xenopus homologue of cRel. The most of repeats of each protein (eight for NF-kB1, ®ve for salient features of the Rel/NF-kB factors are their IkBa). The ankyrin motifs can also be of variable ability to dimerize and to recognize di€erent versions length (Fracchiolla et al., 1993; He ron et al., 1995; of the kB site. These features are harbored by the Ja€ray et al., 1995). Generation of the phylogenetic RHD and subtle variations in the RHD sequence trees was based on the separate alignment of the RHD dictates the speci®city of the various members of this sequences and the ankyrin repeat sequences respec- family. Therefore, a high identity between two species tively. NF-AT4 and IkBg do not appear in the trees in this domain points to them as homologous genes. since NF-AT4 and NF-ATx are a same protein and For these reasons, we consider Xrel2 as a cRel IkBg messenger RNA is transcribed from a secondary homologue, despite their divergent C-terminal se- start site of transcription of the NF-kB1 gene and thus quences (Tannahill and Wardle, 1995). IkBg is identical to the C-terminus of the NF-kB1 The branching order of Gallus and Xenopus RelA, protein. supported by 88% bootstrap in the distance analysis, does not ®t with the species evolution. Indeed, since birds are closer to mammals than to amphibians, Three subfamilies of Rel/NF-kB transcription factors Gallus RelA should be a closer neighbor of Homo RelA The topology of the distance tree is shown in Figure 1a than Xrel1. Given that both the methods of distance left panel. We can de®ne four subfamilies of Rel/NF- and parsimony clustered Gallus RelA with the other kB proteins on this tree. These subfamilies are: (i) a RelA genes with high bootstrap values, it is unlikely large one comprising the three following groups of that this gene represents a new Rel/NF-kB member, proteins: cRel/vRel and RelA, RelB/IRel, Dorsal/Dif; and we favor the hypothesis of an artefactual mis- (ii) a second subfamily containing NF-kB1 and NF- placement of Gallus RelA due to an acceleration of kB2 groups; (iii) a third subfamily de®ned by Relish. sequence divergence. The NF-AT genes belong to a fourth subfamily that From the distance tree of Figure 1a, it is clear that can be considered as an outgroup in view of its wide Dorsal and Dif may be considered as homologues of divergence compared with the three other subfamilies. the cRel, RelA and RelB genes. It appears clearly that A branching order between the subfamilies can be one speci®c duplication of the arthropod lineage assessed from the tree topology. Subfamily I (cRel, occurred to give rise to Dorsal and Dif which are RelA, . . .) and subfamily II (NF-kB1, NF-kB2) appear linked together with a 93% bootstrap value. The to be clustered together with a bootstrap value of 82% parsimony analysis gave similar group and subfamily and are clustered with subfamily III (Relish) with a topology. However, it did not resolve any branching 100% bootstrap value. order for the RelB and Dorsal/Dif groups inside Parsimony analysis was then performed with a subfamily I. reduced data set (see Materials and methods). The The second subfamily is constituted of the para- data set contains a clear phylogenetical signal since logous genes NF-kB1 and NF-kB2 that also resulted after independent choices of 10 000 random trees we from a vertebrate-speci®c gene duplication. Interest- found g1 statistics values of 70.7822+0.0058 which ingly, the Drosophila homologue of the NF-kB are clearly signi®cant according to (Hillis and subfamily members remains to be found since Relish, Huelsenbeck, 1992). The trees obtained have a which exhibits a related gene organization, is not consistency index of 0.789. The overall topology of related more to NF-kB genes than to any other this tree (Figure 1a right panel) was similar to the member of the family. Thus, one Drosophila homo- distance tree with the notable exception of the logue of NF-kB and one vertebrate homologue of placement of Relish which is linked to the Dorsal/Dif Relish may be discovered. group inside subfamily I. The monophyly of this The NF-AT subfamily IV is composed of four subfamily is supported by 61% bootstrap only. The paralogous genes. The distance tree and the parsimony low bootstrap value of 53% which clusters Relish with analysis provided a di€erent branching order for the Dorsal and Dif led us to consider this relationship as NF-AT genes, consequently it is dicult to choose unresolved. Since these bootstrap values are weak and between the two topologies. The cloning of one or since Relish is structurally distant from subfamily I more Drosophila related gene(s) would help in the members, we consider it as a unique member of a determination of the true topology. distinct subfamily as indicated in the distance tree. In conclusion the Rel/NF-kB family of transcription It is interesting to study the topologies inside the factors can be divided in four subfamilies (Figure 2a). subfamilies and groups. According to the distance tree, Subfamily I clusters the proteins which contain a RHD subfamily I is constituted of three groups among which and a transactivation domain (TA) as basic organiza- cRel and RelA are the closest neighbors and belong to tion: cRel, RelA, RelB and Dorsal/Dif. Subfamily II the same group, followed by the RelB group and groups NF-kB1 and NF-kB2 which are precursor Rel/NF-kBandIkBevolution CHuguetet al 2968

Figure 1 Rooted phylogenetic trees of Rel/NF-kB transcription factors (a) and IkB inhibitory proteins (b). (a and b left panel): Distance trees constructed by the Neighbor-Joining method (MUST), 1000 bootstrap replicates were performed on each set (expressed in % on the branch nodes). (a and b right panel): Parsimony trees constructed using the parsimony method (PAUP), 100 bootstrap replicates were performed. Subfamilies are indicated by brackets. The most internal branches in the trees allow to de®ne the subfamilies

proteins presenting the RHD in their N-terminus and remains undetermined. The parsimony tree gave an ankyrin repeats in their C-terminus. Subfamily III is identical result but the branching order between the de®ned by Relish which displays the same precursor four subfamilies remains unresolved by this analysis, structure as NF-kB1 and NF-kB2. Subfamily IV except for Relish which diverged ®rst. The data set groups NF-AT proteins that present serine-proline contains a clear phylogenetical signal since after boxes upstream of their RHD and are distantly independent choices of 10 000 random trees we found related to the Rel/NF-kB RHD. g1 statistics values of 70.8722+0.0035 which are clearly signi®cant according to (Hillis and Huelsen- beck, 1992). The trees obtained have a consistency Three subfamilies of IkB inhibitors index of 0.836. The IkB sequences were treated similarly and the trees Again, inside the subfamilies, vertebrate speci®c gene generated are shown in Figure 1b. The localization of duplications gave rise to paralogous genes such as NF- the last internal branch of the distance tree with kB1 and NF-kB2. Of note, as for the RHD domain bootstrap values above 70% (left panel) allows to tree, Relish constitutes a di€erent subfamily (III) and distinguish in a ®rst approach four subfamilies among cannot represent a direct arthropod homologue of NF- the IkB proteins in addition to the ankyrin outgroup. kB genes. The Drosophila Cactus gene represents the These subfamilies are: (i) NF-kB1 and NF-kB2 (100% homologue of the unique IkBa gene. Apparently, no bootstrap value); (ii) IkBa with Cactus (72%); (iii) vertebrate speci®c gene duplication has given rise to a IkBb and Bcl-3 (82%) and (iv) Relish alone that paralogous version of IkBa but the discovery of a new diverged before any other subfamily. In the parsimony paralogue is still possible. IkBb and Bcl-3 de®ne two analysis (right panel), the NF-kB, IkBa and Relish di€erent subfamilies or two very distant paralogue subfamilies are still well de®ned, however this method genes of a unique subfamily, whose homologue in separates IkBb and Bcl-3 as di€erent subfamilies. Thus, Drosophila remains to be found. whether IkBb and Bcl-3 are more closely related In conclusion the IkB family can be divided into together than to any other member of the family three subfamilies (Figure 2b): (i) the robust subfamily Rel/NF-kBandIkBevolution C Huguet et al 2969 I, where NF-kB1 and NF-kB2 proteins play an IkB for Relish, considering its similarities with the NF-kB inhibitory function as precursor proteins; (ii) subfamily molecules; (iii) the three following groups: IkBa,IkBb III, where an IkB-like inhibitory function is expected and Bcl-3 that could be grouped in a unique large subfamily II (the monophyly of which is uncertain) of pure IkB short ankyrin repeat proteins according to their structural features.

Evolutionary rates In order to compare the speed of evolution of the di€erent Rel/NF-kB and IkB molecules, homologous versions of genes in human, mouse, chicken and Xenopus were selected from the complete data set. Then the number of amino acid di€erences existing for a given molecule between the various homologous sequences was computed. This number was trans- formed into an evolutionary rate (number of mutations per site and per year, M/S/Y; 1079 M/S/Y=1 PAU) (Figure 3). The RHD and the ankyrin repeats were treated separately. In both Rel/NF-kB and IkB families, paralogous genes exhibited very similar evolutionary speeds: cRel, RelA and RelB evolved respectively at 0.77, 0.79 and 1.01 PAU; NF-kB1 and NF-kB2 evolved at 0.26 and 0.51 PAU in the RHD and at 0.80 and 1.02 PAU in the ankyrin repeats, respectively. Interestingly, NF-kB genes evolved much more slowly in their RHD than the other Rel/NF-kB proteins did. In fact, the NF-kB RHD evolved at an average speed of 0.38 PAU compared to 0.85 PAU for the RHD of other Rel/NF-kB. Thus, there is approximately twice as much constraint on the RHD of the precursor proteins as on the RHD of subfamily I. On the other hand the ankyrin repeats of the precursor proteins evolved at the same rate as the short IkB molecules, between 0.8 and 1 PAU, and both evolved 8 ± 10-fold faster than ankyrin repeats of the reference . This suggests that the ankyrin

Figure 2 Schematic representation of the topology and branching order of the subfamilies within the Rel/NF-kB(a) and IkB(b) families. Major structural and functional character- istic features of the di€erent subfamilies are summarized in sketchings: Black boxes for ankyrin repeats; White box for Rel Figure 3 Evolutionary rate of the Rel Homology Domain and Homology Domain (RHD); S for serine and P for proline boxes; the ankyrin repeats of Rel/NF-kB and IkB proteins. Evolutionary TA for transactivating domain; LZip for -like rates are expressed in Pauling units (1079 mutations/site/year). structure The subfamilies are indicated by brackets Rel/NF-kBandIkBevolution CHuguetet al 2970 repeats of the precursor proteins are `IkB-speci®c Discussion ankyrin repeats' and that the ankyrin repeats of the IkB proteins have evolved towards a specialized In this study, we analysed in parallel the evolutionary function that requires less contraint than classical relationships within both the Rel/NF-kB and the IkB ankyrin repeats. The evolutionary speed analysis also families. Our analysis revealed that the pattern of gene indicates that within the Rel/NF-kB proteins, the duplication events that explain the present diversity of ankyrin repeats evolved faster than the RHD, which Rel/NF-kB and IkB families is very similar to what may re¯ect more constraints on the RHD than on happened for other transcription factor families. We the ankyrin repeats. This result was expected given previously demonstrated that the diversity of ETS the strongly constrained DNA binding role of the genes and nuclear receptors, as for the Hox/HOM or RHD. families, arises very early during metazoan When compared to the DNA binding and ligand evolution and then increases speci®cally in the binding domains of other transcription factors such as vertebrate lineage (Atchley and Fitch, 1995; Holland the nuclear receptors, the RHD and ankyrin repeats and Garcia-Fernandez, 1996; Laudet et al., 1992, appears to be less constrained. Nuclear receptors 1993). Likewise, the existence of paralogous members evolved at an averaged speed of 0.303 PAU which such as c-Rel, RelA and RelB or NF-kB1 and NF-kB2, corresponds to the accumulation of 1% sequence shows that the diversity of Rel/NF-kBorIkB families divergence in 16.5 million years (VL, submitted). members was also increased during the vertebrate However, this averaged speed hides considerable evolution. variations within the nuclear superfamily (from 0.078 PAU to 1.0 PAU). In comparison, the The Rel/NF-kB family Rel/NF-kB and IkB families appears remarkably homogeneous when considering the apparent stability Interestingly, the branching order and de®nition of of their evolutionary rates. subfamilies and groups follow structural and functional

Figure 4 Models of evolution: (a) The transcription factors of the Rel/NF-kB family: (I) RHD-ankyrin ancestral structure; (II) RHD ancestral structure. (b) The inhibitory proteins of the IkB family. (c) Scenario of evolution of the Rel/NF-kB and IkB proteins from a common ancestor Rel/NF-kBandIkBevolution C Huguet et al 2971 characteristics of the various Rel/NF-kB members. While a previous study suggested that Relish Among them, the closest neighbors are cRel and RelA ankyrin repeats were equally distant from IkB and belonging to the ®rst group of subfamily I. The second other non-IkB ankyrin repeats (Dushay et al., 1996), group is composed of RelB proteins presenting, in our analysis clearly clusters Relish with the IkB addition to the basic organization RHD-transactiva- proteins and identi®es it as an early o€shoot of the tion domain (TA), a leucine zipper in their N-terminus. IkB family the structure of which should be compar- cRel, RelA and RelB paralogous genes diverged with able to the one of the ancestor. It is interesting to note similar evolutionary speed, although the RelB gene that both RHD and ankyrin repeat domains provide seems to have slightly accelerated. Despite their highly the same information, i.e. the divergence of Relish conserved features, they strikingly di€er in their before NF-kB as a result of an early duplication. dimerization speci®city since RelB proteins are unable NF-kB1 and NF-kB2 paralogous genes encode to homodimerize (Dobrzanski et al., 1993; Ruben et another type of long IkB inhibitor. They contain six al., 1992b; Ryseck et al., 1992; Suzuki et al., 1995). The intact ankyrin repeats, one short repeat and an acidic third group of subfamily I proteins contains Dorsal region (Fracchiolla et al., 1993; He ron et al., 1995; and Dif which present the RHD-TA organization and Miyamoto and Verma, 1995; Siebenlist et al., 1994). bind DNA as homodimers only. The Dorsal and Dif They have been shown to interact preferentially with genes appear to be issued from an arthropod speci®c their corresponding N-terminal part p50 and p52 but duplication. Regulation of Dorsal nuclear uptake by also to inhibit cRel and RelA, thus retaining Rel/NF- Cactus is homologous to the vertebrate regulation kB proteins in the cytoplasm (Naumann et al., 1993a,b; pathway of Rel/NF-kB factors by IkB factors Scheinman et al., 1993). (Miyamoto and Verma, 1995; Wasserman, 1993). The IkBa molecules, among which Cactus is the acquisition of the heterodimerization potential by the Drosophila homologue, belong to the ®rst group of vertebrate members extends even more the abilities of a the large subfamily II. They present six ankyrin repeats few transcription factors to regulate numerous genes in and an acidic domain and speci®cally interact with di€erent cell types. Thus, duplication events have RelA, cRel and Dorsal (Miyamoto and Verma, 1995; increased the diversity and regulatory abilities of Rel/ Hatada et al., 1992, 1993; Ja€ray et al., 1995). In NF-kB proteins during evolution. addition to the cytoplasmic retention of Rel/NF-kB Sequence identities in the RHD de®ne the NF-kB1 members, IkBa is also involved in the termination of and NF-kB2 species as the closest paralogue genes Rel/NF-kB activation through its DNA-binding constituting a distinct subfamily II. This re¯ects inhibition in the nucleus (Arenzana-Seisdedos et al., common structural and functional characteristics such 1995; Beg et al., 1995; Klement et al., 1996; Zabel and as a similar genomic organization (Fracchiolla et al., Bauerle, 1990). 1993; He ron et al., 1995), transcriptional up-regulation The second group of subfamily II is de®ned by by Rel/NF-kB proteins, and similar proteolytic IkBb. Its structure and regulation are very close to maturation. IkBa, however IkBb seems rather specialized in the Subfamily III contains Relish that displays the regulation of the Rel/NF-kB persistent response upon precursor structure of NF-kB proteins but diverged activation and could function as a chaperone for the earlier than any other Rel/NF-kB molecule. This early Rel/NF-kB factors (Thompson et al., 1995; Suyang et divergence had already been shown by Dushay et al. al., 1996). (1996), however in their study the branching order The last group is represented by Bcl-3 which between Relish and the other Drosophila proteins presents six complete ankyrin repeats and an Dorsal and Dif was not resolved. Our results clearly incomplete seventh and exclusively interacts with p50 show that Relish diverged even before Dorsal and Dif and p52 homodimers. Unlike other IkB proteins, Bcl-3 and therefore may represent the founder of a new does not inhibit Rel/NF-kB factors, is predominantly subfamily of Rel/NF-kB genes. nuclear and induces transcription through kB sites In the absence of any other reference, the NF-AT (Miyamoto and Verma, 1995; Bours et al., 1993; Fujita proteins have been chosen as outgroup, although they et al., 1993). In contrast to previous results, our study can be considered as Rel/NF-kB members according does not show any direct clustering between Bcl-3 and to their DNA-binding characteristics (Chytil and NF-kB subfamilies (Ito et al., 1995). This discrepancy Verdine, 1996; Wolfe et al., 1997). All four NF-AT could be due to the recent cloning of new genes genes are probably descended from the vertebrate increasing the sample of sequences to compare. duplication of an ancestor gene that diverged very early from the Rel/NF-kB branch. For this reason, a A Rel/NF-kB/IkB superfamily Drosophila homologue of NF-AT genes should be discovered. Using the tree as an evolutionary framework, we decided to reconstruct the evolution of the di€erent structural domains of the Rel/NF-kB family members. The IkB family Given that members of the family such as NF-kBor As the Rel/NF-kB family, the organization of the IkB Relish contain two main domains, the RHD and family that we describe here is well supported by the ankyrin repeats, two models of evolution of these known functional characteristics of these factors. The domains can be considered depending on the structure topology of both the distance and parsimony trees and of the ancestor (Figure 4a, I and II). Model I proposes the analysis of evolutionary speeds clearly indicate that an ancestor gene which displays the RHD-ankyrin all three de®ned subfamilies harbor IkB-like ankyrin repeat structure. Two independent events of loss of the repeats and thus form a cluster of subfamilies equally ankyrin domain are thus necessary to give rise to the distant to the erythroid ankyrin. NF-AT family ®rst and later to the Rel subfamily I, Rel/NF-kBandIkBevolution CHuguetet al 2972 whereas both NF-kB genes and Relish conserved the between the Rel/NF-kB transcription factors and their ancestral structure. In the model II, an ancestral RHD inhibitors and the high identities in the ankyrin domain gives rise to the NF-AT genes and acquires later on of the long and short IkBs. ankyrin repeats in the Rel/NF-kB lineage, while a later loss of this ankyrin domain leads to Rel/NF-kB subfamily I. The evolution of the ankyrin domain of Materials and methods NF-kB1 and NF-kB2 inside the IkB family helps to de®ne which one of these equally parsimonious model Construction of the database and sequence alignments is favored and the structure of the ancestor. The tree topologies led us to propose a model of Sequences were extracted from the EMBL and Genbank evolution for the IkB proteins inside the ankyrin repeat data libraries using the FASTA program in the Infobiogen protein family (Figure 4b). The ankyrin motif appears network. To perform the FASTA search we used the full size sequences of the following factors: Homo RelA, Homo more ancient than the RHD since we could not identify cRel, Homo IRel, Homo NF-kB1, Drosophila Dorsal, RHD homologous sequences in Nematodes or Homo IkBa, Homo Bcl-3, Drosophila Cactus. The last (Fasta search, not shown) where ankyrin repeat search in the database was performed on the Genbank proteins have been described (Bork, 1993). The most release of August 1996. The Relish sequence was a kind parsimonious scenario suggests that a unique acquisi- communication from Dr D Hultmark. tion event of the RHD by an ankyrin repeat ancestor Sequences of the Rel Homology Domain and the ankyrin gave rise to all IkB sunfamilies and that a later loss of repeats were aligned using the ED program of the MUST this RHD led to subfamily II of short IkB proteins. package (Philippe, 1993). This program allows a color This scenario is in favor of the monophyly of visualization of the aligned sequences and the alignment is subfamily II suggested by our results. A comprehen- done by eye. Each erythroid ankyrin sequence chosen as a reference outgroup encloses more than 20 ankyrin repeats sive study of ankyrin repeats evolution suggested that exhibiting some di€erences in their sequence. We submitted an original set of ankyrin repeats would have been the sequences to the Clustal V program which allowed to duplicated several times before divergence into the compare each of the IkB ankyrin repeat motif with the best di€erent IkB subfamilies, e.g. long and short inhibitors corresponding motif in the erythroid ankyrin (Higgins and (Bork, 1993). In contrast and most likely because of Sharp, 1988). The ®nal alignment is done by eye using the the recent cloning of Relish, our study allows to de®ne ED program of MUST. the structure of an unique ancestral set of ankyrin The regions which are ambiguously aligned, i.e. outside of repeats which has acquired a RHD before any the RHD and between the repeats of ankyrin motifs, were duplication and only later has been duplicated and excluded from the analysis. From the initial alignment of Rel/ diverged into the di€erent IkB subfamilies. NF-kB sequences containing 1534 sites, 298 sites among which 286 were variable and 269 informative were retained in Considering that NF-kB1, NF-kB2 and Relish are the ®nal alignment. From the initial alignment of IkB shared by the two families, it is tempting to unify the sequences containing 2620 sites, 194 sites among which 182 Rel/NF-kB and IkB families into a large Rel/NF-kB/ were variable and 171 informative remain in the ®nal IkB superfamily where all of these molecules have alignment. The ®nal alignments were used for both distance descended from a common ancestor (Figure 4c). In matrix and parsimony analysis. We have tested the in¯uence such a view, model I proposed above for the Rel/NF- of subtle modi®cations of our alignment in the topology of kB family evolution is favored. Thus, we de®ne a the trees and found that all the branches supported by common evolutionary pathway that begins with an bootstrap values above 70% are not modi®ed by these ancestral set of ankyrin repeats that diverged after di€erences. This argues strongly in favor of the validity of duplication to evolve in various ankyrin proteins in one our conclusions. hand and acquired an ancestral RHD in the other hand. After duplication of the RHD-ankyrin repeat Phylogenetical reconstruction procedures ancestor, the NF-AT lineage separated ®rst losing the Distance matrix was calculated using a Boolean method: ankyrin domain while Relish and NF-kB subfamilies of every change including gaps is considered as 1, identical long ankyrin repeat proteins diverged conserving the amino acids as 0. Tree reconstruction was performed using ancestral structure. Later, a single event of scission the Neighbor Joining program available on MUST taking place before the arthropod separation is together with bootstrap analysis using 1000 replicates necessary to give rise to both the `ready-to-use' Rel (Philippe, 1993). This analysis was performed on several factors (c-Rel, RelA . . .) and the short IkB inhibitors. data sets: alignments containing di€erent amounts of gaps This model of evolution is particularly parsimonious in for both the Rel/NF-kBandIkB sequences. We checked that suppression of the positions with gaps from the contrast to a co-evolution model, as described for the alignment gives trees with identical topologies. For the insulin-NGF gene family and their receptor (Fryxell, Rel/NF-kB family, only the bootstrap values of some 1996), where the ankyrin repeats of all IkBs would internal branches were lowered by the suppression of the have evolved independently and converged to acquire gaps suggesting that gaps carry phylogenetical informa- their common characteristics. Moreover our model tions. For the IkB family the presence of gaps resulted in a may explain why the long IkB protein evolved to decrease of bootstrap values of internal branches of the preferentially interact with p50 and p52 processed from tree. Since these gaps are mainly located between the their N-terminal part and the short IkB to preferen- ankyrin repeats which are in variable numbers between the tially inhibit the `ready-to-use' Rel/NF-kB proteins. di€erent members, the information leading to the de®nition Indeed in such a model, the ankyrin of both NF-kB of the subfamilies (i.e. those concerned in the internal branches of the tree) is likely contained inside the ankyrin precursors and short IkBs preserved the speci®city to motif rather than in the arrangement of these repeats. preferentially inhibit `their N-terminal part' during Parsimony analysis was performed using the 3.0 version of evolution. In conclusion, this study may help to better the PAUP software (Swo€ord, 1991). To limit calculation understand the highly sophisticated regulations existing time, the number of sequences in this analysis was decreased Rel/NF-kBandIkBevolution C Huguet et al 2973 by exclusion of the mouse sequences when human homo- were transformed into evolutionary speed and expressed as a logues are known. We checked that this choice does not alter number of mutations per site and per year (M/S/Y) using the the conclusions of the analysis. On each parsimony analysis, following divergence times: Mus-Rattus; 13 Myr; Homo-Mus 100 bootstrap replicates were performed. and Homo-Rattus: 70 Myr; Mammals-Gallus: 300 Myr; All analysis were performed on a Macintosh Power PC Mammals-Xenopus and Gallus-Xenopus 360 Myr. Data were 6100/66. converted into Pauling units (1PAU=1079 M/S/Y). Evolu- tionary speeds for various molecules were then averaged and expressed with error bars for all the sequences. Determination of evolutionary rates To avoid saturation artifacts generated by the use of widely divergent species evolving at di€erent rates, we Acknowledgements excluded from the data set the Drosophila sequences and We are grateful to Corinne Abbadie and Dominique kept homologues of each molecule in human, rat, mouse, Ste helin for critical reading of the manuscript as well as chicken or Xenopus. Only cRel, RelA, RelB, NF-kB1, NF- to Bernard Vandenbunder for constant support. This work kB2, IkBa and the reference outgroup of ankyrins was supported by grants from the Institut Pasteur de Lille, remained in the analysis. the Centre National de la Recherche Scienti®que, the The data matrices comparing the di€erent homologous GREG and the Association pour la Recherche sur le versions of each molecule were calculated with standard . CH was supported by a fellowship from the errors using MEGA (Kumar et al., 1993) and crude data Association pour la Recherche sur le Cancer.

References

Arenzana-Seisdedos F, Thompson J, Rodriguez MS, Ghosh S, Gi€ord AM, Riviere LR, Tempst P, Nolan GP and Bachelerie F, Thomas D and Hay RT. (1995). Mol. Cell. Baltimore D. (1990). Cell, 62, 1019 ± 1029. Biol., 15, 2689 ± 2696. Gilmore TD, Koedood M, Pi€at KA and White DW. (1996). Atchley WR and Fitch WM. (1995). Proc. Natl. Acad. Sci. Oncogene, 13, 1367 ± 1378. USA, 92, 10217 ± 10221. Grumont RJ and Gerondakis S. (1989). Oncogene Research, Baeuerle PA and Baltimore D. (1996). Cell, 87, 13 ± 20. 4, 1±8. Baeuerle PA and Henkel T. (1994). Annu. Rev. Immunol., 12, Haskill S, Beg AA, Tompkins SM, Morris JS, Yurochko AD, 141 ± 179. Sampson-Johannes A, Mondal K, Ralph P and Baldwin Beg AA, Sha WC, Bronson RT and Baltimore D. (1995). Jr, AS. (1991). Cell, 65, 1281 ± 1289. Genes & Dev., 9, 2736 ± 2746. Hatada EN, Naumann M and Scheidereit C. (1993). EMBO Birkenmeier CS, White RA, Peters LL, Hall EJ, Lux SE and J., 12, 2781 ± 2788. Barker JE. (1993). J. Biol. Chem., 268, 9533 ± 9540. Hatada EN, Nieters A, Wulczyn FG, Naumann M, Meyer R, Bork P. (1993). Proteins, 17, 363 ± 374. Nucifora G, McKeithan TW and Scheidereit C. (1992). Bours V, Burd PR, Brown K, Villalobos J, Park S, Ryseck Proc.Natl.Acad.Sci.USA,89, 2489 ± 2493. RP, Bravo R, Kelly K and Siebenlist U. (1992). Mol. Cell. He ron E, Deloukas P and Van Loon APGM. (1995). Biol., 12, 685 ± 695. Genomics, 30, 493 ± 505. Bours V, Franzoso G, Azarenko V, Park S, Kanno T, Brown Higgins DG and Sharp PM. (1988). Gene, 73, 237 ± 244. K and Siebenlist U. (1993). Cell, 72, 729 ± 739. Hillis DM and Huelsenbeck JP. (1992). J. Heredity, 83, 189 ± Brownell E, Mittereder N and Rice NR. (1989). Oncogene, 4, 195. 935 ± 942. Hoey T, Sun YL, Williamson K and Xu X. (1995). Immunity, Capobianco AJ, Chang D, Mosialos G and Gilmore TD. 2, 461 ± 472. (1992). J. Virol., 66, 3758 ± 3767. Holland PWH and Garcia-Fernandez J. (1996). Dev. Biol., Capobianco AJ, Simmons DL and Gilmore TD. (1990). 173, 382 ± 395. Oncogene, 5, 257 ± 265. Huguet C, Enrietto P, Vandenbunder B and Abbadie C. Chytil M and Verdine GL. (1996). Current Opinion in (1994). Cell Death and Di€erenciation, 1, 71 ± 76. Structural Biology, 6, 91 ± 100. Ikeda T, Honjo K, Hirota Y and Onodera T. (1993). Gene, DavisN,GhoshS,SimmonsDL,TempstP,LiouHC, 133, 237 ± 242. Baltimore D and Bose HJ. (1991). Science, 253, 1268 ± Ip YT, Reach M, Engstrom Y, Kadalayil L, Cai H, 1271. Gonzales-Crespo S, Tatei K and Levine M. (1993). Cell, De Martin R, Vanhove B, Cheng Q, Hofer E, Csizmadia V, 75, 753 ± 763. Winkler H and Bach FH. (1993). EMBO J., 12, 2773 ± Ito CY, Adey N, Bautch VL and Baldwin Jr AS. (1995). 2779. Genomics, 29, 490 ± 495. Dobrzanski P, Ryseck R-P and Bravo R. (1993). Mol. Cell. Ito CY, Kazantsev AG and Baldwin Jr AS. (1994). Nucleic Biol., 13, 1572 ± 1582. Acids Res., 22, 3787 ± 3792. Dubreuil RR and Yu J. (1994). Proc. Natl. Acad. Sci. USA, Ja€ray E, Wood KM and Hay RT. (1995). Mol. Cell. Biol., 91, 10285 ± 10289. 15, 2166 ± 2172. Dushay MS, Asling B and Hultmark D. (1996). Proc. Natl. Kabrun N, Bumstead N, Hayman MK and Enrietto PJ. Acad. Sci. USA, 93, 10343 ± 10347. (1990). Mol. Cell. Biol., 10, 4788 ± 4794. Finco TS and Baldwin Jr, AS. (1995). Immunity, 3, 263 ± 272. Kao KR and Hopwood ND. (1991). Proc. Natl. Acad. Sci. Fracchiolla NS, Lombardi L, Salina M, Migliazza A, Baldini USA, 88, 2697 ± 2701. L, Berti E, Cro L, Polli E, Maiolo AT and Neri A. (1993). Kieran M, Blank V, Logeat F, Vandekerckhove J, Lottspeich Oncogene, 8, 2839 ± 2845. F, Le BO, Urban MB, Kourilsky P, Baeuerle PA and Fryxell KJ. (1996). TIG, 19, 364 ± 369. Israel A. (1990). Cell, 62, 1007 ± 1018. Fujita T, Nolan PP, Liou H-C, Scott ML and Baltimore D. Klement JF, Rice NJ, Car DB, Abbondanzo SJ, Powers GD, (1993). Genes & Development, 7, 1354 ± 1363. Bhatt H, Chen C-H, Rosen CA and Stewart CL. (1996). Geisler R, Bergmann A, Hiromi Y and NuÈ sslein-Volhard C. Mol. Cell. Biol., 16, 2341 ± 2349. (1992). Cell, 71, 613 ± 621. Kopp EB and Ghosh S. (1995). Advances in Immunology, 58, Ghosh G, Van Duyne GD, Ghosh S and Sigler PB. (1995). 1 ± 27. Nature, 373, 303 ± 310. Rel/NF-kBandIkBevolution CHuguetet al 2974 Kumar S, Tamura K and Nei M. (1993). MEGA: molecular Ruben SM, Klement JF, Coleman TA, Maher M, Chen CH evolutionary genetics analysis, version 1.01. The Pennsyl- and Rosen CA. (1992a). Genes & Development, 6, 745 ± vania Stage University, University Park, PA 16802. 760. Lambert S, Yu H, Prchal JT, Lawler J, Ru€ P, Speicher D, Ruben SM, Narayanan R, Klement JF, Chen CH and Rosen Cheung MC, Kan YW and Palek J. (1990). Proc. Natl. CA. (1992b). Mol. Cell. Biol., 12, 444 ± 454. Acad. Sci. USA, 87, 1730 ± 1734. Ryseck RP, Bull P, Takamiya M, Bours V, Siebenlist U, Laudet V, Hanni C, Coll J, Catze¯is F and Stehelin D. Dobrzanski P and Bravo R. (1992). Mol. Cell. Biol., 12, (1992). EMBO J., 11, 1003 ± 1013. 674 ± 684. Laudet V, Niel C, Duterque-Coquillaud M, Leprince D and Scheinman RI, Beg AA and Baldwin Jr AS. (1993). Mol. Stehelin D. (1993). Biochem and Biophys. Res. Com., 190, Cell. Biol., 13, 6089 ± 6101. 8 ± 14. Siebenlist U, Franzoso G and Brown K. (1994). Annu. Rev. Masuda ES, Naito Y, Tokumitsu H, Campbell D, Saito F, Cell. Biol., 10, 405 ± 455. Hannum C, Arai K and Arai N. (1995). Mol. Cell. Biol., Sif S and Gilmore TD. (1993). J. Virol., 67, 7612 ± 7617. 15, 2697 ± 2706. Steward R. (1987). Science, 238, 692 ± 694. McCa€rey PG, Luo C, Kerppola TK, Jain J, Badalian TM, Suyang H, Phillips R, Douglas I and Ghosh S. (1996). Mol. Ho AM, Burgean E, Lane WS, Lambert JN, Curran T, Cell. Biol., 16, 5444 ± 5449. Verdine GL, Rao A and Hogan PG. (1993). Science, 262, Suzuki K, Yamamoto T and Inoue J. (1995). Nucleic Acids 750 ± 754. Res., 23, 4664 ± 4669. Miyamoto S and Verma IM. (1995). Adv. Cancer Res., 66, Swo€ord DL. (1991). Phylogenetical analysis using parsi- 255 ± 292. mony, Version 3.1 Computer program distributed by the MuÈ ller CW, Rey FA and Harrison SC. (1996). Nature Illinois Natural History Survey, Champaign, Illinois. Structural Biology, 3, 244 ± 227. Tannahill D and Wardle FC. (1995). Int. J. Dev. Biol., 39, Muller CW, Rey FA, Sodeoka M, Verdine GL and Harrison 549 ± 558. SC (1995). Nature, 373, 311 ± 317. Tewari M, Dobrzanski P, Mohn KL, Cressman DE, Hsu JC, Naumann M, Nieters A, Hatada EN and Scheidereit C. Bravo R and Taub R. (1992). Mol. Cell. Biol., 12, 2898 ± (1993a). Oncogene, 2275 ± 2281. 2908. Naumann M, Wulczyn FG and Scheidereit C. (1993b). Thanos D and Maniatis T. (1995). Cell, 80, 529 ± 532. EMBO J., 12, 213 ± 222. Thompson JE, Phillips RJ, Erdjument-Bromage H, Tempst Nolan GP, Ghosh S, Liou HC, Tempst P and Baltimore D. P and Ghosh S. (1995). Cell, 80, 573 ± 582. (1991). Cell, 64, 961 ± 969. Verma IM, Stevenson JK, Schwarz EM, Van Antwerp D and Northrop JP, Ho SN, Chen L, Thomas DJ, Timmerman LA, Miyamoto S. (1995). Genes & Development, 9, 2723 ± 2735. Nolan GP, Admon A and Crabtree GR. (1994). Nature, Wasserman SA. (1993). Molecular Biology of the Cell, 4, 369, 497 ± 502. 767 ± 771. Ohno H, Takimo G and McKeithan TW. (1990). Cell, 60, Wilhemsen KC, Eggleton K and Temin HM. (1984). J. 991 ± 997. Virol., 52, 172 ± 182. Philippe H. (1993). Nucleic Acids Res., 21, 5264 ± 5272. Wolfe SA, Zhou P, DoÈ tsch V, Chen L, You A, Ho SH, Ruben SM, Dillon PJ, Schreck R, Henkel T, Chen CH, Crabtree GR, Wagner G and Verdine GL. (1997). Nature, Maher M, Baeuerle PA and Rosen CA. (1991). Science, 385, 172 ± 176. 251, 1490 ± 1493. Zabel U and Bauerle PA. (1990). Cell, 61, 255 ± 265.