Copyright 0 1996 by the Genetics Society of America

Evolution of Antennapedia-Class Genes

Jianzhi Zhang and Masatoshi Nei Institute of Molecular Evolutionaly Genetics and Department of Biology, The Pennsylvania State University, University Park, Pennsylvania 16802 Manuscript received June 13, 1995 Accepted for publication September 21, 1995

ABSTRACT Antennapedia (Antp)-class homeobox genes are involved in the determination of pattern formation along the anterior-posterior axis of the embryo. A phylogenetic analysis of Antpclass homeodo- mains of the nematode, Drosophila, amphioxus, mouse, and human indicates that the 13 cognate group genes of this gene family can be divided into two major groups, i.e., groups I and 11. Group I genes can further be divided into subgroups A (cognategroups 1-2), B (cognate group 3), andC (cognategroups 4-S), and group I1 genes can be divided into subgroups D (cognate groups 9-10) and E (cognate groups 11- 13),though this classificationis somewhat ambiguous. Evolutionary distances among different amino acid sequences suggest that the divergence between group I and group I1 genes occurred - 1000 million years (My) ago, and the five different subgroups were formed by -600 MY ago, probably before the divergence of Pseudocoelomates (e.g., nematodes) and Coelomates (e.g., insects and chordates). Our results showthat the genes that are phylogenetically closeare also closely located in the chromosome, suggesting that the colinearity between the gene expression and gene arrangement was generated by successive tandem gene duplications and that the gene arrangement has been maintained by some sort of selection.

HE homeobox is a highly conserved sequence of In vertebrates, there are four clusters of Antpclass T -180 nucleotides contained in many genes con- homeobox genes, and the genes within and between trolling development. Itencodes the homeodomain the clusters are evolutionarily related. They can be clas- that is capable of binding a DNA motif and regulating sified into 13 cognate gene groups (KAPPEN and RUD- gene transcription (GEHRINGet al. 1994). Homeobox DLE 1993) (see Figure 1).However, none of the clusters genes are involved in thespecification of the body plan, has all13 cognate genes. The amphioxus (Branchiostoma determination of cellfate, and several other basic devel- jloridae), which is thought to be a sister group of verte- opmental processes (MCGINNISand KRUMLAUF 1992; brates, has one cluster of 210 cognate genes (GARCIA- LAWRENCE and MORATA1994). More than 300 homeo- FERNANDEZ and HOLLAND1994). There areeight Antp boxcontaining genes and their relativeshave been class homeobox genes in Drosophilamelanogaster, but identified and sequenced in fungi, plants, and , they are located in two separate clusters on the same and they can be classified into 230 different classes chromosome (LAWRENCE and MORATA1994). Thenem- (KAPPEN et al. 1993; BURGLIN 1994). atode (Caenorhabditisekgans) has one cluster of four The Antennapedia (Antp)-classhomeobox genes are Antpclass homeobox genes (BORGLIN and RUVKUN of special importance, because they specifythe develop- 1993; SALSERand KENYON 1994). The orthologous and mental patterningof the body segments along theante- paralogous relationships of these genes are believed to rior-posterior axis of the animal embryo. In all metazo- be asgiven in Figure 1 (KAPPEN and RUDDLE1993; ans so far examined, this class of genes exists as one or RUDDLEet al. 1994), though the relationships of cog- more clusters of genes in the genome. The arrange- nate gene groups 6, 7, and 8 are somewhat ambiguous ment of the genes in the chromosome is identical or (see KRUMLAUF 1994). highly correlated to the order of the genes that specify There are studies of Antpclass homeobox genes in the anterior-posterior body segments. In the develop many other animals such as hydras(MURTHA et al. 1991; mental process, the 3' end gene of the Antpclass ho- NMTO et al. 1993), flatworms (BARTELSet al. 1993), an- meobox gene complex is first transcribed, and the nelids (DICK and BUSS 1994; SNOWand Buss 1994), other genes are transcribed successively from the 3' to crustaceans (CARTMIRIGHT et al. 1993), acorn worms the 5' end of the complex (DUBOULEand MORATA (PENDLETONet al. 1993), lampreys (PENDLETONet al. 1994; KRUMLAUF 1994). Almost all Antpclass genes lo- 1993), and others. These animals have very different cated in a chromosome have the same transcriptional body plans. However, the DNA sequences available direction with a few exceptions (see DISCUSSION). from these studies are either truncated or their geno- mic organizations are unknown. Therefore,the or- Corresponding author: Masatoshi Nei, Institute of Molecular Evolu- tionary Genetics, The Pennsylvania State University, 328 MuellerLab thologous or paralogous relationships of the genes are oratory,University Park, PA 16802. E-mail: [email protected] unclear.

Genetics 142 295-303 (January, 1996) 296 J. Zhang and M. Nei

Homo sapiens and Mus musculus n nnnnnn b nnnnn 7 5 nn nnn 5 FIGURE1.-Genomic organizations of Anten- nn nn napedia-class homeobox genesin the human (H. sapiens), mouse (M. musculus), amphioxus (B. $oridax), Drosophila (D.melanogaster) and nema- Branchiostoma iloridae nnnnnn tode (C. ekguns). There is an inversion of genes uu ceh-13 and ceh-15 in the nematode. Thearrow in- 10967654321 dicates the order of gene expression. Moving Drosophila Melanogaster along the clusters in a 3' to 5' direction, each successive geneexpresses later in the develop- mental process and more posterior along the an- terior-posterior axis of the animal embryo. Caenorhabditis elegans n n I"

ceh-1 1 mab-5 ceh-13 ceh- 15 5' 3 4 Posterior Anterior Gene expression Late Early

A number of authors have studied the evolutionary tween two amino acid sequences was measured by the relationships of Antpclass homeobox genes with the proportion of different amino acids between the se- aim of understanding the evolution of morphogenesis. quences (pdistance). The reason for using pdistance However,many of these studies are qualitative (e.g., rather than Poisson-correction distance or Dayhoff dis- KAPPEN et al. 1989; SCHUGHARTet al. 1989; HOLLAND tance (see KUMARet al. 1993) is that we are primarily 1992). Schubert et al. (1993) conducted a phylogenetic interested in determining the topology of the phyloge- analysis ofhuman andDrosophila Antpclass homeobox netic tree, andfor this purpose pdistanceis often better genes, but the statistical accuracy of the phylogenetic than other distance measures in our experience (e.g., tree is unclear. We have therefore conducted a detailed HUGHES andNEI 1993). Similar results have been ob- analysis of the evolutionary relationships ofall Antp tained about the pdistance and the Jukes-Cantor dis- class homeobox genes of which the genomic organiza- tance for nucleotide sequences (e.g., SAITOU and NEI tion is known. We studied theevolutionary relationships 1987; NEI 1991).The phylogenetic trees were con- of cognate gene groups aswell as those of the four structed by using the neighbor-joining (NJ) method clusters of Antpclass homeobox genes. (SAITOUand NEI 1987) with pdistance. The NJ method is known to be quite efficient in obtaining reliable trees MATERIAL AND METHODS (NEI 1991; NEI et al. 1995). We did not use other com- monly used methods such as maximum parsimony and Amino acid sequence data. In the present study, we maximum likelihood, because these methods could not used amino acid sequences rather than nucleotide se- handle the large number of sequences we analyzed. To quences, because synonymous nucleotide substitutions root the phylogenetic tree for all the Antpclass homeo- are apparently saturated inmost gene comparisons and box genes, we used two non-Antpclass homeobox se- this would introduce noise in constructing phylogenetic quences from the nematode(ceh-5 and ceh-19) (see KAF- trees (Russo et al., 1996). We used altogether 98 Antp PEN et al. 1993) as outgroups.Computation of p class homeodomain sequences, four from the nema- distances and construction of phylogenetic trees were tode (C. elegans), eight from Drosophila (D. mlanogas- conducted by using the computer software MEGA (KU- tu), 10 from the amphioxus (B.Jioridae), 38 from the MAR et al. 1993). In the constructionof the phylogenetic mouse (Mus musculus) and 38 from the human (Homo tree of the four clusters of genes in vertebrates, we sapiens). All the sequences were obtained from KAPPEN computed pdistances forall pairs of gene clusters from et al. (1993) except amphioxus sequences (GARCIA-FER- the human and the mouse (only cognate gene groups NANDEZ and HOLLAND1994), mouse Hox-a-13 (I-IAACK 1-10 were considered because the amphioxus cluster and GRUSS1993), andHox-c-18 and Hox-c-13 sequences has only cognate genes 1-10). pd'lstances were com- (PETERSONet al. 1994). There were no deletions and puted by using all amino acids involved in each pairwise insertions inthe amino acid sequences (60amino comparison (painvise deletion option). TheNJ tree was acids) of these homeodomains, so alignment was then constructed by using the amphioxus cluster as the straightforward. outgroup. Phylogenetic analysis: The evolutionary distance be- The reliability of the trees obtained was tested by a Evolution of Homeobox Genes 297 bootstrap method (FELSENSTEIN1985) with 1000 repli- our phylogenetic analysis gives no clear-cut support for cations and the confidence probability of each interior these relationships. More extensive experimental study branch (RZHETSKY and NEI 1992; SITNIKOVAet al. 1995). seems to be necessary. The bootstrap test was conducted by using MEGA, Group I1 genes can be divided into two subgroups, whereas the confidence probability test of each interior ie., subgroups D and E. Subgroup D includes cognate branch was done by using a computer program devel- gene groups 9 and 10of the mouse, human, andamphi- oped by N. TAKEZAKI. oxus. The Hox-9 gene of the amphioxus clusters with cognate group 10 genes, but this is probably due to RESULTS sampling error because it is not supported by the boot- strap test. Subgroup E consists of cognate groups 11 - Evolutionary relationships of Anpclass homeobox 13 of the mouse and human, andAbdB (Abdominal-B) genes: Figure 2 shows the phylogenetic tree of 98 Antp of Drosophila. The Drosophila AbdB gene is supposed class homeodomains from the five different species to belong to cognate group 9, and if we consider the mentioned above. It is seen that the Antpclass homeo- low bootstrap values of the subgroups within group 11, box genes canbe classified into two major groups, it seems that inclusion of this gene in subgroup E is group I and group 11. Group I genes can further be due to sampling error.The nematode gene ceh-11, divided into three subgroups, i.e., subgroups A, B, and which was previously aligned with cognate group9 C. This subgrouping is not statistically well supported genes (BURGLINet al. 1991; WANCet al. 1993), appears and thus provisional. SubgroupA includes cognate to be asister group of all other Antpclass genes, though gene groups 1 and 2 of the mouse, human, and amphi- this also could be due to sampling error. oxus; pb () and lab (labial) of Drosophila; Our bootstrap tests indicate that the clusters of major and ceh-13 of the nematode. Inclusion of the gene ceh- groups and subgroups of Antpclass homeobox genes are 13 in this subgroup is reasonable, because ceh-I3 and not as solid as they look. This is apparently because the ceh-15 of the nematode are considered to be inverted number of amino acid sites usedis small and the number and ceh-13 seems to be orthologous with the cognate of sequences analyzed is large. Particularly, the unex- gene group 1 (BURGLINand RUVKUN 1993). Subgroup pected clustering of the nematode and Drosophila genes B corresponds to cognate gene group 3 of the mouse, mentioned above are not statistically supported. By con- human, and amphioxus and is more closely related to trast, many clustersof human and mouse cognate genes subgroup C. However, this relationship is not statisti- are statistically significant. We have also conducted the cally well supported. Subgroup C consists of cognate confidence probability testof each interior branch. This gene groups4-8 of the mouse, human, andamphioxus; test generally gave a higher probability of confidence Dfd (Deformed), Scr (Sex combs reduced), Antp (An- than the bootstrap value, but since this test depends on tennapedia), abdA (abdominal-A) and Ubx (Ultrabith- a number of assumptions (SITNIKOVAet al. 1995), we orax) of Drosophila; and mab-5 and ceh-15 of the nema- have decided to rely on the bootstrap test, which is tode. The nematode genes ceh-15 and mab5 are known to be quite conservative (e.g., ZHARKIKH and LI supposed to be orthologous with cognate gene groups 1992; HILLISand BULL 1993; SITNIKOVAet al. 1995). 4 and 6, respectively (KAFJPENand RUDDLE1993), but Because there are four clusters of Antpclass homeo- they form a cluster separate from that of other genes box genes in the human and mouse and their ortholo- in subgroup C. This cluster makes the current grouping gous and paralogous relationships seem to be more of cognate genes questionable, and the function of the firmly established than those of other organisms, we nematode genes might have differentiated from that of constructed a phylogenetic tree for 76 genes from these the othergenes in subgroup C. However,it is also possi- two organisms. The tree obtained is presented in Figure ble that this clustering pattern occurred by chance be- 3. The topology of this tree is identical with that of the cause the bootstrap values for the clusters within sub- tree in Figure 2 for the partof human and mouse genes group C are generally low. According to the tree in except the branching order of cognate gene groups Figure 2, cognate group 5 genes do not form a mono- belonging to subgroups C and D. However, the boot- phyletic group, butthis could be due to sampling error strap tests show that the branching pattern of this tree caused by the stochastic process of amino acid substitu- is much morereliable than theprevious one. Thesubdi- tion. The homeobox has only 60 codons, so that it is vision of the genes into the two major groups is now difficult to obtain definitive conclusions from aphyloge- statistically significant. Cognate groups 11, 12, and 13 netic analysis alone. Cognate group 6, 7, and 8 genes also form a monophyletic group with a bootstrap value also do not necessarily form a monophyletic group. This of 90%. Many clusters representing different cognate could again be due to sampling error or to functional groups are statistically significant. However, the differentiation. Although the experimentalresults Seem branching order of cognate groups within subgroups C to support the orthologousrelationships of Drosophila and D remains uncertain. Whether subgroupB is closer gene Antp, Ubx, and abdA with vertebrate cognate gene to subgroup Aor C also remains statistically unresolved. groups 6,7, and 8, respectively (BACHILLERet al. 1994), Similarly, the phylogenetic position of cognate group 298 J. Zhang and M. Nei

67 Nematode-ceh-5 Nematode-ceh-19

-ceh.13 71

2 Amphioxus-3 Nematode-mab-5

I C

DrosoDhila-abdA Drosophila-Ahp 59Human-b-7 4 Mouse-b-7

83 Human-d-883 LC Mouse-d-8 Mouse-a-9

Mouse-d-9 Mouse-b-9 Ill AmDhioxus-9

Mouse-a-10

*5" +95 Mouse-010 Human-c-10Mouse-d-10 Drosophila-AWE ~

72Human-a-1 1 59 Mouse-a-11 100 89 Hutn"d-11 Moused-1 1 981 Humane-11 Mouse-c-11 57 HU~-C-l2 1001 MoUse-el2 E Humawd-12 Mo~~d-12 991 Human-a-13 90 Mouse-a-13 100 H~111awc-13 991 Mouse-013 97(Hutnan-d-13 Mouse-d-13 Nematode-ceh-11

0- 0.1 Evolution of Homeobox Genes 299

Huma-1 991 Mouse-a-1 99 Human-b-1 Mouse-b-1 96( 96( Human-dl 54 Mouse-dl

- HUN

52

93

rl Ma

Mouse-C-6 Mouse-a-6 HuMb-6

Mouse-a-7

0 0:r

FIGURE 3.-Phylogenetic tree of 76 human and mouse Antpclass homeobox genes.

10 is unclear. In the present paper, we assume that group than that (47%) for the branching order given cognate group 10 is closer to group 9, because Figure in Figure 3. 2 shows a higher bootstrap value (51%) for the D sub- Evolutionary relationshipsof the four clusters of cog-

FIGURE2.-Phylogenetic tree of 98 Antpclass homeobox genes of the human,mouse, amphioxus, Drosophila, and nematode and two outgroup genes (nematode ceh-5 and ceh-19). The tree is constructed by the NJ method. The numbers for interior branches are bootstrap percentages. Numbers <50 are not shown. The branches are measured in terms of the proportional difference of two amino acid sequences (pdistance) with the scale given below the tree. 300 J. Zhang and M. Nei

Amphioxus 99,- Human-c Mouse-c

Mouse-d

Mouse-a Human-b Mouse-b MYA 13 1211 10 9 8 7 6 5 4 3 2 1 uu-uu E D C BA -1 I II I 0 0.01 FIGURE5.-Evolutionary scenario of 13 cognate groups of FIGURE of clusters of 4.-Evolutionary relationships four Antpclass homeobox gene family.This scenario is speculated Antpclass homeobox genes of the human and mouse with according to the tree ofFigure 3 except the phylogenetic the amphioxus cluster as an outgroup. position of cognate group 10 (see the text). Branch lengths are not proportional to evolutionary time. I and I1 are two nate genes in vertebrates: There are four clusters of major groups and A, B, C, D, and E are five subgroups (see Antpclass homeobox genes in vertebrates, but the evo- the text). lutionary relationships of these clusters are not well established. Our NJ tree of the four clusters from the genes, the first gene duplication seems to have pro- mouse and human is given in Figure 4, and the duced the ancestral genes of cognate groups 9-10 and branching pattern is of the form (b(a(c, d))).However, of the others, and the subsequent duplication events we cannot rule out the branching pattern (a(b(c, d))) generated the remaining cognate genes (Figure 5). A or ((a, b)(c, d)),because the interior branch between cluster of 210 cognate genes was formed before the cluster b and the ancestor of clusters a, c, and d is amphioxus diverged from the vertebrate. Gene clusters not statistically significant. WPENand RUDDLE(1993) a, b, c, and d in vertebrates were probably generated constructed a treeof the four clusters of the human by by two events of genome duplication in the early stage minimizing the number of gene losses or gains. Their of vertebrate evolution, and later a few genes were lost study has shown that the tree ((a, b) (c, d))is one step by deletion events in each cluster. (The zebrafish, Xeno- shorter than (b(a(c, d))).If these four clusters evolved pus, newt, chicken, mouse, and human are known to by genome duplication, the tree ((a, b) (c, d))is more have the four clusters a, b, c, and d) (RUDDLEet al. parsimonious than (a(b(c, d)))or (b(a(c, d))),because 1994.) As speculated by KAPPEN and RUDDLE (1993), the tree ((a, b) (c, d))can be explained by two events the generation of these clusters probably contributed of genome duplication whereas the other trees require to the evolution of more complex organisms. threegenome duplications and three chromosome Origin and evolution of Antpclass homeobox genes losses. At the present time, it is difficult to decide which in invertebrates: The Antpclass homeobox genes exist of the three possible trees is correct. in all metazoans so far surveyed including one of the mostprimitive metazoans, sponges (DEGNANet al. 1995), but notin flagellates, amoeboids and ciliate pro- DISCUSSION tozoans, fungi, algae, and plants (DEGNANet al. 1995). Evolutionary scenarioof Antpclass homeobox genes: It seems that Antpclass homeobox genes originated at Although thebranching order of different cognate the very early stage of metazoan evolution. group genes is not firmly established in the trees of Figure 2 suggests that the divergence of the nema- Figures 2 and 3, it is interesting to speculate the evolu- tode gene ceh-13, the Drosophila lab gene, and tionary history of Antpclass homeobox genes; it may chordates cognate group 1 genes postdated the diver- give some useful information for planning futureexper- gence of cognate gene groups 1 and 2, and that the imental studies. Theoretically it is possible to consider last common ancestor of the nematode, Drosophila, several scenarios, but the following one seems to be and chordates already had cognate gene groups 1 and one of the most plausible ones. First, this gene family 2. Similarly, we can infer from Figure 2 that the last obviously evolvedfrom a single ancestral gene through common ancestor of the nematode, Drosophila, and gene duplication, and it seems that the first duplication chordates also had thecognate group gene3, the ances- of this gene produced the ancestral genes of group I tral gene of subgroup C, and the ancestral gene of and I1 genes. Duplication of the ancestral group I gene group 11. In otherwords, there were at least five cognate subsequently generated subgroups A, B, and C (Figure genes in the last common ancestor of Pseudocoelo- 5). Subgroup A and B genes are involved in the segmen- mates (nematode in this paper) and Coelomates (in- tation of the anterior part, whereas subgroup C genes sects and chordates in this papers). In the nematode, control the intermediate part along theanterior-poste- however, cognate group genes 2 and 3 seem to have rior axis of the animal embryo. In the case of group I1 been lost later, and the ancestral gene of groups 4-8 EvolutionGenes of Homeobox 301 apparently gaverise to genes mab-5 and ceh-15. It is of gene arrangement on thechromosomes and the co- interesting that thepositions of ceh-I5 and ceh-I? in the linearity of gene expression pattern and gene arrange- cluster are inverted according to the sequencecompari- ment, it is often stated that the arrangement order of son and functional analysis (see also BURGLINand RUV- Antpclass genes in the chromosome is important for KUN 1993). This could be the result of a chromosome their functions (e.g., DUNCAN andLEWIS 1982). At least inversion in the nematode. in Drosophila, however, thegene arrangement does Schubert et al. (1993) conducted aphylogenetic anal- not seem to be essential fortheir functions. In two ysis of the human andDrosophila Antpclass genes. The experiments, splitting the bithorax complex into two topology of their tree (Figure 2 of SCHUBERT et al. 1993) pieces did not affect the development of the larva or is different from ours in Figure 3. In their tree, cognate adult (STRUHL1984; TIONGet al. 1987). Atranslocation gene group8 is a sister group with cognate gene groups that separated the genes lab and pb from the rest of the 1-7, so the genesof cognate groups4-8 are not mono- complex also had no effect on the expression of these phyletic as in our tree. To examine the reliability of two genes (HAZELRIGGand KAUFMAN 1982). Moreover, SCHUBERTet aL’s tree, we reanalyzed their data and there are some other types of homeoboxgenes or non- found that the bootstrap values of their tree are lower homeobox genes within the Antpclass homeobox gene than ours, though thedifference is not very large (data cluster of Drosophila (see Figure 8 of BURGLIN1994), not shown).The difference between the two topologies and the existence of these genes does not seem to in- apparently occurred because SCHUBERTet al. used only fluence the expression pattern of Antpclass homeobox transversional nucleotide differences whereas we used genes. Therefore, the colinearity of gene expression amino acid sequences. Our analysis has shown that the and gene arrangement looks to be unimportant. transition/transversion ratio is not high (0.5-1.5) and We note that the phylogenetic tree of 13 cognate that the nucleotide frequencies are nearly the same for gene groups in Figure 3 can be described by the mathe- all genes. Therefore, the use of transversion sites only matical symbol (((1, 2)(3((4, 5)((6, 7)8))))(9(10(11- would lose a considerable amount of phylogenetic in- (12, 13))))).The genes that are phylogenetically close formation. When we used only nonsynonymous with each other are also closely located in the chromo- changes or nucleotide substitutions at first and second some. This suggests that the colinearity of gene expres- codon positions, the tree obtained was similar to our sion and gene order in each cluster partly reflects the tree rather than to SCHUBERTet al.’s. events of tandem duplication of genes. In other words, Origin of cognate gene groups 11-13: The tree in if genes are tandemly duplicated without disturbing the Figure 2 suggests that the divergence of the subgroups transcription order, the genes that are closely located D and E occurred earlier than that of the amphioxus in the chromosome tend to have both sequencesimilar- and vertebrate cognate group 10 genes. Therefore, if ity and functional similarity. Therefore, the colinearity our tree is correct, the amphioxus must have had the of gene arrangement and gene expression can be ex- ancestor of group 11- 13 genes previously. The cognate plained by the hypothesis of successive tandem gene group 11-13 genes have important functions in the duplication. limb formation in the mouse (HAACK and GRUSS 1993; In practice, however, when the genes in a multigene MORGAN and TABIN 1993; DUBOULE1994). Does the family are duplicated, inversion of genes also often oc- amphioxus have any organ that is homologous to the curs so that the transcriptional direction varies from limbs of tetrapods? The amphioxus is thought to be a gene to gene as in the case of major histocompatibility sister group of vertebrates and has some fin-like struc- complex genes of mammals(e.g., HUGHESand NEI tures. However, the fin-like structures areneither 1990) and immunoglobin heavy chain variable region paired nor separated as fish fins are. In theamphioxus, genes of chicken (e.g., REYNAUD et al. 1989). Yet, we cartilage-like materials stiffen the dorsal fin, but nonor- do not see gene inversions very often in the Antpclass mal vertebrate skeleton is found. Therefore, it seems homeobox gene family. (Two exceptions are the Dfd that the fin of the amphioxus, which may be homolo- gene in Drosophila and the ceh-I3 and ceh-15 genes in gous to the limbs of tetrapods, is poorly developed. For the nematode). This is rather striking if we consider this reason, the cognate group 11- 13 genes might have the longhistory of thisgene family. It is therefore quite been lost during the evolution of the amphioxus, or possible thatthe current gene arrangement is im- the loss of these genes might have prevented the fin- portant for the function of the genes and the gene like structures from developing into vertebrate-like skel- arrangement is maintained by some weak selection. If etons. Of course, it is stillpossible that these genes the selection is weak, experimental separation or trans- actually exist in the amphioxus but have not been dis- location of genes would not disturb the development covered, because the 5’ upstream region of the Hox-10 of an individual, yet the gene complex with inverted gene has not been studied extensively (GARCIA-FERNAN- genes will have only a small chance of being fixed in DEZ and HOLLAND1994). the population. It is therefore likely that the current Colinearity of gene arrangement and phylogeny: Be- gene arrangementis maintained by some kind of purify- cause of the remarkable conservation of thesame order ing selection. 302 J. Zhang and M. Nei

Coevolution of Ant$xhsshomeobox genes and ani- the divergence between subgroup A and subgroups B mal body plans: Biologists are interested in the evolu- plus C seems to have occurred -550 X (0.440/0.333) tion of Antpclass homeobox genes mainly because of = 730 MY ago, whereas subgroups B and C apparently the importance of this gene family in determining the diverged -600 MY ago. Similarly, subgroups D and E animal body plan. It has been argued that the addition seem to have diverged -800 MY ago. These computa- of new members of this gene family has been a driving tions suggest that the five subgroups A, B, C, D, and E force of morphological evolution in animals (e.g., LEWIS were formed by 600 MY ago and that the evolution of 1978). Nevertheless, recent studies with insectsand crus- these subgroups predated thedivergence of Pseudocoe- taceans suggest that changes in the number, size, and lomates and Coelomates. Furthermore, following the pattern of body structures are most likely to involve tree topology of Figure 2, we earlier discussed the possi- changes in the timing and spatial regulation of Antp bility that the cognate groups 1 and 2 had already di- class homeobox genes or their target genes in the regu- verged before the divergence of Pseudocoelomates and latory pathway (WARREN et al. 1994; AVEROFand AKAM Coelomates. In other words, there were probably six 1995; CARROLL 1995; CARROLL et al. 1995). They also Antpclass homeobox genes in the common ancestor of suggest that theAntpclass genes control only a very basic Pseudocoelomates and Coelomates. So it seems that body plan on which the segmental diversityevolved. Antpclass homeobox gene family of this ancestral or- This is consistent with the fact that many organisms have ganism was similar to that of the modern segmented the same organization of Antpclass homeobox genes animals though body segmentation probably did not even though their body plans are very different. exist at that time. Since the rate of amino acid substitu- This suggests that the Antpclass genes evolved in the tion in this gene family varies from evolutionary lineage very early stage ofmetazoan evolution. To obtain some to evolutionary lineage and the paleontological esti- idea about the age of this class of genes, we tried to mate of the time of divergence between the nematode estimate the time of divergence between group I and and the other fourorganisms is not accurate, the above group I1 genes. The average pdistance between the estimates are very rough. Yet, they are not inconsistent nematode gene ceh-13 and the cognate group 1 genes with the argument that many new body plans evolved of the other four organisms (Drosophila, amphioxus, owing to the changes of the regulation of Antpclass mouse, and human) is 0.292, whereas the average p homeobox genes or their target genes in the regulatory distance between the nematode genes ceh-15 and mab- pathway rather than the acquisition of new members 5 and the genes of other four organisms in subgroup of this gene family (WARRENet al. 1994; AVEROFand C is 0.274. Therefore, the average divergence between hAM 1995; CARROLI. 1995; CARROLL et a[. 1995). the orthologous homeobox genes of the nematode and We thank NAOKOTAKEZAKI for developing a computer program those of the other four organisms becomes 0.283. By for computing the confidence probabilities of interior branches for contrast, the average pdistance of group I and I1 genes amino acid sequence data and assistance in preparing Figures 2, 3, is 0.443. Therefore, if we assume that Pseudocoelomates and 4. We are also grateful to CIAUDIAKAPPEN for her comments on (nematodes in this paper) and Coelomates (insects and an earlier version of this paper. This work was supported by research grants from the National Institutes of Health and the National Sci- chordates in this paper) diverged 550 million years ence Foundation to M.N. (MY) ago (KNOLL 1992), thedivergence between group I and group I1 genes is estimated to be 900 ago. MY LITERATURE CITED In this computation we used pdistance, which is not proportional to evolutionary time. This can be rectified AVFROF,M., and M. AKAM, 1995 Hox genes and the diversification of insect and crustacean body plans. Nature 376: 420-423. ifwe use Poisson-correction distance. Poisson-correc- BACHILLER,D., A. MACIAS,D. DUROULE andG. MORATA, 1994 Conser- tion distance (d)can be obtained by d = -ln(l - p), vation of a functional hierarchy between mammalian and insect where is the proportion of different amino acids be- Hox/Hom genes. EMBO J. 13: 1930-1941. p BARTELS,J. L., M. T. MURTHA and F. H. RUDDLE,1993 Mutiple tween two sequences. This distance becomes 0.333 be- Hox/HOM-class in Platyhelminthes. Mol.Phylo- tween the nematode and the chordates plus Drosophila genet. Evol. 2: 143-151. and 0.585 for the divergence between group I and BORGLIN,T. R., 1994 A comprehensive classification of homeobox genes, pp, 27-71 in Guidebook to the Homobox Genes, edited by group I1 genes. Therefore, we obtain 1000 My as the D. DUBOIJIX.Oxford University Press, New York. time of divergence between group I and I1 genes. BURGLIN,T. R., and G. RWKUN,1993 The Caenorhabditis ekgans These estimates are certainly very crude, but if they homeobox gene cluster. Curr. Opin. Gen. Dev. 3: 615-620. BORGLIN,T. R., G. RWKUN,A. COULSON,N. C. HAWHINS,J. D. arecorrect, the Antpclass homeoboxgene family MCGHEEet al., 1991 Nematodehomeobox cluster. Nature evolved in the very early stage of metazoan evolution 351: 703. CARROI.L, S. B., 1995 Homeotic genes and the evolution of arthro- (KNOLL 1992; CONWAYMORRIS 1993). Using similar cal- pods and chordates. Nature 376: 479-485. culations, we obtain a d distance of 0.440 for the diver- CARROI.L,S. B., S. D. WEATHERBEEandJ. A. LANGEIAND,1995 Home- gence between subgroup A and subgroups B plus C, otic genes and the regulation and evolution of insect wing num- 0.365 between subgroups B and C, and 0.485 between her. Nature 375: 58-61. CARTWRIGHT,P., M. DICKand L. M. BUSS, 1993 HOM/Hox type subgroups D and E. In the above we have seen that a homeoboxes in the chelicerate Limulus polyphemus. Mol. Phylo- d distance of 0.333 corresponds to 550 MY. Therefore, genet. Evol. 2 185-192. Evolution of Homeobox Genes 303

CONWAYMORRIS, S., 1993 The fossil record and the early evolution of homeobox genes in development and evolution. Proc. Natl. of the Metazoa. Nature 361: 219-225. Acad. USA 88: 10711-10715. DEGNAN,B. M., S. M. DEGNAN,A. GIUSTIand D. E. MORSE, 1995 A NAITO,M., H. ISHIGURO, T.FUJISAWA and Y. KUROSAWA,1993 Pres- hox/hom homeobox gene in sponges. Gene 155: 175-177. ence of eight distinct horneobox-containing genesin cnidarians. DICK,M. H., and L. M.BUSS, 1994 APCR-based survey of homeobox FEBS Lett. 333: 271-274. genes in Ctenodrilus serratus (Annelida: polychaeta). Mol. Phylo- NEI,M., 1991 Relative efficiencies of different tree-making methods genet. Evol. 3: 146-158. for molecular data, pp. 90-128 in Phylogenetic Analysis of DNA DUBOULE,D., 1994 How to make a limb? Science 268: 575-576. Sequences, edited by M. M. MIYAMOTOand J. CRA~RAFT.Oxford DUBOULE,D., and G. MORATA,1994 Colinearity and fhctional hier- University Press, New York. archy among genes of the homeotic complexes. Trends Genet. NEI, M., N. TAKEZAKIand T. SITNIKOVA,1995 Assessing molecular 10: 358-364. phylogenies. Science 267: 253-255. DUNCAN,I., and E. B. LEWIS,1982 Genetic control of body segment PF,NDI.ETON,J. W., B. K. NAGAI,M. T. MURTHAand F. H. RUDDIX, differentiation in Drosophila, pp. 533-554 in Developmental Order 1993 Expansion of the family and the evolution of Its Orig.ln and Regulation, edited by S.S. SUBTELNY.A.R. Liss, chordates. Proc. Natl. Acad. Sci. USA 90: 6300-6304. New York. PETERSON,R. L., T. PAPENBROCK,M. M.DAVDA and A. AWGULE- FELSENSTEIN, J., 1985 Confidence limits on phylogenies: an ap- WITSCH,1994 Themurine Hoxc clustercontains five neigh- proach using the bootstrap. Evolution 39: 783-791. boring AbdBrelated Hox genes that show unique spatially coordi- GARCIA-FERN~DEZ,J., and P. W. H. HOLIAND,1994 Archetypal nated expression in posterior embryonic subregions. Mech. Dev. organization of the amphioxus Hox gene cluster.Nature 37: 47: 253-260. 563-566. REYNAUD,C.-A,, A. DAHAN,V. ANQUIT and J.-C. WEII.I.,1989 Somatic GEHRING,W. J., Y. Q. Qm,M. BILLETER,K FURUKUBO-TOKUNAGA, hyperconvertion diversifies the single VH gene of the chicken A.F. SCHIERet a/., 1994 Homeodomain-DNA recognition. Cell with a high incidence in the D region. Cell 59: 171-183. 78: 211-223. RUDDLE,F. H.,J. L. BARTELS,K. L. BENTLEY,C. KAPPEN, M. T. MURTHA HAACK,H., and P. GRLISS,1993 The establishment of murine Hox- ~t a/., 1994 Evolution of HOX genes. Annu. Rev. Genet. 28: 1 expression domains during patterning of the limb. Dev. Biol. 423-442. 157: 410-422. Russo, C. A. M.,N. TAKEZAKI, and M. NEI, 1996 Efficiencies of HAZELRIGG,T., and T. C. KAUFMAN,1982 Revertants of dominant differentgenes and differenttree-building methods in recov- associated with the Antennapedia gene complex of Dre ering a known vertebrate phylogeny. Mol. Biol. Evol. (in press). sophila mehnoguster: cytology and genetics. Genetics 105: 581 -600. RZHETSKY,A,, and M. NEI,1992 A simple method for estimating and HILLIS,D. M.,and J. J. BULL,1993 An empirical test of bootstrapping testing minimum-evolution trees. Mol. Biol. Evol. 9: 945-967. as a method for assessing confidence in phylogenetic analysis. SAITOU,N., and M. NEI, 1987 The neighborjoining method: a new Syst. Biol. 42: 182-192. method for reconstructing phylogenetic trees. Mol. Biol. Evol. HOLIAND,P., 1982 Homeobox genes invertebrate evolution. BioEs- 4 406-425. says 14 267-273. SALSER,S. J., and C. KENYON,1994 Patterning C. elegans: homeotic HUGHES,A. L., and M. NEI, 1990 Evolutionary relationships of class cluster genes, cell fates and cell migrations. Trends Genet. 10: I1 major-histocompatibility-complex genes in mammals. Mol. 159-164. Biol. Evol. 7: 491-514. SCHUBERT,F. R., K. NIESELT-STRUWEand P. GRUSS,1993 The Anten- HUGHES,A. L., and M. NEI, 1993 Evolutionary relationships of the napedia-type homeobox genes have evolved from three precur- classes of major histocompatibility complex genes. Immunoge- sors separated early in metazoan evolution. Proc. Natl. Acad. Sci. netics 37: 337-346. USA 90: 143-147. KAPPEN,C., and F. H. RUUDLE, 1993 Evolution of a regulatory gene SCHUGHART,IC, C. KAPPEN and F. H. RUDDLE,1989 Duplication of family: HOM/HOX genes. Curr. Opin. Gen. Dev. 3 931-938. large genomic regions during theevolution of vertebrate homeo- KAPPEN, C., K SCHUGHARTand F. H. RUDDLE,1989 Two steps in box genes. Proc. Natl. Acad. Sci. USA 86: 7067-7071. the evolution of Antennapetia-class vertebrate homeobox genes. SITNIKOVA,T., A. RZHETSKY and M. NEI, 1995 Interior-branch and Proc. Natl. Acad. Sci. USA 86: 5459-5463. bootstrap tests of phylogenetic trees. Mol. Biol. Evol.12 319-333. KAPPEN, C., K SCHUGWRTand F. H. RUDDLE, 1993 Early evolution- SNOW,P., and L. W.BUSS, 1994 HOM/Hox-Type homeoboxes from ary origin of major homeodomain sequence classes. Genomics Stylaria lacustris (Annelida: oligochaeta). Mol. Phylogenet. Evol. 18: 54-70. 3: 360-364. KNOLL, A. H., 1992 The early evolution of eukaryotes: a geological STRUHL,G., 1984 Splitting thebithorax complex of Drosophila. perspective. Science 256: 622-627. Nature 308: 454-457. KRUMIAUF,R., 1994 Hox genes in vertebrate development. Cell 78: TIONG,S. Y., J. R. WHITTLE and M.C. GRIBBIN,1987 Chromosomal 191-201. continuity in theabdominal region of the bithoraxcomplex KUW, S., K. TAIVIURAand M. NEI, 1993 MEGA Molecular Evolu- of Drosophila is not essential for its contribution to metameric tionary Genetics Analysis, version 1.01. Pennsylvania State Uni- identity. Development 101: 135-142. versity, University Park, PA. WANG, B. B., M.M. MULLER-~MMERGI.UCK,J. AUSTIN, N. T. ROBINSON, A. CHISHOLMrt a/.,1993 A homeotic gene cluster patterns the LAWRENCE, P., and G. MORATA,1994 Homeobox genes: their func- anterioposterior body axis of C. ekgans. Cell 74: 29-42. tion in Drosophila segmentation and pattern formation.Cell 78: WARREN,R. W., L. NAGY,J. SEI.EGUE,J. GATESand S. CARROI.~.,1994 181 - 189. Evolution of homeotic gene regulation and the function in flies LEWIS,E. B., 1978 A gene complex controlling segmentation in and butterflies. Nature 372: 458-461. Drosophila. Nature 276: 565-570. Z~KH,A., and W.-H. LI, 1992 Statistical properties of bootstrap MCGINNIS,W., and R. KRUMLAUF,1992 Homeobox genes and axial estimation of phylogenetic variability from nucleotide sequences: patterning. Cell 66 283-302. I. Four taxa with a molecular . Mol. Biol.Evol. 9: 1119- MORGAN,B. A,, and C. J. TABIN,1993 The role of homeobox genes 1147. in limb development. Curr. Opin. Gen. Dev. 3: 668-674. MURT~,M. T., J. F. LECKMANand F. H. RUDDLE,1991 Detection Communicating editor: N. TAKAHATA