letters to nature

amniote divergence at 360 Myr ago also agrees with the fossil- A molecular timescale for based estimate10,14. Fewer genes are available to time the earliest divergences among vertebrates, but molecular times (of 564 and vertebrate evolution 528 Myr ago) are consistent with the Late Cambrian fossil record for the first appearance of vertebrates (at 514 Myr ago)14. Sudhir Kumar & S. Blair Hedges A striking pattern revealed by our molecular divergence times is Department of Biology and Institute of Molecular Evolutionary Genetics, the Cretaceous origin of all modern orders of examined 12 15 208 Mueller Laboratory, Pennsylvania State University, University Park, (Fig. 3). Earlier molecular and fossil studies found evidence that Pennsylvania 16802, USA at least some mammalian divergences occurred in the Cretaceous, ...... leaving open the possibility of a gradual diversification of orders into the early Cenozoic. Molecular times now indicate that at least A timescale is necessary for estimating rates of molecular and five major lineages of placental mammals (Edentata, Hystri- morphological change in organisms and for interpreting patterns cognathi, Sciurognathi, Paenungulata, Archonta þ Ferungulata) 1–9 of macroevolution and biogeography . Traditionally, these times may have arisen in the Early Cretaceous, Ͼ100 Myr ago, and that have been obtained from the fossil record, where the earliest most mammalian orders were involved in a Cretaceous radiation representatives of two lineages establish a minimum time of that predated the Cretaceous/Tertiary extinction of the dinosaurs 10 divergence of these lineages . The clock-like accumulation of (Fig. 3). The origin of most mammalian orders seems not to be tied sequence differences in some genes provides an alternative to the filling of niches left vacant by dinosaurs, but is more likely to 11 method by which the mean divergence time can be estimated. be related to events in Earth history12. Similarly, a mid-Cretaceous Estimates from single genes may have large statistical errors, but divergence was obtained for the bird orders Anseriformes and multiple genes can be studied to obtain a more reliable estimate of Galliformes12,17. 1,12–13 divergence time . However, until recently, the number of Multigene divergence times within several orders of mammals genes available for estimation of divergence time has been limited. compare closely with fossil-based estimates. For example, molecular Here we present divergence-time estimates for mammalian orders divergence times from humans to chimpanzees, gorillas, gibbons, and major lineages of vertebrates, from an analysis of 658 nuclear and Old World monkeys are close to currently accepted dates genes. The molecular times agree with most early (Palaeozoic) and from the fossil record4,10,18. The orangutan molecular divergence late (Cenozoic) fossil-based times, but indicate major gaps in the Mesozoic fossil record. At least five lineages of placental mammals a arose more than 100 million years ago, and most of the modern Myr Mammals Birds orders seem to have diversified b efore t he Cretaceous/Tertiary 0 extinction of the dinosaurs. Molecular clocks are first c alibrated w ith a k nown t ime of divergence and then used to estimate divergence times of other 100 species. The divergence of birds and mammals provides a reliable calibration point with which to anchor molecular clocks. The earliest ancestors of mammals (synapsids) and birds (diapsids) 200 are lizard-like and first a ppear i n t he C arboniferous p eriod, at Synapsids Diapsids ϳ310 million years (Myr) ago10,14 (Fig. 1a). The fact that the fossil record10,14 documents a morphological transition from lobe-finned fishes t o s tem t etrapods a t 370–360 M yr a go, a nd r ecords the 300 310 appearance of stem amphibians at 338 Myr ago, indicates that the time of the diapsid–synapsid split (within amniotes) is unlikely to be a considerable underestimate. Alternatively, multiple calibration b points based on the mammalian fossil record may be used, but this ᐉ A A might result in substantial underestimates of divergence time12,15. 2 We used 658 genes, representing 207 vertebrate species, to t A estimate divergence times by two methods (see also Fig. 1b; A Methods; Supplementary Information). Taxonomic biases in the T 1 ᐉ sequence databases resulted in a predominance of mammalian 1 sequences studied. For estimates derived from large numbers of B genes, distributions of divergence-time estimates are approximately normal (Fig. 2a–i; Methods). These distributions show considerable C dispersion around the peak, as reflected i n h igh c oefficients of variation. On average, standard errors of 10% were obtained with ϳ Figure 1 Estimation of divergence times. a, Calibration. The arrow marks the first 10 genes, 5% with 50 genes, and 3% with 100 genes. Multigene time appearance of synapsids (ancestors of mammals) and diapsids (ancestors of estimates obtained without rate-constancy tests were nearly iden- birds) in the fossil record at 310 Myr ago. Reconstructions of an early synapsid tical to those obtained with stringent rate testing (Fig. 2j–l). This (Varanosaurus) and stem diapsid (Hylonomus) are shown. The dark shading indicates that there are probably no underlying directional biases in represents the reptilian portion and the lighter shading represents the avian and the data. mammalian portion of the phylogeny. b, Two methods for dating the unknown Molecular times for the origin of the major lineages of vertebrates divergence time (t) between A1 and A2 when A and B diverged at calibration time T in the Palaeozoic and early Mesozoic eras are similar to those that (Myr ago). In the average distance method, t ¼ d12 =ð2rÞ, where r ¼ ðd1B þ d2B Þ=ð4TÞ are based on the fossil record14 (Fig. 3). The molecular time estimate is the rate of change for lineages A and B, and dij is the number of substitutions per for the marsupial–placental split, 173 Myr ago, corresponds well site between sequences i and j. In the lineage-specific m ethod, t 12 ¼ d12 =ð2rA Þ with the fossil-based estimate (178–143 Myr ago)16. The bird– (where rA ¼ ᐉA =TÞ; alternatively, t is based on the length of one of the two lineages; crocodilian divergence is slightly younger than the earliest fossils that is, t ¼ ᐉ1=r or t ¼ ᐉ1 =rA, where ᐉA and ᐉ1 are estimated by the ordinary least suggest10, at 240 Myr ago, but this difference is less than one squares method (C ¼ outgroup). standard error. The molecular estimate of the lissamphibian– Nature © Macmillan Publishers Ltd 1998 NATURE | VOL 392 | 30 APRIL 1998 917 letters to nature time (8.2 Myr ago) corresponds with the age of the unique skull of forms are known. Recent findings of 85-Myr-old placental fossils the fossil hominoid Sivapithecus19, which is usually placed on the from Central Asia15–16 are also evidence for the presence of mam- orangutan lineage, but is about 4 Myr younger than the earliest teeth malian fossils in this Mesozoic gap. and jaw fragments assigned to Sivapithecus19. The Sivapithecus– Our molecular timescale for vertebrate evolution will be useful in orangutan association itself has been questioned20,21. Molecular time calibrating local molecular clocks and in estimating intraordinal estimates among cetartiodactyls (whales and artiodactyls) and for divergence times more reliably, especially in groups with poor fossil the catarrhine–platyrrhine and feliform–caniform divergences are records. Molecular times also provide an independent measure of close to, or only slightly older than, fossil-based estimates. the tempo and mode of morphological change. For example, the In contrast, molecular divergence times among sciurognath sudden appearance (in the Early Tertiary fossil record) of mamma- (Fig. 3) are roughly four times older than their fossil- lian and avian orders, which show large morphological differences, based estimates22, as was found previously1,7,23. Because these times has been taken to imply rapid rates of morphological change at that were estimated from many genes (343) and did not change when a time14,24. Now, the possibility of 20–70 Myr of prior evolutionary lineage-specific method (Fig. 1b) was used (e.g., a divergence time history relaxes that assumption and suggests a greater role for Earth of 41 Myr ago was obtained for the –rat divergence), the history in the evolution of terrestrial vertebrates12,25. An accurate difference cannot be attributed to stochastic error or increased rate knowledge of divergence times can help to direct the search for of substitution in rodents. Furthermore, increased stringency of the ‘missing’ fossils and test hypotheses of macroevolution. Ⅺ rate-constancy test resulted in similar time estimates (Fig. 2j–l)...... Overall, fossil-based and molecular times are in relatively close Methods agreement (Fig. 4a), except for the origin of placental orders and the Sequence retrieval and tests of molecular clocks. Amino-acid sequences of early history of rodents. The average difference between molecular nuclear genes were obtained from the HOVERGEN26 database (Genbank and fossil-based dates for Mesozoic comparisons is large (30%) Release 97) and all 5,050 gene families were manually examined. Alignments (Fig. 4b). An interpretation of this gap in the Mesozoic fossil record were retrieved whenever data were available to time at least one divergence. is that molecular times are overestimates. However, this is unlikely Genes under strong positive selection (for example, major histocompatibility as many earlier (Palaeozoic) and later (Cenozoic) fossil and mole- complex genes) and sequences with ambiguous orthologies and extensive cular dates show close agreement, especially those dates involving alignment gaps were excluded. Gene phylogenies were scrutinized further taxa with well documented fossil records and where transitional and genes (or sequences) showing extensive rate variation among lineages

a d g j 25 100 N = 56 15 N = 47 N =343 mM m M mM 100 n = 50 n = 41 n =309 20 m = 23 Myr 12 m = 65 Myr 80 m = 41 Myr M = 21 Myr M = 70 Myr M = 38 Myr 75 V = 37% V = 26% V = 37% 15 9 60

50 10 6 40

Pairs (%) Number of genes 5 3 20 25

0 0 0 0 9 18 27 36 45 30 45 60 75 90 105 5 15 25 35 45 55 65 75 85 0

b mM e mM h k N = 52 mM 12 n = 46 15 N = 56 40 N =119 100 m = 66 Myr n = 50 n =107 35 10 M = 63 Myr 12 m = 83 Myr m = 91 Myr V = 23% M = 79 Myr 30 M = 88 Myr 75 8 V = 35% V = 23% 9 25 6 20 50 6 4 15 Genes (%) 10 25 Number of genes 2 3 5 0 0 30 40 50 60 70 80 90 100 0 25 50 75 100 125 150 40 60 80 100 120 140 160 0

c f i l mM 375 100 35 m M 40 M m N =333 N =120 N =107 n =299 30 n =108 35 n = 95 80 m = 92 Myr m =112 Myr m =441 Myr 30 125 M = 92 Myr 25 M =114 Myr M =360 Myr 25 60 V = 25% 20 V = 32% V = 43% 100 20 15 75 40 15 Time (Myr) 50 10 10

Number of genes 20 5 5 25 0 0 0 0 40 60 80 100 120 140 160 45 65 85 105 125 145 165 185 205 200 300 400 500 600 700 800 900 None 3.84 2.71 1.64 0.45 Time (Myr) Time (Myr) Time (Myr) χ2 cut-off

Figure 2 Histograms (a–i) of distributions of single-gene divergence times for Paenungulata); g, mouse and rat; h, Primates and Lagomorpha; and i, Amphibia nine multigene time estimates, and graphs (j–l) of the effects of increased and Amniota. j, Percentage of pairwise comparisons not rejected. k, Percentage stringency of the rate-constancy test (corresponding to areas of 5% (recom- of genes not rejected. l, Time estimates for divergences a–i. Symbols for mended),10%, 20%, and 50%, of the ␹2 rejection curve) for the same divergences. histograms: M, mode; m, mean; N, total number of genes; n, number of genes Divergence of: a, Hominoidea and Cercopithecoidea; b, and Cricetidae; used after removal of outliers; V, coefficient of variation. The locations of m and M c, Archonta and Ferungulata; d, Ruminantia and Suidae; e, Carnivoraþ are shown. Myr, millions of years ago. Perissodactyla and Cetartiodactyla; f, Rodentia and (Archonta þ Ferungulataþ

Nature © Macmillan Publishers Ltd 1998 918 NATURE | VOL 392 | 30 APRIL 1998 letters to nature

Myr R Figure 3 A molecular timescale for vertebrate evolution. All times indicate Myr

Chimp 5.5 ± 0.2 (5) separating humans (or the largest sister group containing humans) and the group 10 Gorilla 6.7 ± 1.3 (6) shown, except when the comparative groups are separated by a slash (/). Time Orangutan 8.2 ± 0.8 (6) estimates are shown with Ϯ1 s.e.m. and the number of genes used is given in 20 Gibbon 14.6 ± 2.8 (4) parentheses. Three groups of mammalian orders are: Archonta (Primates, Bovinae/Caprinae 19.6 ± 1.8 (16) Scandentia, Dermoptera, Chiroptera, and Lagomorpha), Ferungulata (Carnivora, 30 Cervoidea/Bovoidea 22.8 ± 4.7 (3) Cetartiodactyla, and Perissodactyla), and Paenungulata (Hyracoidea, Probosci- Cercopithecidae 23.3 ± 1.2 (56) dea, Sirenia). Cam, Cambrian; Carbonif, Carboniferous; Dev, Devonian; PC, CENOZOIC 40 ± Tertiary Mouse/Rat 40.7 0.9 (343) Precambrian; Perm, Permian; Prot, Proterozoic; Sil, Silurian; Tri, Triassic. Feliformia/Caniformia 46.2 ± 5.7 (10) 50 Platyrrhini 47.6 ± 8.3 (9) were removed. The 1DN test27 was conducted for all relevant triplets of Suidae/Cetacea 58.2 ± 22.3 (3) 60 sequences, and pairs in which rate constancy was rejected in any of these Suidae/Ruminantia 64.7 ± 2.6 (47) Muridae/Cricetidae 65.8 ± 2.2 (52) comparisons were excluded (22% of pairs out of 7,943 pairs were rejected). 70 Gerbillidae/Muridae 66.2 ± 7.6 (4) Estimating evolutionary rates for protein-clock calibration. Gene-specific Carnivora/Perissodactyla 74.0 ± 5.7 (11) evolutionary rates were estimated by rB–M ¼ davg=ð2 ϫ 310Þ substitutions per 80 ± Cetartiodactyla/Carn.+Perris. 83.0 4.0 (56) site per Myr, where d is the average pairwise sequence divergence between Scandentia 85.9 ± 11.5 (3) avg 90 Lagomorpha 90.8 ± 2.0 (119) bird (B) (Gallus gallus (Gga)) and (M) (Homo sapiens (Hsa), Mus Ferungulata 92.0 ± 1.3 (333) musculus (Mmu), and/or Rattus norvegicus (Rno)) sequences. In the absence of 100 Paenungulata 105 ± 6.6 (4) bird sequences, we used a mammalian calibration, based indirectly on the bird– mammal calibration, where the divergence time of rodents and primates (or 110 109 ± 3.2 (52)

Cretaceous 12 Sciurognathi 112 ± 3.5 (120) artiodactyls) of 110 Myr ago was obtained from an analysis of 108 genes; davg 120 Galliformes/Anseriformes 112 ± 11.7 (5) was computed by comparing Mmu, Rno, and Hsa (or Bos taurus) sequences. The divergence times estimated using the poisson correction distance (pre- MESOZOIC 130 Edentata 129 ± 18.5 (3) sented here) were similar to those estimated using the gamma distance (shape parameter ¼ 2) (ref. 28). 140 Estimating average divergence time from multiple genes and species. For each gene, the divergence time between two groups was estimated by taking the 150 Scale changes average of divergence times from all constant-rate species pairs belonging to Marsupialia 173 ± 12.3 (10) those groups (Fig. 1b). This procedure was repeated for every gene, and an Jurassic 200 Neobatrachia/Archeobatrachia 197 ± 43.2 (6) average multigene time estimate was calculated. Whenever five or more genes ± Tri. Aves/Crocodylia 222 52.5 (4) were present, the upper and lower 5% of these estimates were excluded (before 250 averaging) to minimize the influence of outliers (at least the highest and lowest Lepidosauria 276 ± 54.4 (5)

Perm. time estimates were excluded). Multigene distributions for the amphibian– 300 amniote (Fig. 2i) and actinopterygian–tetrapod divergences included a higher Aves 310 (Calibration) number of early time estimates. As gene orthology was more difficult to

Carbonif. 350 determine for those basal groups, we interpreted such unusually high time Lissamphibia 360 ± 14.7 (107) estimates as representing paralogous comparisons. Therefore, we used the

Dev.Sil. 400 mode to measure the central value because the mean is sensitive to such outliers. We discarded all estimates that were based on only one or two genes. PALAEOZOIC 450 ± Actinopterygii 450 35.5 (44) Received 14 November 1997; accepted 2 February 1998.

500 1. Wilson, A. C., Carlson, S. S. & White, T. J. Biochemical evolution. Annu. Rev. Biochem. 46, 573–639 (1977). Chondricthyes 528 ± 56.4 (15)

Cam. Ordovician 2. Nei, M. Molecular Evolutionary Genetics (Columbia Univ. Press, New York, 1987). 550 3. Novacek, M. J. Mammalian phylogeny: shaking the tree. Nature 356, 121–125 (1992). Agnatha 564 ± 74.6 (13)

PC. 4. Martin, R. D. Primate origins: plugging the gaps. Nature 363, 223–234 (1993). Prot. 5. Avise, J. C. Molecular Markers, Natural History and Evolution (Chapman & Hall, New York, 1994). 6. Hallam, A. An Outline of Phanerozoic Biogeography (Oxford Univ. Press, New York, 1994). 7. Easteal, S., Collet, C. & Betty, D. The Mammalian Molecular Clock (R. G. Landes, Austin, TX, 1995). a b 8. Gerhart, J. & Kirschner, M. Cells, Embryos, and Evolution (Blackwell Scientific, Malden, 600 150 Massachusetts, 1997). Cenozoic Mesozoic 9. Gee, H. Before the Backbone (Chapman & Hall, New York, NY, 1996). 500 120 10. Benton, M. J. The Fossil Record 2 (Chapman & Hall, London, 1993). 11. Zuckerkandl, E. On the molecular evolutionary clock. J. Mol. Evol. 26, 34–46 (1987). 400 90 12. Hedges, S. B., Parker, P. H., Sibley, C. G. & Kumar, S. Continental breakup and the ordinal diversification of birds and mammals. Nature 381, 226–229 (1996). 300 13. Takezaki, N., Rzhetsky, A. & Nei, M. Phylogenetic test of the molecular clock and linearized tree. Mol. 60 200 Biol. Evol. 12, 823–833 (1995). Fossil time (Myr) 14. Benton, M. J. Vertebrate Paleontology (Chapman & Hall, New York, 1997). 30 100 15. Archibald, J. D. Fossil evidence for a late Cretaceous origin of ‘‘hoofed’’ mammals. Science 272, 1150– 1153 (1996). 0 0 16. Kielan-Jaworowska, Z. Interrelationships of Mesozoic mammals. Hist. Biol. 6, 185–202 (1992). 0 100 200 300 400 500 600 0 30 60 90 120 150 17. Cooper, A. & Penny, D. Mass survival of birds across the Cretaceous-Tertiary boundary: molecular Molecular time (Myr) Molecular time (Myr) evidence. Science 275, 1109–1113 (1997). 18. Kay, R. F., Ross, C. & Williams, B. A. Anthropoid origins. Science 275, 797–804 (1997). Figure 4 Comparison of fossil-based and molecular estimates of divergence time 19. Ward, S. in Function, Phylogeny, and Fossils (eds Begun, D. R., Ward, C. V. & Rose, M. D.) 269–290 (Plenum, New York, 1997). ¼ in vertebrates. a, Plot of all values (correlation coefficient 99%). The solid line 20. Pilbeam, D. Genetic and morphological records of the Hominoidea and hominid origins: a synthesis. indicates a 1:1 relationship; the closed circle represents the calibration point. b, Mol. Phyl. Evol. 5, 155–168 (1996). Close-up of the region from 0 to 150 Myr ago. Open diamonds show fossil dates 21. Benefit, B. R. & McCrossin, M. L. Earliest known Old World monkey skull. Nature 388, 368–371 (1997). 15,16 based on the 85-Myr placental fossils from Asia , under the assumption that 22. Jaeger, J.-J., Tong, H. & Denys, C. The age of the Mus-Rattus divergence: paleontological data some represent stem (or basal) ferungulates and archontans. Fossil-based compared with the molecular clock. C. R. Acad. Sci. Paris 302, 917–922 (1986). 4,10,14,16,18,20,22,29 30 23. Parker, P. H. An Improved Estimate of the Mouse-Rat Divergence Time and Rates of Amino Acid estimates were calculated using a method described elsewhere . Substitution in Mammals and Birds Thesis, Pennsylvania State Univ. (1996). Myr, millions of years ago. 24. Carroll, R. L. Vertebrate Paleontology and Evolution (W. H. Freeman and Co., New York, 1988).

Nature © Macmillan Publishers Ltd 1998 NATURE | VOL 392 | 30 APRIL 1998 919 letters to nature

25. Springer, M. S. et al. Endemic African mammals shake the phylogenetic tree. Nature 388, 61–63 (1997). ProtoHox gene cluster. Furthermore, we show that amphioxus 26. Duret, L., Mouchiroud, D. & Gouy, M. HOVERGEN: a database of homologous vertebrate genes. ParaHox genes have co-linear developmental expression patterns Nucleic Acids Res. 22, 2360–2365 (1994). in anterior, middle and posterior tissues. We propose that the 27. Tajima, F. Simple methods for testing the molecular evolutionary clock hypothesis. Genetics 135, 599– 607 (1993). origin of distinct Hox and ParaHox genes by gene-cluster duplication 28. Kumar, S., Tamura, K. & Nei, M. MEGA: Molecular Evolutionary Genetic Analysis (Pennsylvania State facilitated an increase in body complexity during the Cambrian Univ., 1993). 29. Gheerbrant, E., Sudre, J. & Cappetta, H. A Paleocene proboscidean from Morocco. Nature 383, 68–70 explosion. (1996). Homeodomain sequence comparisons reveal that at least five 30. Benton, M. J. Phylogeny of the major tetrapod groups: morphological data and divergence dates. J. classes of homeobox genes are as closely related to Hox genes as Mol. Evol. 30, 409–424 (1990). many of the latter are to each other6. These are the Evx, Mox, Cdx Supplementary information is available on Nature’s World-Wide Web site (http://www.nature.com) or as paper copy from Mary Sheehan at the London editorial office of Nature. (or cad), Xlox, and Gsx homeobox classes (we term a class defined by mouse Gsh-1 and Gsh-2 as Gsx). The two mammalian Evx genes Acknowledgements. We thank L. Poling, A. Beausang, and R. Padmanabhan for assistance with sequence 6 data retrieval; A. Beausang for artwork; A. G. Clark, C. A. Hass, I. Jakobsen, M. Nei, C. R. Rao, and are each linked to the 5Ј end of Hox gene clusters , and a cnidarian A. Walker for comments and discussion; and L. Duret for instructions on use of the HOVERGEN Evx-like gene is linked to a Hox-like gene7, indicating that the close database. This work was supported in part by grants to M. Nei (NIH and NSF) and S.B.H. (NSF). sequence relationship between Evx and Hox genes reflects tandem Correspondence and requests for materials should be addressed to S.B.H. (e-mail: [email protected]). duplication. Mox genes may represent a similar case because the mouse Mox-1 gene maps to chromosome 11, close to the Hoxb cluster8. The Cdx, Xlox and Gsx gene families are more problematic. To investigate the evolutionary origins of Cdx, Xlox and Gsx The ParaHox gene cluster is genes, we elected to clone representatives of each gene family from amphioxus. This is because homeobox gene families in this an evolutionary sister of the are not complicated by either excessive duplication (as in vertebrates9) or divergence and rearrangement (as in Drosophila Hox gene cluster or nematode)6,10. Using primers directed to Hox class homeoboxes and amphioxus genomic DNA as template, amplification by poly- ` † Nina M. Brooke*, Jordi Garcia-Fernandez merase chain reaction (PCR) yielded partial clones of Cdx and Xlox & Peter W. H. Holland* class homeoboxes. A fragment of amphioxus Gsx was also cloned by * School of Animal and Microbial Sciences, University of Reading, Whiteknights, PCR, using primers designed from the two mammalian gene family PO Box 228, Reading RG6 6AJ, UK members Gsh-1 and Gsh-2. To determine the complete homeobox † Departament de Gene`tica, Facultat de Biologia, Universitat de Barcelona, sequence of each gene, we isolated longer clones from amphioxus Av. Diagonal 645, 08028 Barcelona, Spain genomic libraries: only single members of each class were obtained, ...... which we named AmphiCdx, AmphiXlox and AmphiGsx. Their Genes of the Hox cluster are restricted to the animal kingdom and encoded homeodomains resemble those of the Drosophila or play a central role in axial patterning in divergent animal phyla1. vertebrate homologues (Fig. 1). Despite its evolutionary and developmental significance, the Analysis of genomic clones revealed that amphioxus Xlox and origin of the Hox gene cluster is obscure. The consensus is that Cdx class homeoboxes were unexpectedly contained within a single a primordial Hox cluster arose by tandem gene duplication close bacteriophage clone. Mapping indicated that the homeoboxes were to animal origins2–5. Several homeobox genes with high sequence separated by just 7.5 kilobases (kb). Furthermore, using genomic identity to Hox genes are found outside the Hox cluster and are walking we found that these two homeobox genes are physically known as ‘dispersed’ Hox-like genes; these genes may have been linked to the AmphiGsx gene. The Gsx and Xlox homeoboxes are transposed away from an expanding cluster6. Here we show that separated by just 25 kb (Fig. 2). We designate this tight cluster of three of these dispersed homeobox genes form a novel gene cluster three genes the ParaHox gene cluster. in the cephalochordate amphioxus. We argue that this ‘ParaHox’ The finding that amphioxus Gsx, Xlox and Cdx class genes form a gene cluster is an ancient paralogue (evolutionary sister) of the novel homeobox cluster challenges the idea that these homeobox Hox gene cluster; the two gene clusters arose by duplication of a gene classes are ‘dispersed’ Hox genes. To reconcile linkage in

25kb 7.5kb

GSX XLOX CDX 5kb N R 625 XLOX CDX

654 R

R 690 R R

H H H MPMGc117 H04 95 HHHH H H GSX

Figure 1 Homeodomains of amphioxus Cdx, Xlox and Gsx genes aligned to Figure 2 Genomic organization of amphioxus Gsx, Xlox and Cdx genes, showing mouse (m), Drosophila (D), nematode (Ce), Xenopus (XI) and leech (Htr) genomic clones used in walking. Arrows denote transcriptional orientation. R, homologues. The mouse Cdx2 gene is the probable orthologue of human EcoRI, site and N, Not I site, mapped in bacteriophage clones only; H, HindIII sites CDX3. Dashes indicate identical amino acids. mapped in cosmid only.

Nature © Macmillan Publishers Ltd 1998 920 NATURE | VOL 392 | 30 APRIL 1998