Elephant shark ( milii) provides insights into the evolution of Hox gene clusters in gnathostomes

Vydianathan Ravi, Kevin Lam, Boon-Hui Tay, Alice Tay, Sydney Brenner1, and Byrappa Venkatesh1

Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore 138673

Contributed by Sydney Brenner, July 15, 2009 (sent for review March 20, 2009) We have sequenced and analyzed Hox gene clusters from elephant organized. However, the number of Hox clusters and Hox genes shark, a holocephalian cartilaginous fish. Elephant shark possesses varies among vertebrates. Mammals and other tetrapods contain 4 Hox clusters with 45 Hox genes that include orthologs for a 4 Hox clusters (HoxA, HoxB, HoxC, and HoxD) resulting from higher number of ancient gnathostome Hox genes than the 4 2 rounds of genome duplication early during the evolution of clusters in tetrapods and the supernumerary clusters in vertebrates. Most teleost fishes contain 7 Hox clusters owing to fishes. Phylogenetic analysis of elephant shark Hox genes from 7 a fish-specific genome duplication event in the ray-finned fish paralogous groups that contain all of the 4 members indicated an lineage followed by the loss of 1 duplicate cluster. Whereas a ((AB)(CD)) topology for the order of Hox cluster duplication, duplicate HoxD cluster is lost in zebrafish, acanthopterygians providing support for the 2R hypothesis (i.e., 2 rounds of whole- such as fugu, medaka, and cichlids have lost a duplicate HoxC genome duplication during the early evolution of vertebrates). cluster. In addition, several Hox genes have experienced lineage- Comparisons of noncoding sequences of the elephant shark and specific secondary losses resulting in each of these human Hox clusters have identified a large number of conserved possessing a unique set of Hox genes (5, 15). In contrast to these noncoding elements (CNEs), which represent putative cis-regula- teleosts, Atlantic salmon contains as many as 13 Hox clusters tory elements that may be involved in the regulation of Hox genes. owing to an additional genome duplication in a salmonid an- EVOLUTION Interestingly, in fugu more than 50% of these ancient CNEs have cestor (16). Although lampreys and hagfishes are known to diverged beyond recognition in the duplicated (HoxA, HoxB, and contain multiple Hox clusters, the exact number of Hox clusters HoxD) as well as the singleton (HoxC) Hox clusters. Furthermore, in these primitive jawless vertebrates is not known. PCR-based the b-paralogs of the duplicated fugu Hox clusters are virtually surveys have indicated that lampreys may contain 3 or 4 Hox devoid of unique ancient CNEs. In contrast to fugu Hox clusters, clusters (17, 18), and hagfishes may contain as many as 7 Hox elephant shark and human Hox clusters have lost fewer ancient clusters (19). However, phylogenetic analyses suggest that some CNEs. If these ancient CNEs are indeed enhancers directing tissue- of the clusters are the result of lineage-specific cluster duplica- specific expression of Hox genes, divergence of their sequences in tions (19, 20), and thus the number of Hox clusters in the stem vertebrate lineages might have led to altered expression patterns jawless-vertebrate lineage is unclear. and presumably the functions of their associated Hox genes. Cartilaginous fishes are the most basal group of living jawed vertebrates (gnathostomes). They comprise 2 groups, the elas- conserved noncoding elements ͉ gene loss ͉ genome duplication ͉ teleost mobranchs (sharks, rays, and skates) and holocephalians (chi- fish ͉ fugu maera) that diverged Ϸ374 million years ago (21). By virtue of their phylogenetic position, cartilaginous fishes are a critical ox genes encode homeodomain-containing transcription outgroup for studying the evolution of Hox gene clusters in Hfactors that specify the identities of body segments along the gnathostomes. To date, a complete HoxA cluster and a partial anterior–posterior axis of metazoans. Hox genes occur in clus- HoxD cluster (HoxD5 to HoxD14) have been characterized in an ters that were generated by a series of tandem duplications of elasmobranch cartilaginous fish, the horn shark (Heterodontus ancestral gene(s) before the divergence of cnidarians and bila- francisci) (22, 23). Comparisons of the single HoxA cluster in terians (1). The Hox clusters exhibit a striking phenomenon of Ј horn shark and human with duplicated HoxA clusters in ze- spatial collinearity whereby genes in the 3 -end of the cluster are brafish have identified many noncoding elements conserved in expressed in the anterior part of the embryo, whereas those in horn shark and human over Ϸ450 million years of evolution (24). the 5Ј-end are expressed in the posterior part. In vertebrates, These elements are likely to be cis-regulatory elements that are Hox cluster genes also exhibit temporal collinearity; that is, under evolutionary constraint. genes located in the 3Ј-end of the cluster are expressed early We recently identified elephant shark (Callorhinchus milii), a during development, whereas genes in the 5Ј-end are expressed holocephalian cartilaginous fish, as a useful model cartilaginous later (2). Recent comparative studies of Hox clusters in model fish genome because of its relatively small genome (910 Mb) and genomes have shown that Hox clusters have experienced re- generated 1.4ϫ coverage sequence of its genome (25). Analysis peated molecular changes, including fragmentation, cluster du- Ϸ plication, gene loss, coding-sequence divergence, and cis- of these sequences, which represent 75% of the genome, regulatory element evolution (3–7). Because of their critical role identified 37 Hox genes belonging to putative 4 Hox clusters (25). in defining the identities of body segments, Hox genes are believed to have played a key role in driving the morphologic Author contributions: S.B. and B.V. designed research; V.R., K.L., B.-H.T., and A.T. per- diversification of (8–10) and thus are of particular formed research; V.R. and B.V. analyzed data; and V.R. and B.V. wrote the paper. interest in understanding the genetic basis of morphologic The authors declare no conflict of interest. diversity of metazoans. Data deposition: The sequences reported in this paper have been deposited in the GenBank Among , the cephalochordate amphioxus possesses database (accession nos. FJ824598 to FJ824601). a single Hox cluster (11, 12). In urochordates such as Ciona and 1To whom correspondence may be addressed. E-mail: [email protected] Oikopleura, the single cluster is highly fragmented and dispersed or [email protected]. in the genome (13, 14). In contrast to these invertebrate chor- This article contains supporting information online at www.pnas.org/cgi/content/full/ dates, vertebrates contain multiple Hox clusters that are tightly 0907914106/DCSupplemental.

www.pnas.org͞cgi͞doi͞10.1073͞pnas.0907914106 PNAS Early Edition ͉ 1of6 Downloaded by guest on October 3, 2021 Fig. 1. Hox cluster loci in elephant shark. Genes are represented as colored boxes, and arrows denote the direction of transcription. Pseudogenes are denoted by the symbol ⌿. The bars below represent repetitive sequences predicted using CENSOR (see Methods).

However, because of the low coverage of the genome, most Hox vertebrates (26). Because transposable elements are a major genes were in fragments, and the number and organization of source of genetic diversity, it has been suggested that the Hox clusters could not be confirmed. In the present study we insertion of retrotransposons into the Hox clusters may have have sequenced BAC clones and determined complete se- been associated with the evolution of morphologic diversity of quences of the Hox gene clusters in elephant shark. We have also reptiles (26). It would be interesting to investigate whether the analyzed the pattern of evolution of conserved noncoding ele- insertion of retrotransposons into the HoxC cluster of elephant ments (CNEs) in the Hox clusters of elephant shark, human, and shark has modified the regulation and function of its HoxC a teleost fish (fugu). Our study confirms that elephant shark has genes. 4 Hox clusters like tetrapods, but the 4 clusters contain more Hox The 4 Hox clusters in elephant shark contain 45 Hox genes, genes than tetrapods. In contrast to elephant shark and human which is more than the genes present in the 4 Hox clusters in Hox clusters that have lost only a small number of ancient CNEs, tetrapods. Furthermore, some of the genes present in the more than 50% of ancient CNEs have diverged beyond recog- elephant shark Hox clusters have been lost in the supernumerary nition in the Hox clusters of fugu. Hox clusters in teleost fishes. Thus, elephant shark contains more ancient gnathostome Hox genes than any known bony Results and Discussion vertebrate. In addition to 45 intact Hox genes, the elephant shark Hox Clusters in Elephant Shark. Using probes for selected Hox Hox clusters contain remnants of 2 Hox genes, HoxA14 and genes identified from the 1.4ϫ coverage genome sequence of HoxB14. An inactive HoxA14 gene has been previously identified elephant shark, we isolated BACs and determined sequences for in the horn shark (23), but this is a unique instance of a HoxB14 4 Hox cluster loci in elephant shark (Fig. 1). All of the 37 Hox pseudogene in vertebrates. The presence of remnants of a gene fragments identified in the 1.4ϫ coverage sequence map to HoxA14 gene in elephant shark and horn shark, which separated these loci, strongly suggesting that elephant shark contains only Ϸ374 million years ago (21), indicates that this gene was 4 Hox clusters (HoxA–D). The elephant shark Hox cluster loci functional for a long time in the 2 lineages and became inactive display a distinct pattern of repetitive sequences within and relatively recently. In addition to the Hox pseudogenes, we also outside the Hox gene clusters (Fig. 1); whereas the Hox clusters identified an Evx pseudogene closely linked to the HoxB cluster. contain only 3.7% repetitive sequences, their flanking regions The HoxA and HoxD clusters of elephant shark and other contain 26% repetitive sequences. This pattern suggests that gnathostomes each contain an intact Evx gene (called Evx1 and repetitive sequences have been selectively excluded from the Evx2, respectively). However, no Evx gene has been identified in elephant shark Hox clusters. Interestingly, the density of repet- the HoxB clusters of tetrapods, although HoxB clusters of some itive sequences in the Hox clusters of elephant shark is similar to teleost fishes contain an intact Evx homolog, called Eve (15, 27). that in the single Hox cluster in amphioxus (3.9%), yet the The identification of remnants of an Evx in the HoxB cluster of elephant shark Hox clusters are 3 to 4 times smaller than the elephant shark suggests that this gene was independently lost in amphioxus Hox cluster (Ϸ448 kb) (11). The exclusion of repet- elephant shark and tetrapods. The elephant shark Hox clusters itive sequences, therefore, does not seem to be responsible for also contain 6 microRNA genes (Fig. 1). One of the microRNAs, the compact sizes of the elephant shark Hox clusters. mir-10c located between HoxC5 and HoxC4, is lost in mammals The sizes of the elephant shark Hox clusters are comparable (28), whereas mir-196a-1 (between HoxB10 and HoxB9) is lost in to their orthologs in humans except for the HoxC cluster, which medaka (5). is considerably larger (Ϸ172 kb) than its ortholog in humans On the basis of the intact and inactivated Hox genes identified in (Ϸ117 kb) (supporting information Fig. S1). The large size of the the Hox clusters of elephant shark, we can infer that the last elephant shark HoxC cluster is mainly due to the presence of common ancestor of cartilaginous fishes and bony vertebrates HoxC3 and HoxC1 genes that are absent in mammals and a contained 4 Hox clusters with at least 47 Hox genes (Fig. 2). relatively higher content of repetitive sequences (9.5%). A Although all except 2 of these genes have been retained in elephant majority of these repetitive sequences (6.8%) are retrotrans- shark, several of these ancient gnathostome Hox genes have expe- posons and are found in the 3Ј end of the cluster (Fig. 1). A rienced differential losses in various bony vertebrate lineages at similar high concentration of retrotransposons (6.7%) has been various stages of evolution. For example, 3 of the ancient gnatho- recently reported in the 4 Hox clusters of the anole lizard, which stome genes, HoxD2, HoxD5, and HoxD14, were lost in a common are unusually larger (Ϸ173–324 kb) than their orthologs in other ancestor of bony vertebrates (Fig. 2 and Fig. S2), whereas HoxB7

2of6 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0907914106 Ravi et al. Downloaded by guest on October 3, 2021 Fig. 2. Hox gene clusters in elephant shark, human, and fugu. Loss of ancestral gnathostome Hox genes is indicated. Star denotes whole-genome duplication. HoxA14 is present in coelacanth and is therefore marked as independently lost in human and fugu. HoxB10, HoxC1, and HoxC3 are indicated as lost independently in human and fugu because these genes are present in some teleost fishes (5). HoxA14 and HoxB14 are pseudogenes in elephant shark and are therefore marked as independently lost in the elephant shark lineage. Also see Fig. S2.

was lost independently in the pufferfish, medaka, and tilapia (BA), and neighbor-joining (NJ). The first 2 are model-based lineages relatively recently (Fig. S2). Between the tetrapod and methods, whereas the latter is a distance-based method. Each of teleost lineages, more ancient Hox genes have experienced losses in these methods has its own strengths and limitations. Interest- the teleost lineage (10 genes) than in the tetrapod lineage (6 genes) ingly, all 3 methods gave an ((AB)(CD)) topology with moderate EVOLUTION (Fig. S2). It is not clear whether these ancient genes were redundant to high support values (Fig. 3). This congruence in the inferred genes that were retained for varying periods and then lost, or topology between the 3 methods provides strong support for the whether they performed some unique functions that have been ((AB)(CD)) topology. We then tested the 15 possible cluster perturbed in the lineages that have lost them. There is evidence to duplication topologies using CONSEL (36) to determine suggest that 1 of the ancient Hox genes, HoxD14, which has been whether the topology inferred by the phylogenetic analysis was lost in bony vertebrates, had a unique expression pattern in carti- significantly better than the other topologies. All of the tests laginous fishes; it is expressed in a restricted cell population in the implemented in CONSEL indicated that the ((AB)(CD)) topol- hindgut, but not in the central nervous system, somites, or fin ogy was the most likely topology (Table S1). This well-supported buds/folds that are known to express Hox genes (29). Consequent symmetric topology provides strong support for the 2R hypoth- to its loss in bony vertebrates, the function performed by this gene esis. in cartilaginous fishes must have been perturbed in bony verte- brates. Examination of the expression patterns and functions of CNEs in Elephant Shark and Human Hox Loci. CNEs are useful tools other elephant shark Hox genes that have been lost in various bony for discovering functional elements in the noncoding regions of vertebrate lineages should help in understanding the consequences human and other vertebrate genomes. Indeed, functional assay of their losses. of CNEs identified in distantly related vertebrates such as human and teleost fishes has shown that many of them function as Order of Hox Cluster Duplications. Although it is widely accepted transcriptional enhancers directing tissue-specific expression that there were 2 rounds of genome duplication during the early (37, 38). Comparisons of Hox clusters in human and teleost fishes evolution of vertebrates (the 2R hypothesis), there has been have identified many CNEs within the Hox clusters (24, 27, 39, conflicting evidence that supports scenarios such as one round 40) that were shared by the common ancestors of these bony of duplication followed by large-scale segmental duplications vertebrates. Because cartilaginous fishes reside at the base of the (30, 31) and only segmental duplications without genome du- gnathostome phylogenetic tree, comparisons of cartilaginous plication (32). Phylogenetic analyses of Hox genes have been fishes and bony vertebrates should help in defining more ancient used in the past to resolve this debate. If the 4 Hox clusters are CNEs that existed in the common ancestor of gnathostomes. the result of 2R, the phylogeny of the genes from the 4 clusters Previously many CNEs have been identified in the HoxA cluster should show a symmetric topology. However, none of the studies has yielded a symmetric topology with strong statistical support (33, 34). We hypothesized that the Hox genes in elephant shark are good candidates to resolve this debate for 2 reasons: (i) the coding sequences in elephant shark have been found to evolve at a slower rate than in other vertebrates (35), and hence the degree of variation and divergence between Hox paralogs will be lower compared with other vertebrates; and (ii) in elephant shark, 7 of the 14 paralogous Hox groups (Hox1, -3, -4, -5, -9, -10, and -13) are represented by all of the 4 members, the highest in any vertebrate characterized to date. The phylogenetic analysis of Fig. 3. Phylogenetic model for the order of Hox cluster duplication. The model was inferred by phylogenetic analysis of elephant shark (Cm) Hox genes these 7 paralogous group Hox genes should generate a robust from paralogous groups 1, 3, 4, 5, 9, 10, and 13 that contain all of the 4 phylogeny. We concatenated the coding sequence alignments of members (A, B, C, and D) using amphioxus (Amphi) Hox genes as outgroup. the 7 paralogous groups and carried out phylogenetic analysis Circles at the nodes denote Hox cluster duplication. Branch-support values are using amphioxus as the outgroup. We used 3 different methods shown on top of the branch as ML bootstrap percentage/BA posterior prob- of phylogenetic inference: maximum likelihood (ML), Bayesian ability/NJ bootstrap percentage.

Ravi et al. PNAS Early Edition ͉ 3of6 Downloaded by guest on October 3, 2021 Table 1. CNEs in the elephant shark and human Hox clusters, and in the elephant shark and fugu Hox clusters Elephant shark and human Elephant shark and fugu

Total length, kb Total length, kb Hox cluster No. of CNEs (average length, bp) Fugu Hox cluster No. of CNEs (average length, bp)

HoxA 70 8.37 (120) HoxAa 35 3.71 (106) HoxAb 14 1.22 (87) HoxB 40 4.36 (109) HoxBa 27 2.32 (86) HoxBb 7 0.67 (95) HoxC 23 2.53 (110) HoxCa 24 2.56 (107) HoxD 46 4.99 (108) HoxDa 25 2.37 (95) HoxDb 6 0.34 (56)

of horn shark and human (22). Interestingly, several of these (Fig. 4). This pattern suggests that CNEs associated with 3Ј Hox ancient CNEs were found to be absent in the duplicated HoxA genes, which express in the anterior regions of the developing clusters of zebrafish (24). To identify ancient CNEs conserved in embryo, are highly conserved during evolution, and presumably the Hox cluster loci of elephant shark, human, and teleost fishes, their expression is tightly regulated. Conversely, the expression we aligned each elephant shark Hox cluster locus (note that it patterns of 5Ј Hox genes, which express in the posterior seg- includes some genes linked to the Hox cluster; Fig. 1) with its ments, might be divergent in elephant shark and human. A orthologous sequence in human and fugu using SLAGAN (41) similar pattern of CNEs across the Hox clusters was previously and predicted CNEs using VISTA (42). The VISTA plots of the observed in the HoxA clusters of horn shark and human (40) and 4 Hox clusters are given in Fig. S3, and details of the CNEs are among the Hox clusters of teleost fishes (5). In contrast to elephant shark and human HoxA, HoxB, and HoxD clusters, given in Table S2. their HoxC clusters contain a low density of CNEs, and the The HoxD and HoxA loci of elephant shark and human density of CNEs in the HoxC cluster does not show an increase contain 105 CNEs (total length 18.3 kb) and 96 CNEs (11.4 kb), toward the 3Ј end (Fig. 4). As noted above, the intergenic regions respectively. HoxB and HoxC loci, on the other hand, contain 48 in the elephant shark HoxC cluster, particularly in the 3Ј end of CNEs (5.0 kb) and 25 CNEs (2.6 kb), respectively (Table S2). the cluster, are unusually large owing to the accumulation of The higher number of CNEs in the HoxD locus is largely due to retrotransposons. The divergence of CNEs in the elephant shark a set of highly conserved CNEs (51 CNEs, 11.6 kb) located 5Ј in and human HoxC clusters could be a consequence of the the locus, upstream of the Hox cluster (Fig. S3). The corre- insertion of retrotransposons into the elephant shark HoxC sponding regions in HoxA, -B, and -C loci are virtually devoid of cluster. CNEs (Fig. S3). A considerable number of cis-regulatory elements that play a Pattern of CNE Evolution in Hox Clusters of Fugu, Human, and Elephant role in the tissue-specific expression of Hox genes have been Shark. Following the duplication of Hox clusters in the teleost previously identified in vertebrate Hox clusters and their func- lineage, fugu has lost a complete duplicate HoxC cluster and tions verified in transgenic assays (Table S3). In a number of studies, putative transcription factor binding sites that are in- volved in the tissue-specific expression have also been identified. To determine whether these functionally verified enhancers overlap elephant shark–human CNEs, we compared the human genomic coordinates of the enhancers with those of the elephant shark–human CNEs. Of the 35 functionally verified enhancers that we extracted from literature, 26 enhancers (74%) show overlap with elephant shark–human CNEs. Furthermore, most of the putative transcription factor binding sites identified in these enhancers are conserved in the elephant shark CNEs (Table S3), indicating that CNEs represent the core regions of enhancers. The overlap of a majority of the functionally verified Hox enhancers with elephant shark–human CNEs suggests that a large number of CNEs identified in the elephant shark and human Hox clusters represent cis-regulatory elements that play a role in the expression of Hox genes. To assess the pattern of evolution of CNEs in the intergenic regions of Hox genes, we compared the distribution of CNEs within only the Hox clusters (i.e., 5 kb upstream of the 5Ј-most Hox gene to 5 kb downstream of the 3Ј-most Hox gene). The elephant shark and human HoxA clusters contain the highest number of CNEs (70 CNEs, total length 8.4 kb), followed by HoxD clusters (46 CNEs, 5.0 kb). HoxB clusters contain 40 CNEs (4.4 kb), whereas HoxC clusters contain only 23 CNEs (2.5 kb) (Table 1). This variation in the CNE content indicates that CNEs in paralogous Hox clusters have been subject to different Fig. 4. Total length and density of elephant shark–human CNEs in the intergenic regions of elephant shark Hox genes. Names of intergenic regions patterns of evolutionary constraints. The density of CNEs also of Hox genes are given along the x axis. Arrows represent 5-kb flanking shows variation across each Hox cluster. The density of CNEs in sequences included for this analysis. Note that Hox genes B10, C1, C3, D2, D5, the 5Ј end of HoxA, HoxB, and HoxD clusters is generally low and D14 are absent in human. #The high values are due to very short intergenic and shows an increasing trend toward the 3Ј end of the clusters regions (Ͻ2.5 kb).

4of6 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0907914106 Ravi et al. Downloaded by guest on October 3, 2021 retained 45 Hox genes in the remaining 7 Hox clusters (Fig. 2). divergent enhancers contained redundant transcription factor Interestingly, elephant shark and fugu Hox clusters contain binding sites, the expression pattern of their associated genes fewer CNEs than elephant shark and human Hox clusters, and could still be maintained by other binding sites. Another possi- the elephant shark–fugu CNEs are shorter (average length 90 bility is that the loss of transcription factor binding sites in one bp) than elephant shark–human CNEs (average length 113 bp) location might be compensated by de novo evolution of binding (Table 1). In fact, more than 50% of CNEs conserved in elephant sites in another location, as has been demonstrated for the stripe shark and human Hox clusters are not identifiable in the 2 enhancer of even-skipped gene in Drosophila (44). Notwith- duplicated Hox clusters (85 of 156 CNEs from HoxA, -B, and -D standing these possibilities, several studies have shown that clusters) as well as in the single HoxC cluster (12 of 23 CNEs) changes in cis-regulatory sequences, ranging from subtle differ- of fugu (Table S2). This indicates that a majority of ancient ences to extensive restructuring, can account for the altered CNEs that are conserved in elephant shark and human have expression pattern of Hox genes (6, 7, 45–47). For example, the diverged beyond recognition or have been lost in fugu Hox duplicate fugu HoxA2a and HoxA2b genes show distinct expres- clusters, irrespective of whether the cluster is retained in dupli- sion patterns in the hindbrain. Detailed functional assay of the cate or as a singleton. Interestingly, among the 71 human CNEs regulatory elements of these genes has shown that a subtle drift that are conserved in the duplicated Hox clusters of fugu, a in the subset of cis-regulatory elements is responsible for their majority is found in the a-paralogous clusters. Whereas 12 of differential expression patterns (7). Another interesting example these CNEs are conserved in both fugu a- and b-paralogous is the HoxC8 early enhancer, a 200-bp region that plays a role in clusters, 57 are conserved in a-paralogous clusters, and only 2 the early phase of mouse HoxC8 expression in neural tube and CNEs are conserved in a fugu b-paralogous cluster (HoxBb mesoderm. This enhancer is found in different vertebrate lin- cluster) (Table S2). This skewed pattern of retention of unique eages, including chicken and fishes. Interestingly, whereas the 5Ј CNEs between fugu a- and b-paralogous clusters indicates that region of the enhancer is highly conserved in mouse, fugu, and fugu b-paralogous clusters have not only lost a large number of zebrafish, the 3Ј region is divergent in the 2 teleosts, and trans- Hox genes (altogether only 12 Hox genes are retained; see Fig. genic mouse assays have shown that the mouse, fugu, and 2) but have also lost a substantial number of ancient CNEs zebrafish enhancers confer distinct patterns of reporter gene compared with a-paralogous clusters. expression (45). These data indicate that the divergence of short In contrast to 97 ancient CNEs (total length 8.7 kb) lost in the stretches of sequences within a highly conserved enhancer also fugu Hox clusters, human Hox clusters have lost only 43 ancient has the potential to modify the expression pattern of enhancers. EVOLUTION CNEs (total length 2.9 kb) that are conserved in elephant shark Therefore, it is possible that the divergence of ancient Hox CNEs and fugu Hox clusters (Table S2). To assess the extent of ancient that represent putative cis-regulatory elements might have led to CNEs that have been lost in elephant shark Hox clusters, we altered expression patterns of their associated Hox genes. Inci- generated SLAGAN alignments of orthologous human, fugu, dentally, the 3Ј region of the mouse HoxC8 early enhancer and elephant shark Hox clusters using human as the reference overlaps an elephant shark–human CNE (HE-HoxC࿝CNE10) sequence and predicted CNEs conserved in human–fugu and identified in our study, and in contrast to the divergent 3Ј regions human–elephant shark Hox clusters (Fig. S4). A total of 39 of fugu and zebrafish, the elephant shark sequence is highly CNEs (total length 2.8 kb) conserved in human and fugu are not similar to the mouse and human sequences (Fig. S5). Thus, this identifiable in elephant shark Hox clusters (Table S4). Although enhancer seems to be an ancient gnathostome enhancer that has this number is comparable to the number of ancient CNEs lost diverged considerably in the teleost lineage. Functional assay of in human Hox clusters, it should be noted that some of these this and other conserved elephant shark CNEs in comparison CNEs could have evolved de novo in the stem bony vertebrate with their divergent counterparts in bony vertebrates should lineage. Overall, these comparisons show that fugu Hox clusters indicate whether the divergent CNEs have indeed resulted in have lost far more ancient CNEs than the human and elephant altered expression patterns of their associated Hox genes. shark Hox clusters. Conservation of noncoding sequences is a powerful strategy Methods for identifying cis-regulatory elements directing tissue-specific Identification of BACs. The DNA prepared from 92,160 clones in an elephant expression of developmental genes. Functional assay of CNEs shark BAC library (IMCB࿝Eshark BAC library) were pooled in 3 dimensions and associated with developmental genes has shown that many of used for identifying BAC clones by 3-step PCR screening. The 1.4ϫ coverage them function as tissue-specific enhancers (37, 38). In a recent sequence of the elephant shark genome contained fragments for 37 Hox study in which enhancers were predicted in mouse embryonic genes (25). PCR primers were initially designed for 4 of the fragments (HoxA1, forebrain, midbrain, and limb tissues on the basis of ChIP assay HoxB5, HoxC8, and HoxD5) representing each of the 4 Hox clusters and used for the enhancer-associated p300 protein, approximately 90% of to identify representative BAC clones (158C9, 48D4, 161E22, and 46N13, the p300-binding regions were found to overlap CNEs (43). respectively). These BACs were sequenced completely. BACs that overlapped the 5Ј end sequence of these BACs were identified by PCR screening and Thus, CNEs are a useful signature for discovering enhancers sequenced completely. Altogether, 2 BACs each were sequenced to obtain the associated with developmental genes. A considerable amount of sequences of HoxA (55E21 and 158C9) and HoxC (205I1 and 161E22) loci, work has been done on the identification and characterization of whereas 3 BACs each were sequenced to determine the HoxB (235L21, 91I24, regulatory regions involved in tissue-specific expression of ver- and 48D4) and HoxD (28O18, 41O12, and 46N13) loci sequences. tebrate Hox genes, and functions of about 35 enhancers have been verified in transgenic assays (Table S3). We found that a Sequencing and Assembly of BACs. BAC clones were sequenced using the majority of these functionally verified enhancers (74%) overlaps standard shotgun sequencing method, and gaps were filled by PCR amplifi- CNEs in elephant shark and human Hox clusters. This provides cation and primer walking. Sequencing was done using the BigDye Termina- support to the hypothesis that a large number of CNEs identified tor Cycle Sequencing Kit (Applied Biosystems). Sequences were processed in the elephant shark and human Hox clusters represent en- and assembled using Phred-Phrap and Consed (http://www.phrap.org/ hancers. If these CNEs indeed represent enhancers, our obser- phredphrapconsed.html). vation that many ancient CNEs have diverged beyond recogni- Sequence Annotation. Repetitive sequences were identified and masked using tion in fugu and human raises the question whether the divergent CENSOR at default settings (http://www.girinst.org/censor/index.php). Pro- CNEs have resulted in altered expression patterns and functions tein-coding and microRNA genes were predicted using a combination of ab of their associated Hox genes. However, it should be noted that initio (e.g., FGENESH) and homology-based methods. Sequences for human divergent enhancers need not necessarily lead to altered expres- and fugu genes were extracted from the University of California Santa Cruz sion pattern of genes associated with them. For example, if the Genome Browser (http://genome.ucsc.edu/). Our homology-based searches

Ravi et al. PNAS Early Edition ͉ 5of6 Downloaded by guest on October 3, 2021 identified remnants of HoxA14 gene in the HoxA cluster and remnants of an used MrBayes 3.1.2 (http://mrbayes.csit.fsu.edu/). Two independent runs start- Evx gene and HoxB14 gene in HoxB cluster. We also found fragments of an Lnp ing from different random trees were run for 100,000 generations with gene in HoxC cluster with high similarity to 5 of the 11 exons (exons 3, 6, 8, 9, sampling every 100 generations. A consensus tree was built from all sampled and 10) of the Lnp gene in HoxD cluster. Because we could not identify any trees excluding the first 250 (burn-in). In addition, a NJ tree was also generated other exons, we annotated the Lnp gene in the HoxC cluster as a pseudogene. using MEGA 4.0 (http://www.megasoftware.net/mega.html) with 1,000 boot- strap replicates. CONSEL (36) was used to evaluate the 15 possible tree topologies for the Hox cluster duplications using the same alignment. Order of Hox Cluster Duplications. Predicted amino acid sequences of elephant shark Hox genes from paralogous groups Hox1, -3, -4, -5, -9, -10, and -13 were aligned with their orthologous sequences from amphioxus using CLUSTALW Identification and Analysis of CNEs. Multiple alignments of elephant shark, human, and fugu Hox cluster loci sequences were generated using the ‘‘glo- as implemented in BioEdit sequence alignment editor (http://www.mbio. cal’’ alignment program SLAGAN (41), with either elephant shark or human as ncsu.edu/BioEdit/BioEdit.html). A codon-based alignment of nucleotide se- the reference sequence. CNEs were predicted using a cut-off of Ն65% identity quences was generated on the basis of the amino acid alignment using across Ͼ50 bp windows and visualized using VISTA (42). CNEs that were PAL2NAL (http://coot.embl.de/pal2nal/). Individual alignments for the paralo- shorter than 50 bp were excluded from the final set of CNEs. Human or gous groups were concatenated, and third codon positions were excluded. elephant shark CNEs that showed a minimum overlap of 20 bp with fugu CNEs The best-fit substitution model for this alignment was deduced by ModelGen- are considered as conserved in fugu. erator (http://bioinf.may.ie/software/modelgenerator/). ML and BA methods were used for phylogenetic analyses using the best-fit substitution model. ACKNOWLEDGMENTS. We thank Masaki Miya for help in phylogenetic anal- PHYML (http://www.atgc-montpellier.fr/phyml/) was used for ML analyses ysis. This work is supported by the Biomedical Research Council of the A*STAR, and 100 nonparametric bootstrap replicates were used. For BA analyses, we Singapore.

1. Ryan JF, et al. (2007) Pre-bilaterian origins of the Hox cluster and the Hox code: 26. Di-Poi N, Montoya-Burgos JI, Duboule D (2009) Atypical relaxation of structural con- Evidence from the sea anemone, Nematostella vectensis. PLoS ONE 2:e153. straints in Hox gene clusters of the green anole lizard. Genome Res 19:602–610. 2. Kmita M, Duboule D (2003) Organizing axes in time and space; 25 years of colinear 27. Lee AP, Koh EG, Tay A, Brenner S, Venkatesh B (2006) Highly conserved syntenic blocks tinkering. Science 301:331–333. at the vertebrate Hox loci and conserved regulatory elements within and outside Hox 3. Duboule D (2007) The rise and fall of Hox gene clusters. Development 134:2549–2560. gene clusters. Proc Natl Acad Sci USA 103:6994–6999. 4. Garcia-Fernandez J (2005) The genesis and evolution of homeobox gene clusters. Nat 28. Tanzer A, Amemiya CT, Kim CB, Stadler PF (2005) Evolution of microRNAs located Rev Genet 6:881–892. within Hox gene clusters. J Exp Zoolog B Mol Dev Evol 304:75–85. 5. Hoegg S, Boore JL, Kuehl JV, Meyer (2007) Comparative phylogenomic analyses of 29. Kuraku S, et al. (2008) Noncanonical role of Hox14 revealed by its expression patterns teleost fish Hox gene clusters: Lessons from the cichlid fish Astatotilapia burtoni. BMC in lamprey and shark. Proc Natl Acad Sci USA 105:6679–6683. Genomics 8:317. 30. Gu X, Wang Y, Gu J (2002) Age distribution of human gene families shows significant 6. Ray R, Capecchi M (2008) An examination of the Chiropteran HoxD locus from an roles of both large- and small-scale duplications in vertebrate evolution. Nat Genet evolutionary perspective. Evol Dev 10:657–670. 7. Tumpel S, Cambronero F, Wiedemann LM, Krumlauf R (2006) Evolution of cis elements 31:205–209. in the differential expression of two Hoxa2 coparalogous genes in pufferfish (Takifugu 31. McLysaght A, Hokamp K, Wolfe KH (2002) Extensive genomic duplication during early rubripes). Proc Natl Acad Sci USA 103:5419–5424. evolution. Nat Genet 31:200–204. 8. Carroll SB, Weatherbee SD, Langeland JA (1995) Homeotic genes and the regulation 32. Friedman R, Hughes AL (2003) The temporal distribution of gene duplication events in and evolution of insect wing number. Nature 375:58–61. a set of highly conserved human gene families. Mol Biol Evol 20:154–161. 9. Holland PW, Garcia-Fernandez J (1996) Hox genes and chordate evolution. Dev Biol 33. Bailey WJ, Kim J, Wagner GP, Ruddle FH (1997) Phylogenetic reconstruction of verte- 173:382–395. brate Hox cluster duplications. Mol Biol Evol 14:843–853. 10. Wagner GP, Amemiya C, Ruddle F (2003) Hox cluster duplications and the opportunity 34. Lynch VJ, Wagner GP (2009) Multiple chromosomal rearrangements structured the for evolutionary novelties. Proc Natl Acad Sci USA 100:14603–14606. ancestral vertebrate Hox-bearing protochromosomes. PLoS Genet 5:e1000349. 11. Amemiya CT, et al. (2008) The amphioxus Hox cluster: Characterization, comparative 35. Wang J, Lee AP, Kodzius R, Brenner S, Venkatesh B (2009) Large number of ultracon- genomics, and evolution. J Exp Zoolog B Mol Dev Evol 310:465–477. served elements were already present in the jawed vertebrate ancestor. Mol Biol Evol 12. Garcia-Fernandez J, Holland PW (1994) Archetypal organization of the amphioxus Hox 26:487–490. gene cluster. Nature 370:563–566. 36. Shimodaira H, Hasegawa M (2001) CONSEL: For assessing the confidence of phyloge- 13. Ikuta T, Yoshida N, Satoh N, Saiga H (2004) Ciona intestinalis Hox gene cluster: Its netic tree selection. Bioinformatics 17:1246–1247. dispersed structure and residual colinear expression in development. Proc Natl Acad Sci 37. Pennacchio LA, et al. (2006) In vivo enhancer analysis of human conserved non-coding USA 101:15118–15123. sequences. Nature 444:499–502. 14. Seo HC, et al. (2004) Hox cluster disintegration with persistent anteroposterior order of expression in Oikopleura dioica. Nature 431:67–71. 38. Woolfe A, et al. (2005) Highly conserved non-coding sequences are associated with 15. Amores A, et al. (1998) Zebrafish hox clusters and vertebrate genome evolution. vertebrate development. PLoS Biol 3:e7. Science 282:1711–1714. 39. Aparicio S, et al. (1995) Detecting conserved regulatory elements with the model 16. Mungpakdee S, et al. (2008) Differential evolution of the 13 Atlantic salmon Hox genome of the Japanese puffer fish, Fugu rubripes. Proc Natl Acad Sci USA 92:1684– clusters. Mol Biol Evol 25:1333–1343. 1688. 17. Force A, Amores A, Postlethwait JH (2002) Hox cluster organization in the jawless 40. Santini S, Boore JL, Meyer A (2003) Evolutionary conservation of regulatory elements vertebrate Petromyzon marinus. J Exp Zool 294:30–46. in vertebrate Hox gene clusters. Genome Res 13:1111–1122. 18. Irvine SQ, et al. (2002) Genomic analysis of Hox clusters in the sea lamprey Petromyzon 41. Brudno M, et al. (2003) Glocal alignment: Finding rearrangements during alignment. marinus. J Exp Zool 294:47–62. Bioinformatics 19(Suppl 1):i54–i62. 19. Stadler PF, et al. (2004) Evidence for independent Hox gene duplications in the hagfish 42. Frazer KA, Pachter L, Poliakov A, Rubin EM, Dubchak I (2004) VISTA: Computational lineage: A PCR-based gene inventory of Eptatretus stoutii. Mol Phylogenet Evol tools for comparative genomics. Nucleic Acids Res 32:W273–W279. 32:686–694. 43. Visel A, et al. (2009) ChIP-seq accurately predicts tissue-specific activity of enhancers. 20. Fried C, Prohaska SJ, Stadler PF (2003) Independent Hox-cluster duplications in lam- Nature 457:854–858. preys. J Exp Zoolog B Mol Dev Evol 299:18–25. 44. Ludwig MZ, Bergman C, Patel NH, Kreitman M (2000) Evidence for stabilizing selection 21. Cappetta H, Duffin C, Zidek J (1993) in The Record 2, ed Benton MJ (Chapman and in a eukaryotic enhancer element. Nature 403:564–567. Hall, London), pp 593–609. 45. Anand S, et al. (2003) Divergence of Hoxc8 early enhancer parallels diverged axial 22. Kim CB, et al. (2000) Hox cluster genomics in the horn shark, Heterodontus francisci. morphologies between mammals and fishes. Proc Natl Acad Sci USA 100:15666–15669. Proc Natl Acad Sci USA 97:1655–1660. 23. Powers TP, Amemiya CT (2004) Evidence for a Hox14 paralog group in vertebrates. Curr 46. Belting HG, Shashikant CS, Ruddle FH (1998) Modification of expression and cis- Biol 14:R183–R184. regulation of Hoxc8 in the evolution of diverged axial morphology. Proc Natl Acad Sci 24. Chiu CH, et al. (2002) Molecular evolution of the HoxA cluster in the three major USA 95:2355–2360. gnathostome lineages. Proc Natl Acad Sci USA 99:5492–5497. 47. Shashikant CS, Kim CB, Borbely MA, Wang WC, Ruddle FH (1998) Comparative studies 25. Venkatesh B, et al. (2007) Survey sequencing and comparative analysis of the elephant on mammalian Hoxc8 early enhancer sequence reveal a baleen whale-specific deletion shark (Callorhinchus milii) genome. PLoS Biol 5:e101. of a cis-acting element. Proc Natl Acad Sci USA 95:15446–15451.

6of6 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0907914106 Ravi et al. Downloaded by guest on October 3, 2021