
EVOLUTION & DEVELOPMENT 8:5, 394–406 (2006)

Is retinoic acid genetic machinery a chordate innovation?

Cristian Cañestro (a), John H. Postlethwait (a), Roser Gonzàlez-Duarte (b), and Ricard Albalat (b,*)
(a) Institute of Neuroscience, University of Oregon, Eugene, OR 97403, USA
(b) Departament de Genètica, Universitat de Barcelona, Av. Diagonal 645, 08028 Barcelona, Spain
(*) Author for correspondence (email: [email protected])

SUMMARY Development of many chordate features depends on retinoic acid (RA). Because the action of RA during development seems to be restricted to chordates, it had been previously proposed that the "invention" of RA genetic machinery, including RA-binding nuclear hormone receptors (Rars), and the RA-synthesizing and RA-degrading enzymes Aldh1a (Raldh) and Cyp26, respectively, was an important step for the origin of developmental mechanisms leading to the chordate body plan. We tested this hypothesis by conducting an exhaustive survey of the RA machinery in genomic databases for twelve deuterostomes. We reconstructed the evolution of these genes in deuterostomes and showed for the first time that RA genetic machinery (that is, Aldh1a, Cyp26, and Rar orthologs) is present in nonchordate deuterostomes. This finding implies that RA genetic machinery was already present during early deuterostome evolution and, therefore, is not a chordate innovation. This new evolutionary viewpoint argues against the hypothesis that the acquisition of gene families underlying RA metabolism and signaling was a key event for the origin of chordates. We propose a new hypothesis in which lineage-specific duplication and loss of RA machinery genes could be related to the morphological radiation of deuterostomes.

© 2006 The Author(s). Journal compilation © 2006 Blackwell Publishing Ltd.
EVOLUTION & DEVELOPMENT Vol. 8, No. 5, September–October 2006
Cañestro et al. Retinoic acid and chordate origins

INTRODUCTION

The origin of chordates and their innovative body plan remains controversial (Holland 2005b; Delsuc et al. 2006). Because the action of retinoic acid (RA) in patterning embryonic axes seems to be restricted to chordates, it had been proposed that the morphogenetic role of RA was a chordate novelty linked to the origin of chordate-specific features (Shimeld 1996; Manzanares et al. 2000; Schilling and Knight 2001; Wada 2001; Holland 2005a). The genes of the RA genetic machinery regulate RA action at two levels: metabolism and signaling (Fig. 1). Machinery governing RA metabolism includes the RA-synthesizing enzymes (retinaldehyde dehydrogenases Aldh1a, formerly Raldh) and the RA-degrading enzymes (Cyp26), which together regulate the spatio-temporal distribution of RA during embryogenesis (Niederreither et al. 2002; Reijntjes et al. 2005). Machinery for RA signaling includes the RA-binding nuclear hormone receptors (Rars), which mediate RA action on target genes (e.g., Hox genes) (Marshall et al. 1994).

Because Aldh1a, Cyp26, and Rar had been described only in chordates, it had been proposed that the acquisition of these gene families was a key step for the innovation of the chordate body plan (reviewed in Fujiwara and Kawamura 2003). This hypothesis predicts that Aldh1a, Cyp26, and Rar genes should not be found outside the Chordata.

To test this hypothesis, we searched for RA genetic machinery in Ambulacraria (echinoderms plus hemichordates), which appears to be the sister group of chordates (Cameron et al. 2000, but see Delsuc et al. 2006). To differentiate orthologs from paralogs, we investigated the phylogenetic relationships of deuterostome genes implicated in the evolution of the RA machinery and of closely related families. The evolution of Aldh1a had remained obscure because of poorly supported gene phylogenies, confusion with the closely related Aldh2 (Fujiwara and Kawamura 2003), and the small number of taxa in which Aldh1a had been reported (only a few vertebrates and ascidians). To illuminate Aldh1a evolution, we identified 73 Aldh1a-related genes in publicly available genomic and EST databases of seven vertebrates, one cephalochordate, three urochordates, and two nonchordate deuterostomes. In this work we reveal the presence of the three main components of RA machinery, Aldh1a, Cyp26, and Rar, in nonchordate deuterostomes, showing for the first time that these genes were not a chordate innovation.

Fig. 1. (A) Retinoic acid (RA) genetic machinery regulates RA action at two levels: metabolism and signaling. Aldh1a (red) and Cyp26 (blue) regulate the spatio-temporal distribution of RA. Heterodimers of Rar and Rxr (green and gray, respectively) mediate RA signaling to RA-target genes (e.g., Hox). In contrast to Rar, Rxr can heterodimerize with other nuclear receptors, and its presence in protostomes suggests a more ancient origin (reviewed in Escriva et al. 2000). (B) Gene phylogenies corroborate the orthology of nonchordate Aldh1a, Cyp26, and Rar proteins (colored). Consistent with the presence of Rar in sea urchin, we also found a sea urchin Rxr (in bold). Tree branch lengths correspond to neighbor-joining distances, and numbers are the bootstrap values supporting each node (n = 1000; poorly supported nodes (<50%) were collapsed). The same tree topologies were supported by maximum-likelihood and maximum-parsimony methods. (C) The finding of Aldh1a, Cyp26, and Rar orthologs in Ambulacraria suggests a new evolutionary scenario, in which the RA genetic machinery was already present before the divergence of extant deuterostomes and, consequently, is not a chordate innovation. (D) The Aldh1a phylogeny illustrates taxon-specific variation of RA-related gene families caused by independent gene duplication and loss during deuterostome evolution. To understand the evolution of the Aldh1a subfamily, it was necessary to include in the analysis the next two most closely related Aldh families (Aldh2 and Aldh1l) and to consider exon–intron organization. The putative Aldh1a2 described in ascidians (Fujiwara and Kawamura 2003) has been renamed here as Aldh1a1/2/3a to reflect its phylogenetic affinities. The position of the urochordate Aldh1a1/2/3 cluster close to the Aldh1l family is probably distorted by a long-branch-attraction artifact. Vertebrates: Hs, Homo sapiens; Mm, Mus musculus; Rn, Rattus norvegicus; Gg, Gallus gallus; Xt, Xenopus tropicalis; Dr, Danio rerio; and Tr, Takifugu rubripes. Cephalochordates: Bf, Branchiostoma floridae. Urochordates: (larvaceans) Od, Oikopleura dioica; (ascidians) Ci, Ciona intestinalis and Cs, Ciona savignyi. Hemichordates: (acorn worm) Sk, Saccoglossus kowalevskii. Echinoderms: (sea urchin) Sp, Strongylocentrotus purpuratus.

MATERIALS AND METHODS

Sequence analysis and identification of new genes
Sequences used in this work were assembled from data obtained by in silico screening of public databases (accession numbers and database URLs are provided in Table A1 in the appendix). Human reference proteins from each analyzed gene family were used as starting queries for BLAST searches (Altschul et al. 1997) against EST and genomic databases. The orthology of proteins was deduced initially by reciprocal best BLAST searches against the GenBank database (Wall et al. 2003). From nonassembled […], Aldh genes were deduced by assembling cDNA and genomic contigs from 1009 trace sequences and 108 ESTs from NCBI. Gene structures and protein sequences were deduced by merging the genomic sequences with ESTs when available, or by comparison with well-characterized Aldh, Cyp26, and Rar genes described in other species. Predicted genes from automatically annotated genomes were verified by eye, and errors in automatic annotations were corrected to maximize the similarity with ESTs, when available, and with other known enzymes. The partially predicted Strongylocentrotus purpuratus Rar protein sequence was completed by in silico genomic walking over 70 sequence traces. Zebrafish Aldh1a3 protein was initially predicted from putative exons inappropriately assembled by the Ensembl Zv5 database into three nonoverlapping genomic contigs (Table A1). To verify our zebrafish Aldh1a3 prediction, we amplified embryonic cDNA by PCR and cloned a cDNA containing the complete coding sequence.

Phylogenetic analysis
Protein sequence alignments were generated with ClustalX (Thompson et al. 1997) and corrected by eye. Only conserved parts of the proteins, whose alignments were unambiguous among paralogs, were considered for the phylogenetic analysis: from codon I40 to I513 of human ALDH1A2 for the Aldh alignment; from codon S74 to E156 and from I236 to S417 of human RARA for the nuclear receptor alignment; and from codon P45 to F490 of human CYP26A for the cytochrome P450 alignment. The MEGA package (Kumar et al. 2001) was used to construct maximum-parsimony and neighbor-joining phylogenetic trees, the latter with a Poisson correction for amino acid substitutions. A thousand repetitions were run for bootstrap support. TREE-PUZZLE 5.2 (Schmidt et al. 2002) was used to construct maximum-likelihood phylogenetic trees, based on the quartet puzzling procedure and following the JTT model of amino acid substitution.

Subcellular localization prediction
The PSORT-II program (Nakai and Horton 1999) was used to predict the subcellular localization of the deduced enzymes from their amino acid sequences. To recognize mitochondrial targeting signals, PSORT employs a discriminant analysis (called "MITDISC") whose variables are the amino acid composition of the 20 N-terminal residues (Nakai and Kanehisa 1992). The prediction was performed using the k-nearest data points, and a probability was assigned to each subcellular localization.

RESULTS

Identification of Cyp26 and Rar sequences in nonchordate deuterostomes
Rar and Cyp26 sequences had previously been isolated only from several vertebrates and a few nonvertebrate species. To learn whether nonchordate deuterostomes have Cyp26 and Rar orthologs, we searched EST and genomic databases of the hemichordate Saccoglossus kowalevskii (acorn worm) and the echinoderm S. purpuratus (sea urchin). From this search, we identified Cyp26 sequences in both species and Rar in the S. purpuratus genome. Although no Rar ortholog could be identified in the S. kowalevskii EST database, without a complete genome sequence for this species we cannot exclude the presence of Rar in this hemichordate. Orthologies of the newly identified genes were strongly supported by sequence similarity (Tables A2 and A3) and by reciprocal BLAST against the human databases. In the case of the sea urchin Rar, the most significant BLAST hits were against the RARs (E-value of 9e-117), and the next hit was against THRB, with a substantially less significant E-value of 4e-69. In the case of the acorn worm and sea urchin Cyp26 proteins, the most significant BLAST hits in the human genome were the human CYP26s (E-values of 2e-98 and 9e-72, respectively), and the next hits were against the human CYP51 and CYP3A, respectively, with much less significant E-values (2e-26 and 9e-21, respectively). Gene phylogenies inferred by maximum likelihood and maximum parsimony (data not shown) showed the same tree topologies as the neighbor-joining tree (Fig. 1B) and thus corroborated the orthologies inferred by the reciprocal best hit method (Wall et al. 2003). The next most closely related families according to the BLAST searches (i.e., CYP51 for CYP26, and the thyroid hormone receptor THR for Rar) and other closely related members (CYP4 and retinoid X receptor (RXR)) were used as outgroups in the phylogenetic analyses. In the larvacean urochordate Oikopleura dioica, despite the deep coverage of the genome database (9-fold coverage, Table A1), we did not find any clear Cyp26 or Rar orthologs. BLAST searches using Cyp26 and Rar proteins from multiple organisms as starting queries did identify phylogenetically more distant genes from the Cyp and nuclear hormone receptor families (e.g., Cyp4, Cyp5, Cyp2, Cyp3, and Thr, Rxr, Err, Ror; data not shown). This suggested that Cyp26 and Rar genes have either been lost from the larvacean genome or have diverged so much that they can no longer be recognized by BLAST searches.

Identification of Aldh1a and Aldh2 orthologous genes in deuterostomes
The high sequence similarity between the Aldh1a and Aldh2 families (Table A4) did not allow us to use the reciprocal BLAST approach to ascribe the newly identified genes unambiguously to one family or the other. For this reason, we decided to identify, in a large catalogue of deuterostomes, all putative genes that might be orthologous to the Aldh1a or Aldh2 families. From our Aldh1a–Aldh2 survey, we identified new Aldh1a and Aldh2 genes in vertebrates (Danio rerio (zebrafish), Takifugu rubripes (pufferfish), Xenopus tropicalis (frog), Gallus gallus (chicken)), cephalochordates (Branchiostoma floridae (amphioxus)), urochordates (the ascidians Ciona intestinalis and Ciona savignyi, and the larvacean Oikopleura dioica), hemichordates (S. kowalevskii (acorn worm)), and echinoderms (S. purpuratus (purple sea urchin)) (Table A1). During our in silico screening, the identification of orthologs of the next most closely related family (i.e., Aldh1l; Fig. 1D and Table A1) was taken as evidence that all Aldh1a and Aldh2 genes present in the databases had been retrieved. Except for the hemichordate S. kowalevskii, the deep coverage of the genomic databases screened (coverage is provided in Table A1) and the large number of EST sequences from the organisms analyzed suggested that all Aldh1a orthologs for a given species had likely been identified.

To classify the new Aldh sequences, we took advantage of family-specific "signatures" in their exon–intron organizations: Aldh1a genes lack intron 4 but include an extra intron 12b, whereas the opposite (presence of intron 4 and absence of 12b) is characteristic of Aldh2 genes (Fig. 2). The recognition of these family-specific signatures, combined with information from phylogenetic analysis, sequence identity, genomic location, and prediction of subcellular localization, allowed us to classify the new Aldh proteins confidently into the different Aldh families. The fact that deuterostome Aldh1l genes share an overall exon–intron organization that differs from those of Aldh1a and Aldh2 genes (data not shown) suggests that the origin of Aldh1l and of the ancestral Aldh2/1a gene preceded the duplication and divergence of Aldh2 and Aldh1a. We now describe this complicated gene family in a taxonomic context.

Fig. 2. Schematic comparison of intron distribution in deuterostome Aldh1a and Aldh2 families. Arrowheads indicate intron positions. The 13 introns of the Strongylocentrotus purpuratus Aldh2 genes are shared by most Aldh2 and Aldh1a genes and, for this reason, have been used as a reference (numbered 1–13, top; additional lineage-specific introns 7b, 9b, 10b, and 12b, bottom). White arrowheads indicate overall conserved intron positions; black arrowheads on a gray background denote introns 4 and 12b, which define the Aldh family signatures: Aldh1a genes lack intron 4 but include an extra intron 12b, whereas the opposite (presence of intron 4 and absence of 12b) is characteristic of Aldh2 genes; gray arrowheads indicate lineage-specific introns. Note the intron-less structure of the Aldh1b1 gene, and that O. dioica Aldh2 has completely reorganized its gene structure (not numbered). Homo sapiens genes represent the conserved vertebrate gene structure. Abbreviations are as in Fig. 1.

Vertebrates
Aldh2
Aldh2 enzymes are a group of nuclearly encoded proteins that play a major role in acetaldehyde detoxification in mitochondria. Mitochondrial aldehyde dehydrogenases related to Aldh2 have been found in all eukaryote species investigated so far, suggesting an ancient origin probably preceding the evolution of eukaryotes (Rzhetsky et al. 1997; Yoshida et al. 1998; Perozich et al. 1999; Vasiliou et al. 1999; Sophos and Vasiliou 2003). During our survey, we identified at least one clear Aldh2 gene in all vertebrates. All Aldh2 genes shared the same exon–intron structure, made of 13 exons and 12 introns (Fig. 2), and coded for enzymes predicted to be located within mitochondria (Table 1). In the Zv5 zebrafish genome database we identified two aldh2 genes located contiguously in LG5 (Table A1) and transcribed in the same direction, probably resulting from a zebrafish-specific tandem duplication. The two predicted zebrafish Aldh2 proteins (Aldh2a and Aldh2b) are 95.2% identical.

In addition to the typical Aldh2 genes, an extra human protein named ALDH1B1 (also known as ALDHx or ALDH5) (Hsu and Chang 1991) showed the highest similarity with ALDH2 in the human genome (Table A4). We identified Aldh1b1 genes in all mammals and amphibians examined, but not in birds and fishes (Table A1). Sequence similarity and phylogenetic analysis indicate that Aldh1b1 genes derived from a vertebrate ancestral duplication within the Aldh2 family (Fig. 1). We propose, therefore, that Aldh1b1 should be included in the Aldh2 family and renamed Aldh2a2, and consequently the rest of the Aldh2 genes in vertebrates should be renamed Aldh2a1. The intron-less structure of the coding region of Aldh1b1 genes suggests a retrotranscriptional origin of the new copy during vertebrate evolution.

Table 1. Prediction of subcellular localization of Aldh2, Aldh1b1, and Aldh1a enzymes

Enzyme                  Mitochondrial   Cytoplasmic   Nuclear
ALDH2
  SpAldh2a                  87.0*           8.7          4.3
  SpAldh2b                  65.2*          26.1          4.3
  SkAldh2                   43.5*          30.4         13.0
  OdAldh2                   95.7*           4.3          0.0
  CiAldh2                    4.3           73.9*         8.7
  CsAldh2                    8.7           65.2*        13.0
  BfAldh2                   91.3*           4.3          4.3
  HsALDH2                   87.0*           4.3          8.7
  HsALDH1B1                 95.7*           0.0          4.3
ALDH1a
  SkAldh1a1/2/3             34.8*          30.4          4.3
  CiAldh1a1/2/3a            30.4           21.7         34.8*
  CiAldh1a1/2/3b            13.0           52.2*        30.4
  CiAldh1a1/2/3c             8.7           34.8         43.5*
  CiAldh1a1/2/3d            21.7           21.7         34.8*
  CsAldh1a1/2/3a            21.7           21.7         39.1*
  CsAldh1a1/2/3b/c1          4.3           60.9*        30.4
  CsAldh1a1/2/3b/c2          4.3           60.9*        30.4
  BfAldh1a1/2/3a             4.3           73.9*         8.7
  BfAldh1a1/2/3b            13.0           56.5*        17.4
  BfAldh1a1/2/3c            30.4           43.5*        17.4
  BfAldh1a1/2/3d             0.0           60.9*        17.4
  BfAldh1a1/2/3e             4.3           73.9*         4.3
  BfAldh1a1/2/3f             8.7           73.9*         8.7
  HsALDH1A1                  4.3           73.9*         8.7
  HsALDH1A2                 17.4           60.9*        13.0
  HsALDH1A3                  4.3           65.2*        17.4

The PSORT-II program was used to predict the subcellular localization of the deduced Aldh enzymes. The prediction is performed using the k-nearest data points, and the probabilities (%) for the different subcellular localizations were calculated. Only mitochondrial, cytoplasmic, and nuclear localizations are shown. The highest value in each row is marked with an asterisk. Human enzymes stand for the vertebrate enzymes. Nomenclature is as in Fig. 1.

Aldh1a
All known vertebrate Aldh1a genes share the same exon–intron organization, having 12 introns. Although Aldh1a and Aldh2 genes both have 12 introns, the organization of these introns differs significantly (Fig. 2). Three main cytosolic retinaldehyde dehydrogenase enzymes, named Aldh1a1, Aldh1a2, and Aldh1a3, are typically found in tetrapods (Sophos and Vasiliou 2003). Rodents are unique in possessing a fourth enzyme, named Aldh1a4 (Dunn et al. 1989) or Aldh1a7 (Hsu et al. 1999). Our phylogenetic analysis suggested that the extra murine genes arose from a duplication of the Aldh1a1 gene in the rodent clade before the divergence of the mouse and rat lineages (Fig. 1D). This idea is consistent with the genomic position of Aldh1a1 and the extra murine genes, which map as neighbors and are divergently transcribed in regions of the mouse and rat genomes that share conserved synteny (Table A1).

Of the three Aldh1a members, Aldh1a2 was the only one that had previously been described outside tetrapods (Begemann et al. 2001; Grandel et al. 2002). Our searches in nonmammalian vertebrate databases showed that amphibians and birds also have Aldh1a1 and Aldh1a3 enzymes (Fig. 1 and Table A1) (Godbout 1992; Sockanathan and Jessell 1998; Tsukui et al. 1999; Grun et al. 2000; Suzuki et al. 2000). Zebrafish Aldh1a2 was the only Aldh1a member described so far in fishes (Begemann et al. 2001; Grandel et al. 2002). Screening of Takifugu and zebrafish databases allowed us to identify not only aldh1a2 genes but also aldh1a3 orthologs (Fig. 1D and Table A1). However, aldh1a1 genes were not found in either of the two fish genomes analyzed. Thus, while tetrapods have three Aldh1a members, teleosts seem to have only two, Aldh1a2 and Aldh1a3.

Cephalochordates
Aldh2
No Aldh enzymes had been previously identified in cephalochordates. From our survey of the B. floridae genome project and the EST database, we assembled six genomic contigs containing seven genes belonging to the Aldh2 or Aldh1a families (Table A1). Only one of these amphioxus genes showed an exon–intron organization with the Aldh2 signature (presence of intron 4 and absence of intron 12b; Fig. 2). The protein predicted from this gene was most similar to vertebrate Aldh2 enzymes (72% with human ALDH2 vs. 64 ± 1% with ALDH1A proteins) (Table A4). Moreover, this enzyme was the only amphioxus Aldh that grouped within the Aldh2 cluster in the phylogenetic tree (Fig. 1D), and it rendered a robust prediction of mitochondrial localization (Table 1). Therefore, we could confidently assign this amphioxus enzyme to the Aldh2 family.

Aldh1a
The other six amphioxus genes coded for Aldh proteins that grouped into a single cluster in the phylogenetic tree (Fig. 1, B and D), suggesting that they might have originated by multiple gene duplications within the cephalochordate lineage. All six amphioxus Aldh proteins were more similar to each other than to any human ALDH2 or ALDH1A enzyme (Table A4). The Aldh1a nature of these proteins was supported by the presence of the Aldh1a signature in their gene structure (absence of intron 4 and presence of intron 12b; Fig. 2) and by the predicted cytosolic localization of the deduced proteins (Table 1). Thus, the six amphioxus Aldh1a genes probably originated from an "explosion" of independent cephalochordate-specific gene duplications from an ancestral Aldh1a1/2/3 gene (i.e., the pro-ortholog of the current vertebrate Aldh1a1, Aldh1a2, and Aldh1a3). We have named the amphioxus genes Aldh1a1/2/3a to Aldh1a1/2/3f to reflect this evolutionary origin. The fact that the amphioxus Aldh1a1/2/3c and Aldh1a1/2/3d genes lie in the same genomic contig, together with the presence of the amphioxus-specific intron 10b in the Aldh1a1/2/3b-c-d clade (Fig. 2), further supported the origin of these genes from gene duplications within the cephalochordate lineage.

Urochordates
A single C. intestinalis enzyme, called Raldh2 (Nagatomo and Fujiwara 2003), was the only retinaldehyde dehydrogenase that had been previously described in urochordates. As in cephalochordates, our in silico screening of EST and genomic databases of ascidians and larvaceans revealed a remarkable complexity of Aldh genes among urochordates (Table A1).

Ascidian Class
Aldh2
Only one Aldh gene in each ascidian species, C. intestinalis and C. savignyi, showed the Aldh2 exon–intron signature: presence of intron 4 and absence of 12b (Fig. 2). The proteins predicted to be encoded by these ascidian genes showed the highest similarity to vertebrate Aldh2 enzymes (71.7 ± 0.3% with human ALDH2 vs. 65.7 ± 2.1% with human ALDH1A proteins; Table A4) and grouped at the base of the Aldh2 cluster in the evolutionary tree (Fig. 1D). Subcellular localization programs did not predict mitochondrial localization for these ascidian enzymes (Table 1), an unexpected result considering their high similarity to other Aldh2 proteins. Close inspection of the ascidian Aldh2 sequences revealed shortened and divergent N-terminal regions, which contain the mitochondrial localization signal of these enzymes in other species (Nakai and Kanehisa 1992). We therefore assumed that these genes represent the ascidian orthologs of Aldh2, although their subcellular localization and biochemical activity will need further investigation.

Aldh1a
In addition to Aldh2, the C. intestinalis and C. savignyi genomes contained four and three putative Aldh1a genes, respectively. The Aldh1a nature of these genes was corroborated by the presence of the Aldh1a signature in their exon–intron organization: absence of intron 4 and presence of intron 12b (although intron 12b appears to have been secondarily lost once within the urochordate clade). The facts that many ascidian Aldh1a genes appear to have lost intron 7 and that the predicted proteins clustered together in the evolutionary trees (Figs. 1, B and D, and 3) pointed to a lineage-specific origin by gene duplications in the ascidian lineage. Following the same rationale as with amphioxus, we concluded that these genes evolved from an ancestral Aldh1a1/2/3 form and, hence, we have named them Aldh1a1/2/3 (a, b, c, and d for C. intestinalis, and a, bc1, and bc2 for C. savignyi) (Table A1). The long branches of the ascidian Aldh1a1/2/3 cluster probably cause a long-branch-attraction (LBA) artifact that forces its phylogenetic position close to Aldh1l in the neighbor-joining tree (Fig. 1D). The use of phylogenetic methods less sensitive to LBA (i.e., maximum likelihood and maximum parsimony) and the elimination of the Aldh1l sequences overcame the LBA distortion of the urochordate Aldh1a clade. The internal topology of the ascidian cluster in the evolutionary trees (Figs. 1D and 3) and exon–intron comparisons (Fig. 2) indicated that C. savignyi and C. intestinalis Aldh1a1/2/3a (which corresponds to the reported Raldh2 gene; Nagatomo and Fujiwara 2003) are orthologs. The other Aldh1a1/2/3 genes (b, c, and d in C. intestinalis, and bc1 and bc2 in C. savignyi) lie together in each species, suggesting again that they originated by independent tandem duplications during the evolution of each Ciona species.

Larvacean Class
Aldh2
Our analysis of the O. dioica genome revealed a single larvacean gene related to the vertebrate Aldh2 or Aldh1a families (Table A1). The complete reorganization of the exon–intron structure of this Oikopleura Aldh gene (Fig. 2), a feature of many Oikopleura genes (Edvardsen et al. 2004; Cañestro et al. 2005), did not permit the use of exon organization as a character for ascribing the gene to an Aldh family. The Aldh2 nature of the larvacean protein, however, was deduced from its sequence identity (66.7% with human ALDH2 vs. 60.0 ± 1.1% with human ALDH1As) (Table A4), its strong prediction score for mitochondrial localization (Table 1), and its position in the evolutionary tree within the cluster of all other nonvertebrate Aldh2 enzymes (Fig. 1D).
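The exon–intron signature rule used above to assign genes to the Aldh1a or Aldh2 family can be written as a small decision function. The sketch below is illustrative only: it assumes that intron positions have already been mapped onto the reference numbering of Fig. 2 (introns 1–13 of the S. purpuratus Aldh2 genes plus lineage-specific introns such as 7b, 9b, 10b, and 12b), and the function name `classify_by_intron_signature` and its label strings are hypothetical, not part of any published tool. Genes whose structure cannot be mapped, such as the reorganized O. dioica gene, or that have lost intron 12b secondarily, fall through to "unresolved"; as in the text, such cases have to be settled with phylogeny, sequence identity, and predicted localization.

```python
"""Hypothetical sketch of the Aldh1a/Aldh2 exon-intron signature rule.

Intron labels follow the reference numbering of Fig. 2: "1" to "13" for the
13 introns of the S. purpuratus Aldh2 genes, plus lineage-specific introns
such as "7b", "9b", "10b", and "12b". Mapping real gene models onto these
labels is assumed to have been done beforehand.
"""

from typing import Iterable


def classify_by_intron_signature(introns: Iterable[str]) -> str:
    """Return "Aldh1a", "Aldh2", or "unresolved" from mapped intron labels.

    Aldh1a signature: intron 4 absent and intron 12b present.
    Aldh2 signature:  intron 4 present and intron 12b absent.
    Any other combination (e.g. a secondary loss of intron 12b, or a
    reorganized gene whose introns cannot be mapped) is left unresolved.
    """
    present = set(introns)
    has_intron_4 = "4" in present
    has_intron_12b = "12b" in present
    if has_intron_12b and not has_intron_4:
        return "Aldh1a"
    if has_intron_4 and not has_intron_12b:
        return "Aldh2"
    return "unresolved"


if __name__ == "__main__":
    # Reference-like Aldh2 gene: the 13 numbered introns of Fig. 2.
    aldh2_like = {str(i) for i in range(1, 14)}
    # Invented Aldh1a-like gene: intron 4 removed, extra intron 12b added.
    aldh1a_like = (aldh2_like - {"4"}) | {"12b"}
    # No mappable introns, as for a completely reorganized gene.
    unmapped = set()
    for label, gene in [("Aldh2-like", aldh2_like),
                        ("Aldh1a-like", aldh1a_like),
                        ("unmapped", unmapped)]:
        print(label, "->", classify_by_intron_signature(gene))
```

Run as a script, it prints `Aldh2-like -> Aldh2`, `Aldh1a-like -> Aldh1a`, and `unmapped -> unresolved`. Returning an explicit "unresolved" rather than guessing mirrors the text: the intron signature is one line of evidence, not a standalone classifier.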

Fig. 3. Phylogenetic analysis of the Aldh1a family by maximum-likelihood (A) and maximum-parsimony (B) methods. Numbers are the bootstrap values (%) supporting each node (n = 1000). Poorly supported nodes (<50%) were collapsed.
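Bootstrap values like those in Figs. 1B and 3 come from resampling alignment columns with replacement, rebuilding the tree for each pseudo-alignment, and recording how often a grouping is recovered. The sketch below is a deliberately reduced, hypothetical illustration, not the MEGA or TREE-PUZZLE implementation: it uses the Poisson-corrected distance d = −ln(1 − p), where p is the proportion of differing sites, and, for exactly four sequences, picks the split ij|kl with the smallest d(i,j) + d(k,l), which is the pair neighbor joining joins first when there are four taxa. The function names (`poisson_distance`, `make_split`, `best_quartet_split`, `bootstrap_support`) and the toy alignment are invented for illustration.

```python
"""Hypothetical sketch: Poisson-corrected distances and bootstrap support
for a split of four aligned protein sequences (not the MEGA or
TREE-PUZZLE implementation)."""

import math
import random
from typing import Dict, FrozenSet, Optional

Split = FrozenSet[FrozenSet[str]]


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p), p = fraction of differing sites.

    Sites with a gap ('-') in either sequence are skipped; saturated pairs
    (p >= 1) get an infinite distance.
    """
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return math.inf
    p = sum(x != y for x, y in sites) / len(sites)
    return math.inf if p >= 1.0 else -math.log(1.0 - p)


def make_split(pair_a, pair_b) -> Split:
    """Order-independent representation of the split pair_a | pair_b."""
    return frozenset({frozenset(pair_a), frozenset(pair_b)})


def best_quartet_split(seqs: Dict[str, str]) -> Optional[Split]:
    """Split ij|kl with the smallest d(i,j) + d(k,l); None on a tie.

    For four taxa this is the pair that neighbor joining joins first.
    """
    names = sorted(seqs)
    assert len(names) == 4, "this sketch handles exactly four sequences"
    first, others = names[0], names[1:]
    scores = {}
    for partner in others:
        third, fourth = [n for n in others if n != partner]
        score = (poisson_distance(seqs[first], seqs[partner])
                 + poisson_distance(seqs[third], seqs[fourth]))
        scores[make_split((first, partner), (third, fourth))] = score
    best = min(scores.values())
    winners = [split for split, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


def bootstrap_support(seqs: Dict[str, str], split: Split,
                      replicates: int = 1000, seed: int = 1) -> float:
    """Fraction of column-resampled pseudo-alignments that recover `split`."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    hits = 0
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        pseudo = {name: "".join(s[c] for c in cols) for name, s in seqs.items()}
        if best_quartet_split(pseudo) == split:
            hits += 1
    return hits / replicates


# Invented toy alignment (30 sites): A and B share "G" at 10 sites where
# C and D share "S"; B and D each carry one private change in the core.
CORE = "MSTAQLVKEGFDRNPWYHIC"
TOY_ALIGNMENT = {
    "A": CORE + "G" * 10,
    "B": CORE[:-1] + "A" + "G" * 10,
    "C": CORE + "S" * 10,
    "D": "A" + CORE[1:] + "S" * 10,
}

if __name__ == "__main__":
    ab_cd = make_split(("A", "B"), ("C", "D"))
    print(best_quartet_split(TOY_ALIGNMENT) == ab_cd)
    print(f"bootstrap support for AB|CD: {bootstrap_support(TOY_ALIGNMENT, ab_cd):.3f}")
```

In this toy alignment every resampled column that separates A+B from C+D adds to both cross-pair distances, so support for AB|CD should be close to 1.0; in the real Aldh analyses, by contrast, some nodes had low support and were collapsed.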

Aldh1a
After an exhaustive survey of the larvacean genome database (9-fold coverage, Table A1), no Aldh1a1/2/3 ortholog was recognized in Oikopleura, even though genes from more distantly related Aldh families (e.g., Aldh5, Aldh6, Aldh8, Aldh9, and Aldh16; data not shown) were clearly identified. The presence of Aldh1a1/2/3 genes in the other urochordate species suggests either a specific gene loss in the larvacean lineage or the absence of the gene from the Oikopleura sequencing project.

Hemichordates
No Aldh has been previously described in hemichordates. BLAST searches against 101,376 EST trace files sequenced by WIBR/MIT and available at NCBI allowed us to assemble six different cDNA contigs that coded for Aldh1a–Aldh2-related proteins (Table A1). Of the six cDNAs assembled, only two contained the entire coding sequence; the other four covered only about 50–70% of it. There is currently no genomic sequence database available for S. kowalevskii, and therefore the presence of additional Aldh1a or Aldh2 genes cannot be ruled out.

Aldh2
One of the six predicted proteins showed the highest similarity to human ALDH2 (76% with human ALDH2 vs. 67 ± 1.1% with human ALDH1As; Table A4). Consistent with the Aldh2 nature of this protein, it was the only S. kowalevskii Aldh that grouped within the clade of other nonvertebrate Aldh2 proteins (Fig. 1D), and its score for mitochondrial localization was higher than that for cytoplasmic localization (Table 1). Therefore, we concluded that this S. kowalevskii protein is an Aldh2 ortholog.

Aldh1a
We deduced five other Aldh sequences from the S. kowalevskii EST database (Table A1). The five S. kowalevskii Aldh proteins grouped as a cluster in the phylogenetic tree (Figs. 1, B and D, and 3), close to the amphioxus Aldh1a1/2/3 clade, suggesting that they might have originated by multiple gene duplications of an ancestral Aldh1a1/2/3 gene within the hemichordate lineage. Consistent with this lineage-specific origin, all five S. kowalevskii Aldh proteins were more similar to each other than to human ALDH2 or ALDH1A enzymes (Table A4).

Echinoderms
Aldh2
Our screening of the sea urchin S. purpuratus genome revealed two Aldh genes closely related to the Aldh2 family (Table A1). Proteins predicted from the two sea urchin sequences showed the highest similarity to other Aldh2 enzymes (70.5% and 67.5% with human ALDH2 vs. 62.7 ± 2.1% and 63.5 ± 2.6% with human ALDH1As, respectively; Table A4). The two genes shared the same 14-exon organization, and the presence of intron 4 and absence of intron 12b indicated their Aldh2 nature (Fig. 2). Accordingly, the sea urchin proteins grouped with other Aldh2 enzymes in evolutionary trees and were predicted to localize to mitochondria (Table 1). We have named the S. purpuratus enzymes Aldh2a and Aldh2b because phylogenetic analysis suggested that both sequences derived from an independent duplication early in echinoderm evolution. Consistently, the recently released sea urchin genome assembly has revealed that both genes lie in tandem in the same genomic region and are transcribed in the same direction.

Aldh1a
All attempts to find Aldh1a1/2/3 orthologs in the sea urchin genome were unsuccessful, even though genes belonging to more distantly related families (e.g., Aldh1l1, Aldh5, and Aldh16) were clearly identified (data not shown). Although we cannot rule out that an Aldh1a1/2/3 gene is located in a genomic region not yet covered by the current genome project, we favor the hypothesis that the S. purpuratus genome may lack an Aldh1a1/2/3 ortholog. Because Aldh1a1/2/3 enzymes appear to be present in hemichordates, the sister group of echinoderms (Cameron et al. 2000), additional genomes need to be explored to distinguish whether the loss of Aldh1a1/2/3 is specific to sea urchins or common to the entire phylum Echinodermata.

DISCUSSION

To test the hypothesis that the acquisition of RA genetic machinery was a key event for the innovation of developmental mechanisms that produce the chordate body plan, we explored genome databases of chordate and nonchordate deuterostomes. In contrast to the unambiguous identification of Cyp26 and Rar genes, whose orthology could be clearly assigned by reciprocal BLAST searches and phylogenetic analysis (Fig. 1B), the phylogenetic signal of the putative nonvertebrate Aldh1a sequences was not sufficient to classify them conclusively into the Aldh1a family or the closely related Aldh2 family by BLAST analysis alone. Different phylogenetic methods (maximum likelihood, maximum parsimony, and neighbor joining) split the newly identified deuterostome Aldh sequences into two distinct groups, Aldh1a and Aldh2 (Figs. 1D and 3). Although some nodes of the tree did not have high statistical support, the assignment of these two groups to the Aldh1a and Aldh2 families was consistent with the fact that most deuterostome lineages possessed a representative in each Aldh group (see the color code in Fig. 1D). To overcome the lack of resolution of the phylogenetic analysis and to circumvent the inherently limited phylogenetic signal of the Aldh1a–Aldh2 sequences, we took advantage of other types of information whose homoplasy is considered to be low (reviewed in Rokas and Holland 2000), such as gene structures (i.e., intron indels) and a specific sequence motif that, in our case, determines the subcellular localization of the proteins. Thus, the recognition of family-specific "signatures" in the exon–intron organizations of the Aldh1a and Aldh2 families (Fig. 2) was especially useful to support orthologies inferred from the topologies of the evolutionary trees (Fig. 1), and it provides a useful tool for future discrimination between deuterostome Aldh1a and Aldh2 genes.

Our screening of echinoderm and hemichordate databases revealed that orthologs of Aldh1a, Cyp26, and Rar genes exist outside the phylum Chordata (Fig. 1B). This discovery means that the gene families for RA metabolism and RA signaling were present before the divergence of extant deuterostomes and, therefore, are not a chordate innovation (Fig. 1C). This conclusion argues against the hypothesis that the acquisition of gene families underlying RA metabolism and signaling was a key event for the evolution of developmental mechanisms that produce the chordate body plan.

In our search for the RA genetic machinery, we found remarkable variability in gene family size among deuterostome lineages, especially in Aldh1a genes (Fig. 1D). There are at least five Aldh1a genes in hemichordates, four in urochordates, and six in cephalochordates (Fig. 1D). Among vertebrates, tetrapods have three main Aldh1a genes, but fish have only two. Independent gene duplication in most deuterostome lineages from a single Aldh1a1/2/3 pro-ortholog present in the stem deuterostome is the most parsimonious hypothesis to explain the origin of the present wide catalog of deuterostome Aldh1a proteins. This hypothesis is supported by our findings: (i) Aldh1a paralogs in the same taxon branch together in the phylogenetic tree; (ii) Aldh1a paralogs in the same taxon share lineage-specific introns; and (iii) many Aldh1a paralogs in the same taxon lie in tandem in the same genomic region. We cannot rule out, however, the possibility that the present catalog of deuterostome Aldh1a genes results from a complex pattern of gene conversion and lineage-specific gene duplications (and losses) from an original extensive set of Aldh1a1/2/3 genes already present in the stem deuterostome. For vertebrates, this possibility is unlikely because ALDH1A1, on human chromosome (Hsa) 9q21.13, and ALDH1A2 and ALDH1A3, on Hsa15q22.1 and Hsa15q26.3, respectively (Table A1), occupy two paralogous chromosome regions that arose in the two rounds of whole-genome duplication that occurred at about the time of vertebrate origins (Dehal and Boore 2005). The future assembly of nonvertebrate genomes will help to illuminate the evolutionary origin of the complex catalog of deuterostome Aldh1 genes. Under either hypothesis, however, the main conclusion holds that at least one Aldh1a1/2/3 pro-ortholog was already present in the stem deuterostome.

Our genomic screening reveals that the Aldh1a family not only experienced extensive amplification but also independent losses in sea urchin and larvacean urochordates (Fig. 1C). In these two taxa, exogenous RA treatments do not cause homeotic changes of anterior to posterior structures (Sciarrino and Matranga 1995; and C. Cañestro, unpublished data). In the sea urchin, the absence of Aldh1a could be functionally compensated by enzymes from other Aldh families; for example, Aldh8a1 is capable, at least in vitro, of oxidizing retinal, but the contribution of Aldh8a1 to RA signaling is not yet fully understood (Lin and Napoli 2000).

[…] the expression of Hox genes? If the ancestral deuterostome did synthesize, bind, and break down RA with its Aldh1a, Rar, and Cyp26 genes, then we must view the deuterostome ancestor as more "chordate-like" than is generally assumed (Gerhart et al. 2005). On the other hand, if the Aldh1a, Rar, and Cyp26 proteins were present in stem deuterostomes but were not acting in RA metabolism, then discovering which of today's lineages actually interact with RA will help us to understand when various innovations were acquired in the evolution of chordate-specific features. How do our conclusions change if the classical model of deuterostome evolution we assume in Fig. 1C is replaced by a new phylogenetic
Larvaceans, hypothesis suggesting that urochordates are the closest living however, also appear to lack Cyp26 and Rar genes, suggesting relative to vertebrates and chordates are no longer mon- that RA machinery might have been lost or modified beyond ophyletic because cephalochordates are more closely related recognition during larvacean evolution (Fig. 1C); this loss to Ambulacraria than to vertebrates (Delsuc et al. 2006)? The questions the contribution of RA signaling in Oikopleura de- newly proposed evolutionary scenario does not alter our main velopmental patterning. Overall, our data reveal that the RA conclusion that the RA genetic machinery was already present genetic machinery has diversified substantially in different in the stem deuterostome. If, however, cephalochordates were deuterostome lineages, raising the possibility that these dif- the sister group of Ambulacraria, it can be inferred that ferences could be related to the morphological diversity of the deuterostome ancestor already had a surprising number extant deuterostomes by, for example, modifying the spatio- of chordate features and RA genetic machinery was temporal distribution of RA during embryogenesis. already involved in axial patterning during development. Our work definitively answered the question of whether Under either the classical view or new phylogenetic model, it the genetic machinery for RA signaling was ‘‘invented’’ by is possible that the RA genetic machinery was fully functional chordates. 
The answer pushes back in evolutionary time our in the stem deuterostome, and during the deuterostome understanding of the origins of this machinery, but raises an- radiation, the components of the RA machinery evolved other question, for which there is currently insufficient data to heterogeneously in different lineages by being preserved, draw a firm conclusion: What is the earliest diverging taxon recruited, duplicated, or lost from the molecular genetic that possesses components of the RA signaling network? In network controlling the development of lineage-specific mor- an attempt to narrow down the origin of RA machinery phological features, such as axial patterning, the development during evolution, we explored available genomic and of body symmetries, the central nervous system, sensory cells, EST databases of nondeuterostomes (e.g., the ecdysozoan and endodermal derivatives (Shimeld, 1996; Hinman and protostomes Drosophila, Anopheles, Aedes,andC. elegans, Degnan 1998; Hinman and Degnan 2000; Manzanares et al. and the radiata Hydra and Nematostella). Although this 2000; Ross et al. 2000; Schilling and Knight 2001; Wada 2001; analysis did not reveal any convincing Rar, Cyp26,orAldh1a Escriva et al. 2002; Holland, 2005a; Vermot and Pourquie orthologs, the taxonomic diversity is too narrow and some 2005; C. Can˜ estro, unpublished data). The possibility that this current genomic databases of nondeuterostomes are too shal- heterogenous evolution of the RA components may have low to conclude that the RA genetic machinery was a de- favored morphological radiation among deuterostomes is uterostome innovation. Analysis of this question must await consistent with the multiple independent gene duplications deeper databases and broader phylogenetic sampling. and losses we found among different deuterostome taxa. 
In conclusion, our analysis shows for the first time the Finally, the new Aldh1a, Cyp26, and Rar genes identified here presence of the three main components of RA metabolism are significant because they provide genomic information and signaling in nonchordates. We conclude that the gene necessary to design functional experiments to investigate the families for RA metabolism and RA signaling were already developmental roles of RA across a variety of taxa, and in present in early deuterostome evolution, thereby calling into doing so, improve our understanding of deuterostome radi- question the ‘‘invention’’ of the RA genetic machinery as a ation and the evolutionary origin of chordate developmental basis for the innovation of developmental mechanisms leading mechanisms. to chordate-specific features. Our new evolutionary scenario raises a number of new questions. Did the ancestral deute- Acknowledgments rostome Aldh1a and Cyp26 proteins actually metabolize For generously making genome sequences publicly available, we retinoids? Was RA the ligand for the ancestral deuterostome thank D. Chourrout and Genoscope for O. dioica;L.Holland,J. Rar and, if so, did RA already act as a morphogen regulating Gibson-Brown, and JGI for B. floridae; and J. Aronowicz, C. J. Canestro et al. Retinoic acid and chordate origins 403

Lowe, and WIBR/MIT for S. kowalevskii.Thismaterialisbasedon Hsu, L. C., and Chang, W. C. 1991. Cloning and characterization of a new work supported by NSF Grant IBN-0345203 to J. H. P. and C. C., functional human aldehyde dehydrogenase gene. J. Biol. Chem. 266: HD22486 to J. H. P., by Ministerio de Ciencia y Tecnologı´a (Spain), 12257–12265. grant BMC2003-05211 to R. G. D. and R. A., and EX2002-0059 to Hsu, L. C., Chang, W. C., Hoffmann, I., and Duester, G. 1999. Molecular analysis of two closely related mouse aldehyde dehydrogenase genes: C. C., and by DURSI (Generalitat de Catalunya), grant identification of a role for Aldh1, but not Aldh-pb, in the biosynthesis of 2005BE00080 to R. A. retinoic acid. Biochem. J. 339 (Part 2): 387–395. Kumar, S., Tamura, K., Jakobsen, I. B., and Nei, M. 2001. MEGA2: molecular evolutionary geneticsanalysissoftware.Bioinformatics 17: 1244–1245. Lin, M., and Napoli, J. L. 2000. cDNA cloning and expression of a human REFERENCES aldehyde dehydrogenase (ALDH) active with 9-cis-retinal and identifi- cation of a rat ortholog, ALDH12. J. Biol. Chem. 275: 40106–40112. Altschul, S. F., et al. 1997. Gapped BLAST and PSI-BLAST: a new gen- Manzanares, M., et al. 2000. Conservation and elaboration of Hox eration of protein database search programs. Nucleic Acids Res. 25: gene regulation during evolution of the vertebrate head. Nature 408: 854– 3389–3402. 857. Begemann,G.,Schilling,T.F.,Rauch,G.J.,Geisler,R.,andIngham,P. Marshall, H., et al. 1994. A conserved retinoic acid response element re- W. 2001. The zebrafish neckless mutation reveals a requirement for quired for early expression of the homeobox gene Hoxb-1. Nature 370: raldh2 in mesodermal signals that pattern the hindbrain. Development 567–571. 128: 3081–3094. Nagatomo, K., and Fujiwara, S. 2003. Expression of Raldh2, Cyp26 and Cameron, C. B., Garey, J. R., and Swalla, B. J. 2000. Evolution of the Hox-1 in normal and retinoic acid-treated Ciona intestinalis embryos. 
chordate body plan: new insights from phylogenetic analyses of de- Gene. Exp. Patterns 3: 273–277. uterostome phyla. Proc. Natl. Acad. Sci. USA 97: 4469–4474. Nakai, K., and Horton, P. 1999. PSORT: a program for detecting sorting Can˜ estro, C., Bassham, S., and Postlethwait, J. 2005. Development of the signals in proteins and predicting their subcellular localization. Trends central nervous system in the larvacean Oikopleura dioica and the ev- Biochem. Sci. 24: 34–36. olution of the chordate brain. Dev. Biol. 285: 298–315. Nakai, K., and Kanehisa, M. 1992. A knowledge base for predicting protein Dehal, P., and Boore, J. L. 2005. Two rounds of whole genome duplication localization sites in eukaryotic cells. 14: 897–911. in the ancestral vertebrate. PLoS Biol. 3: e314. Niederreither, K., et al. 2002. Genetic evidence that oxidative derivatives of Delsuc, F., Brinkmann, H., Chourrout, D., and Philippe, H. 2006. Tunic- retinoic acid are not involved in retinoid signaling during mouse devel- ates and not cephalochordates are the closest living relatives of verte- opment. Nat. Genet. 31: 84–88. brates. Nature 439: 965–968. Perozich, J., Nicholas, H., Wang, B. C., Lindahl, R., and Hempel, J. 1999. Dunn,T.J.,Koleske,A.J.,Lindahl,R.,andPitot,H.C.1989.Pheno- Relationships within the aldehyde dehydrogenase extended family. Pro- barbital-inducible aldehyde dehydrogenase in the rat. cDNA sequence tein Sci. 8: 137–146. and regulation of the mRNA by phenobarbital in responsive rats. J. Biol. Reijntjes, S., Blentic, A., Gale, E., and Maden, M. 2005. The control of Chem. 264: 13057–13065. morphogen signalling: regulation of the synthesis and catabolism of re- Edvardsen, R. B., et al. 2004. Hypervariable and highly divergent intron- tinoic acid in the developing embryo. Dev. Biol. 285: 224–237. exon organizations in the chordate Oikopleura dioica. J. Mol. Evol. 59: Rokas, A., and Holland, P. W. 2000. Rare genomic changes as a tool for 448–457. phylogenetics. Trends Ecol. Evol. 
15: 454–459. Escriva, H., Delaunay, F., and Laudet, V. 2000. Ligand binding and nuclear Ross, S. A., McCaffery, P. J., Drager, U. C., and De Luca, L. M. 2000. receptor evolution. Bioessays 22: 717–727. Retinoids in embryonal development. Physiol. Rev. 80: 1021–1054. Escriva, H., Holland, N. D., Gronemeyer, H., Laudet, V., and Holland, L. Rzhetsky,A.,Ayala,F.J.,Hsu,L.C.,Chang,C.,andYoshida,A.1997. Z. 2002. The retinoic acid signaling pathway regulates anterior/posterior Exon/intron structure of aldehyde dehydrogenase genes supports the patterning in the nerve cord and pharynx of amphioxus, a chordate ‘‘introns-late’’ theory. Proc.Natl.Acad.Sci.USA94: 6820–6825. lacking neural crest. Development 129: 2905–2916. Schilling, T. F., and Knight, R. D. 2001. Origins of anteroposterior pat- Fujiwara, S., and Kawamura, K. 2003. Acquisition of retinoic acid signa- terning and Hox gene regulation during chordate evolution. Philos. ling pathway and innovation of the chordate body plan. Zool. Sci. 20: Trans. R. Soc. Lond. B. Biol. Sci. 356: 1599–1613. 809–818. Schmidt, H. A., Strimmer, K., Vingron, M., and von Haeseler, A. 2002. Gerhart, J., Lowe, C., and Kirschner, M. 2005. Hemichordates and the TREE-PUZZLE: maximum likelihood phylogenetic analysis using quar- origin of chordates. Curr. Opin. Genet. Dev. 15: 461–467. tets and parallel computing. Bioinformatics 18: 502–504. Godbout, R. 1992. High levels of aldehyde dehydrogenase transcripts in the Sciarrino, S., and Matranga, V. 1995. Effects of retinoic acid and dime- undifferentiated chick retina. Exp. Eye. Res. 54: 297–305. thylsulfoxide on the morphogenesis of the sea urchin embryo. Cell Biol. Grandel, H., et al. 2002. Retinoic acid signalling in the zebrafish embryo is Int. Rep. 19: 675–680. necessary during pre-segmentation stages to pattern the anterior–poste- Shimeld, S. M. 1996. Retinoic acid, hox genes and the anterior-posterior rior axis of the CNS and to induce a pectoral fin bud. Development 129: axis in chordates. 
BioEssays 18: 613–616. 2851–2865. Sockanathan, S., and Jessell, T. M. 1998. Motor neuron-derived retinoid Grun, F., Hirose, Y., Kawauchi, S., Ogura, T., and Umesono, K. 2000. signaling specifies the subtype identity of spinal motor neurons. Cell 94: Aldehyde dehydrogenase 6, a cytosolic retinaldehyde dehydrogenase 503–514. prominently expressed in sensory neuroepithelia during development. J. Sophos, N. A., and Vasiliou, V. 2003. Aldehyde dehydrogenase gene su- Biol. Chem. 275: 41210–41218. perfamily: the 2002 update. Chem. Biol. Interact. 143–144: 5–22. Hinman, V. F., and Degnan, B. M. 1998. Retinoic acid disrupts anterior Suzuki, R., et al. 2000. Identification of RALDH-3, a novel retinaldehyde ectodermal and endodermal development in ascidian larvae and post- dehydrogenase, expressed in the ventral region of the retina. Mech. Dev. larvae. Dev. Genes Evol. 208: 336–345. 98: 37–50. Hinman, V. F., and Degnan, B. M. 2000. Retinoic acid perturbs Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, Otx gene expression in the ascidian pharynx. Dev. Genes Evol. 210: D. G. 1997. The CLUSTAL_X windows interface: flexible strategies for 129–139. multiple sequence alignment aided by quality analysis tools. Nucleic Ac- Holland, L. Z. 2005a. Non-neural ectoderm is really neural: evolution of ids Res. 25: 4876–4882. developmental patterning mechanisms in the non-neural ectoderm of Tsukui, T., et al. 1999. Multiple left-right asymmetry defects in Shh(-/-) chordates and the problem of sensory cell homologies. J. Exp. Zool. B mutant mice unveil a convergence of the shh and retinoic acid Mol. Dev. Evol. 304: 304–323. pathways in the control of Lefty-1. Proc.Natl.Acad.Sci.USA96: Holland, N. D. 2005b. Chordates. Curr. Biol. 15: R911–R914. 11376–11381. 404 EVOLUTION&DEVELOPMENT Vol. 8, No. 5, September^October 2006

Vasiliou, V., Bairoch, A., Tipton, K. F., and Nebert, D. W. 1999. Eukaryo- Wada, H. 2001. Origin and evolution of the neural crest: a hypothetical tic aldehyde dehydrogenase (ALDH) genes: human polymorphisms, and reconstruction of its evolutionary history. Dev Growth Differ 43: 509– recommended nomenclature based on divergent evolution and chromo- 520. somal mapping. Pharmacogenetics 9: 421–434. Wall,D.P.,Fraser,H.B.,andHirsh,A.E.2003.Detectingputativeor- Vermot, J., and Pourquie, O. 2005. Retinoic acid coordinates somit- thologs. Bioinformatics 19: 1710–1711. ogenesis and left–right patterning in vertebrate embryos. Nature 435: Yoshida, A., Rzhetsky, A., Hsu, L. C., and Chang, C. 1998. Human al- 215–220. dehyde dehydrogenase gene family. Eur. J. Biochem. 251: 549–557.

APPENDIX

Table A1. Accession numbers and information related to the Aldh sequences used in this study

Species [source], genome size (coverage)
  Name | Accession number [1] | Genomic information

Saccoglossus kowalevskii [2], 1100 Mb (n/a)
  Aldh2 | This work | n/a
  Aldh1a1/2/3a | This work | n/a
  Aldh1a1/2/3b | This work | n/a
  Aldh1a1/2/3c | This work | n/a
  Aldh1a1/2/3d | This work | n/a
  Aldh1a1/2/3e | This work | n/a

Strongylocentrotus purpuratus [2], 800 Mb (6x)
  Aldh2a | This work and XP_786833 | Trace file database and NW_719387
  Aldh2b | This work and XP_786787 | Trace file database and NW_719387
  Aldh1l | This work and XP_784777 | Trace file database

Oikopleura dioica [2], 72 Mb (9x)
  Aldh2 | This work | Trace file database

Ciona intestinalis [3, 4], 160 Mb (8.2x)
  Aldh2 | ci0100140576 | sc 184
  Aldh1a1/2/3a | ci0100144540 | sc 18
  Aldh1a1/2/3b | ci0100149806 | sc 112
  Aldh1a1/2/3c | ci0100132432 | sc 112
  Aldh1a1/2/3d | ci0100136702 | sc 112
  Aldh1l | ci0100147819 | sc 1

Ciona savignyi [5], 180 Mb (13x)
  Aldh2 | SINCSIG00000043357 | sc 360
  Aldh1a1/2/3a | SINCSIG00000051674 | sc 329
  Aldh1a1/2/3bc1 | SINCSIG00000070704 | sc 76
  Aldh1a1/2/3bc2 | SINCSIG00000067716 | sc 76

Branchiostoma floridae [2], 600 Mb (13x)
  Aldh2 | This work | Trace file database
  Aldh1a1/2/3a | This work | Trace file database
  Aldh1a1/2/3b | This work | Trace file database
  Aldh1a1/2/3c | This work | Trace file database
  Aldh1a1/2/3d | This work | Trace file database
  Aldh1a1/2/3e | This work | Trace file database
  Aldh1a1/2/3f | This work | Trace file database
  Aldh1l | This work | Trace file database

Danio rerio [2], 1700 Mb (6.5–7x)
  Aldh2a | NP_956784 | LG5, Zv5_scaffold1383
  Aldh2b | NP_998466 | LG5, Zv5_scaffold1383
  Aldh1a2 | NP_571925 | LG7
  Aldh1a3 | This work (DQ300198) | Zv5_sc1492, 3955 and NA2068
  Aldh1l1 | XP_687196 | NW_644648
  Aldh1l2 | XP_700926 | NW_652559

Takifugu rubripes [5], 365 Mb (5.7x)
  Aldh2 | SINFRUP00000070759 | sc 3571
  Aldh1a2 | BAE20172 | sc 233
  Aldh1a3 | This work | sc 1420 and sc 4033
  Aldh1l | SINFRUP00000060897 | sc 1384
  Aldh1l2 | SINFRUP00000057229 | sc 1786

Xenopus tropicalis [3], 1500 Mb (7.65x)
  Aldh2 | fgenesh1_pg.C_scaffold_501000010 | sc 501
  Aldh1b1 | fgenesh1_pm.C_scaffold_153000014 | sc 153
  Aldh1a1 | fgenesh1_pg.C_scaffold_982000009 | sc 982
  Aldh1a2 | fgenesh1_pg.C_scaffold_297000032 | sc 297
  Aldh1a3 | fgenesh1_pg.C_scaffold_208000041 | sc 208
  Aldh1l1 | fgenesh1_kg.C_scaffold_368000004 | sc 368
  Aldh1l2 | fgenesh1_pg.C_scaffold_434000022 | sc 434

Gallus gallus [2], 1200 Mb (6.6x)
  Aldh2 | XM_415171 | Chr 15
  Aldh1a1 | X58869 | Chr Z
  Aldh1a2 | NM_204995 | Chr 10
  Aldh1a3 | NM_204669 | Chr 10
  Aldh1l | XP_416314 | Chr 1

Rattus norvegicus [2], 2800 Mb
  Aldh2 | NM_032416 | 12q16
  Aldh1b1 | NM_001011975 | 5q22
  Aldh1a1 | NM_022407 | 1q51
  Aldh1a2 | NM_053896 | 8q24
  Aldh1a3 | NM_153300 | 1q22
  Aldh1a4 | NM_017272 | 1q51
  Aldh1l | NP_071992 | n/a

Mus musculus [2], 2600 Mb
  Aldh2 | NM_009656 | 5F-G1
  Aldh1b1 | NP_082546 | 4B2
  Aldh1a1 | NM_013467 | 19B (12.0 cM)
  Aldh1a2 | NM_009022 | 9D (42.0 cM)
  Aldh1a3 | NM_053080 | 7C
  Aldh1a7 | NP_036051 | 19B (20.0 cM)
  Aldh1l1 | NP_081682 | 6 D1
  Aldh1l2 | NP_705771 | 10 C1

Homo sapiens [2], 2900 Mb
  Aldh2 | NM_000690 | 12q24.2
  Aldh1B1 | NM_000692 | 9p11.1
  Aldh1A1 | AF003341 | 9q21.13
  Aldh1A2 | NM_003888 | 15q21.3
  Aldh1A3 | NM_000693 | 15q26.3
  Aldh1L1 | NP_036322 | 3q21.2
  Aldh1L2 | XP_090294 | 12q23.3

[1] The NCBI accession numbers for the genomic trace sequences of nonassembled genomes and ESTs used in this work, and the protein alignments, can be found at http://www.neuro.uoregon.edu/postle/OikOR/publications.htm or http://www.ub.edu/genetica/evolucioen/albalatpub.htm
[2] NCBI: www.ncbi.nlm.nih.gov
[3] JGI: www.jgi.doe.gov
[4] Ghost: ghost.zool.kyoto-u.ac.jp/indexr1.html
[5] ENSEMBL: www.ensembl.org
n/a, not available. S. kowalevskii sequences were deduced from EST data because genomic information is not available. The genome size and the depth of coverage of the genomes represented in the genomic databases are indicated. For B. floridae and O. dioica, the depth of coverage was estimated from the total number of publicly available trace sequences, assuming an average size of 0.9 kb per trace sequence. Genomic and EST sequences were obtained from publicly available databases.
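The coverage estimate described in the footnote above amounts to dividing the total number of sequenced bases by the genome size. The following minimal Python sketch shows that arithmetic under the footnote's assumption of 0.9 kb per trace. The function names are hypothetical, and the paper does not report the actual trace counts.

```python
MEAN_TRACE_KB = 0.9  # average trace length assumed in the Table A1 footnote


def estimate_coverage(n_traces: int, genome_size_mb: float,
                      mean_trace_kb: float = MEAN_TRACE_KB) -> float:
    """Depth of coverage (x) = total sequenced kb / genome size in kb."""
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return n_traces * mean_trace_kb / (genome_size_mb * 1000.0)  # 1 Mb = 1000 kb


def traces_for_coverage(coverage: float, genome_size_mb: float,
                        mean_trace_kb: float = MEAN_TRACE_KB) -> float:
    """Inverse of estimate_coverage: number of traces needed for a given depth."""
    if mean_trace_kb <= 0:
        raise ValueError("mean trace length must be positive")
    return coverage * genome_size_mb * 1000.0 / mean_trace_kb


# O. dioica in Table A1: 72 Mb genome listed at 9x coverage.
print(round(traces_for_coverage(9, 72)))   # 720000
print(estimate_coverage(720_000, 72))      # 9.0
```

With the O. dioica figures (72 Mb at 9x), the inverse function implies roughly 720,000 traces. That number is a back-calculation from the table, not a count reported by the authors.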

Table A2. Percentage of sequence identity among members of the CYP26, CYP51, and CYP4 families

                1   2   3   4   5   6   7   8   9  10
CYP26
 1 HsCYP26A
 2 HsCYP26B    46
 3 HsCYP26C    48  56
 4 CiCyp26a    39  40  41
 5 CiCyp26b    33  32  30  30
 6 SkCyp26*    41  40  39  37  31
 7 SpCyp26*    38  38  35  32  30  41
CYP51
 8 HsCYP51     21  20  20  20  17  21  21
 9 SpCyp51*    20  20  20  20  18  20  21  73
CYP4
10 HsCYP4V2    21  18  20  21  17  20  18  19  18
11 SpCyp4*     20  20  20  23  20  21  19  19  16  51

Nomenclature is as in Fig. 1. Values are based on the protein alignment used for the evolutionary tree. Each row gives the percentage identity of that sequence with every preceding sequence, and columns are numbered as the rows. Nonchordate sequences are marked with an asterisk (*).
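Tables A2–A4 report rounded percent identities computed from the protein alignments used for the trees. The paper does not state how gapped columns were handled. The Python sketch below therefore assumes one common convention: a column with a gap in either sequence is skipped. It then lays the results out in the same lower-triangular form as the tables. The sequences are hypothetical toy examples, not data from this study.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two rows of the same multiple alignment.

    Assumption: columns with a gap in either row are skipped, so the
    denominator is the number of ungapped columns shared by both rows.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # gapped column: not scored
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def lower_triangle(alignment: dict[str, str]) -> list[tuple[str, list[int]]]:
    """Lay identities out like Tables A2-A4: row k lists identities with rows 1..k-1.

    Values are rounded with Python's round(), which rounds halves to even.
    """
    names = list(alignment)
    return [
        (name, [round(percent_identity(alignment[name], alignment[prev]))
                for prev in names[:k]])
        for k, name in enumerate(names)
    ]


# Hypothetical toy alignment, not sequences from this study.
toy = {
    "seqA": "MKTALLVQ-",
    "seqB": "MKSALLVQW",
    "seqC": "MRSALIVQW",
}

for name, values in lower_triangle(toy):
    print(name, values)
```

The rounding rule matters at exact halves. In this toy input, the seqA–seqC identity is exactly 62.5%. Python's `round()` reports that value as 62, whereas round-half-up would give 63.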

Table A3. Percentage of sequence identity among members of the RAR, THR, and RXR families

              1   2   3   4   5   6   7   8   9  10  11  12
RAR
 1 HsRARA
 2 HsRARB    93
 3 HsRARG    89  90
 4 BfRar     73  73  72
 5 CiRar     70  71  70  63
 6 SpRar*    65  65  65  62  59
THR
 7 HsTHRA    46  47  45  44  42  47
 8 HsTHRB    44  45  43  43  41  48  88
 9 SpThr*    55  55  56  53  52  53  70  70
RXR
10 HsRXRA    37  38  38  35  38  34  33  31  50
11 HsRXRB    37  37  38  34  36  34  32  31  49  92
12 HsRXRC    39  40  39  36  38  34  32  30  51  90  88
13 SpRxr*    36  37  37  34  35  33  31  28  45  82  80  77

Nomenclature is as in Fig. 1. Values are based on the protein alignment used for the evolutionary tree. Each row gives the percentage identity of that sequence with every preceding sequence, and columns are numbered as the rows. Nonchordate sequences are marked with an asterisk (*).
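Each row of these tables lists identities only with the preceding sequences. The identity between any two sequences is therefore found in the row of whichever sequence comes later. The small Python helper below makes that symmetric lookup explicit. The class name is hypothetical. The data are the RAR rows of Table A3 as printed, with the asterisk markers omitted from the names.

```python
class TriangularTable:
    """Symmetric lookup over a lower-triangular percent-identity table.

    Row k holds the identities of sequence k with sequences 1..k-1,
    which is the layout of Tables A2-A4.
    """

    def __init__(self, rows: list[tuple[str, list[int]]]) -> None:
        for k, (name, values) in enumerate(rows):
            if len(values) != k:  # row k (0-based) must hold exactly k values
                raise ValueError(f"row {name!r}: expected {k} values, got {len(values)}")
        self.names = [name for name, _ in rows]
        self.index = {name: k for k, name in enumerate(self.names)}
        self.rows = [values for _, values in rows]

    def identity(self, a: str, b: str) -> int:
        """Identity between sequences a and b, in either order."""
        i, j = self.index[a], self.index[b]
        if i == j:
            return 100  # a sequence is identical to itself
        if i < j:
            i, j = j, i  # the value is stored in the later (lower) row
        return self.rows[i][j]


# RAR block of Table A3 (rows 1-6), copied from the table.
RAR_BLOCK = TriangularTable([
    ("HsRARA", []),
    ("HsRARB", [93]),
    ("HsRARG", [89, 90]),
    ("BfRar", [73, 73, 72]),
    ("CiRar", [70, 71, 70, 63]),
    ("SpRar", [65, 65, 65, 62, 59]),
])

print(RAR_BLOCK.identity("BfRar", "SpRar"))  # 62
print(RAR_BLOCK.identity("SpRar", "CiRar"))  # 59
```

The constructor rejects rows of the wrong length. This check is a cheap way to catch transcription slips when copying values out of a flattened table.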

Table A4. Percentage of sequence identity among members of the Aldh1A subfamily and Aldh2 family in deuterostomes

                      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29
Human ALDH1As
 1 HsALDH1A1
 2 HsALDH1A2         74
 3 HsALDH1A3         72  74
Cephalochordate Aldh1a1/2/3s
 4 BfAldh1a1/2/3a    68  68  70
 5 BfAldh1a1/2/3b    67  71  68  72
 6 BfAldh1a1/2/3c    66  66  65  71  78
 7 BfAldh1a1/2/3d    67  66  65  70  78  90
 8 BfAldh1a1/2/3e    70  67  65  71  75  73  72
 9 BfAldh1a1/2/3f    68  66  65  69  73  69  69  83
Urochordate Aldh1a1/2/3s
10 CiAldh1a1/2/3a    59  63  61  58  59  58  56  61  59
11 CiAldh1a1/2/3b    62  59  58  60  61  61  60  60  59  56
12 CiAldh1a1/2/3c    63  60  58  60  61  62  60  61  59  57  86
13 CiAldh1a1/2/3d    56  58  56  58  55  54  54  59  57  54  58  59
14 CsAldh1a1/2/3a    59  63  61  58  59  57  57  61  59  89  55  56  54
15 CsAldh1a1/2/3bc1  61  59  58  61  60  60  59  60  59  56  83  80  58  53
16 CsAldh1a1/2/3bc2  61  59  58  61  60  60  59  60  59  55  83  80  58  53 100
Hemichordate Aldh1a1/2/3s
17 SkALDH1a1/2/3a*   70  66  68  68  74  70  70  74  73  60  61  61  57  61  62  62
18 SkALDH1a1/2/3b*   64  62  63  67  64  63  62  64  62  57  53  56  55  58  59  58  76
19 SkALDH1a1/2/3c*   73  70  70  71  75  71  71  74  74  62  63  65  58  64  63  63  77  76
20 SkALDH1a1/2/3d*   68  64  63  65  69  64  66  67  65  57  57  58  53  58  59  59  73  72  80
21 SkALDH1a1/2/3e*   68  64  64  67  67  66  67  67  68  55  56  58  53  57  59  59  77  73  82  80
Deuterostome ALDH2s
22 HsALDH2           69  69  67  69  70  67  67  71  69  61  61  61  58  62  61  61  68  68  71  66  66
23 HsALDH1B1         65  67  65  67  70  64  66  68  68  56  58  59  57  56  60  60  67  64  67  62  64  74
24 BfAldh2           66  67  65  68  68  64  64  69  66  57  57  57  55  59  57  57  69  65  67  61  61  72  67
25 CiAldh2           68  68  65  69  71  66  66  71  68  58  61  61  58  57  62  62  70  64  71  64  62  74  70  71
26 CsAldh2           68  67  64  68  70  65  66  71  68  58  60  60  57  57  61  60  68  64  70  63  61  73  69  70  95
27 OdAldh2           65  65  64  67  65  61  60  64  64  57  56  56  56  57  58  58  64  62  66  59  56  73  64  69  69  68
28 SkAldh2*          68  69  65  70  70  68  69  72  71  58  60  61  57  59  60  60  73  66  72  66  65  76  71  73  76  74  72
29 SpAldh2a*         66  67  64  67  68  67  66  69  67  57  57  58  55  57  59  58  69  68  67  62  63  73  70  73  70  69  67  73
30 SpAldh2b*         67  68  64  67  73  67  66  68  70  59  58  58  57  57  58  58  68  65  69  64  65  72  67  68  69  67  65  68  71

Nomenclature is as in Fig. 1. Values are based on the protein alignment used for the evolutionary tree. Each row gives the percentage identity of that sequence with every preceding sequence, and columns are numbered as the rows. Nonchordate sequences are marked with an asterisk (*).
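The Results summarize parts of Table A4 as mean ± SD identities against groups of human sequences, such as the three human ALDH1As. The Python sketch below computes that kind of summary for one row, SpAldh2a (row 29), using the sample standard deviation. This section does not state which rows, or which SD convention, the authors used. The output is therefore illustrative and is not meant to reproduce the reported figures.

```python
from statistics import mean, stdev


def summarize(values: list[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of a list of identities."""
    if len(values) < 2:
        raise ValueError("need at least two values for a standard deviation")
    return mean(values), stdev(values)  # stdev uses the n - 1 denominator


# Table A4, row 29 (SpAldh2a): columns 1-3 are human ALDH1A1-3, column 22 is HsALDH2.
SP_ALDH2A_VS_HUMAN_ALDH1A = [66, 67, 64]
SP_ALDH2A_VS_HUMAN_ALDH2 = 73

m, sd = summarize(SP_ALDH2A_VS_HUMAN_ALDH1A)
print(f"SpAldh2a vs human ALDH2:   {SP_ALDH2A_VS_HUMAN_ALDH2}%")
print(f"SpAldh2a vs human ALDH1As: {m:.1f} ± {sd:.1f}%")
```

Replacing `stdev` with `statistics.pstdev` gives the population standard deviation instead. With only three values the two conventions differ noticeably, so a summary like this should state which one it uses.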