
Insights into Cyclostome Phylogenomics: Pre-2R or Post-2R?

Shigehiro Kuraku*

Lehrstuhl für Zoologie und Evolutionsbiologie, Department of Biology, University of Konstanz, Universitätsstrasse 10, 78457 Konstanz, Germany

Interest in understanding the transition from prevertebrates to vertebrates at the molecular level has resulted in accumulating genomic and transcriptomic sequence data for the earliest groups of extant vertebrates, namely, hagfishes (Myxiniformes) and lampreys (Petromyzontiformes). Molecular phylogenetic studies on species phylogeny have revealed the monophyly of cyclostomes and the deep divergence between hagfishes and lampreys (more than 400 million years). In parallel, recent molecular phylogenetic studies have shed light on the complex evolution of the cyclostome genome. This consists of whole genome duplications, shared at least partly with gnathostomes (jawed vertebrates), and cyclostome lineage-specific secondary modifications of the genome, such as gene gains and losses. Therefore, the analysis of cyclostome genomes requires caution in distinguishing between orthology and paralogy in gene molecular phylogeny at the gene family scale, as well as between apomorphic and plesiomorphic genomic traits in larger-scale analyses. In this review, we propose possible ways of improving the resolvability of these evolutionary events, and discuss probable scenarios for cyclostome genome evolution, with special emphasis on the hypothesis that two-round (2R) genome duplication events occurred before the divergence between cyclostomes and gnathostomes, and therefore that a post-2R state is a genomic synapomorphy for all extant vertebrates.

Key words: hagfish, lamprey, orthology, hidden paralogy, long branch attraction, whole genome duplication

* Corresponding author. Phone: +49-7531-88-2763; Fax: +49-7531-88-3018; E-mail: [email protected]

INTRODUCTION

Hagfishes (Myxiniformes) and lampreys (Petromyzontiformes) hold basal phylogenetic positions as the earliest groups of extant vertebrates, and they have been analyzed from various viewpoints to understand the transition from prevertebrates to vertebrates at the molecular level (e.g., Kuratani et al., 2002; Lamb et al., 2007; Osorio and Retaux, 2008). Currently, the National Center for Biotechnology Information (NCBI) sequence database has 124,029 and 24,521 entries of nucleotide sequences, including expressed sequence tags (ESTs), for species in the orders Petromyzontiformes and Myxiniformes, respectively (and 3006 and 652 entries for annotated protein sequences, respectively, as of March 26, 2008). Although many of these database entries represent limited types of gene families (e.g., genes encoding homeodomain-containing transcription factors, antigen-recognition proteins involved in the adaptive immune system, and so on), the current collection of annotated cyclostome genes is providing a rough but insightful overview into understanding the evolutionary properties of cyclostome genomes.

In light of the theories and knowledge of molecular evolution, abundant sequence resources have enabled various kinds of evolutionary information to be extracted, such as phylogenetic relationships, evolutionary time scales, and gains/losses of genes. In particular, the evolution of gene repertoires has had an impact on comparative analyses of gene function, including the regulatory gene network that governs development, physiology, and other biological processes. Decades of molecular studies have shown that many well-studied genes have similar copies between species as well as within species. Indispensable terms to characterize these gene copies evolutionarily, namely, ‘orthology’ and ‘paralogy,’ were originally introduced in the early 1970s (Fitch, 1970) and later described as follows (Fitch, 2000):

“Orthology is that relationship where sequence divergence follows speciation, that is, where the common ancestor of the two genes lies in the cenancestor of the taxa from which the two sequences were obtained. This gives rise to a set of sequences whose true phylogeny is exactly the same as the true phylogeny of the organisms from which the sequences were obtained. Only orthologous sequences have this property. Paralogy is defined as that condition where sequence divergence follows gene duplication. Such genes might descend and diverge while existing side by side in the same lineage.”

In general, recognition of orthology and paralogy is not straightforward when the genomic evolution of a species in question has experienced a series of complicated events

(Fitch, 2000). In discussing early vertebrate evolution, the closest attention should be paid to this, because two rounds of genome duplications occurred, which resulted, for example, in four Hox clusters observed in non-teleost gnathostomes, such as mammals, chicken, Xenopus, and chondrichthyans (reviewed in Kuraku and Meyer, 2008). It has also been proposed that a large-scale duplication event occurred in the cyclostome lineage (see below).

Importantly, the above definitions of orthology and paralogy do not include any properties of gene function. For some cyclostome genes, changes in expression patterns are described as possible factors explaining the morphological differences between lampreys and gnathostomes (e.g., Shigetani et al., 2002; Uchida et al., 2003; Hammond and Whitfield, 2006). Apart from cyclostomes, many more studies highlight dynamic changes in gene expression patterns among orthologs during vertebrate evolution (e.g., Locascio et al., 2002; Kuraku et al., 2005). To conduct reasonable evolutionary studies, any comparative analysis regarding gene expression patterns and functions should follow the solid characterization of the phylogenetic nature of genes: orthology/paralogy should be clarified independently of any functional property of genes.

In this review, from the viewpoint of molecular evolution/phylogeny and genome informatics, current knowledge and perspectives are summarized to provide a better link between genomic properties and phenotypic evolution.

PHYLOGENY

Monophyly of cyclostomes

Although the taxonomic term Cyclostomata was first introduced in the early 19th century (Duméril, 1806), many subsequent studies on morphology regarded only hagfishes as the earliest branching group, taking lampreys as the true sister taxon of the gnathostomes (Janvier, 1996; see also Ota et al., 2007). However, the monophyly of cyclostomes was first supported by molecular phylogenetic analysis in the early 1990s (Stock and Whitt, 1992). Many molecular phylogenetic studies have since supported the monophyly of cyclostomes using ribosomal DNA (Mallatt and Sullivan, 1998; Mallatt and Winchell, 2007), mitochondrial DNA (Delarbre et al., 2002), and protein-coding genes in the nuclear genome (Kuraku et al., 1999; Takezaki et al., 2003; Blair and Hedges, 2005; Delsuc et al., 2006) (Fig. 1A). The monophyly of cyclostomes can be regarded as one of the most clear-cut examples in which molecular phylogenetics has succeeded in updating phylogenetic relationships based on nonmolecular traits (Meyer and Zardoya, 2003).

Time scale

In light of the robust support for cyclostome monophyly, one can estimate the divergence time between Myxiniformes and Petromyzontiformes (these two taxa form monophyletic groups). By incorporating some fossil records for extinct cyclostomes, the relaxed molecular clock analysis, which is currently frequently used to estimate divergence times, provides an estimate that this divergence occurred 520–430 million years ago (Hedges, 2001; Blair and Hedges, 2005; Kuraku and Kuratani, 2006; summarized in Kuraku et al., 2008a) (Fig. 1A). Estimates vary largely depending on the assumed date of the divergence between cyclostomes and gnathostomes. However, it has consistently been reported that the ancestor of the Myxiniformes and Petromyzontiformes diverged shortly (up to 100 million years) after the cyclostome and gnathostome lineages split (summarized in Kuraku et al., 2008a). In terms of the evolutionary time that has elapsed, it would not be surprising even if we were to identify differences between the genomes of hagfishes and lampreys that were as large as those observed in a comparison between the genomes of Mammalia and Chondrichthyes.

BASIC GENOMIC PROPERTIES

Karyotypes

Many cytogenetic observations of cyclostome genomes were described in the 1970s (Potter and Rothwell, 1970; Potter and Robinson, 1971; Robinson et al., 1975). In contrast to the relatively small number of chromosomes of hagfishes (2n=14–48), most lamprey species have more than 150 chromosomes (Fig. 1B) (original data were retrieved from the Animal Genome Size Database, http://www.genomesize.com). C-values (genome sizes) of hagfishes range from 2.29 to 4.59 pg, whereas those of lampreys range from 1.29 to 2.44 pg (Fig. 1B). Judging from the very small size of chromosomes and normal C-values in lampreys, the uniqueness of lamprey karyotypes is thought to have been caused mainly by successive Robertsonian fissions in the lamprey lineage (Robertson, 1916; Sumner, 2003).

Noncoding and repetitive landscape

Although the paucity of genomic sequences, especially in hagfishes, prevents the analysis of general genomic properties, there are some implications based on transcriptomic data. By analyzing the GC-content of four-fold degenerate sites (GC4) in protein-coding regions, it has been shown that lampreys (both northern- and southern-hemisphere species) have high levels of GC4 (70–90%), whereas hagfishes have moderate levels of GC4 (40–60%) (Kuraku and Kuratani, 2006; Kuraku et al., unpublished observations). Currently available genomic sequences of lampreys have revealed that the extraordinarily high GC-content in protein-coding regions, which is represented by GC4, is not a reflection of global genomic base composition (Table 1); rather, it is probably because of highly biased codon usage in lampreys.

Other evidence obtained in transcriptome analysis shows the existence of a short genomic element that might have spread throughout the lamprey genome (designated ‘lamprino’; GenBank accession number AB425244). This element was found in the 3’ untranslated region of the homeobox-containing transcription factor gene LjHox13α (Kuraku et al., 2008b). Similar sequences have been identified in many untranslated regions of lamprey ESTs (however, not in those of hagfish ESTs). The lamprey genome also contains a transcribed and translated sequence (found in AF464190) that has high similarity to Tc1-like transposase identified in salmonids (de Boer et al., 2007). It is highly likely that this gene was horizontally transferred from the host by parasitism. Moreover, it has been shown that the lamprey genome contains at least dozens of the microRNAs (miRNAs) already reported for other model vertebrates (Heimberg et al., 2008).
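
As an illustration of the GC4 measure discussed in this section, the following Python sketch counts G or C at the third position of four-fold degenerate codons in an in-frame coding sequence and contrasts the result with overall GC-content. It assumes the standard genetic code; the function names and the toy sequence are hypothetical and are not taken from the analyses cited here.

```python
# Codon families of the standard genetic code in which the third position is
# four-fold degenerate (any base encodes the same amino acid):
# Leu (CTN), Val (GTN), Ser (TCN), Pro (CCN), Thr (ACN), Ala (GCN),
# Arg (CGN), Gly (GGN).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc_content(seq):
    """Fraction of G or C among unambiguous bases (None if there are none)."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return None
    return (seq.count("G") + seq.count("C")) / total


def gc4(cds):
    """GC fraction at four-fold degenerate third codon positions.

    `cds` is assumed to be an in-frame coding sequence that starts at codon
    position 1; a trailing incomplete codon is ignored.
    """
    cds = cds.upper()
    gc = total = 0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            total += 1
            if codon[2] in "GC":
                gc += 1
    return gc / total if total else None


# Hypothetical toy sequence: ATG GCC GCG GCA CTG AAA TAA
toy = "ATGGCCGCGGCACTGAAATAA"
print(gc4(toy), gc_content(toy))
```

Keeping the two functions separate mirrors the comparison made in the text: GC4 is computed only from synonymous third positions of a coding sequence, whereas the values in Table 1 describe the base composition of whole genomic sequences.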

Fig. 1. Molecular-based phylogeny and genomic properties of cyclostomes. (A) Phylogenetic relationships of extant cyclostomes and estimated divergence times along the geological time scale. See Kuraku and Kuratani (2006) for details of divergence times within Myxiniformes and Petromyzontiformes. The divergence time between Myxiniformes and Petromyzontiformes (482 million years ago) is the average of values obtained from available reports (see Kuraku et al., 2008a for details). The phylogenetic relationships of southern hemisphere lampreys (Geotria and Mordacia) and their divergence times are based on unpublished observations by the author and collaborators. No molecular sequence data are available for reliable phylogenetic analyses of the hagfish genera Notomyxine, Nemamyxine, Neomyxine, and Quadratus, and the lamprey genera , Caspiomyzon, , and . (B) Karyotypes and C-values of cyclostomes. Data were retrieved from the Animal Genome Size Database (www.genomesize.com) and are shown only for species for which information was available. Tree topology and branch lengths are based on the phylogenetic relationships and time scale shown in A.
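
The karyotype argument made in the text (many chromosomes combined with ordinary C-values implies very small chromosomes) can be made concrete with a rough calculation. The Python sketch below converts a haploid C-value in picograms to megabases using the widely used approximation 1 pg ≈ 978 Mb and divides by the haploid chromosome number. The pairing of C-values with chromosome numbers is illustrative only: the inputs are range limits quoted in the text, not measurements for particular species, and the function name is hypothetical.

```python
MB_PER_PG = 978.0  # approximate conversion factor: 1 pg of DNA ~ 978 Mb


def mean_chromosome_size_mb(c_value_pg, diploid_number):
    """Average DNA content per chromosome, in Mb.

    c_value_pg     -- haploid genome size (C-value) in picograms
    diploid_number -- diploid chromosome number (2n)
    """
    if diploid_number <= 0:
        raise ValueError("diploid_number must be positive")
    haploid_genome_mb = c_value_pg * MB_PER_PG
    return haploid_genome_mb / (diploid_number / 2)


# Illustrative inputs taken from the ranges quoted in the text.
lamprey = mean_chromosome_size_mb(2.44, 150)  # largest C-value, 2n = 150
hagfish = mean_chromosome_size_mb(2.29, 48)   # smallest C-value, 2n = 48
print(round(lamprey, 1), round(hagfish, 1))
```

Because the lamprey input uses the largest quoted C-value and a chromosome number at the lower bound of "more than 150", the lamprey figure is an upper estimate, and it is still well below the hagfish value computed from the opposite extremes.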

PITFALLS IN CYCLOSTOME GENE PHYLOGENY

Orthology/paralogy

Because of insufficient gene identification in cyclostomes and possible secondary gene gains/losses in their lineage (discussed below), orthology/paralogy between multiple cyclostome genes should be treated with caution. When two or more genes of the same cyclostome species cluster together, it is highly likely that they are paralogous (Fig. 2A). When a pair of hagfish and lamprey genes forms a cluster, this is regarded as a reflection of species phylogeny representing cyclostome monophyly (Fig. 2B). However, it is also possible that they are paralogous if we assume that a gene duplication has occurred in the cyclostome lineage, because of a potential ‘hidden paralogy’ (see Gribaldo and Philippe, 2002 for this term) (Fig. 2C). In contrast, even when a hagfish gene seems to have diverged first because of a gene duplication, it is still probable that this topology is caused by the rapid evolutionary rate of the hagfish sequence and that this rapidly evolving hagfish gene is orthologous to the lamprey gene (Fig. 2D). Otherwise, this is considered a result of secondary independent gene losses in the lineages leading to lampreys and gnathostomes (Fig. 2E). As exemplified here, the molecular phylogenies of cyclostome genes involve various difficult issues. Even when we extend our scope to the genome-wide level,

gene orthology/paralogy should be recognized as an indispensable element with which to elucidate the history of cyclostome genome evolution (see Sicheritz-Ponten and Andersson, 2001; Huerta-Cepas et al., 2007 for attempts to practice genome-wide phylogenetic analysis, designated ‘phylome’).

Gene naming

Confusion in gene naming is a notorious artificial factor that can potentially prevent the reasonable integration of the phylogenetic nature of genes and their functional properties. It is highly recommended that newly identified genes be designated based on careful phylogenetic diagnosis. This procedure is achieved by a preliminary step involving homology search tools, such as BLAST (Altschul et al., 1997) (note that BLAST detects only ‘similarity,’ not ‘homology’) and subsequent elaborate phylogenetic analysis. For cyclostome genes, which often have elevated evolutionary rates, it is strongly recommended to employ molecular phylogenetic methods that are less susceptible to biases in among-lineage and also among-site rate variation (Felsenstein, 1978; Felsenstein, 1996; Philippe et al., 2005; summarized in Whelan et al., 2001; Holder and Lewis, 2003; Felsenstein, 2004). By taking into account the ambiguous signals of orthology between cyclostomes and gnathostomes that are frequently observed and the possibility of cyclostome lineage-specific gene duplications (discussed below), it is also recommended that cyclostome genes be named with care so that there is no confusion in understanding the phylogeny of these genes (e.g., the bone morphogenetic protein [BMP] genes BMP2/4-A, BMP2/4-B, and BMP2/4-C of lampreys in the BMP2/4 subfamily of the TGFβ gene family [McCauley and Bronner-Fraser, 2004]).

Fig. 2. Possible examples of orthology/paralogy between cyclostome genes. (A) Probable paralogy between two lamprey genes (‘cyclog’; see text). It is also possible that this topology is an artifact produced by long-branch attraction if the two genes that duplicated before the cyclostome-gnathostome split evolved rapidly. (B) Probable orthology between hagfish and lamprey genes. (C) Hidden paralogy between hagfish and lamprey genes. If we assume that a gene duplication occurred in the cyclostome lineage in B, the hagfish-lamprey cluster can be regarded as a pair of paralogous genes. (D) Artificial paralogy between hagfish and lamprey genes. This can be caused by long-branch attraction (LBA) due to an elevated evolutionary rate of the hagfish sequence, even if these hagfish and lamprey genes are orthologous to each other. (E) Another explanation for early branching of cyclostome genes. The topology is similar to that of D; however, the topology in E is due to the loss of lamprey and gnathostome genes.

Table 1. GC-content of reported cyclostome genomic sequences over 10 kb in length.

Species  Annotation                  Acc. No.   Length (kb)  GC (%)
Hagfishes
Mg       ParaHox region, BAC90C8     EU122193   145.0        46
Eb       ParaHox region, BAC7-H10    EU122194   103.4        44
Eb       VLR-A gene                  AY965678    43.4        42
Eb       VLR-B gene                  AY965679    92.0        42
Es       VLR-A gene                  AY965680    81.7        42
Es       VLR-B gene                  AY965681    76.8        45
Lampreys
Pm       VLR gene                    AY577941    57.3        46
Pm       VLR gene                    AY577942    58.2        47
Pm       ABCB9-like gene             AH012171    14.5        46
Pm       HoxW10a gene                AF464190    29.6        51
Pm       CD45 gene                   DQ008073    37.6        43
Pm       UNK clone CH303-4_3B5       AC182744    55.2        55
Pm       UNK clone CH303-4_3A4       AC182746    36.7        47
Pm       UNK clone CH303-4_3B4       AC182747    60.2        46
Pm       UNK clone CH303-4_3B11      AC182729    77.9        42
Pm       UNK clone CH303-4_3A12      AC182728   113.9        45
Pm       UNK clone CH303-4_3C11      AC182745   137.7        41
Pm       UNK clone CH303-4_3D5       AC182743   142.0        44
Pm       UNK clone CH303-4_3C12      AC182742   148.5        38
Pm       UNK clone CH303-4_3D9       AC182725   151.7        45
Pm       UNK clone CH303-4_3D4       AC182727   157.5        46
Pm       UNK clone CH303-4_3D2       AC182726   179.5        46
Lj       VLR gene 5’ LRRCT segment   AB275449    10.6        50
Lj       VLR-B gene                  AB272084    12.9        44
Lj       VLR-A gene                  AB272083    40.2        45
Lj       MASP gene                   AB078894    11.1        52

Abbreviations for species name: Mg, Myxine glutinosa; Eb, Eptatretus burgeri; Es, Eptatretus stoutii; Pm, Petromyzon marinus; Lj, Lethenteron japonicum.

PHYLOME

Cyclostome lineage-specific gen(om)e duplication(s)

In the process of tree building using molecular sequences, cyclostome genes, which often have elevated evolutionary rates, potentially tend to form clusters because of a technical artifact called long branch attraction (LBA) (Felsenstein, 1978; Philippe et al., 2005). The idea of cyclostome lineage-specific genome duplication(s) has been mainly based on analyses of Hox genes, which suggests that at least one of the multiple Hox gene clusters in cyclostomes is derived from a cyclostome lineage-specific cluster duplication event that the gnathostome lineage did not experience (Force et al., 2002; Irvine et al., 2002; Fried et al., 2003; Stadler et al., 2004). This is because we often observe a flock of cyclostome genes in molecular phylogenetic trees (designated here as ‘cyclogs’), in addition to a flock of multiple gnathostome paralogs. Possible cyclostome lineage-specific gene duplications have also been observed in other homeobox-containing genes, such as Dlx and

Engrailed (Neidert et al., 2001; Matsuura et al., 2008). Homeobox-containing genes, however, do not provide a robust conclusion in molecular phylogeny because of the short length of their alignable regions. In this sense, other gene families that can potentially provide more robust phylogeny will be better references. For example, gene duplications specific to the cyclostome lineages are also observed in the BMP2/4 gene subfamily (McCauley and Bronner-Fraser, 2004). The ongoing sea lamprey genome sequencing project will give rise to more conclusive data on whether these possible cyclostome lineage-specific gene duplications are technical artifacts, and if not, whether they were caused by a genome-wide duplication event. The sea lamprey genome sequencing project should also confirm whether the transposable elements mentioned above largely contributed to the expansion of ‘cyclogs’.

Importantly, even if there was at least one whole genome duplication (WGD) in the cyclostome lineage, it is most likely that the evolutionary path leading to gnathostomes independently experienced two-round (2R) genome duplications. This has been repeatedly confirmed by the existence of multiple (usually four, as in Hox gene clusters) similar arrays of genes along chromosomes in humans and mice (reviewed in Kasahara, 2007). It should be emphasized that multiple gene copies observed in gnathostomes cannot be imputed to cyclostome lineage-specific events, and thus cyclostome lineage-specific gen(om)e duplications should be treated separately from the WGDs that resulted in redundant gene repertoires in gnathostomes.

Timing of WGDs: pre-2R or post-2R?

It remains controversial whether the mode of genome duplication in the cyclostome-gnathostome split was a simple ‘2R’ pattern (Furlong and Holland, 2002; see also Hughes, 1999; Friedman and Hughes, 2001; Hughes and Friedman, 2003 for a negative view about the WGDs per se; see also Gregory, 2005 for alternative modes of large-scale duplication). However, under the assumption that there were 2R WGDs, as has been more recently reported (Dehal and Boore, 2005), there are three possible scenarios for explaining the timing of WGDs relative to the cyclostome-gnathostome split (Fig. 3). Hypothesis A has not been explicitly proposed to date, whereas there are some reports supporting hypotheses B (Pendleton et al., 1993; Sharman and Holland, 1998; Escriva et al., 2002; Force et al., 2002; Stadler et al., 2004) and C (Fried et al., 2003; Furlong et al., 2007). Because of the high level of inconsistency in gene phylogeny between gene families, there is no consensus on which of these three scenarios is correct (reviewed in Kuraku and Meyer, 2008). This difficulty is thought to have been caused by a combination of possible complicating factors, as listed below.

• The WGDs, plus the cyclostome-gnathostome divergence, occurred within a short period of time (Larhammar et al., 2002; Horton et al., 2003).
• A certain amount of absolute time (more than 500 million years) has elapsed since the WGDs. This is relatively ancient compared with other well-studied WGD events (e.g., those in the lineages of fungi, land plants, teleosts, and Xenopus; reviewed in Wolfe, 2001).
• Many cyclostome genes studied to date are not ideal genes for molecular phylogenetic analysis (short [e.g., homeobox genes], rapidly evolving, or highly susceptible to selection or small-scale gene duplications).
• Cyclostome genes (especially hagfish genes) are often rapidly evolving and thus can potentially prevent a robust molecular phylogenetic analysis.

Fig. 3. Possible scenarios for two-round whole genome duplications (WGDs). Under the assumption that there were two rounds of WGDs, three possible scenarios with different timings of WGDs (arrows) are shown (hypotheses A–C). Expected tree topologies for gene phylogeny are illustrated for an imaginary gene family comprising one invertebrate outgroup (Inv.), four gnathostome (Gna.) genes (paralogs a–d), and an intact set of cyclostome (Cyc.) genes. Gene duplications are represented by white diamonds. The cyclostome-gnathostome divergence is represented by white circles. At the bottom, possible artificial biases are shown with the direction of their misleading conclusion. Cyclostome lineage-specific gene duplications are not shown in this figure (see text).
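
To make the expected topologies concrete, the following Python sketch encodes one idealized gene tree per scenario as nested tuples and labels each internal node as a speciation or a duplication with a simple clade-overlap rule (a node is treated as a duplication when its two subtrees share a clade label). Several points are assumptions rather than content of Fig. 3: each clade is represented by a single species, no genes have been lost, hypothesis B is taken to place one WGD before and one after the cyclostome-gnathostome split, hypothesis C is taken to place both after it, and all names (`TREES`, `clades`, `count_duplications`) are hypothetical.

```python
# Leaves are "Clade_gene" strings; internal nodes are 2-tuples (left, right).
TREES = {
    "A": ("Inv_1", ((("Gna_a", "Cyc_a"), ("Gna_b", "Cyc_b")),
                    (("Gna_c", "Cyc_c"), ("Gna_d", "Cyc_d")))),
    "B": ("Inv_1", ((("Gna_a", "Gna_b"), "Cyc_1"),
                    (("Gna_c", "Gna_d"), "Cyc_2"))),
    "C": ("Inv_1", ("Cyc_1", (("Gna_a", "Gna_b"), ("Gna_c", "Gna_d")))),
}


def clades(node):
    """Set of clade labels (Inv, Gna, Cyc) found at or below a node."""
    if isinstance(node, str):
        return {node.split("_")[0]}
    left, right = node
    return clades(left) | clades(right)


def count_duplications(node, counts=None):
    """Count duplication nodes by timing relative to the Cyc/Gna split."""
    if counts is None:
        counts = {"pre_split": 0, "gnathostome_only": 0, "cyclostome_only": 0}
    if isinstance(node, str):
        return counts
    left, right = node
    if clades(left) & clades(right):  # shared clade label -> duplication
        below = clades(node)
        if {"Gna", "Cyc"} <= below:
            counts["pre_split"] += 1
        elif "Gna" in below:
            counts["gnathostome_only"] += 1
        elif "Cyc" in below:
            counts["cyclostome_only"] += 1
    count_duplications(left, counts)
    count_duplications(right, counts)
    return counts


for name, tree in TREES.items():
    print(name, count_duplications(tree))
```

Two rounds of duplication of a single ancestral gene correspond to three duplication nodes in a four-paralog tree (one for the first round, two for the second), so the three scenarios differ in how those three nodes are distributed around the split. A pair of cyclostome genes that cluster together, as in the ‘cyclog’ case, would be counted under `cyclostome_only`. The rule cannot detect hidden paralogy or long-branch attraction artifacts, which is why the text calls for statistical evaluation of alternative topologies.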

Fig. 4. Previously reported cyclostome gene phylogenies and deduced evolutionary scenarios. (A) HMG-box containing SoxE genes (Zhang and Cohn, 2006; Zhang et al., 2006). (B) Clade A fibril collagen genes (Iwabe and Miyata, 2002). (C) Kinesin KIF1 A/B/C genes. Gene duplications that gave rise to gnathostome paralogs are represented by white diamonds. The cyclostome-gnathostome divergence is represented by a white circle. A gene duplication thought to have occurred in the cyclostome lineage is represented by a black diamond. A gene duplication whose timing cannot be determined by the tree topology is represented by a gray diamond. The timing of gene duplications observed in the upper trees is indicated by arrows in the species phylogeny at the bottom. Gnathostome genes are represented by the names of paralogs, and cyclostome genes are shown as combinations of abbreviations of species names and gene names. Abbreviations of species names: Lj, Lethenteron japonicum; Pm, Petromyzon marinus; Mg, Myxine glutinosa; Eb, Eptatretus burgeri; Lr, Lethenteron reissneri. In C, chromosomal localizations of human genes are also shown. For B, the phylogeny of cyclostome genes is based on McCauley and Bronner-Fraser (2006) (see also Zhang et al., 2006).

• Gene sampling in gnathostomes usually concentrates on bony vertebrates, and thus the earliest branching group in extant gnathostomes, namely, Chondrichthyes, is not included. Therefore, genes in basal jawed vertebrates are usually represented by Xenopus laevis and teleost fishes, which underwent an additional genome duplication in each lineage (Wolfe, 2001).
• Many cyclostome genes remain to be identified.
• Species sampling for cyclostomes is not thorough (usually only one species is included in an analysis).
• Intensive phylogenetic analysis (model selection [e.g., consideration of among-site rate heterogeneity], probabilistic tree search methods [e.g., maximum-likelihood, Bayesian methods], and statistical evaluation of alternative tree topologies obtained; see Holder and Lewis, 2003) has not yet been thoroughly applied to phylogenetic studies of cyclostome genes.
• The results of different molecular phylogenetic analyses are evaluated by different criteria. In many cases, only the best tree is analyzed, without any statistical examination.

Fig. 5. Schematic illustration of the pan-vertebrate tetraploidization (PV4) hypothesis. A generalized gene tree of an imaginary gene family (lower) is shown with a species tree on which hypothesized events in the evolution of gene repertoires are mapped (upper). Gene duplications that gave rise to gnathostome paralogs are represented by white diamonds. The cyclostome-gnathostome split (white circles) is thought to have occurred before the duplicates generated in WGDs became fully neo- or subfunctionalized. Subsequently, possibly because of less functional constraint, cyclostome paralogs were secondarily duplicated to produce ‘cyclogs’ (represented by a black diamond), lost (X), or neofunctionalized (an elongated branch). These kinds of secondary modifications also occurred in the gnathostome lineage, but the secondary modifications in the cyclostome lineage are often neglected because of excessive expectation that cyclostomes should have retained a genomic architecture ancestral for vertebrates.

To optimize molecular phylogenetic analysis for addressing this issue, one is expected to select appropriate gene families with unbiased taxon sampling. Apart from large-scale duplication events, such as WGDs, rampant small-scale gene duplications, such as tandem gene duplications and retropositions, would also have occurred. These events occur in a gene family-specific manner. For example, the globin gene family has undergone a series of tandem gene duplications during vertebrate evolution (Goodman et al., 1975), and glycoprotein hormones (e.g., gonadotropin) have also undergone gene family-specific, small-scale gene

duplications (Sower et al., 2006). Gene families that have undergone small-scale gene duplications should be treated with caution when assessing the timing of genome-wide duplication events.

For estimating the timing of WGDs, gene numbers in cyclostomes have been repeatedly used as markers. For example, a lower number of Dlx genes in lampreys than in gnathostomes was regarded as evidence that lampreys diverged before at least one round of WGDs (Neidert et al., 2001; Donoghue and Purnell, 2005). However, lampreys have at least six Dlx genes, as do non-teleost gnathostomes (Kuraku et al., 2008c). The many cases in which a lower number of genes has been identified in cyclostomes compared with gnathostomes are expected to be updated by the identification of many more genes in whole genome sequencing. We should also take into account the possibility of secondary gene losses in the cyclostome lineage (see Furlong et al., 2007 for an example of the hagfish ParaHox gene cluster). Even in gnathostomes, after the quadruplication of a single ancestral gene, many gene families have lost at least one paralog produced in 2R WGDs. The most frequently observed ratio of gene numbers per gene family, which is derived from a single ancestral gene in the chordate ancestor, between invertebrates and non-teleost gnathostomes is 1:2 (Furlong and Holland, 2002). For example, most subfamilies in the nuclear receptor gene families have a ratio of 1:3 (e.g., retinoic acid receptor [RAR]α, RARβ, and RARγ), whereas most subfamilies in the Wnt gene family have a ratio of 1:2 (e.g., Wnt7a and Wnt7b; interestingly, there are no subfamilies with the ratios of 1:3 or 1:4 in this gene family). Nevertheless, genome-wide identification of similar arrays of genes along multiple (usually up to four) chromosomal regions (Dehal and Boore, 2005) and detailed analysis of conserved landmark gene clusters have led to the idea of WGDs and the ratio of 1:4 to parsimoniously explain the difference in gene numbers between invertebrates and non-teleost gnathostomes (reviewed in Kasahara, 2007). To obtain a clear image of genomic status in cyclostomes, this line of evidence on conserved syntenies to analyze ‘how’ intra-genome redundancy was introduced should be accompanied by large-scale, elaborate molecular phylogenetic analyses to clarify ‘when’ such intra-genome redundancy was created. A lower number of gene repertoires in cyclostomes does not necessarily mean that they diverged before the genome expansion: a tree-based approach is indispensable.

Molecular phylogenetic studies have also been performed for a number of non-homeobox gene families. Based on observations in recent studies (Fig. 4), hypothesis A (Fig. 3) is proposed here as an alternative scenario and is designated as the ‘pan-vertebrate tetraploidization (PV4)’ hypothesis (Fig. 5). For all three gene families shown in Fig. 4, gene duplications were observed more frequently before the cyclostome-gnathostome split than after it, suggesting hypothesis A as a scenario for explaining the timing of WGDs relative to the cyclostome-gnathostome split (see Table 2). These gene families have longer alignment lengths and exhibit a greater level of sequence divergence among orthologs and paralogs than most gene families on which hypotheses B and C are based (see Fig. 3). It is likely that the hypotheses suggested to date (hypotheses B and C) might have been produced by the technically difficult factors mentioned above. This could be confirmed with elaborate molecular phylogenetic analyses using larger data sets.

Table 2. Molecular phylogenetic properties of the gene families shown in Fig. 4.

Gene families            Property               # of gnathostome  # of cyclostome paralogs  Supported          Cyclostome lineage-specific
                                                paralogs          identified so far*        hypothesis (A–C)   gene duplications observed?
SoxE (8/9/10)            HMG-box containing TF  3                 3                         A                  No
Clade A fibril collagen  collagen               5                 2                         A or B             Yes
KIF 1 A/B/C              kinesin                3                 2                         A                  No

* This count is based on a parsimonious estimate, with an assumption of cyclostome monophyly. TF, transcription factor.

SUMMARY

When we take into account the difficult factors mentioned above regarding (1) the absolute and relative timing of evolutionary events, (2) technical limitations, and (3) insufficient data sets, cyclostome genomes can be recognized as very challenging targets for evolutionary genomics. Whether cyclostomes diverged from the future gnathostome lineage before or after WGDs, cyclostome genomes seem to show relatively degenerate gene repertoires with unique features in some respects (e.g., high GC-content in protein-coding regions and high chromosome number in lampreys). It would not be surprising, however, if cyclostomes have experienced a large amount of secondary modification to their genomes, or if large differences in genomic content are discovered between cyclostomes and gnathostomes, which diverged more than 500 million years ago, and also between hagfishes and lampreys, which diverged more than 400 million years ago. In contrast, stable genomes of chondrichthyans have been estimated by analyses of partial genomic sequences of the ghost shark, Callorhinchus milii (Venkatesh et al., 2007; Yu et al., 2008; see also Kuraku and Meyer [2008] for the possible retention of the ancestral structure of the Hox gene clusters of gnathostomes by the horn shark, Heterodontus francisci). Under the assumption that the post-WGD state is a genomic synapomorphy of all extant vertebrates (hypothesis A in Fig. 3; Fig. 5), this contrast might reflect differences between the taxon that diverged before genes produced in WGDs became fully sub- or neo-functionalized (namely, Cyclostomata) and the taxon that diverged after many of them acquired the functions that are shared by diverse living gnathostomes (namely, Chondrichthyes). The differences between these two taxa in levels of constraints acting on genes might have led to the difference in the frequency of gene retentions and duplicability. Thus, determining orthology/paralogy based on solid molecular phylogenetic evidence is a prerequisite for linking the evolution of gene repertoires with the evolution of gene functions and phenotypes. Currently, a genome sequencing project for the sea lamprey is underway. In terms of deep divergence time within cyclostomes, it will be interesting to compare this genome with the hagfish genome. Without their genomes, we cannot discuss what defines vertebrate genomes.

ACKNOWLEDGMENTS

I thank Shigeru Kuratani, Kinya G. Ota, Yoko Takio, Rie Kusakabe, and Axel Meyer for valuable discussions. My gratitude extends to Falk Hildebrand, who provided a lot of insight through his computational assistance in the P. marinus genome sequence analysis.

REFERENCES

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402
Blair JE, Hedges SB (2005) Molecular phylogeny and divergence times of deuterostome animals. Mol Biol Evol 22: 2275–2284
de Boer JG, Yazawa R, Davidson WS, Koop BF (2007) Bursts and horizontal evolution of DNA transposons in the speciation of pseudotetraploid salmonids. BMC Genomics 8: 422
Dehal P, Boore JL (2005) Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol 3: e314
Delarbre C, Gallut C, Barriel V, Janvier P, Gachelin G (2002) Complete mitochondrial DNA of the hagfish, Eptatretus burgeri: the comparative analysis of mitochondrial DNA sequences strongly supports the cyclostome monophyly. Mol Phylogenet Evol 22: 184–192
Delsuc F, Brinkmann H, Chourrout D, Philippe H (2006) Tunicates and not cephalochordates are the closest living relatives of vertebrates. Nature 439: 965–968
Donoghue PC, Purnell MA (2005) Genome duplication, extinction and vertebrate evolution. Trends Ecol Evol 20: 312–319
Goodman M, Moore GW, Matsuda G (1975) Darwinian evolution in the genealogy of hemoglobin. Nature 253: 603–608
Gregory TR (2005) The Evolution of the Genome. Elsevier, San Diego
Gribaldo S, Philippe H (2002) Ancient phylogenetic relationships. Theor Popul Biol 61: 391–408
Hammond KL, Whitfield TT (2006) The developing lamprey ear closely resembles the zebrafish otic vesicle: otx1 expression can account for all major patterning differences. Development 133: 1347–1357
Hedges SB (2001) Molecular evidence for the early history of living vertebrates. In “Major Events in Early Vertebrate Evolution” Ed by PE Ahlberg, Taylor & Francis, London, pp 119–134
Heimberg AM, Sempere LF, Moy VN, Donoghue PC, Peterson KJ (2008) MicroRNAs and the advent of vertebrate morphological complexity. Proc Natl Acad Sci USA 105: 2946–2950
Holder M, Lewis PO (2003) Phylogeny estimation: traditional and Bayesian approaches. Nat Rev Genet 4: 275–284
Horton AC, Mahadevan NR, Ruvinsky I, Gibson-Brown JJ (2003) Phylogenetic analyses alone are insufficient to determine whether genome duplication(s) occurred during early vertebrate evolution. J Exp Zoolog B Mol Dev Evol 299: 41–53
Huerta-Cepas J, Dopazo H, Dopazo J, Gabaldon T (2007) The human phylome. Genome Biol 8: R109
Hughes AL (1999) Phylogenies of developmentally important proteins do not support the hypothesis of two rounds of genome duplication early in vertebrate history. J Mol Evol 48: 565–576
Hughes AL, Friedman R (2003) 2R or not 2R: testing hypotheses of genome duplication in early vertebrates. J Struct Funct Genomics 3: 85–93
Irvine SQ, Carr JL, Bailey WJ, Kawasaki K, Shimizu N, Amemiya CT, Ruddle FH (2002) Genomic analysis of Hox clusters in the sea lamprey Petromyzon marinus. J Exp Zool 294: 47–62
Iwabe N, Miyata T (2002) Kinesin-related genes from diplomonad, sponge, amphioxus, and cyclostomes: divergence pattern of kinesin family and evolution of giardial membrane-bounded organella.
Mol Biol Evol 19: 1524–1533 Duméril AMC (1806) Zoologie Analytique, ou Méthode Naturelle de Janvier P (1996) Early Vertebrates. Clarendon Press, Oxford Classification des Animaux, Rendue plus Facile a l’Aide de Kasahara M (2007) The 2R hypothesis: an update. Curr Opin Immu- Tableaux Synoptiques par. Allais, Paris nol 19: 547–552 Escriva H, Manzon L, Youson J, Laudet V (2002) Analysis of lam- Kuraku S, Kuratani S (2006) Time scale for cyclostome evolution prey and hagfish genes reveals a complex history of gene dupli- inferred with a phylogenetic diagnosis of hagfish and lamprey cations during early vertebrate evolution. Mol Biol Evol 19: cDNA sequences. Zool Sci 23: 1053–1064 1440–1450 Kuraku S, Meyer A (2008) The evolution and maintenance of Hox Felsenstein J (1978) Cases in which parsimony or compatibility gene clusters in vertebrates and the teleost-specific genome methods will be positively misleading. Syst Zool 27: 401–410 duplication. Int J Dev Biol: in press Felsenstein J (1996) Inferring phylogenies from protein sequences Kuraku S, Hoshiyama D, Katoh K, Suga H, Miyata T (1999) Mono- by parsimony, distance, and likelihood methods. Methods phyly of lampreys and hagfishes supported by nuclear DNA- Enzymol 266, 418–427 coded genes. J Mol Evol 49: 729–735 Felsenstein J (2004) Inferring Phylogenies. Sinauer Associates, Kuratani S, Kuraku S, Murakami Y (2002) Lamprey as an evo-devo Sunderland, MA model: lessons from comparative embryology and molecular Fitch WM (1970) Distinguishing homologous from analogous pro- phylogenetics. Genesis 34: 175–183 teins. Syst Zool 19: 99–113 Kuraku S, Usuda R, Kuratani S (2005) Comprehensive survey of Fitch WM (2000) Homology: a personal view on some of the prob- carapacial ridge-specific genes in turtle implies co-option of lems. Trends Genet 16: 227–231 some regulatory genes in carapace evolution. 
Evol Dev 7: 3–17 Force A, Amores A, Postlethwait JH (2002) Hox cluster organization Kuraku S, Ota KG, Kuratani S (2008a) Cyclostomata. In “Timetree in the jawless vertebrate Petromyzon marinus. J Exp Zool 294: of Life” Ed by SB Hedges, S Kumar, Oxford University Press, 30–46 New York, in press Fried C, Prohaska SJ, Stadler PF (2003) Independent Hox-cluster Kuraku S, Takio Y, Tamura K, Aono H, Meyer A, Kuratani S (2008b) duplications in lampreys. J Exp Zool B 299: 18–25 Noncanonical roles of Hox14 revealed by its expression pat- Friedman R, Hughes AL (2001) Pattern and timing of gene duplica- terns in lamprey and shark. Proc Natl Acad Sci USA 105: 6679– tion in animal genomes. Genome Res 11: 1842–1847 6683 Furlong RF, Holland PW (2002) Were vertebrates octoploid? Philos Kuraku S, Meyer A, Kuratani S (2008c) Timing of genome duplica- Trans R Soc Lond B 357: 531–544 tions relative to the origin of the vertebrates: did cyclostomes Furlong RF, Younger R, Kasahara M, Reinhardt R, Thorndyke M, diverge before, or after? Mol Biol Evol (doi:10.1093/molbev/ Holland PW (2007) A degenerate ParaHox gene cluster in a msn222) degenerate vertebrate. Mol Biol Evol 24: 2681–2686 Lamb TD, Collin SP, Pugh EN Jr (2007) Evolution of the vertebrate 968

eye: opsins, photoreceptors, retina and eye cup. Nat Rev Neu- Sharman AC, Holland PW (1998) Estimation of Hox gene cluster rosci 8: 960–976 number in lampreys. Int J Dev Biol 42: 617–620 Larhammar D, Lundin LG, Hallbook F (2002) The human Hox-bear- Shigetani Y, Sugahara F, Kawakami Y, Murakami Y, Hirano S, ing chromosome regions did arise by block or chromosome (or Kuratani S (2002) Heterotopic shift of epithelial–mesenchymal even genome) duplications. Genome Res 12: 1910–1920 interactions in vertebrate jaw evolution. Science 296: 1316– Locascio A, Manzanares M, Blanco MJ, Nieto MA (2002) Modularity 1319 and reshuffling of Snail and Slug expression during vertebrate Sicheritz-Ponten T, Andersson SG (2001) A phylogenomic evolution. Proc Natl Acad Sci USA 99: 16841–16846 approach to microbial evolution. Nucleic Acids Res 29: 545– Mallatt J, Sullivan J (1998) 28S and 18S rDNA sequences support 552 the monophyly of lampreys and hagfishes. Mol Biol Evol 15: Sower SA, Moriyama S, Kasahara M, Takahashi A, Nozaki M, 1706–1718 Uchida K, Dahlstrom JM, Kawauchi H (2006) Identification of Mallatt J, Winchell CJ (2007) Ribosomal RNA genes and deuteros- sea lamprey GTHbeta-like cDNA and its evolutionary implica- tome phylogeny revisited: more cyclostomes, elasmobranchs, tions. Gen Comp Endocrinol 148: 22–32 reptiles, and a brittle star. Mol Phylogenet Evol 43: 1005–1022 Stadler PF, Fried C, Prohaska SJ, Bailey WJ, Misof BY, Ruddle FH, Matsuura M, Nishihara H, Onimaru K, Kokubo N, Kuraku S, Kusakabe Wagner GP (2004) Evidence for independent Hox gene dupli- R, Okada N, Kuratani S, Tanaka M (2008) Identification of four cations in the hagfish lineage: a PCR-based gene inventory of Engrailed genes in the Japanese lamprey, Lethenteron Eptatretus stoutii. Mol Phylogenet Evol 32: 686–694 japonicum. 
Dev Dyn 273: 1581–1589 Stock DW, Whitt GS (1992) Evidence from 18S ribosomal RNA McCauley DW, Bronner-Fraser M (2004) Conservation and diver- sequences that lampreys and hagfishes form a natural group. gence of BMP2/4 genes in the lamprey: expression and phylo- Science 257: 787–789 genetic analysis suggest a single ancestral vertebrate gene. Sumner AT (2003) Chromosomes: organization and function. Black- Evol Dev 6: 411–422 well, Malden, MA Meyer A, Zardoya R (2003) Recent advances in the (molecular) Takezaki N, Figueroa F, Zaleska-Rutczynska Z, Klein J (2003) phylogeny of vertebrates. Annu Rev Ecol Evol Syst 34: 311– Molecular phylogeny of early vertebrates: monophyly of the 338 agnathans as revealed by sequences of 35 genes. Mol Biol Neidert AH, Virupannavar V, Hooker GW, Langeland JA (2001) Evol 20: 287–292 Lamprey Dlx genes and early vertebrate evolution. Proc Natl Uchida K, Murakami Y, Kuraku S, Hirano S, Kuratani S (2003) Acad Sci USA 98: 1665–1670 Development of the adenohypophysis in the lamprey: evolution Osorio J, Retaux S (2008) The lamprey in evolutionary studies. Dev of epigenetic patterning programs in organogenesis. J Exp Genes Evol. 218: 221–235 Zoolog B 300: 32–47 Ota KG, Kuraku S, Kuratani S (2007) Hagfish embryology with ref- Venkatesh B, Kirkness EF, Loh YH, Halpern AL, Lee AP, et al. erence to the evolution of the neural crest. Nature 446: 672– (2007) Survey sequencing and comparative analysis of the ele- 675 phant shark (Callorhinchus milii) genome. PLoS Biol 5: e101 Pendleton JW, Nagai BK, Murtha MT, Ruddle FH (1993) Expansion Whelan S, Lio P, Goldman N (2001) Molecular phylogenetics: state- of the Hox gene family and the evolution of . Proc Natl of-the-art methods for looking into the past. Trends Genet 17: Acad Sci USA 90: 6300–6304 262–272 Philippe H, Zhou Y, Brinkmann H, Rodrigue N, Delsuc F (2005) Het- Wolfe KH (2001) Yesterday’s polyploids and the mystery of dip- erotachy and long branch attraction in phylogenetics. 
BMC Evol loidization. Nat Rev Genet 2: 333–341 Biol 5: 50 Yu W, Rajasegaran V, Yew K, Loh W, Tay B, Amemiya C, Brenner Potter IC, Robinson ES (1971) The chromosomes. In “The Biology S, Venkatesh B (2008) Elephant shark sequence reveals of Lampreys” Ed by MW Hardisty, IC Potter, Academic Press, unique insights into the evolutionary history of vertebrate London, pp 279–293 genomes: a comparative analysis of the protocadherin cluster. Potter IC, Rothwell B (1970) The mitotic chromosomes of the lam- Proc Natl Acad Sci USA 105: 3819–3824 prey, Petromyzon marinus L. Experientia 26: 429–430 Zhang G, Cohn MJ (2006) Hagfish and fibrillar Robertson WMRB (1916) Chromosome studies. I. Taxonomic rela- reveal that type II collagen-based cartilage evolved in stem ver- tionships shown in the chromosomes of Tettegidae and Acr- tebrates. Proc Natl Acad Sci U S A 103: 16829–16833 ididiae: V-shaped chromosomes and their significance in Zhang G, Miyamoto MM, Cohn MJ (2006) Lamprey type II collagen Acrididiae, Locustidae and Grillidae: chromosomes and varia- and Sox9 reveal an ancient origin of the vertebrate collagenous tion. J Morphol 27: 179–331 skeleton. Proc Natl Acad Sci USA 103: 3180–3185 Robinson ES, Potter IC, Atkin NB (1975) The nuclear DNA content of lampreys. Experientia 31: 912–913