
The Mitochondrial Genome of Spotted Green Pufferfish Tetraodon Nigroviridis


Genes Genet. Syst. (2006) 81, p. 29–39

The mitochondrial genome of spotted green pufferfish Tetraodon nigroviridis (Teleostei: Tetraodontiformes) and divergence time estimation among model organisms in fishes

Yusuke Yamanoue1*, Masaki Miya2, Jun G. Inoue1†, Keiichi Matsuura3, and Mutsumi Nishida1

1Ocean Research Institute, University of Tokyo, 1-15-1 Minamidai, Nakano-ku, Tokyo 164-8639, Japan
2Department of Zoology, Natural History Museum & Institute, Chiba, 955-2 Aoba-cho, Chuo-ku, Chiba 260-8682, Japan
3Department of Zoology, National Science Museum, 3-23-1 Hyakunin-cho, Shinjuku-ku, Tokyo 169-0073, Japan

(Received 12 November 2005, accepted 19 December 2005)

We determined the whole mitochondrial genome sequence for spotted green pufferfish, Tetraodon nigroviridis (Teleostei: Tetraodontiformes). The genome (16,488 bp) contained 37 genes (two ribosomal RNA genes, 22 transfer RNA genes, and 13 protein-coding genes) plus the control region, as found in other vertebrates, with the gene order identical to that of typical vertebrates. The sequence was used to estimate phylogenetic relationships and divergence times among major lineages of fishes, including representative model organisms in fishes. We employed partitioned Bayesian approaches for these two analyses using two datasets that comprised concatenated amino acid sequences from 12 protein-coding genes (excluding the ND6 gene) and concatenated nucleotide sequences from the 12 protein-coding genes (without 3rd codon positions), 22 transfer RNA genes, and two ribosomal RNA genes. The resultant trees from the two datasets were well resolved and largely congruent with those from previous studies, with spotted green pufferfish being placed in a reasonable phylogenetic position. The approximate divergence times between spotted green pufferfish and model organisms in fishes were 85 million years ago (MYA) vs. torafugu, 183 MYA vs. three-spined stickleback, 191 MYA vs. medaka, and 324 MYA vs. zebrafish, all of which were about twice as old as the divergence times estimated from their earliest occurrences in the fossil record.

Key words: mitochondrial genome, Takifugu rubripes, fugu, partitioned Bayesian analysis

Edited by Norihiro Okada
* Corresponding author. E-mail: [email protected]
† Present address: School of Computational Science, Florida State University, Tallahassee, FL 32306-4120, USA

INTRODUCTION

The human genome sequence was generated by the Human Genome Project (Venter et al., 2001), and using it we can identify and characterize human genes. Most genetic information that governs how humans develop and function is encoded in their genome sequence. Comparing the genomes of different organisms will guide future approaches to understanding gene function, regulation, and evolution. Recently, the genomes of several fishes have been sequenced and compared with the human and other vertebrates' genomes. Torafugu (Takifugu rubripes) became the second vertebrate organism to be sequenced to draft quality prior to tetrapods (Aparicio et al., 2002). Other genome projects for several species of ray-finned fishes, such as spotted green pufferfish (Tetraodon nigroviridis), zebrafish (Danio rerio), medaka (Oryzias latipes), and three-spined stickleback (Gasterosteus aculeatus), are also ongoing.

Although ray-finned fishes and humans diverged from their common ancestor as long as 450 MYA (Kumazawa et al., 1999; Hedges and Kumar, 2003), their genomes have essentially the same genes and regulatory sequences. Furthermore, some ray-finned fishes have great advantages over mammals for research purposes: many are small and easy to maintain and have short reproductive cycles. Among ray-finned fishes, pufferfishes have a genome of approximately 400 Mb, and the effort needed for sequencing to obtain a comparable amount of information is modest by comparison with the effort needed for mammalian genomes (2.0–5.0 Gb). The spotted green pufferfish, Tetraodon nigroviridis, was proposed as a model organism for genomic studies (Crnogorac-Jurcevic et al., 1997), and recently its draft sequence was published (Jaillon et al., 2004).

The correct interpretation of any kind of comparative biological data requires an evolutionary framework, namely, a well-supported phylogeny with precise divergence times. Compared with the relationships among other vertebrate groups, relationships among major lineages of ray-finned fishes, comprising over half of all vertebrate species, remain ill defined, and thus the proposed relationships among model organisms in fishes have traditionally relied on morphology-based hypotheses. Only recently have several efforts got underway that rely on extensive taxonomic coverage of the diversity of ray-finned fishes to estimate higher-level relationships (e.g., Miya et al., 2003). There are a few studies on divergence time estimation for ray-finned fishes, and the estimates based on the earliest fossil records and on molecular clocks were very different. Molecular clock estimates were much older than fossil estimates; for example, fossil records have shown that the radiation of dates back to the early Cenozoic (Benton, 1993, 2005), but a molecular clock estimate has placed it from the to the early (Kumazawa et al., 1999). In studies on model organisms in fishes, the proposed divergence times have exclusively referred to the estimates based on the earliest fossil records (e.g., Volff et al., 2003; Chen et al., 2004).

In this study, we determined the whole mitochondrial genome sequence for the spotted green pufferfish, Tetraodon nigroviridis, and analyzed its genome contents. Recent studies have used many genes to estimate divergence times in the hope of reducing the effect of rate variation (Nei and Glazko, 2002). We used the whole mitochondrial genome sequence to estimate the divergence times between the model organisms and major fish lineages because such sequences have been demonstrated in recent studies to be useful for estimating the divergence times among basal lineages within tetrapods (Kumazawa et al., 2004), primates (Schrago and Russo, 2003), and actinopterygians (Inoue et al., 2005). The partitioned Bayesian approach, which does not assume constancy of evolutionary rates (Thorne and Kishino, 2002), was employed for estimation of divergence times because evolutionary rate heterogeneity was observed in the mitochondrial genes among the species used in this study.

MATERIALS AND METHODS

Sample and DNA extraction. A portion of the epaxial musculature (about 0.25 g) was excised from a fresh specimen and immediately preserved in 99.5% ethanol. Total genomic DNA was extracted using a Qiagen DNeasy tissue kit (Qiagen) following the manufacturer's protocol. The voucher specimen was deposited at the Department of Zoology, Natural History Museum & Institute, Chiba (CBM-ZF 10554).

PCR and sequencing. The mitochondrial genome of spotted green pufferfish was amplified in its entirety using a long polymerase chain reaction (PCR) technique (Cheng et al., 1994). Long PCR primers (Table 1) were used so as to amplify the entire mitochondrial genome in two reactions. The long-PCR products were diluted with TE buffer (1:19) for subsequent use as PCR templates. A total of 67 fish-versatile PCR primers and a species-specific primer (Table 1) were used in various combinations to amplify contiguous, overlapping segments of the entire mitochondrial genome for spotted green pufferfish. Long PCR and subsequent short PCR were carried out as previously described (e.g., Miya and Nishida, 1999; Inoue et al., 2003).

Double-stranded PCR products, purified using ExoSAP-IT (USB), were subsequently used for direct cycle sequencing with dye-labeled terminators (Applied Biosystems). The primers used were the same as those for PCR. All sequencing reactions were performed according to the manufacturer's instructions. Labeled fragments were analyzed using a Model 377 DNA sequencer (Applied Biosystems).

Alignments. We chose five model species for genome science (spotted green pufferfish, torafugu, three-spined stickleback, medaka, and zebrafish) and representatives from major lineages of fishes (total 21 species, Table 2). Final rooting was done using the small-spotted catshark.

The DNA sequences were edited and analyzed with EditView ver. 1.0.1, AutoAssembler ver. 2.1 (Applied Biosystems), and DNASIS ver. 3.2 (Hitachi Software Engineering Co. Ltd.). The individual gene sequences for the 21 species were aligned manually using DNASIS based on the previously aligned sequences of 48 (Miya et al., 2001). Amino acids were used for alignments of the protein-coding genes and secondary structure models (Kumazawa and Nishida, 1993) were used for the alignment of tRNA genes. Since strictly secondary-structure-based alignment for the two rRNA genes was impractical for the large dataset, we instead employed machine alignment, which would minimize erroneous assessment of the positional homology of the rRNA molecules. The two

Mitochondrial genome of green pufferfish 31

Table 1. PCR and sequencing primers used in the analysis of spotted green pufferfish mitochondrial genome

L primers   Sequence (5´–3´)                                 H primers   Sequence (5´–3´)

Long PCR primers
L2508-16S   CTC GGC AAA CAT AAG CCT CGC CTG TTT ACC AAA AAC  S-LA-16S-H  TGC ACC ATT RGG ATG TCC TGA TCC AAC ATC
L12321-Leu  GGT CTT AGG AAC CAA AAA CTC TTG GTG CAA          H15149-CYB  GGT GGC KCC TCA GAA GGA CAT TTG KCC TCA

PCR and sequencing primers
L708-12S    TTA YAC ATG CAA GTN TCC GC                       H690-12S    GCG GAG GCT TGC ATG TGT A
L1083-12S   ACA AAC TGG GAT TAG ATA C                        H884-12S    AAC CGC GGT GGC TGG CAC GAG
L1803-16S   AGT ACC GCA AGG GAA AGC TGA AA                   H1903-16S   GTA GCT CGT YTA GTT TCG GG
L2510-16S   CGC CTG TTT ACC AAA AAC AT                       H2590-16S   ACA AGT GAT TGC GCT ACC TT
L2949-16S   AGT TAC CCT GGG GAT AAC AGC GCA ATC              H3084-16S   AGA TAG AAA CTG ACC TGG AT
L3074-16S   CGA TTA AAG TCC TAC GTG ATC TGA GTT CAG          H3466-ND1   ATK GGT TCT TTG ATG AAK AGT TT
L3686-ND1   TGA GCM TCW AAT TCM AAA TA                       H3976-ND1   ATG TTG GCG TAT TCK GCK AGG AA
L4166-ND1   CGA TAT GAT CAA CTM ATK CA                       H4432-Met   TTT AAC CGW CAT GTT CGG GGT ATG
L4438-Met   AAG CTT TTG GGC CCA TRC CC                       H4866-ND2   AAK GGK GCK AGT TTT TGT CA
L4633-ND2   CAC CAC CCW CGA GCA GTT GA                       H5669-Asn   AAC TGA GAG TTT GWA GGA TCG AGG CC
L5261-ND2   CWG GTT TCR TRC CWA AAT GA                       H5937-CO1   TGG GTG CCA ATG TCT TTG TG
L5698-Asn   AGG CCT CGA TCC TAC AAA GKT TTA GTT AAC          H6855-CO1   AGT CAG CTG AAK ACT TTT AC
L6199-CO1   GCC TTC CCW CGA ATA AAT AA                       H7447-Ser   AWG GGG GTT CRA TTC CTY CCT TTC TC
L7255-CO1   GAT GCC TAC ACM CTG TGA AA                       H8312-Lys   CAC CWG TTT TTG GCT TAA AAG GCT AAY GCT
Teni-CO1-L  CAC CAT ATG TTT ACG GTA GG                       H8589-ATP   AAG CTT AKT GTC ATG GTC AGT
L7467-Ser   GAG AAA GGR AGG AAT TGA ACC                      H9076-ATP   GGG CGG ATA AAK AGG CTA AT
L7905-CO2   GGC CAY CAR TGG TAY TGA AG                       H9375-CO3   CGG ATR ATG TCT CGT CAT CA
L8202-CO2   TGY GGA GCW AAT CAY AGC TT                       H9639-CO3   CTG TGG TGA GCY CAK GT
L8343-Lys   AGC GTT GGC CTT TTA AGC TAA WGA TWG GTG          H10019-Gly  AGG AGS GCG ATT TCW AGR TC
L8984-ATP   ATT GGK KTA CGA AAT CAA CC                       H10244-ND3  AGG AGS GCG ATT TCW AGR TC
L9514-CO3   TTC TGA GCC TTC TAY CA                           H10433-Arg  AAC CAT GGW TTT TTG AGC CGA AAT
L10201-ND3  TTT GAC CCT CTR GGS TCT GCC CG                   H10970-ND4  GAT TAT WAG KGG GAG WAG TCA
L10440-Arg  AAG ATT WTT GAT TTC GGC T                        H11085-ND4  ATT TCW GTG GCS CCG AAK GC
L11424-ND4  TGA CTT CCW AAA GCC CAT GTA GA                   H11534-ND4  GCK AGG AYA ATA AAK GGG TA
L12176-His  AGA CRT TAG ATT GTG ATT CTA                      H12293-Leu  TTG CAC CAA GAG TTT TTG GTT CCT AAG ACC
L12329-Leu  CTC TTG GTG CAA MTC CAA GT                       H13069-ND5  GTG CTG GAG TGK AGT AGG GC
L13553-ND5  AAC ACM TCT TAY CTW AAC GC                       H13396-ND5  CCT ATT TTK CGG ATG TCY TG
L14504-ND6  GCC AAW GCT GCW GAA TAM GCA AA                   H13727-ND5  GCG ATK ATG CTT CCT CAG GC
L14735-Glu  AAC CAC CGT TGT TAT TCA ACT A                    H14473-ND6  GCG GCW TTG GCK GCK GAG CC
L15369-CYB  ACA GGM TCA AAY AAC CC                           H14714-Glu  TAG TTG AAT AAC AAC GGT GGT T
L15765-CYB  ATT CTW ACM TGA ATT GGM GG                       H14768-CYB  TTK GCG ATT TTW AGK AGG GGG TG
L15998-Pro  AAC TCT TAC CMT TGG CTC CCA ARG C                H15149-CYB  GGT GGC KCC TCA GAA GGA CAT TTG KCC TCA
MT-CR-L     TGA WYT ATT CCT GGC ATT TGG YTC                  H15560-CYB  TAG GCR AAT AGG AAR TAT CA
                                                             H15915-Thr  ACC TCC GAT CTY CGG ATT ACA AGA C
                                                             MT-CR-H     GAG CCA AAT GCM AGG AAT ARW TCA

Primers are designated by their 3´ ends, which correspond to the positions of the human mitochondrial genome by convention. L and H denote light and heavy strands, respectively.

rRNA gene (12S and 16S rRNA) sequences were aligned using Clustal X (Thompson et al., 1997) with default gap penalties.

The ND6 gene was not used in the phylogenetic analyses because of its heterogeneous base composition and consistently poor phylogenetic performance (Miya and Nishida, 2000). Ambiguously aligned regions, such as the 5´ and 3´ ends of several protein-coding genes and

Table 2. List of species used in this study, with DDBJ/EMBL/GenBank accession numbers and references

Common name               Order              Family            Species                   Accession  References
                                                                                         Nos.
Small-spotted catshark    Carcharhiniformes  Scyliorhinidae    Scyliorhinus canicula     X16067     Delarbre et al. (1998)
Coelacanth                Coelacanthiformes  Coelacanthidae    Latimeria chalumnae       U82228     Zardoya and Meyer (1997)
Bowfin                                       Amiidae           Amia calva                AB042952   Inoue et al. (2003)
Japanese sardine                             Clupeidae         Sardinops melanostictus   AB032554   Inoue et al. (2001)
Carp                      Cypriniformes      Cyprinidae        Cyprinus carpio           X61010     Chang et al. (1994)
Zebrafish                 Cypriniformes      Cyprinidae        Danio rerio               AC024175   Broughton et al. (2000)
Northern pike                                Esocidae          Esox lucius               AP004103   Ishiguro et al. (2003)
Rainbow trout             Salmoniformes                        Oncorhynchus mykiss       L29771     Zardoya et al. (1995)
Atlantic cod                                 Gadidae           Gadus morhua              X99772     Johansen and Bakke (1996)
Silver eye                Polymixiiformes    Polymixiidae      japonica                  AB034826   Miya and Nishida (2000)
Medaka                                       Adrianichthyidae  Oryzias latipes           AP004421   Miya et al. (2003)
Deepbody boarfish                                              Antigonia capros          AP002943   Miya et al. (2001)
Splendid alfonsino                           Berycidae         Beryx splendens           AP002939   Miya et al. (2001)
Red coat                  Beryciformes                         Sargocentron rubrum       AP004432   Miya et al. (2001)
Three-spined stickleback                     Gasterosteidae    Gasterosteus aculeatus    AP002944   Miya et al. (2001)
Hilgendorf's saucord                         Scorpaenidae      Helicolenus hilgendorfii  AP002948   Miya et al. (2003)
Bastard halibut           Pleuronectiformes  Paralichthyidae   Paralichthys olivaceus    AB028664   Saitoh et al. (2000)
Masked                    Tetraodontiformes  Balistidae        fraenatum                 AP004456   Miya et al. (2001)
Thread-sail filefish      Tetraodontiformes  Monacanthidae     cirrhifer                 AB002952   Miya et al. (2003)
Torafugu                  Tetraodontiformes                    Takifugu rubripes         AP006045   This study
Spotted green pufferfish  Tetraodontiformes  Tetraodontidae    Tetraodon nigroviridis    AP006046   This study

loop regions of several tRNA genes, were excluded from the analyses. The "saturation" at 3rd codon positions in the protein-coding genes was eliminated from the analyses by translating nucleotide sequences into amino acid sequences or simply excluding 3rd codon positions in the protein-coding genes, leaving a total of 14,384 available nucleotide positions (10,809, 1480, and 2095 positions for protein-coding, tRNA, and rRNA genes, respectively) for the analyses. Two datasets were used in our analyses: dataset #1: concatenated amino acid sequences from 12 protein-coding genes (total 3603 positions); dataset #2: concatenated nucleotide sequences from 12 protein-coding genes (without 3rd codon positions), 22 tRNA genes, and two rRNA genes (total 10,781 positions).

Phylogenetic analysis. Partitioned Bayesian phylogenetic analyses were conducted with MrBayes ver. 3.1 (Ronquist and Huelsenbeck, 2003). We set 12 (dataset #1: 12 protein-coding genes) and four (dataset #2: 1st and 2nd codon positions, tRNA genes, and rRNA genes) partitions depending on the datasets. We used the mtmam (Yang et al., 1998) model with some sites assumed to follow a discrete gamma distribution (mtmam + Γ; Yang, 1994) for dataset #1, and the two-parameter model variant for unequal base frequencies (HKY85; Hasegawa et al., 1985) with variable sites assumed to follow a discrete gamma distribution (HKY85 + Γ) for dataset #2. We assumed that all of the model parameters were unlinked and the rate multipliers were variable across partitions for datasets #1 and #2.

The Markov chain Monte Carlo (MCMC) process was set so that four chains (three heated and one cold) ran simultaneously. On the basis of preliminary runs with varying cycles (100,000–500,000), we estimated average log likelihood scores at stationarity (dataset #1 ≈ –45,590; dataset #2 ≈ –82,890), and subsequently conducted two independent runs for each dataset. After reaching stationarity in the two runs at 20,000 cycles, we continued the runs for 980,000 cycles to confirm lack of improvement in the likelihood scores, with one in every 100 trees being sampled. Posterior probabilities for internal branches and parameters of the model of sequence evolution for the two sets of 980,000 cycles (9800 trees) after reaching stationarity were in excellent agreement for the two datasets. Thus, we determined posterior probabilities of the phylogeny and its branches based on the 19,600 pooled trees from the two runs for the two datasets.

Divergence time estimation. The analysis of divergence time was conducted with the partitioned Bayesian approach (Thorne and Kishino, 2002). Molecular clock approaches were not used because a high degree of rate heterogeneity among lineages of fishes was observed by the two-cluster test (LINTREE; Takezaki et al., 1995). As a reference point for dating, the divergence time between sarcopterygians and actinopterygians (450 MYA) was used for the age of root node following previous analyses based on both fossils and molecules (see Kumazawa et al., 1999; Hedges and Kumar, 2003).

The program Thornian Time Traveler ver. 1.0 (T3; see http://abacus.gene.ucl.ac.uk/) was used to estimate the divergence time following the partitioned Bayesian method of Thorne and Kishino (2002). We set 12 (dataset #1) and four (dataset #2: 1st and 2nd codon positions, tRNA, and rRNA genes) partitions depending on the datasets. For dataset #2, we assumed that functional constraints on sequence evolution are more similar within codon positions (or types of molecules) across genes than across codon positions (or types of molecules) within genes. Branch lengths were estimated with the estbNewAA and estbNew programs of T3 for datasets #1 and #2, respectively, in conjunction with the tree topology obtained by Bayesian analysis of dataset #2 (Fig. 1), which was identical with those from the previous studies (e.g., Miya et al., 2003). We used the mtmam + Γ model for dataset #1 and the HKY85 + Γ model for dataset #2. For dataset #2, we used the baseml program in the PAML ver. 3.14 package (Yang, 1997) to obtain estimates of the transition/transversion rate ratio and the rates for site classes under the discrete-gamma model of rates among sites under the HKY85 + Γ model.

Divergence time was estimated using the program multidivtime of T3. In each case, Markov chain Monte Carlo (MCMC) approximations were obtained with a burn-in period of 100,000 proposal cycles. Thereafter, samples of the Markov chain were taken every 100 cycles until a total of 10,000 samples were obtained. To diagnose possible failure of the Markov chains to converge on their stationary distribution, we performed at least two replicate MCMC runs with different initial starting points for each analysis. Application of the multidivtime program requires a value for the mean of the prior distribution for the time separating the ingroup root from the present

Fig. 1. Phylogenetic tree of 20 bony fishes plus an outgroup and prior distribution of divergence times. The tree topology was fixed to the 50% majority-rule consensus tree of the 19,600 pooled trees from the two independent Bayesian analyses of dataset #2 [concatenated nucleotide sequences from 12 protein-coding genes (without 3rd codon positions), 22 tRNA genes, and two rRNA genes (total 10,781 positions)] using MrBayes 3.1 (Ronquist and Huelsenbeck, 2003) with the HKY85 + Γ model of sequence evolution. The asterisks indicate branches supported by posterior probabilities less than 100% (Node 13: 79%; Node 14: 91%). Topological incongruities with dataset #1 are denoted by arrowheads. Ages of the earliest fossil records are shown to the right of the tree. The divergence time between sarcopterygians and actinopterygians (Node 1), i.e., 450 MYA, was used for the age of root node following previous analyses based on both fossils and molecules (see Kumazawa et al., 1999; Hedges and Kumar, 2003). In the partitioned Bayesian approach, we used the two maximum (U) and nine minimum (L) constraints according to the fossil and molecular information (Table 3).
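The sampling arithmetic behind the pooled trees, and the idea of a 50% majority-rule summary, can be illustrated with a short Python sketch. This is not the MrBayes implementation. As an assumption made only for this example, each tree is represented as a set of clades, each clade being a frozenset of taxon names, and the four toy trees over taxa A to D are invented.

```python
from collections import Counter


def sampled_trees(cycles: int, sample_every: int) -> int:
    """Number of trees kept when one tree is sampled every `sample_every` cycles."""
    return cycles // sample_every


def clade_support(trees):
    """Fraction of sampled trees that contain each clade."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: n / len(trees) for clade, n in counts.items()}


def majority_rule(trees, threshold=0.5):
    """Clades whose support is strictly greater than `threshold`."""
    return {c: p for c, p in clade_support(trees).items() if p > threshold}


# Figures quoted in the text: 980,000 cycles per run, one tree per 100 cycles, two runs.
per_run = sampled_trees(980_000, 100)
pooled = 2 * per_run

# Invented toy sample of four trees (not the paper's data).
AB, CD, AC = frozenset("AB"), frozenset("CD"), frozenset("AC")
toy_trees = [{AB, CD}, {AB, CD}, {AB}, {AC}]
consensus = majority_rule(toy_trees)

print(per_run, pooled)
print(consensus)
```

In the toy sample, clade {A, B} occurs in three of four trees and is retained, whereas {C, D} occurs in exactly half and is dropped because the threshold is strict. The support values play the role of the posterior probabilities reported for the branches in Fig. 1.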


Table 3. Maximum (U) and minimum (L) constraints (MYA) and calibration information for constraints in Fig. 1

Node  Constraint  Calibration information
1     U528        The estimated divergence time between chondrichthyans and osteichthyans (528 MYA) from independent molecular studies (Kumazawa et al., 1999; Kumazawa and Nishida, 2000; Hedges and Kumar, 2003)
1     L411        The Andreolepis fossil () from Ludfordian (Silurian)*
2     U450        The estimated divergence time between sarcopterygians and actinopterygians (450 MYA) based on both fossils and molecules (Kumazawa et al., 1999; Hedges and Kumar, 2003)
2     L240        The amiiform fossil from Anisian (Triassic)*
4     L74         The ostariophysian fossil from Campanian (Cretaceous)*
7     L74         The esociform fossil from Campanian (Cretaceous)*
9     L161        The gadiform fossil from Bathonian (Jurassic)*
10    L90.4       The beryciform fossil from Cenomanian (Cretaceous)*
13    L50         The pleuronectiform fossil from Ypresian (Eocene)*
16    L95         The tetraodontiform fossil from 95 MYA (Cretaceous) (Tyler and Sorbini, 1996)
17    L56.5       The balistoid fossil from Thanetian (Paleocene) (Tyler and Bannikov, 1992)

* Minima (L) are based on earliest occurrences in the fossil records (Benton, 1993).
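As a hedged illustration of how the bounds in Table 3 and the prior settings stated for multidivtime fit together, the following Python sketch encodes the constraints as data, checks a set of node ages against them, and derives the prior values from rttm. It does not call multidivtime or reproduce its input format. The function names are hypothetical, the value used for X (the median root-to-tip amount of evolution) is made up, and the check assumes that node numbers in Table 3 and Table 4 refer to the same nodes.

```python
# Constraints transcribed from Table 3: (node, kind, age in MYA).
# "U" is a maximum (upper) bound and "L" a minimum (lower) bound.
CONSTRAINTS = [
    (1, "U", 528.0), (1, "L", 411.0),
    (2, "U", 450.0), (2, "L", 240.0),
    (4, "L", 74.0), (7, "L", 74.0), (9, "L", 161.0),
    (10, "L", 90.4), (13, "L", 50.0), (16, "L", 95.0), (17, "L", 56.5),
]


def violations(node_ages, constraints=CONSTRAINTS):
    """Return (node, kind, bound, age) for every constraint that an age breaks."""
    broken = []
    for node, kind, bound in constraints:
        age = node_ages.get(node)
        if age is None:
            continue  # node not supplied, nothing to check
        too_old = kind == "U" and age > bound
        too_young = kind == "L" and age < bound
        if too_old or too_young:
            broken.append((node, kind, bound, age))
    return broken


def multidivtime_priors(rttm, root_to_tip):
    """Prior settings as described in the text, derived from rttm and X.

    `root_to_tip` stands for X, the median amount of evolution from the
    ingroup root to the ingroup tips (a hypothetical input here).
    """
    rtrate = root_to_tip / rttm
    brownmean = 1.0 / rttm  # chosen so that rttm * brownmean = 1
    return {
        "rttm": rttm,
        "rttmsd": 0.5 * rttm,
        "rtrate": rtrate,
        "rtratesd": 0.5 * rtrate,
        "brownmean": brownmean,
        "brownsd": brownmean,
    }


# Posterior means for dataset #1 at the constrained nodes, copied from Table 4
# (assumes Table 3 and Table 4 share node numbers).
DATASET1_MEANS = {1: 499.7, 2: 417.4, 4: 323.7, 7: 292.4, 9: 206.3,
                  10: 213.3, 13: 189.7, 16: 176.0, 17: 167.0}

print(violations(DATASET1_MEANS))
priors = multidivtime_priors(rttm=450.0, root_to_tip=0.9)  # 0.9 is a made-up X
print(priors)
```

Times are kept in MYA here for readability; the time unit actually supplied to the program is a separate choice that the text does not specify.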

(rttm). We used a conservative estimate of 450 MYA for rttm (see above). Other settings for running multidivtime were as follows: standard deviation of rttm (rttmsd) = 0.5 × rttm; rtrate = X/rttm, where rtrate is the mean of the prior distribution for the rate at the root node and X is the median amount of evolution from the ingroup root to the ingroup tips; rtratesd = 0.5 × rtrate, where rtratesd is the standard deviation of rtrate; rttm × brownmean = 1, where brownmean is the mean of the prior distribution for the autocorrelation parameter (ν); and brownsd = brownmean, where brownsd is the standard deviation of the prior distribution for ν.

The multidivtime program allows for both minimum and maximum fossil constraints. Whereas minima are often based on earliest occurrences in the fossil record, maxima are intrinsically more difficult to estimate. However, we constrained two upper limits to obtain narrower credibility intervals on posterior divergence times according to Kishino and Thorne (2002). With the exception of Nodes 1 and 2 (Table 3, Fig. 1), we could not use the maximum because fossils of teleosts are not well preserved in general (Benton, 1993; Kumazawa et al., 1999). Accordingly, we used the two maximum and nine minimum constraints for the divergence time estimation (Table 3, Fig. 1).

RESULTS AND DISCUSSION

Genome organization. The complete L-strand nucleotide sequence of the spotted green pufferfish mitochondrial genome has been registered in DDBJ/EMBL/GenBank under the accession number AP006046. The total length of the mitochondrial genome of spotted green pufferfish is 16,488 bp. The genome contents included 13 protein-coding, 22 tRNA, and two rRNA genes, and a control region, as found in other vertebrates. Also, as in other vertebrates, most genes were encoded on the H strand, except for the ND6 gene and eight tRNA genes (tRNA-Gln, tRNA-Ala, tRNA-Asn, tRNA-Cys, tRNA-Tyr, tRNA-Ser (UCN), tRNA-Glu, and tRNA-Pro genes), and all genes were similar in length to those in other vertebrates. The gene arrangement was identical to that in typical vertebrates.

Protein-coding genes. There were 13 protein-coding genes, with two reading-frame overlaps on the same strand (ATP8 and ATP6 shared 10 nucleotides; ND4L and ND4 shared seven nucleotides). All protein-coding genes began with an ATG start codon, except for the COI gene, which started with GTG. Stop codons of protein-coding genes were TAA in the ATP8, Cytb, ND4L, and ND5 genes, TAG in the ND1 gene, TA in the ATP6, ND2, and COIII genes, T in the COII, ND3, and ND4 genes, and AGG in the COI and ND6 genes. For those genes that have an incomplete stop codon, the transcripts would be modified to form the complete termination signal UAA by polyadenylation after cleavage of the polycistronic RNA, as demonstrated for other metazoan mtDNAs (Ojala et al., 1981).

Transfer RNA genes. The mitochondrial genome of spotted green pufferfish contained 22 tRNA genes, which were in clusters or individually scattered in the genome. The tRNA genes ranged in size from 64 to 74 nucleotides, large enough that the encoded tRNAs can fold into the cloverleaf secondary structure characteristic of tRNAs. This structure was possible provided that formation of the G-U wobble and other atypical pairings were allowed in the stem regions. All postulated cloverleaf structures contained 7 bp in the amino acid stem, 5 bp in the TΨC stem, 5 bp in the anticodon stem, and 4 bp in the DHU stem, except that there were 3 bp in the DHU stem of the tRNA-Ser (AGY) gene.

Ribosomal RNA genes. The 12S rRNA and 16S rRNA genes of spotted green pufferfish were 949 bp and 1676 bp, respectively. They were located, as in other vertebrates, between the tRNA-Phe and tRNA-Leu (UUR) genes, being separated by the tRNA-Val gene. Preliminary assessment of their secondary structure indicated that the present sequences could be reasonably superimposed on the proposed secondary structure of goby 12S rRNA (Wang and Lee, 2002) and loach 16S rRNA (Gutell et al., 1993).

Non-coding sequences. As in most vertebrates, the origin of light strand replication (OL) in spotted green pufferfish was 33 bp, located between the tRNA-Asn and tRNA-Cys genes. This region has the potential to fold into a stable stem-loop secondary structure with 20 bp in the stem and 13 bp in the loop. The conserved motif 5´-GCCGG-3´ was found at the base of the stem within the tRNA-Cys gene (Hixson et al., 1986).

The major non-coding region found in spotted green pufferfish is located between the tRNA-Pro and tRNA-Phe genes and its length is 810 bp. This non-coding sequence appears to correspond to the control region, because it has several conserved sequence blocks (CSB) and a terminal-associated sequence (TAS), which are characteristic of the region (Doda et al., 1981; Walberg and Clayton, 1981).

Phylogenetic analysis. Fig. 1 shows a 50% majority-rule consensus tree of the 19,600 combined samples from the two Bayesian analyses of the 21 whole mitogenomic sequences from the concatenated 12 protein-coding, 22 tRNA genes, plus two rRNA genes (dataset #2) using the HKY85 + Γ model of sequence evolution (Hasegawa et al., 1985). Most of the internal branches were supported by high Bayesian posterior probabilities (= 100%) except for two nodes. The resultant topology was identical with those from previous studies (e.g., Miya et al., 2001, 2003; Inoue et al., 2003). Bayesian analysis using amino acid sequences of 12 protein-coding genes (dataset #1) produced a similar tree, with minor differences in the placements of Japanese sardine and bastard halibut (Fig. 1). The position of spotted green pufferfish was consistently supported as a sister species of torafugu, with these two species forming a clade with the other tetraodontiforms, and thread-sail filefish.

Divergence time estimation among model organisms. The partitioned Bayesian approach estimated the divergence times between spotted green pufferfish and other model organisms as follows (datasets #1/#2): 96.2/73.3 MYA vs. torafugu (Node 18 in Fig. 2), 191.5/175.1 MYA vs. three-spined stickleback (Node 14), 198.6/183.7 MYA vs. medaka (Node 12), and 333.0/314.7 MYA vs. zebrafish (Node 3). The divergence time estimates between spotted green pufferfish and major lineages of fishes are also shown in Fig. 2 and Table 4. On the whole, the estimates made using dataset #1 (amino acids) were slightly older than those made using dataset #2 (nucleotides), but the 95% credibility intervals from the two estimates largely overlapped with each other. In the following discussion, we use the averages of the estimates from the two datasets as the estimated divergence times for convenience.

Fig. 2. Posterior distributions of the divergence times among model organisms of fishes and major lineages of fishes based on the partitioned Bayesian approach (Thorne and Kishino, 2002) for dataset #1 [concatenated amino acid sequences from 12 protein-coding genes (total 3603 positions)]. A shark, Scyliorhinus canicula, was used as an outgroup. The program Thornian Time Traveler ver. 1.0 (T3; see http://abacus.gene.ucl.ac.uk/) was used to estimate divergence times in conjunction with the tree topology shown in Fig. 1 using the mtmam + Γ model of sequence evolution. The horizontal rectangles represent the estimated 95% credibility intervals of divergence times. Arrowheads indicate the nodes under maximum or minimum constraints (see Table 3, Fig. 1).

Benton (2005) analyzed the radiation of ray-finned fishes based on earliest fossil records, and reported divergence times much more recent than those in this study, especially for those of higher fishes (Table 4): bowfin/other actinopterygians (Node 2 in Fig. 2) at ca. 250 MYA [403.6 MYA in the present estimate]; northern pike + rainbow trout/other euteleosts (Node 6) at ca. 130 MYA [288.0 MYA]; medaka + bastard halibut/other percomorphs (Node 12) at ca. 90 MYA [191.2 MYA]. Kumazawa et al. (1999) estimated divergence times of major lineages of ray-finned fishes based on mitochondrial ND2 and Cytb genes using a molecular-clock approach, and obtained estimates largely congruent with those obtained in this study. The rate calibration of Kumazawa et al. (1999) was based on the assumption that African and neotropical cichlids diverged 100 MYA and that actinopterygians and sarcopterygians split 450 MYA. Inoue et al. (2005) estimated divergence times of basal ray-finned fishes using the same methods and sequences as employed in this study, and obtained estimated times slightly more recent than those in this study. Crnogorac-Jurcevic et al. (1997) calculated the divergence time between torafugu and spotted green pufferfish to be 18–30 MYA [84.8 MYA] by using a molecular clock of 376 bp of Cytb gene sequences based on Cantatore et al. (1994). Cantatore et al. (1994) estimated the rates of evolution at 0.5–0.8% per million years by using five shark species, and the divergence time between and Perciformes was calibrated to be about 200 MYA, corresponding to Node 2 in Fig. 2 [403.6 MYA].

There appears to be a considerable time gap between the molecular and fossil evidence. The fossil estimates are usually based on the earliest occurrences. The earliest fossil records, however, do not necessarily indicate that a group diverged from its sister group at that time. In addition, fossil records tend to be incomplete in various respects and the long-term lack of teleostean fossils was probably influenced by the population sizes and taphonomic conditions for fossilization (Kumazawa et al., 1999). Also, it has been reported that molecular divergence time estimates often exceed fossil estimates in vertebrates (e.g., Springer et al., 2003; Inoue et al., 2005). We interpret this apparent discrepancy to be indicative of the paucity of teleostean fossil records rather than the inaccuracy of our molecular time estimates.

A timescale is necessary not only for estimating rates of molecular and morphological changes in organisms but also for interpreting patterns of macroevolution and biogeography. Moreover, the availability of genomic data from model organisms has created a demand for reliable estimation of divergence time to help understand the temporal component of evolution. The results obtained in this study were much older than those based on the earliest occurrences of fossil records and were in accord with those obtained by Kumazawa et al. (1999), implying that the radiation of major lineages of ray-finned fishes is much older than previously thought. Even in recent genomic studies of fishes, however, calculation of evolutionary rates or understanding of patterns of evolution was obtained exclusively from divergence times estimated from the earliest fossil records. This accumulated knowledge associated with timescale should be reconsidered based on molecular estimation. More detailed analyses with further taxonomic sampling may allow reliable estimation of the timescale for fish evolution.


Table 4. Divergence time estimates (MYA) of each node of Fig. 2 based on the partitioned Bayesian approach for datasets #1 and #2. Numbers in parentheses indicate 95% credibility intervals of divergence times.

Node  Dataset #1           Dataset #2           Average  Molecular clock         Fossil
1     499.7 (447.0–526.7)  470.1 (414.5–524.2)  484.9    438.0d, 450.5e          450c
2     417.4 (365.5–448.3)  389.7 (339.7–441.7)  403.6    404b, 342.5d, 376.4e    250c
3     333.0 (282.9–380.1)  314.7 (270.4–363.2)  323.9    296b, 230.4d, 277.7e    130–140c*
4     323.7 (274.5–370.8)  284.4 (241.9–331.5)  304.1    265b*, 200.8d, 239.0e   –
5     213.3 (167.2–262.0)  166.8 (130.7–207.9)  190.1    265b*                   –
6     296.1 (249.4–342.4)  279.9 (240.2–326.3)  288.0    190.7d, 231.6e          130c*
7     292.4 (246.2–338.3)  230.2 (193.9–272.8)  261.3    –                       –
8     234.8 (194.4–280.2)  223.0 (190.6–264.3)  228.9    131.2d, 159.3e          100c
9     206.3 (168.1–251.3)  191.2 (163.2–230.1)  198.8    –                       –
10    213.3 (174.1–257.1)  205.6 (173.8–244.9)  209.5    –                       –
11    176.6 (141.3–217.1)  183.5 (153.6–220.7)  180.1    –                       –
12    198.6 (159.4–243.0)  183.7 (153.5–221.0)  191.2    –                       –
13    189.7 (151.6–234.1)  169.6 (140.2–206.6)  179.7    –                       –
14    191.5 (153.4–234.5)  175.1 (146.1–210.7)  183.3    –                       –
15    162.1 (126.3–202.6)  148.8 (122.1–182.0)  155.5    –                       –
16    176.0 (139.2–217.7)  160.2 (132.8–193.9)  168.1    –                       –
17    167.0 (131.0–208.2)  150.9 (124.2–183.9)  159.0    –                       –
18    96.2 (69.5–128.7)    73.3 (56.7–93.6)     84.8     18–30a                  –
19    141.3 (106.5–182.1)  117.7 (94.5–146.0)   129.5    –                       –

a Crnogorac-Jurcevic et al. (1997); b Kumazawa et al. (1999); c Benton (2005); d, e Inoue et al. (2005) (d amino acids, e nucleotides); * phylogenetic topologies were different from that in this study.
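As a worked check of the Average column, and of how far the molecular estimates exceed the fossil dates, the following minimal Python sketch recomputes both for the three nodes that carry a single fossil date without a topology caveat (nodes 1, 2, and 8). The names ESTIMATES, FOSSIL, average, and ratio_to_fossil are hypothetical and introduced here only for illustration; the numbers are copied from Table 4, and the sketch is not part of the authors' analysis.

```python
# Point estimates (MYA) copied from Table 4: node -> (dataset #1, dataset #2).
ESTIMATES = {
    1: (499.7, 470.1),
    2: (417.4, 389.7),
    8: (234.8, 223.0),
}

# Fossil dates (MYA) from the "Fossil" column of Table 4 (Benton, 2005).
# Nodes 3 and 6 are left out on purpose: their fossil entries are marked
# with * (different topology) and node 3 is given as a range.
FOSSIL = {1: 450.0, 2: 250.0, 8: 100.0}


def average(node):
    """Mean of the two dataset point estimates for a node (the Average column)."""
    d1, d2 = ESTIMATES[node]
    return (d1 + d2) / 2


def ratio_to_fossil(node):
    """How many times older the averaged molecular estimate is than the fossil date."""
    return average(node) / FOSSIL[node]


for node in sorted(FOSSIL):
    print(f"node {node}: average {average(node):.2f} MYA, "
          f"{ratio_to_fossil(node):.2f}x the fossil date")
```

Under these assumptions the ratio grows from close to 1 at node 1 to more than 2 at node 8, which is consistent with the statement that the molecular estimates are much older than those based on the earliest fossil occurrences.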

We thank N. B. Ishiguro, M. M. Yamauchi, and other graduate students at the Molecular Marine Biology Laboratory, Ocean Research Institute, University of Tokyo, for their helpful suggestions during this study. We also thank Jacob Egge for correcting the English in the manuscript. Critical comments from two anonymous reviewers were helpful to improve the manuscript. A portion of this study was supported by Grants-in-Aid from the Ministry of Education, Culture, Sports, Science, and Technology, Japan (13556028, 13640711, 15570090, 15380131, and 12NP0201) and Research Fellowship of the Japan Society for the Promotion of Science for Young Scientists (07304).

REFERENCES

Aparicio, S., Chapman, J., Stupka, E., Putnam, N., Chia, J. M., et al. (2002) Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science 297, 1301–1310.
Benton, M. J. (1993) The fossil record, vol. 2, pp. 845, Chapman & Hall, London.
Benton, M. J. (2005) Vertebrate palaeontology, 3rd ed., pp. 455, Blackwell, Malden.
Broughton, R. E., Milam, J. E., and Roe, B. A. (2001) The complete sequence of the zebrafish (Danio rerio) mitochondrial genome and evolutionary patterns in vertebrate mitochondrial DNA. Genome Res. 11, 1958–1967.
Cantatore, P., Roberti, M., Pesole, G., Ludovico, A., Milella, F., Gadaleta, M. N., and Saccone, C. (1994) Evolutionary analysis of cytochrome b sequences in some Perciformes: evidence for a slower rate of evolution than in mammals. J. Mol. Evol. 39, 589–597.
Chang, Y. S., Huang, F. L., and Lo, T. B. (1994) The complete nucleotide sequence and gene organization of carp (Cyprinus carpio) mitochondrial genome. J. Mol. Evol. 38, 138–155.
Chen, W.-J., Orti, G., and Meyer, A. (2004) Novel evolutionary relationship among four fish model systems. Trends Genet. 20, 424–431.
Cheng, S., Higuchi, R., and Stoneking, M. (1994) Complete mitochondrial genome amplification. Nat. Genet. 7, 350–351.
Crnogorac-Jurcevic, T., Brown, J. R., Lehrach, H., and Schalkwyk, L. C. (1997) Tetraodon fluviatilis, a new puffer model for genome studies. Genomics 41, 177–184.
Delarbre, C., Spruyt, N., Delmarre, C., Gallut, C., Barriel, V., Janvier, P., Laudet, V., and Gachelin, G. (1998) The complete nucleotide sequence of the mitochondrial DNA of the dogfish, Scyliorhinus canicula. Genetics 150, 331–344.
Doda, J. N., Wright, C. T., and Clayton, D. A. (1981) Elongation of displacement-loop strands in human and mouse mitochondrial DNA is arrested near specific template sequences. Proc. Natl. Acad. Sci. USA 78, 6116–6120.
Gutell, R. R., Gray, M. W., and Schnare, M. N. (1993) A compilation of large subunit (23S and 23S-like) ribosome RNA structures. Nucleic Acids Res. 21, 3055–3074.
Hasegawa, M., Kishino, H., and Yano, T. A. (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol. 22, 160–174.
Hedges, S. B., and Kumar, S. (2003) Genomic clocks and evolutionary timescales. Trends Genet. 19, 200–206.
Hixson, J. E., Wong, T. W., and Clayton, D. A. (1986) Both the conserved stem loop and divergent 5'-flanking sequences are required for initiation at the human mitochondrial origin of light-strand DNA replication. J. Biol. Chem. 261, 2384–2390.
Inoue, J. G., Miya, M., Tsukamoto, K., and Nishida, M. (2000) Complete mitochondrial DNA sequence of the Japanese sardine Sardinops melanostictus. Fish. Sci. 66, 924–932.
Inoue, J. G., Miya, M., Tsukamoto, K., and Nishida, M. (2003) Basal actinopterygian relationships: a mitogenomic perspective on the phylogeny of the “ancient fish.” Mol. Phylogenet. Evol. 26, 110–120.
Inoue, J. G., Miya, M., Venkatesh, B., and Nishida, M. (2005) The mitochondrial genome of Indonesian coelacanth Latimeria menadoensis (Sarcopterygii: Coelacanthiformes) and divergence time estimation between the two coelacanths. Gene 349, 227–235.
Ishiguro, N. B., Miya, M., and Nishida, M. (2003) Basal euteleostean relationships: a mitogenomic perspective on the phylogenetic reality of the “.” Mol. Phylogenet. Evol. 27, 476–488.
Jaillon, O., Aury, J.-M., Brunet, F., Petit, J.-L., Stange-Thomann, N., et al. (2004) Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431, 946–957.
Johansen, S., and Bakke, I. (1996) The complete mitochondrial DNA sequence of Atlantic cod (Gadus morhua): relevance to taxonomic studies among codfishes. Mol. Mar. Biol. Biotechnol. 5, 203–214.
Kumazawa, Y., Azuma, Y., and Nishida, M. (2004) Tempo of mitochondrial gene evolution: can mitochondrial DNA be used to date old divergences? Endocytobiosis Cell Res. 15, 136–142.
Kumazawa, Y., and Nishida, M. (1993) Sequence evolution of mitochondrial tRNA genes and deep-branch phylogenetics. J. Mol. Evol. 37, 380–398.
Kumazawa, Y., and Nishida, M. (2000) Molecular phylogeny of osteoglossoids: a new model for Gondwanian origin and plate tectonic transportation of the Asian arowana. Mol. Biol. Evol. 17, 1869–1878.
Kumazawa, Y., Yamaguchi, M., and Nishida, M. (1999) Mitochondrial molecular clocks and the origin of euteleostean biodiversity: familial radiation of perciforms may have predated the Cretaceous/Tertiary boundary. In: The biology of biodiversity (ed. M. Kato), pp. 35–52. Springer-Verlag, Tokyo.
Miya, M., Kawaguchi, A., and Nishida, M. (2001) Mitogenomic exploration of higher teleostean phylogenies: a case study for moderate-scale evolutionary genomics with 38 newly determined complete mitochondrial DNA sequences. Mol. Biol. Evol. 18, 1993–2009.
Miya, M., and Nishida, M. (1999) Organization of the mitochondrial genome of a deep-sea fish Gonostoma gracile (Teleostei: ): first example of transfer RNA gene rearrangements in bony fishes. Mar. Biotechnol. 1, 416–426.
Miya, M., and Nishida, M. (2000) Use of mitogenomic information in teleostean molecular phylogenetics: a tree-based exploration under the maximum-parsimony optimality criterion. Mol. Phylogenet. Evol. 17, 437–455.
Miya, M., Takeshima, H., Endo, H., Ishiguro, N. B., Inoue, J. G., Mukai, T., Satoh, T. P., Yamaguchi, M., Kawaguchi, A., Mabuchi, K., Shirai, S. M., and Nishida, M. (2003) Major patterns of higher teleostean phylogenies: a new perspective based on 100 complete mitochondrial DNA sequences. Mol. Phylogenet. Evol. 26, 121–138.
Nei, M., and Glazko, G. V. (2002) Estimation of divergence times for a few mammalian and several primate species. J. Heredity 93, 157–164.
Ojala, D., Montoya, J., and Attardi, G. (1981) tRNA punctuation model of RNA processing in human mitochondria. Nature 290, 470–474.
Ronquist, F., and Huelsenbeck, J. P. (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574.
Saitoh, K., Hayashizaki, K., Yokoyama, Y., Asahida, T., Toyohara, H., and Yamashita, Y. (2000) Complete nucleotide sequence of Japanese flounder (Paralichthys olivaceus) mitochondrial genome structural properties and cue for resolving teleostean relationships. J. Heredity 91, 271–278.
Schrago, C. G., and Russo, C. A. M. (2003) Timing the origin of new world monkeys. Mol. Biol. Evol. 20, 1620–1625.
Springer, M. S., Murphy, W. J., Eizirik, E., and O'Brien, S. J. (2003) Placental mammal diversification and the Cretaceous-Tertiary boundary. Proc. Natl. Acad. Sci. USA 100, 1056–1061.
Takezaki, N., Rzhetsky, A., and Nei, M. (1995) Phylogenetic test of the molecular clock and linearized trees. Mol. Biol. Evol. 12, 823–833.
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F., and Higgins, D. G. (1997) The Clustal X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882.
Thorne, J. L., and Kishino, H. (2002) Divergence time and evolutionary rate estimation with multilocus data. Syst. Biol. 51, 689–702.
Tyler, J. C., and Bannikov, A. F. (1992) A remarkable new genus of tetraodontiform fish with features of both balistids and ostraciids from the Eocene of Turkmenistan. Smiths. Contrib. Paleobiol. 72, 1–14.
Tyler, J. C., and Sorbini, L. (1996) New superfamily and three new families of tetraodontiform fishes from the Upper Cretaceous: the earliest and most morphologically primitive plectognaths. Smiths. Contrib. Paleobiol. 82, 1–59.
Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., et al. (2001) The sequence of the human genome. Science 291, 1304–1351.
Volff, J. N., Bouneau, L., Ozouf-Costaz, C., and Fischer, C. (2003) Diversity of retrotransposable elements in compact puffer genomes. Trends Genet. 19, 674–678.
Walberg, M. W., and Clayton, D. A. (1981) Sequence and properties of the human KB cell and mouse L cell D-loop regions of mitochondrial DNA. Nucleic Acids Res. 9, 5411–5421.
Wang, H. Y., and Lee, S. C. (2002) Secondary structure of mitochondrial 12S rRNA among fish and its phylogenetic applications. Mol. Biol. Evol. 19, 138–148.
Yang, Z. (1994) Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314.
Yang, Z. (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comp. App. Biosci. 13, 555–556.
Yang, Z., Nielsen, R., and Hasegawa, M. (1998) Models of amino acid substitution and applications to mitochondrial protein evolution. Mol. Biol. Evol. 15, 1600–1611.
Zardoya, R., Garrido-Pertierra, A., and Bautista, J. M. (1995) The complete nucleotide sequence of the mitochondrial DNA genome of the rainbow trout, Oncorhynchus mykiss. J. Mol. Evol. 41, 942–951.
Zardoya, R., and Meyer, A. (1997) The complete DNA sequence of the mitochondrial genome of a “living fossil,” the coelacanth (Latimeria chalumnae). Genetics 146, 995–1010.