Copyright © 1994 by the Genetics Society of America

Complete DNA Sequence of the Mitochondrial Genome of the Black Chiton, Katharina tunicata

Jeffrey L. Boore¹ and Wesley M. Brown
Department of Biology and Museum of Zoology, University of Michigan, Ann Arbor, Michigan 48109-1048
Manuscript received March 10, 1994
Accepted for publication July 8, 1994

ABSTRACT

The DNA sequence of the 15,532-basepair (bp) mitochondrial DNA (mtDNA) of the chiton Katharina tunicata has been determined. The 37 genes typical of metazoan mtDNA are present: 13 for protein subunits involved in oxidative phosphorylation, 2 for rRNAs and 22 for tRNAs. The gene arrangement resembles those of arthropods much more than that of another mollusc, the bivalve Mytilus edulis. Most genes abut directly or overlap, and abbreviated stop codons are inferred for four genes. Four junctions between adjacent pairs of protein genes lack intervening tRNA genes; however, at each of these junctions there is a sequence immediately adjacent to the start codon of the downstream gene that is capable of forming a stem-and-loop structure. Analysis of the tRNA gene sequences suggests that the D arm is unpaired in tRNAser(AGN), which is typical of metazoan mtDNAs, and also in tRNAser(UCN), a condition found previously only in nematode mtDNAs. There are two additional sequences in Katharina mtDNA that can be folded into structures resembling tRNAs; whether these are functional genes is unknown. All possible codons except the stop codons TAA and TAG are used in the protein-encoding genes, and Katharina mtDNA appears to use the same variation of the mitochondrial genetic code that is used in Drosophila and Mytilus. Translation initiates at the codons ATG, ATA and GTG. A + T richness appears to have affected codon usage patterns and, perhaps, the amino acid composition of the encoded proteins. A 142-bp non-coding region between tRNAglu and CO3 contains a 72-bp tract of alternating A and T.

¹ Present address: Department of Cell Biology and Neuroanatomy, University of Minnesota, 4135 Church Street SE, Minneapolis, Minnesota 55455.

Genetics 138: 423-443 (October, 1994)

METAZOAN mitochondrial DNA (mtDNA) is a closed-circular molecule, except in some hydrozoan cnidarians where it is one or two linear molecules (WARRIOR and GALL 1985; BRIDGE et al. 1992). Metazoan mtDNAs vary in size from ca. 14-42 kilobases (kb) (MORITZ et al. 1987; WOLSTENHOLME 1992; SNYDER et al. 1987). Usually this variation is due to differences in non-coding regions (BROWN 1985; HARRISON 1989), but occasionally it is due to duplications or multiplications of coding regions (AZEVEDO and HYMAN 1993; FULLER and ZOUROS 1993; MORITZ and BROWN 1986, 1987; ZEVERING et al. 1991).

The gene content of metazoan mtDNA is highly conserved, and typically consists of genes for 2 ribosomal subunit RNAs [small- and large-rRNA (s-rRNA and l-rRNA)], for 22 tRNAs, and for 13 protein subunits [cytochrome c oxidase subunits I-III (CO1-3), cytochrome b apoenzyme (Cytb), ATP synthase subunits 6 and 8 (ATPase6 and ATPase8), and NADH dehydrogenase subunits 1-6 and 4L (ND1-6, 4L)]. In addition, there is at least one sequence of variable length which does not encode any structural genes and which, in vertebrates (BOGENHAGEN et al. 1985; CLAYTON 1991, 1992; FORAN et al. 1988; KING and LOW 1987; MONTOYA et al. 1982) and Drosophila (CLARY and WOLSTENHOLME 1985a), has been shown to include elements for the initiation and control of replication and transcription.

Mitochondrial gene arrangements are usually quite similar within the same phylum. All 37 genes are arranged in the same relative order among most vertebrates (ANDERSON et al. 1981, 1982; ÁRNASON et al. 1991; ÁRNASON and JOHNSSON 1992; BIBB et al. 1981; CHANG et al. 1994; GADALETA et al. 1989; ROE et al. 1985; JOHANSEN et al. 1990; TZENG et al. 1992), although minor rearrangements are found in marsupials (PÄÄBO et al. 1991) and birds (DESJARDINS and MORAIS 1990; DESJARDINS et al. 1990). Likewise, gene arrangements are very similar among echinoderms, with only a single inversion differentiating sea urchins from sea stars (JACOBS et al. 1988; CANTATORE et al. 1987b, 1989; DE GIORGI et al. 1991; HIMENO et al. 1987; SMITH et al. 1989, 1990, 1993). Within arthropods, the mitochondrial gene arrangements of Drosophila (CLARY and WOLSTENHOLME 1985a; DE BRUIJN 1983; GARESSE 1988) and Apis (CROZIER and CROZIER 1993), and partial gene arrangements of others (BATUCAS et al. 1988; DUBIN et al. 1986; HSUCHEN et al. 1984; MCCRACKEN et al. 1987; PASHLEY and KE 1992; UHLENBUSCH et al. 1987; L. DAEHLER, D. STANTON and W. BROWN, unpublished data) differ only by one to a few tRNA transpositions. The mtDNAs of two nematodes, Caenorhabditis elegans and Ascaris suum, have nearly identical gene arrangements (WOLSTENHOLME et al. 1987; OKIMOTO et al. 1992), although that of a third,

Meloidogyne javanica, is radically different (OKIMOTO et al. 1991). Nematodes are also unusual in lacking a mitochondrially encoded ATPase8. In contrast to the general similarity of gene arrangements within a phylum, there are substantial differences among phyla.

Comparison of the relative arrangements of mitochondrial genes may be useful for evaluating phylogenetic relationships among major metazoan groups (see BOORE and BROWN 1994). The gene content of metazoan mtDNAs is nearly unvarying, so a comparable data set is available for all taxa. Mitochondrial genomes appear to undergo rearrangement on a time scale appropriate for resolving ancient phylogenetic relationships. The great number of theoretically possible gene arrangements makes it unlikely that two or more taxa will independently converge on an identical order; thus, identical gene arrangements will generally be shared only by common ancestry. Comparisons of mitochondrial gene arrangements have been useful so far in addressing the evolutionary relationships among echinoderm classes (SMITH et al. 1993). As this data set expands, we anticipate that mitochondrial gene arrangements, as well as many aspects of the molecular biology of mtDNA (i.e., alternative start codons, unusual tRNA and rRNA structures, genetic code variations, and features of mtDNA replication and transcription) will reveal patterns that signal the evolutionary relationships of many major metazoan groups.

Mytilus edulis is the first mollusc whose mitochondrial gene arrangement was determined (HOFFMANN et al. 1992). That arrangement is radically different from any previously reported. Mytilus mtDNA, like that of nematodes, lacks ATPase8, and it has several other atypical features, including a second tRNAmet gene, several lengthy unassigned intergenic sequences, and significant departure in the size of several of its protein-encoding genes from those of other metazoans. It was unclear whether these features of Mytilus mtDNA, which belongs to the class Bivalvia, are typical of molluscs, or of a more phylogenetically restricted group. To evaluate this, to add to the rapidly growing database of mitochondrial gene arrangements, and to detail aspects of the molecular biology of mtDNA, many of which promise to be useful for evolutionary comparisons, we determined the complete mtDNA sequence for the chiton Katharina tunicata, a representative of the molluscan class Polyplacophora.

MATERIALS AND METHODS

Mitochondrial DNA from the chiton K. tunicata was isolated and purified as described and referenced in WRIGHT et al. (1983), but with a twofold increased concentration of ethidium bromide substituted for propidium iodide. A detailed map of the cleavage sites for 12 restriction endonucleases was constructed, using a combination of single and double digests (see Figure 1). DNA fragments resulting from the digests were labeled with [32P]dNTPs (Amersham Corp.) using the Klenow fragment of DNA polymerase (Boehringer Mannheim) according to the protocol in SAMBROOK et al. (1989); after labeling, digests were split and subjected to electrophoresis in 1% agarose and 3.5% polyacrylamide gels, using conditions (BROWN 1980) that allowed detection of fragments as small as 40 base pairs (bp) in the polyacrylamide gels.

Katharina mtDNA was cloned into the λ bacteriophage vector EMBL3 (Stratagene) as a single insert, using the unique BamHI site at position 669-674 (see APPENDIX). Approximately 50 ng of digested mtDNA and 1 μg of EMBL3 arms were mixed with DNA ligase and incubated overnight at 14° (SAMBROOK et al. 1989). The ligation mix was added to a packaging system (Stratagene) and used to transfect a culture of Escherichia coli strain P2-392. After initially screening for insert size, plaque-purified recombinant phage were tested for the presence of the entire mtDNA insert by restriction enzyme fragment comparisons with Katharina mtDNA.

Five insert-containing DNA fragments, of approximately 7.0, 3.9, 3.1, 1.1 and 0.7 kilobases (kb), were produced by combined digestion of the recombinant DNA with BamHI and EcoRI. The fragments were separated by electrophoresis in 0.8% agarose, excised from the gel, and extracted and purified using Geneclean (Bio 101, Inc.) according to supplier's instructions. A 50- to 500-ng sample of each fragment was ligated to an equimolar amount of the plasmid vector pBluescript II KS- (Stratagene) which had been digested with BamHI and EcoRI, or with EcoRI alone. The recombinant plasmids, designated K7.0, K3.9, K3.1, K1.1 and K0.7, respectively, were transformed into the XL1 Blue strain of E. coli, which had been made transformation competent by treatment with polyethylene glycol (CHUNG and MILLER 1988). The cleavage sites of 20 restriction enzymes were mapped in the inserts, and further subcloning yielded a total of 41 additional clones (see Figure 1). White, ampicillin-resistant colonies were selected and the identities of the inserts in the recombinant plasmids were verified by comparing restriction enzyme maps of purified plasmid DNA with those of native mtDNA.

DNA sequences were determined according to procedures adapted from SANGER et al. (1977), using modified T7 DNA polymerase (Sequenase 2.0, U. S. Biochemical Corp.) and [35S]dATP (Amersham). Double-stranded sequencing templates were prepared according to DEL SAL et al. (1988), LISZEWSKI et al. (1989), or the CsCl banding technique of SAMBROOK et al. (1989). Some clones were sequenced from single-stranded templates generated according to SAMBROOK et al. (1989). Products of sequencing reactions were separated by electrophoresis in 4% or 6% polyacrylamide gels; dITP was used, when necessary, to resolve compressions. The sequence of ca. 400 nucleotides (nt) from each end of an insert could usually be determined using the T7 or T3 priming sites in pBluescript II; additional sequence was obtained using oligonucleotide primers of 17 nt, designed on the basis of the sequence obtained (see Figure 1).

Open reading frames (ORFs) were identified with the program IBI MacVector (version 3.5), using the genetic code inferred for Drosophila mtDNA (CLARY and WOLSTENHOLME 1985a). The identity of the ORFs was determined by the similarity of their inferred amino acid sequences to those of the corresponding mitochondrial genes of Drosophila yakuba (CLARY and WOLSTENHOLME 1985a). In the case of ND4L, ND6 and ATPase8, which show only weak amino acid sequence similarity between Katharina and Drosophila, identities were confirmed by comparisons of hydrophilicity profiles (KYTE and DOOLITTLE 1982; see Figure 3). Sequences were identified generically as tRNA genes by their potential to be folded into secondary structures characteristic of mt-tRNAs, and specifically by the triplet sequence in their putative anticodon loop. The rRNA genes were identified by their similarity in sequence

and in potential secondary structure to their D. yakuba counterparts (CLARY and WOLSTENHOLME 1985b).

FIGURE 1.-Mitochondrial DNA of the chiton K. tunicata, showing subcloning and sequencing strategies. The circular mtDNA map has been linearized between CO1 and ND2. Genes are designated by their products: CO1-3, subunits I-III of cytochrome c oxidase; ND1-6 and ND4L, subunits 1-6 and 4L of NADH dehydrogenase; CYTB, cytochrome b apoenzyme; A6 and A8, subunits 6 and 8 of the mitochondrial ATP synthase; s-rRNA and l-rRNA, small and large ribosomal RNAs, respectively; tRNA genes are identified by the single letter amino acid code; F(UUU)? and S(UCU)?, tRNA gene-like sequences (see text and Figure 5); UNK (unknown), the longest region of unassigned DNA (424 bp); A/T, a 72-nt tract of alternating As and Ts. All genes are transcribed from left to right, except those shown under the arrows (they are underlined for emphasis). K0.7, K3.1, K1.1, K3.9 and K7.0 are the five DNA fragments subcloned from the original recombinant EMBL3 bacteriophage; vertical bars show their boundaries. The horizontal bars immediately below the K clones depict further subclones. The symbols < and > at the extreme right and left ends of the lines of subclones indicate that the adjacent subclone spans the arbitrary break made to linearize the circular map. Sequences obtained using the pBluescript T3 or T7 primers are depicted by arrows with downward facing barbs; arrows with upward facing barbs depict those obtained using primers designed from the Katharina sequence. Sites of cleavage by restriction enzymes employed in cloning are indicated by short vertical bars above the scale bar at the bottom. Restriction enzyme abbreviations: A, SacII; B, BamHI; C, ClaI; E, EcoRI; H, HindIII; I, HindII; K, KpnI; N, NheI; P, PstI; R, EcoRV; S, SpeI; T, SacI; X, XbaI. The BamHI site at position 669-674 (see APPENDIX) was used to insert the entire genome into the bacteriophage vector EMBL3. The scale, shown at the bottom, is in nucleotides (marked at 5000, 10000 and 15000).

RESULTS AND DISCUSSION

Genome composition: The size of the cloned Katharina mtDNA is 15,532 bp. As is typical of metazoan mtDNAs, most intergenic regions are small (

18.6% for the strand shown in the APPENDIX) and assuming random distribution. The ratio of 0.64 for the observed to expected frequency of CG is the lowest such ratio among all dinucleotide pairs in this mtDNA, as is typical among metazoan mtDNAs [e.g., D. yakuba = 0.69, human = 0.64, (Strongylocentrotus pur-

FIGURE 2.-Comparison of mitochondrial gene arrangements among K. tunicata, M. edulis (HOFFMANN et al. 1992) and D. yakuba (CLARY and WOLSTENHOLME 1985a). Each mtDNA is shown with the 5' end of CO1 to the left. All Mytilus genes are transcribed from left to right, as are all Katharina and Drosophila genes except those underlined. Lines connecting gene pairs (or blocks of contiguous genes, when marked by a bar) show rearrangements needed to interconvert the (protein + rRNA) gene arrangements; tRNA gene rearrangements, which are numerous, have been ignored. Lines with encircling arrows are inversions; lines without them are transpositions. Gene designations as in FIGURE 1, except that M(AUA) in Mytilus denotes a supernumerary tRNAmet-like gene with the anticodon TAT.

of NADH dehydrogenase (ND1-6, 4L), I-III of cytochrome c oxidase (CO1-3), 6 and 8 of ATP synthase (ATPase6 and 8), and cytochrome b apoenzyme (Cytb)], 2 rRNAs and 22 tRNAs that are found in most metazoan mtDNAs. It is notable that in this it differs from another mollusc, the bivalve M. edulis, whose mtDNA does not contain ATPase8 (HOFFMANN et al. 1992). Unlike other metazoan mtDNAs examined, Katharina mtDNA contains two additional sequences which can be folded into tRNA-like structures, although it is questionable whether either is a functional tRNA gene. Mytilus also has a sequence that may be a supernumerary tRNA gene (tRNAmet(AUA); HOFFMANN et al. 1992), but it is not homologous to either of those found in Katharina (tRNAphe(UUU) and tRNAser(UCU)).

The Katharina gene arrangement is notably similar to those of the arthropods Drosophila (CLARY and WOLSTENHOLME 1985a) and Apis (CROZIER and CROZIER 1993). If one considers only the 15 genes encoding rRNAs and proteins, only two rearrangements are required to interconvert the Katharina and Drosophila gene orders: a transposition of the block CO3-ND3, and an inversion of the block ND6-Cytb (see Figure 2). As noted previously (CLARY and WOLSTENHOLME 1985a; MORITZ et al. 1987), tRNA genes appear to rearrange at a higher frequency: 10 differ in their relative locations between Katharina and Drosophila mtDNAs [those for leu(UUR), lys, asp, gly, ser(AGN), glu, ile, gln, met and trp].

The similarity of the Katharina gene arrangement with those of the arthropods Apis and Drosophila is in marked contrast to its dissimilarity with the arrangement of the bivalve mollusc Mytilus. The only Katharina gene boundaries that are shared by Mytilus are those between tRNAthr and ND4L and among tRNAleu(CUN), tRNAleu(UUR) and ND1. The results of these gene arrangement comparisons, which run counter to all expectations based on presently accepted phylogenetic affinities among these taxa, are discussed further, in conjunction with other comparisons, in the section on phylogenetic implications.

FIGURE 3.-Hydrophilicity profiles of the mitochondrially encoded ND4L, ND6 and ATPase8 proteins of K. tunicata (above) compared with those of D. yakuba (below). Hydrophilicity was computed by the method of KYTE and DOOLITTLE (1982), with a search window setting of seven. The y-axis represents the hydrophilicity scale ranging from -5.0 to +5.0; x-axis numbers designate amino acid positions in each protein.
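A sliding-window profile of the kind shown in Figure 3 can be sketched as follows. This is an illustration, not the MacVector implementation: the function name `hydropathy_profile` and the test peptide are hypothetical, the table holds the Kyte and Doolittle (1982) hydropathy values (positive = hydrophobic), and whether the published plots use this sign or its inverse is an assumption left open here.

```python
# Kyte and Doolittle (1982) hydropathy values; positive = hydrophobic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=7):
    """Mean Kyte-Doolittle value for every full window along the protein."""
    values = [KD[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical 10-residue peptide, used only to show the call.
profile = hydropathy_profile("MLLLLLLKKK", window=7)
print([round(x, 2) for x in profile])
```

Two proteins with low sequence identity can then be compared by plotting their profiles on a common axis, which is the comparison the figure makes for ND4L, ND6 and ATPase8.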

Thirty-one Katharina mtDNA genes either abut directly or have some noncoding sequence separating them. In seven cases, however, there are overlaps between adjacent genes. In two, the genes are encoded on opposite strands (tRNAser(UCN) overlaps tRNAthr by 1 nt, and tRNApro overlaps ND1 by 8 nt), which presents no problems for the resolution of their respective transcripts. In five, however, the overlapping genes are encoded by the same strand. The gene pairs tRNAleu(CUN)-tRNAleu(UUR) and tRNAtyr-tRNAcys each overlap by 1 nt. Their transcripts could be resolved if the processing point alternated by 1 nt, if the transcripts were lengthened by enzymatic editing, or if the unpaired nucleotide at the 3' end of the upstream tRNA were unnecessary to the downstream tRNA. However, it is more difficult to explain how the transcripts for the two gene pairs with 2-nt overlaps (tRNAgln-tRNAgly and tRNAasn-tRNAile) might be resolved. An alternating point of transcript processing, transcript editing, or relaxation of tRNA structure could again be invoked. An argument against the latter is that all Katharina tRNA genes, including these four, are invariant in having the potential for standard, seven-member amino-acyl stems. Finally, tRNAIYY' and tRNA"'" overlap on the same DNA strand by 4 nt; for these, alternation of transcript processing or transcript editing are the only explanations we can propose.

In mammalian mitochondria, the tRNA portions of the polycistronic transcript have been suggested to serve as processing signals for its cleavage into gene-specific RNAs (BATTEY and CLAYTON 1980; OJALA et al. 1980, 1981). In cases where protein-encoding genes abut directly, sequences adjacent to the gene junctions have the potential to form stem-loop structures which, when transcribed, may serve as alternative signals for RNA processing enzymes. Either tRNA genes or sequences with the potential to form stem-loop structures have been found at nearly all mitochondrial gene boundaries in mammals (Mus domesticus, Homo sapiens and Bos taurus; BIBB et al. 1981), in the insects D. yakuba (CLARY and WOLSTENHOLME 1985a) and D. melanogaster (DE BRUIJN 1983), in the sea urchin Paracentrotus lividus (CANTATORE et al. 1987b, 1989), and in the nematodes C. elegans and Ascaris suum (OKIMOTO et al. 1992). In Katharina mtDNA there are four pairs of protein-encoding genes that abut without an intervening tRNA: ND2-CO1, ATPase8-ATPase6, ND4L-ND4 and ND6-Cytb. At each of these four gene junctions there is a sequence with the potential for forming a stem-loop structure, and in each case the start codon of the downstream gene is positioned identically relative to that structure (see Figure 4). Although sequences with the potential to form other stem-loop structures also occur near the gene boundaries, there is no commonality to their positions relative to the gene termini, and none of them is predicted to be more stable than the structures in Figure 4. Although additional structuring (not shown) is possible within each of the loops, there is no apparent similarity in this respect among the several loops. The variability in loop size and, in the cases of the ND4L-ND4 and ND6-Cytb junctions, the short stems argue against the hypothesis that these structures are significant in vivo. If one assumes a random sequence model, it is likely that a reverse complement to the 5 nt preceding ND4 (TTATT) would occur by chance alone within the 50 nt upstream of ND4 (P = 0.78), and somewhat likely for the 5 nt upstream of Cytb (CAAAT) (P = 0.1). However, the likelihood of chance matches as explanations for the potential secondary structures between ND2-CO1 and ATPase8-ATPase6 is much lower (P = 0.006). Thus, the occurrence of these structures is suggestive of function. However, whether the structures actually form in vivo and function in transcript processing is purely speculative, since no experimental data exist.

Translation initiation and termination: An ATG codon is at the inferred initiation site in 11 of the 13 protein-encoding genes (see Table 2). Translation of the other two appears to initiate at ATA (ND4) and GTG (ND2), both of which have been invoked as alternative start codons in other mitochondrial systems (MONTOYA et al. 1981; ANDERSON et al. 1981, 1982; BIBB et al. 1981; CLARY and WOLSTENHOLME 1985a; GADALETA et al. 1988, 1989; JACOBS et al. 1988; GAREY and WOLSTENHOLME 1989; DESJARDINS and MORAIS 1990; JOHANSEN et al. 1990; HOFFMANN et al. 1992) and in some prokaryotes (e.g., STORMO et al. 1982).

FIGURE 4.-Potential RNA secondary structures at the four junctions between directly abutting protein-encoding genes (ND2/CO1, ATPase8/ATPase6, ND4L/ND4 and ND6/Cytb). The sequence of the noncoding DNA strand is shown, and GT pairing is allowed. A loop is depicted above each stem, with the number of nt in each loop encircled. In each structure, the start codon (boxed) of the downstream gene is at an identical relative location.
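The pairing rule stated in the Figure 4 legend (Watson-Crick pairs plus G-T, written in DNA notation) can be expressed as a short check of whether two arms could form a stem. This is a sketch only: the function name `can_form_stem` and the example arms are hypothetical and are not taken from the Katharina sequence, and no thermodynamic stability is estimated.

```python
# Watson-Crick pairs plus the G-T pair that the Figure 4 legend allows (DNA notation).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}

def can_form_stem(five_prime_arm, three_prime_arm):
    """True if the two arms pair along their whole length when folded back on each other."""
    if len(five_prime_arm) != len(three_prime_arm):
        return False
    # Folding pairs the first base of the 5' arm with the last base of the 3' arm.
    return all(
        (a, b) in PAIRS
        for a, b in zip(five_prime_arm, reversed(three_prime_arm))
    )

# Hypothetical arms, not taken from the Katharina sequence.
print(can_form_stem("GGAT", "ATTC"))  # pairs formed: G-C, G-T, A-T, T-A
print(can_form_stem("GGAT", "AAAA"))
```

Allowing G-T raises the number of sequences that can pair with a given arm, which is one reason chance matches of short stems in an A + T-rich genome need to be weighed carefully.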

Translation of ND4 appears to initiate with the ATA at positions 6928-30 (see Appendix). There is an in-frame TAA (stop) codon 21 nt upstream of this ATA that is immediately followed by GTG, the only other possible (i.e., previously described) start codon before the one inferred. The six additional amino acids that would result if initiation were at this GTG are not similar to those at the amino terminus of ND4 in Drosophila. There are no ATG codons in the first 200 nt downstream of the inferred start site; however, there are 9 additional ATA codons in this region. Although we cannot determine the actual start site from this DNA sequence data with certainty, we note that initiation at the ATA we have designated would result in a protein of length and amino acid sequence similar to that inferred for ND4 in Drosophila (CLARY and WOLSTENHOLME 1985a).

The other exception to translation initiation at an ATG codon is ND2, which we infer to initiate at GTG (positions 14514-16; see Appendix). Initiation at this GTG would result in a protein of similar length to that inferred for Drosophila, and would leave only 1 nt separating ND2 and tRNAser(AGN). There is a stop codon in frame with ND2 that is 13 codons upstream from this GTG, and there are no ATN codons in frame with ND2 in the region between this stop codon and the proposed GTG start codon. The first ATN is 10 codons downstream from the predicted ND2 start site, and there are no ATG, ATA or additional GTG codons in the first 100 nt of ND2. It is unclear whether an initiating GTG would be translated as valine, as GTG apparently is at internal positions, or as methionine. In this regard, we note that the CAU anticodon of tRNAmet is an acceptable reverse complement to the GTG codon.

The ORFs inferred for CO2, ND2, ND4L and ATPase8 terminate with the stop codon TAG, and those for CO3, ND1, ND3, Cytb and ATPase6 with TAA. The four remaining protein-encoding genes (ND4, ND5, ND6 and CO1) are inferred to terminate with an incomplete stop codon that contains only the T of the first codon position. The transcripts of these four genes are presumably modified to form a complete TAA (= UAA) stop codon by polyadenylation of the transcript after cleavage of the polycistronic RNA, a common feature among metazoan mitochondrial systems (OJALA et al. 1980, 1981; ANDERSON et al. 1981).

Alternatively, in Katharina each of these four genes could terminate with a complete stop codon that overlaps the adjacent downstream gene by a few nt. The T inferred as the first nucleotide of the incomplete stop codon of ND4 directly abuts the 5' end of tRNAhis, but an in-frame stop codon (TAG) is present 5 nt further downstream. The terminal T inferred for ND5 is followed by AA, which would complete the stop codon; however, these two As are also inferred to be the first 2 nt of tRNAphe. Likewise, the T of the incomplete stop codon that we infer for ND6 directly abuts Cytb; if this inference is incorrect and ND6 terminates instead at the next complete in-frame stop codon, it must overlap Cytb by 8 nt.

The most ambiguous terminus is that for CO1. This gene is inferred to have an incomplete stop codon which results in an inferred protein that ends two amino acids beyond its alignment with the CO1 protein of Drosophila. Although there is a legitimate stop codon after only two more amino acids, the abbreviated stop has been chosen because it accommodates a normal length gene for tRNAphe(UUU). However, uncertainty about whether

this supernumerary tRNA gene is actually transcribed into a functional tRNA (see below) makes this criterion less compelling.

In each of these four cases, the presence of an in-frame stop codon just inside the downstream gene may be a coincidence and without functional implications, or possibly a mechanism to prevent extensive translational readthrough of unprocessed messages. Alternatively, the upstream reading frames might extend to the complete stop codons, with transcript processing alternating between the overlapping genes, possibly in a regulated manner. (This, barring transcript editing, is what is predicted for the overlapping tRNA genes, as discussed above.) In the case of ND6-Cytb, there is the additional possibility that the common transcript might remain unprocessed, and that Cytb translation might initiate at an internal ATG codon in the bicistronic transcript. Overlapping genes have been noted in several mtDNAs (ANDERSON et al. 1981, 1982; BIBB et al. 1981; GADALETA et al. 1989; JACOBS et al. 1988; CLARY and WOLSTENHOLME 1985a; DESJARDINS and MORAIS 1990; JOHANSEN et al. 1990; ROE et al. 1985), and in human cells, ATPase6 and ATPase8 are known to be translated from a single mRNA (OJALA et al. 1981).

Protein-encoding genes: Katharina mtDNA contains genes for the same 13 protein subunits that are typically encoded in metazoan mtDNAs. These genes comprise 11,148 nt (71.8%) of the Katharina mitochondrial genome. Most gene identities were easily established by comparison of their (translated) amino acid sequences with those published for the mitochondrially encoded proteins of other metazoans. The most difficult genes to identify, because of their small size and relative lack of sequence similarity, were ND4L, ND6 and ATPase8. In order to verify their identities, we compared the hydrophilicity profiles of each of their inferred proteins with those of D. yakuba (CLARY and WOLSTENHOLME 1985a). The results of this analysis appear in Figure 3. Although amino acid sequence identities with ND4L, ND6 and ATPase8 of Drosophila were low (36.7, 33.5 and 30.2%, respectively), each of the proteins has a very similar hydrophilicity profile to its Drosophila homolog, suggesting that the amino acid replacements are patterned to conserve this feature of the proteins.

A base-compositional bias for A + T appears to affect both codon usage pattern and amino acid composition of proteins (Table 1). The 13 protein-encoding genes include 1657 codons from fourfold degenerate codon families [codons that specify the same amino acid with any nucleotide in the third ("wobble") position]. Of these, 1252 (75.6%) end in A or T. Five of the six most commonly used codons are composed of only A and T: TTT (8.1%); TTA (8.7%); ATT (6.4%); ATA (4.0%); and AAT (3.0%). The sixth, TCT (3.1%), codes for ser, which cannot be specified by any codon composed of only A and T. Two other translatable codons are composed of only A and T, TAT (2.2%) which codes for tyr, and AAA (2.1%) which codes for lys. Among the proteins, the three most frequent amino acids are leu (16.1%), ser (10.3%) and phe (9.5%); two of these (leu and phe) can be specified by codons composed of only A and T (TTA and TTT, respectively). While alternative explanations are possible, it appears likely that the A + T richness of the Katharina mtDNA is the determining factor in its codon usage pattern (see D'ONOFRIO et al. 1991; COLLINS and JUKES 1993).

The inferred sizes of the Katharina proteins are similar to their homologs in Drosophila (CLARY and

TABLE 1

Codon usage in the 13 protein coding genes of the K. tunicata mitochondrial genome

Codon    N    %    Amino acid  Codon    N    %    Codon    N    %    Codon    N    %
TTT    300  8.1    Ser         TCT    114  3.1    TAT     81  2.2    TGT     26  0.7
TTC     53  1.4    (UGA)       TCC     24  0.6    TAC     42  1.1    TGC     15  0.4
TTA    322  8.7                TCA     52  1.4    TAA      5  0.1    TGA     73  2.0
TTG     68  1.8                TCG     10  0.3    TAG      4  0.1    TGG     37  1.0
CTT     79  2.1    Pro         CCT     78  2.1    CAT     60  1.6    CGT     11  0.3
CTC     32  0.9    (UGG)       CCC     15  0.4    CAC     19  0.5    CGC      7  0.2
CTA     90  2.4                CCA     34  0.9    CAA     51  1.4    CGA     26  0.7
CTG      6  0.2                CCG      3  0.1    CAG     14  0.4    CGG     13  0.3
ATT    236  6.4    Thr         ACT     80  2.2    AAT    110  3.0    AGT     39  1.0
ATC     58  1.6    (UGU)       ACC     19  0.5    AAC     51  1.4    AGC     24  0.6
ATA    149  4.0                ACA     72  1.9    AAA     78  2.1    AGA     95  2.6
ATG     40  1.1                ACG      8  0.2    AAG     18  0.5    AGG     26  0.7
GTT    107  2.9    Ala         GCT     91  2.4    GAT     51  1.4    GGT     48  1.3
GTC     16  0.4    (UGC)       GCC     31  0.8    GAC     18  0.5    GGC     32  0.9
GTA     93  2.5                GCA     70  1.9    GAA     60  1.6    GGA     73  2.0
GTG     34  0.9                GCG     19  0.5    GAG     22  0.6    GGG     86  2.3

N = the number of occurrences of each codon; % = the percentage of the total codon usage this comprises. The total number of codons used is 3718. Abbreviated stop codons were excluded from the analysis. The anticodon of the corresponding tRNA is shown in parentheses below each amino acid designation. The Katharina mt-tRNA anticodons are identical to those of M. edulis (HOFFMANN et al. 1992). The putative tRNAmet(AUA) gene in Mytilus mtDNA has been excluded from consideration.

TABLE 2

Comparisons among the mitochondrial protein coding genes of K. tunicata, D. yakuba and M. edulis

           Number of amino acids                   Percent amino acid identity    Predicted initiation and termination codons
Protein    Katharina  Drosophila  Mytilus          Katharina/    Katharina/       Katharina   Drosophila   Mytilus
                                                   Drosophila    Mytilus

ATPase6    230        224         238              37.8          32.7             ATG TAA     ATG TA^a     ATG TAG
ATPase8    53         53          Not present      30.2          Not present      ATG TAG     ATT TAA      Not present
CO1        513        512         Partial (129)    76.7          39.5             ATG T^a     ATAA TAA     -* TAA
CO2        229        228         Partial (193)    62.6          37.3             ATG TAG     ATG T^a      ATG TAG
CO3        264        259         262              64.1          46.6             ATG TAA     ATG TAA      ATG TAA
Cytb       379        378         Partial (272)    68.7          47.4             ATG TAA     ATG TAA      ATG TAA
ND1        316        324         Partial (221)    54.4          46.2             ATG TAA     ATA TAA      -* TA^a
ND2        338        341         Partial (217)    38.9          37.3             GTG TAG     ATT T^a      ATG TAG
ND3        120        117         116              48.1          44.8             ATG TAA     ATT TAA      ATG TAA
ND4        442        446         Partial (287)    47.3          35.5             ATA T^a     ATG T^a      ATG TAA
ND4L       100        96          93               36.7          39.8             ATG TAG     ATG TA^a     ATG TAA
ND5        571        573         Partial (522)    44.1          41.2             ATG T^a     ATT T^a      ATA TA^a
ND6        166        174         158              33.5          22.8             ATG T^a     ATT TAA      ATG TAA

Inferred sizes, degree of amino acid identity and predicted initiation and termination codons were compared for the mitochondrially encoded protein genes of K. tunicata, D. yakuba (CLARY and WOLSTENHOLME 1985a) and M. edulis (HOFFMANN et al. 1992). Amino acid sequences were aligned pairwise using the PAM250 matrix (IBI MacVector; version 3.5). Percentage identity was calculated by dividing the number of identical amino acid positions by the average length of the sequences compared. Numbers in parentheses are the number of inferred amino acids compared from partially sequenced genes of Mytilus. No ATPase8 is present in Mytilus mtDNA.
^a Incomplete termination codons presumably completed by polyadenylation (see text).
* The CO1 and ND1 initiation codons of Mytilus could not be identified with confidence.
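The identity measure described in the note to Table 2 (identical positions divided by the average length of the two sequences) can be sketched as follows. This is a minimal illustration, not the MacVector procedure: the function name, the use of "-" as the gap character, and the toy sequences are assumptions made here for the example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pairwise-aligned sequences.

    Identical (non-gap) positions are divided by the average ungapped
    length of the two sequences, as stated in the note to Table 2.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same residue.
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    # Ungapped lengths of each sequence.
    len_a = sum(1 for x in aligned_a if x != gap)
    len_b = sum(1 for y in aligned_b if y != gap)
    average_length = (len_a + len_b) / 2
    return 100.0 * identical / average_length


# Toy alignment (hypothetical residues, for illustration only):
# 4 identical columns, both sequences 6 residues long.
print(round(percent_identity("MKL-TAV", "MRLSTA-"), 1))  # 66.7
```

Dividing by the average length rather than by the alignment length means that gapped columns lower the score only through the length of the longer sequence.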

WOLSTENHOLME 1985a) and to the five proteins whose genes have been completely sequenced in Mytilus (HOFFMANN et al. 1992; see Table 2). ND1 differs significantly in size between Drosophila and Mytilus, and this may also be true of CO1 and CO3, as inferred from differences in their initiation and termination points (HOFFMANN et al. 1992); in these respects, the three Katharina genes are more similar to Drosophila than to Mytilus.

The percent amino acid sequence identities of Katharina, Drosophila and Mytilus proteins are compared in Table 2. Complete sequences are available for five genes in all three species; for four of them, Katharina is more similar to Drosophila than to Mytilus. A similar result is obtained in comparisons of all of the remaining genes, which are only partially sequenced for Mytilus. It may be argued that these comparisons are biased because the Mytilus sequences are from gene termini, which are often especially variable (e.g., see the protein alignments in JACOBS et al. 1988). In order to test this, Mytilus and Katharina CO3 amino acid sequences were aligned, using the PAM 250 matrix of the IBI MacVector program, and analyzed for patterns of amino acid identity. [CO3 was chosen because it has the least ambiguous alignment of the five genes completely sequenced in Mytilus.] Fifty-seven (43%) of the 132 aligned positions in the central region are identical, whereas 65 (49%) of the 132 aligned positions nearest the ends (i.e., the 66 aligned positions nearest each end) are identical, suggesting that comparisons of the terminal amino acid sequences of Mytilus proteins may be representative of the overall degree of variation.

Genetic code: Similarities in codon usage patterns, tRNA anticodons and amino acid sequence alignments suggest that the mitochondrial genetic code of Katharina is identical to those of Drosophila and Mytilus. All codons occur within the protein genes of Katharina mtDNA except the stop codons TAA and TAG. In particular, three codons (TGA, AGG and AGA) that appear to function as stop codons in the mitochondrial systems of some metazoans occur frequently in Katharina protein genes.

AGA and AGG, like AGY codons, specify serine: AGA and AGG code for arg in the "universal" genetic code and in the mitochondrial codes of protozoa (PRITCHARD et al. 1990; ZIAIE and SUYAMA 1987), fungi (BONITZ et al. 1980) and plants (see JUKES and OSAWA 1990; OHYAMA et al. 1991), for gly in ascidian mtDNA (YOKOBORI et al. 1993), and are designated as stop codons in vertebrate mtDNA (ANDERSON et al. 1981, 1982; BIBB et al. 1981; GADALETA et al. 1989; ROE et al. 1985; JOHANSEN et al. 1990). In mitochondria, tRNAser(AGY) appears also to recognize AGA and AGG in flatworms (Fasciola; GAREY and WOLSTENHOLME 1989), echinoderms (JACOBS et al. 1988; CANTATORE et al. 1989; DE GIORGI et al. 1991; HIMENO et al. 1987; SMITH et al. 1989, 1990), Mytilus (HOFFMANN et al. 1992) and nematodes (OKIMOTO et al. 1992). AGA also codes for ser in Drosophila mtDNA, but AGG codons are not used (CLARY and WOLSTENHOLME 1985a). Of the 95 in-frame AGA codons in Katharina mtDNA, 34 correspond in position to ser in Drosophila proteins (11 to AGA, 8 to AGT, 15 to TCN), but none correspond to arg. AGA codons in Katharina mtDNA correspond to asp positions in Drosophila proteins 11 times and 6 times each to gly, met and thr. Of the 26 in-frame AGG codons in Katharina mtDNA, 6 correspond in position to ser in Drosophila proteins (2 to AGA, 3 to AGT, 1 to TCN), but none correspond to arg. Katharina AGG codons correspond in position to thr in Drosophila proteins five times, and to no other amino acid more than three times.

Although AGG codons in Katharina correspond almost equally to thr and ser positions in Drosophila proteins, we do not regard this as significant. Katharina mtDNA encodes a tRNA with the anticodon UGU, typical for the tRNAthr which decodes the ACN codon family. This tRNA would not be expected to recognize the codon AGG, nor does it seem likely that any other Katharina tRNA would be able to discriminate AGG codons from others in the AGN family. It seems most likely, therefore, that AGG specifies ser in Katharina mtDNA, and that the nearly identical number of matches to ser and thr positions in Drosophila proteins is due to (1) the similar chemical nature of ser and thr, (2) the lack of AGG codons in Drosophila, and (3) the potential for one-step interconversion of thr and ser codons (thr = ACN; ser = AGN).

The anticodon of tRNAser(AGN) is GCU. Since this is the only tRNA with an NCU anticodon and since AGR codons occur frequently in Katharina mitochondrial protein genes, this tRNA presumably recognizes all AGN codons as ser. It is not obvious how the G in this anticodon would pair with A or G to recognize both AGA and AGG codons, although this also appears to be the case in several other metazoan mitochondrial systems. This may be attributable to posttranscriptional modification of nucleotides, which is a common feature of nuclear-encoded tRNAs and has been demonstrated in a mitochondrial tRNA from a marsupial (JANKE and PÄÄBO 1993).

TGA, like TGG codons, specify tryptophan: TGA is a stop codon in the "universal" genetic code and in the mitochondrial code of plants, but specifies trp in the mitochondria of fungi (CUMMINGS and DOMENICO 1988), protozoa (PRITCHARD et al. 1990; ZIAIE and SUYAMA 1987) and all metazoans so far reported (see JUKES and OSAWA 1990). TGA codons occur 73 times in the mitochondrial genes of Katharina; 51 of these correspond in position to trp in Drosophila proteins. Two tRNAs are identified with NCA anticodons, suggesting that the TGN codon family is twofold degenerate: tRNAcys, with anticodon GCA, would be expected to recognize TGT and TGC; tRNAtrp, with anticodon UCA, would be expected to recognize TGA and TGG.

ATA codons specify methionine rather than isoleucine: ATA codes for ile in the "universal" genetic code, but for met in the mitochondria of many organisms. Of 149 ATA codon occurrences in Katharina mtDNA, 53 correspond in position to met in Drosophila proteins and only 14 correspond to ile, suggesting that ATA codes for met in Katharina mtDNA. ATA codons in Katharina mtDNA also correspond to leu positions in Drosophila proteins in 36 of the 149 occurrences. It seems unlikely, however, that this is significant. Leu is the most abundant amino acid in Katharina mitochondrial proteins (16.1% of all amino acids). Only 1 nt change is required to convert the leu codons CTA and TTA to ATA, and both CTA and TTA are present in abundance (2.4% and 8.7% of all codons, respectively). Further, no tRNA with an anticodon that could discriminate ATA from ATG, ATC and ATT has been identified. The anticodon of tRNAile is GAU, so the G in the wobble position is expected to discriminate the ATT and ATC codons. The anticodon of tRNAmet is CAU, so C in the wobble position presumably pairs with either A or G of ATA and ATG codons. While it is not obvious how the CA pairing is achieved, this appears to be the general case for tRNAmet in mitochondrial systems.

Code evidence from tRNA anticodons: Identity of the mitochondrial genetic code in Drosophila and Katharina is also indicated by comparisons of their cognate tRNA anticodons, which are identical in all but one case. The exceptional tRNA in Katharina has the anticodon UUU, and presumably recognizes the lys codons AAA and AAG; the U in the wobble position could do this efficiently. The cognate tRNA anticodon in Drosophila is CUU (CLARY and WOLSTENHOLME 1985a), but in Mytilus it is also UUU (HOFFMANN et al. 1992).

Other than in tRNAser(AGN) (see above), G only occurs in the wobble position of tRNAs that specifically recognize NNY codons (those for asp, asn, cys, his, ile, phe and tyr). C in the wobble position of tRNAmet must pair with both the A in ATA and the G in ATG. U is in the wobble position of all other tRNAs; these either do not discriminate among nucleotides in the third codon position (fourfold degenerate codon families) or specifically recognize NNR codons (glu, gln, leu(UUR), lys and trp).

Novel tRNA genes do not suggest a code change: Based on their folding potential, two sequences in Katharina mtDNA may encode two additional tRNAs, with AAA and AGA anticodons (see below and Figure 5). It is not known whether either of these sequences is transcribed or produces a functional tRNA, but sequence considerations (discussed below) argue against the latter. If they encoded functional tRNAs, their products would be expected to recognize the codons TTT and TCT and to specify phe and ser, respectively. However, it is theoretically possible that these codons would be uniquely recognized by the two tRNAs and might specify different amino acids. To test this unlikely hypothesis, we compared the correspondence of TTT and TCT codons in Katharina mtDNA to Drosophila amino acid positions. Out of 300 Katharina TTT codons, 153 corresponded to phe, 38 to leu, 25 to ile, 21 to tyr, and no more than 12 to any other amino acid. Out of 114 Katharina TCT codons, 60 corresponded to ser (30 to TCT codons and 30 to one of the other 7 ser-specifying codons), and no more than 7 to any other amino acid. Therefore, it appears likely that TTT and TCT specify phe and ser, respectively.
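The codon assignments inferred above (AGA and AGG as ser, TGA as trp, ATA as met, with TAA and TAG remaining as stops) can be expressed as a small translation table. The sketch below is illustrative only: it assumes that these four reassignments are the only departures from the "universal" code, and the names UNIVERSAL, KATHARINA_MT and translate, as well as the toy sequence, are hypothetical and do not come from the paper.

```python
from itertools import product

BASES = "TCAG"
# "Universal" code, one letter per codon, ordered TTT, TTC, TTA, TTG, TCT, ...
UNIVERSAL_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
UNIVERSAL = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), UNIVERSAL_AA)
}

# Assumed Katharina mitochondrial code: universal code plus the four
# reassignments discussed in the text.
KATHARINA_MT = dict(UNIVERSAL)
KATHARINA_MT.update({"AGA": "S", "AGG": "S", "TGA": "W", "ATA": "M"})


def translate(dna: str, code: dict) -> str:
    """Translate an in-frame DNA sequence; a trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(code[dna[i:i + 3]] for i in range(0, usable, 3))


toy = "ATAAGATGAAGGTTT"  # ATA AGA TGA AGG TTT (made-up sequence)
print(translate(toy, UNIVERSAL))     # IR*RF
print(translate(toy, KATHARINA_MT))  # MSWSF
```

Under this assumed table only two stop codons remain, which agrees with the observation that every codon except TAA and TAG occurs within the Katharina protein genes.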

[Figure 5: cloverleaf drawings of the inferred tRNA secondary structures, including those for alanine, arginine, asparagine, aspartate, cysteine, glutamine, histidine, isoleucine, methionine, proline, serine (AGN, with an alternative folding), serine (UCN), threonine, tyrosine and valine.]

FIGURE 5. Twenty-two Katharina mtDNA sequence elements folded into secondary structures resembling mitochondrial tRNAs. DNA sequences of the non-coding strand are shown. Bars connecting nucleotide pairs indicate base pairing, with GT pairs allowed. Dashed bars indicate potential base pairing within loops. Heavy lines indicate short complementary sequences within the large loops of serine tRNA. tRNAser(AGN) is shown in both the form that has been hypothesized in other animal mtDNA studies and in an alternative form with a folded D arm. The folded version is shifted 2-3 nucleotides relative to the version lacking a D arm. Two additional tRNA-like structures are shown in the box at the lower right (see text).

Transfer RNAs: Twenty-four sequences that could be folded into tRNA-like structures with acceptable anticodons were identified. Of these, 22 corresponded to the standard set of tRNA genes found in other metazoan mtDNAs; these are discussed in the first part of this section. The two supernumerary sequences, which have no counterparts in other reported metazoan mtDNAs, are discussed in the latter part of this section.

The standard tRNA gene set: The corresponding mitochondrial tRNA genes of Katharina and Mytilus have identical anticodons, and in Drosophila only the tRNAlys anticodon differs from these. The Katharina tRNAlys anticodon is UUU, as in vertebrate (see references above), Apis (CROZIER and CROZIER 1993) and nematode mtDNAs (OKIMOTO et al. 1992), whereas it is CUU in Drosophila, mosquito (HSUCHEN et al. 1983) and Fasciola mtDNAs (GAREY and WOLSTENHOLME 1989). The codons AAA and AAG specify lys, so a U, as occurs in Katharina, would be expected in the wobble position of the anticodon. The C found at this position in some organisms is unexpected and must, under suitable circumstances, be able to pair with both A and G (this is also true for the CAU anticodon of tRNAmet, which recognizes both ATA and ATG) and to discriminate against the codons AAT and AAC, which specify asn and whose cognate tRNA has the anticodon GUU.

As shown in Figure 5, all Katharina mt-tRNAs have a seven-member amino-acyl stem, a five-member anticodon stem, and a seven-member anticodon loop. The variable arm sizes range from 3 to 6 nt, with 4 nt being most common. The D arm (shown to the left in each tRNA in Figure 5) has a paired stem that varies from 3 to 5 nt, without mismatch, and an unpaired loop of from 3 to 8 nt. The opposite (TΨC) arm has a paired stem that varies, except in tRNAglu, from 3 to 6 nt, without mismatch, and an unpaired loop of from 3 to 8 nt. The TΨC stem of tRNAglu is only 2 nt; in all other respects the structure is conventional, but this short stem, coupled with the failure of this gene to directly abut a neighboring gene at either end, led us to search for an alternative sequence. All TTC trinucleotides in intergenic regions of either strand were investigated as possible tRNAglu anticodons, but none with flanking sequences that could be folded into a suitable tRNA structure were found.

Five tRNA genes have one mismatch each in the amino-acyl stem; in four, the mismatch is between the bases of the 6th nt pair. Six genes have mismatches in the anticodon stem: tRNAarg has two mismatches, between the first 2 nt pairs at the top of the stem; tRNAmet also has two, but between nt pairs 1 and 3; of the remaining four, three have a mismatch between the 1st nt pair, and two between the 2nd nt pair.

All 22 tRNA genes have a T immediately preceding the anticodon; immediately following it, 17 have an A and 5 have a G. A single purine nucleotide separates the anticodon stem from the D arm in 19 of the tRNA genes.

The TΨC arm derives its name from the nearly universal presence, in nuclear-encoded tRNA genes, of TTC at the first three positions in its loop (SPRINZL et al. 1991); in the transcript, the first U is methylated to ribothymidine (T) and the second is rearranged to pseudouridine (Ψ). A TTC motif occurs at this position in only two Katharina mt-tRNAs, tRNAser(AGN) and tRNAser(UCN), wherein it is followed by the dinucleotide RA (R = A or G), as is usual for nuclear tRNAs. The dinucleotide separating the amino-acyl stem from the D arm is most commonly TA (conventionally referred to as T8 and A9), but is TG in three of the tRNAs. In nuclear-encoded tRNAs, T8 usually pairs with A14 to generate part of an L-shaped tertiary structure. There is potential base pairing between positions 8 and 14 in 20 of the Katharina mt-tRNAs: T8-A14 in 17, A8-T14 in two, and C8-G14 in one. T8-A14 base pairing is also possible in both of the supernumerary tRNA-like structures (Figure 5).

Except for Mytilus, all metazoan mtDNAs that have been sequenced encode only one tRNAmet, which presumably functions in both initiation and elongation. DUBIN and HSUCHEN (1984) analyzed the mt-tRNAmet of Aedes (mosquito) in detail, and found that it has both the U preceding the anticodon, as is typical of conventional elongator tRNAs, and the three GC pairs in the anticodon stem, as is typical of initiator tRNAs, consistent with a dual role. Except for a GA mismatch in the anticodon stem, mt-tRNAmet of Katharina shares these features, along with the two T (= U) nucleotides in the anticodon stem that have been shown to be posttranscriptionally modified to pseudouridine in the Aedes mt-tRNAmet and a nearly perfect match to all nt in the D arms of Aedes and Drosophila mt-tRNAmet (DUBIN and HSUCHEN 1984; CLARY and WOLSTENHOLME 1985a).

In all metazoan mtDNAs examined, the D arm of the tRNAser(AGN) is unpaired (see GAREY and WOLSTENHOLME 1989). In Katharina mtDNA, the tRNAser(AGN) gene can be folded into a structure resembling a conventional tRNA, complete with a paired D arm. Alternatively, by slightly altering the inferred size and position of this gene (adding 3 nt at the 3' end and omitting 2 nt at the 5' end) a tRNA-like structure without a paired D arm, similar to the tRNAser(AGN) of Drosophila, can be formed. Both structures are presented in Figure 5. Favoring the structure lacking a paired D arm is that it most closely matches those proposed for mt-tRNAser(AGN) in other invertebrates; this similarity extends to the number of unpaired nucleotides. In this, but not in the structure with the paired D arm, there is the potential for interaction of T8 and A14 and, perhaps more significantly, the TΨC loop exactly matches the sequence of this loop in the tRNAser(AGN) of another mollusc, the snail Plicopurpura (T. COLLINS and W. BROWN, unpublished data), in which an alternative pairing of the D arm is not possible. As an additional ambiguity, there is potential for up to 4 additional base pairs in the anticodon stem of tRNAser(AGN).
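The codon-recognition pattern that the text infers from the anticodons can be summarized in a short sketch. It is a simplification under stated assumptions, not a model of wobble chemistry: anticodons are written 5' to 3' in RNA letters, codons are returned in DNA letters, the caller must say whether the tRNA serves a fourfold degenerate family, and the names COMPLEMENT, WOBBLE_TWOFOLD and codons_read are hypothetical.

```python
# Watson-Crick partner of each anticodon (RNA) base, given as a codon (DNA) base.
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

# Third-codon-position bases read by each wobble base in twofold families,
# following the pattern described in the text: G reads NNY, U reads NNR,
# and C (as in the CAU and CUU anticodons) is inferred to read A and G.
WOBBLE_TWOFOLD = {"G": "TC", "U": "AG", "C": "AG"}


def codons_read(anticodon: str, fourfold: bool = False) -> list[str]:
    """Return the codons an anticodon is expected to read.

    The first anticodon base (wobble position) pairs with the third codon
    base; the last anticodon base pairs with the first codon base.
    With fourfold=True the tRNA is taken to read all four third-position
    bases, whatever its wobble base (e.g. GCU for the AGN family).
    """
    wobble, middle, last = anticodon.upper()
    first_two = COMPLEMENT[last] + COMPLEMENT[middle]
    thirds = "TCAG" if fourfold else WOBBLE_TWOFOLD[wobble]
    return [first_two + third for third in thirds]


print(codons_read("UUU"))                 # ['AAA', 'AAG']  lys
print(codons_read("CAU"))                 # ['ATA', 'ATG']  met
print(codons_read("GUU"))                 # ['AAT', 'AAC']  asn
print(codons_read("GCU", fourfold=True))  # ['AGT', 'AGC', 'AGA', 'AGG']  ser(AGN)
```

An anticodon with A in the wobble position, as in the two supernumerary sequences, is deliberately left out of the lookup because the text treats it as unprecedented in these mtDNAs.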

A variant secondary structure with a six-member anticodon stem has been suggested for both insect and mammalian tRNAser(AGN) (DUBIN et al. 1984 and references therein) and demonstrated for bovine tRNAser(UCN) (YOKOGAWA et al. 1991).

In nematodes, but not in other metazoans, mt-tRNAser(UCN) also lacks a paired D arm (WOLSTENHOLME et al. 1987). Surprisingly, the D arm of the mt-tRNAser(UCN) of Katharina is also unpaired. Finally, the seven-member TΨC loop of Katharina mt-tRNAser(UCN) matches the TΨC loop of Katharina tRNAser(AGN) at six of seven positions.

Supernumerary tRNA gene-like sequences: Two additional sequences can be folded into secondary structures resembling tRNAs (Figure 5). If these function as tRNA genes, they presumably recognize the codons TTT and TCT, which correspond to phe and ser, respectively. They are, therefore, provisionally identified as tRNAphe(UUU) and tRNAser(UCU). However, the following suggest that they are non-functional, at least as tRNA genes: Each has an unprecedented anticodon sequence, AAA and AGA, with A in the wobble position; no tRNA encoded in Katharina, Mytilus, Apis or Drosophila mtDNAs has an A in this position, although there is an A in the wobble position of mt-tRNAarg in Caenorhabditis and Ascaris. Each has several mismatches within stems, and tRNAphe(UUU) has only 2 nt in the variable arm. Finally, neither has a T preceding the anticodon, as is found in all other Katharina tRNAs.

If these two sequences do not encode functional tRNAs, how might one explain this similarity? The tRNAser(UCU) sequence occupies the region between CO2 and ATPase8, except for 3 nt flanking each end; the tRNAphe(UUU) sequence is positioned between CO1 and tRNAasp, ending 3 nt from CO1 and 10 nt from tRNAasp. One hypothesis is that these sequences originated from tRNA gene duplications, that either they and/or their functional counterparts were transposed from the duplication sites, and that their tRNA-like structures have been conserved because they mediate other functions (e.g., replication, termination of D-loop DNA synthesis, transcript processing). Sequence comparisons with the other Katharina mt-tRNA genes provide no support for a duplication hypothesis; however, if the duplication events were ancient, the initial sequence resemblances would be eroded by substitutions. Experimental data on replication and transcription initiation and on transcript processing in Katharina, as well as additional comparisons of mtDNA sequences from other species of chitons and representatives of other molluscan classes, are needed to clarify this.

Ribosomal RNAs: Katharina mtDNA contains genes for the s-rRNA and l-rRNA of mitochondrial ribosomes. The precise boundaries of these genes cannot be determined with certainty on the basis of sequence comparisons among species. If one assumes that the mt-rRNA genes extend to the exact boundaries of their adjacent genes, then genes of similar length and good match in primary sequence to those of Drosophila are obtained. Also, corresponding portions of the sequences can be folded into secondary structures that are similar to those proposed for other metazoan s- and l-rRNAs (Drosophila, e.g.; CLARY and WOLSTENHOLME 1985b). Interpreted in this way, the mt-s-rRNA gene comprises 826 nt and the mt-l-rRNA gene comprises 1275 nt in Katharina [compared, respectively, with 789 and 1326 nt in Drosophila (CLARY and WOLSTENHOLME 1985b), ca. 819 and 1640 nt in Xenopus (ROE et al. 1985), ca. 880 and 1525 nt in echinoderms (JACOBS et al. 1988; CANTATORE et al. 1989; DE GIORGI et al. 1991), ca. 955 and 1570 nt in mammals (BIBB et al. 1981), and ca. 945 and 1245 nt in Mytilus (although there is significant ambiguity in determining the 3' end of the l-rRNA gene; HOFFMANN et al. 1992)].

Functionally unassigned sequences: The largest unassigned sequence, 424 nt, is between tRNAasp and CO2 (APPENDIX, positions 1683-2106). This sequence is 75.3% A + T, with most of the C and G present in homopolymer tracts. No open reading frames of more than 100 nt (33 amino acids) are present. Four sequences that can be folded into hairpin structures with stems ≥10 nt and loops ≤50 nt are present; their stems are formed by base-pairing between positions 1740-1756 and 1758-1774 (15/16 match), 1769-1780 and 1803-1814, 1833-1847 and 1849-1863, and 2034-2044 and 2073-2083 (see APPENDIX).

The second largest unassigned sequence, of 141 nt, is positioned between tRNAglu and CO3. This location coincides with one of the two boundaries between oppositely transcribed portions of the genome (Figure 1). Of the 141 nt, 128 are As or Ts, 72 of which occur in a tract of 36 consecutive AT pairs. Although alternating AT tracts are found in the mtDNAs of C. elegans, Ascaris suum (OKIMOTO et al. 1992), Spodoptera frugiperda (PASHLEY and KE 1992), and D. yakuba (CLARY and WOLSTENHOLME 1985a), that in Katharina is the longest reported. A tract of 17 As in this region is immediately upstream of CO3; part of this tract and its flanking sequence (positions 12922-12944; see APPENDIX) are an 18/23 match to the reverse complement of positions 1940-1918 in the largest unassigned sequence. At the other end of this 141-nt region, positions 12810-12815 are an exact match to positions 1744-1749 in the largest unassigned region. Whether either match has significance is unknown.

Two other lengthy regions (those in which the two tRNA-like structures designated tRNAser(UCU) and tRNAphe(UUU) occur) may also belong in the unassigned category. The remaining 120 unassigned nt occur intergenically, in short segments of 1-28 nt. No commonly held sequence elements or other features were found among them.

Phylogenetic implications: The arrangement of genes in Katharina mtDNA is quite similar to those of several other metazoan phyla. This was unexpected, given the marked dissimilarity in arrangement between another mollusc, the bivalve M. edulis, and these same phyla, and given the much closer phylogenetic affinity that is thought to exist between Mytilus and Katharina. Ignoring tRNA genes, two rearrangement events are necessary to interconvert the gene arrangements of Katharina and Drosophila (Figure 2), whereas many (≥10, including loss of ATPase8) are required to interconvert those of Katharina and Mytilus. Moreover, neither Katharina nor Mytilus share any unique feature (apomorphy) that is in either of their respective mtDNAs (e.g., protein gene sizes; possible supernumerary tRNA genes). Finally, as discussed above, the protein-encoding gene sequences of Katharina and Drosophila are more similar than those of Katharina and Mytilus or of Drosophila and Mytilus (Table 2).

One explanation for these observations is that the assumed relationship of bivalves and chitons as taxa (classes) within the phylum is incorrect. This appears to be unlikely, however; the placement of bivalves within the Mollusca is strongly supported by comparative morphology (BRUSCA and BRUSCA 1990; see p. 762). Another explanation is that rates of both substitution and rearrangement in mtDNA are or have been grossly different in the lineages leading to Katharina and Mytilus. If true, then both rates must have accelerated in the lineage leading to Mytilus, since its sequence and gene arrangement are more derived than Katharina's. We have analyzed all published mtDNA gene arrangements using cladistic methods (HENNIG 1966); none of the gene arrangements shared between Katharina and Drosophila mtDNAs supports their having a common ancestor to the exclusion of Mytilus (J. L. BOORE and W. M. BROWN, manuscript in preparation; BOORE and BROWN 1994). Expansion of the mtDNA data set and comparisons of conserved nuclear genes from these and additional taxa may clarify this.

In all vertebrate mtDNAs examined tRNAleu(UUR) is between l-rRNA and ND1, whereas in insect mtDNAs (CLARY and WOLSTENHOLME 1985a; CROZIER and CROZIER 1993; PASHLEY and KE 1992; HSUCHEN et al. 1984) tRNAleu(CUN) is at this location (see Figure 6). Although codon switching was invoked to explain this (CANTATORE et al. 1987a), the Katharina and Mytilus gene arrangements suggest another, more plausible explanation. In Katharina the arrangement is l-rRNA-tRNAleu(CUN)-tRNAleu(UUR)-ND1; in Mytilus it is tRNAleu(CUN)-tRNAleu(UUR)-ND1 (l-rRNA is located elsewhere). The arrangement in which both leucine tRNA genes are between l-rRNA and ND1 may represent the primitive coelomate condition. If true, then in the lineage leading to vertebrates tRNAleu(CUN) must have translocated to a position between tRNAser(AGY) and ND5 and in the lineage leading to insects tRNAleu(UUR) must have translocated to a position between CO1 and CO2. We believe this hypothesis to be more plausible than codon switching, because it invokes a frequently observed phenomenon (tRNA transpositions; see MORITZ et al. 1987), rather than one that (while theoretically possible) has not been observed elsewhere among metazoan mtDNAs.

[Figure 6: gene-order diagram.
Hypothetical ancestor: l-rRNA, L(CUN), L(UUR), ND1.
Vertebrates: l-rRNA, L(UUR), ND1; ND4, H, S(AGY), L(CUN), ND5; CO1, S(UCN), D, CO2.
Arthropods: l-rRNA, L(CUN), ND1; ND4, H, ND5; CO1, L(UUR), CO2.
Molluscs: l-rRNA, L(CUN), L(UUR), ND1; ND4, H, ND5; CO1, F?(UUU), D, UNK, CO2.]

FIGURE 6. Evolutionary hypothesis for the positioning of the leucine tRNA genes in mollusc, vertebrate and arthropod mtDNAs. The positions of tRNAleu(CUN) and tRNAleu(UUR) relative to flanking genes are depicted for each of these mtDNAs. The gene arrangement in the hypothetical ancestor of vertebrates, arthropods and molluscs is depicted at the top. The molluscan arrangements, shown on the right, are unchanged (at least in this respect) from the ancestral state. In the lineage leading to vertebrates, tRNAleu(CUN) was translocated to a new position between tRNAser(AGY) and ND5, as shown on the left. In the lineage leading to arthropods, however, the other tRNAleu gene [tRNAleu(UUR)] was translocated to a new position, between CO1 and CO2, resulting in the arrangements shown in the middle. Genes are designated as in Figure 1.

We anticipate that as more metazoan mtDNA gene arrangements become known, it will be possible to discern other patterns of evolutionary rearrangement. At least two factors suggest that gene arrangement comparisons will be useful for inferring the phylogeny of major metazoan groups. (1) The very large number of theoretically possible arrangements makes it unlikely that various organisms would convergently adopt identical gene orders; thus, shared derived gene orders are very likely to indicate a common evolutionary history. (2) In general, rearrangements appear to occur infrequently, on a time scale that is appropriate for addressing these ancient divergences.

This work was supported by grants from the National Institutes of Health (GM30144; GM07544), the National Science Foundation (BSR-9107306; DEB-9220640) and the H. H. Rackham Graduate School of the University of Michigan. We thank R. Cox for collecting and purifying mtDNA from K. tunicata, L. LYNNE DAEHLER for expert technical assistance, and T. COLLINS, S. FUERSTENBERG, R. HOFFMANN, E. F. KRAUS and G. P. NAYLOR for many helpful comments. The Katharina mtDNA sequence has been placed in GenBank under accession number U09810.

LITERATURE CITED

ANDERSON, S., A. T. BANKIER, B. G. BARRELL, M. H. L. DE BRUIJN, A. R. COULSON et al., 1981 Sequence and organization of the human mitochondrial genome. Nature 290: 457-465.
ANDERSON, S., M. H. L. DE BRUIJN, A. R. COULSON, I. C. EPERON, F. SANGER et al., 1982 Complete sequence of bovine mitochondrial DNA: conserved features of the mammalian mitochondrial genome. J. Mol. Biol. 156: 683-717.

Communicating editor: A. G. CLARK

APPENDIX

Sequence of K. tunicata mtDNA is given in Figure 7 (pp. 438-443). Orientation is as in Figure 1. Gene products are shown above the sequence; arrows indicate direction of transcription. Genes are abbreviated as in the text, except that three-letter amino acid designations are used to denote the corresponding tRNA genes. tRNA gene positions are denoted by overlining; anticodons are underlined. Precise termini of rRNA genes are unknown; dashes indicate approximate termini. Asterisks denote stop codons and abbreviated stop codons. Protein genes are translated using single-letter amino acid designations; these are centered below the 2nd nt of the corresponding codon. N denotes an unidentified nt; X denotes an unidentified amino acid.
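The translation convention described above can be illustrated with a minimal sketch. It assumes the invertebrate mitochondrial genetic code as tabulated in NCBI translation table 5 (ATA = Met, TGA = Trp, AGA/AGG = Ser), which is taken here as the code shared with Drosophila. The names `CODE` and `translate` are hypothetical and are not part of the original analysis.

```python
from itertools import product

# Assumption: invertebrate mitochondrial code (NCBI translation table 5).
# Codons are enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, one letter per complete codon.

    Codons containing an unidentified nt (N) yield X, mirroring the
    figure legend; stop codons (TAA, TAG) yield '*'. A trailing partial
    codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        protein.append(CODE.get(codon, "X"))
    return "".join(protein)


# First five codons of CO1 as printed at the start of Figure 7.
print(translate("ATGCGATGAATTTTT"))
```

Applied to the first 15 nt of CO1 in Figure 7 (ATG CGA TGA ATT TTT), the sketch reads TGA as tryptophan, in agreement with the MRWIF printed beneath the sequence. Abbreviated stop codons (T or TA completed by polyadenylation) are not handled, since the partial final codon is simply skipped.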

COl-") ATGCGATGAATTTTTTCIA~TCATAAAGATATTGG100 MRWIFSTNHKDIGTLYILFGIWAGLVGTALSLL

TTCGTGCAGAGCTAGGTCAACCA~TTTATTGGGGGT200 IRAELGQPGALLGDDQLYNVIVTAHAFVMIFFLV

TAn;CCTATAATAATTGGGG~C~TAGTGCC~TAAT~T~Gn;CCG~An;GCTTTCC~~TTAAATAATATRAGTTT~GA300 MPMMIGGFGNWLVPLMLGVPDMAFPRLNNMSFW

CT~TGCCTCCGGCATTATGTCTTTTGTTAGCTTCAGGG400 LLPPALCLLLASGAVESGAGTGWTVYPPLAGNV

GGCATGCTGG~;GATCTGTTGA~T~TATTTTTT~TTACATTTAG~GGA~ATCGTCTAT~AGG~T~TAATTTTATTACTACAAT~;TAAA500 GHAGGSVDLAIFSLHLAGVSSILGAVNFITTIVN

TATACGAAGAGAGAGCGATTACCTPTCT 600 MRSEGMQLERLPLFVWSVKITAILLLLSLPVLA

GGAGGAATTACAATATTGTTAACTGATCGTAATTTTAATAGT 700 GGITMLLTDRNFNTSFFDPAGGGDPILYQHLFW

TTTTTGGPCATCCTGRAGT~ATATCTTAATTTTACCTC800 FFGHPEVYILILPGFGMISHIVMHYSSKKETFGT

TTTAGGAATAATTTATGCGATATTAGCTATTGGTCTTTT~GATTTATn;TTTGAGCACATCATAn;T~GTAGTAGGGATGGAT~TGAT~TCG~T900 LGMIYAMLAIGLLGFIVWAHHMFVVGMDVDTRA

TATTTTACTGCGGCTACTATGATTATTGCTGTTCCTACTGGAATTAAGATT~TA~G~T~TAC~TTTA~G~CGGATTGGPAGGGAGACTC1000 YFTAATMIIAVPTGIKIFSWLATIYGARIGSET

CTATAATGn;GGCTITAGGATTTATTTT~TATTTA~G~G~~~A~GGTATT~G~ATC~TTCTTCTTTAGATATTAT~TACAn;ATAG1100 PMMWALGFIFLFTVGGLTGIVLSNSSLDIMLHDS

GTATTACGTA~TGCTCATTTTCATTAT~T~ATCGATA1200 YYVVAHFHYVLSMGAVFALFGAFNYWYPLMTGL

AGTCTTCATGAACGATGATT~~A~GATA~T~A~~TAATTTAAC~TTTT~CTC~CACT~TTGGGGTTAAGAGGGATGC1300 SLHERWTKSHFVVMFLGVNLTFFPQHFLGLSGM

CTCGACGTTATPCTGA~ATCTGA~G~ATATT~T~TG~G~T~TCTATTGGTTCTTTAATTTCT~~TGGC~~T~TTTTTATATT1400 PRRYSDYPDCYIKWNVVSSIGSLISFVAVLFFMF

TATCGTTTGGGAG~TTA~TTCTCAGCGGGGGGTGATTCG1500 IVWESLVSQRGVIWSSHLSSALEWDNSIPLGFT tRNA phe (VnO)? +- GATCTGATGAAGGGGAAGTTATn;TTTGTTAAT~~TT~A~GTAGTTTAT~C~An;TCCA~T~TAGCGGGGTT~~GTATTG~GG1600 DLMKGKLLFVNLI' tRNA asp -> GATTATAAATTAA~TAGGAGAAATTAG~~TAATGAAG~Tn;~~T~TAGTTAC~~GTAT~CTTTGTATT~G~AATAGAT1700

TTATTTIATTTATTATTGTAGTWULTTAn;TGTTATTTGATAG~AT~~T~TTGG~GTTRAGTT~TTAT~TTTT~TATC1800

TT~T~AGCTI~~GTATTTTGTTATAAGAARATTT1900

AAACTAAATTTTGTTGGTT~T~A~~TAGCAC~;TAGC~~TTTGTT~;TGAGTGTATA~T~CTCC~GATGG~T~CTCC~~TAG2000

GGGGT'PTAATTAGAAATTA~A~~TTTA~TGTAT~T~GATTTAT~TTTTGGTGTGTATTT~TAT~ATGTATCACC~A~G~~AA2100 c02 . AAGAAAATGGCIT~TGAGTGAG~T~CAGGA2200 MAFWSQWGFQDGASPIMEQLIFFHDHAMLIL

TTATAATTATTAGTPTATTGCTTAT~~GGTTT~TT~TAAATAATT~TTT~TTCG~TTCTACATTA~~~~T~A~T~TATG2300 IMIISL1,SYGAVSLMNNSFLSRSTLESQEIEIVW

GAffATCTn;CCTGCGGTA~~~AATTTTTTTGGCATTTCC~CTTTACAATTAT~TATTTGTTAGATG~Tn;~~~T~T~AACTATTAAG2400 TILPAVVLIFLAFPSLQLLYLLDELEEPALTIK

GTGGTGGGGCATCAATGATACTGGAGGTATGAGTATTCTTTTTTACATATTCGAAGACTTAG~GAG~AG2500 VVGHQWYWSYEYSDFLNLEFDSYMISLEDLEEG

ATTACCGGCTA~AGAAGTGATCACCG~GGGTGG~~ATG~A~AAAG~C~~T~A~TACTGCT~~T~ACTTCA~C~G~CTGT2600 DYRLLEVDHRSVVPMKTKVRVLVTAADVLHSWTV

FIGURE 7.-Sequence of K. tunicata mtDNA.

TC~TCIT~~~~T~~~~A~AAGAT~AATT~CCI~GTI~AT~~~~TG~2700 PSLGVKADAVPGRLNQLSFFANYPGVFYGQCSE

A~GIY;GGGCAAACCATTCT~ATACCAATAC~TIGTGTTAGA~A~GAT~~~ATIAAA~TTATATTIAA~~~-2800 ICGANHSFMPIVLEVVDSSSFIKWIMFNGEA*** tR?JA 8ar (UCU)?e ATP8808--") . TAAATTrATAAGAAATTAGPTAGATRTAT~~TITATIT~TCTT~~TIA~~A~CTrTAT~A~TIA~~TAT~TIGAAT2 900 MPQLAPMNWI

~A~TT~T~~~~G~~~~CIA~~~TCI~~~~TCT~~T3000 FLLLFFWSNVFCLGVCVWWVKSSSYSFLSKSVS A!fP88e6 d NCCCCmTrATITI~ATGAGTITGATffi~~TAATAGATATTTITTCT~TITGA~TAAT~TITTA~~mATACCAC~m~ATT3100 XPFYFKKWVW***MMMDIFSSFDDNNFTLFSFSY

TGATT~TTI~~ATGT~TAATTAGTC~~~AATA~TTIT~TIA~TCICGTTITTAT~~~AATGAT~ATIT~T~TIT3200 LIWVLGLAPVLISQNMFWFSKSRFYSMILFPKNF

TA~GGGCAGGTTATGT~A~~TATTAG~TTA~~TATU;TGATTA~TIT~ACTrTTAAmTAGTIAATTrATCA3300 MGGQVMRATGKNISGFINMVISIFLLLILVNLS

GGGCTTCTTCCTIATGTCTTA~TTAAGAAGRCATTTT~~~~~TT~T~TI~T~~~CICTIT~TCITTIC~~TTTAT3400 GLLPYVFSLSSHLSFAFCFGLPFWLSLIFSGDL

AT~TTCTGAGGI~TGTATI~CTAGGGGGGC3500 YVSEVVSHLLPSGAPGILNPFLVLVETVSISVRP

TAITACITTATCI~CGACTTGCTGCTAATATAAGGC3600 ITLSVRLAANMSAGHIILGLIGSYLSVGLFCYP

ATTCTTGmT~TTGTTA~ATTr~CAAG~TTrTATTrT~A~~TTGGT~T~TAA~~CATATATTrT~C~ATTAGTrA3700 ILVLGLLLFVQVFYFLFEIGVSLIQAYIFSLLV f" tRNA phe GGTIATACTCAGATGAT~ATIAATTAAAmATACCATAATTATTmATTCIATAGTAACIACG~T~~G~TCCATT~A~~~T~A~G3800 SLYSDDHPY***

CTCIGATATAAGCTACTTAGTTAAGTAAATCATATAGTAT 3900 *TFWMTVSIFLMSLLTLSIFLNFLNSQ

GATITTTTTGATrAATTGTTGTAAGTGTTAT~TTITATTrAGIA~~TCCC~CCGATAGTITCCATT~CC~TCT~C~T~~TGIA4000 NKQNITTLTMIKNLLGEGGITEMWGLDLSKFTY

GAGPCCIGTATAAAGAGATCI~CIG~TATATTATTTT4100 LGTYLSSKSTMNNSLNIMFWMNFFFNNIISHSI

TTAAGTTTGTAT~~TIGCI~~AAC~~~T~~TATT~TITA~TATTAT~ATT~GI~GTIG4200 KLELKFLKYTIAAGTLTAFTAMLKNMMSILPLTS

AGGGGGTTAAAATAAGTCATG~CTCT~AATI~CC~GCI~A~ATGGT~T~~AATAAT~~ATCT~~CAG~T4300 PTLILWSFVAGSTIAGAALMTLPAIISNDEDST

TCTGACTACPCPTGATTGTGATTTITGAT~G~T~T~ATA~G~ffiA~ATC~~G~~A~GTI~~TATATI~4400 SVVSSQTMKSWFIFLSLRASYSATLITALFMNL

GAAATTAGAAAATTATTITGGTTAAATATIATIA~~T~TT~A~T~AGAGT~T~TGCTAT~~~~C~~G~~TITG4500 SILFNNQNFMMMEIIVDKSYFGAMFPFGCLALNA

CAATGTIGAAGCATGTTCTTATTIGAAT~AT~~~C~A~C~ATAT~TrGTrATIA~~~~TAATATT~C~C4 600 INFCTSSFPMQIWLHSMKRIDQNNNHSHIINGA

GCAAAU;AATAGGAGGGCTTITAAATATT~A~AGIGAAT~TG~TA~TA~~TA~C~~TCTIACICTIAATATIAT~TCCT~T4700 CLFLLAKFMAHTFLHFLALSFLGLSVSLMMVGL

TGGCTTAAAGTGTT~~ATCI~TTC~TIT~T~TG~~TATTACTAT~~~~ATTCT~~T~~ATI~TC4800 QSLTSLAIIKKLDTEFNASIGAMVMTMSSIIMLS

TTGI~AAATAAATGTTCT~G~T~TI~ATr~T~CIC~G~~T~T~GIA~GAG~~T~~~C4900 TSFLHSNSLFPYFQILLFVGATVLTSSHVLASV

AGGGGTAGGU~CGCATC~~TGGT~T~A~TCITT~~~~~AT~C~CT~TA~TTAT~~A~AG~AATGTI5000 PTPAAMAAPLWSCFPLQASKTMGALLIMSITLT

ATTGITGGGGTTCTTCATATAAAAAGM;CATTGAAGTGCC5100 MTPTSWMFLANFHGQSFTVSIALLIAVDGIRNMF

FIGURE 7.-Continued.

AT~GTTATTAGCCCGTAAAGA~ATAG~~GATAATAGATTACTAAGC~ATACTAGCC~AGCC~T~C~~~TAA~5200 ATMLGAGLSKYNQYYIVLCFSVLGLGDWGLLLT

TGCTAA~TTA~AGA~A~GAGPL;TACAAAGPL;TA~~GATTCAffiTAAATCG~GATAAAGATAT~~CTTA~AA~5300 ALSPIFILLNMSLVFLMVIWTFRKIFIDQSMYN

TTAGTAAAAATTATTACTCTTGCGGAGATATAACATACAA 5400 KTFIMVSASIYCVIASFLATTWNILIPLTITVSN

TAAAGTTGTAGAAACAAAATTCTAGTAAAATAATCTTGTT 5500 FNYFCFELLIIKNTTLTYMATSFSLLSHSLLIV . f"-ND5 -tRNA hie TGTTGCTIGTGAAGTACTIAAAAATATTATIT~ATT~TAGATAA~TTGTACTTACTTTI~~~AA~TAATATT~TTAT~TTAATCTAA5600 TAQSTSLFMMKM - CAAATTGTT~TCTTAATAAAAGTTAAAAG~GTGCTG~~CAAT~~TIA~TTGTAATATAAATGGGTTIGAAATIA~~~AACAA5700 *ITTSLLILLQAPIWHLFIVNYYLPKFNNFPNVF

ATGAAGAAATTT~CCATGTTGT~A~~~T~~TAG~TATAAAC~CCGAAA~AAT~ATAA~~A~~TIATCAGAAATTTT~5800 SSIKGHQTNTFLFLTYVASVFAMLALLIMLFKS

GATAGARACTGTACTAATIA~~CTGCTT~CTI~~TTGATTGA~AGC~TAT~~T~~ATATTAA~GATA~CTA5900 ISVTSIMLGAESLLNISPPAAMNISCMLFFSMS

AAAGTTGGAACTAATGAGATAAACCCTTTTGTAATAAAAA 6000 FTPVLSIFGKTIFLSRSGSKEYMMNSMSFLGSSS

AAAGACCGTGA~TTATTATTAATAAAGCTATTTTT~CCAAATTCAAATCTGCTGAGGA~CCT~A~ATGAGTCCTATATGACC~TGAG~6100 LGHAIMMLLAMKLGFEFSSLLGALMLGMHGVSS

ATAAGCGATTAATGCITATGTCTCTIT~C~AGACAAATTAGACTTCTTAATACTCCTCCTACT~GCCAAGTGTGATAAT~TPL;TTI~TA~6200 YAILAKMDSQRLCILSSLVGGVLGLTIIFYNLV

AAATIGTAT~GAAAATATTATTATTAATTAAT~TATTAATAC~T~CAC~A~TTTAffiAG~~T~TAGAATTATAGAGCC~TACAGATGGGC~6300 FNYKFINNNITNIGYGGLKLLLAALIMSGAVPAE

CTACATGAGCTTTAGGGAGTCATAGATGAAAAGAGTAAAT64 00 VHAKPLWLHFSYIPLKVLFVLIIFLWWFSIMAQ

ATAAGGTATAATTGGTAGGCTTATTATAA~A~~TT~TGA~A~ATGT~T~T~GAT~TAATTAA~~~GCTC~~6500 YPMIPLSMMILLNLHANHSYIFFINILFPLAGL

ATA~GTAGATTATTATATATATCCC~TIGAAGGC~T~~T~TACC~CATATTAAAATT~GA~AATGTA~GATTAAGGA~TIC~6600 ITYIMMYMGAQLREPQYGWMLILILTPILSAEFF

AAATATATAATGAAAATPL;TCTTTIT~A~~TACTATAATAATAAT~TTT~~ATIA~TTIA~~TCTT~TPL;ATTATIATT6700 IYLSFLSKQTFVMIIIFNLIMVNLLFFSVLNNN

TITTAGAATTITATATCTGCTTAT~TIATI~CCA~~TT~CATCTT~TATTATT~TACffi~TT~CACT~TCGTAATTA6800 KLIKYSSMIMLGTIWCSLMMLPVSFNDSFXEYN

ATTTTAGGGATGTTTATTGGTGA~TTTAATGTAATTAT~TAT~TGA~TT~GCTTATTTTGTAT~TITAT~TGCT~TTTATTA6900 IKPINMPSFFNLTIMFFMFHSIWSMKYWNMKSNK -ND4 . TIGTTATAACGGTCGAAAATGCTGATATAATAAAAAAAGC7000 TMVTSFASM***CKHISISSVYDNGHARILI

ACTAATATTGCGAGCCCGATGCTAGCTTCn;AAGC~AGAGTAA~~A~TIATTAACCCT~ATIT~TATAAGGTITGATAGGGATACTA7100 VLMALGISAESAALTLLVFIMLGENSMLNSLSVL

ATAATAAAAAAATICTTACTATTAAGATTTCTAA~TAATAAGGT~TAAGAA~TGTTTIC~T~A~AAATIGTTATIA~GCTCT~TCCTA~7 200 LLFISVMLIELALLTNLLHKRQLCITMLASISS ;f"ND4L tRNA thr TAARGCTCCAAGCGTGAATAATATGGGTT~TTTATCATGTCT~T~GC~TATAGC~TGGTCTT~~C~ATA~T~TT~7 300 LAGLTFLPNFFNMM -tRNA ser (UCN)

TTICTTTTAGTATGCCGTTAA~GAAT~~CACTTCTTIATIGTIT~~A~~~~CCA~TAAGCGG~~~GTIGATT~TCAGT7400 ***F N I T W D

CTCATATTTTmTGTIAGAGATIA~TAACAG~TATACAAAT~T~TTI~CAATTAGAAT~~CTA~GGCT~GC7500 WMKKTLPIMLFLLFYVFTLIQGILIFPDEVPQA

ACCAATTCAAGTTAAAAGT~GAC~ACACTA~CTTCAGTAT~~TGAGTIACAGGGT~~TA~T~GAATITTCCT~~T~TT7600 GIWTLLLFVSVLSWYLIQTVPYFSITRFKGLNI

AATGGGACAAAAAATAATA~AAT~~T~~TAATTTATT~AA~ATC~~CATffi~AATA~T7700 LPVFFLIVISMLLGVVGGLKNPISRLIAYAFLFY

ACCATTCTGGCTGGATATGT~TAA(3TAGAGGGT7800 WEPQIHTPTVLPNAPIFNDPHALLNPDLFVLLL

TAAAAAAATTA(3TATTACT~~CCC~ffiA~TTAGTGTGTA~TGAGT~TAGTAACPT~~CTATCP(3TA~GATTCCTAA~~G7 900 LFIVMVVFGVLDKLTYFSHITVKESDSNIGLPN

TTAGACCCTGTTICATGTGTA~A~TGCT~A~AATAAT~CAAGATAAT~TG~TCGTG~AAAG~8000 NSGTEHLFLIHLVSAAAIIFPALYHFSFFRTLTA

CGTIATCTACTGCAAATCCCTCA~~A~GAA(3TA~G~G~~TATAAG~TCGCT~TATTAAGTTTGTAATAAC~~~CTC~8100 NDVAFGGWIWQVMTTGIYPIASMLNTIVTAGWF

AGATATCTGTCCPCA~AGTAC~AACCGATAAAAGC8200 SMQGWPLVYGIFATGMTLFLLIVGVXWTHINVF

GAGCCATAATATATCCCGCGC~TATGGAAATAAAGGCA8300 SGYYMGRGIHFYLCIFFLSGGNAHAARILWGYNV

CGTCTffiACAGATATGTGCAACffiAAG~AAGTTCGATA~~T~TGTATT~TffiAAAG~A~T~T~~TAATAA~A8400 DRCIHAVSSFALEINTAYHMALFLGTLIQIILC

TAGGC~~GTGACCCATCAT~TTGATAA8500 LGLLSGFNWWISLNSPSPLDILSGNIIKFVPHS f-"Csb GATCGAATTGG~TAAGCATATTTGTAGGGGCGGAGAGGT8600 SRIPKLM*KYPRLPGNSFYCIKVVCILVFFLILA

AATCTTACAATAT~~~TAAA~TTCCTAAG~TI~AATTTCATA~TATTA~GTITGTTCCTCPA8700 LSVIILLNFPSFLSIGLHNFFSNWLFMMKTQEEL

ATAAAGTTCCAATGAAGTCTTT~TAAAGTATTAATAACC~AATACPGTT~ACA~~~CTTGAATT~~TTT~TAAATATT~TT8800 LTGIFDSLYLMLLWFVTSVPFFVQILNKKFMLN

TGGIATIAAGGCTGTATAAGCGAATATTACTAAAA 8900 PMLATVYAFMVLLGGIYILFLIFSFWTSSGFFI

AAGATACTAATAAATAGCCTAT~T~AGT~T~~CPAATT~T~AGTT~GAGGTAATG~CAC~G~TIA~GATA9000 LISIFLSMFLLLLGLSIPQTLLPLSFACSFILSM "NDC tRNA pro TTGTTAATAAGAATGTCATATTTCA~T~TffiTTIAAG~TATT~ATT~~CPAAA~TT~~TT~mGTTATCPGA~AAT~GGTT9100 TLLFTM ***S L L L T

GTTACTATAAATTTT~GATA~TTTCATATT~TTATT~T~TACT~Gffi~TCTTAGTC~TTAATITTATTAATAGAT~TATC~~9200 TVMFKASLNWMLIMLSVPLFSLWTLKMLLDYRLR

GAGGAAA~ACCTCGAACTCAAAT~C~~AT~GTAA(3TTTTA(3TC~T~GTTAGTT~TT~AATC~C~TCCTAA9300 PFTGRVWIFVFAIFTVKVWFLFNTLIWSDGAGL

AAAGAGAATPGTCGPTATTATPerCATGMT~TTCTTGC~ATTC~AA~TAATAGA~G~~CTTC~TACTCGATGTT~CCT9400 FLITTMMSMFLISAYEALFLLAFSGSGYEINFG

GATACTAGTTCAGATTCPC(3TTCP~~~T~~TGTCTCT~TAAGCAGACTACAATTCATATTAGT~AAT~G~ATIA~9500 SVLESEGEAFDFPARNTEALCVVIWMLLLPTMLI

TTCPGTITCAGTATAGTAATTGATGGGATG~~AAAT~~TTIAAATTIAGGATGAAC~AGAGCA~~A~AG~T~TAACGTTACTIC9600 SNWYLLQHSSIQNLNLSSSLAASSLLVLFLTVE

GTAAGAAATGG?TTGAGCATGCACGG~GCA~TAGAA9700 YSITQAVARLAGLLAYKSNSAWGAIMTGYVSLA

GATGTGCATAGAC~TTAAAATATC9800 STCLFFLVGFNFYWLTFDSWYLSWLCLSLILSLV

CTGGTGCTGTGTAATAAGGAAG~GTTTGAATTATAAAWLT9900 PATYYPLKNSNYLNNEEKLFLKLADALPQALGF

AATICTTA~TTG~GGTCT~TCGAAAC~AATATA~C~~CTITCCG~CTAATAAAG~TGCTA(~TGCTAATAAGATA~TATAG10000 ISVKNPGKRFQIYGLGKRELLTFFAVALLISIY . "Dl tRNA leu (UUR) GCAATTACTATTCAAG~GATAGAAAAAGATAAGAACATAACTAAAA10100 AIVMWSFSLFM

FIGURE 7.-Continued.

+tRNA leu (CUN) [-----l-rmA- TATTA~GAGATT~~TCGTATAATAATT~AAATTATTT~CTT~TTTGCI~~TAATATA~AAATTATTTGTTA~TTT~~TT10~00

CGTACTAAAACATAAATAATTAGCIC~GAT~CC~~~TG~TRC~~C~CTCAGATCATGTAAAGGTTTAA~CU\ACAGACC~10300

CATT~AAGCTT~GTRCTTTAATCCAATCG~~~TTTTCTTCGATTA~CTCTCTG~ATGCTGTTAT10400

CCCTACGGTAGCPTTTTCTTAATCATAATTATGGCTC10500

AAAAATGGATATAAAmTGTTTTT~~GATmATTTAT10600

CACTT~TTAATTTTTATTTAATGTAATAGAGACAGC10700

TTAGCACAGTCAa;GTAcT~~GTTTAAAATTTCRCT~~ACGACT~ATTAATAATTTTCAAAG~GATGTT~T~A~~10800

a;CTAGAATTI~GAGTTCC~TTACATTTATAATATTTTTTT~T~~~ATTCTT~TTATARA~ATAmGA~TAATRCTCA10900

TTTTAACATARACTAATTTAAGPTTTTAAATTTTAAATAA~GATATT~~TTAGGTAT~TTTTTGAT~ATT~TTAAA~CTATTA11000

ATATGATAGCIACTATTAA~RCATTAATTAAAGGTT~AATTTTCTTTTAAATTTTATT~TTAGCCTAATAAAAT~TAATTTAAATTA~TA11100

TATTAAAAAATAATTTTTAGTAATTT~TTAGTAcTTATATACTTC~AACTAGCTATC~~CffiCAT~TTTAACTTCTATAAATA11200

AATCATTAATTTAATTTn;CTACATTAATAAA~ATATATAGCTCCTTTT~TCT~TGATATTATAATAT~TAATCT~TTAAAC~TAATGC11300

AAARGGTACTAGTGTPAATCGTACPT~TTTATTC~TACAGTRCTGT~TTT~TT~TTTTTT~ATACTAATTTATATAA11400 . f" "1-rRNA-] f" tRNA val GGGTTTTGAATTAAAAAATTTGTAA~TTATTGGAAGGT11500 -[ """ #-rRNA-- TTTTAAGGAAGCTTC~TACTCCTA~ATGTTRC~CTTATCT~AAAAAG~AGTGAC~ATATGTA~CGTCTT~GCTTTTTCAAA11 600

ATAATGTTCATAA~ATTTTTA~T~~TCC~TTGATT~TATAAT~C~T~ATTTCC~ATTTTAAATTTAAATTGTAACCCTAATATTAT11700

~TTACTATAACTGGCGTGACCTGTCATAATA~~GTTGTCT~AGTAGTTAACTPATGRCG~~ATACAAACTAGT~TTTAAGTAAAG~11800

AGGTAAAGCGCGGGTTGTCGAGTTAAAATACAGGTTCCCCC 11900

TCTGGT~AATGATTTAATTTTTTAATAGTG~TATCTAATCCCAGTTTTA~T~TTTT~T~CGTC~TTTAGTAAGAA~ffiT~T~12000

AATGTAAAmTACPTT~CATTTTTTATAATCTGACATC~AT~TATACCTTTTACCGTTTAGATTAT~AAC~GTATAACC~~12100

TPGCI~ACAAATTTAACTAGTTCTGAAATCTAmCTAA12200

AATTGATATCTTAAAAATAAATATTTTTTATAATTTATACAAAT~TTTATTTTTA~TRCAGCTTC~~~A~ATGTGTTTCT~G~12300 . f" --.-rRNA-] etRNA mat TTAATTAATTATAAACTAGACAAATTTATTTGTAAAAG12400 +tRNA tyr- f" tRNA cy# GIAAACTCTAACAATCAGATGTAACTTTAAATTT~TTTAATATTTTTTAAACTAT~~TAATAG~GATATCCTCATAAATARATP~A~12500 f- tRNA trp

TTACCACCTTTTATCGGCCACT~ATTCAAAACAC~CATTTTAT~ATAGAAAGCTTT~~~~CTATTAGTT~TTAACTT~~T~T~GAA12600 f-" tRNA gln glY- -tRNA AAAGAGTTAC~TTTTTC~AATTCAAATTCCTAC~~~TAC~RCTTTTTATTATTT~TTTAAATGATTAAACTTTTTA~TT~~A12700 *tRNA glu . . " AGTGTACTAATTATAC.:N\AGAAACATAGTTGAATAC~TTATCAGATAGATATTTATATTPATTCTTT~TAAAT~TTAGATGTG~TATTTAACRC12800

TACTTATCTGATAGCATTTATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATATAT~TAU\ATA12900 .CO3 TGACCATATATTTATATATCTTAAAG~TAG~TGATTC~TCCG~CATTTAGTAGAATTT~CCT~TTTA~- 13000 MIRNPFHLVEFSPWPLV

GGGGTCTATAGGCGCGTTTAA~~GCTTAGCAGCT~A~CAT~GTT~CTTTATTTTTAGTATG~T~GTTATTTTAATTTTA~A13100 GSMGAFCLTVGLAAWFHGFSLFLVWVGVILILL

RCAATGGTTCAATGGTGGCGAGATGTAATTCGPGAAGGGA 13200 TMVQWWRDVIREGTFQGYHTLRVSRGLRWGMIL

TTATTACTTCTU\AGTACTATTTTTT~GCATTTTTTPG~CTTA~TCCATTCTAGATTGGCCCCTTGTT~AAATT~TCT~TGA~TCCACA13300 FITSEVLFFFAFFWAYFHSSLAPCLEIGSCWPPQ

AGGAGTTGCTCCITTAAATCcTTTTCAAG~CTTTATTGAATRCAGC~GTTATTAGCGTC~TTAGTGTTACPTGAGCGCRCCATAGTTTAATP13400 GVAPLNPFQVPLLNTAVLLASGVSVTWAHHSLI

GATGGGGATC~~GT~~A~~ACAGITATTTTA~~GTA~TTACGIT~~A~~~TAT~A~~A~13500 DGDQGGANISLLTTVILGAYFTFLQAGEYLETS

TTACTATT~CGA~~GTTA~ATCAACA~~TT~A~A~GGG~CAT~TT~TGIGTTAGTA~~A~~A~~TTA~13600 FTIADSCYGSTFFVATGFHGFHVLVGSLFLLVTL

GTGACGAAATTTTAGTTGTC~~C~CCA~~~~~C~CTGCTTGGT~T~ATTTT~GATGTA~~TTGTTT~A13700 WRNFSCHFSSSHHFGFEAAAWYWHFVDVVWLFL tRNA lys TATAT~cTATTTACTGGTGGGGT~TAA~TATATC(A TATAT~cTATTTACTGGTGGGGT~TAA~TATATC(A 13800 Y.1 S I Y W W G S*** tRNA ala-”.) - tRNA arg *- GGAACTATACTTPAAGTAAAAGGTTTAATTAA~~A~T~TA~~~T~~CCAGI~~GIGATTATT~TTTTT~T~~C13900 tRNA am----) CA~ACGTAGCAGAGAAGPTCTGTmAAGA(3TTCTGAATA 14000 tRNA 110 + ND3 * ARTI”rATATA~~TGTTAC~C~T~~~TACATTGA~GTTGIA~T~TAAG~T~~CTTT~CA~~~TTT~TGIA~~T~G14100 MFFVLSL

GI~GITTACTTTTTTATTAAGATTATT14200 VLFTFLLSLVLLSVSLSLTKKKMMNREKSSPFE

GTGGGTTTGATCCTAAAAGTC~C~~T~C~TTT~A~GGT~~TTAA~A~GTAGITTTCTTA~~T~TGIn;A~T~TTTT~T14 300 CGFDPKSSARLPFSMRFFLITVVFLVFDVEIVLL

TTTGCCCTATTTATTTTCTAGTGGCn;AAGAATCGATGT14400 LPYLFSSGWSIDVFSLVGSMMILVILIIGVLHE tRNA ser(AGN) __)__ TGGICn;AAGGGAGGTTGT~T~T~A~~~TTAAGTT~T~TTTGGGA~TTTTGGCTGC~AA~~~TAA~G~T~~C14 500 WSEGSLEWFSSSN*** ND2 j . CGTTTPfCTTTTTGPGTTTTAAATTTTCC~TGTT~~TATTTATTTTTAT~TATT~T~~A~~AT~T~TTAT~T~ATTCAT~A~C~G14600 VLNFPFVGLFIFILFFGTLFSLSSIHWFG

GPITGGCTn;G~TAGAATTAAATTTAATATCC14700 VWLGLELNLMGFIPVMVQKSTSEETESGVKYFL

TTCAAGCTGTTGGCTCAGCTTTPrrGTTTTTGPTTGGATTGA 14800 VQAVGSALFLFGLMLMNWNFCCWELNFFSGFSXS

AGGTTTAATTTTTTTTGGTT~~T~~T~~~CA~A~~~~~AGGGTTGTA~~TAT~~AATA~AAT~T14 900 GLI.FFG.LLMFLGA.APF.HFWYPSV.VAG.LSWMSNF. TT~TATTAACAG~CAAAAAATTGCTCCATTGmATGGT15000 LLLTVQKIAPLFMVCWYLNLSSFLLLILVFLSS

TATPTGGAGGAGICGGAGGGGTTAA~A~C~C~TACGC15100 LFGGVGGVNQTSVRALIAYSSILHMGWMLXGASA

GGGATGGAGCAGTATT~TTTTTAC~TTTTTTTACT15200 GWSSIFFYFFFYCFILGFIAYLMGLDESFNMSC

TTTAGAAGAGTGIATGn;TGAAATT~TATTCGCGPAAA15300 FSSVYVWNSYSRNFLVFMLLSLGGLPPLLGFFG

AG~AGTGTPAGTTAGATT~ATCTTT~G~TTTAGTTTTAA~TTATTTTAGTT~T~~TAT~TT~ATTATATTATTATTT~T~T15400 KWLVLVSLLSLGNLVLSIILVCGSMISLYYYLVL

TAGTTTTTCA~A~TTTAAGCAAAGGGAGGTGAAGA15500 SFSLLLSKGSWSSGALVSKNLMWFGGAMNLSGL

GGTGTITTATTGTGGTIAAGTACll’TlTAGAT 15532 GVLLWLSTF***

FIGURE 7.-Continued.