
INVESTIGATION

Untangling Heteroplasmy, Structure, and Evolution of an Atypical Mitochondrial Genome by PacBio Sequencing

Jean Peccoud,*,1 Mohamed Amine Chebbi,* Alexandre Cormier,* Bouziane Moumen,* Clément Gilbert,* Isabelle Marcadé,* Christopher Chandler,† and Richard Cordaux*

*Laboratoire Ecologie et Biologie des Interactions, Equipe Ecologie Evolution Symbiose, Unité Mixte de Recherche (UMR) Centre National de la Recherche Scientifique (CNRS) 7267, Université de Poitiers, 86000 France and †Department of Biological Sciences, State University of New York at Oswego, New York 13126

ORCID IDs: 0000-0002-3356-7869 (J.P.); 0000-0002-2131-7467 (C.G.)

ABSTRACT The highly compact mitochondrial (mt) genome of terrestrial isopods (Oniscidea) presents two unusual features. First, several loci can individually encode two tRNAs, thanks to single nucleotide polymorphisms at anticodon sites. Within-individual variation (heteroplasmy) at these loci is thought to have been maintained for millions of years because individuals that do not carry all tRNA genes die, resulting in strong balancing selection. Second, the oniscid mtDNA genome comes in two conformations: a 14 kb linear monomer and a 28 kb circular dimer comprising two monomer units fused in palindrome. We hypothesized that heteroplasmy actually results from two genome units of the same dimeric molecule carrying different tRNA genes at mirrored loci. This hypothesis, however, contradicts the earlier proposition that dimeric molecules result from the replication of linear monomers, a process that should yield totally identical genome units within a dimer. To solve this contradiction, we used the SMRT (PacBio) technology to sequence mirrored tRNA loci in single dimeric molecules. We show that dimers do present different tRNA genes at mirrored loci; thus covalent linkage, rather than balancing selection, maintains vital variation at anticodons. We also leveraged unique features of the SMRT technology to detect linear monomers closed by hairpins and carrying noncomplementary bases at anticodons. These molecules contain the necessary information to encode two tRNAs at the same locus, and suggest new mechanisms of transition between linear and circular mtDNA. Overall, our analyses clarify the evolution of an atypical mt genome where dimerization counterintuitively enabled further mtDNA compaction.

KEYWORDS mtDNA; concerted evolution; crustacean isopods; telomeres; third-generation sequencing

Copyright © 2017 by the Genetics Society of America
doi: https://doi.org/10.1534/genetics.117.203380
Manuscript received April 28, 2017; accepted for publication July 1, 2017; published Early Online July 5, 2017.
Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.117.203380/-/DC1.
1Corresponding author: Laboratoire Ecologie et Biologie des Interactions (EBI), UMR CNRS 7267, Bâtiment B8-B35, 5 rue Albert Turpain, TSA 51106, 86073 Poitiers Cedex 9, France. E-mail: [email protected]
2Present address: Laboratoire Evolution, Génomes, Comportement, Écologie, UMR 9191 CNRS, UMR 247 IRD, Université Paris-Sud, 91198 Gif-sur-Yvette, France.
Genetics, Vol. 207, 269–280, September 2017

The typical bilaterian mitochondrial (mt) genome is described as a single circular molecule ranging from 15 to 20 kb in length, which contains 37 genes, including 13 protein-coding genes, two rRNA genes, and 22 tRNA genes (Boore 1999). While the majority of bilaterian mt genomes conform to this description, several notable exceptions have been uncovered. Unusual bilaterian mt genomes include multipartite (e.g., Suga et al. 2008; Dickey et al. 2015) and linear (Raimond et al. 1999) structures, atypical size (e.g., Helfenbein et al. 2004; Liu et al. 2013), changes in gene content (e.g., Okimoto et al. 1992; Helfenbein et al. 2004), plasticity in gene order (e.g., Singh et al. 2009; Gissi et al. 2010), and additional genetic codes (e.g., Watanabe and Yokobori 2011; Abascal et al. 2012). Because they deviate from the standard model, these mt genomes may constitute ideal systems to further our understanding of mt biology and evolution in animals, as they can help to address questions of recombination, concerted evolution of mt loci, and nonstandard inheritance.

The mt genome of terrestrial isopods (Isopoda: Oniscidea) is one example of such atypical genomes. It is notable for its compaction. In particular, genes coding transfer RNAs (tRNAs) can partially or fully overlap with protein-coding genes (Doublet et al. 2015). But one truly unique feature of this genome is the capacity of three tRNA loci to each encode two alternative tRNAs with distinct anticodons, thanks to single nucleotide polymorphisms (SNPs) occurring within the same individual (Marcadé et al. 2007; Doublet et al. 2008; Chandler et al. 2015). At all three loci, mtDNA shows two different bases at one position of the anticodon, thus making individuals heteroplasmic at these nucleotide positions. This variation appears as a double peak on chromatograms generated by direct Sanger sequencing of PCR amplicons (Marcadé et al. 2007; Doublet et al. 2008; Chandler et al. 2015), as cut and uncut amplicons on electrophoresis gels after mtDNA digestion by appropriate enzymes (Doublet et al. 2008), or as SNPs among sequences obtained from next-generation technologies (Chandler et al. 2015). The same three heteroplasmic anticodon sites have been detected in individuals of two oniscid species, Trachelipus rathkei and Cylisticus convexus (Chandler et al. 2015), each site allowing the encoding of two tRNAs per locus and saving one dedicated tRNA locus. One of these heteroplasmic sites is shared with Armadillidium vulgare (Marcadé et al. 2007) and a diverse array of terrestrial isopod species (Doublet et al. 2008). The presence of these heteroplasmic sites in divergent oniscid lineages suggests that at least some of them have been maintained for millions of years (Doublet et al. 2008). Bottlenecks resulting from the transmission of relatively few mitochondria to zygotes usually remove heteroplasmy in few generations (Wolff et al. 2011; Breton and Stewart 2015; Stewart and Chinnery 2015). In these oniscids, however, it is believed that “constitutive” heteroplasmy is maintained by the requirement of all tRNA variants within an animal, and possibly even within an individual mitochondrion.
This case of balancing selection (the evolutionary maintenance of polymorphism) represents the only suspected example of vital heteroplasmy in animals (Doublet et al. 2008).

The hypothesis of constitutive heteroplasmy maintained by balancing selection must, however, consider another unique feature of the mt genome of terrestrial isopods. This genome is remarkable for presenting two conformations: one linear monomer of 14 kb that represents one unit of mt genome containing the standard bilaterian mt genes; and a circular 28-kb dimer that is a palindrome composed of two monomers, each representing one genome unit, arranged in a mirrored fashion (Raimond et al. 1999; Marcadé et al. 2007). The presence of dimers, which constitute about half of the mtDNA molecules in A. vulgare (Raimond et al. 1999), leaves the possibility that both tRNAs of a heteroplasmic site can be encoded by the two genome units of a dimeric molecule, such that a single dimer may encode all tRNAs. The transmission of such dimers would allow faithful inheritance of all essential tRNA genes to daughter mitochondria and to the progeny, and would ensure good balance of the tRNAs within organelles. This hypothesis implies that the two genome units within a dimer are not completely identical. It therefore conflicts with another formulated hypothesis: that dimers arise from the replication of linear monomers. The extremities of linear monomers contain inverted terminal repeats that are thought to be telomeric hairpins covalently linking the two DNA strands (Doublet et al. 2013). DNA polymerase would be able to navigate the hairpin and then replicate the other strand, circularizing the linear monomer into a dimer in the process (Figure 1A). If so, this dimer would be expected to present totally identical genome units.

Figure 1 (A) Hypothesized replication of a linear monomeric mtDNA molecule into a circular dimer in oniscids. A gray arrow represents a genome unit or a monomer. Its “head” is close to the 16S rRNA gene, and its “tail” is close to the cytochrome b gene. Tick marks represent the locations of known heteroplasmic tRNA loci and indicate the two tRNAs that each can encode. Upon replication, the telomeric hairpin of a monomer (shown in red) becomes the junction between palindromic genome units of the circular dimer, each resulting from the replication of a monomer strand. (B) Replication of a linear molecule carrying a pair of noncomplementary bases leads to an asymmetric dimer carrying different bases at the mirrored positions.

J. Peccoud et al.

Table 1 Summary information about the four oniscid lineages used in this study

                             A. vulgare BF            A. vulgare WXf (a)       A. nasatum               T. rathkei
Matriline source location    Nice, France             Helsingør, Denmark       Thuré, France            Oswego, NY
Illumina data
  Individuals sequenced      1 female                 1 female                 2 males                  5 siblings
  Technology                 HiSeq 2000, 2 × 100 bp   HiSeq 2000, 2 × 100 bp   HiSeq 2000, 2 × 100 bp   HiSeq 2500, 2 × 250 bp
SMRT data
  Individuals sequenced      13 females               7 females                12 males                 9 siblings
  Technology                 PacBio RS II, P6C4 chemistry (all four lineages)

(a) Illumina sequence data were obtained by Leclercq et al. (2016). Other sequence data were generated for ongoing genome assembly projects.

Therefore, the nature of heteroplasmy, the adaptive benefits of dimeric mtDNA molecules, and the possible conversions between the unusual conformations of the oniscid mt genome are entangled issues that must be investigated together. To take on this task, we used long reads generated by the Single-Molecule Real Time (SMRT) sequencing technology from Pacific Biosciences (Eid et al. 2009). These reads allowed us to reconstruct the haplotypes, hence the combination of tRNAs encoded by individual dimeric molecules, in four oniscid lineages. We specifically investigated whether mt molecules can encode all required tRNAs. In addition to long reads, we took advantage of unique features of the SMRT sequencing technology to identify the conformation of molecules and clarify the conversions between dimeric and monomeric forms of this atypical mitochondrial genome.

Materials and Methods

We examined four terrestrial isopod matrilines (Table 1): two from A. vulgare (named BF and WXf), one from A. nasatum, and one from T. rathkei. For each matriline, short sequencing reads (Illumina) and long reads (SMRT) have been obtained from the genomic DNA of one or several related individuals (full siblings or first cousins) as part of full-genome assembly projects.

Generation of mitochondrial genome sequences

We aimed at building the dimeric mt genome sequence of each lineage with both units placed head to head. This configuration was chosen to facilitate the use of long reads spanning mirrored anticodon sites, which are much closer to the head-to-head junction than they are to the tail-to-tail junction (Figure 1).

The mt genomes of A. vulgare BF and A. nasatum were reconstructed from contigs generated for other full-genome assembly projects. For each of these lineages, we first retrieved contigs comprising mitochondrial sequences by performing a blastn (Camacho et al. 2009) homology search against the nearly complete mitochondrial genome of A. vulgare (GenBank accession number EF643519.3). These searches returned several contigs having large portions of the mitochondrial genome sequence in a head-to-head configuration, as expected if these contigs comprised the dimeric form. The contig encompassing the longest homology in such configuration, and the lowest divergence with the reference genome, was selected and trimmed if needed.

For consensus sequence polishing, Illumina reads were aligned to the retained contig using Bowtie2 version 2.2.9 (Langmead and Salzberg 2012), which was set to the default low sensitivity (“fast” search strategy), and configured to retain only alignments including both reads of a pair. The alignment file was processed with Pilon version 1.18 (Walker et al. 2014) to correct potential errors in the mapped reference contig. These two steps were repeated, and the alignment file was inspected for remaining errors with Integrative Genomics Viewer version 2.3.92 (Robinson et al. 2011). Any error in the consensus sequence was manually corrected with Geneious Pro version 5.4 (Drummond et al. 2010).

For A. vulgare WXf, we used the BF mt genome as a reference. We corrected differences with WXf using the same mapping strategy as described above. For T. rathkei, we used the reference genome available in GenBank (accession number KR013001.1). As this genome contains a complete unit flanked by short palindromic parts representing the ends of the other genome unit, we generated the expected dimeric form and used it as a reference. We corrected potential errors and differences with our studied lineage as described for the other genomes.

From each of the mapping files generated above, alignments of reads originating from the same DNA fragment were removed with SAMtools rmdup version 1.3.1 (Li et al. 2009). Bases at each position were called by SAMtools mpileup. A custom C program was used to convert the pileup file into base counts, discarding bases with quality score <25. Sites where the rarer base was carried by >20% of the reads were considered possibly heteroplasmic. At such sites, the reference sequence of each lineage was modified to show ambiguities following IUPAC conventions. This was necessary to avoid any bias in the alignment of long reads, which minimizes mismatches at the risk of creating spurious indels.

Alignment of long reads

We aligned long reads on the corresponding reference genome of each lineage using BLASR version 1.3.1 (Chaisson and Tesler 2012) with default settings. Long reads consisted of reads of insert (Figure 2A) and circular consensuses (CCS), both being generated by the sequencing centers. A CCS is the consensus among reads from the same polymerase read (DNA fragment), and may not be called if not enough reads are present.

PacBio and Atypical Mitogenomes

For subsequent analyses, it was crucial to ascertain the alignment orientation (sequenced DNA strand) of each read mapping across the palindromic genome units. This orientation could be determined as we found that genome units were separated by a short nonsymmetrical junction (see Figure 1 and Results). We thus retained up to two alignments per read, one per orientation, which we compared to determine the most likely sequenced strand of the junction (see below). We did not simply retain the alignment with the best overall score, as sequencing errors may prevent reliable inference of the most likely mapping orientation.

All following steps were executed in R 3.3 (R Core Team 2014), with the help of functions from packages GenomicAlignments (Lawrence et al. 2013) and Biostrings (Pagès et al. 2017). Our script was based on the splitting of the BLASR alignment (.bam) file into a matrix of individual bases, in which columns correspond to successive positions of the reference sequence and rows to aligned reads. For all positions of the junction, we counted the frequency of mismatches (including deletions) between the sequence of the reference and that of a read, in each alignment orientation. We did not count insertions in the read as these could not be retained in the matrix. If mismatch frequencies between mapping orientations of a read differed by >10%, and if the lowest mismatch frequency was ≤25%, we considered the mapping orientation corresponding to that frequency as the correct one. Otherwise, the alignment orientation of a read on the junction was considered undetermined. For each read, we retained the alignment corresponding to the inferred mapping orientation on the junction.
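The orientation rule described above can be illustrated with a short Python sketch. This is not the authors' R script: the function names, the use of '-' to mark a deletion, and the reading of "differed by >10%" as an absolute difference of 0.10 between the two mismatch frequencies are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def mismatch_frequency(read_bases: Sequence[str], ref_bases: Sequence[str]) -> float:
    """Fraction of junction positions where the read differs from the reference.

    Both inputs are the junction columns of a base matrix (hypothetical layout);
    a deletion in the read is written '-' and therefore counts as a mismatch.
    Insertions are not represented, as stated in the text.
    """
    if len(read_bases) != len(ref_bases) or len(ref_bases) == 0:
        raise ValueError("read and reference must cover the same non-empty span")
    mismatches = sum(1 for ref, obs in zip(ref_bases, read_bases) if ref != obs)
    return mismatches / len(ref_bases)


def infer_orientation(freq_forward: float, freq_reverse: float,
                      min_gap: float = 0.10, max_best: float = 0.25) -> Optional[str]:
    """Return 'forward', 'reverse', or None when the orientation is undetermined.

    Assumption: the two frequencies must differ by more than min_gap (absolute
    difference) and the lower one must not exceed max_best.
    """
    best = min(freq_forward, freq_reverse)
    if abs(freq_forward - freq_reverse) > min_gap and best <= max_best:
        return "forward" if freq_forward < freq_reverse else "reverse"
    return None
```

For example, a read with mismatch frequencies of 0.05 in one orientation and 0.40 in the other would be assigned to the first orientation, whereas frequencies of 0.30 and 0.60 would leave the orientation undetermined because even the better alignment is too divergent.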
If the more likely orientation could not be inferred, we retained the alignment with the best mapping score, or selected alignments at random if scores were identical (which was the case for reads not covering the junction).

Determination of mtDNA molecule conformations

To establish the combinations of tRNAs encoded by single dimeric molecules, we used SMRT reads covering mirrored anticodon sites. Counterintuitively, a read mapping across genome units may still come from a linear monomer, as explained in Figure 2. This was inferred if all reads sequenced from the molecule mapped on the junction between genome units in the same orientation (Figure 2B). If successive reads of the molecule mapped in alternate orientations, the molecule was instead classified as a dimer (Figure 2C). If solely the first read of the molecule spanned the junction, we inferred the most likely mapping orientation of the second read, if present, based on the expectation that the start of a read should map closely to the edge of the region mapped by the previous read (Supplemental Material, Figure S1 in File S1).

Figure 2 (A) Process of SMRT sequencing. A DNA fragment is bluntly ligated to two SMRTbell adapters (blue) forming hairpins and carrying a DNA polymerase (Travers et al. 2010). During sequencing, the newly formed fragment (striped with white arrow heads pointing toward the 3′ end) leads to a polymerase read, which is composed of reads of insert (simply called “reads” for short) corresponding to alternative strands of the original fragment and separated by SMRTbell sequences. Reads are oriented as they would align to the green strand in the original fragment. (B) SMRT sequencing of a molecule whose telomeric hairpin acts as a SMRTbell. All resulting reads align on the reference [which is the dimeric mitochondrial genome containing the junction between units (shown in red)] in the same orientation, and their middle (the convergence of equally sized black arrows) corresponds to the center of the junction. (C) Sequencing of a dimeric molecule covering the junction between genome units produces reads that align in alternate orientations. The middle of these reads is unlikely to correspond to the middle of the junction. Some drawings are inspired by Fichot and Norman (2013).

For a DNA fragment to be classified as linear monomer, we further required that the middle of the region mapped by at least one read was ≤30 bp from the center of the junction, since the hairpin telomere is sequenced at the middle of the read (Figure 2B). This can only be ascertained in “complete” reads, i.e., those starting and ending at a SMRTbell adapter. We defined a complete read as one having its start and end coordinates in the original polymerase read (Figure 2A) <70 bp away from those of the previous and next reads, respectively. This length corresponds to that of a SMRTbell adapter (45 bp) plus a safety margin. If the read was the first of the polymerase read, we imposed that its start coordinate in the polymerase read was at most 70 and that at least one of its first 50 bases aligned on the reference genome. CCS were considered as complete reads. The position of the middle of a complete read on the reference genome was designated as the midpoint between start and end positions of its alignment. These were adjusted by adding or subtracting, as appropriate, the lengths of the unaligned “clipped” read parts (which are often zero).

Fragments whose reads mapped in the same orientation but failed to meet this requirement were not assigned to a molecule type. Likewise, we did not assign fragments for which fewer than two reads could be oriented with certainty, except in the following case. We reasoned that during the sequencing of a linear monomer, the polymerase, after going through the hairpin telomere, either returns to the SMRTbell or ends its polymerization. Either event terminates the read at a position that cannot be further from the center of the telomere than where the start of the read is, assuming that the read starts at the SMRTbell. We classified the parent fragment as dimer if the end of the read mapped at a distance from the junction that was at least 100 bp longer than the distance between the junction and the mapping position of the read start. This requirement must be fulfilled by actual mapping positions and by those considering clipped read parts. To exclude first reads (of a polymerase read) that may not start at the SMRTbell, we imposed that such a read started at coordinate <70 bp in its parent polymerase read, and that its left clipped part was <50 bp.

Figure 3 Alignment between dimeric mtDNA sequences of three Armadillidium lineages at the region of the junction between “heads” of genome units (top), and homologous region in T. rathkei (bottom). Bases shown in bold font over a gray background constitute the junctions that separate the heads of genome units. The sequences flanking a junction are the reverse complement of each other. Sequences were aligned by the muscle algorithm (Edgar 2004). The T. rathkei region was not aligned as its divergence with the other lineages would have reduced legibility.

Establishment of haplotypes carried by mtDNA dimers

We then established haplotypes, hence tRNA genes carried by mtDNA molecules assigned to dimers. Haplotypes were established by concatenating the bases at heteroplasmic sites in the matrix we generated. Prior to that, we slightly corrected alignments as we found frequent 1-bp deletions in reads at these sites, associated with mismatches at the immediate flanking positions. Most mismatched bases corresponded to one of the two possible bases carried by short reads at a heteroplasmic position. We thus swapped the deletion and the mismatched base in the base matrix, which reduced mismatches without altering the original read sequence. We believe that BLASR improperly managed the alignment of reads on the ambiguous nucleotides that we placed in the reference sequences at heteroplasmic positions.

Figure 4 Sequencing depth of short reads on the mitochondrial dimeric genomes of four oniscid lineages. Colored segments indicate the presence of SNPs, each presenting two bases at very similar relative frequencies (green: adenine, blue: cytosine, orange: guanine, red: thymine). Only SNPs for which the rarer bases are carried by ≥20% of the mapped reads, and whose sequencing depth is >20% of the mean depth, are shown. Converging red triangles represent the location of the head-to-head junctions.

Despite these corrections, the frequency of bases that were not supported by Illumina data at these sites was 7.5% on average, not counting deletions. Because of such a high rate of base substitution errors, all reads of a parent polymerase read may not support the same haplotype. In such cases, we selected the haplotype according to four successive criteria: (i) higher number of sites having the bases supported by Illumina data, (ii) presence in the CCS, (iii) higher frequency of the haplotype among reads of the polymerase read, and

Table 2 Location and composition of heteroplasmic sites found in the mtDNA of four oniscid lineages

Location                      Matriline         Nucleotide position   Base counts (A/C/G/T)
tRNA Leu2 (TAA)/Leu1 (TAG)    A. vulgare BF     9,171                 11,611 / 0 / 11,322 / 7
                              A. vulgare WXf    9,168                 7,010 / 1 / 6,977 / 0
                              A. nasatum        9,176                 4,211 / 0 / 4,192 / 1
                              T. rathkei        9,279                 3,079 / 0 / 3,260 / 4
tRNA Gly (TCC)/Arg (TCG)      A. vulgare BF     11,601                5 / 13,516 / 12,951 / 7
                              A. vulgare WXf    11,604                4 / 7,659 / 7,562 / 4
                              A. nasatum        11,605                6 / 4,352 / 4,377 / 1
                              T. rathkei        11,718                1 / 2,272 / 2,473 / 0
tRNA Val (TAC)/Ala (TGC)      A. vulgare BF     12,004                12,277 / 0 / 12,073 / 1
                              A. vulgare WXf    12,007                7,014 / 1 / 7,247 / 5
                              A. nasatum        12,008                4,042 / 1 / 4,282 / 3
                              T. rathkei        12,121                2,260 / 1 / 2,202 / 1
tRNA Gly/Arg                  A. vulgare WXf    11,606                4 / 7,375 / 2 / 7,859
nad3 gene (a)                 A. vulgare WXf    11,784                7,397 / 7,400 / 1 / 3
12S rRNA                      A. vulgare WXf    13,474                0 / 6,755 / 1 / 6,542

Sites that are shared across lineages are designated after the tRNAs they encode depending on the anticodon (shown in parentheses in 5′ to 3′ orientation). Base counts refer to the number of mapped Illumina reads carrying a given base. Positions are given in coordinates of the first genome unit.
(a) Variation at position 11,784 involves a change in the nad3 protein sequence.
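Base counts such as those in Table 2 are the input of the heteroplasmy criterion given in Materials and Methods (rarer base carried by >20% of the reads, site then written as an IUPAC ambiguity in the reference). The following Python sketch shows one way this rule could be applied. It is not the authors' custom C program: the function name is hypothetical, and taking the "rarer base" to be the second most frequent base, with its fraction computed over all counted reads, is an assumption.

```python
# IUPAC symbols for the six possible two-base ambiguities.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_site(counts: dict, min_minor_fraction: float = 0.20) -> str:
    """Return an IUPAC ambiguity symbol for a possibly heteroplasmic site,
    or the major base otherwise.

    counts maps each of 'A', 'C', 'G', 'T' to the number of reads carrying it
    (after quality filtering, which is not modeled here).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads cover this site")
    # Rank bases from most to least frequent.
    ranked = sorted(counts, key=counts.get, reverse=True)
    major, minor = ranked[0], ranked[1]
    if counts[minor] / total > min_minor_fraction:
        return IUPAC_TWO_BASE[frozenset((major, minor))]
    return major


# Counts taken from Table 2 (A. vulgare BF, position 9,171).
site_bf_9171 = {"A": 11611, "C": 0, "G": 11322, "T": 7}
symbol = call_site(site_bf_9171)
```

With the counts shown, the A/G site is reported with the ambiguity symbol R, whereas a site where the second base is only carried by a handful of reads keeps its major base.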

(iv) fewer mismatches with the most frequent haplotype found across all reads.

Identification of fragments with noncomplementary bases

If a dimer resulted from the replication of a linear monomer, it should present identical genome units, and encode a single tRNA type per pair of mirrored loci. As haplotypes clearly contradicted this prediction (see Results), we reasoned that the bases forming the two DNA strands of a linear monomer converted into a dimer may not be complementary at the anticodon sites (Figure 1B).

To assess base complementarity within linear monomers, whose sequencing reads unite both strands of a molecule (Figure 2B), we compared bases between mirrored heteroplasmic sites covered by the same read. We also looked for base complementarity in fragments that were sequenced with two SMRTbells, by comparing bases carried by reads mapping on different mtDNA strands. These fragments include those classified as dimers, and those that do not span the junction between genome units, which hence could not be classified (hereafter called “unclassified” fragments). Unclassified fragments were defined as molecules whose reads all aligned ≥100 bp away from the junction between genome units. Alignment positions took clipped read parts into account.

Rather than a binary value, we developed an index to quantify the complementarity of bases between DNA strands of a molecule (polymerase read) at a given position, as each strand may be sequenced several times. This index ignores all reads carrying rare bases or deletions at this position. We define B_f and B_r as the most frequent bases among forward-aligned reads and reverse-aligned reads, respectively. If the two possible bases have equal counts among reads of a given orientation, one of them is chosen at random as the most frequent. We let f be the fraction of forward-aligned reads carrying B_f and r be the fraction of reverse-aligned reads carrying B_r. We define our index of complementarity as:

$$
i =
\begin{cases}
\dfrac{f + r}{2}, & \text{if } B_r = B_f \\[2ex]
-\dfrac{f + r}{2}, & \text{otherwise.}
\end{cases}
$$

Index i varies from −1, if all bases between reads mapped in opposite orientation are noncomplementary, to 1 if all are complementary. Intermediate values represent conflicting results between reads mapping in the same orientation. We defined a per-fragment index I that averages i over sites covered by the fragment. To minimize the influence of sequencing errors, values of I that were not obtained from at least two bases per strand were ignored. These two bases may either be sequenced at the same site in two reads from the same strand, or sequenced at two sites in the same read. We considered that a fragment carried noncomplementary bases or complementary bases if I was <−0.9 or >0.9, respectively. Fragments whose indices fell between these values were not considered.

Data availability

Annotated mitochondrial genome sequences are available from GenBank under accession numbers MF187611-MF187614. Sequencing reads that mapped on mitochondrial genomes are available at the National Center for Biotechnology Information short read archive under accession number SRP108987. All in-house scripts and programs are available upon request. File S1 contains Supplemental Text, Figure S1, Figure S2, Figure S3, and Table S1.

Results

Mitochondrial genome sequences and polymorphic sites

Dimeric genome sequences of all four lineages were successfully reconstructed, including junctions between the heads of genome units. These junctions are 34- to 42-bp long in

Figure 5 Haplotypes found in dimeric mtDNA molecules at pairs of heteroplasmic anticodon sites in four oniscid lineages. Dimers are shown as converging curved gray arrows as in Figure 1A. Bases of the dominant haplotype are shown on the coding strand of tRNAs for each genome unit, and corresponding anticodons are indicated by the name of tRNAs in front of these bases. For each pair of mirrored sites, ratios represent the number of sequenced molecules carrying the dominant two-base haplotype over the number of successfully sequenced molecules at these sites. Error bars represent 95% confidence intervals estimated by the Clopper-Pearson method of R package binom (Dorai-Raj 2014). Bar plots represent the fraction of sequenced molecules carrying a six-site haplotype, among dimeric molecules that could be successfully sequenced at all sites. To minimize the influence of sequencing errors, molecules that showed deletions or rare bases (Table 2) at any of these sites were ignored. Bases in red represent differences from the dominant haplotype of the same lineage. Symmetrical haplotypes, which mirror the bases found in one genome unit, are shown in bold. Numbers in parentheses represent the number of sequenced molecules carrying a haplotype. For T. rathkei, we merged counts for each haplotype and its mirrored counterpart, as the mapping orientation of reads across the palindromic genome units could not be determined with certainty (see Results section).
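The Clopper-Pearson intervals mentioned in the Figure 5 caption were obtained with the R package binom. For readers who want to see what the method computes, the sketch below is a dependency-free Python illustration; it is not the code used in the study. The function names are hypothetical, and the bounds are found by bisection on the binomial tail probabilities, which is one of several equivalent ways to obtain the exact interval.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                   + j * log_p + (n - j) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def clopper_pearson(k: int, n: int, conf: float = 0.95):
    """Exact (Clopper-Pearson) confidence interval for k successes out of n."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    alpha = 1.0 - conf

    def solve(func, target, increasing):
        # Bisection on [0, 1] for a monotonic function of p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if (func(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha / 2; upper bound: P(X <= k | p) = alpha / 2.
    lower = 0.0 if k == 0 else solve(lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2, True)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2, False)
    return lower, upper
```

Calling `clopper_pearson(k, n)` with k the number of molecules carrying the dominant haplotype and n the number of successfully sequenced molecules returns the lower and upper bounds of the kind drawn as error bars.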

Armadillidium lineages (Figure 3), and their sequences correspond to the “inverted repeats” that have been located near the 12S rRNA gene of mtDNA monomers in A. vulgare (Doublet et al. 2013). These sequences are predicted to form secondary hairpin structures that were suspected to constitute the telomeres of linear monomers (Doublet et al. 2013). The location of these sequences at the junctions between genome units in dimers corroborates this hypothesis, under the model of monomer replication shown in Figure 1. In T. rathkei, only one base separates the heads of genome units (Figure 3). The opposite junction, located between the cytochrome b genes of genome units, is 0–3 bp long (data not shown), depending on the lineage.

The high sequencing depth of Illumina reads aligned to their respective dimeric genomes clearly outlined heteroplasmic sites as SNPs (Figure 4). Three pairs of mirrored SNPs are shared by all lineages (Table 2), and correspond to variation at the three tRNA sites previously identified in T. rathkei and C. convexus (Chandler et al. 2015). Only one of these sites (in tRNA Ala/Val, Table 2) was previously known to be heteroplasmic in A. vulgare and A. nasatum (Marcadé et al. 2007; Doublet et al. 2008). All lineages present the same two expected bases at very similar frequencies (50%) at the shared SNPs. The corresponding tRNA genes hence present roughly equal frequencies among sequenced individuals of a lineage, or within the only sequenced animal in the case of A. vulgare lineages (Table 1). Within-individual variation was systematically observed in previous studies (Doublet et al. 2008; Chandler et al. 2015), and can safely be extrapolated to all four lineages. Other bases may be found at the shared SNPs (Table 2), but their extremely low frequencies, measured in thousandths, can be explained by sequencing errors. The A. vulgare WXf lineage shows three additional SNPs that present the same pattern of variation as shared heteroplasmic sites, but those are not located in anticodons of tRNA genes (Table 2).

Figure 6 Top: frequencies of molecules having noncomplementary bases among different types of mtDNA molecules (see text) in three Armadillidium lineages. Red parts of molecules represent hairpin telomeres or junctions between genome units, and blue parts represent ligated SMRTbell adapters (see Figure 2). Ratios above points indicate the numbers of molecules with noncomplementary bases over molecules that could be characterized for base complementarity. Error bars represent 95% confidence intervals. Bottom: example of an “unclassified” mtDNA molecule from A. vulgare WXf having noncomplementary bases at the four variable sites it covers. These sites are named after their genomic positions (Table 2). Rows represent successive reads sequenced from complementary strands (see Figure 2A). Each strand has been sequenced six times, and reads from the reverse strand (with respect to the reference genome) have been reverse-complemented. Sequencing errors are shown in gray.

Asymmetry at anticodon sites of dimeric mtDNA molecules

In the three Armadillidium lineages, the mapping positions and orientations of long reads on the junction between genome units (Figure S2 in File S1) matched our predictions (Figure 2, B and C), which allowed us to classify DNA fragments as monomers or dimers without ambiguity. In T. rathkei, the 1-bp-long junction (Figure 3) was too short to determine the most likely mapping orientation of reads, given the high error rate of SMRT sequences. In this lineage, we could classify some molecules as dimers on the sole basis of mapping coordinates of reads, and we could not technically identify monomers. The other lineages vary considerably with respect to the frequency of dimeric molecules (Table S1 in File S1). This variation may simply reflect differences in the size selection of fragments to be sequenced during library preparations, as sequenced fragments classified as linear monomers are considerably shorter than dimers (Figure S3 in File S1).

In each Armadillidium lineage, reads that spanned all six sites of dimeric molecules indicated the presence of a dominant haplotype (Figure 5). This haplotype (“GCAGGA”) is the same in A. vulgare BF and A. nasatum. In A. vulgare WXf, the dominant haplotype (“AGGACG”) is the reverse of the aforementioned one. We double-checked that the head-to-head junction of the A. vulgare WXf reference genome was in the same orientation (strand) as that of the other two lineages. This 42-bp junction between genome units (Figure 3) allows unambiguous orientation of reads, hence of haplotypes. A different dominant haplotype (“ACAGGG”/“GGGACA”) is found in T. rathkei. We cannot establish the orientation of this haplotype because reads have almost equal probability of mapping on either strand of the reference genome, as explained previously.

Importantly, in each lineage, the prevalent haplotype carries different bases at each pair of mirrored anticodons, and thus represents molecules that encode all six possible tRNAs at these loci. While relatively few sequencing reads spanned all six sites without any apparent sequencing error, asymmetry between genome units of a dimer is confirmed by the more numerous sequenced molecules that covered at least one pair of mirrored anticodons: 90% of them carry different bases at any pair of sites (Figure 5), a result that extends to the three private SNPs of A. vulgare WXf (haplotype counts not shown). Nevertheless, three six-base haplotypes (shown in bold in Figure 5) are symmetric, mirroring the bases found in one of the two genomic units in the dominant haplotype.
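The notions of asymmetric and symmetric six-site haplotypes can be illustrated with a minimal Python sketch. It rests on an assumption inferred from the wording above ("reverse", "mirroring") rather than on the authors' code: the six bases are listed in genomic order, site i is mirrored by site 7 - i across the head-to-head junction, and a mirrored pair is symmetric when both sites show the same letter. The function names are hypothetical.

```python
from itertools import product


def mirrored_pairs(haplotype):
    """Pair each site with its mirror across the junction (assumed layout:
    first site with last site, second with second-to-last, and so on)."""
    n = len(haplotype)
    return [(haplotype[i], haplotype[n - 1 - i]) for i in range(n // 2)]


def is_asymmetric(haplotype):
    """True if every mirrored pair carries two different bases."""
    return all(a != b for a, b in mirrored_pairs(haplotype))


def asymmetric_variants(haplotype):
    """Enumerate haplotypes obtained by swapping the two bases of any subset
    of mirrored pairs; three pairs give 2**3 = 8 combinations."""
    pairs = mirrored_pairs(haplotype)
    variants = []
    for flips in product((False, True), repeat=len(pairs)):
        left = [b if flip else a for (a, b), flip in zip(pairs, flips)]
        right = [a if flip else b for (a, b), flip in zip(pairs, flips)]
        variants.append("".join(left) + "".join(reversed(right)))
    return variants


# Dominant haplotype quoted in the text for A. vulgare BF and A. nasatum.
variants = asymmetric_variants("GCAGGA")
```

Under this assumed encoding, the dominant haplotypes quoted for A. vulgare WXf ("AGGACG") and T. rathkei ("ACAGGG") both belong to the enumerated set, whereas a haplotype that mirrors one genome unit, such as "GCAACG", is flagged as symmetric.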

These symmetric haplotypes are supported by a single sequenced DNA molecule each, all of which were classified as dimers by the mapping coordinates of reads, rather than orientations. As coordinate-based classification may be affected by incorrect in silico delineation of reads in raw polymerase reads (Figure 2A), it is not strictly excluded that these haplotypes are in fact carried by linear monomers. Most of the other minor six-site haplotypes differ from the dominant one by just one base (shown in red in Figure 5).

Linear monomers with noncomplementary bases

Estimates of the fraction of linear monomers carrying noncomplementary bases varied across lineages (Figure 6), from 28% in A. vulgare BF to 82% in A. vulgare WXf. No dimeric molecule was found to carry noncomplementary bases, and we had no reason to expect any. By contrast, 11–35% of unclassified mtDNA fragments (those of undetermined conformation, see Materials and Methods) did present noncomplementary bases at heteroplasmic positions (Figure 6).

Discussion

Source of apparent heteroplasmy in oniscids

We were able to sequence individual mtDNA dimeric molecules of four oniscid lineages, using long reads that covered mirrored tRNA loci showing apparent heteroplasmy. Almost all sequenced dimers present asymmetric haplotypes at the three pairs of anticodon sites. Consequently, vital sequence variation between mt genome units is distributed within molecules. Asymmetric dimers that covalently link genome units encoding different tRNAs avoid the fitness costs of balancing selection. They ensure good balance among tRNA genes in an organelle, and minimize the risk of transmitting molecules that do not encode certain tRNAs to organelles or cells. Consequently, the mt genome of these oniscids should be regarded as the 28 kb dimer, as only it carries all essential tRNA genes. Under this view, variation at tRNA loci occurs within a mt genome, not between mt genomes, and may not be defined as heteroplasmy sensu stricto.

Some of the sequence variation between homologous tRNA loci is still distributed between molecules and corresponds to true heteroplasmy: linear monomers with fully complementary strands may present different haplotypes within a lineage, and, most likely, within an individual. Between-monomer variation is, however, unlikely to be maintained by balancing selection, as our results suggest that monomers do not replicate.

Evidence argues against monomer replication

Indeed, monomer replication into dimers should eventually equalize the frequencies of asymmetric haplotypes among dimers and of molecules with noncomplementary bases among monomers (Supplemental Text in File S1). Yet, these frequencies differ, especially in A. vulgare BF, whose dimeric haplotypes are almost all asymmetric (Figure 5), whereas <30% of monomers have mismatched bases (Figure 6). While this discrepancy could be explained by the rapid death of mitochondria inheriting symmetric dimers lacking vital tRNAs, the rate of death required to explain the almost complete absence of symmetric haplotypes in lineage BF seems unsustainable. We therefore reasonably conclude that most, if not all, dimers derive from the replication of other circular dimers rather than monomers in the BF lineage, and, by parsimony, in the other lineages as well.

Our results also argue against the replication of monomers into other monomers. Replication produces complementary strands, and should therefore quickly eliminate all molecules carrying noncomplementary bases. The successful SMRT sequencing of both strands of monomers (Figure 2B) also suggests that such molecules should not be able to replicate, due to their hairpins, and may at best become dimers (Figure 1). It has been suggested that monomers may replicate via a rare circular form (Doublet et al. 2013), but we found no evidence for such molecules (Supplemental Text in File S1).

A mechanism to generate linear monomers and noncomplementary strands

If linear monomers do not replicate, their existence and maintenance must be explained by another mechanism. We propose that these molecules arise from the self-renaturation of single-stranded dimers. Palindromic genome units would become strands that are fully complementary, except at sites where the molecule is asymmetric, thereby explaining the existence of monomers with noncomplementary bases. SMRT sequencing should not have produced single-stranded DNA molecules, since the whole processing of DNA has been (and must be) performed at, or below, room temperature without denaturing agents. In living cells, however, a single-stranded dimer may be produced by DNA replication, during which one strand serves as template while the other strand is lagging, as observed in Drosophila (Goddard and Wolstenholme 1980; Joers and Jacobs 2013). Assuming mtDNA replication in oniscids proceeds similarly, we suggest that monomers with noncomplementary bases are formed by the annealing of lagging strands of asymmetric dimers before these strands had a chance to serve as replication templates. As virtually all dimers present asymmetric haplotypes, this mechanism should yield monomers that all carry noncomplementary bases. Monomers with complementary bases may arise from these monomers in which mismatched bases have been replaced by DNA repair enzymes (Li 2008), or from the hypothesized cleavage of a dimer into two monomers (Doublet et al. 2013). Our data do not reveal which mechanism is more likely. However, they indicate that at least one may have occurred at a higher rate in the sequenced individuals of lineage A. vulgare BF, which present a much higher fraction of monomers with fully complementary strands than the other lineages. Irrespective of the nature of these mechanisms, between-monomer variation would result from the continuous generation of monomers with complementary bases within individuals, rather than from the recurrent death of zygotes that do not inherit such variation (balancing selection).
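What it means for a sequenced molecule to carry noncomplementary bases at a site can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of the Materials and Methods: it assumes that base calls from repeated passes over each strand are available per site, that calls from the reverse strand have already been reverse-complemented to the reference orientation (as in the bottom panel of Figure 6), and that a simple majority vote absorbs sequencing errors. The function names and the example calls are hypothetical.

```python
from collections import Counter


def consensus(calls):
    """Return the base seen in a strict majority of passes, or None if no
    base reaches a strict majority (or if there are no calls)."""
    if not calls:
        return None
    base, count = Counter(calls).most_common(1)[0]
    return base if count > len(calls) / 2 else None


def is_noncomplementary(forward_calls, reverse_calls):
    """Compare per-strand consensus bases at one site.

    Both arguments hold base calls expressed in reference orientation.
    Returns True if the two strands disagree, False if they agree, and
    None if either strand has no clear consensus.
    """
    forward = consensus(forward_calls)
    reverse = consensus(reverse_calls)
    if forward is None or reverse is None:
        return None
    return forward != reverse


# Hypothetical site covered by six passes per strand, one pass in error.
mismatched = is_noncomplementary("GGGGAG", "AAAAAA")
matched = is_noncomplementary("GGGGGG", "GGGAGG")
undecided = is_noncomplementary("GA", "GG")
```

In this sketch, a molecule would count toward the numerator of the ratios in Figure 6 if at least one covered site is called noncomplementary; sites without a clear per-strand consensus are left uncharacterized rather than guessed.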

Remarkably, a monomer with noncomplementary bases contains the information needed to produce two tRNAs at just one locus, using both DNA strands. To our knowledge, this way of compressing vital information has not been reported to date. As a gene cannot be transcribed in both strands, such information can only be used after conversion of a monomer into a dimer (Figure 1B), a process that we suggest may not occur. This raises the question of the adaptive benefits of linear mtDNA monomers in general. Linear monomers could simply be byproducts of the replication of dimers, during which self-renaturation may happen at various rates. Further studies assessing the frequencies and production rates of monomers may help to determine the functional importance of these molecules.

Dimerization and mitochondrial genome compaction

Our observations corroborate previous suspicions that apparent heteroplasmy was permitted by dimerization (Doublet et al. 2012; Chandler et al. 2015), and allow a scenario for the origin of asymmetric dimers to be drawn.

The ancestral mtDNA dimeric genome of oniscids probably duplicated all genes, and was therefore totally palindromic. This is supported by the absence of apparent heteroplasmy at the tRNA-Ala/Val locus (Table 2) in certain species that show dimeric mtDNA (Doublet et al. 2008, 2012). These species (which have not been investigated at the other two heteroplasmic loci) carry tRNA-Ala at this locus, as do isopods with “standard” mtDNA (Kilpert et al. 2012). In an ancestral dimeric genome, one of the mirrored tRNA-Ala genes would have become a new tRNA-Val gene by “anticodon shift” (from TGC to TAC via base substitution), a type of event that is suspected to have occurred in diverse eukaryotic lineages (Rogers and Griffiths-Jones 2014). Then, as long as this newly created asymmetry subsisted, the two mirrored tRNA-Val genes that were initially present at another locus could be deleted without compromising viability. The other two heteroplasmic tRNA loci may have evolved in a similar fashion in the shared ancestor of the species we studied.

The evolution toward shorter, asymmetric dimers contributed to the extreme level of compaction of mt genome units in oniscids (Doublet et al. 2015), and may have been adaptive if it saved energy for mtDNA replication. Similar mtDNA compaction and tRNA production rates could have been achieved by simply deleting one of the mirrored tRNA genes at several pairs of loci. The evolutionary path taken clearly minimized the asymmetry between genome units, possibly to sustain transition between mtDNA conformations and recombination between units (discussed in the next section).

Interestingly, dimerization appears to have permitted further genome compaction. While this evolution seems counterproductive in terms of space saving, the net increase in molecule size should not be seen as inefficient compaction. An asymmetric dimer may be slightly more efficient in storing genetic content than two monomeric mtDNA molecules carrying all tRNA genes, and should not require more energy for replication. Since returning to a standard monomeric genome without losing several tRNA genes now requires an improbable chain of mutational events, the maintenance of dimeric genomes tells little about the potential initial benefits of dimerization. These benefits may be revealed by investigating lineages with fully palindromic dimeric genomes (i.e., without apparent heteroplasmy), if any exist.

Recombination and concerted evolution in a dimeric genome

Conservatively, each individual can be considered as carrying a single asymmetric haplotype at dimeric molecules. We indeed cannot exclude that rare haplotypes found in dimers (Figure 5) simply result from sequencing errors, which we estimated at 7.5% at these sites. The haplotype that is shared by A. vulgare BF and A. nasatum (Figure 5) may represent an ancestral state that has been maintained since the last common ancestor of both species, 20 MYA (Becking et al. 2017). Alternatively, this haplotype may have evolved independently in these two Armadillidium lineages. Evolutionary convergence is less parsimonious, considering that eight different haplotypes (2^3, considering their orientation with respect to the head-to-head junction) can encode all required tRNAs, and all should be equivalent with respect to fitness. Long-term maintenance of a given haplotype is expected, since a mutation at one of the asymmetric anticodon sites produces a variant that does not encode all tRNAs and that should be counter-selected. This also applies to a crossing over between different genome units of two dimers.

Crossing overs between genome units within a dimer would, however, lead to a new haplotype encoding all required tRNAs. Such an event may have occurred in A. vulgare WXf, causing an inversion of the region encompassing the head-to-head junction between the two genome units and effectively reversing the haplotype found in the other Armadillidium lineages. Another crossing over may have occurred between the tRNA Leu1/Leu2 locus and the two other loci, explaining the haplotype found in T. rathkei (Figure 5), a species that diverged from Armadillidium 40 MYA (Becking et al. 2017).

Recombination of mtDNA has been reported in diverse lineages such as scorpions (Gantenbein et al. 2005), bivalves (Burzynski et al. 2003), teleost fishes (Hoarau et al. 2002; Tatarenkov and Avise 2007), lizards (Ujvari et al. 2007), and humans (Slate and Gemmell 2004). In oniscids, recombination of mtDNA can explain not only the different haplotypes we established, but also the almost perfect identity of genome units within highly divergent species. The concerted evolution of genome units should benefit the peculiar oniscid mt genome in at least two ways. First, adaptive evolution of dimeric mtDNA molecules would be severely constrained without recombination or any other mechanism able to homogenize genome units. In the absence of such mechanisms, an adaptive mutation would indeed remain at a “heterozygous” state until the equivalent mutation occurs at the mirrored site of the other genome unit. Second, recombination restricts divergence of mirrored mitochondrial genes that are bound to fulfill the same fundamental function (cellular respiration or mt protein synthesis). As an alternative to recombination, replication of monomers with complementary bases into dimers can homogenize genome units, thereby offering an adaptive explanation for the existence of linear monomers. We, however, view this process as deleterious, since it should predominantly yield totally symmetrical dimers lacking tRNA genes.

Regardless of the underlying mechanisms, homogenization of genome units of a dimer proceeds at a moderate pace. Indeed, the genome units of A. vulgare WXf differ at three private sites (Table 2), and other similar sites have been reported in lineages from C. convexus and T. rathkei (Chandler et al. 2015). None of the three private WXf mutations is involved in the encoding of alternative tRNAs, and no evidence suggests that variation at these positions is selected. Variation at these sites is simply maintained through the inheritance of the asymmetric dimers carrying it. The accumulation of three asymmetric mutations in A. vulgare WXf must have taken thousands of generations. Relatively long maintenance of asymmetric mutations may have left more time for the loss of tRNA loci, under the evolutionary scenario we described previously. Once these tRNA loci have been lost, variation at mirrored anticodons must have been maintained for millions of generations by the selection of asymmetric molecules in the face of homogenization of genome units.

Acknowledgments

We thank Isabelle Giraud, Thomas Becking, and Lise Ernenwein for animal rearing and preparation of DNA samples used for sequencing. We also thank Matthew Hahn and two anonymous reviewers for their recommendations and comments on the manuscript. This work was funded by European Research Council Starting Grant 260729 (EndoSexDet) and Agence Nationale de la Recherche Grant ANR-15-CE32-0006-01 (CytoSexDet) to R.C., the 2015–2020 State-Region Planning Contract and European Regional Development Fund, and intramural funds from the Centre National de la Recherche Scientifique and the University of Poitiers. C.C. was funded by the National Science Foundation (grant NSF-DEB1453298).

Literature Cited

Abascal, F., D. Posada, and R. Zardoya, 2012 The evolution of the mitochondrial genetic code in arthropods revisited. MDN 23: 84–91.
Becking, T., I. Giraud, M. Raimond, B. Moumen, C. Chandler et al., 2017 Diversity and evolution of sex determination systems in terrestrial isopods. Sci. Rep. 7: 1084.
Boore, J. L., 1999 Animal mitochondrial genomes. Nucleic Acids Res. 27: 1767–1780.
Breton, S., and D. T. Stewart, 2015 Atypical mitochondrial inheritance patterns in eukaryotes. Genome 58: 423–431.
Burzynski, A., M. Zbawicka, D. O. F. Skibinski, and R. Wenne, 2003 Evidence for recombination of mtDNA in the marine mussel Mytilus trossulus from the Baltic. Mol. Biol. Evol. 20: 388–392.
Camacho, C., G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos et al., 2009 BLAST+: architecture and applications. BMC Bioinformatics 10: 421.
Chaisson, M. J., and G. Tesler, 2012 Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory. BMC Bioinformatics 13: 238.
Chandler, C. H., M. Badawi, B. Moumen, P. Greve, and R. Cordaux, 2015 Multiple conserved heteroplasmic sites in tRNA genes in the mitochondrial genomes of terrestrial isopods (Oniscidea). G3 5: 1317–1322.
Dickey, A. M., V. Kumar, J. K. Morgan, A. Jara-Cavieres, R. G. Shatters et al., 2015 A novel mitochondrial genome architecture in thrips (Insecta: Thysanoptera): extreme size asymmetry among chromosomes and possible recent control region duplication. BMC Genomics 16: 439.
Dorai-Raj, S., 2014 Binom: Binomial Confidence Intervals For Several Parameterizations. Available at: https://cran.r-project.org/package=binom. Accessed: January 9, 2015.
Doublet, V., C. Souty-Grosset, D. Bouchon, R. Cordaux, and I. Marcadé, 2008 A thirty million year-old inherited heteroplasmy. PLoS One 3: e2938.
Doublet, V., R. Raimond, F. Grandjean, A. Lafitte, C. Souty-Grosset et al., 2012 Widespread atypical mitochondrial DNA structure in isopods (crustacea, Peracarida) related to a constitutive heteroplasmy in terrestrial species. Genome 55: 234–244.
Doublet, V., Q. Helleu, R. Raimond, C. Souty-Grosset, and I. Marcadé, 2013 Inverted repeats and genome architecture conversions of terrestrial isopods mitochondrial DNA. J. Mol. Evol. 77: 107–118.
Doublet, V., E. Ubrig, A. Alioua, D. Bouchon, I. Marcadé et al., 2015 Large gene overlaps and tRNA processing in the compact mitochondrial genome of the crustacean Armadillidium vulgare. RNA Biol. 12: 1159–1168.
Drummond, A. J., B. Ashton, S. Buxton, M. Cheung, A. Coope et al., 2010 Geneious v5. Available at: http://www.geneious.com/. Accessed: July 19, 2013.
Edgar, R. C., 2004 MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32: 1792–1797.
Eid, J., A. Fehr, J. Gray, K. Luong, J. Lyle et al., 2009 Real-time DNA sequencing from single polymerase molecules. Science 323: 133–138.
Fichot, E. B., and R. S. Norman, 2013 Microbial phylogenetic profiling with the Pacific Biosciences sequencing platform. Microbiome 1: 10.
Gantenbein, B., V. Fet, I. A. Gantenbein-Ritter, and F. Balloux, 2005 Evidence for recombination in scorpion mitochondrial DNA (Scorpiones: Buthidae). Proc. R. Soc. Lond., Ser. B: Biol. Sci. 272: 697–704.
Gissi, C., G. Pesole, F. Mastrototaro, F. Iannelli, V. Guida et al., 2010 Hypervariability of ascidian mitochondrial gene order: exposing the Myth of deuterostome organelle genome stability. Mol. Biol. Evol. 27: 211–215.
Goddard, J. M., and D. R. Wolstenholme, 1980 Origin and direction of replication in mitochondrial-dna molecules from the genus Drosophila. Nucleic Acids Res. 8: 741–757.
Helfenbein, K. G., H. M. Fourcade, R. G. Vanjani, and J. L. Boore, 2004 The mitochondrial genome of Paraspadella gotoi is highly reduced and reveals that chaetognaths are a sister group to protostomes. Proc. Natl. Acad. Sci. USA 101: 10639–10643.
Hoarau, G., S. Holla, R. Lescasse, W. T. Stam, and J. L. Olsen, 2002 Heteroplasmy and evidence for recombination in the mitochondrial control region of the flatfish Platichthys flesus. Mol. Biol. Evol. 19: 2261–2264.

Joers, P., and H. T. Jacobs, 2013 Analysis of replication intermediates indicates that Drosophila melanogaster mitochondrial DNA replicates by a strand-coupled theta mechanism. PLoS One 8: e53249.
Kilpert, F., C. Held, and L. Podsiadlowski, 2012 Multiple rearrangements in mitochondrial genomes of isopoda and phylogenetic implications. Mol. Phylogenet. Evol. 64: 106–117.
Langmead, B., and S. L. Salzberg, 2012 Fast gapped-read alignment with Bowtie 2. Nat. Methods 9: 357–359.
Lawrence, M., W. Huber, H. Pages, P. Aboyoun, M. Carlson et al., 2013 Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9: e1003118.
Leclercq, S., J. Thézé, M. A. Chebbi, I. Giraud, B. Moumen et al., 2016 Birth of a W sex chromosome by horizontal transfer of Wolbachia bacterial symbiont genome. Proc. Natl. Acad. Sci. USA 113: 15036–15041.
Li, G. M., 2008 Mechanisms and functions of DNA mismatch repair. Cell Res. 18: 85–98.
Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The sequence alignment/map format and SAMtools. Bioinformatics 25: 2078–2079.
Liu, Y. G., T. Kurokawa, M. Sekino, T. Tanabe, and K. Watanabe, 2013 Complete mitochondrial DNA sequence of the ark shell Scapharca broughtonii: an ultra-large metazoan mitochondrial genome. Comp. Biochem. Physiol. Part D Genomics Proteomics 8: 72–81.
Marcadé, I., R. Cordaux, V. Doublet, C. Debenest, D. Bouchon et al., 2007 Structure and evolution of the atypical mitochondrial genome of Armadillidium vulgare (Isopoda, crustacea). J. Mol. Evol. 65: 651–659.
Okimoto, R., J. L. Macfarlane, D. O. Clary, and D. R. Wolstenholme, 1992 The mitochondrial genomes of two nematodes, Caenorhabditis elegans and Ascaris suum. Genetics 130: 471–498.
Pagès, H., P. Aboyoun, R. Gentleman, and S. Debroy, 2017 Biostrings: String Objects Representing Biological Sequences, and Matching Algorithms. R package version 2.44.1.
Raimond, R., I. Marcadé, D. Bouchon, T. Rigaud, J. P. Bossy et al., 1999 Organization of the large mitochondrial genome in the isopod Armadillidium vulgare. Genetics 151: 203–210.
R Core Team, 2014 R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.
Robinson, J. T., H. Thorvaldsdottir, W. Winckler, M. Guttman, E. S. Lander et al., 2011 Integrative genomics viewer. Nat. Biotechnol. 29: 24–26.
Rogers, H. H., and S. Griffiths-Jones, 2014 tRNA anticodon shifts in eukaryotic genomes. RNA 20: 269–281.
Singh, T. R., G. Tsagkogeorga, F. Delsuc, S. Blanquart, N. Shenkar et al., 2009 Tunicate mitogenomics and phylogenetics: peculiarities of the Herdmania momus mitochondrial genome and support for the new chordate phylogeny. BMC Genomics 10: 534.
Slate, J., and N. J. Gemmell, 2004 Eve ‘n’ Steve: recombination of human mitochondrial DNA. Trends Ecol. Evol. 19: 561–563.
Stewart, J. B., and P. F. Chinnery, 2015 The dynamics of mitochondrial DNA heteroplasmy: implications for human health and disease. Nat. Rev. Genet. 16: 530–542.
Suga, K., D. B. M. Welch, Y. Tanaka, Y. Sakakura, and A. Hagiwarak, 2008 Two circular chromosomes of unequal copy number make up the mitochondrial genome of the rotifer Brachionus plicatilis. Mol. Biol. Evol. 25: 1129–1137.
Tatarenkov, A., and J. C. Avise, 2007 Rapid concerted evolution in animal mitochondrial DNA. Proc. Biol. Sci. 274: 1795–1798.
Travers, K. J., C. S. Chin, D. R. Rank, J. S. Eid, and S. W. Turner, 2010 A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res. 38: 8.
Ujvari, B., M. Dowton, and T. Madsen, 2007 Mitochondrial DNA recombination in a free-ranging Australian lizard. Biol. Lett. 3: 189–192.
Walker, B. J., T. Abeel, T. Shea, M. Priest, A. Abouelliel et al., 2014 Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9: e112963.
Watanabe, K., and S.-i. Yokobori, 2011 tRNA modification and genetic code variations in animal mitochondria. J. Nucleic Acids 2011: 623095.
Wolff, J. N., D. J. White, M. Woodhams, H. E. White, and N. J. Gemmell, 2011 The strength and timing of the mitochondrial Bottleneck in salmon suggests a conserved mechanism in vertebrates. PLoS One 6: e20522.

Communicating editor: M. W. Hahn
