Miniaturized mitogenome of the parasitic PNAS PLUS Viscum scurruloideum is extremely divergent and dynamic and has lost all nad genes

Elizabeth Skippingtona, Todd J. Barkmanb, Danny W. Ricea, and Jeffrey D. Palmera,1

aDepartment of Biology, Indiana University, Bloomington, IN 47405; and bDepartment of Biological Sciences, Western Michigan University, Kalamazoo, MI 49008

Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved June 1, 2015 (received for review March 5, 2015) Despite the enormous diversity among parasitic angiosperms in revealed that host-to-parasite HGT has been frequent in Raf- form and structure, life-history strategies, and plastid genomes, flesiaceae mitogenomes, which otherwise are relatively unremark- little is known about the diversity of their mitogenomes. We able with respect to gene content and sequence divergence (8). report the sequence of the wonderfully bizarre mitogenome of Depending on the Rafflesiaceae species, 24–41% of protein genes the hemiparasitic aerial mistletoe Viscum scurruloideum. This ge- are inferred to have been acquired by HGT. The repetitive nature nome is only 66 kb in size, making it the smallest known angio- of Rafflesiaceae mtDNAs and the short reads used in these studies sperm mitogenome by a factor of more than three and the rendered assembly of complete genome sequences impractical, smallest land plant mitogenome. Accompanying this size reduc- but with a minimum size of 320 kb, the Rafflesia lagascae mito- tion is exceptional reduction of gene content. Much of this reduc- genome (4) falls within the known angiosperm size range (0.2– tion arises from the unexpected loss of respiratory complex I 11.3 Mb) (9, 10). (NADH dehydrogenase), universally present in all 300+ other an- To help remedy the lack of knowledge of parasitic plant giosperms examined, where it is encoded by nine mitochondrial mitogenomes, we selected the Santalales for comparative se- and many nuclear nad genes. Loss of complex I in a multicellular quencing for two reasons. First, the Santalales is the largest organism is unprecedented. We explore the potential relationship group of parasitic (with more than 2,000 parasitic species) EVOLUTION Viscum between this loss in and its parasitic lifestyle. Despite its and is also highly diverse (11, 12). This diversity is manifest in Viscum small size, the mitogenome is unusually rich in recombina- terms of autotrophic autonomy, with members of the order tionally active repeats, possessing unparalleled levels of predicted ranging from free-living nonparasitic trees to green, photo- sublimons resulting from recombination across short repeats. synthesizing hemiparasites to highly derived species that lack Many mitochondrial gene products exhibit extraordinary levels of leaves and maintain only minimal photosynthesis at narrow divergence in Viscum, indicative of highly relaxed if not positive se- Viscum points in the life cycle. The order also includes holoparasites lection. In addition, all mitochondrial protein genes have (completely nonphotosynthetic heterotrophs) if further phylo- experienced a dramatic acceleration in synonymous substitution genetic study confirms the tentative placement of the bizarre rates, consistent with the hypothesis of genomic streamlining in holoparasitic family Balanophoraceae as sister to, if not a response to a high mutation rate but completely opposite to the member of, Santalales (2, 13). Another important aspect of this pattern seen for the high-rate but enormous mitogenomes of diversity is that Santalales have undergone multiple transitions Silene. In sum, the Viscum mitogenome possesses a unique con- from root parasitism (the ancestral parasitic condition in the stellation of extremely unusual features, a subset of which may be group) to aerial parasitism, with these “mistletoes” representing related to its parasitic lifestyle.

mitogenome | mutation rate | complex I | parasitic plants | Significance genome reduction The mitochondrial genomes of flowering plants are character- arasitism has evolved at least 12 or 13 times in angiosperms, ized by an extreme and often perplexing diversity in size, or- Pwith parasitic plants comprising about 1% of known angio- ganization, and mutation rate, but their primary genetic sperm species (1, 2). Parasitic angiosperms vary enormously in function, in respiration, is extremely well conserved. Here we size, life history, anatomy, physiological adaptation to a parasitic present the mitochondrial genome of an aerobic parasitic Viscum scurruloideum lifestyle, and nutritional dependence on host plants (3). Plastid plant, the mistletoe . This genome is genomes have been sequenced from many parasitic species and miniaturized, shows clear signs of rapid and degenerative lineages and vary enormously in size and gene content, from evolution, and lacks all genes for complex I of the respiratory those that are typical of nonparasitic photosynthetic plants to electron-transfer chain. To our knowledge, this is the first re- Rafflesia lagascae, which may not even have a plastid genome (4– port of the loss of this key respiratory complex in any multi- Viscum 6). Despite this great diversity, the mitogenomes of parasitic cellular eukaryote. The mitochondrial genome has plants are largely unexplored. Most studies of parasitic plant taken a unique overall tack in evolution that, to some extent, mtDNAs have dealt primarily with their apparent propensity for likely reflects the progression of a specialized parasitic lifestyle. uptake of foreign DNA via horizontal gene transfer (HGT). HGT Author contributions: E.S. and J.D.P. designed research; E.S., T.J.B., and D.W.R. performed is relatively common in angiosperm mitogenomes, and is especially research; D.W.R. contributed new reagents/analytic tools; E.S., T.J.B., D.W.R., and J.D.P. common in parasitic plants, in which HGT is facilitated by the analyzed data; and E.S. and J.D.P. wrote the paper. intimate physical connection between parasites and their host The authors declare no conflict of interest. plant (1, 2, 7). This article is a PNAS Direct Submission. Rafflesiaceae, a small family of endophytic holoparasites fa- Data deposition: The sequence reported in this paper has been deposited in the GenBank mous for possessing the largest flowers in the world, is the only database, www.ncbi.nlm.nih.gov/genbank/ (accession nos. KT022222 and KT022223). parasitic plant clade for which large-scale mitochondrial genomic 1To whom correspondence should be addressed. Email: [email protected]. sequencing has been undertaken (4, 8). Sequences of 38 genes This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. common to three Rafflesiaceae species and two host species 1073/pnas.1504491112/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1504491112 PNAS Early Edition | 1of10 Downloaded by guest on September 24, 2021 the majority of aerial parasitic plants (14). Second, we recently showed (15) that the Santalales are one of the principal donors of foreign mtDNA to the amazingly HGT-rich mitogenome of the basal angiosperm Amborella trichopoda. Accordingly, we wished to determine the number and identity of the Santalales donors and, to the extent possible, the timing and mechanism of these transfers. Here we present the mitogenome sequence of the hemi- parasitic mistletoe Viscum scurruloideum. Unlike Rafflesiaceae mitochondrial genomes, this Viscum genome shows little evi- dence of HGT. However, it is highly unusual in a number of other ways, including genome size, mutation rate, selective pressure, levels of repeat-mediated recombination and sub- limons, and the loss of many genes, including the entire suite of nine mitochondrial genes encoding respiratory complex I. We consider the implications of the unprecedented (for multicellular organisms) loss of complex I and Viscum’s host dependence. Results and Discussion The Smallest Angiosperm Mitochondrial Genome. The Viscum mito- genome assembly consists of two contigs, a circular contig 42,186 bp in length and a linear contig 23,687 bp in length (Fig. 1A). High-depth contigs (>80×) were constructed from a short (100-bp) paired-end sequencing library (SI Appendix, Fig. S1). Searches for potentially missing mitochondrial sequences did not produce any similarly high-depth contigs (Materials and Methods). The Viscum genome was compared with 33 genomes chosen to sample the phylogenetic diversity of sequenced an- giosperm mitogenomes (Fig. 2). With the exception of Silene, for which four species were included to capture the remarkable heterogeneity in genome size and mutation rate within the genus (9), only one species is represented per genus, and at most two per family are included. Angiosperm mitogenomes are excep- tional for their large and variable sizes (9, 10). At 66 kb, Viscum is the smallest (by a factor of 3.3) angiosperm mitogenome se- quenced to date (Fig. 2), is smaller than any other land plant genome (the next smallest is the moss Bauxbaumia at 101 kb), and among charophytes (the green algal ancestors of land plants) is closest in size to some of the smallest known genomes, those of Entransia (62 kb) and Chara (68 kb) in particular. Despite its small size, Viscum’s guanine-cytosine content of 47.4% is un- remarkable compared with that of other angiosperms, which generally fall in the range of 43–45% (10). Fig. 1. The mitochondrial genome of V. scurruloideum.(A) Gene map, with Between V. scurruloideum and the astoundingly large mito- gene orientation shown for all genes ≥200 nt in length. Note that the larger genome of (11.3 Mb), the known range of angio- of the two contigs is circular but is shown as arbitrarily linearized. The dotted sperm mitogenome sizes is now on the order of 200-fold. In line indicates probable trans-splicing of the first intron in the cox2 gene. (B) Repeat map, for both contigs, with curved lines connecting pairs of repeats striking contrast to the highly reduced mitogenome of Viscum, and with line width proportional to repeat size. Repeats larger than the library the four species of Viscum examined thus far have exceptionally fragment size (800 bp) are in gray, repeats with one or more read pairs con- large nuclear genomes ranging in size from 2C = 122–201 Gb sistent with a rearrangement event across the repeat are in blue, and the (16). These are the largest nuclear genome sizes reported for remaining repeats are in red. Boxes on the outer ring mark the positions of , a group that contains more than 70% of the ca. 350,000 genes, colored as in A. The numbers give genome coordinates in kilobases. species of angiosperms. No information is available on Viscum plastid genomes. ccmB, matR, and all nine nad genes (nad1, nad2, nad3, nad4, Dramatically Reduced Core Coding Capacity and Intron Content. nad4L, nad5, nad6, nad7, and nad9) (Fig. 2). To check for the Angiosperm mitogenomes virtually always share the same set possibility that these genes fall within an assembled portion of of 24 core protein genes, mostly coding for respiratory proteins, mtDNA, total genomic reads were searched for evidence of but often differ substantially in their inventories of a further homology with these genes using BLAST. Because 95% of our 17 protein genes mostly encoding ribosomal proteins (17, 18). mitochondrial assembly has a read depth greater than 80 (SI Three species of Silene possess the most protein-gene–poor an- Appendix, Fig. S1), we expect mitochondrial reads matching giosperm mitogenomes described to date (9). Two of these these genes also to give depths greater than 80. Although eight of species share the entire core set of 24 protein genes, and the these 11 genes yielded a smattering of matches, the low read third contains 23 of these genes, with its loss of ccmFc the only depths do not support the conclusion that these reads are of known case of such loss in angiosperms. Conversely, the three mitochondrial origin (SI Appendix, SI Discussion). Because the Silenes have lost all but one or two of the 17 genes comprising the loss of these three classes of genes is highly unexpected—all 11 variable set (Fig. 2). The Viscum mitogenome presents a dra- genes are found in all 300+ other angiosperm mitogenomes ex- matic departure with respect to the core genes. It has lost the amined (Fig. 2) (10, 17)—they are discussed in separate sections entirety of 11 of the 24 core protein genes. These losses include below. The Viscum genome also has lost 11 of the 17 variably

2of10 | www.pnas.org/cgi/doi/10.1073/pnas.1504491112 Skippington et al. Downloaded by guest on September 24, 2021 PNAS PLUS EVOLUTION

Fig. 2. Genome size and protein-gene content of 34 sequenced angiosperm mitochondrial genomes and outgroup Cycas taitungensis. Intact genes are dark gray, pseudogenes are light gray, and absent genes are white. The phylogeny shown is the current best estimate of relationships among these plants (ref. 78 and references therein).

present protein genes, with nine of these genes missing entirely 18 of these introns (9). Viscum, however, contains only three and the other two present as obvious pseudogenes (Fig. 2). Al- group II introns (SI Appendix, Fig. S6). Most of this reduction in together, only 19 intact and potentially functional protein genes intron number can be attributed to the loss of the nad genes, were identified in the genome assembly. This protein-coding which commonly possess 19 group II introns in seed plants (SI capacity is the smallest identified in an angiosperm mitogenome, Appendix, Fig. S6). In addition, Viscum has lost rpl2, which five genes fewer than the already highly reduced capacity in Silene. normally contains a single group II intron, and a single intron has Three sequences were identified that could be folded into the been lost from each of the otherwise intact genes rps3 and rps10. typical tRNA structure (SI Appendix, Fig. S2). These sequences, In addition, the first intron in the Viscum cox2 gene is probably which correspond to trnW(cca), trnfM (cau), and trnK (uuu), trans-spliced, given the physical separation of the first exon from have well-conserved anticodon and dihydrouridine stem se- the rest of the gene (Fig. 1A) and that the exonic boundaries of quences but contain mismatches in other regions of the sec- these two regions do not contain stop and start codons, re- ondary structure. Although the Viscum mitogenome contains far spectively, as expected under the alternative model of gene fis- fewer tRNA genes than most other angiosperms, similarly re- sion. Trans-splicing in this intron is known to have arisen in two duced tRNA gene content (two and three, respectively) has been other lineages of vascular plants (20, 21). reported for the S. conica and Silene noctiflora mitogenomes (SI The loss of so many genes and their (mostly accompanying) Appendix, Fig. S3) (9). introns accounts for an appreciable fraction (ca. 25%) of the size Large and small subunit rRNA (LSU rRNA and SSU rRNA, reduction in the 66-kb Viscum mitogenome compared with the respectively) genes were identified also. Although both rRNA next smallest sequenced angiosperm mitogenome (222 kb) in our gene sequences are exceptionally divergent (Exceptional Di- comparative dataset (Fig. 2). Most of the loss in gene space is a vergence of Viscum Protein and rRNA Genes), superimposition consequence of the loss of the intron-rich nad genes, which onto the Oenothera berteriana mitochondrial SSU rRNA and Zea typically comprise nearly 40 kb of DNA, or more than half the mays mitochondrial LSU rRNA secondary structures (SI Ap- gene space of normal angiosperm mitogenomes (SI Appendix, pendix, Figs. S4 and S5, respectively) revealed significant con- Fig. S7). The rest of the Viscum size reduction is a consequence servation at the secondary structure level, indicating that these of a massive reduction in the amount of intergenic spacer DNA, genes are still functional in Viscum. Secondary structure and which totals ca.36kbinViscum compared with ca. 150 kb in the phylogenetic analyses were not carried out using the 5S rRNA 222-kb Brassica napus genome and more than 11 Mb in S. conica. because of its short length (<100 nt), but a pairwise alignment of its sequence to the Liriodendron 5S rRNA yielded 87% identity Loss of Mitochondrial Maturase Gene matR. The loss of the maturase and 5% gaps. gene matR could reflect a very rare case of functional transfer The ancestral angiosperm mitogenome contained 25 group II to the nucleus. However, because matR resides within nad1 introns (19). Most angiosperm genomes examined contain nearly intron 4, it is more likely that matR was lost along within the this entire set (SI Appendix, Fig. S6), and none have fewer than entire suite of mitochondrial nad genes in Viscum. MatR is the

Skippington et al. PNAS Early Edition | 3of10 Downloaded by guest on September 24, 2021 only mitochondrial-specified maturase homolog present in an- Table 1. Summary of Viscum protein-gene divergence giosperms, and although the precise function of MatR in mito- Coding chondria is uncertain, it is clearly homologous to proteins known sequence to assist in group II intron splicing in other mitochondrial genomes length, nt % amino acid identity* (22). Furthermore, preliminary data suggest that MatR binds to † † † several group II introns in vivo (23). If MatR is indeed a poly- Gene Viscum Lirio Viscum vs. Lirio Vitis vs. Lirio dN dS dN/dS valent splicing factor, the precise consequences of the loss of this gene for the group II intron-splicing process in Viscum need to cox1 1,611 1,572 83 98 0.11 0.77 0.14 be elucidated. atp9 219 225 81 93 0.05 0.68 0.07 atp1 1,533 1,527 72 95 0.19 1.04 0.18 Possible Rare Transfer of ccmB to the Nucleus. In contrast to the atp6 912 720 71 95 0.21 0.91 0.24 matR situation, functional transfer to the nucleus is the best cox2 750 762 69 90 0.26 1.01 0.25 explanation for the absence of ccmB from Viscum mtDNA. In cob 1,221 1,179 69 97 0.19 1.05 0.18 most plants, mitochondrial biogenesis of cytochrome c is carried cox3 795 795 66 97 0.22 0.84 0.27 out via a maturation pathway encoded by four mitochondrial rps12 411 375 61 95 0.31 3.89 0.08 ccm genes (ccmB, ccmC, ccmFc, and ccmFn) and several nuclear ccmC 654 699 56 91 0.39 1.06 0.37 genes (24). Because ccmC, ccmFc, and ccmFn are all present as mttB 789 720 46 95 intact genes in the Viscum mitogenome, albeit in moderately to ccmFn 1,467 1,683 43 92 exceptionally divergent forms (Table 1), the ccm complex is al- rps10 273 357 42 96 most certainly functional in Viscum, and therefore ccmB prob- rpl16 378 432 44 99 ably has been functionally located to the nucleus. CcmB is a atp8 483 474 41 96 hydrophobic protein with six transmembrane domains that tra- rps4 1,224 1,035 39 93 verse the inner mitochondrial membrane (24). Although func- ccmFc 1,215 1,326 38 87 tional transfer of ccmB has not been reported previously in plants sdh4 390 387 34 88 (Fig. 2) (17), if this gene experienced anything approaching ccmFc- rps3 1,518 1,554 32 93 and ccmFn-like levels of divergence while in the mitogenome, this atp4 474 558 29 93

divergence may have altered the hydrophobicity of CcmB to an Synonymous and nonsynonymous (dS and dN, respectively) divergence extent that allowed it to be imported more readily into the mito- and dN/dS ratios are given for the nine best-conserved mitochondrial protein chondrion following ccmB functional transfer to the nucleus. Such genes of Viscum. a transfer would be analogous to other cases of rare mitochon- *Identity calculation excludes gap positions. † drial-to-nuclear gene transfer, which have been accompanied by a Liriodendron tulipifera. reduction in hydrophobicity of the encoded protein (25, 26). Un- × fortunately, nuclear coverage in our sequence data (ca.0.1; “ ” Materials and Methods and SI Appendix, SI Discussion)isinade- cleus. Computational searches for nad-like reads within the quate to detect a full-length transferred copy of ccmB, and therefore total genomic DNA yielded only a small number of matches, testing of the transfer hypothesis must await nuclear transcriptome most of which corresponded to intronic sequences that are un- or genome sequencing. likely to be properly expressed in the nucleus (SI Appendix, SI Discussion). Seven of the nine nad genes found in all 300+ other Unprecedented Loss of Complex I in a Multicellular Organism. The angiosperm mitogenomes examined (Fig. 2) (9, 17) have rarely, loss of all mitochondrial nad genes in Viscum is of special interest if ever, been found functionally transferred to the nucleus in because it very likely corresponds to the loss of an entire respiratory nonangiosperms. Indeed, it is these seven nad genes that are complex, something that has never been reported before for any found in all examined animal mitogenomes. Three of the seven plant or alga, including the holoparasite Rafflesia (4). The nad genes (nad1, nad4, and nad5) are universally present in all mitoge- contain five or six trans-spliced introns in angiosperms, with the 14 nomes of eukaryotes that retain complex I, whereas nad2, nad3, or 15 nad transcription units scattered essentially randomly in the nad4L, and nad6 have been reported lost from the mitochondrial rapidly rearranging mitogenomes of angiosperms (10, 19). There- genome (and presumably functionally transferred to the nucleus) fore, the eradication of all traces of nad genes from the Viscum only one to three times across eukaryotes (31). The difficulty of mitogenome probably occurred via many separate deletions, pre- functionally transferring these seven nad genes to the nucleus is sumably occurring over some period of time. This likelihood, probably the result of severe mechanistic challenges to the suc- combined with the presence of pseudogenes for other “lost” genes cessful transport of their hydrophobic products across the outer (i.e., rps1 and sdh3), suggests that the loss of selection on complex I mitochondrial membrane and insertional assembly within the may have occurred much earlier than the loss of rps1 and sdh3. inner membrane (32, 33). Therefore, even in the absence of any ATP formation in mitochondria occurs mainly through oxi- meaningful information on nad gene status in the nucleus of V. dative phosphorylation. This is achieved by a process of electron scurruloideum, the loss of all nine nad genes from the V. scur- transfer through an assemblage of more than 20 carriers that are ruloideum mitogenome most likely reflects the loss of complex I organized into complexes I–IV of the electron transport chain from this plant. (27). With the notable exceptions of four lineages, all respiring The losses of complex I in two yeast lineages have been ac- eukaryotes contain complex I, a multisubunit, rotenone-sensitive companied by duplications of nuclear-encoded alternative NADH NADH dehydrogenase that is encoded by 5–12 mitochondrial dehydrogenases that bypass complex I and oxidize NADH without genes and a few dozen nuclear nad genes. The four exceptions, in proton translocation (34). Because these alternative means of + which all mitochondrial and nuclear nad genes have vanished, regenerating NAD are unlinked to proton pumping in the mito- include two distinct lineages of yeasts (28), the cryptomycotan chondrion, they generate less energy in the form of ATP. Although fungus Rozella (29), and the lineage comprising dinoflagellates, the evolutionary loss of complex I is unprecedented in plants, mu- apicomplexans, Chromera, and Oxyrrhis (30). That these are all tant lines are informative for understanding the potential conse- unicellular makes the case in Viscum particularly notable. quences of the loss of the nad genes in Viscum. In particular, three The loss of the entire suite of mitochondrial nad genes in mutant lines, Nicotiana sylvestris CMSII lacking the mitochondrial Viscum almost certainly reflects the loss of complex I itself, as nad7 gene (35, 36), Arabidopsis thaliana ndufs4 lacking a nuclear opposed to the functional transfer of all these genes to the nu- nad gene (37), and NCS2 maize plants lacking functional nad4,are

4of10 | www.pnas.org/cgi/doi/10.1073/pnas.1504491112 Skippington et al. Downloaded by guest on September 24, 2021 deficient in complex I (38, 39). Like the complex-I–deficient yeasts, pared with other angiosperm cox1 introns (SI Appendix, Fig. PNAS PLUS these plant mutants remain viable via use of alternative NADH S9A). This contrast strongly implies that the intron was acquired dehydrogenases (37, 38, 40), albeit with constitutively lower mito- recently in the Viscum lineage, most likely independently of in- chondrial phosphorylation efficiencies. Plants in general have the tron acquisitions in Lepionurus and Dendrophthoe, the two other ability to oxidize NADH and NADPH via several nuclear-encoded members of the Santalales known to possess the intron (SI Ap- alternative dehydrogenases (41). The determination of which of pendix, SI Discussion). The Viscum cox1 gene contains an un- these dehydrogenases Viscum uses to compensate for the lack of usually lengthy region of putatively converted exonic sequence complex I awaits experimental investigation and sequencing of its (known as a coconversion tract), as is also consistent with a re- huge nuclear genome. cent horizontal acquisition (SI Appendix, Figs. S9B and S10). The plant complex I mutants grow more slowly than their respective wild types and have other deficiencies: N. sylvestris Exceptional Divergence of Viscum Protein and rRNA Genes. Phylo- CMSII mutants exhibit male sterility and slow growth (36); genetic analysis (SI Appendix, Fig. S8) of two of the three rRNA A. thaliana ndufs4 shows depressed germination and growth (37); genes and the nine best-conserved protein genes in Viscum shows and homoplasmic NC2 maize plants exhibit sterility and severely that they are even more divergent, often considerably so, than depressed growth (42). Because Viscum does not exhibit no- genes in the fast-evolving S. conica and S. noctiflora mitoge- ticeably defective germination or growth, regardless of any del- nomes (9). Moreover, the rRNA trees seriously underestimate eterious effects caused by loss of complex I in Viscum, the the extent of rRNA divergence (SI Appendix, legend of Fig. S8), organism has been able to adapt and thrive. Analysis of the and the 10 most divergent protein genes in Viscum (Table 1) are CMSII mutant has shown that mitochondrial complex I is es- so divergent that we deemed it imprudent to use them for either sential for optimal photosynthetic performance (43). In partic- phylogenetic analysis or analysis of synonymous (dS) and non- ular, steady-state photosynthesis in CMSII is consistently reduced synonymous (dN) site divergence. Unusually frequent C-to-U by 20–30% at atmospheric CO2 levels, suggesting that this com- mRNA editing in Viscum is highly unlikely to account for its high plex I mutant is not able to use its photosynthetic capacity as well levels of inferred protein divergence (SI Appendix, SI Discus- as wild-type N. sylvestris despite the action of alternative de- sion); rather, as described below, this divergence appears to be hydrogenases (40, 44). It has long been known that mistletoes the consequence of major changes in both mutational and se- typically have lower photosynthetic rates than their hosts (45–48). lective pressures. Thus, the lower rates of photosynthesis and the loss of complex I Full alignments for the 10 most divergent Viscum proteins are EVOLUTION may be interrelated in Viscum. shown in SI Appendix, Fig. S11. These proteins are characterized The precise molecular interactions underpinning the re- by short regions of moderate-to-high sequence similarity in- lationship between reduced photosynthetic efficiency and the terspersed with longer regions exhibiting little to no detectable mitochondrial electron transport chain have not been described, sequence similarity. These latter regions often include a number but it is notable that under conditions in which CO2 fixation of indels, and the reliability of the alignment in these regions is (and, hence, sucrose synthesis) is enhanced, photosynthesis in often questionable. Their extraordinary level of divergence raises CMSII is less inhibited relative to wild-type N. sylvestris. This the question of whether these 10 “genes” code for functional observation suggests that the decrease in photosynthesis in proteins. The most divergent proteins in Viscum are, for the most CMSII under atmospheric CO2 levels is not caused by in- part, encoded by ORFs of moderate to considerable length sufficient mitochondrial ATP production for sucrose synthesis (Table 1). Considering Viscum’s highly accelerated evolution, and is specific to conditions in which the enzymes necessary for such an accumulation of nucleotide substitutions should have photosynthesis are preoccupied with another task (44). We hy- introduced premature stop codons into most if not all of these pothesize that Viscum uses alternative dehydrogenases to com- ORFs by chance. Given the frequency of stop codons in the pensate for the loss of complex I in maintaining mitochondrial genome, and assuming a random distribution of point mutations, ATP production but suffers an associated reduction in photo- the probabilities of getting ORFs of lengths 50, 100, 150, and 200 synthetic efficiency. Whether cause or cure, parasitism could codons by chance alone are 0.14, 0.02, 0.003, and 0.0004, re- compensate for some loss of photosynthetic ability. spectively. Thus, it is likely that these proteins are, by and large, functional, especially because the majority of them are of com- Recent HGT of the cox1 Intron. We find little evidence suggestive of parable length to those found in other angiosperms (Table 1). It HGT in Viscum mtDNA. First, there is the lack of multiple, should be noted that the unprecedented level of divergence in divergent copies of any genes in the Viscum mitogenome. In Viscum makes it difficult to rule out the possibility that un- contrast, most cases of plant mitochondrial HGT reported thus identified protein genes exist in the mitogenome but have far have resulted in the maintenance of both horizontally ac- evolved beyond our ability to recognize them. quired and native gene copies within the mitogenome, with To assess further the level of divergence in Viscum protein Rafflesiaceae and Amborella especially notable in this regard (8, genes, dS and dN values were estimated for the nine best-con- 15). Second, the extreme divergence of all 21 Viscum protein and served genes (Fig. 3, Table 1, and SI Appendix, Fig. S12). Syn- rRNA genes (see the next section and SI Appendix, Fig. S8), onymous substitutions in protein genes often are assumed to be which confounds phylogenetic detection of HGT, also means more or less free from natural selection and thus are used fre- that any transfers must be relatively ancient, occurring before the quently to approximate the neutral mutation rate (52). The accumulation of a significant fraction of this divergence. Finally, patterns of divergence in dS trees (Fig. 3A and SI Appendix,Fig. no evidence of gene conversion was found in any Viscum genes S12) are similar to those in the trees based on all three codon using OrgConv with Bonferroni correction for multiple tests (49). positions (SI Appendix,Fig.S8). For almost all genes, dS branch The only clear evidence of HGT in the Viscum mitogenome lengths are longer for Viscum than for any of the other sampled involves its sole group I intron, located in the cox1 gene, which angiosperms, including the high-rate Silene species (Fig. 3). This also is the only group I intron found in any angiosperm mito- divergence indicates a highly elevated mutation pressure operating genome. This intron has been acquired more or less recently in (perhaps still so) across the mitogenome during a portion of the many angiosperm lineages by hundreds of mitochondrial-to-mi- Viscum lineage time. Estimates of the frequency and magnitude of tochondrial horizontal transfer events (2, 50, 51). In striking this mutation-rate increase await sampling of mitogenomes from a contrast to the extremely divergent gene sequences in Viscum (SI phylogenetically broad range of Santalales taxa and the construction Appendix, Fig. S8), including that of cox1 itself, the Viscum cox1 of molecular chronograms for these taxa. Increased sequence di- intron is unexceptional in its level of sequence divergence com- vergence in seven sampled members of the Viscaceae (including

Skippington et al. PNAS Early Edition | 5of10 Downloaded by guest on September 24, 2021 Fig. 3. Elevated sequence divergence of the best-conserved mitochondrial protein genes in Viscum.(A) Phylograms of dS and dN sequence divergence for six of the nine best-conserved genes (Table 1). All trees are shown to the same scale. The absence of the cox2 gene in Vigna is indicated with an “X.” (B) Levels of

dS and dN sequence divergence for eight protein genes in Viscum and two rapidly evolving Silene mitochondrial genomes.

6of10 | www.pnas.org/cgi/doi/10.1073/pnas.1504491112 Skippington et al. Downloaded by guest on September 24, 2021 one species of Viscum) was observed in a concatenated phylogenetic PNAS PLUS analysis of one nuclear gene (SSU rDNA) and two plastid genes (rbcL, matK) (11). Although the magnitude of the mitochondrial elevation appears to be significantly higher than for the plastid and nuclear genes, it is possible that multiple factors underlie the mi- tochondrial situation and that one of these factors is common to all three genomes. If mutation rates and the effects of selection on synonymous sites were reasonably constant across the Viscum mitogenome, we would expect that synonymous substitution rates should be fairly constant among its protein genes. This expectation is met for eight of the nine best-conserved protein genes (dS = 0.68– 1.06), but rps12 is a clear outlier (dS = 3.89) (Table 1; also see SI Appendix, Fig. S12). This observation adds to the growing, and puzzling, recognition that angiosperm mitogenomes can experi- ence substantial heterogeneity in intragenomic rates, although the level found in Viscum pales against that in certain other genomes, especially that of Ajuga (53). As expected, given the highly elevated mutation rate(s) in Fig. 4. Alternative configurations for repeat pairs ≤100 bp in length. Viscum mtDNA, dN branch lengths are highly elevated in Viscum relative to other angiosperms (Fig. 3 and SI Appendix, Fig. S12). Contrary to the pattern seen for high-mutation-rate genomes in and many other angiosperm mtDNAs (9). Thus, there is no strict Silene (Fig. 3B) and in Plantago and Pelargonium (54), in which relationship between repeat content, genome size, and mutation – dN/dS is depressed (0.04 0.07 for a concatenate of atp1, cob, rate in angiosperm mtDNAs. cox1, cox2, and cox3, five of the nine best-conserved protein In angiosperm mitochondria, low-abundance substoichiometric genes in Viscum) relative to typically low-mutation angiosperm DNA molecules (termed “sublimons”) are often present, arising via genomes, the average dN/dS value for these five genes in Viscum recombination between short repeats (57–59). Most pairs of repeats EVOLUTION is 0.20 (Table 1). These different dN/dS values suggest a different in Viscum mtDNA are short; of those ≥30 bp in size, 97% (305 of interplay of mutational and selective forces among these in- 315) are ≤100 bp in length, and 86% of these are <50 bp in length. dependently arisen, high-mutation-rate lineages. In particular, it To assess recombination at short Viscum repeats, we counted the seems likely that the set of 10 extremely divergent protein genes number of read pairs that are consistent with the high-depth ref- (SI Appendix,Fig.S11) has and/or is evolving under relaxed, erence assembly and those that are consistent with the predicted if not to some extent positive, selection compared with other alternative configuration (AC) of the repeat-flanking single-copy angiosperms. sequences that would result from repeat-mediated recombination. As well as generally being very large, plant mitogenomes typi- To avoid inflating the estimates of ACs, when two or more repeat- cally are characterized by extremely low rates of point mutation pair environments were supported by the same set of spanning (54). In recent years, the mutational burden hypothesis, which reads, only one repeat pair was considered in our analysis. We predicts that mitogenome size should be negatively correlated with found evidence for ACs for 127 (78%) of the 162 short-repeat pairs mutation rate (55), has been adopted by some to explain the or- examined (Fig. 4 and SI Appendix,TableS1). There is a strong igins of variation in mitogenome size. Under this hypothesis, high correlation (r = 0.72, P < 0.001) between repeat length and AC mutation pressure enhances natural selection against nonessential DNA, and hence against large genomes. The reduced Viscum ge- frequency (Fig. 4), but there is no correlation between repeat sim- ilarity and AC frequency (SI Appendix,Fig.S13). nome and its highly elevated dS values are consistent with the hy- pothesis of genomic streamlining in response to a high mutation The AC frequency for short repeats in Viscum mtDNA is rate (55) but are in complete contrast to the enormous mitoge- markedly higher than in the nine other angiosperm mtDNAs for which AC frequency has been measured by high-depth genome nomes of S. noctiflora and S. conica, which have experienced both – dramatic accelerations in mitochondrial mutation rates and huge sequencing and also is higher than in angiosperms in which non expansions of intergenic DNA (9). These entirely opposite findings genome-scale approaches have been used (59). Not even a single AC was detected for the vast number (>300,000 and >70,000) of emphasize the need for further genome sequencing to provide a – broader perspective on the remarkable genomic diversity across short-repeat pairs (30 100 bp in length) present in the mitoge- angiosperms and to enable refined hypotheses regarding the un- nomes of S. conica (11.3 Mb) and cucumber (1.7 Mb), re- doubtedly multifaceted relationship among mutation rate, genome spectively (9, 60). Alternative configurations were detected for size, and other aspects of genome evolution. only one of the ca. 2,400 short repeats in the 6.7-Mb genome of S. noctiflora genome (9, 60) and for only two of 110 short repeats Abundant Repeats and Extreme Levels of Recombination and Sublimons. in the 404-kb genome of Vigna angularis (61). AC frequency was Repetitive sequences cover 26 kb (39%) of the Viscum mitoge- not reported for the 526-kb Mimulus mitogenome (62), but the nome (Fig. 1B). In light of its small size and gene-dense nature, it number of AC-consistent reads was given and indicates much is remarkable that large repeats (≥1 kb) cover so much of the lower AC frequency levels than in Viscum.AsinViscum, repeat Viscum genome (16 kb, 24%), because this is a greater absolute length correlates with AC frequency in Mimulus. Finally, for four amount than seen in all but two of the 11 much larger (222- to mitogenomes (361–429 kb in size) analyzed in 983-kb) eudicot mitogenomes compared in ref. 56. Also re- aggregate (63), AC frequency averaged only 0.8% (median = markable is that repeat coverage in Viscum (39%), the smallest 0.7%, range = 0.2–1.5%) for the 13 repeat pairs with a length of known angiosperm mitogenome, is proportionally comparable to 50–100 bp for which at least five ACs were detected. The number that of the largest known mitogenome, that of S. conica (40.8%; of the repeats for which zero to four ACs were recovered was not 4.6 Mb of repeat coverage in a 11.3-Mb genome) (9). In contrast, reported, and therefore AC frequency again is markedly lower the also enormous (6.7-Mb) mitogenome of S. noctiflora—which than in Viscum. has a high mutation rate—contains a relatively modest pro- Only four of the 10 repeat pairs in Viscum with a length ≥100 bp portion of repetitive sequences (10.9%) compared with Viscum are sufficiently short (387–593 bp) to allow ACs to be detected

Skippington et al. PNAS Early Edition | 7of10 Downloaded by guest on September 24, 2021 given the library-fragment size of ∼800 bp. With AC frequencies of The Viscum findings have begun to illuminate patterns and modes 39–58%, all four repeat pairs are essentially at recombinational of evolution of angiosperm mitogenomes that previously were un- equilibrium (i.e., a 50:50 ratio of reference configurations and ACs). recognized, raising deep questions about the diversity of gene To our knowledge, these are the smallest repeats at recombina- landscapes, the effect of mutational environments on genomic con- tional equilibrium in plant mitogenomes. As with the short repeats, traction and expansion, and which genes, if any, are truly essential in these data are in striking contrast to the extremely low level of re- angiosperm mitogenomes. We expect that greatly expanded sam- combination detected for the great majority of comparably sized pling of the speciose and diverse Santalales, as well as the many repeats in S. conica, S. noctiflora, and cucumber (9, 60). Repeats of other lineages of parasitic angiosperms, will provide significant in- this size either are not present or were not analyzed in Vigna, sights into the evolution and function of parasitism in angiosperms, Mimulus,andthefourS. vulgaris mitogenomes (61–63). uncover an increasingly fascinating world of mitochondrial diversity, Although the Viscum mitogenome assembles as two contigs, and shed light on the origins of the massive amount of foreign an- the extraordinary abundance and diversity of recombinant read giosperm mtDNA present in the Amborella mitogenome. pairs means that this configuration is highly unlikely to exist in vivo. [This consideration is quite apart from the issue of whether Materials and Methods any plant mitogenome assembly actually corresponds to the in Assembly. The Viscum mitogenome assembly was generated using 100-bp vivo arrangement of the genome (64–66).] Instead, within an paired-end Illumina HiSeq2000 reads (with 800-bp fragment size) from leaf- individual and possibly even at the level of a single mitochondrion, derived total genomic DNA (SNP 15591; collected in Sabah, Malaysia, Hamadon, kg. Bundu Tuhan, 1,300 m, by T.J.B., September 22, 2000). The relatively small the Viscum genome almost certainly exists as an extremely com- fraction of reads (<1%) corresponding to the mitochondrial genome was plicated population of recombinationally interrelated, overlapping extracted from the pool of 82.5 million total genomic reads by iteratively molecules of a broad range of sizes and stoichiometries. Although extending contigs. Reads determined to be similar to known angiosperm mito- the numerous sublimons detected in this study presumably all ulti- chondrial genes using BLASTN (67) were used as seeds. These reads then were mately arose from repeat-mediated recombination, the extent to blasted against all reads, and reads with no or few high-quality base conflicts in which this recombination is actively ongoing is unclear and almost thealignedregionandwhichoverhungattheendwereusedtoextenda certainly varies among repeats, with ongoing—andalsohigh-fre- pseudo contig. The ends of nascent contigs were blasted again and extended quency/equilibrating—recombination a safe inference for only the until no further extension was obtained. To identify identical or highly similar reads, megablast was used with a word size of 32. Pools of reads derived from 10 large repeat pairs. In contrast, as a consequence of occasional these iterations were assembled using CAP3 (68) with the following parameter to frequent ongoing repeat-mediated recombination, some of the settings: a = 11, b = 16, d = 21, e = 11, o = 50, f = 2, h = 500, i = 60, j = 80, s = sublimons associated with the 127 short repeats almost certainly 1,800, and t = 300. The majority of the contig ends corresponded to repeats that exist in forms independent of the main genome. Indeed, many of could be resolved by read depth, read-pair conflicts, and identification of repeat the reads indicative of sublimons have polymorphisms relative to boundaries for which a large portion of reads were chimeric outside the repeat the reference assembly, suggesting that many regions are evolving region. Repeat sequences were identified by comparing the mitogenome to independently of the predominant forms of the genome. The more itself using ungapped BLASTN with the parameter settings xdrop_gap = 10 and = ≤ abundant ACs may even represent multiple independently arisen xdrop_gap_final 10. All alignments with an e-value 1e-7 were considered and replicating sublimons. repeats for calculations of repeat number. The repeat map (Fig. 1B) was gen- erated with Circos (69). All 10 large (387- to 5,439-bp) repeat pairs in Viscum show The final assembly contained two contigs, a 42,186-bp contig that could be either 100% identity or, in two cases, 99.9% identity (SI Ap- rationalized into a circle based on read-pair information and a 23,687-bp kb pendix, Table S1). This level of identity is yet another way in contig that was terminated by repeats that matched within the 42-kb contig which the highly divergent Viscum genome differs dramatically and whose lengths could not be determined using read-depth and repeat- from the also divergent S. noctiflora and S. conica mitogenomes: In boundary information. In this sense, therefore, the Viscum mitogenome S. noctiflora and S. conica only a tiny fraction of the hundreds to assembly (Fig. 1A) is incomplete. These repeats may be relatively long and thousands of repeats, respectively, with a length of 400–2,000 bp are possibly overlap with other repeats, making their boundaries difficult to 100% identical (most are 75–90% identical) (9). These differences identify. The read depth adjacent to the corresponding repeat regions in the imply much higher rates of gene conversion/concerted evolution in 42-kb contig is too variable to make a clear judgment as to precisely where the terminal repeats in the 24-kb contig end. The contigs we present are well Viscum than in these two Silene species. The greatly reduced levels supported in terms of high-quality read depth and read-pair depth, with the of both homologous recombination (as measured by AC frequency) exception of a low-depth region around 16.9 kb in the 42-kb contig (SI and gene conversion in these two Silene genomes were hypothesized Appendix, Fig. S1). This region corresponds to a palindromic sequence. The to be potentially important factors underlying the highly elevated relative positions of forward- and reverse-oriented reads mapping in this mutation rates in these genomes (9). Therefore it is all the more region suggest that the palindrome may have interfered with the paired- intriguing that, compared with these other high-mutation-rate end amplification of sequences containing this palindrome. mitogenomes, Viscum has high levels of these two mechanis- We looked for potentially missing mitochondrial sequences by using reads tically related recombinational processes. that map to the assembly but whose read-pair mates do not (singlets). Al- though we identified many singlet reads, using their mate pairs as iterative Conclusion and Outlook assembly seeds (see above) did not produce any high-depth contigs or contigs that were part of the main mitochondrial genome. On the basis of poly- The mitogenome of the hemiparasitic mistletoe V. scurruloideum morphisms and homology to sequences in the GenBank nt/nr database, we has proven to be an unexpected goldmine of surprises, setting infer that the majority of these singlets probably correspond to mitochon- new benchmarks for the extremes of genome size, gene content, drial-like sequences in the nucleus. It is conceivable that some nongenic or protein divergence, and recombinational activity and/or sub- highly divergent genic regions of the genome are missing from our assem- limon levels in angiosperm mtDNA. The gene repertoire of the bly, but, given its gene density (Fig. 1A) and the high number of singlet Viscum mitogenome may be so specialized to parasitic life that, partners that did not yield mitochondrial-like contigs, this notion is unlikely. unlike all other multicellular organisms investigated so far, it can × survive without a mitochondrial-encoded respiratory complex I. Coverage Estimation. Nuclear genome coverage was estimated at 0.1 by Recently the intracellular parasitic fungus Rozella allomycis also using the Lander/Waterman equation and conservatively assuming V. scur- ruloideum to have the same nuclear genome size (2C = 124.6) as Viscum was found to harbor a very rapidly evolving genome that lacks minimum, which has the smallest nuclear genome of the four Viscum species complex I of the respiratory chain (29). This discovery raises the examined thus far (16). tantalizing possibility that shared genomic signatures of parasit- ism may be present in mitogenomes both within and beyond the Gene Annotation. Contig sequences were compared by BLASTX v. 2.2.28 (67) bounds of the angiosperm clade. to mitochondrial protein sequences from 55 complete green algal and land plant

8of10 | www.pnas.org/cgi/doi/10.1073/pnas.1504491112 Skippington et al. Downloaded by guest on September 24, 2021 mitogenomes. Gene boundaries were annotated based on inspection of align- Table S3) or predicted to undergo RNA editing in V. scurruloideum (SI Appendix, PNAS PLUS ments satisfying e-value <0.001. The locations of rRNA genes were determined Tables S2 and S3) were excluded from all nucleotide analyses. rRNA gene se- similarly by BLASTN searches against the rRNA gene sequences of these ge- quences were aligned similarly using MAFFT L-INS-I, but at the nucleotide level. nomes. tRNA genes were predicted using tRNAscan version 1.23 (70). Ambiguously aligned regions of the rRNA gene alignments were removed using GBLOCKS version 0.91b, with the following parameter settings: t = c, b1 = 11, RNA Editing Sites. C-to-U RNA editing sites were predicted using PREP- b2 = 16, b3 = 8, b4 = 5, and b5 = none (75). These stringent settings remove Mitochondrial (71) with stringency settings (C-values) of 0.2, 0.6, and 1.0 and large fractions of most alignments. All phylogenetic trees were estimated with PREPACT v. 2.12.3 (72) with default parameter settings. Viscum results are RAxML v. 7.2.8 (76) using the generalized time-reversible (GTR) model with reported (SI Appendix, Table S2) for multiple stringency settings because gamma correction for among-site rate variation and 10 starting trees. Support previous analyses in Silene have suggested that different C-value settings are for nodes was assessed using 1,000 bootstrap replicates. The ancestral sequences appropriate for species with different rates of sequence evolution (9). All shown in SI Appendix,Figs.S9,S10,andS11were reconstructed using baseml results reported in the main text were obtained using PREP-Mitochondrial and codeml in the PAML package (77). with C-value = 0.2. The predicted editing sites probably underestimate the true extent of editing within Viscum, because the PREP-Mitochondrial ap- Evolutionary Rate Estimation. PAML’s codeml (77) was used to estimate proach is blind to synonymous editing sites. branch dS and dN, using a simplified Goldman–Yang codon model with separate branch dN/dS ratios (ω) that allowed for the following three sets of Phylogenetic Analysis. Protein sequences were extracted from GenBank and branches: the Viscum branch; the S. conica and S. noctiflora branches; and all aligned using MAFFT v. 7.130b (73) with the L-INS-I option. The resulting amino remaining branches. Codon frequencies were computed using the F3 × 4 acid alignments then were computationally converted to nucleotide alignments method. The values for ω and transition/transversion ratio were estimated using PAL2NAL (74), with the resulting arrangement of nucleotide triplets from the data with start values of 0.4 and 2, respectively. reflecting the protein alignment in the case of each gene set. To avoid bias in phylogenetic analysis and evolutionary rate estimation arising from C-to-U ACKNOWLEDGMENTS. We thank Andy Alverson and Virginia Sanchez-Puerta RNA editing, codons known to undergo mitochondrial RNA editing in at least for providing data used in certain analyses and Andy Alverson and Dan Sloan for one of eight angiosperms (A. thaliana, Beta vulgaris, B. napus, Citrullus lanatus, critical reading of the manuscript. This work was supported in part by National Cucurbita pepo, Oryza sativa, Silene latifolia,andVitis vinifera)(SI Appendix, Science Foundation Award IOS 1027529 (to J.D.P.).

1. Westwood JH, Yoder JI, Timko MP, dePamphilis CW (2010) The evolution of parasitism 22. Wahleithner JA, MacFarlane JL, Wolstenholme DR (1990) A sequence encoding a in plants. Trends Plant Sci 15(4):227–235. maturase-related protein in a group II intron of a plant mitochondrial nad1 gene. 2. Barkman TJ, et al. (2007) Mitochondrial DNA suggests at least 11 origins of parasitism Proc Natl Acad Sci USA 87(2):548–552.

in angiosperms and reveals genomic chimerism in parasitic plants. BMC Evol Biol 7: 23. Brown GG, Colas des Francs-Small C, Ostersetzer-Biran O (2014) Group II intron EVOLUTION 248. splicing factors in plant mitochondria. Front Plant Sci 5:35. 3. Heide-Jørgensen HS (2008) Parasitic Flowering Plants (Koninklijke Brill NV, Leiden, 24. Rurek M (2008) Proteins involved in maturation pathways of plant mitochondrial and The Netherlands). plastid c-type cytochromes. Acta Biochim Pol 55(3):417–433. 4. Molina J, et al. (2014) Possible loss of the chloroplast genome in the parasitic flow- 25. Daley DO, Clifton R, Whelan J (2002) Intracellular gene transfer: Reduced hydro- – ering plant Rafflesia lagascae (Rafflesiaceae). Mol Biol Evol 31(4):793 803. phobicity facilitates gene transfer for subunit 2 of cytochrome c oxidase. Proc Natl 5. Wicke S, et al. (2013) Mechanisms of functional and physical genome reduction in Acad Sci USA 99(16):10510–10515. photosynthetic and nonphotosynthetic parasitic plants of the broomrape family. 26. Daley DO, Whelan J (2005) Why genes persist in organelle genomes. Genome Biol Plant Cell 25(10):3711–3725. 6(5):110. 6. Schelkunov MI, et al. (2015) Exploring the limits for reduction of plastid genomes: A 27. Saraste M (1999) Oxidative phosphorylation at the fin de siècle. Science 283(5407): case study of the mycoheterotrophic orchids Epipogium aphyllum and Epipogium 1488–1493. roseum. Genome Biol Evol 7(4):1179–1191. 28. Gabaldón T, Rainey D, Huynen MA (2005) Tracing the evolution of a large protein 7. Mower JP, Jain K, Hepburn NJ (2012) The role of horizontal transfer in shaping the complex in the eukaryotes, NADH:ubiquinone oxidoreductase (Complex I). J Mol Biol plant mitochondrial genome. Advances in Botanical Research, ed Maréchal-Drouard L 348(4):857–870. (Academic, New York), Vol 63: Mitochondrial Genome Evolution,pp41–69. 29. James TY, et al. (2013) Shared signatures of parasitism and phylogenomics unite 8. Xi Z, et al. (2013) Massive mitochondrial gene transfer in a parasitic – clade. PLoS Genet 9(2):e1003265. Cryptomycota and microsporidia. Curr Biol 23(16):1548 1553. 9. Sloan DB, et al. (2012) Rapid evolution of enormous, multichromosomal genomes in 30. Flegontov P, et al. (2015) Divergent mitochondrial respiratory chains in phototrophic – flowering plant mitochondria with exceptionally high mutation rates. PLoS Biol 10(1): relatives of apicomplexan parasites. Mol Biol Evol 32(5):1115 1131, 10.1093/molbev/ e1001241. msv021. 10. Mower JP, Sloan DB, Alverson AJ (2012) Plant mitochondrial diversity—the genomics 31. Lang BF, Gray MW, Burger G (1999) Mitochondrial genome evolution and the origin revolution. Plant Genome Diversity, eds Wendel JF, Greilhuber J, Dolezel, I Leitch IJ of eukaryotes. Annu Rev Genet 33:351–397. (Springer, Berlin), pp 123–144. 32. Popot JL, de Vitry C (1990) On the microassembly of integral membrane proteins. 11. Der JP, Nickrent DL (2008) A molecular phylogeny of Santalaceae (Santalales). Syst Bot Annu Rev Biophys Biophys Chem 19:369–403. 33(1):107–116. 33. Claros MG, et al. (1995) Limitations to in vivo import of hydrophobic proteins into 12. Nickrent DL, Malécot V, Vidal-Russell R, Der JP (2010) A revised classification of yeast mitochondria. The case of a cytoplasmically synthesized apocytochrome b. Eur J Santalales. Taxon 59(2):538–558. Biochem 228(3):762–771. 13. Nickrent DL, Der JP, Anderson FE (2005) Discovery of the photosynthetic relatives of 34. Marcet-Houben M, Marceddu G, Gabaldón T (2009) Phylogenomics of the oxidative the “Maltese mushroom” Cynomorium. BMC Evol Biol 5:38. phosphorylation in fungi reveals extensive gene duplication followed by functional 14. Mathiasen RL, Nickrent DL, Shaw DC, Watson DM (2008) Mistletoes: Pathology, sys- divergence. BMC Evol Biol 9:295. tematics, ecology, and management. Plant Dis 92(7):988–1006. 35. Gutierres S, et al. (1997) Lack of mitochondrial and nuclear-encoded subunits of 15. Rice DW, et al. (2013) Horizontal transfer of entire genomes via mitochondrial fusion complex I and alteration of the respiratory chain in Nicotiana sylvestris mitochondrial – in the angiosperm Amborella. Science 342(6165):1468 1473. deletion mutants. Proc Natl Acad Sci USA 94(7):3436–3441. 16. Zonneveld BJM (2010) New record holders for maximum genome size in eudicots and 36. Pineau B, Mathieu C, Gérard-Hirne C, De Paepe R, Chétrit P (2005) Targeting the monocots. J Bot, 10.1155/2010/527357. NAD7 subunit to mitochondria restores a functional complex I and a wild type phe- 17. Adams KL, Qiu YL, Stoutemyer M, Palmer JD (2002) Punctuated evolution of mito- notype in the Nicotiana sylvestris CMS II mutant lacking nad7. J Biol Chem 280(28): chondrial gene content: High and variable rates of mitochondrial gene loss and 25994–26001. transfer to the nucleus during angiosperm evolution. Proc Natl Acad Sci USA 99(15): 37. Meyer EH, et al. (2009) Remodeled respiration in ndufs4 with low phosphorylation 9905–9912. efficiency suppresses Arabidopsis germination and growth and alters control of me- 18. Adams KL, Palmer JD (2003) Evolution of mitochondrial gene content: Gene loss and – transfer to the nucleus. Mol Phylogenet Evol 29(3):380–395. tabolism at night. Plant Physiol 151(2):603 619. 19. Richardson AO, Rice DW, Young GJ, Alverson AJ, Palmer JD (2013) The “fossilized” 38. Karpova OV, Newton KJ (1999) A partially assembled complex I in NAD4-deficient – mitochondrial genome of Liriodendron tulipifera: Ancestral gene content and order, mitochondria of maize. Plant J 17(5):511 521. ancestral editing sites, and extraordinarily low mutation rate. BMC Biol 11:29. 39. Marienfeld JR, Newton KJ (1994) The maize NCS2 abnormal growth mutant has a 20. Kim S, Yoon MK (2010) Comparison of mitochondrial and chloroplast genome seg- chimeric nad4-nad7 mitochondrial gene and is associated with reduced complex I ments from three onion (Allium cepa L.) cytoplasm types and identification of a trans- function. Genetics 138(3):855–863. splicing intron of cox2. Curr Genet 56(2):177–188. 40. Sabar M, De Paepe R, de Kouchkovsky Y (2000) Complex I impairment, respiratory 21. Hecht J, Grewe F, Knoop V (2011) Extreme RNA editing in coding islands and abun- compensations, and photosynthetic decrease in nuclear and mitochondrial male dant microsatellites in repeat sequences of Selaginella moellendorffii mitochondria: sterile mutants of Nicotiana sylvestris. Plant Physiol 124(3):1239–1250. The root of frequent plant mtDNA recombination in early tracheophytes. Genome 41. Rasmusson AG, Soole KL, Elthon TE (2004) Alternative NAD(P)H dehydrogenases of Biol Evol 3:344–358. plant mitochondria. Annu Rev Plant Biol 55:23–39.

Skippington et al. PNAS Early Edition | 9of10 Downloaded by guest on September 24, 2021 42. Yamato KT, Newton KJ (1999) Heteroplasmy and homoplasmy for maize mitochon- 60. Alverson AJ, Rice DW, Dickinson S, Barry K, Palmer JD (2011) Origins and re- drial mutants: A rare homoplasmic nad4 deletion mutant plant. J Hered 90(3): combination of the bacterial-sized multichromosomal mitochondrial genome of cu- 369–373. cumber. Plant Cell 23(7):2499–2513. 43. Noctor G, Dutilleul C, De Paepe R, Foyer CH (2004) Use of mitochondrial electron 61. Naito K, Kaga A, Tomooka N, Kawase M (2013) De novo assembly of the complete transport mutants to evaluate the effects of redox state on photosynthesis, stress organelle genome sequences of azuki bean (Vigna angularis) using next-generation tolerance and the integration of carbon/nitrogen metabolism. J Exp Bot 55(394): sequencers. Breed Sci 63(2):176–182. 49–57. 62. Mower JP, Case AL, Floro ER, Willis JH (2012) Evidence against equimolarity of large 44. Dutilleul C, et al. (2003) Functional mitochondrial complex I is required by tobacco repeat arrangements and a predominant master circle structure of the mitochondrial leaves for optimal photosynthetic performance in photorespiratory conditions and genome from a monkeyflower (Mimulus guttatus) lineage with cryptic CMS. Genome during transients. Plant Physiol 131(1):264–275. Biol Evol 4(5):670–686. 45. Hollinger DY (1983) Photosynthesis and water relations of the mistletoe, Phoraden- 63. Sloan DB, Müller K, McCauley DE, Taylor DR, Storchová H (2012) Intraspecific variation dron villosum, and its host, the California calley oak, Quercus lobata. Oecologia 60(3): in mitochondrial genome sequence, structure, and gene content in Silene vulgaris,an 396–400. angiosperm with pervasive cytoplasmic male sterility. New Phytol 196(4):1228–1239. 46. Ehleringer JR, Cook CS, Tieszen LL (1986) Comparative water-use and nitrogen re- 64. Sloan DB (2013) One ring to rule them all? Genome sequencing provides new insights lationships in a mistletoe and its host. Oecologia 68(2):279–284. into the ‘master circle’ model of plant mitochondrial DNA structure. New Phytol 47. Johnson JM, Choinski JS (1993) Photosynthesis in the Tapinanthus-Diplorhynchus 200(4):978–985. mistletoe host relationship. Ann Bot (Lond) 72(2):117–122. 65. Bendich AJ (1993) Reaching for the ring: The study of mitochondrial genome struc- 48. Marshall JD, Ehleringer JR, Schulze ED, Farquhar G (1994) Carbon-isotope composi- ture. Curr Genet 24(4):279–290. tion, gas-exchange and heterotrophy in Australian mistletoes. Funct Ecol 8(2): 66. Backert S, Börner T (2000) Phage T4-like intermediates of DNA replication and re- 237–241. combination in the mitochondria of the higher plant Chenopodium album (L.). Curr 49. Hao W (2010) OrgConv: Detection of gene conversion using consensus sequences and Genet 37(5):304–314. its application in plant mitochondrial and chloroplast homologs. BMC Bioinformatics 67. Camacho C, et al. (2009) BLAST+: Architecture and applications. BMC Bioinformatics 11:114. 10:421. 50. Cho Y, Qiu YL, Kuhlman P, Palmer JD (1998) Explosive invasion of plant mitochondria 68. Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res by a group I intron. Proc Natl Acad Sci USA 95(24):14244–14249. 9(9):868–877. 51. Sanchez-Puerta MV, Cho Y, Mower JP, Alverson AJ, Palmer JD (2008) Frequent, 69. Krzywinski M, et al. (2009) Circos: An information aesthetic for comparative geno- phylogenetically local horizontal transfer of the cox1 group I Intron in flowering mics. Genome Res 19(9):1639–1645. plant mitochondria. Mol Biol Evol 25(8):1762–1777. 70. Lowe TM, Eddy SR (1997) tRNAscan-SE: A program for improved detection of transfer 52. Kimura M (1984) The Neutral Theory of Molecular Evolution (Cambridge Univ Press, RNA genes in genomic sequence. Nucleic Acids Res 25(5):955–964. Cambridge, UK). 71. Mower JP (2009) The PREP suite: Predictive RNA editors for plant mitochondrial 53. Zhu A, Guo W, Jain K, Mower JP (2014) Unprecedented heterogeneity in the syn- genes, chloroplast genes and user-defined alignments. Nucleic Acids Res 37(Web onymous substitution rate within a plant genome. Mol Biol Evol 31(5):1228–1236. Server issue):W253-9. 54. Mower JP, Touzet P, Gummow JS, Delph LF, Palmer JD (2007) Extensive variation in 72. Lenz H, Knoop V (2013) PREPACT 2.0: Predicting C-to-U and U-to-C RNA editing in synonymous substitution rates in mitochondrial genes of seed plants. BMC Evol Biol organelle genome sequences with multiple references and curated RNA editing an- 7:135. notation. Bioinform Biol Insights 7:1–19. 55. Lynch M, Koskella B, Schaack S (2006) Mutation pressure and the evolution of or- 73. Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: ganelle genomic architecture. Science 311(5768):1727–1730. Improvements in performance and usability. Mol Biol Evol 30(4):772–780. 56. Alverson AJ, Zhuo S, Rice DW, Sloan DB, Palmer JD (2011) The mitochondrial genome 74. Suyama M, Torrents D, Bork P (2006) PAL2NAL: Robust conversion of protein se- of the legume Vigna radiata and the analysis of recombination across short mito- quence alignments into the corresponding codon alignments. Nucleic Acids Res chondrial repeats. PLoS ONE 6(1):e16404. 34(Web Server issue):W609-12. 57. Small ID, Isaac PG, Leaver CJ (1987) Stoichiometric differences in DNA molecules 75. Castresana J (2000) Selection of conserved blocks from multiple alignments for their containing the atpA gene suggest mechanisms for the generation of mitochondrial use in phylogenetic analysis. Mol Biol Evol 17(4):540–552. genome diversity in maize. EMBO J 6(4):865–869. 76. Stamatakis A (2006) RAxML-VI-HPC: Maximum likelihood-based phylogenetic analy- 58. Small I, Suffolk R, Leaver CJ (1989) Evolution of plant mitochondrial genomes via ses with thousands of taxa and mixed models. Bioinformatics 22(21):2688–2690. substoichiometric intermediates. Cell 58(1):69–76. 77. Yang Z (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol 59. Woloszynska M (2010) Heteroplasmy and stoichiometric complexity of plant mito- 24(8):1586–1591. chondrial genomes—though this be madness, yet there’s method in’t. J Exp Bot 61(3): 78. Stevens PF (2014) Angiosperm Phylogeny Website. Available at www.mobot.org/ 657–671. mobot/research/apweb/welcome.html. Accessed March 1, 2015.

10 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1504491112 Skippington et al. Downloaded by guest on September 24, 2021