<<

Mitochondrial substitution rates are extraordinarily elevated and variable in a genus of flowering

Yangrae Cho†‡, Jeffrey P. Mower†, Yin-Long Qiu§, and Jeffrey D. Palmer¶

Department of Biology, Indiana University, Bloomington, IN 47405-3700

Contributed by Jeffrey D. Palmer, November 8, 2004

Plant mitochondrial (mt) genomes have long been known to evolve We now report unprecedented levels and variation in mt RS, slowly in sequence. Here we show remarkable departure from this much of which is confined to a single genus of flowering plants pattern of conservative evolution in a genus of flowering plants. (, a large group of cosmopolitan weeds). Moreover, Substitution rates at synonymous sites vary substantially among unlike previous cases of (modest) rate variation in mito- lineages within Plantago. At the extreme, rates in Plantago exceed chondria, rate variation in Plantago is specific to the mt genome. those in exceptionally slow plant lineages by Ϸ4,000-fold. The Radical and reversible changes in the underlying mt mutation fastest Plantago lineages set a new benchmark for rapid evolution rate are probably responsible for this rate variation. in a DNA genome, exceeding even the fastest animal mt genome by an order of magnitude. All six mt genes examined show Materials and Methods similarly elevated divergence in Plantago, implying that substitu- Most Plantago DNAs were provided by the Royal Botanic tion rates are highly accelerated throughout the genome. In Gardens (Kew, U.K.). All other DNAs are as described in ref. 24. contrast, substitution rates show little or no elevation in Plantago PCR products were generated, purified, and sequenced as for each of four chloroplast and three nuclear genes examined. described in refs. 25 and 26. Taxon names and GenBank These results, combined with relatively modest elevations in rates accession numbers for all sequences used in this study are listed of nonsynonymous substitutions in Plantago mt genes, indicate in Tables 2–4, which are published as supporting information on EVOLUTION that major, reversible changes in the mt mutation rate probably the PNAS web site. PCR primer sequences and aligned data sets underlie the extensive variation in synonymous substitution rates. are available upon request. These rate changes could be caused by major changes in any Phylogenetic relationships within (see Fig. 2A) number of factors that control the mt mutation rate, from the were determined from a 4,730-nt data set consisting of portions of production and detoxification of oxygen free radicals in the mito- four chloroplast regions (ndhF, rbcL, and intergenic spacers atpB͞ chondrion to the efficacy of mt DNA replication and͞or repair. rbcL and trnL͞trnF). Relationships within Plantago subgenus Plan- tago (see Fig. 2B) were analyzed from a 9,845-nt data set containing genome evolution ͉ plant mitochondria ͉ Plantago ͉ rate two additional chloroplast regions (intergenic spacers psaA͞trnS variation ͉ synonymous substitution rates and trnC͞trnD). Maximum likelihood (ML) trees were constructed with PAUP* by using the general time-reversible model, a gamma n a pioneering study 25 years ago, Brown et al. (1) discovered that distribution with four rate categories, and an estimate of the Iprimate mitochondrial (mt) DNA evolves rapidly at the sequence proportion of invariant sites. The rate matrix, base frequencies, level compared with nuclear DNA. With rare exception (2), most shape of the gamma distribution, and proportion of invariant sites animal mt DNAs have been found to evolve rapidly in sequence were estimated before the ML analysis from a neighbor-joining tree (3–6). Rapid mt evolution may be the rule in other groups of constructed from the data. eukaryotes, although this conclusion must be tempered by the Divergence times outside Plantaginaceae were taken from ref. scanty data and distant comparisons available for most groups (7, 27. Those within the family were calculated by using a penalized 8). Plants are the most glaring exception to the general rule that mt likelihood approach (28) as implemented in the R8S program (29) DNA evolves rapidly in sequence. In 1987, Wolfe et al. (9) showed and a time constraint of 48 million years (27) for the Antirrhinum͞ that rates of synonymous substitution in angiosperm mt genes are Plantago split. The ML tree shown in Fig. 2A was used as the anomalously low, a few-fold lower than in chloroplast genes, Ϸ10- starting tree for the divergence time analysis shown in Fig. 2C. For to 20-fold lower than in nuclear genes of both angiosperms and Fig. 2D, the starting tree was constructed by first constraining the mammals, and Ϸ50- to 100-fold lower than in mammalian mt genes. taxa in the 4,730-nt data set to incorporate the alternative relation- A year later, Palmer and Herbon (10) extended the inference of low ships within subgenus Plantago from Fig. 2B and then estimating rates of sequence change to the entire plant mt genome (most of branch lengths for this topology in PAUP*. A smoothing factor of which is noncoding) and showed that rates of sequence and structural evolution are dramatically uncoupled in plant mt DNA. All subsequent studies have confirmed that nucleotide substitu- Abbreviations: mt, mitochondrial; ML, maximum likelihood; SSB, substitutions per site per tion rates are in general quite low in land plant mt genomes (11, 12). billion years; dN, substitutions per nonsynonymous site; dS, substitutions per synonymous site; R , synonymous substitution rate; SSU, small subunit; rDNA, rRNA-encoding DNA. At the same time, moderate variation in synonymous substitution S Data deposition: The nucleotide sequences reported in this paper have been deposited in rates (RS) (up to 7-fold) has been found in comparing several groups the GenBank database (accession nos. AJ389588–AJ389621 and AY818897–AY818951). of plants (13–16). In most cases, correlated rate changes are seen for †Y.C. and J.P.M. contributed equally to this work. chloroplast and͞or nuclear genes. Forces operating across the two ‡Present address: Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State organelle genomes or all three genomes, such as paternal trans- University, Blacksburg, VA 24061. mission of organelles or generation-time effects, respectively, have §Present address: Department of Ecology and Evolutionary Biology, University of Michigan, been invoked to explain these patterns, whereas a shift to hetero- Ann Arbor, MI 48109-1048. trophy has been invoked to explain correlated increases in rRNA ¶To whom correspondence should be addressed at: Department of Biology, Indiana Uni- substitution rates in parasitic plants (17). Phylogenetic studies versity, Jordan Hall 142, 1001 East Third Street, Bloomington, IN 47405-3700. E-mail: suggest rate variation in other groups of land plants (18–23), [email protected]. although these studies have lacked a quantitative focus. © 2004 by The National Academy of Sciences of the USA

www.pnas.org͞cgi͞doi͞10.1073͞pnas.0408302101 PNAS ͉ December 21, 2004 ͉ vol. 101 ͉ no. 51 ͉ 17741–17746 Downloaded by guest on September 24, 2021 three was determined by using the R8S cross-validation procedure. substitution rates were calculated for a given tree branch by Different starting points of initial age estimates and reanalysis after dividing its synonymous branch length by the estimated diver- perturbation of the final age estimates had no effect on the results. gence time along that branch. Divergence times within Plantagi- Standard errors for each time were calculated by rerunning the naceae were calculated by using two different phylogenetic divergence time analyses on 500 bootstrapped data sets as described estimates for the group owing to the uncertain placement of P. in ref. 29. media, which shows the greatest mt divergence (Fig. 1). The first Branch lengths for the gene trees in Fig. 1 and Fig. 3, which is estimate was based on a 4,730-nt chloroplast data set. Apart from published as supporting information on the PNAS web site, were Plantago subgenus Plantago, and especially the placement of P. estimated by using PAML 3.13D (30) and topologically constrained media, relationships are well supported in this tree (Fig. 2A) and trees. Relationships within Plantaginaceae were constrained ac- in good agreement with previous molecular studies of the family cording to the analyses shown in Fig. 2 A and B. For all other taxa, (34, 35). To achieve more robust placement of P. media, the topologies were constrained according to ref. 31. Branch lengths, number of chloroplast characters was doubled (to 9,845 nt) for representing the number of substitutions per synonymous site (dS) subgenus Plantago and two outgroups. This increase in chloro- or number of substitutions per nonsynonymous site (dN), were plast characters gave a topology (Fig. 2B) in which P. media is determined for protein genes by using a simplified Goldman–Yang sister to P. rigida but with bootstrap support that is equally as low (GY94) codon model (32) with separate dN͞dS ratios (␻) for each (53% versus 52%) as its placement in Fig. 2A as sister to the branch. Codon frequencies were computed by using the F3 ϫ 4 other three species of subgenus Plantago. The failure to resolve method (32). The transition͞transversion rate ratio (␬) and dN͞dS relationships within subgenus Plantago by using nearly 10,000 nt, ratios were estimated during the analysis with initial values of 2 and together with the short internodes within the subgenus, implies 0.4, respectively. Standard errors for total branch lengths were a rapid radiation for these four species (see Fig. 4 for the reported by PAML, and these values were propagated to calculate single-gene chloroplast trees). standard errors for their corresponding dS and dN branch lengths. All estimates of Plantaginaceae divergence times and most Branch lengths representing total substitutions per site were esti- estimates of substitution rates are similar between the two mated for rRNA genes by using the general time-reversible nucle- analyses (Fig. 2 C and D and Table 1), with the only major otide model and a gamma distribution for rate variation with four differences occurring, as expected, within subgenus Plantago. categories. The rate matrix, nucleotide frequencies, and shape of Consequently, we discuss below only the rate estimates from Fig. the gamma parameter (initial value of 0.5) were estimated during 2 B and D, except where noted otherwise. The absolute rate of the analysis. dS, designated RS, is relatively low in the terminal lineages Absolute RS values per branch were calculated by dividing the leading to Veronica and Digitalis, the sister and near-sister of synonymous branch length by the length of time for that branch. Plantago. For Veronica, RS is 0.28 substitutions per site per billion Standard errors for RS were determined by propagating the errors years (SSB) for cox1 and 0.63 SSB for atp1 (mean RS weighted associated with branch length and time. Additional uncertainties in by gene length ϭ 0.45), whereas the Digitalis values are 0.25 for RS stem from the calibration point used in the divergence time cox1 and 0.09 for atp1 (mean RS ϭ 0.17). These rates are slightly estimates (see Fig. 2; see also Supporting Results, Figs. 3–6, and lower to a few-fold lower than the average rates among the broad Tables 2–8, which are published as supporting information on the array of analyzed in Fig. 1, as averaged among them PNAS web site) and from uncertainty in the phylogenetic position from the tips of the trees to either the common ancestor of of Plantago media (see Results). (mean RS for both genes ϭ 1.0), of (mean RS ϭ 0.73), or of all core eudicots (mean RS ϭ 0.66) (Table 1). In Results striking contrast, synonymous rates are much higher on the stem In a preliminary report (33), we provided evidence of potentially branch leading to Plantago (mean RS ϭ 9.3) and on most rapid mt evolution in one species of Plantago (Plantago rugelii) branches within the genus (Table 1 and Fig. 2 C and D). based on anomalously poor mt hybridization in Southern blots At the extreme, in the topology with P. media sister to P. rigida and partial sequencing of two mt genes. Nearly full-length (Fig. 2D), RS is 244 SSB on the terminal branch leading to P. sequences have now been obtained for six mt genes from P. media. In the alternative topology (Fig. 2C), RS on this branch rugelii. Phylogenetic analysis of synonymous sites for the four is somewhat lower (166), whereas an even more spectacular RS protein genes and of all positions for the two rRNA genes of 736 is estimated on the topology-specific, short E–F internode. illustrates extraordinary divergence of all six P. rugelii genes Compared with the Digitalis and Veronica terminals, the mean RS when compared with each of several other diverse eudicots on the P. media terminal in Fig. 2D is estimated to be 1,400- and examined (Fig. 3). In contrast, sequences from either P. rugelii 540-fold higher, respectively, and that on the superfast E–F or the closely related for four chloroplast genes internode of Fig. 2C is estimated to be 4,300- and 1,600-fold and three nuclear genes show unexceptional divergence com- higher. pared with other eudicots (Fig. 3). Our analyses (Fig. 1 and Fig. 5) show evidence for synonymous To obtain a fuller picture of the tempo and pattern of mt DNA rate variation elsewhere in eudicots beyond the extreme varia- evolution in Plantago and related taxa, nearly complete se- tion seen in Plantaginaceae. Examples include Ϸ43-fold faster quences were determined from eight additional Plantago species evolution in Goodenia relative to its sister Sambucus (see also for three mt genes [atp1, cox1, and small subunit (SSU) rRNA- Table 1) and rapid evolution in Justicia and Ajuga relative to their encoding DNA (rDNA)] and two chloroplast genes (rbcL and respective sisters. The relatively long internodes subtending ndhF). Phylogenetic trees of synonymous and nonsynonymous short termini leading to the crucifers Arabidopsis and Brassica sites (or of all sites for SSU rDNA) were used to illustrate (Fig. 1), a pattern seen for many other genes in these genomes variation in substitution rates. The most notable feature of these (36), raise the possibility of relatively rapid evolution early in this trees (Fig. 1) is the extraordinary divergence within Plantago at group followed by a rate decrease. Exceptionally low RS values synonymous sites for both mt protein genes and at all sites for the over long periods of time are estimated for the ‘‘basal’’ eudicots mt SSU rDNA. Furthermore, the nine Plantago species fall more Trochodendron and Platanus (0.062 and 0.078 SSB, respectively), or less into the same four groups according to their degree of and for Sambucus (0.063 SSB) (Fig. 1 and Table 1). With the enhanced mt sequence divergence (Fig. 1; see also the report on Trochodendron rate as one extreme and the fastest Plantago rate relative rate tests in Supporting Results). as the other, RS among the eudicots examined in this study ranges To quantify the evolution of mt RS within Plantago and on by a factor of 3,900 (for the topology in Fig. 2D) or 12,000 (for selected branches elsewhere in the trees shown in Fig. 1, absolute the topology in Fig. 2C).

17742 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0408302101 Cho et al. Downloaded by guest on September 24, 2021 EVOLUTION

Fig. 1. Substitution rates in Plantago are highly elevated and variable in mt genes compared with chloroplast (cp) and nuclear (nc) genes. Shown are topologically constrained ML trees based on synonymous (dS) or nonsynonymous (dN) sites for protein genes (all drawn to the same scale) or all sites for SSU rDNA. The three mt trees at left differ from the three boxed (Top Inset, Middle Inset, and Bottom Inset) trees only according to the constraints placed on subgenus Plantago (constrained as shown in Fig. 2 B and A, respectively). Note that the picture of highly elevated and variable RS in Plantago evident in these trees was observed for both unconstrained and constrained trees (Fig. 6) indicating that our findings are not an artifact of topological constraints. Taxa within Plantago are color-coded by subgenus. Pl, subgenus Plantago; Co, Coronopus; Ps, Psyllium. Mt gene trees were based on 1,341 (cox1), 1,272 (atp1), and 1,401 (SSU rDNA) nt; chloroplast gene trees were based on 1,317 (rbcL) and 1,674 (ndhF) nt; and the nuclear sut1 tree was based on 1,293 nt.

Cho et al. PNAS ͉ December 21, 2004 ͉ vol. 101 ͉ no. 51 ͉ 17743 Downloaded by guest on September 24, 2021 Fig. 2. Estimates of phylogeny, divergence times, and RS values in the Plantaginaceae. (A and B) ML trees of Plantaginaceae (A) and subgenus Plantago (B) were based on four and six chloroplast regions, respectively. Numbers indicate bootstrap support from 1,000 ML replicates. Seven outgroup species were included in the analysis in A but are not shown. (C and D) Chronograms of Plantaginaceae based on the topology found in A (C), and the topology found in B (D) (see Materials and Methods). Internal nodes are labeled A–J. Values above each branch indicate mean RS in SSB from Table 1 for that branch.

Discussion ability, known to evolve more rapidly. However, even the fastest We have discovered unprecedented variation in rates of mt synon- recorded animal mt DNAs (3, 4) pale at 37 SSB compared with ymous substitution within a genus of flowering plants. To the best the 244 SSB (and possibly even 736 SSB) estimated for the fastest of our knowledge, the greatest previously measured disparity in mt Plantago lineage. Two recent studies have identified lineages of lice (5) and gastropods (6) that may prove to set new speed RS is Ϸ7-fold, in comparisons of grasses and palms (14), whereas a 10- to 30-fold range of overall substitution rates was reported for a records for animal mt DNA, however absolute rates have not yet mt intron in comparisons of diverse monocots and eudicots (15). In been estimated. Plants further eclipse animals in terms of the contrast, we have uncovered on the order of 4,000-fold (and known range of mt substitution rates. Even with the discovery of possibly even 12,000-fold) variation in mt synonymous substitution relatively low mt rates in anthozoans, rates that may be ances- trally low for animals (2), the range of mt RS values among all across eudicots. Most of this range is found within a single family Ϸ (Plantaginaceae) and even within a single genus (Plantago), but metazoans is only 40–100 (2), compared with 4,000 (and significant rate heterogeneity is also evident in a number of other possibly as high as 12,000) among just eudicots. Unlike animal mt DNA, whose generally high substitution rate lineages and comparisons (Fig. 1 and Table 1). is usually accompanied by a large excess of transitions relative to Considering these findings and the number of cases of plant mt transversions (37), there is no evidence for an elevated transi- rate variation already reported (13–17) or indicated by phylogenetic tion͞transversion ratio in Plantago mt DNAs. For cox1 and atp1, analysis (18–23), we suspect that, as more plants are sampled, more this ratio is generally low (Ϸ1.0 in most cases), both within and more evidence will be found for widespread heterogeneity in Plantago and across eudicots, and, if anything, it is somewhat rates of synonymous substitutions among plant mt DNAs. Denser depressed in Plantago, especially for cox1 (Table 5). We have taxon sampling can only reveal more peaks and valleys (now hidden limited evidence (unpublished data) for an elevated rate of by averaging across relatively long, poorly sampled branches) in insertions͞deletions in Plantago mt DNAs. More data are what seems to be an emerging pattern of episodic evolution of mt needed, however, before it can be concluded, as in the case of gene sequences in plants. Discovery of additional cases of dramatic mammal mt and nuclear genomes (38, 39), that rates of point and Plantago-like accelerated evolution can be expected, and indeed, we length mutations are closely coupled. have found a second such case (Geraniaceae, especially Pelargo- nium; ref. 33 and unpublished data). Number and Directionality of Changes in mt Substitution Rates in Rates of synonymous substitutions in chloroplast genes are Plantago. It is clear that the rate of mt synonymous substitutions much less variable than in mt genes of the same plants (Fig. 1). changed at least several times during the evolution of the Plantagi- Chloroplast rates are almost uniformly a few-fold higher than mt naceae (Figs. 1 and 2). How many changes, on what branches, with rates, and only in Plantago (and Pelargonium) do mt rates what magnitude, and in what direction are all less clear issues. significantly exceed chloroplast rates. Thus the long-standing Certainly, the rate must have increased in a common ancestor of generalization (9, 10) that sequence evolution proceeds more Plantago (Fig. 1), but exactly when and at what magnitude cannot slowly in plant mitochondria compared with chloroplasts is still be discerned from the present data. A second rate change is needed true, albeit now with spectacular exception. to account for the difference in rates between subgenus Psyllium and subgenus Plantago (Fig. 2), with the directionality and timing mt Sequence Evolution in Plants Versus Animals. Animal mt DNA of this second change dependent on the magnitude of the initial rate has reigned for 25 years (1) as the exemplar of rapid sequence increase. In one scenario, a very large initial increase late along evolution among cellular genomes, with only certain RNA branch B–C was perpetuated along the branches leading to subge- viruses, replicated by using polymerases that lack proofreading nus Plantago, followed by a rate decrease along the C–H branch

17744 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0408302101 Cho et al. Downloaded by guest on September 24, 2021 Table 1. Absolute synonymous substitution rates (Rs) for atp1 and cox1 atp1 cox1

Branch Time, Myr dS ϫ 100 RS, SSB dN͞dS dS ϫ 100 RS, SSB dN͞dS Mean RS, SSB

Group means Lamiales 64 Ϯ 410Ϯ 4 1.6 Ϯ 0.6 0.095 3.2 Ϯ 2.5 0.50 Ϯ 0.39 0.17 1.0 Asterids 107 Ϯ 510Ϯ 6 0.96 Ϯ 0.58 0.096 5.5 Ϯ 2.7 0.51 Ϯ 0.26 0.13 0.73 Core eudicots 114 Ϯ 5 9.3 Ϯ 6.0 0.81 Ϯ 0.52 0.11 5.8 Ϯ 2.6 0.51 Ϯ 0.23 0.21 0.66 Selected Non-Plantaginaceae Goodenia 94 Ϯ 541Ϯ 5 4.4 Ϯ 0.5 0.033 10 Ϯ 2 1.1 Ϯ 0.2 0.14 2.7 Sambucus 94 Ϯ 5 0.95 Ϯ 0.37 0.10 Ϯ 0.04 0.45 0.25 Ϯ 0.09 0.027 Ϯ 0.010 3.0 0.063 Trochodendron 124 Ϯ 5 0.97 Ϯ 0.43 0.078 Ϯ 0.035 0.23 0.59 Ϯ 0.29 0.048 Ϯ 0.024 0.54 0.062 Platanus 130 Ϯ 6 0.98 Ϯ 0.57 0.075 Ϯ 0.044 0 1.05 Ϯ 0.4 0.081 Ϯ 0.027 0.65 0.078 Plantaginaceae† AtoDigitalis 36 Ϯ 1 0.33 Ϯ 0.19 0.091 Ϯ 0.052 0.64 0.90 Ϯ 0.41 0.25 Ϯ 0.11 0.23 0.17 AtoB 3Ϯ 2 1.4 Ϯ 0.6 5.1 Ϯ 4.0 0.16 0.84 Ϯ 0.55 3.1 Ϯ 2.8 0 4.1 BtoVeronica 33 Ϯ 1 2.1 Ϯ 0.8 0.63 Ϯ 0.24 0.099 0.92 Ϯ 0.48 0.28 Ϯ 0.14 0.11 0.45 BtoC 16Ϯ 223Ϯ 414Ϯ 3 0.037 7.8 Ϯ 1.6 4.7 Ϯ 1.1 0.13 9.3 CtoD 3Ϯ 114Ϯ 452Ϯ 29 0.045 17 Ϯ 564Ϯ 35 0.016 58 DtoP. coronopus 14 Ϯ 159Ϯ 841Ϯ 6 0.022 15 Ϯ 410Ϯ 3 0.022 25 DtoE 10Ϯ 176Ϯ 973Ϯ 11 0.022 99 Ϯ 29 95 Ϯ 29 0.034 84 FtoP. media 3.6 Ϯ 0.4 77 Ϯ 9 215 Ϯ 35 0.014 97 Ϯ 11 271 Ϯ 45 0.030 244 FtoP. rigida 3.6 Ϯ 0.4 0 Ϯ 0 0 — 2.2 Ϯ 1.7 6.3 Ϯ 4.7 0.14 3.2 GtoP. australis 3.7 Ϯ 0.4 1.2 Ϯ 0.6 3.2 Ϯ 1.7 0.36 0.31 Ϯ 0.22 0.84 Ϯ 0.60 0.35 2.0 GtoP. rugelii 3.7 Ϯ 0.4 3.1 Ϯ 1.2 8.3 Ϯ 3.4 0 0.29 Ϯ 0.08 0.79 Ϯ 0.24 4.8 4.4 CtoH 5.7Ϯ 1.1 7.3 Ϯ 2.8 13 Ϯ 6 0.044 3.8 Ϯ 1.9 6.7 Ϯ 3.5 0.026 9.7

HtoI 4.8Ϯ 0.9 10 Ϯ 221Ϯ 6 0.10 1.2 Ϯ 0.6 2.6 Ϯ 1.3 0.081 12 EVOLUTION ItoP. atrata 6.4 Ϯ 0.6 13 Ϯ 221Ϯ 4 0.018 1.6 Ϯ 0.6 2.4 Ϯ 0.9 0.22 11 ItoP. lanceolata 6.4 Ϯ 0.6 5.9 Ϯ 1.6 9.2 Ϯ 2.7 0 3.2 Ϯ 1.0 5.0 Ϯ 1.5 0.084 7.0 JtoP. sempervirens 11 Ϯ 1 5.3 Ϯ 1.5 4.9 Ϯ 1.4 0.023 2.0 Ϯ 0.6 1.8 Ϯ 0.6 0.19 3.3 JtoP. sericea 11 Ϯ 1 9.2 Ϯ 1.9 8.5 Ϯ 1.8 0.038 2.4 Ϯ 0.9 2.2 Ϯ 0.8 0.057 5.3 Subgenus Plantago‡ DtoE 10Ϯ 162Ϯ 960Ϯ 11 0.023 63 Ϯ 15 61 Ϯ 16 0.037 60 EtoP. media 3.8 Ϯ 0.5 64 Ϯ 8 168 Ϯ 30 0.009 62 Ϯ 10 163 Ϯ 33 0.029 166 EtoF 0.3Ϯ 0.6 14 Ϯ 6 435 Ϯ 886 0.031 33 Ϯ 9 1,020 Ϯ 2,053 0.035 736 FtoP. rugelii 3.5 Ϯ 0.4 2.3 Ϯ 1.0 6.5 Ϯ 3.0 0 0.29 Ϯ 0.08 0.83 Ϯ 0.25 4.4 3.6 GtoP. australis 3.2 Ϯ 0.4 1.2 Ϯ 0.9 3.8 Ϯ 2.7 0 0.31 Ϯ 0.22 0.96 Ϯ 0.69 0.35 2.3 GtoP. rigida 3.2 Ϯ 0.4 1.6 Ϯ 0.9 4.9 Ϯ 2.8 0 2.2 Ϯ 0.7 6.9 Ϯ 2.3 0.14 5.9

Mean Rs is the mean of atp1 (1,272 nt) and cox1 (1,341 nt) values weighted by gene length. Group means are the average root to tip distances for taxa within that group (Fig. 1), excluding Goodenia and Plantago (P.). Myr, Million years; —, denominator (ds) ϭ 0. †Taken from Fig. 2 B and D topology. ‡Taken from Fig. 2 A and C topology.

leading to subgenus Psyllium. In a second scenario, a smaller initial major speed-ups and slow-downs. More refined and accurate increase early on the B–C branch was perpetuated throughout substitution-rate scenarios should emerge with the generation of subgenus Psyllium, followed by an additional rate increase along further sequence data of three kinds: (i) from mt genes of additional branch C–D. A decrease in RS in the terminal lineage leading to species of Plantago (Ͼ200 have yet to be examined), (ii) from mt Plantago coronopus is likely under either scenario, although the genes besides atp1 and cox1, and (iii) from additional non-mt magnitude and͞or timing of this seems to be different for cox1 than markers to better resolve the phylogeny of subgenus Plantago. for atp1 (Table 1). Denser taxon sampling within Plantago may identify still further Interpretation of the magnitude and timing of rate changes in episodes of RS change, as some of the relatively long internodes in subgenus Plantago depends on the topology analyzed (Figs. 2, our current trees may average different periods with significantly compare C with D). The most parsimonious interpretation of different substitution rates. By the same token, we may discover Fig. 2D postulates a major rate decrease in the common ancestor relatively intense but brief periods of even faster evolution than of subgenus Plantago (presumably near the end of the D–E those documented thus far. Further sampling of taxa and mt genes branch), followed by an even greater rate increase on the will also allow us to determine whether synonymous rates are terminal branch leading to P. media. The topology of Fig. 2C relatively homogenous across a mt genome, or, for example, implies a huge, transient change in RS (an increase to 736 SSB, whether rates are occasionally or even consistently higher in certain followed by Ͼ100-fold decrease) on the very short E–F branch, genes and regions of the mt genome than in others. In this respect, although the huge error associated with this value (Table 1) it is worth noting that RS is often 2- to 4-fold higher for atp1 than makes this change difficult to assess with any confidence, and a for cox1 and is rarely much lower (Table 1 and data not shown). nearly 3-fold rate increase on the P. media terminal branch. These scenarios for the evolution of RS in Plantaginaceae mt Underlying Basis of Extreme Molecular Divergence in Plantago. The DNAs are necessarily speculative and tentative. No matter how one exceptional divergence of Plantago mt genes could potentially be views it, however, it seems clear that mt sequences in Plantagi- explained if they had been relocated to the (generally) much higher naceae have been riding a wild rollercoaster, one with multiple mutational environment of the nucleus (9, 11–13). This possibility

Cho et al. PNAS ͉ December 21, 2004 ͉ vol. 101 ͉ no. 51 ͉ 17745 Downloaded by guest on September 24, 2021 can be ruled out because (i) four of the six analyzed mt genes (cox1, of underlying changes in the mt mutation rate. In contrast, most cob, and both SSU and large subunit rDNA) have never been found previously reported cases of (much more modest) synonymous site functionally transferred to nuclear DNA of any eukaryote, whereas variation in plant mt DNA have been postulated to reflect gener- transfer of the other two in plants is extremely rare (24); (ii) this ation time effects (13–15), although this interpretation has been scenario would not explain the subsequent major rate decrease(s) questioned (41). inferred in subgenus Plantago without invoking unprecedented The overall pattern and magnitude of mutation rate change in reverse transfer of genes back to the mitochondrion or major Plantago—a series of relatively recent, major, and reversible decreases in nuclear substitution rates; (iii) probes for two genes changes specifically in the mt mutation rate—is reminiscent of the (cob and cox1) have been shown (25) to hybridize preferentially to well established literature on bacterial mutators and antimutators P. rugelii mt DNA relative to total DNA; (iv) cDNA sequences from (42, 43). The most common mutators in bacteria are in genes for these two P. rugelii genes undergo mt-characteristic (40) C3U DNA replication and DNA mismatch repair, and the limited editing [at one and two sites, respectively (25)]; and (v) both genes number of mt mutators characterized in lab strains of yeast (44, 45) lack any obvious mt targeting sequences within 300 nt upstream of also fall into these same categories (note that in plants, as in yeast, their presumptive start codons. The recovery of RNA-edited cDNA all of the genes responsible for mt DNA replication and repair are sequences also indicates that these two genes are probably func- nuclear genes). Mutations causing error-prone replication and͞or tional. Functionality is indicated more generally because all mt defective repair could have led to major increases in the mt protein genes reported in this study have intact reading frames mutation rate at the base of genus Plantago and at subsequent times despite their extreme divergence and because dN͞dS is not close to in Plantago phylogeny, whereas direct reversals or compensatory 1.0 on any Plantago branch (Table 1). suppressions of these mutations could have led to subsequent Another possible explanation for the extreme divergence of slow-downs in the mutation rate in certain descendant Plantago Plantago mt genes is exceptionally high levels of RNA editing, lineages. Although replication and mismatch repair are the most such that only mt genes, per se, but not their encoded cDNAs and commonly found mutators, many other kinds of mutators have also proteins, are evolving extremely rapidly. This explanation can be been characterized in bacteria (42, 43). Major changes in the ͞ ruled out because all three Plantago cDNA sequences deter- production and or detoxification (by superoxide dismutase and mined show only one or two (see above) or no RNA edits, and catalase) of oxygen free-radicals, thought to be a major cause of because Plantago mt genes show no major bias in base compo- damage to mt DNA, could in principle underlie this mutation-rate sition (e.g., all T residues are not converted to C residues, as variation. would be expected if the conventional, low-level C3U editing of plant mitochondria had been extended to all possible sites). We thank Mark Chase (Royal Botanic Gardens) for providing most of Instead, because the dramatic rate variation in Plantago is the Plantago DNAs used in this study, Danny Rice for helpful discussion, Na Zhou (Indiana University, Bloomington) for help generating some of confined to the mt genome (Figs. 1 and 3) and nonsynonymous the Plantago chloroplast sequences, and Greg Young (Indiana Univer- rates are proportionately less elevated than synonymous rates in sity, Bloomington) for help generating two of the atp1 sequences. This Plantago mt genes (Fig. 1, Table 1, and Table 6), we conclude that research was supported by National Institutes of Health Grant R01- rate variation in Plantago mt genes is probably a direct consequence GM-35087 (to J.D.P.)

1. Brown, W. M., George, M., Jr., & Wilson, A. C. (1979) Proc. Natl. Acad. Sci. 23. Davis, J. I., Stevenson, D. W., Petersen, G., Seberg, L., Campbell, L. M., USA 76, 1967–1971. Freudenstein, J. V., Goldman, D. H., Hardy, C., R., Michelangeli, F. A., 2. Shearer, T. L., Van Oppen, M. J. H., Romano, S. L. & Worheide, G. (2002) Simmons, M. P., et al. (2004). Syst. Bot. 29, 467–510. Mol. Ecol. 11, 2475–2487. 24. Adams, K. L., Qiu, Y.-L., Stoutemyer, M. & Palmer, J. D. (2002) Proc. Natl. 3. Pesole, G., Gissi, C., De Chirico, A. & Saccone, C. (1999) J. Mol. Evol. 48, Acad. Sci. USA 99, 9905–9912. 427–434. 25. Cho, Y. (1997). Ph.D. thesis, (Indiana University, Bloomington). 4. Crawford, A. J. (2003) J. Mol. Evol. 57, 636–641. 26. Mower, J. P., Stefanovic´, S., Young, G. J. & Palmer, J. D. (2004) Nature 432, 5. Johnson, K. P., Cruickshank, R. H., Adams, R. J., Smith, V. S., Page, R. D. & 165–166. Clayton, D. H. (2003) Mol. Phylogenet. Evol. 26, 231–242. 27. Wikstro¨m, N., Savolainen, V. & Chase, M. W. (2001) Proc. R. Soc. London Ser. 6. Williams, S. T., Reid, D. G. & Littlewood, D. T. J. (2003) Mol. Phylogenet. Evol. B 268, 2211–2220. 28, 60–86. 28. Sanderson, M. J. (2002) Mol. Biol. Evol. 19, 101–109. 7. Bruns, T. D. & Szaro, T. M. (1992) Mol. Biol. Evol. 9, 836–855. 29. Sanderson, M. J. (2003) Bioinformatics 19, 301–302. 8. Gray, M. W. & Spencer, D. F. (1996) in Evolution of Microbial Life, eds. 30. Yang, Z. (1997) Comput. Appl. Biosci. 13, 555–556. Roberts, D., Sharp, P., Alderson, G. & Collins, M. A. (Cambridge Univ. Press, 31. Angiosperm Phylogeny Group (2003) Bot. J. Linn. Soc. 141, 399–436. Cambridge, U.K.), pp. 109–126. 32. Yang, Z. & Nielson, R. (1998) J. Mol. Evol. 46, 409–418. 9. Wolfe, K. H., Li, W.-H. & Sharp, P. M. (1987) Proc. Natl. Acad. Sci. USA 84, 33. Palmer, J. D., Adams, K. L., Cho, Y., Parkinson, C. L., Qiu, Y.-L. & Song, K. 9054–9058. (2000) Proc. Natl. Acad. Sci. USA 97, 6960–6966. 10. Palmer, J. D. & Herbon, L. A. (1988) J. Mol. Evol. 28, 87–97. 34. Olmstead, R. G., dePamphilis, C. W., Wolfe, A. D., Young, N. D., Elisons, W. J. 11. Gaut, B. S. (1998) Evol. Biol. 30, 93–120. & Reeves, P. A. (2001) Am. J. Bot. 88, 348–361. 12. Muse, S. V. (2000) Plant Mol. Biol. 42, 25–43. 35. Rønsted, N., Chase, M. W., Albach, D. C. & Bello, M. A. (2002) Bot. J. Linn. 13. Laroche, J., Li, P., Maggia, L. & Bousquet, J. (1997) Proc. Natl. Acad. Sci. USA Soc. 139, 94, 5722–5727. 323–338. 14. Eyre-Walker, A. & Gaut, B. S. (1997) Mol. Biol. Evol. 14, 455–460. 36. Bergthorsson, U., Richardson, A. O., Young, G. J., Goertzen, L. R. & Palmer, 15. Laroche, J. & Bousquet, J. (1999) Mol Biol. Evol. 16, 441–451. J. D. (2004) Proc. Natl. Acad. Sci. USA 101, 17747–17752. 16. Whittle, C.-A. & Johnston, M. O. (2002) Mol. Biol. Evol. 19, 938–949. 37. Brown, W. M., Prager, E. M., Wang, A. & Wilson, A. C. (1982) J. Mol. Evol. 17. Duff, R. J. & Nickrent, D. L. (1997) J. Mol. Evol. 45, 631–639. 18, 225–239. 18. Beckert, S., Steinhauser, S., Muhle, H. & Knoop, V. (1999) Plant Syst. Evol. 218, 38. Cann, R. L. & Wilson, A. C. (1983) Genetics 104, 699–711. 179–192. 39. Saitou, R. & Ueda, S. (1994). (1994) Mol. Biol. Evol. 11, 504–512. 19. Vangerow, S., Teerkorn, T. & Knoop, V. (1999) Plant Biol. 1, 235–243. 40. Maier, R. M., Zeltz, P., Ko¨ssel, H., Bonnard, G., Gualberto, J. M. & 20. Chaw, S.-M., Parkinson, C. L., Cheng, Y., Vincent, T. M. & Palmer, J. D. (2000) Grienenberger, J. M. (1996) Plant Mol. Biol. 32, 343–365. Proc. Natl. Acad. Sci. USA 97, 4086–4091. 41. Whittle, C.-A. & Johnston, M. O. (2003) J. Mol. Evol. 56, 223–233. 21. Bowe, L. M., Coat, G. & dePamphilis, C. W. (2000) Proc. Natl. Acad. Sci. USA 42. Miller, J. H. (1996) Annu. Rev. Microbiol. 50, 625–643. 97, 4092–4097. 43. Horst, J.-P., Wu, T. & Marinus, M. G. (1999) Trends Microbiol. 7, 29–36. 22. Barkman, T. J., Lim, S.-H., Mat Salleh, K. & Nais, J. (2004) Proc. Natl. Acad. 44. Foury, F. & Vanderstraeten, S. (1992) EMBO J. 11, 2717–2726. Sci. USA 101, 787–792. 45. Hu, J. P., Vanderstraeten, S. & Foury, F. (1995) Gene 160, 105–110.

17746 ͉ www.pnas.org͞cgi͞doi͞10.1073͞pnas.0408302101 Cho et al. Downloaded by guest on September 24, 2021