bioRxiv preprint doi: https://doi.org/10.1101/405126; this version posted August 31, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Extreme variation in rates of evolution in the plastid Clp protease complex

Alissa M. Williams*1, Giulia Friso2, Klaas J. van Wijk2, Daniel B. Sloan1

1Department of Biology, Graduate Program in Cell and Molecular Biology, Colorado State University
2Section of Biology, School of Integrative Plant Sciences (SIPS), Cornell University, Ithaca, NY 14853
*Corresponding author: Colorado State University, 1878 Campus Delivery, Fort Collins, CO 80523. E-mail: [email protected]

Abstract

Eukaryotic cells represent an intricate collaboration between multiple genomes, even down to the level of multisubunit complexes in mitochondria and plastids. One such complex is the caseinolytic protease (Clp), which plays an essential role in plastid protein turnover. The proteolytic core of Clp comprises subunits from one plastid-encoded gene (clpP1) and multiple nuclear genes. The clpP1 gene is highly conserved across most green plants, but it is by far the fastest evolving plastid-encoded gene in some angiosperms. To better understand these extreme and mysterious patterns of divergence, we investigated the history of clpP1 molecular evolution across green plants by extracting sequences from 988 published plastid genomes. We find that clpP1 has undergone remarkably frequent bouts of accelerated sequence evolution and architectural changes (e.g., loss of introns and RNA-editing sites) within seed plants. Although clpP1 is often assumed to be a pseudogene in such cases, multiple lines of evidence suggest that this is rarely the case. We applied comparative native gel electrophoresis of chloroplast protein complexes followed by protein mass spectrometry in two species within the angiosperm genus Silene, which has highly elevated and heterogeneous rates of clpP1 evolution. We confirmed that clpP1 is expressed as a stable protein and forms oligomeric complexes with the nuclear-encoded Clp subunits, even in one of the most divergent Silene species. Additionally, there is a tight correlation between amino-acid substitution rates in clpP1 and the nuclear-encoded Clp subunits across a broad sampling of angiosperms, suggesting ongoing selection on interactions within this complex.

Introduction

Rates of sequence evolution vary dramatically across genes and genomes. Understanding the forces that determine such variation is one of the defining goals in the field of molecular evolution. In seed plants, the plastid genome (plastome) generally evolves two- to six-fold more slowly than the nuclear genome (Wolfe et al., 1987; Drouin et al., 2008; Smith and Keeling, 2015). However, among angiosperms, there is considerable heterogeneity in the rate of plastome evolution. Many lineages have maintained a slowly-evolving plastome, while others have experienced drastic rate increases (Jansen et al., 2007). For instance, among close relatives within the tribe Sileneae there have been at least three recent and independent increases in plastome evolutionary rate (Erixon and Oxelman, 2008; Sloan, Triant, Forrester, et al., 2014). Similar accelerations have been documented in the (Haberle et al., 2008; Barnard-Kubow et al., 2014; Knox, 2014), Geraniaceae (Guisinger et al., 2008; Weng et al., 2014), (Magee et al., 2010), and Poaceae (Guisinger et al., 2010), among other taxa. At a structural level, plastome gene order has largely been conserved, with most angiosperms retaining the structural organization that was present in the most recent common ancestor of this group (Raubeson and Jansen, 2005). However, the sporadic increases in rates of plastome sequence evolution have often been accompanied by structural changes, including indels, inversions, duplications, shifts in inverted-repeat boundaries, and gene and intron loss (Jansen et al., 2007; Guisinger et al., 2010; Weng et al., 2014; Sloan, Triant, Forrester, et al., 2014).

Interestingly, increases in plastome evolutionary rate are often not genome-wide; rather, these increases tend to affect a subset of genes (Guisinger et al., 2008; Sloan, Triant, Forrester, et al., 2014).
Commonly affected genes include those encoding the essential chloroplast factors Ycf1 and Ycf2 (Sloan, Alverson, Wu, et al., 2012), RNA polymerase subunits (Blazier et al., 2016), ribosomal proteins (Guisinger et al., 2008), and the AccD subunit of the acetyl-CoA carboxylase enzyme complex (Rockenbach et al., 2016). Some of the most striking and extreme accelerations are found in clpP1, which encodes a core subunit of the plastid caseinolytic protease (Clp) (Erixon and Oxelman, 2008; Barnard-Kubow et al., 2014; Zhang et al., 2014; Williams et al., 2015).

The plastid Clp complex plays an important role in maintaining homeostasis in plant cells by stabilizing the plastid proteome. It is the most abundant stromal protease in developing chloroplasts and has been shown to degrade various chloroplast proteins, e.g., the cytochrome b6f complex, a copper transporter, glutamyl-tRNA reductase, and phytoene synthase (Majeran et al., 2000; Nishimura and van Wijk, 2015; Apitz et al., 2016; Nishimura et al., 2017; Welsch et al., 2018). Consistent with the significance of the Clp complex, plastid-encoded ClpP1 as well as most nuclear-encoded subunits are essential for plant growth and viability (Kuroda and Maliga, 2003; Clarke et al., 2005; Zheng et al., 2006; Koussevitzky et al., 2007; Kim et al., 2009; Nishimura and van Wijk, 2015).

The proteolytic core of the Clp complex is composed of two stacked heptameric rings (Yu and Houry, 2007; Nishimura and van Wijk, 2015). In most bacteria, including E. coli, these 14 subunits are identical, encoded by a single gene (clpP). The plastid Clp core is also composed of two heptameric rings, but there have been multiple duplications of the clpP gene throughout cyanobacterial and plastid evolution such that the rings now comprise numerous paralogous subunits (Peltier et al., 2001; Majeran et al., 2005; Olinares, Kim, Davis, et al., 2011; Nishimura and van Wijk, 2015). In green plants, only clpP1 is retained in the plastome, and the other paralogs are found in the nucleus. In Arabidopsis thaliana, there are a total of eight nuclear paralogs, and subunit composition differs between the two core rings. The “R ring” contains three copies of the plastid-encoded ClpP1 subunit and one of each of four “ClpR” subunits (ClpR1-4), while the “P ring” contains four different nuclear-encoded “ClpP” subunits (ClpP3-6) in a 1:2:3:1 ratio (Olinares, Kim and van Wijk, 2011). Catalytic activity in Clp subunits is conferred by an amino-acid triad (Ser 101, His 126, Asp 176).
The distinguishing feature between ClpP and ClpR subunits is that the latter have each lost at least one catalytic residue and are thus assumed to be non-proteolytic; ClpR subunits are also found in the cyanobacterium Synechococcus elongatus and the apicoplast of the parasite Plasmodium falciparum (Schelin et al., 2002; Peltier et al., 2004; El Bakkouri et al., 2013). Therefore, ClpP1 (the sole plastid-encoded subunit) appears to be the only catalytic member of the R ring. In addition to the core, the plastid Clp complex also contains several chaperone, accessory, and adaptor subunits required for proper Clp function, all of which are nuclear-encoded (Nishimura and van Wijk, 2015).

The importance of the plastid Clp system to cellular function makes accelerations in clpP1 evolutionary rate particularly surprising. Various mechanisms have been hypothesized to explain such cases of rapid plastid gene evolution (Erixon and Oxelman, 2008; Guisinger et al., 2008; Magee et al., 2010; Williams et al., 2015; Blazier et al., 2016), but the relative contributions of adaptive evolution, relaxed selection, outright pseudogenization, and increased mutation rates remain unclear. Because Clp subunits are encoded by two different genomes, cytonuclear interactions are integral to the functioning of this complex. There is growing evidence that accelerated organelle genome evolution can lead to correlated increases in rates of evolution in interacting nuclear-encoded proteins. Such effects have been detected in the plastid Clp complex (Rockenbach et al., 2016), as well as plastid and mitochondrial ribosomes (Sloan, Triant, Forrester, et al., 2014; Weng et al., 2016), the plastid-encoded RNA polymerase (Zhang et al., 2015), mitochondrial oxidative phosphorylation (OXPHOS) complexes (Havird et al., 2015; Li et al., 2017; Yan et al., 2018), and DNA replication and repair machinery that directly interacts with plastid and mitochondrial genomes (Zhang et al., 2016; Havird et al., 2017). In plants, however, such studies have been limited to close relatives within just two groups (Geraniaceae and Silene), and these effects have not been examined at deeper timescales across angiosperms.

Here, to better understand the context and scope of plastid clpP1 acceleration, we provide a detailed accounting of the molecular evolutionary history of clpP1 across the entire green plant lineage. Further, we use proteomic techniques to assess the functional status of clpP1 in two species within the genus Silene; this genus has some of the highest and most variable observed rates of divergence for this gene. Finally, we determine whether coevolution between plastid- and nuclear-encoded subunits of the Clp complex is a broad and repeatable pattern across angiosperm diversity.
Results

ClpP1 has undergone numerous and extreme accelerations in rates of amino-acid substitution across independent lineages of green plants

Massive acceleration in ClpP1 amino-acid substitution rate has occurred multiple times across green plants (Figure 1, Figure S1). These accelerations are most pronounced in seed plants, with striking examples found in conifers, gnetophytes, and numerous angiosperm lineages. Therefore, previous reports of divergent ClpP1 sequences in specific lineages (Erixon and Oxelman, 2008; Barnard-Kubow et al., 2014; Zhang et al., 2014; Williams et al., 2015) appear to be the result of an incredibly frequent occurrence of accelerations over the course of seed plant evolution.

The cases of ClpP1 rate acceleration in seed plants are especially remarkable when considered against the high level of conservation in many related land plant lineages. All sampled bryophytes, lycophytes, and ferns have between 82 and 100% ClpP1 amino-acid sequence identity with the inferred common ancestor of land plants. Many seed plants show similarly high levels of conservation, but this same measure of sequence identity ranges anywhere from 27% to 87% in gymnosperms, and anywhere from 32% to 80% in angiosperms. These wide ranges correspond to major increases in amino-acid substitution rate. For instance, since the split at the base of the land plant phylogeny ca. 490 Mya (Morris et al., 2018), a representative liverwort (Marchantia polymorpha) and a representative hornwort (Anthoceros angustus) have exhibited a total rate of 0.1 amino-acid substitutions per site per billion years. At the other extreme, the angiosperms Silene noctiflora and S. conica diverged only about 5 million years ago (Rautenberg et al., 2012) and have since exhibited a rate of 335 amino-acid substitutions per site per billion years – a >3000-fold increase relative to the rate of divergence between liverworts and hornworts.

To assess whether the observed rate variation in ClpP1 was the result of a plastome-wide effect, we repeated our analyses on a representative photosynthetic protein (PsaA) and found considerably less rate heterogeneity (Figure 1). This result is in line with the widespread conservation that is characteristic of plastid genes and especially those involved in photosynthesis (Guisinger et al., 2008). All sampled species of land plants, including seed plants, have between 90 and 98% identity in PsaA amino-acid sequence relative to the inferred common ancestor.
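The ClpP1 fold change quoted above for Silene relative to the liverwort-hornwort comparison can be checked with simple arithmetic. The sketch below is only an illustration: the function name `fold_change` is hypothetical, and the inputs are the rounded rates quoted in the text (0.1 and 335 substitutions per site per billion years), so the result is approximate rather than a reanalysis of the underlying trees.

```python
def fold_change(accelerated_rate, baseline_rate):
    """Ratio of two substitution rates expressed in the same units."""
    if baseline_rate <= 0:
        raise ValueError("baseline rate must be positive")
    return accelerated_rate / baseline_rate


# Rounded ClpP1 rates quoted in the text
# (amino-acid substitutions per site per billion years).
bryophyte_rate = 0.1   # Marchantia polymorpha vs. Anthoceros angustus
silene_rate = 335.0    # Silene noctiflora vs. S. conica

clpp1_fold = fold_change(silene_rate, bryophyte_rate)
print(f"ClpP1 fold increase: {clpp1_fold:.0f}")
```

With these rounded inputs the ratio exceeds 3000, consistent with the ">3000-fold" statement in the text.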
Further, any rate increases in PsaA have been much smaller than those in ClpP1 (Newick tree files provided at https://github.com/alissawilliams/clpP1_2018). Using the same comparisons as above between bryophytes and Silene, there has been a rate of 0.22 amino-acid substitutions per site per billion years between Marchantia polymorpha and Anthoceros angustus, whereas this rate is 2.9 amino-acid substitutions per site per billion years between Silene noctiflora and S. conica. Therefore, the rate has increased roughly 10-fold in Silene relative to the liverwort-hornwort comparison, but it is not nearly the 3000-fold increase we observe in ClpP1.

Although the most dramatic cases of ClpP1 accelerations were found in seed plants, there was also rate heterogeneity among lineages of green algae, albeit at much deeper timescales than within seed plants. The most extreme examples of ClpP1 divergence in algae are found in lineages such as Chlorokybus, Mesostigma, Stichococcus, and Chlorophyceae. The rapid sequence evolution within Chlorophyceae is perhaps not surprising because large insertions in ClpP1 have been previously identified in this group (Huang et al., 1994; Majeran et al., 2005; Derrien et al., 2012).

ClpP1 acceleration is correlated with changes in structure and gene architecture

Our analysis demonstrated that accelerated rates of ClpP1 amino-acid substitution are also associated with broader changes at a structural level. For example, we confirmed that species with clpP1 duplications within the plastome (Figure S2) generally have high rates of amino-acid substitution (Erixon and Oxelman, 2008; Park et al., 2017). Angiosperm examples include Silene chalcedonica (=Lychnis chalcedonica), Carex siderosticta, and multiple lineages within the Geraniaceae. By sampling a subset of angiosperm species, we also observed that high substitution rates tend to be associated with indels in clpP1 (Figure S3). This relationship was not quite statistically significant (Spearman’s rho = 0.36, p = 0.08), but that may reflect limited power resulting from our conservative approach to identifying indel events in the gappy clpP1 sequence alignment.

Accelerated clpP1 evolution is also associated with intron loss in many lineages (Figure S4). Most land plants share two clpP1 introns, which appear to have been gained in series during the evolution of streptophytes. Intron 1 was likely gained in a common ancestor of Charophyceae, Coleochaetophyceae, Zygnemophyceae, and land plants based on its presence in Chara vulgaris, Chaetosphaeridium globosum, and Mesotaenium endlicherianum. Intron 2 appears to have been acquired more recently in a common ancestor of Zygnemophyceae and land plants because the only algal species in which it was identified was M. endlicherianum, which is consistent with the conclusion that Zygnemophyceae is the sister lineage to all land plants (Wickett et al., 2014). The only other clpP1 intron identified in our sample appears to be an independent acquisition in the streptophyte Klebsormidium flaccidum at a unique position within the gene. This inferred history of intron gains implies multiple secondary losses within green algae because multiple algal species within these groups lack one or both of the introns. Strikingly, we identified at least 31 independent losses of one or both introns in land plants (Figure 2), which are strongly associated with accelerated rates of ClpP1 evolution, including in multiple Silene species, the genus Oenothera, and Plantago maritima.

In land plant mitochondria and plastids, there is frequent C-to-U RNA editing (Freyer et al., 1997; Tsudzuki et al., 2001), and there is evidence that accelerated sequence evolution can also be associated with loss of editing sites (Parkinson et al., 2005; Sloan et al., 2010; Zhu et al., 2014; Guo et al., 2016). In Arabidopsis thaliana, there is a single RNA editing site in clpP1 at codon 187, where the codon CAU (His) is edited to UAU (Tyr) (Tillich et al., 2005). This edited C has been replaced in most accelerated angiosperm species, usually in favor of “hard-coding” a T (U) at this position (Figure S5). In contrast, the C is maintained in most non-accelerated lineages (though there are exceptions such as the loss of this site at the base of the (Hein and Knoop, 2018) well before any rate accelerations in that group). We also examined a second site in codon 28, which also undergoes CAU (His) to UAU (Tyr) editing in angiosperms (Hein and Knoop, 2018). This editing site was replaced with a hard-coded T at the base of the core but appears to be retained in other major angiosperm groups, where the same trend toward hard-coding in accelerated species is evident (Figure S6).

clpP1 loss or pseudogenization events in green plants

To assess whether the functional inactivation (i.e., pseudogenization) of clpP1 may contribute to cases of extreme rate accelerations, we looked for evidence of potential pseudogenes and outright gene loss across green plants. Beyond the obvious loss of clpP1 in the non-photosynthetic genera Rafflesia and Polytomella, which lack any detectable plastomes whatsoever (Molina et al., 2014; Smith and Lee, 2014), we have identified an additional seven species that appear to lack clpP1 in their plastomes (Table 1). Of these seven lineages, which encompass both green algae and angiosperms, five are holoparasitic or mycoheterotrophic.
The loss of photosynthesis in holoparasites and mycoheterotrophs is typically associated with radical changes in the plastome (Krause, 2008; Bromham et al., 2013). Although clpP1 is often retained in these species despite massive loss of the genes that encode photosynthetic machinery (Wolfe et al., 1992; Delannoy et al., 2011), our results indicate that non-photosynthetic genes such as clpP1 also face increased probability of loss during the evolution of holoparasitism.

In addition to cases of outright loss, there are species in which clpP1 is still present but may be a pseudogene. For instance, the reported sequence in the angiosperm Epipremnum aureum is radically altered as part of a partial tandem duplication, and there are other angiosperms in which the first exon has been lost (Table 1). There are also lineages in which clpP1 contains an internal stop codon. However, these cases most likely reflect posttranscriptional modifications and changes in translation rather than actual pseudogenes. The internal stop codon in the hornwort Anthoceros angustus has been shown to be removed by U-to-C RNA editing (Kugita et al., 2003), and it is possible that a similar mechanism removes internal stop codons in the hornwort Nothoceros aenigmaticus and the fern Diplopterygium glaucum – lineages in which U-to-C plastid RNA editing is prevalent (Duff and Moore, 2005; Knie et al., 2016). In addition, the copy of clpP1 in the chlorophyte Jenufa minuta contains multiple (UGA) stop codons, but these are found at positions normally encoding conserved Trp residues in numerous genes within this plastome (Lemieux et al., 2015), suggesting that J. minuta has undergone a change in the plastid genetic code in which the UGA codon has been reassigned to encode Trp.

ClpP1 is still expressed and likely assembles with nuclear-encoded Clp subunits in Silene species that exhibit extreme heterogeneity in rates of ClpP1 sequence evolution

To further assess whether observed increases in rates of ClpP1 sequence evolution reflect a loss of functionality, we took advantage of the rate heterogeneity within the angiosperm genus Silene. We selected S. latifolia as a species that has retained a highly conserved copy of ClpP1 and S. noctiflora as a close relative with a recent and extreme rate acceleration that has resulted in one of the most divergent copies of ClpP1 in our dataset, including the substitution of both histidine and aspartate in its catalytic triad (Figure 1).
We isolated intact chloroplasts and separated the native soluble stromal complexes by native gel electrophoresis (LB-Native-PAGE), which we have applied in the past to determine the assembly state of the ClpPR core and other stromal complexes in Arabidopsis (Peltier et al., 2006; Olinares, Kim, Davis, et al., 2011). Based on tandem mass spectrometry (MS/MS) of size-fractionated samples, over 97% of all adjusted spectral counts (AdjSPC) matched annotated plastid proteins, with 1.3 and 0.9% of the AdjSPC matching to the Clp protein family in S. noctiflora and S. latifolia, respectively. We identified all predicted ClpPRT proteins (i.e., those deduced from the Silene species sequence data), including ClpP1 (Figure 3; Supplemental Dataset 1). Further, we detected ClpP1 in the same size fractions as the ClpR subunits in both species, which is consistent with the expectation of a mass of ca. 200 kDa for the “R-ring” (Olinares, Kim, Davis, et al., 2011), providing indirect evidence that ClpP1 still assembles as part of the ring structure that makes up the proteolytic core of the plastid Clp complex (Figure 3; Figure S7).

Signatures of selection and rate variation across ClpP1

In species with accelerated ClpP1 sequence evolution, substitutions were widely distributed across the length of the protein (Figure 4a), but individual sites varied substantially in their degree of conservation (Figure 4b). Only a single residue (Gly at position 110) was invariant across all sampled green plants. As expected, residues in the catalytic triad were broadly conserved, though Asp 176 has been lost in over 20 species (Figure S8). Substitutions at Ser 101 and His 126 were less common but still observed in 5 and 6 species, respectively. Notably, some species have experienced substitutions at multiple sites in the catalytic triad; for example, Plantago maritima and Vaccinium macrocarpon have both lost all three residues.

We partitioned the ClpP1 sequence into major functional domains and applied maximum-likelihood models to compare rates of amino-acid substitution across these partitions. We found that rates differed significantly among regions (χ² = 337.9; d.f. = 3; P << 0.001), with the highest rates observed in the predicted “handle” domain (Figure 4a), which is likely involved in physical interactions between the two heptameric rings that make up the tetradecameric proteolytic core of the Clp complex (Yu and Houry, 2007). We also observed that the beginning of the handle domain appeared to be a hotspot for structural variants, including large insertions in Carex, Eustrephus, Silene, Taxus, and Viviania (alignment available at https://github.com/alissawilliams/clpP1_2018).
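To give a sense of scale for the "P << 0.001" reported above for rate differences among ClpP1 regions, the following sketch converts the reported test statistic into an upper-tail probability. It assumes the statistic is referred to a chi-square distribution with 3 degrees of freedom, as is usual for a likelihood-ratio comparison; the text does not specify the software used, and the function name `chi2_sf_df3` is hypothetical. The closed form used here is specific to 3 degrees of freedom.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 3 d.f.

    Uses the closed form available for 3 degrees of freedom:
    erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2).
    """
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    tail = math.erfc(math.sqrt(x / 2.0))
    correction = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    return tail + correction


# Test statistic reported in the text for the comparison among ClpP1 regions.
reported_statistic = 337.9
p_value = chi2_sf_df3(reported_statistic)
print(f"P = {p_value:.2e}")
```

Under that assumption the probability is many orders of magnitude below 0.001, so the conclusion does not depend on the exact threshold chosen.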
To take advantage of independent acceleration events and avoid the saturation expected when examining deep splits in the green plant phylogeny, our further analyses focused on a broad sample of 25 angiosperm species representing a wide range of rate variation. We ran a PAML branch-site model on this sample, using ClpP1-accelerated lineages as the foreground, to determine whether there are signatures of positive selection on specific codons across multiple accelerated species. This analysis identified 32 amino-acid residues with greater than 95 percent probability of being under positive selection. Based on the E. coli ClpP structure, these 32 sites are scattered across the protein in no obvious spatial pattern (Figure 5).

We also calculated gene-wide average ratios of nonsynonymous to synonymous substitution rates (dN/dS) for clpP1 and psaA on each branch in our angiosperm phylogeny. Across both accelerated and non-accelerated species, clpP1 dN/dS ratios tend to be greater than the corresponding psaA ratios (Figure 6). As expected, the clpP1 dN/dS ratios are higher in species that were identified as having highly divergent ClpP1 protein sequences. However, we found that some slowly evolving species (Solanum lycopersicum, Cucumis sativus, and Vitis vinifera) have dN/dS ratios more characteristic of accelerated species despite much lower overall levels of ClpP1 protein sequence divergence. We obtained gene-wide dN/dS ratios greater than 1 for clpP1 in several species (Table S1), which could indicate extreme positive selection in these branches, but none of these values were significantly greater than 1 (uncorrected p-values all greater than 0.05). We found that variation in dN/dS is not as large as variation in overall rates of ClpP1 protein sequence evolution, reflecting the fact that accelerated species have also experienced increased synonymous substitution rates, though not nearly to the extent of the increase in nonsynonymous substitution rates (Figure 6).

ClpP1 accelerations are highly correlated with accelerations in nuclear-encoded Clp genes

Across our sample of 25 angiosperms, the amino-acid substitution rate in ClpP1 is strongly associated with the amino-acid substitution rate of the nuclear-encoded core Clp subunits (Figure 7a-c). Notably, the mirroring effect between the ClpP1 and nuclear-encoded Clp branch lengths does not occur in a random sample of nuclear-encoded proteins, indicating that the correlated accelerations are not due to genome-wide rates of evolution. After using branch lengths from random genes to account for background rate variation, the vast majority of the variation in branch length for nuclear-encoded Clp subunits can be explained by the ClpP1 branch length for a particular lineage (R² = 0.88, P << 0.001) (Figure 7d).

Discussion

Can pseudogenization explain massive accelerations in rates of clpP1 evolution?

Our analysis shows that accelerated clpP1 evolution has occurred frequently and independently across green plants, particularly among seed plants. Accelerations in clpP1 are thus a striking feature of seed plant evolution, especially given that clpP1 is highly conserved in a majority of plant species. Pseudogenization has often been hypothesized as an explanation for extreme clpP1 divergence (Hirao et al., 2008; Zhang et al., 2014; Williams et al., 2015), and in many other plastome sequencing projects, clpP1 gene sequences have gone completely unrecognized and unannotated because of their extreme divergence (Haberle et al., 2008; Straub et al., 2011; Fajardo et al., 2013; Yao et al., 2015). There has also been speculation in these cases that clpP1 has been functionally transferred to the nucleus, as intracellular gene transfer is a common and ongoing phenomenon in plants (Millen et al., 2001; Adams, Qiu, et al., 2002). In the highly reorganized Boodlea composita plastome (Cortona et al., 2017), clpP1 appears to have been lost and possibly transferred to the nucleus, though we find that distinguishing clpP1 from its nuclear paralogs is difficult in this case due to deep sequence divergence. Thus, we are not aware of any clear examples of plastid clpP1 transfer to the nucleus in green plants.

Although we have identified probable cases of clpP1 loss or pseudogenization within the plastome in a relatively small number of species (Table 1), it is unlikely that these processes represent a general explanation for the repeated and widespread pattern of substitution-rate acceleration in green plants. In the vast majority of our sampled species, clpP1 reading frames have remained intact, even in species with extreme rates of indels and nucleotide substitutions. In the absence of functional constraints, such rapid change would quickly introduce internal stop codons and frameshifts, which suggests that there is still selection in these species to retain a functional gene copy.

Previous work has provided some evidence that even highly divergent clpP1 genes may still be functional. For instance, the divergent copies of clpP1 in Acacia ligulata and Campanulastrum americanum are still transcribed and spliced (Barnard-Kubow et al., 2014; Williams et al., 2015).
In this study, we examined the plastid proteome of Silene noctiflora, a species with one of the most divergent known copies of clpP1, and found the first evidence that such divergent genes can still be expressed at the protein level and co-assemble with other Clp subunits. These results suggest that ClpP1 in S. noctiflora is still an important part of the core Clp structure, despite the fact that it has lost two members of its catalytic triad. If the subunit composition of the R ring is indeed conserved, there are potentially important (but currently unknown) functional consequences of catalytic triad loss in the only catalytic member of this ring, which may affect overall Clp catalytic activity and/or complex structure (Andersson et al., 2009; Zeiler et al., 2013). We note that inactivation of the catalytic site in Arabidopsis thaliana ClpP3 has no phenotypic effect, whereas complete loss of ClpP3 results in a severe phenotype. In contrast, in the case of ClpP5, loss of the catalytic site or complete gene loss results in embryo lethality (Kim et al., 2013; Liao et al., submitted).

Role of mutation rate vs. selection in clpP1 accelerations

Another explanation for extreme divergence in clpP1 (and other plastid genes) could be an increase in the underlying mutation rate (Park et al., 2017). However, there are apparent difficulties with interpretations based solely on changes in mutation rate. While we would generally expect an increase in mutation rate to affect the entire plastome, it is clear from whole-plastome sequencing efforts that clpP1 acceleration often occurs with little or no change in rates for a large fraction of the plastome (Guisinger et al., 2008; Sloan, Triant, Forrester, et al., 2014; Williams et al., 2015). The locus-specific nature of clpP1 accelerations was supported by our analysis of psaA substitution rates across green plants, which were consistently low, even in species with extremely divergent clpP1 sequences (Figure 1). Thus, if clpP1 acceleration is caused by an increase in mutation rate, it would require a highly localized effect within the plastome. While such an effect is more difficult to explain than a genome-wide increase in mutation rate, “localized hypermutation” has been suggested previously as the cause of high divergence in the plastid gene ycf4 (Magee et al., 2010) and may be associated with the presence of short, repetitive sequences (Stoike and Sears, 1998). There are also several documented cases of extreme variation in synonymous substitution rates across individual mitochondrial genomes, demonstrating that mutation rates do likely vary within organelle genomes in some plants (Sloan et al., 2009; Zhu et al., 2014).

One possible mechanism for localized hypermutation is “mutagenic retroprocessing” (Parkinson et al., 2005; Park et al., 2017), which occurs when a mature transcript recombines back into the genome after reverse transcription. In this scenario, accelerated substitution rates could be explained by the relatively high error rates of reverse transcriptases (Preston, 1996; Sabot and Schulman, 2006) and/or RNA polymerases (Traverse and Ochman, 2016). Retroprocessing would be expected to affect exons and introns differently. If the recombination event involves the entire gene, introns would be lost because they are not included in mature transcripts. If the recombination event involves only part of a mature transcript, that portion would necessarily be an exon, meaning that the rate acceleration would be limited to exonic regions. Both of these predictions have empirical support. Species with accelerated clpP1 sequences often lack clpP1 introns (Figure S4) (Erixon and Oxelman, 2008; Park et al., 2017). Among the accelerated species that do retain their clpP1 introns, rates of sequence evolution are much higher in exons than in introns (Erixon and Oxelman, 2008; Barnard-Kubow et al., 2014). Despite these observations, it is not clear why clpP1 would specifically or preferentially undergo mutagenic retroprocessing in the plastome, particularly because most plastid genes have high transcription rates and many are transcribed at higher rates than clpP1 (Mullet, 1993; Sanitá Lima and Smith, 2017).

Another difficulty with an explanation based solely on mutation is that, if clpP1 has only been subject to an increased mutation rate, we would not necessarily expect an increase in dN/dS.
The dN/dS statistic is typically interpreted as a measure of selection pressure, so low values are expected for genes under purifying selection, even if the mutation rate is high. Indeed, previous work has found that increased mutation pressure can even be associated with decreased dN/dS values in genes that remain under strong purifying selection (Wolf et al., 2009; Havird and Sloan, 2016). In contrast, we found a trend of increased dN/dS values in species with fast rates of clpP1 evolution (Figure 6). While this result may not be surprising given that amino acid substitution rates and dN/dS values are interconnected, it does indicate that the rates of nonsynonymous substitutions in these lineages have increased disproportionately relative to synonymous substitutions. This result suggests that clpP1 acceleration is due, at least in part, to a change in selective pressures on protein sequence rather than simply an increase in mutation rate. This line of argument provides an alternative interpretation for the aforementioned observation that introns do not exhibit a similar degree of accelerated sequence evolution (Erixon and Oxelman, 2008; Barnard-Kubow et al., 2014). Importantly, any interpretation of dN/dS ratios comes with the caveat that they can be overestimated and model-dependent, especially in cases where there are multinucleotide mutations and/or indels in the gene of interest (Li et al., 2009; Stoletzki and Eyre-Walker, 2011; De Maio et al., 2013; Venkat et al., 2017).

Why might selection pressures on clpP1 have changed?

While localized increases in mutation rate could be involved in clpP1 accelerations, it is unlikely that mutation rates are the only contributing factor for the reasons described above.
Rather, it is likely that changes in selection are involved, and increases in dN/dS are typically caused by relaxed selection, positive selection, or some combination of the two. One possible explanation is that an increased mutation rate has itself altered selection pressures. This mechanism has been previously hypothesized in legumes, where the plastid gene ycf4 has undergone increases in evolutionary rate in several species potentially as a result of localized hypermutation (Magee et al., 2010). Because high mutation rates can lead to an accumulation of deleterious mutations, there may be selection for affected genes to “escape” this mutation pressure by being functionally transferred to the nucleus (Blanchard and Lynch, 2000; Magee et al., 2010). If functional replacement does occur, the plastid-encoded gene would no longer be needed for its original function and would thus experience relaxed selection. In this scenario, the decrease in functional constraint would occur due to gene transfer/replacement, which was initially driven by increased mutation pressure (Magee et al., 2010).

Relaxed selection can also occur without a change in underlying mutation rate. For instance, it is possible that functional constraint on the entire Clp complex could be reduced if it simply becomes less important to cellular functioning. Such a decrease in functional constraint could conceivably occur if some of the many other plastid proteases take precedence (Nishimura et al., 2017). Alternatively, functional constraint may be reduced specifically on clpP1 (as opposed to the entire complex). As described above, a likely cause of relaxed selection (or outright pseudogenization) would be replacement in the Clp complex by a nuclear-transferred copy of clpP1 or one of the existing nuclear-encoded subunits. Such transfers/replacements frequently occur for other genes in plant organelles even in the absence of increased mutational pressure (Millen et al., 2001; Adams, Qiu, et al., 2002; Adams, Daley, et al., 2002).
However, given that we did not find evidence for widespread clpP1 pseudogenization and/or gene replacement and that the highly divergent ClpP1 subunit in S. noctiflora still appears to be associated with other plastid Clp core subunits, it is unlikely that functional replacement of ClpP1 in the Clp complex has broadly occurred in accelerated lineages.

The other form of selection that could be involved in clpP1 rate accelerations and increases in dN/dS is positive selection. Under positive selection, there is selection for change, which can lead to a pattern of accelerated protein sequence evolution that is superficially similar to the one observed under relaxed selection. Previous work has found evidence of positive selection in both nuclear- and plastid-encoded Clp core subunits (Erixon and Oxelman, 2008; Rockenbach et al., 2016). Often, positive selection is assumed to reflect an adaptation for a novel function or a response to an environmental change. While there is no obvious shared background environment or biological feature among clpP1-accelerated lineages, our understanding of Clp function (including the identities of many of its target substrates) remains incomplete, so adaptive change of the whole complex is still a viable hypothesis. Further, multiple bacterial lineages including Bacillus thuringiensis and cyanobacteria have undergone major Clp complex reorganizations as the result of core subunit duplication and diversification, suggesting selection to deviate from the conserved ancestral functions of Clp (Fedhila et al., 2002; Stanne et al., 2007).

Positive selection could also be related to the intimate interactions between the plastid Clp subunits encoded by different genomes. Using an evolutionary rate correlation analysis, we have shown that accelerations in clpP1 were paralleled by similar accelerations in nuclear-encoded Clp genes across a broad range of angiosperms (Figure 7). Our analysis represents a general class of computational methods that detect correlated rate changes between residues, genes, and/or complexes across a phylogeny as a means to identify genes that share a functional relationship and may be coevolving or at least responding to the same selection pressures (Dutheil and Galtier, 2007; Yeang and Haussler, 2007; Clark and Aquadro, 2010; Juan et al., 2013). Therefore, the fact that clpP1 and the other core plastid Clp subunits had a significant rate correlation demonstrates that the plastid- and nuclear-encoded Clp subunits are subject to shared variation in selection pressures. This result could be simply due to selection acting on the complex as a whole (as described above), or it could indicate that clpP1 and its nuclear-encoded counterparts are coevolving because of their direct interactions within the complex. Thus, coevolution between Clp subunits could be a driver of clpP1 acceleration.
A change in one subunit could introduce pressure on the other subunits to change in response, and these subsequent changes could drive further change, creating a chain reaction. Regardless of the initial trigger, this mechanism could explain both previous observations of positive selection on Clp subunits and the high correlation between their evolutionary rates. There has been speculation that such a positive feedback loop in the plastid Clp could be due to antagonistic interactions between the plastid and nuclear genomes (Rockenbach et al., 2016), and recent studies have implicated other plastid loci in selfish interactions with the nucleus (Bogdanova et al., 2015; Sobanski et al., 2018), but direct evidence for this or any other trigger for coevolutionary change is currently lacking.
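
As an illustration of the rate-correlation logic discussed above, the following minimal Python sketch normalizes branch lengths by a control set, log-transforms them, and computes a Pearson correlation coefficient, following the steps given in the Experimental Procedures. It is only a sketch and not the scripts used for the reported analysis (the correlation itself was calculated in R); the function names and all branch-length values are hypothetical placeholders chosen so that the example runs.

```python
import math


def normalized_log_rates(target_lengths, control_lengths):
    """Divide each branch length by its control branch length, then take the log."""
    return [math.log(t / c) for t, c in zip(target_lengths, control_lengths)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical terminal branch lengths for four species (not data from this study).
clpp1 = [0.02, 0.40, 0.05, 1.20]           # plastid-encoded ClpP1
nuclear_clp = [0.05, 0.30, 0.08, 0.90]     # concatenated nuclear-encoded Clp subunits
control_half_a = [0.04, 0.05, 0.06, 0.05]  # one half of the non-Clp control genes
control_half_b = [0.05, 0.04, 0.05, 0.06]  # the other half of the control genes

# Each variable is normalized by a different control half.
x = normalized_log_rates(clpp1, control_half_a)
y = normalized_log_rates(nuclear_clp, control_half_b)
r = pearson_r(x, y)
print(round(r, 3))
```

Normalizing the two variables by different halves of the control genes, as in the sketch, avoids a shared denominator that could by itself induce a correlation.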

In summary, our analysis has characterized the remarkable extent and repeatability of clpP1 acceleration, provided evidence of retained functionality at the protein level even for one of the most extreme cases of clpP1 divergence, and revealed an exceptionally strong, angiosperm-wide rate correlation within this complex. However, there is much work to be done, as the specific causes and mechanisms driving rapid clpP1 evolution remain uncertain.

Experimental Procedures

Extraction and filtering of plastid gene sequences

All 988 complete Viridiplantae plastome sequences available in the NCBI RefSeq collection as of May 2, 2016 were downloaded from GenBank and parsed with a custom BioPerl script to extract the annotated coding sequences and number of exons for clpP1 and psaA. Nucleotide sequences for species missing after this initial step were manually extracted, either via inspection of the GenBank annotation or after a tblastn v2.2.30+ (Gertz et al., 2006) search using the corresponding Arabidopsis thaliana protein sequence as a query. Extracted sequences were then screened with custom Perl scripts to identify missing or internal stop codons and to flag potential annotation errors based on gene-length outliers. Corrections were made to annotations with the aid of NCBI ORFfinder (Rombel et al., 2002), except in cases where internal stop codons were known or inferred to be due to U-to-C RNA editing.

To reduce redundancy in the dataset, only a single sample was chosen from genera that were represented by more than one species, except in cases where substantial variation in ClpP1 sequence was observed among congeners. This down-sampling reduced the clpP1 dataset to 480 species (and 483 sequences because we retained divergent clpP1 copies found within the plastomes of Carex siderosticta and Silene chalcedonica).
We used the same set of species for the psaA analysis, except that it did not include 16 holoparasitic species that have lost psaA along with most or all of their photosynthesis-related genes (Krause, 2008; Bromham et al., 2013) or the lycophyte Selaginella moellendorffii, which exhibits extreme levels of RNA editing (Smith, 2009), making it difficult to estimate rates of protein sequence evolution. The resulting psaA dataset contained 463 sequences.

In the Methods below, we will refer to the set of Viridiplantae species described above as the “large dataset” (Table S2). For some analyses, we used a more targeted sampling of 25 angiosperms, which we will refer to as the “small dataset” (Table S3). The species in the small dataset were selected to 1) span the phylogenetic diversity of angiosperms, 2) capture multiple independent accelerations in plastome evolutionary rate as well as related species with slow evolutionary rates, and 3) only include species for which nuclear genome/transcriptome resources were available (for use in subsequent analyses of cytonuclear coevolution; see below). All scripts, sequence data, and alignments are available at https://github.com/alissawilliams/clpP1_2018.

Sequence alignment and tree construction

To assess variation in rates of ClpP1 and PsaA protein sequence evolution across green plants, the nucleotide sequences of the large dataset were translated into protein sequences in MEGA v7.0.21 (Kumar et al., 2016). These protein sequences were then aligned using the MAFFT v7.222 einsi option (Katoh and Standley, 2013). Constraint trees were manually constructed based on established phylogenetic relationships, using NCBI and the Angiosperm Phylogeny Website v13. Branch lengths were estimated using codeml in the PAML v4.9a package (Yang, 2007) with an LG substitution matrix and rate variation among sites estimated with a gamma distribution. For this analysis, the ClpP1 alignment was trimmed to remove all insertions relative to the 196-aa Nicotiana tabacum reference sequence.

Analysis of substitution rate variation and tests for selection

Variation among sites. To generate site-specific estimates of amino-acid substitution rate across the ClpP1 subunit, we applied a partitioned model in codeml. Using option G, we specified a separate partition for each position in the 196-aa alignment.
The Mgene parameter was set to 0 such that total tree length could vary across partitions (i.e., different sites could have different rates), but branch lengths had to remain proportional. The complexity of this model necessitated that it be run under a simple Poisson model of amino acid substitutions. To assess whether certain regions of the protein exhibited disproportionate accelerations in fast lineages, we performed this analysis on two different subsets of the large dataset. The first was a set of 27 slow “background” lineages sampled from across green plants. The second was a set of 60 angiosperms that contained 38 “accelerated” lineages and 22 interspersed slow lineages that were included to increase the probability of detecting parallel amino-acid substitutions in fast lineages.
Site-specific rate estimates were summarized using a sliding-window analysis with a window size of 21 aa. This rate analysis was performed both on raw tree lengths and on normalized rates that were scaled to the average tree length across the dataset. Site-specific variation was also summarized with WebLogo v3.5.0 (Crooks et al., 2004), using default settings and the trimmed 196-aa alignment of 483 ClpP1 sequences described above.

To further investigate rate variation within ClpP1, we partitioned the subunit into the following four regions based on characterized structural domains in E. coli (Yu and Houry, 2007): 1) the “handle domain”, consisting of positions 129-162; 2) the “head domain”, consisting of positions 32-124 and 165-193; 3) the N-terminal “axial loops”, consisting of positions 7-20 (although the alignment between E. coli and the plastid ClpP1 is weak in this region, so it is not clear whether there is a conserved functional role); and 4) “other” spacer regions, consisting of all remaining positions in the 196-aa alignment. We repeated the codeml analysis described above but used the full 483-sequence alignment and specified these four partitions with option G. As a basis of comparison, we performed the same analysis without any partitions. To test for evidence of significant rate heterogeneity among the four regions, we performed a likelihood ratio test (LRT) that compared these two models with three degrees of freedom.
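
The 21-aa sliding-window summary of site-specific rates described in this section can be sketched as follows. This is a minimal illustration, not the scripts used for the reported analysis; the function name and the per-site rate values are hypothetical, and the sketch assumes that only full windows are averaged, because the handling of the alignment ends is not specified in the text.

```python
def sliding_window_means(rates, window=21):
    """Return the mean of every full window of `window` consecutive sites."""
    if window < 1 or window > len(rates):
        raise ValueError("window must be between 1 and the number of sites")
    running = sum(rates[:window])
    means = [running / window]
    # Slide one site at a time: add the entering site, drop the leaving one.
    for i in range(window, len(rates)):
        running += rates[i] - rates[i - window]
        means.append(running / window)
    return means


# Hypothetical per-site rates for a 196-aa alignment (placeholder values only).
site_rates = [0.1 + 0.01 * (i % 10) for i in range(196)]
window_means = sliding_window_means(site_rates)
print(len(window_means), round(max(window_means), 3))
```

Under the full-window assumption, a 196-position alignment yields 196 - 21 + 1 = 176 window means.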

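The LRTs used in this section compare twice the difference in log-likelihood between two nested models against a chi-square distribution with the stated degrees of freedom. The sketch below shows that calculation using only the Python standard library. The function names and the log-likelihood values are hypothetical and are not taken from the codeml output of this study.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Start from a closed form: Q(1/2, z) = erfc(sqrt(z)) or Q(1, z) = exp(-z).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the LRT statistic and its chi-square p-value for nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods: unpartitioned model vs. four-partition model (df = 3).
stat, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-995.0, df=3)
print(stat, p_value)
```

The same function applies to the one-degree-of-freedom comparisons by passing df=1.
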
Variation among branches. To examine differences in dN/dS across species, we used codeml to determine dN and dS (for both clpP1 and psaA) for each branch of the small dataset. We converted our previously obtained ClpP1 and PsaA amino-acid alignments into codon-based nucleotide alignments. In the codeml run, we used the parameters model=1 and fix_omega=0, which together specify estimation of an individual dN/dS value for each branch in the tree. For terminal branches with a dN/dS estimate > 1, we assessed statistical significance by constraining each branch of interest (separately) to a dN/dS value of 1 (model=2, fix_omega=1). We determined whether the unconstrained PAML model was a significantly better fit to the data than each constrained model by performing LRTs with one degree of freedom.

Branch-site models. Using the codon-based alignment of clpP1 for the small dataset (described above), we used a codeml branch-site model (Zhang et al., 2005) to infer whether any amino acid sites in ClpP1 have been subject to positive selection. We partitioned the species of the small dataset into one of two categories: fast or slow rates of ClpP1 evolution. The “fast” species were specified as the foreground, which means that the analysis identified sites under positive selection in this species subset. This analysis used the parameters model=2, NSsites=2, fix_omega=1, and omega=1 for the null model, and the parameters model=2, NSsites=2, fix_omega=0, and omega=1 for the alternative model. To determine whether the alternative model was a significantly better fit to the data than the null model, we performed an LRT with one degree of freedom. We mapped sites with a Bayes empirical Bayes (BEB) posterior probability of ≥ 0.95 for positive selection based on this analysis to the homologous positions in the E. coli ClpP structure (Protein Data Bank 1YG6) visualized in Chimera v1.11.2 (Pettersen et al., 2004).

Evolutionary rate covariation between the plastid- and nuclear-encoded Clp subunits

To determine whether the rate of amino-acid substitution in ClpP1 is correlated with the rate in nuclear-encoded Clp subunits across angiosperms, we compiled protein sequences of all nine core Clp subunits (ClpP1,3-6, ClpR1-4) and 20 non-Clp nuclear-encoded genes for each species in the small dataset, using a custom Python script to reciprocally blast A. thaliana protein sequences against predicted protein sequences from each of the other species. Predicted protein sequence data were collected from various sources, including sequenced genomes on Phytozome (https://phytozome.jgi.doe.gov/), transcriptomes from the 1KP project (Wickett et al., 2014), and various other transcriptome sequencing projects (Table S3). The 20 non-Clp genes represent the subset of the 50 control genes examined by Rockenbach et al. (2016) for which we could recover orthologs in all species of interest. Protein sequences were aligned with MAFFT (Katoh and Standley, 2013) and trimmed at the N- and C-terminal ends as needed (in cases of poor alignment) to avoid overestimation of branch lengths. In rare cases, internal trimming was required for the same reason. In two cases, nuclear sequences from one species were paired with a ClpP1 sequence from a closely related species (Acacia aulacocarpa was paired with A. ligulata, and Mimulus guttatus was paired with Erythranthe lutea) due to the lack of a published plastome (Table S3).

We estimated branch lengths both for individual proteins and for sets of concatenated amino-acid sequences using codeml with the LG substitution matrix and a gamma distribution.
The concatenated sets of sequences were 1) all nuclear-encoded core Clp proteins, 2) all nuclear-encoded non-Clp proteins, and 3) two randomly divided halves of the 20 nuclear-encoded non-Clp proteins (10 proteins each). We used correlation analysis to compare the branch lengths of ClpP1 and the concatenated nuclear-encoded Clp proteins, each normalized by dividing by the branch length of one set of concatenated non-Clp proteins in that species. By using the two different halves of the dataset for normalization, we avoided introducing statistical non-independence between our variables in the correlation analysis. Only terminal branches were used in the correlation analysis, with the exception of the grasses (Oryza sativa and Sorghum bicolor). In that case, ClpP1 acceleration occurred before the split between the two species; thus, the branch leading to the grasses was used in place of the two terminal branches of those species. We log-transformed the normalized branch lengths for ClpP1 and the concatenation of nuclear-encoded Clp subunits and calculated the Pearson correlation coefficient across branches with R v3.4.1.

Analysis of indels

To determine the relationship between rates of amino-acid substitution and rates of indels in clpP1, we used the codon-based alignment for the small dataset. Indels were coded using the modified complex coding option in SeqState v1.0 (Müller, 2005). The indel data were visualized in Mesquite v3.31 using “Trace Character History → Parsimony Ancestral States.” This visualization plotted indel states on each branch of the constraint tree at each indel site. Using these plots, we counted the number of novel indels on each branch of the tree. In this case, all novel indels occurred on terminal branches. Correlation analysis was performed similarly to the plastid-nuclear Clp correlation described above. Once again, the substitution-based branch lengths for ClpP1 were normalized with one concatenated set of non-Clp sequences.
The branch-specific ClpP1 indel counts were normalized with the branch lengths from the other concatenated set of non-Clp sequences. We tested for a significant association between these normalized rates of ClpP1 sequence and structural evolution, using a Spearman rank correlation analysis in R.

Proteome analysis of Clp core complexes in Silene species

Tissue collection and chloroplast isolation. A total of 60 g of leaf tissue was collected from mature rosettes from four S. noctiflora individuals and from ten S. latifolia individuals. The S. noctiflora individuals were derived from the BRP line previously used for mitochondrial genome sequencing (Wu et al., 2015), and the S. latifolia individuals were derived from the line previously used for plastid genome sequencing (Sloan, Alverson, Wu, et al., 2012). Plants were grown in Fafard 2SV Mix supplemented with vermiculite and perlite in the Colorado State University greenhouses with supplemental lighting on a 16/8-hr light/dark cycle and regular watering and fertilization. Chloroplasts were isolated following a protocol based on van Wijk et al. (2007). In brief, rinsed leaf tissue was disrupted with a blender in 100 ml of grinding buffer for each 10 g of tissue (50 mM HEPES-KOH pH 8.0, 330 mM sorbitol, 2 mM EDTA, 5 mM ascorbic acid, 5 mM cysteine, 0.05% BSA) and filtered through two layers of Miracloth. The resulting samples were centrifuged at 1300 x g for 4 min in a fixed-angle rotor. Pellets were resuspended in wash buffer (50 mM HEPES-KOH pH 8.0, 330 mM sorbitol, 2 mM EDTA), loaded onto 40%/85% Percoll step gradients, and centrifuged at 3750 x g for 10 min in a swinging-bucket rotor. Intact chloroplasts were harvested from the interface of the step-gradient, diluted in wash buffer, and centrifuged at 1300 x g for 3 min in a fixed-angle rotor. The resulting chloroplast pellets were flash frozen in liquid nitrogen and stored at -80 °C. All isolation steps were performed at 4 °C under dim green light.

Isolation of stromal protein fractions and native polyacrylamide gel electrophoresis (Native PAGE). Frozen chloroplast pellets were resuspended in 50 mM HEPES (pH 8), 10 mM MgCl2, 15% glycerol, and protease inhibitors, and the soluble stromal proteomes were isolated from the resuspended, broken chloroplasts by centrifugation at 100,000 x g for 30 min at 4 °C. The supernatants containing the chloroplast soluble proteomes were concentrated with Amicon 10 kDa filter units, and proteins were quantified with the BCA Protein Assay Kit (Thermo Fisher). For light-blue native PAGE analysis, 50 µg of stromal protein from each species in 50 mM BisTris-HCl, 50 mM NaCl, 10% w/v glycerol, and 0.001% Ponceau S (pH 7.2) was loaded per lane using the NativePage Novex gel system with precast 4–16% acrylamide Bis-Tris gels (Invitrogen). The upper buffer contained 0.002% Coomassie G. A total of three lanes were loaded for each species.

Mass spectrometry (MS) and data analysis. Each gel lane was cut into 19 bands followed by reduction, alkylation, and in-gel digestion with trypsin as described (Shevchenko et al., 2006; Friso et al., 2011). For replicate 1, we used one gel lane for each species, whereas we pooled two gel lanes for replicate 2 to increase protein identifications and sequence coverage. The resuspended peptide extracts were analyzed by data-dependent MS/MS using an on-line LC-LTQ-Orbitrap (Thermo Electron Corp.) with details as described in Kim et al. (2015). Hence, a total of 38 MS/MS runs (19 bands in each of two replicates) were carried out for each species. MS data were searched against assembled databases for S. noctiflora (88,166 sequences; 20,816,406 residues) and S. latifolia (101,108 sequences; 20,447,864 residues) using Mascot, followed by filtering, grouping of closely related sequences based on matched MS/MS spectra, and quantification based on normalized AdjSPC (NadjSPC) as in Friso et al. (2011) and Kim et al. (2015). For each species, the database was a merger of annotated proteins from organellar genomes (Sloan, Alverson, Chuckalovcak, et al., 2012; Sloan, Alverson, Wu, et al., 2012) and protein sequence predictions generated by TransDecoder (Haas et al., 2013) from transcriptome assemblies (Sloan, Triant, Wu, et al., 2014). Protein annotations are based on homology to Arabidopsis and taken from the Plant Proteome Database (http://ppdb.tc.cornell.edu/).

To determine the assembly state of ClpP1 and other ClpPRT subunits, native masses were calibrated with endogenous stromal complexes for which we and/or others previously determined the native mass in Arabidopsis, in particular CPN60 (800 kDa), RUBISCO holocomplex (550 kDa), glutamate-ammonia ligase (GS2; 240 kDa), CLPC/D (200 kDa), transketolase-1 (TKL-1; 150 kDa), thiazole biosynthetic enzyme 1 (THI1; 245 kDa), and metalloprotease PREP1/2 (110 kDa) (see Peltier et al., 2006, and references therein).
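One way to picture the calibration step is as a standard curve relating gel position to the logarithm of native mass. The sketch below fits a least-squares line to log10(mass) versus gel-slice index for the marker complexes listed above and interpolates the mass at other slices. The slice indices are hypothetical placeholders (peak positions are not given in the text), the log-linear model is an assumption rather than the authors' stated procedure, and all function names are illustrative.

```python
import math

# Native masses (kDa) of the endogenous marker complexes named in the text.
MARKER_MASS_KDA = {
    "CPN60": 800, "RUBISCO": 550, "THI1": 245, "GS2": 240,
    "CLPC/D": 200, "TKL-1": 150, "PREP1/2": 110,
}

# HYPOTHETICAL peak gel slices (1 = top of lane); NOT taken from the paper.
MARKER_PEAK_SLICE = {
    "CPN60": 3, "RUBISCO": 5, "THI1": 9, "GS2": 9,
    "CLPC/D": 10, "TKL-1": 12, "PREP1/2": 14,
}


def fit_log_mass_curve(slices, masses_kda):
    """Ordinary least squares for log10(mass) = intercept + slope * slice."""
    xs = list(slices)
    ys = [math.log10(m) for m in masses_kda]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_mass_kda(slice_index, intercept, slope):
    """Interpolate a native mass (kDa) from a gel-slice index."""
    return 10 ** (intercept + slope * slice_index)


if __name__ == "__main__":
    names = sorted(MARKER_MASS_KDA)
    intercept, slope = fit_log_mass_curve(
        [MARKER_PEAK_SLICE[n] for n in names],
        [MARKER_MASS_KDA[n] for n in names],
    )
    for s in (4, 7, 11):
        print(f"slice {s}: ~{estimate_mass_kda(s, intercept, slope):.0f} kDa")
```

With real data, the placeholder slice indices would be replaced by the slices in which each marker complex shows its peak spectral count.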

Acknowledgements

We thank Kate Rockenbach for assistance with implementing chloroplast isolation techniques in Silene. We thank Dr. Nazmul Bhuiyan for native gel analysis of stromal proteomes and Dr. Qi Sun from the Computational Biology Service Unit (CBSU) at Cornell University for his support with the S. noctiflora and S. latifolia databases, database searches, and protein annotations. We thank Andrea Del Cortona for providing Boodlea composita sequence data. This work was supported by a National Science Foundation (NSF) Graduate Research Fellowship (DGE-1321845 to A.M.W.), NSF MCB-1733227 to D.B.S., and NSF MCB-1614629 to K.J.V.W. A.M.W. was also supported by a Department of Education GAANN Fellowship.

References

Adams, K.L., Daley, D.O., Whelan, J. and Palmer, J.D. (2002) Genes for two mitochondrial ribosomal proteins in flowering plants are derived from their chloroplast or cytosolic counterparts. Plant Cell, 14, 931–943.

Adams, K.L., Qiu, Y.-L., Stoutemyer, M. and Palmer, J.D. (2002) Punctuated evolution of mitochondrial gene content: High and variable rates of mitochondrial gene loss and transfer to the nucleus during angiosperm evolution. Proc. Natl. Acad. Sci., 99, 9905–9912.

Andersson, F.I., Tryggvesson, A., Sharon, M., et al. (2009) Structure and function of a novel type of ATP-dependent Clp protease. J. Biol. Chem., 284, 13519–13532.

Apitz, J., Nishimura, K., Schmied, J., Wolf, A., Hedtke, B., Wijk, K.J. van and Grimm, B. (2016) Posttranslational Control of ALA Synthesis Includes GluTR Degradation by Clp Protease and Stabilization by GluTR-Binding Protein. Plant Physiol., 170, 2040–2051.

Barnard-Kubow, K.B., Sloan, D.B. and Galloway, L.F. (2014) Correlation between sequence divergence and polymorphism reveals similar evolutionary mechanisms acting across multiple timescales in a rapidly evolving plastid genome. BMC Evol. Biol., 14. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4300152/ [Accessed December 28, 2017].

Blanchard, J.L. and Lynch, M. (2000) Organellar genes: why do they end up in the nucleus? Trends Genet., 16, 315–320.

Blazier, J.C., Ruhlman, T.A., Weng, M.-L., Rehman, S.K., Sabir, J.S.M. and Jansen, R.K. (2016) Divergence of RNA polymerase α subunits in angiosperm plastid genomes is mediated by genomic rearrangement. Sci. Rep., 6, 24595.

Bogdanova, V.S., Zaytseva, O.O., Mglinets, A.V., Shatskaya, N.V., Kosterin, O.E. and Vasiliev, G.V. (2015) Nuclear-Cytoplasmic Conflict in Pea (Pisum sativum L.) Is Associated with Nuclear and Plastidic Candidate Genes Encoding Acetyl-CoA Carboxylase Subunits. PLOS ONE, 10, e0119835.

Bromham, L., Cowman, P.F. and Lanfear, R. (2013) Parasitic plants have increased rates of molecular evolution across all three genomes. BMC Evol. Biol., 13, 126.

Clark, N.L. and Aquadro, C.F. (2010) A novel method to detect proteins evolving at correlated rates: identifying new functional relationships between coevolving proteins. Mol. Biol. Evol., 27, 1152–1161.

Clarke, A.K., MacDonald, T.M. and Sjögren, L.L.E. (2005) The ATP-dependent Clp protease in chloroplasts of higher plants. Physiol. Plant., 123, 406–412.

Cortona, A.D., Leliaert, F., Bogaert, K.A., et al. (2017) The Plastid Genome in Cladophorales Green Algae Is Encoded by Hairpin Chromosomes. Curr. Biol., 27, 3771-3782.e6.

Crooks, G.E., Hon, G., Chandonia, J.-M. and Brenner, S.E. (2004) WebLogo: A Sequence Logo Generator. Genome Res., 14, 1188–1190.

De Maio, N., Holmes, I., Schlötterer, C. and Kosiol, C. (2013) Estimating empirical codon hidden Markov models. Mol. Biol. Evol., 30, 725–736.

Delannoy, E., Fujii, S., Colas des Francs-Small, C., Brundrett, M. and Small, I. (2011) Rampant Gene Loss in the Underground Orchid Rhizanthella gardneri Highlights Evolutionary Constraints on Plastid Genomes. Mol. Biol. Evol., 28, 2077–2086.

Derrien, B., Majeran, W., Effantin, G., Ebenezer, J., Friso, G., Wijk, K.J. van, Steven, A.C., Maurizi, M.R. and Vallon, O. (2012) The purification of the Chlamydomonas reinhardtii chloroplast ClpP complex: additional subunits and structural features. Plant Mol. Biol., 80, 189–202.

Drouin, G., Daoud, H. and Xia, J. (2008) Relative rates of synonymous substitutions in the mitochondrial, chloroplast and nuclear genomes of seed plants. Mol. Phylogenet. Evol., 49, 827–831.

Duff, R.J. and Moore, F.B.-G. (2005) Pervasive RNA Editing Among Hornwort rbcL Transcripts Except Leiosporoceros. J. Mol. Evol., 61, 571–578.

Dutheil, J. and Galtier, N. (2007) Detecting groups of coevolving positions in a molecule: a clustering approach. BMC Evol. Biol., 7, 242.

El Bakkouri, M., Rathore, S., Calmettes, C., et al. (2013) Structural insights into the inactive subunit of the apicoplast-localized caseinolytic protease complex of Plasmodium falciparum. J. Biol. Chem., 288, 1022–1031.

Erixon, P. and Oxelman, B. (2008) Whole-Gene Positive Selection, Elevated Synonymous Substitution Rates, Duplication, and Indel Evolution of the Chloroplast clpP1 Gene. PLOS ONE, 3, e1386.

Fajardo, D., Senalik, D., Ames, M., et al. (2013) Complete plastid genome sequence of Vaccinium macrocarpon: structure, gene content, and rearrangements revealed by next generation sequencing. Tree Genet. Genomes, 9, 489–498.

Fedhila, S., Msadek, T., Nel, P. and Lereclus, D. (2002) Distinct clpP Genes Control Specific Adaptive Responses in Bacillus thuringiensis. J. Bacteriol., 184, 5554–5562.

Freyer, R., Kiefer-Meyer, M.-C. and Kössel, H. (1997) Occurrence of plastid RNA editing in all major lineages of land plants. Proc. Natl. Acad. Sci., 94, 6285–6290.

Friso, G., Olinares, P.D.B. and Wijk, K.J. van (2011) The workflow for quantitative proteome analysis of chloroplast development and differentiation, chloroplast mutants, and protein interactions by spectral counting. Methods Mol. Biol. Clifton NJ, 775, 265–282.

Gertz, E.M., Yu, Y.-K., Agarwala, R., Schäffer, A.A. and Altschul, S.F. (2006) Composition-based statistics and translated nucleotide searches: Improving the TBLASTN module of BLAST. BMC Biol., 4, 41.

Guisinger, M.M., Chumley, T.W., Kuehl, J.V., Boore, J.L. and Jansen, R.K. (2010) Implications of the Plastid Genome Sequence of Typha (Typhaceae, Poales) for Understanding Genome Evolution in Poaceae. J. Mol. Evol., 70, 149–166.

Guisinger, M.M., Kuehl, J.V., Boore, J.L. and Jansen, R.K. (2008) Genome-wide analyses of Geraniaceae plastid DNA reveal unprecedented patterns of increased nucleotide substitutions. Proc. Natl. Acad. Sci., 105, 18424–18429.

Guo, W., Grewe, F., Fan, W., Young, G.J., Knoop, V., Palmer, J.D. and Mower, J.P. (2016) Ginkgo and Welwitschia Mitogenomes Reveal Extreme Contrasts in Gymnosperm Mitochondrial Evolution. Mol. Biol. Evol., 33, 1448–1460.

Haas, B.J., Papanicolaou, A., Yassour, M., et al. (2013) De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity. Nat. Protoc., 8. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3875132/ [Accessed August 7, 2018].

Haberle, R.C., Fourcade, H.M., Boore, J.L. and Jansen, R.K. (2008) Extensive rearrangements in the chloroplast genome of Trachelium caeruleum are associated with repeats and tRNA genes. J. Mol. Evol., 66, 350–361.

Havird, J.C. and Sloan, D.B. (2016) The Roles of Mutation, Selection, and Expression in Determining Relative Rates of Evolution in Mitochondrial versus Nuclear Genomes. Mol. Biol. Evol., 33, 3042–3053.

Havird, J.C., Trapp, P., Miller, C.M., Bazos, I. and Sloan, D.B. (2017) Causes and Consequences of Rapidly Evolving mtDNA in a Plant Lineage. Genome Biol. Evol., 9, 323–336.

Havird, J.C., Whitehill, N.S., Snow, C.D. and Sloan, D.B. (2015) Conservative and compensatory evolution in oxidative phosphorylation complexes of angiosperms with highly divergent rates of mitochondrial genome evolution. Evolution, 69, 3069–3081.

Hein, A. and Knoop, V. (2018) Expected and unexpected evolution of plant RNA editing factors CLB19, CRR28 and RARE1: retention of CLB19 despite a phylogenetically deep loss of its two known editing targets in Poaceae. BMC Evol. Biol., 18, 85.

Hirao, T., Watanabe, A., Kurita, M., Kondo, T. and Takata, K. (2008) Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species. BMC Plant Biol., 8, 70.

Huang, C., Wang, S., Chen, L., Lemieux, C., Otis, C., Turmel, M. and Liu, X.-Q. (1994) The Chlamydomonas chloroplast clpP gene contains translated large insertion sequences and is essential for cell growth. Mol. Gen. Genet. MGG, 244, 151–159.

Jansen, R.K., Cai, Z., Raubeson, L.A., et al. (2007) Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns. Proc. Natl. Acad. Sci., 104, 19369–19374.

Juan, D. de, Pazos, F. and Valencia, A. (2013) Emerging methods in protein co-evolution. Nat. Rev. Genet., 14, 249–261.

Katoh, K. and Standley, D.M. (2013) MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Mol. Biol. Evol., 30, 772–780.

Kim, J., Kimber, M.S., Nishimura, K., Friso, G., Schultz, L., Ponnala, L. and Wijk, K.J. van (2015) Structures, Functions, and Interactions of ClpT1 and ClpT2 in the Clp Protease System of Arabidopsis Chloroplasts. Plant Cell, 27, 1477–1496.

Kim, J., Olinares, P.D., Oh, S., Ghisaura, S., Poliakov, A., Ponnala, L. and Wijk, K.J. van (2013) Modified Clp Protease Complex in the ClpP3 Null Mutant and Consequences for Chloroplast Development and Function in Arabidopsis. Plant Physiol., 162, 157–179.

Kim, J., Rudella, A., Rodriguez, V.R., Zybailov, B., Olinares, P.D.B. and Wijk, K.J. van (2009) Subunits of the Plastid ClpPR Protease Complex Have Differential Contributions to Embryogenesis, Plastid Biogenesis, and Plant Development in Arabidopsis. Plant Cell, 21, 1669–1692.

Knie, N., Grewe, F., Fischer, S. and Knoop, V. (2016) Reverse U-to-C editing exceeds C-to-U RNA editing in some ferns – a monilophyte-wide comparison of chloroplast and mitochondrial RNA editing suggests independent evolution of the two processes in both organelles. BMC Evol. Biol., 16, 134.

Knox, E.B. (2014) The dynamic history of plastid genomes in the Campanulaceae sensu lato is unique among angiosperms. Proc. Natl. Acad. Sci., 111, 11097–11102.

Koussevitzky, S., Stanne, T.M., Peto, C.A., Giap, T., Sjögren, L.L.E., Zhao, Y., Clarke, A.K. and Chory, J. (2007) An Arabidopsis thaliana virescent mutant reveals a role for ClpR1 in plastid development. Plant Mol. Biol., 63, 85–96.

Krause, K. (2008) From chloroplasts to “cryptic” plastids: evolution of plastid genomes in parasitic plants. Curr. Genet., 54, 111.

Kugita, M., Yamamoto, Y., Fujikawa, T., Matsumoto, T. and Yoshinaga, K. (2003) RNA editing in hornwort chloroplasts makes more than half the genes functional. Nucleic Acids Res., 31, 2417–2423.

Kumar, S., Stecher, G. and Tamura, K. (2016) MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol. Biol. Evol., 33, 1870–1874.

Kuroda, H. and Maliga, P. (2003) The plastid clpP1 protease gene is essential for plant development. Nature, 425, 86.

Lemieux, C., Vincent, A.T., Labarre, A., Otis, C. and Turmel, M. (2015) Chloroplast phylogenomic analysis of chlorophyte green algae identifies a novel lineage sister to the Sphaeropleales (Chlorophyceae). BMC Evol. Biol., 15, 264.

Li, J., Zhang, Z., Vang, S., Yu, J., Wong, G.K.-S. and Wang, J. (2009) Correlation between Ka/Ks and Ks is related to substitution model and evolutionary lineage. J. Mol. Evol., 68, 414–423.

Li, Y., Zhang, R., Liu, S., et al. (2017) The molecular evolutionary dynamics of oxidative phosphorylation (OXPHOS) genes in Hymenoptera. BMC Evol. Biol., 17, 269.

Liao, Friso, G. and Wijk, K.J. van (submitted).

Magee, A.M., Aspinall, S., Rice, D.W., et al. (2010) Localized hypermutation and associated gene losses in legume chloroplast genomes. Genome Res., 20, 1700–1710.

Majeran, W., Friso, G., Wijk, K.J. van and Vallon, O. (2005) The chloroplast ClpP complex in Chlamydomonas reinhardtii contains an unusual high molecular mass subunit with a large apical domain. FEBS J., 272, 5558–5571.

Majeran, W., Wollman, F.-A. and Vallon, O. (2000) Evidence for a Role of ClpP in the Degradation of the Chloroplast Cytochrome b6f Complex. Plant Cell, 12, 137–149.

Millen, R.S., Olmstead, R.G., Adams, K.L., et al. (2001) Many Parallel Losses of infA from Chloroplast DNA during Angiosperm Evolution with Multiple Independent Transfers to the Nucleus. Plant Cell, 13, 645–658.

Molina, J., Hazzouri, K.M., Nickrent, D., et al. (2014) Possible Loss of the Chloroplast Genome in the Parasitic Rafflesia lagascae (). Mol. Biol. Evol., 31, 793–803.

Morris, J.L., Puttick, M.N., Clark, J.W., et al. (2018) The timescale of early land plant evolution. Proc. Natl. Acad. Sci., 115, E2274–E2283.

Müller, K. (2005) SeqState. Appl. Bioinformatics, 4, 65–69.

Mullet, J.E. (1993) Dynamic regulation of chloroplast transcription. Plant Physiol., 103, 309–313.

Nishimura, K., Kato, Y. and Sakamoto, W. (2017) Essentials of Proteolytic Machineries in Chloroplasts. Mol. Plant, 10, 4–19.

Nishimura, K. and Wijk, K.J. van (2015) Organization, function and substrates of the essential Clp protease system in plastids. Biochim. Biophys. Acta BBA - Bioenerg., 1847, 915–930.

Olinares, P.D.B., Kim, J., Davis, J.I. and Wijk, K.J. van (2011) Subunit stoichiometry, evolution, and functional implications of an asymmetric plant plastid ClpP/R protease complex in Arabidopsis. Plant Cell, 23, 2348–2361.

Olinares, P.D.B., Kim, J. and Wijk, K.J. van (2011) The Clp protease system; a central component of the chloroplast protease network. Biochim. Biophys. Acta BBA - Bioenerg., 1807, 999–1011.

Park, S., Ruhlman, T.A., Weng, M.-L., Hajrah, N.H., Sabir, J.S.M. and Jansen, R.K. (2017) Contrasting Patterns of Nucleotide Substitution Rates Provide Insight into Dynamic Evolution of Plastid and Mitochondrial Genomes of Geranium. Genome Biol. Evol., 9, 1766–1780.

Parkinson, C.L., Mower, J.P., Qiu, Y.-L., Shirk, A.J., Song, K., Young, N.D., dePamphilis, C.W. and Palmer, J.D. (2005) Multiple major increases and decreases in mitochondrial substitution rates in the plant family Geraniaceae. BMC Evol. Biol., 5, 73.

Peltier, J.-B., Cai, Y., Sun, Q., Zabrouskov, V., Giacomelli, L., Rudella, A., Ytterberg, A.J., Rutschow, H. and Wijk, K.J. van (2006) The Oligomeric Stromal Proteome of Arabidopsis thaliana Chloroplasts. Mol. Cell. Proteomics, 5, 114–133.

Peltier, J.-B., Ripoll, D.R., Friso, G., Rudella, A., Cai, Y., Ytterberg, J., Giacomelli, L., Pillardy, J. and Wijk, K.J. van (2004) Clp Protease Complexes from Photosynthetic and Non-photosynthetic Plastids and Mitochondria of Plants, Their Predicted Three-dimensional Structures, and Functional Implications. J. Biol. Chem., 279, 4768–4781.

Peltier, J.-B., Ytterberg, J., Liberles, D.A., Roepstorff, P. and Wijk, K.J. van (2001) Identification of a 350-kDa ClpP Protease Complex with 10 Different Clp Isoforms in Chloroplasts of Arabidopsis thaliana. J. Biol. Chem., 276, 16318–16327.

Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C. and Ferrin, T.E. (2004) UCSF Chimera—A visualization system for exploratory research and analysis. J. Comput. Chem.

Preston, B.D. (1996) Error-prone retrotransposition: rime of the ancient mutators. Proc. Natl. Acad. Sci., 93, 7427–7431.

Raubeson, L.A. and Jansen, R.K. (2005) Chloroplast genomes of plants. In Plant diversity and evolution: genotypic and phenotypic variation in higher plants. Wallingford (UK): CABI, pp. 45–68.

Rautenberg, A., Sloan, D.B., Aldén, V. and Oxelman, B. (2012) Phylogenetic Relationships of Silene multinervia and Silene Section Conoimorpha (). Syst. Bot., 37, 226–237.

Rockenbach, K., Havird, J.C., Monroe, J.G., Triant, D.A., Taylor, D.R. and Sloan, D.B. (2016) Positive Selection in Rapidly Evolving Plastid–Nuclear Enzyme Complexes. Genetics, 204, 1507–1522.

Rombel, I.T., Sykes, K.F., Rayner, S. and Johnston, S.A. (2002) ORF-FINDER: a vector for high-throughput gene identification. Gene, 282, 33–41.

Sabot, F. and Schulman, A.H. (2006) Parasitism and the retrotransposon life cycle in plants: a hitchhiker’s guide to the genome. Heredity, 97, 381–388.

Sanitá Lima, M. and Smith, D.R. (2017) Pervasive Transcription of Mitochondrial, Plastid, and Nucleomorph Genomes across Diverse Plastid-Bearing Species. Genome Biol. Evol., 9, 2650–2657.

Schelin, J., Lindmark, F. and Clarke, A.K. (2002) The clpP multigene family for the ATP-dependent Clp protease in the cyanobacterium Synechococcus. Microbiol. Read. Engl., 148, 2255–2265.

Shevchenko, A., Tomas, H., Havlis, J., Olsen, J.V. and Mann, M. (2006) In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat. Protoc., 1, 2856–2860.

Sloan, D.B., Alverson, A.J., Chuckalovcak, J.P., Wu, M., McCauley, D.E., Palmer, J.D. and Taylor, D.R. (2012) Rapid Evolution of Enormous, Multichromosomal Genomes in Flowering Plant Mitochondria with Exceptionally High Mutation Rates. PLOS Biol., 10, e1001241.

Sloan, D.B., Alverson, A.J., Wu, M., Palmer, J.D. and Taylor, D.R. (2012) Recent Acceleration of Plastid Sequence and Structural Evolution Coincides with Extreme Mitochondrial Divergence in the Angiosperm Genus Silene. Genome Biol. Evol., 4, 294–306.

Sloan, D.B., MacQueen, A.H., Alverson, A.J., Palmer, J.D. and Taylor, D.R. (2010) Extensive Loss of RNA Editing Sites in Rapidly Evolving Silene Mitochondrial Genomes: Selection vs. Retroprocessing as the Driving Force. Genetics, 185, 1369–1380.

Sloan, D.B., Oxelman, B., Rautenberg, A. and Taylor, D.R. (2009) Phylogenetic analysis of mitochondrial substitution rate variation in the angiosperm tribe Sileneae. BMC Evol. Biol., 9, 260.

Sloan, D.B., Triant, D.A., Forrester, N.J., Bergner, L.M., Wu, M. and Taylor, D.R. (2014) A recurring syndrome of accelerated plastid genome evolution in the angiosperm tribe Sileneae (Caryophyllaceae). Mol. Phylogenet. Evol., 72, 82–89.

Sloan, D.B., Triant, D.A., Wu, M. and Taylor, D.R. (2014) Cytonuclear interactions and relaxed selection accelerate sequence evolution in organelle ribosomes. Mol. Biol. Evol., 31, 673–682.

Smith, D.R. (2009) Unparalleled GC content in the plastid DNA of Selaginella. Plant Mol. Biol., 71, 627.

Smith, D.R. and Keeling, P.J. (2015) Mitochondrial and plastid genome architecture: Reoccurring themes, but significant differences at the extremes. Proc. Natl. Acad. Sci., 112, 10177–10184.

Smith, D.R. and Lee, R.W. (2014) A Plastid without a Genome: Evidence from the Nonphotosynthetic Green Algal Genus Polytomella. Plant Physiol., 164, 1812–1819.

Sobanski, J., Giavalisco, P., Fischer, A., et al. (2018) Biparental inheritance of chloroplasts is controlled by lipid biosynthesis. bioRxiv, 330100.

Stanne, T.M., Pojidaeva, E., Andersson, F.I. and Clarke, A.K. (2007) Distinctive Types of ATP-dependent Clp Proteases in Cyanobacteria. J. Biol. Chem., 282, 14394–14402.

Stoike, L.L. and Sears, B.B. (1998) Plastome Mutator–Induced Alterations Arise in Oenothera Chloroplast DNA Through Template Slippage. Genetics, 149, 347–353.

Stoletzki, N. and Eyre-Walker, A. (2011) The positive correlation between dN/dS and dS in mammals is due to runs of adjacent substitutions. Mol. Biol. Evol., 28, 1371–1380.

Straub, S.C., Fishbein, M., Livshultz, T., Foster, Z., Parks, M., Weitemier, K., Cronn, R.C. and Liston, A. (2011) Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing. BMC Genomics, 12, 211.

Tillich, M., Funk, H.T., Schmitz-Linneweber, C., Poltnigg, P., Sabater, B., Martin, M. and Maier, R.M. (2005) Editing of plastid RNA in Arabidopsis thaliana ecotypes. Plant J., 43, 708–715.

Traverse, C.C. and Ochman, H. (2016) Conserved rates and patterns of transcription errors across bacterial growth states and lifestyles. Proc. Natl. Acad. Sci., 113, 3311–3316.

Tsudzuki, T., Wakasugi, T. and Sugiura, M. (2001) Comparative Analysis of RNA Editing Sites in Higher Plant Chloroplasts. J. Mol. Evol., 53, 327–332.

Venkat, A., Hahn, M.W. and Thornton, J.W. (2017) Multinucleotide mutations cause false inferences of positive selection. bioRxiv, 165969.

Welsch, R., Zhou, X., Yuan, H., et al. (2018) Clp Protease and OR Directly Control the Proteostasis of Phytoene Synthase, the Crucial Enzyme for Carotenoid Biosynthesis in Arabidopsis. Mol. Plant, 11, 149–162.

Weng, M.-L., Blazier, J.C., Govindu, M. and Jansen, R.K. (2014) Reconstruction of the Ancestral Plastid Genome in Geraniaceae Reveals a Correlation between Genome Rearrangements, Repeats, and Nucleotide Substitution Rates. Mol. Biol. Evol., 31, 645–659.

Weng, M.-L., Ruhlman, T.A. and Jansen, R.K. (2016) Plastid–Nuclear Interaction and Accelerated Coevolution in Plastid Ribosomal Genes in Geraniaceae. Genome Biol. Evol., 8, 1824–1838.

Wickett, N.J., Mirarab, S., Nguyen, N., et al. (2014) Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc. Natl. Acad. Sci. U. S. A., 111, E4859-4868.

Wijk, K.J. van, Peltier, J.-B. and Giacomelli, L. (2007) Isolation of Chloroplast Proteins from Arabidopsis thaliana for Proteome Analysis. In Plant Proteomics. Methods in Molecular Biology. Humana Press, pp. 43–48. Available at: https://link.springer.com/protocol/10.1385/1-59745-227-0:43 [Accessed April 20, 2018].

Williams, A.V., Boykin, L.M., Howell, K.A., Nevill, P.G. and Small, I. (2015) The Complete Sequence of the Acacia ligulata Chloroplast Genome Reveals a Highly Divergent clpP1 Gene. PLOS ONE, 10, e0125768.

Wolf, J.B.W., Künstner, A., Nam, K., Jakobsson, M. and Ellegren, H. (2009) Nonlinear Dynamics of Nonsynonymous (dN) and Synonymous (dS) Substitution Rates Affects Inference of Selection. Genome Biol. Evol., 1, 308–319.

Wolfe, K.H., Li, W.H. and Sharp, P.M. (1987) Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc. Natl. Acad. Sci. U. S. A., 84, 9054–9058.

Wolfe, K.H., Morden, C.W. and Palmer, J.D. (1992) Function and evolution of a minimal plastid genome from a nonphotosynthetic parasitic plant. Proc. Natl. Acad. Sci., 89, 10648–10652.

Wu, Z., Cuthbert, J.M., Taylor, D.R. and Sloan, D.B. (2015) The massive mitochondrial genome of the angiosperm Silene noctiflora is evolving by gain or loss of entire chromosomes. Proc. Natl. Acad. Sci., 112, 10185–10191.

Yan, Z., Ye, G. and Werren, J. (2018) Evolutionary rate coevolution between mitochondria and mitochondria-associated nuclear-encoded proteins in insects. bioRxiv, 288456.

Yang, Z. (2007) PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol., 24, 1586–1591.

Yao, X., Tang, P., Li, Z., Li, D., Liu, Y. and Huang, H. (2015) The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis. PLOS ONE, 10, e0129347.

Yeang, C.-H. and Haussler, D. (2007) Detecting Coevolution in and among Protein Domains. PLoS Comput. Biol., 3. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2098842/ [Accessed June 1, 2018].

Yu, A.Y.H. and Houry, W.A. (2007) ClpP: A distinctive family of cylindrical energy-dependent serine proteases. FEBS Lett., 581, 3749–3757.

Zeiler, E., List, A., Alte, F., Gersch, M., Wachtel, R., Poreba, M., Drag, M., Groll, M. and Sieber, S.A. (2013) Structural and functional insights into caseinolytic proteases reveal an unprecedented regulation principle of their catalytic triad. Proc. Natl. Acad. Sci. U. S. A., 110, 11302–11307.

Zhang, J., Nielsen, R. and Yang, Z. (2005) Evaluation of an Improved Branch-Site Likelihood Method for Detecting Positive Selection at the Molecular Level. Mol. Biol. Evol., 22, 2472–2479.

Zhang, J., Ruhlman, T.A., Sabir, J., Blazier, J.C. and Jansen, R.K. (2015) Coordinated Rates of Evolution between Interacting Plastid and Nuclear Genes in Geraniaceae. Plant Cell, 27, 563–573.

Zhang, J., Ruhlman, T.A., Sabir, J.S.M., Blazier, J.C., Weng, M.-L., Park, S. and Jansen, R.K. (2016) Coevolution between Nuclear-Encoded DNA Replication, Recombination, and Repair Genes and Plastid Genome Complexity. Genome Biol. Evol., 8, 622–634.

Zhang, Y., Ma, J., Yang, B., Li, R., Zhu, W., Sun, L., Tian, J. and Zhang, L. (2014) The complete chloroplast genome sequence of Taxus chinensis var. mairei (Taxaceae): loss of an inverted repeat region and comparative analysis with related species. Gene, 540, 201–209.

Zheng, B., MacDonald, T.M., Sutinen, S., Hurry, V. and Clarke, A.K. (2006) A nuclear-encoded ClpP subunit of the chloroplast ATP-dependent Clp protease is essential for early development in Arabidopsis thaliana. Planta, 224, 1103–1115.

Zhu, A., Guo, W., Jain, K. and Mower, J.P. (2014) Unprecedented Heterogeneity in the Synonymous Substitution Rate within a Plant Genome. Mol. Biol. Evol., 31, 1228–1236.

Table 1. Examples of clpP1 loss or putative pseudogenization

| Genus or species | Classification | clpP1 status | Evidence |
|---|---|---|---|
| Bathycoccus prasinos | Green alga | Lost | |
| Boodlea composita | Green alga | Lost (possibly transferred to nucleus) | |
| Helicosporidium sp. | Holoparasitic green alga | Lost | |
| Pilostyles aethiopica | Holoparasitic angiosperm | Lost | |
| Pilostyles hamiltonii | Holoparasitic angiosperm | Lost | |
| Hydnora visseri | Holoparasitic angiosperm | Lost | |
| Sciaphila thaidanica | Mycoheterotrophic angiosperm | Lost | |
| Epipremnum aureum | Angiosperm | Apparent pseudogenization | Large truncation/structural change |
| Hanabusaya asiatica | Angiosperm | Apparent pseudogenization | Loss of first exon |
| Phelipanche purpurea | Holoparasitic angiosperm | Apparent pseudogenization | Loss of first exon |
| Actinidia (except tetramera) | Angiosperm | Apparent pseudogenization | Loss of first exon |

*Note: Scaevola has also been suggested to lack clpP1 (Jansen et al., 2007), but it does not have a complete plastome assembly.

Figure 1: Comparison of evolutionary rates between ClpP1 and PsaA across green plants. Branch lengths represent amino acid substitutions per site. The species sampling between the trees is nearly identical (see Main Text for description of differences). Taxon names are included for select “fast” branches in the ClpP1 tree. See Figure S1 for a tree with full species labeling.

[Figure 1 image: ClpP1 and PsaA trees. Labels recovered from the figure: Angiosperms, Gymnosperms, Conifers, Gnetales, Ferns, Lycophytes, Bryophytes, Charophytes, Chlorophytes, Chlorophyceae, Geraniales, Campanulaceae, Fabaceae, Poaceae, Trithuria, Eustrephus, Stichococcus, Vanilla, Najas, Oenothera, Berberis, Epimedium, Aquilaria, Silene, Lonicera, Vaccinium, Jasminum, Plantago, Orobanche. Scale bar: 0.4.]


Figure 2: Intron gain and loss in clpP1 across green plants. Number of species sampled is included parenthetically for each group. Columns contain the number of each type of loss in each group. MRCA, most recent common ancestor.

Tree annotations, in the order shown in the figure: MRCA with all Streptophytes; gain of first intron; MRCA with Charales/Coleochaetales; gain of second intron; MRCA with Zygnematophyceae.

Intron loss events    Bryophytes (14)  Lycophytes (3)  Ferns (18)  Gymnosperms (73)  Angiosperms (795)
First intron only     0                0               0           0                 3
Second intron only    0                1               1           0                 9
Both introns          0                0               0           1                 17


Figure 3: Native-gel and mass spectrometry analysis of Silene plastid proteins. A) LB-Native-PAGE performed on stromal protein fraction from S. noctiflora and S. latifolia. Red lines indicate approximate positions of gel slices for MS/MS analysis, but native masses were more finely calibrated with known stromal complexes. B) AdjSPC for subunits of the Clp R ring, including ClpP1. Triangles indicate the gel slice corresponding to peak detection for native complexes used for internal calibration. For more details see Figure S7 and Supplemental Dataset 1.

[Figure 3 image. Panel A: LB-Native-PAGE gel lanes for Silene noctiflora (Rep 1, Rep 2) and Silene latifolia (Rep 1, Rep 2), with molecular-mass markers at 1048, 720, 480, 242, 146, and 66 kDa and gel slices numbered 1 to 20. Panel B: Adjusted Spectral Count (AdjSPC, 0 to 40) plotted against gel slice (1 to 19) for Replicate 1 and Replicate 2 of each species; series are ClpP1, CLPR1, CLPR2, CLPR3, and CLPR4. Calibration complexes, as listed in the figure: CPN60 (800 kDa), Rubisco (550 kDa), THI1 (2XX kDa), GS2 (2XX kDa), TKL-1 (150 kDa), PREP1/2 (110 kDa).]

Figure 4: Rate variation across ClpP1. a) Sliding window analysis of rate variation across a diverse subsample of angiosperms, using a window size of 21 aa and tree length measured as amino acid substitutions per site. Normalized tree lengths (bottom plot) were calculated by dividing each window by the average tree length of the entire protein. b) WebLogo representation of sequence conservation across green plants. The size of the letter at each amino acid position is indicative of the level of conservation.

[Figure 4 image. Panel A: two sliding-window plots with Amino Acid Position on the x-axis (ticks from 20 to 180); the top plot shows Tree Length and the bottom plot shows Normalized Tree Length, each with an Accelerated series and a Background series. Structural regions annotated along the protein: Ax. Loops, Head 1, Handle, Head 2. Panel B: WebLogo of ClpP1 with a y-axis in bits (0.0 to 4.0) and positions numbered 5 to 195.]
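The normalization described in the Figure 4 caption can be sketched as follows. This is a minimal illustration only: the function names and the step size of one residue are assumptions, and the per-window tree lengths are taken as given inputs (they would come from a separate phylogenetic analysis that is not reproduced here).

```python
def window_ranges(alignment_length, window_size=21, step=1):
    """Return half-open (start, end) column ranges for each sliding window.

    window_size=21 follows the caption; step=1 is an assumption.
    """
    if window_size > alignment_length:
        return []
    last_start = alignment_length - window_size
    return [(s, s + window_size) for s in range(0, last_start + 1, step)]


def normalize_tree_lengths(window_tree_lengths, whole_protein_tree_length):
    """Divide each window's tree length by the tree length of the whole protein."""
    if whole_protein_tree_length <= 0:
        raise ValueError("whole-protein tree length must be positive")
    return [t / whole_protein_tree_length for t in window_tree_lengths]
```

A normalized value above 1 marks a window that evolves faster than the protein-wide average, which is what allows the accelerated and background samples to be compared on a common scale.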


Figure 5: Amino acid residues under selection in ClpP1, as determined via a PAML branch-site analysis across accelerated species. Residues with posterior probabilities between 0.95 and 0.99 are colored in orange, and residues with posterior probabilities of 0.99 and above are colored in red. The three catalytic sites of ClpP1 are colored in blue. Sites were plotted on the crystal structure of the E. coli ClpP (PDB accession 1YG6).

[Figure 5 image: ClpP structure with labeled regions (Axial loop, Head domain, Handle region) and labeled catalytic residues (Ser, His, Asp).]
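The color scheme in the Figure 5 caption amounts to a simple threshold rule on posterior probabilities. The sketch below is illustrative: the function name is hypothetical, the lower bound of 0.95 is treated as inclusive, and giving catalytic sites priority over selection colors is an assumption.

```python
def color_for_site(posterior, is_catalytic=False):
    """Map a site's branch-site posterior probability to a display color.

    Returns None for sites that are left uncolored.
    """
    if is_catalytic:
        return "blue"    # catalytic residues (assumed to take priority)
    if posterior >= 0.99:
        return "red"     # posterior probability of 0.99 and above
    if posterior >= 0.95:
        return "orange"  # posterior probability between 0.95 and 0.99
    return None
```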

Figure 6: dN/dS values for clpP1 and psaA in a sample of 25 angiosperms. Species were designated as “slow” or “fast” based on ClpP1 amino acid substitution rates. The top and bottom of each box represent the upper and lower quartiles, respectively. The line contained within the box represents the median. The dotted lines connect the full range of points, apart from outliers, which are represented by dots.

[Figure 6 image: box plots of dN/dS (y-axis, 0.0 to 1.5) for four groups: clpP1 in fast species, clpP1 in slow species, psaA in fast species, psaA in slow species.]
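The box-plot elements named in the Figure 6 caption (quartiles, median, whiskers spanning the non-outlier range, and outliers) can be computed as in the sketch below. The caption does not state the outlier rule; the conventional 1.5 × IQR fence and the "inclusive" quartile method used here are assumptions, and the function name is hypothetical.

```python
import statistics


def boxplot_summary(values, whisker_factor=1.5):
    """Summarize values as box-plot statistics.

    Outliers are points beyond whisker_factor * IQR from the box
    (an assumed convention, not stated in the caption).
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }
```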


Figure 7: Comparison of evolutionary rates of ClpP1, the nuclear-encoded core plastid Clp subunits, and non-Clp-related nuclear-encoded genes. Species are in alternating colors for visibility. Branch lengths represent amino acid substitutions per site. a) Nuclear-encoded core plastid Clp subunits (n=8, concatenated), b) ClpP1, c) non-Clp nuclear-encoded proteins (n=20, concatenated), d) Scatterplot comparison of branch lengths. Normalization of both axes was achieved using independent sets of non-Clp nuclear-encoded proteins (n=10, concatenated).

[Figure 7 image. Panels A to C: trees for the following taxa: Sorghum bicolor, Oryza sativa, Musa acuminata, Cucumis sativus, Prunus persica, Lathyrus sativus, Glycine max, Medicago truncatula, Acacia aulacocarpa / Acacia ligulata*, Ricinus communis, Populus trichocarpa, Arabidopsis thaliana, Gossypium raimondii, Eucalyptus grandis, Oenothera biennis, Geranium maderense, Vitis vinifera, Mimulus guttatus / Erythranthe lutea*, Plantago maritima, Solanum lycopersicum, Lobelia siphilitica, Silene noctiflora, Silene latifolia, Liriodendron chinense, Amborella trichopoda. Scale bars shown: 0.2, 0.3, and 0.04. Panel D: scatterplot with ClpP1 Branch Length (normalized) on the x-axis (ticks at 1, 10, 100) and Nuclear-Encoded Clp Branch Length (concatenated and normalized) on the y-axis (ticks at 1, 10).]
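The comparison in Figure 7d can be illustrated with the sketch below. It assumes that normalization means dividing each species' branch length by the corresponding branch length from a non-Clp reference set, and that the correlation is taken on log-transformed values (suggested by the axis ticks at 1, 10, and 100). Neither detail is confirmed by the caption, and all function and variable names are hypothetical.

```python
import math


def normalize(branch_lengths, reference_lengths):
    """Divide each species' branch length by its reference branch length."""
    return {sp: branch_lengths[sp] / reference_lengths[sp] for sp in branch_lengths}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_log_correlation(clpp1, nuclear_clp, reference_a, reference_b):
    """Correlate normalized ClpP1 and nuclear Clp branch lengths on a log10 scale.

    reference_a and reference_b stand for the two independent non-Clp
    reference sets, one per axis.
    """
    x = normalize(clpp1, reference_a)
    y = normalize(nuclear_clp, reference_b)
    species = sorted(x)
    return pearson(
        [math.log10(x[sp]) for sp in species],
        [math.log10(y[sp]) for sp in species],
    )
```

Using a different reference set for each axis avoids building a spurious correlation into the plot through a shared denominator.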


Supplementary Materials

Figure S1: ClpP1 evolutionary rates across green plants. Tree is the same as shown in Figure 1 but includes species names for each branch. Branch length represents amino acid substitutions per site.

[Figure S1 image: radial ClpP1 tree with a species name on every tip. Group labels: Gymnosperms, Ferns, Lycophytes, Bryophytes, Charophytes, Chlorophytes, Angiosperms. Scale bar: 0.4.]

Figure S2: clpP1 duplications across green plants superimposed on ClpP1 evolutionary rate tree. Internal branches colored based on simple parsimony.

[Figure S2 image: radial ClpP1 tree with a species name on every tip. Legend: Two copies; Five copies. Scale bar: 0.4.]
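The Figure S2 caption states that internal branches were colored by simple parsimony but does not name an algorithm. As one common possibility, the sketch below shows the bottom-up pass of Fitch parsimony on a binary tree. The choice of Fitch parsimony, the nested-tuple tree encoding, and the function name are all assumptions made for illustration.

```python
def fitch_sets(tree, tip_states):
    """Bottom-up Fitch pass on a binary tree.

    tree is either a tip name (str) or a (left, right) tuple.
    Returns (candidate_states, minimum_changes) for the root of tree.
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch_sets(left, tip_states)
    right_states, right_changes = fitch_sets(right, tip_states)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_changes + right_changes
    # Children disagree: keep both options and count one change.
    return left_states | right_states, left_changes + right_changes + 1
```

For a character such as clpP1 copy number, a node whose candidate set contains a single state would be colored unambiguously; resolving nodes with several candidates requires a second, top-down pass that is not shown here.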


Figure S3: Correlation between clpP1 indels and ClpP1 branch length. As with Figure 7, normalization of both axes was achieved using independent sets of non-Clp nuclear-encoded proteins (n=10, concatenated).

[Figure S3 image: scatterplot titled "Number of Clpp1 Indels vs Clpp1 Branch Length". x-axis: Clpp1 Branch Length (normalized), ticks at 1 and 100. y-axis: Number of Clpp1 Indels (normalized), ticks at 1, 10, and 100.]


Figure S4: Loss of clpP1 introns superimposed on ClpP1 evolutionary rate tree. Internal branches colored based on simple parsimony.

[Figure S4 tree: ClpP1 evolutionary rate tree with species names as tip labels. Legend: loss of first intron; loss of second intron; loss of both introns. Scale bar: 0.4.]
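The caption of Figure S4 states only that internal branches were colored by "simple parsimony"; the exact procedure is not given. As an illustration, the bottom-up pass of Fitch parsimony on a binary tree is sketched below. The choice of Fitch's algorithm, the nested-tuple tree encoding, the function name `fitch`, and the toy taxa and states are all assumptions made for this example and are not taken from the study.

```python
# Minimal sketch of the bottom-up (post-order) pass of Fitch parsimony.
# Assumption: the tree is binary and encoded as nested 2-tuples of tip names.

def fitch(tree, tip_states):
    """Return (candidate_state_set, minimum_changes) for the given subtree.

    A tip is a string naming a taxon; an internal node is a 2-tuple of subtrees.
    tip_states maps each tip name to its observed character state.
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left_set, left_changes = fitch(tree[0], tip_states)
    right_set, right_changes = fitch(tree[1], tip_states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change is needed.
        return shared, left_changes + right_changes
    # The children disagree: keep both candidates and count one state change.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical four-taxon example (not data from the study).
toy_tree = (("sp_A", "sp_B"), ("sp_C", "sp_D"))
toy_states = {
    "sp_A": "intron lost",
    "sp_B": "intron lost",
    "sp_C": "intron present",
    "sp_D": "intron present",
}
root_set, n_changes = fitch(toy_tree, toy_states)
```

In this toy case the ancestor of `sp_A` and `sp_B` is unambiguously reconstructed as having lost the intron, so that internal branch would receive the "loss" color, while the root remains ambiguous between the two states. A complete reconstruction would add a top-down pass to resolve such ambiguous nodes.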


Figure S5: RNA editing of clpP1 codon 187 superimposed on ClpP1 evolutionary rate tree. Internal branches colored based on simple parsimony.

[Figure S5 tree: ClpP1 evolutionary rate tree with species names as tip labels. Legend: edited codon (CAT or CAC); hard-coded codon (TAT or TAC); not edited, CNN; not edited, all others. Scale bar: 0.4.]
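The four legend categories of Figure S5 can be expressed as a simple classification of the genomic codon. The sketch below assumes that each category can be assigned from the genomic sequence alone, which the legend suggests but the caption does not state; the function name `classify_codon` and the handling of RNA-style input (U treated as T) are hypothetical details added for illustration.

```python
# Minimal sketch: map a genomic codon to the legend categories of Figure S5.
# Assumption: categories are assigned from the genomic codon sequence alone.

def classify_codon(codon):
    """Return the legend category for a three-letter genomic codon."""
    codon = codon.upper().replace("U", "T")  # accept RNA-style input as well
    if codon in ("CAT", "CAC"):
        # C in the first position; C-to-U editing would yield TAT or TAC.
        return "Edited codon (CAT or CAC)"
    if codon in ("TAT", "TAC"):
        # The T is already encoded in the genome, so no editing is required.
        return "Hard-coded codon (TAT or TAC)"
    if len(codon) == 3 and codon[0] == "C":
        return "Not edited, CNN"
    return "Not edited, all others"


# Hypothetical inputs (not taken from the study's alignments).
examples = {c: classify_codon(c) for c in ("CAT", "TAC", "CTG", "GGA")}
```

The same classification applies to the legend of Figure S6, which uses identical categories for codon 28.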


Figure S6: RNA editing of clpP1 codon 28 superimposed on ClpP1 evolutionary rate tree. Internal branches colored based on simple parsimony.

[Figure S6 tree: ClpP1 evolutionary rate tree with species names as tip labels. Legend: edited codon (CAT or CAC); hard-coded codon (TAT or TAC); not edited, CNN; not edited, all others. Scale bar: 0.4.]


Figure S7: Size-fractionated mass spectrometry analysis of Silene plastid proteins. Gel slices correspond to native gel shown in Figure 3. A) Scaled AdjSPC summed across all subunits in different components of the Clp complex in S. noctiflora and S. latifolia. Values are scaled to 1 by dividing by the maximum observed value in any of the 19 gel slices. B) Control complexes used for internal calibration. Scaled AdjSPC values are calculated as in panel A.

[Figure S7, panel A: plots for Silene noctiflora and Silene latifolia, Replicate 1 and Replicate 2. x-axis: Gel Slice (1 to 19); y-axis: AdjSPC (Scaled to 1). Series: Clp P-ring, Clp R-ring, CLPC/D, CLPT1/2.]

[Figure S7, panel B: plots for Silene noctiflora and Silene latifolia, Replicate 1 and Replicate 2. x-axis: Gel Slice (1 to 19); y-axis: AdjSPC (Scaled to 1). Series: CPN60 (800 kDa), Rubisco (550 kDa), THI1 (2XX kDa), GS2 (2XX kDa), TKL-1 (150 kDa), PREP1/2 (110 kDa).]
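The scaling described in the Figure S7 caption (summing AdjSPC across the subunits of a component, then dividing by the maximum value observed in any gel slice) can be illustrated with a minimal Python sketch. The function names and the example counts below are hypothetical and are not taken from the study; the sketch also assumes that the maximum is taken separately for each component profile, which the caption does not state explicitly.

```python
def sum_subunits(subunit_profiles):
    """Sum AdjSPC slice by slice across the subunits of one component."""
    return [sum(values) for values in zip(*subunit_profiles)]


def scale_to_one(profile):
    """Divide each slice value by the largest value in the profile."""
    peak = max(profile)
    if peak == 0:
        # Component never detected: leave the profile at zero.
        return [0.0] * len(profile)
    return [value / peak for value in profile]


# Hypothetical AdjSPC counts for two subunits across 19 gel slices.
subunit_a = [0, 0, 0, 2, 6, 10, 4, 1] + [0] * 11
subunit_b = [0, 0, 1, 3, 9, 10, 2] + [0] * 12

summed = sum_subunits([subunit_a, subunit_b])
scaled = scale_to_one(summed)
```

With this scaling, the slice where a component is most abundant has a value of 1, so profiles of complexes with very different absolute abundances can be compared on the same axis.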


Figure S8: Loss of ClpP1 catalytic sites superimposed on ClpP1 evolutionary rate tree. Internal branches colored based on simple parsimony.

[Figure S8 tree image: tip labels (species names) are not recoverable from the text extraction. Legend: Loss of His; Loss of Asp; Loss of Ser and Asp; Loss of His and Asp; Loss of all three sites. Scale bar: 0.4.]


Table S1. clpP1 and psaA dN, dS, and dN/dS values in 25 angiosperms

Species                               clpP1 dN  clpP1 dS  clpP1 dN/dS  psaA dN  psaA dS  psaA dN/dS
Ancestor of S. bicolor and O. sativa  0.149     0.292     0.511        0.009    0.147    0.058
Sorghum bicolor                       0         0.061     0            0.003    0.058    0.046
Oryza sativa                          0.012     0.038     0.323        0.007    0.073    0.096
Musa acuminata                        0.019     0.077     0.252        0.004    0.084    0.047
Cucumis sativus                       0.073     0.086     0.855        0.009    0.107    0.084
Prunus persica                        0.031     0.087     0.357        0.001    0.085    0.015
Lathyrus sativus                      0.211     0.285     0.742        0.002    0.07     0.031
Glycine max                           0.021     0.184     0.111        0.011    0.199    0.055
Medicago truncatula                   0.126     0.2       0.632        0.003    0.04     0.069
Acacia ligulata                       0.386     0.263     1.468        0.002    0.033    0.054
Ricinus communis                      0.009     0.156     0.057        0.004    0.076    0.048
Populus trichocarpa                   0.019     0.064     0.301        0.001    0.074    0.016
Arabidopsis thaliana                  0.044     0.158     0.277        0.006    0.147    0.041
Gossypium raimondii                   0.015     0.069     0.214        0.005    0.1      0.048
Eucalyptus grandis                    0.004     0.046     0.087        0.003    0.033    0.09
Oenothera biennis                     0.365     0.478     0.764        0.002    0.108    0.022
Geranium maderense                    0.493     0.424     1.161        0.009    0.169    0.05
Vitis vinifera                        0.063     0.042     1.494        0.003    0.051    0.059
Erythranthe lutea                     0.001     0.053     0.023        0.002    0.069    0.026
Plantago maritima                     0.679     0.638     1.065        0.001    0.108    0.011
Solanum lycopersicum                  0.13      0.13      0.999        0.002    0.093    0.026
Lobelia siphilitica                   0.297     0.378     0.785        0.008    0.197    0.038
Silene noctiflora                     0.578     0.532     1.085        0.003    0.019    0.134
Silene latifolia                      0         0.047     0            0.001    0.004    0.136
Liriodendron chinense                 0.008     0.039     0.199        0.003    0.047    0.061
Amborella trichopoda                  0.034     0.135     0.249        0.006    0.135    0.046
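
The dN/dS columns in Table S1 are the ratio of nonsynonymous to synonymous divergence. The short Python sketch below recomputes that ratio for three clpP1 rows copied from the table. The helper name `dn_ds_ratio` is hypothetical, and the handling of dS = 0 is an assumption rather than the authors' procedure. Because the table reports rounded values, recomputed ratios can differ from the tabulated ones in the third decimal place.

```python
def dn_ds_ratio(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


# clpP1 (dN, dS) values copied from Table S1.
rows = {
    "Acacia ligulata": (0.386, 0.263),
    "Silene noctiflora": (0.578, 0.532),
    "Silene latifolia": (0.0, 0.047),
}

ratios = {species: dn_ds_ratio(dn, ds) for species, (dn, ds) in rows.items()}
```

Ratios above 1, as in Acacia ligulata and Silene noctiflora, mean that the nonsynonymous rate exceeds the synonymous rate on that branch.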

Table S2: Large dataset sampling

Table S3: Small dataset sampling

Dataset S1A: Mass spectrometry-based identification of S. noctiflora proteins and annotation based on best homolog to Arabidopsis.

Dataset S1B: Mass spectrometry-based identification of S. latifolia proteins and annotation based on best homolog to Arabidopsis.