<<

Supplementary Information

Localized hypermutation and associated gene losses in chloroplast genomes

Alan M. Magee1, Sue Aspinall2, Danny W. Rice3, Brian P. Cusack1, Marie Sémon4, Antoinette S. Perry1, Sasa Stefanovic5, Dan Milbourne6, Susanne Barth6, Jeffrey D. Palmer3, John C. Gray2, Tony A. Kavanagh1, and Kenneth H. Wolfe1

1 Smurfit Institute of Genetics, Trinity College, Dublin 2, Ireland 2 Department of Sciences, University of Cambridge, Downing Street, Cambridge CB2 3EA, United Kingdom 3 Department of Biology, Indiana University, Bloomington, Indiana 47405 4 Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, CNRS, INRA, UCB Lyon 1, Ecole Normale Supérieure de Lyon, 69364 Lyon Cedex 07, France 5 Department of Biology, University of Toronto, Mississauga, Ontario L5L 1C6, Canada 6 Teagasc Crops Research Centre, Oak Park, Carlow, Ireland Supplementary Figure legends

Figure S1. Synonymous and nonsynonymous divergence in angiosperm chloroplast matK (A), rbcL (B), and ycf4 (C) genes. The trees are the same as those in Figure 1, except for the addition of taxon names. Shown are dN (left) and dS (right) trees resulting from a codon-based likelihood analysis and a constrained topology, rooted using gymnosperm sequences (which are not included in the trees). See Methods for details of tree construction. The are in the same order from top to bottom in all trees to the greatest extent possible. In (C), numbers after species names indicate the ycf4 gene length in codons. The ycf4 pseudogenes in and Clitoria are included in the ycf4 trees and labeled in the left panel. The odoratus and Pisum sativum ycf4 pseudogenes were not included in the tree because they are truncated and very divergent, making their boundaries hard to define.

Figure S2. Dot matrix plots showing repetitive DNA around the accD-ycf4 region in . All plots are drawn to the same scale and show (A) self-comparison of Pisum sativum, (B) self- comparison of Lathyrus sativus, (C) comparison of P. sativum to L. sativus, and (D) self-comparison of L. latifolius. Panels A, B and D used a criterion of 28 matching bases per 30 bp window, and panel C used a criterion of 20 matches per 30 bp window. Black dots show matches on the same DNA strand and red dots show matches on the complementary strand.

Figure S3. (A-C) Gene maps of the chloroplast genomes of Lathyrus sativus (grasspea), Pisum sativum (pea), and Nicotiana tabacum (tobacco), drawn using GenomeVx (Conant and Wolfe 2008). Larger versions of the grasspea and pea maps are presented in Figure S7. Uppercase letters A through R show the 18 segments of conserved gene order among these three genomes, with plus or minus signs indicating orientation. The tobacco map shows an isomer whose small single copy region is inverted relative to the usual presentation of this genome (Shinozaki et al. 1986). The dashed line in region A of the tobacco map represents the copy of the IR that was deleted in legumes. (D) One scenario for rearranging the 18 conserved segments to turn an ancestral angiosperm genome with a tobacco-like gene order (but lacking an IR) into the current L. sativus and P. sativum gene orders, using only inversions. This scenario was calculated using GRIMM (Tesler 2002). The specific 6- and 8-step paths leading to L. sativus and P. sativum, respectively, are computationally optimal solutions (most parsimonious), but of course are not necessarily what actually happened. The 8-step path to P. sativum proposed by GRIMM involves the same set of inversions, but in a different order, to those proposed by Palmer et al. (1988) whose segment nomenclature is shown at the bottom. (E) Phylogenetic context of inversions and gene losses. The relative order of events within individual branches is not known. We calculated T. subterraneum (Cai et al. 2008) to have a total of 15 inversions and 4 protein gene losses, using GRIMM with a modified annotation that included ycf1, ycf4, rps18 and trnG-UCU and ignored a duplicate copy of trnI-CAU.

Figure S4. Dot matrix plots of self-similarity in six chloroplast genome sequences, using a criterion of 28 matches per 30 bp window. To allow tandem and near-tandem repeats to be seen, we do not show the major diagonal which would normally appear as a solid black line running from the bottom-left to the top-right corner of each plot. The IR appears as a pair of long red (inverted orientation) diagonals in the four IR-containing genomes. Arrows show the positions (where present) of ycf4, accD and psaI in the P. sativum and L. sativus plots.

Figure S5. Nucleotide sequence alignment between L. latifolius and L. cirrhosus in the accD-ycf4- cemA region. Caret symbols show nucleotide substitutions. In L. latifolius, blue triangles show the 57- bp repeat (lowercase letters indicate differences relative to the consensus), and orange triangles show the 67-bp repeat (the two copies are identical). Complete copies of the repeats are labeled A1-A6 and B1-B2, and incomplete copies are show by triangles without labels. The two largest repeat sequences in the L. cirrhosus accD-ycf4 intergenic region are shown by magenta and purple underlining (2 x 15 bp each). Sequences were aligned using Muscle and were constrained (after inspection of dot-matrix plots) to align only the B1 and A1 copies of the L. latifolius repeats to L. cirrhosus.

Figure S6. The LPD2-accD fusion in Trifolium repens. (A) Comparative organization of LPD genes and cDNAs in Medicago truncatula and T. repens, not drawn to scale. cDNA sequences for the four genes were inferred from EST data. Intron/exon organization is known only for the M. truncatula genes, and is conserved between LPD1 and LPD2 except in the transit peptide region. The chloroplast-derived accD sequence appears to have replaced the last two exons of T. repens LPD2. (B) Evidence that the T. repens fusion gene is orthologous to M. truncatula LPD2. Synonymous (dS) and nonsynonymous (dN) nucleotide divergences among the four genes were calculated using yn00 for the region corresponding to exons 2-13 of M. truncatula LPD2. Divergence between orthologs (shaded boxes) is lower than between paralogs. (C) Assembly of the cDNA sequence of the T. repens LPD2-accD fusion gene from EST data. Five EST reads cross the junction between LPD2 and accD. (D) Alignment of the predicted T. repens LPD2-AccD fusion protein (Tr_LPD2-accD) to other LPD and AccD proteins. The complete sequence of the fusion protein (805 residues) is shown. Residues 1- 512 are aligned to plastid lipoamide dehydrogenases from M. truncatula (Mt_LPD1 and Mt_LPD2) and T. repens (Tr_LPD1). Residues 513-805 are aligned to the Lotus japonicus chloroplast accD gene translation (Lj_accD). Protein domains in LPD are indicated (Lutziger and Oliver 2000), including a conserved motif (HAHPT) in the interface domain that is highly conserved among plastid, mitochondrial and bacterial LPD proteins but has been lost from the T. repens fusion protein. The site corresponding to the location of the intron between exons 13 and 14 of M. truncatula LPD2 is shown by a triangle. In the AccD alignment, similarity to the T. repens fusion protein begins at residue 209 of L. japonicus AccD. The location of the carboxyl transferase domain (Zhang et al. 2003) in AccD is marked. There are no known functional domains in AccD upstream of this domain. An arrow indicates the site corresponding to position 267 of pea AccD, where mRNA editing converts a Ser codon to a Leu codon (Sasaki et al. 2001). This site is edited in one of the two Lotus ESTs available (accession numbers GO021515 and DC597685). (E) Confirmation of the junction between LPD2 and accD in T. repens mRNA by reverse transcriptase PCR and Sanger sequencing. The arrow marks the junction. The primers used for amplification were

5'- AAAATGAAGGGGAGGGACAT and 5'- GCACTGAACCCACAAATGG, amplifying a 200 bp fragment centered on the junction (positions 1465 to 1664 of GenBank accession HM029367). (F) Phylogenetic relationship of Trifolium nuclear AccD sequences to chloroplast AccD sequences from other angiosperms. The tree was constructed from sequences of the C-terminal part of the protein (beginning at residue 513 of the T. repens LPD2-AccD protein). Sequences were aligned using Muscle and the tree was constructed using PhyML (WAG+Γ substitution model, 8 rate classes) as implemented in SeaView (Gouy et al. 2010).

Figure S7. Gene maps of (A) the Lathyrus sativus (grasspea) chloroplast genome, and (B) the Pisum sativum (pea) chloroplast genome.

Supplementary Tables.

Table S1 is a separate Excel file. Table S2. Synonymous and nonsynonymous divergence between pea and sweetpea nuclear genes.

Pea Sweetpea Description S N Omega dN S.E. dS S.E. GenBank acc. EST (dN/dS) number contig D89619 CL884 Cycloartenol synthase 113.6 282.4 0.0000 0.0000 0.0000 0.0178 0.0126 AY166633 CL64 Malate dehydrogenase (MDH) 266.6 678.4 0.0871 0.0059 0.0030 0.0680 0.0171 AF095284 CL564 Tic22 180.9 416.1 0.5556 0.0421 0.0103 0.0758 0.0214 AF043905 CL118 Plastoglobule associated protein PG1 259.7 613.3 0.2124 0.0165 0.0052 0.0776 0.0182 Z31559 CL1383 AccA acetyl-CoA carboxylase 129.9 422.1 0.8786 0.0682 0.0132 0.0777 0.0258 AF095285 CL1536 Tic20 131.9 390.1 0.7360 0.0573 0.0125 0.0778 0.0263 X89828 CL26 Fructose-1, 6-biphosphate aldolase 217.9 601.1 0.1614 0.0134 0.0048 0.0832 0.0207 M94558 CL523 ATP synthase delta subunit 200.1 522.9 1.1445 0.0966 0.0143 0.0844 0.0221 AJ630104 CL810 Galactokinase (galK gene) 244.5 661.5 0.3449 0.0299 0.0068 0.0866 0.0202 AJ250769 CL822 Cytosolic phosphoglucomutase (PGM gene) 133.1 445.9 0.1284 0.0113 0.0051 0.0880 0.0271 AY367058 CL240 SAT5 mRNA 155.8 429.2 0.1579 0.0141 0.0058 0.0894 0.0255 AB104529 CL589 LKA mRNA for brassinosteroid receptor 293.6 726.4 0.2149 0.0196 0.0053 0.0911 0.0187 U56697 CL402 Pyruvate dehydrogenase E1beta 210.4 689.6 0.1894 0.0176 0.0051 0.0930 0.0229 L29077 CL80 Ubiquitin conjugating enzyme (UBC4) 120.4 323.6 0.0314 0.0031 0.0031 0.0987 0.0308 AF275639 CL1669 Cytosolic phosphoglycerate kinase (PGK) 175.9 517.1 0.1168 0.0117 0.0048 0.1002 0.0264 M73744 CL337 IM30 protein mRNA 156.7 461.3 0.1268 0.0131 0.0054 0.1035 0.0276 X63604 CL831 AtpC (gamma subunit of ATP synthase) 177.6 395.4 0.2748 0.0284 0.0086 0.1035 0.0259 X60170 CL48 Mn superoxide dismutase (SOD) 137.0 496.0 0.1238 0.0132 0.0052 0.1068 0.0300 AJ251646 CL1446 beta-1,3-glucanase GNS2 171.8 416.2 0.3448 0.0370 0.0096 0.1072 0.0268 AJ001009 CL643 OEP24 preprotein 141.6 446.4 0.1045 0.0113 0.0051 0.1080 0.0301 AJ005589 CL1292 Protein tyrosine phosphatase 146.7 405.3 0.5498 0.0617 0.0127 0.1122 0.0302 X63605 CL46 PetC mRNA for chloroplast Rieske FeS protein 126.6 380.4 0.2615 0.0295 0.0090 0.1129 0.0327 X53035 CL800 P34 protein kinase 73.9 346.1 0.0489 0.0058 0.0041 0.1186 0.0435 DQ535894 CL1210 Tic21 233.1 501.9 0.2340 0.0285 0.0077 0.1216 0.0246 Y17186 CL664 Translation initiation factor eIF-4A 161.3 462.7 0.0713 0.0087 0.0044 0.1219 0.0297 AF002698 CL1095 NADPH-cytochrome P450 reductase (PSC450R1) 186.6 575.4 0.0677 0.0087 0.0039 0.1293 0.0285 AJ000520 CL1365 Tic55 205.7 646.3 0.1324 0.0172 0.0052 0.1301 0.0280 DQ026703 CL425 WD-40 repeat protein (MSI1) 109.9 673.1 0.0114 0.0015 0.0015 0.1303 0.0376 M71235 CL1699 Aminolevulinic acid dehydratase (ALAD) 142.4 457.6 0.1183 0.0155 0.0059 0.1307 0.0327 U60592 CL178 S-adenosylmethionine decarboxylase 263.3 795.7 0.1892 0.0256 0.0057 0.1352 0.0250 AF148506 CL1175 Hs1pro-1 homolog 64.5 217.5 0.0334 0.0046 0.0046 0.1383 0.0514 DQ845340 CL323 CRY mRNA 116.6 453.4 0.1087 0.0156 0.0059 0.1435 0.0388 AF109922 CL201 Sucrose transport protein SUT1 196.7 601.3 0.1526 0.0222 0.0062 0.1457 0.0297 X54359 CL832 Clone 26g 160.9 529.1 0.1714 0.0250 0.0070 0.1458 0.0333 X82404 CL558 SecA 204.6 662.4 0.0512 0.0076 0.0034 0.1483 0.0301 AY299688 CL353 RPA 32kDa mRNA 165.9 419.1 0.2946 0.0443 0.0105 0.1504 0.0341 X60169 CL343 Catalase 230.2 675.8 0.1382 0.0210 0.0056 0.1520 0.0289 AY822467 CL869 AT-rich element binding factor 2 (ATF2) 94.2 196.8 0.0326 0.0051 0.0051 0.1566 0.0461 AF191098 CL197 Nucleoside diphosphate kinase (NDPK) 180.7 518.3 0.1242 0.0195 0.0062 0.1573 0.0335 AF284759 CL221 Chloroplast TatC protein 126.1 497.9 0.0879 0.0142 0.0054 0.1615 0.0402 X60373 CL1027 Glutathione reductase 161.4 399.6 0.1403 0.0229 0.0077 0.1633 0.0364 L34578 CL264 Histone H1 (H1-41) 189.6 356.4 0.2079 0.0346 0.0101 0.1665 0.0330 X70703 CL485 MAP kinase homologue 126.6 443.4 0.0395 0.0068 0.0039 0.1720 0.0421 M60952 CL383 Chloroplast ribosomal protein (CL22) 150.2 428.8 0.5346 0.0922 0.0154 0.1724 0.0379 AF079850 CL362 Nodule-enhanced malate dehydrogenase (NEMDH) 151.9 418.1 0.1807 0.0318 0.0089 0.1758 0.0397 X66061 CL833 Thiolprotease 190.0 623.0 0.1985 0.0354 0.0077 0.1781 0.0349 X65155 CL179 Ribosomal protein L9 114.7 464.3 0.0407 0.0076 0.0041 0.1863 0.0463 AF323104 CL1455 Phosphatase-like protein Psc923 152.8 456.2 0.1552 0.0291 0.0081 0.1872 0.0395 X54358 CL767 Clone 15a 155.1 477.9 0.0547 0.0105 0.0047 0.1927 0.0400 AF002248 CL732 PSI LHC Chl a/b-binding protein (lhcA-P4) 200.7 555.3 0.0992 0.0201 0.0061 0.2025 0.0365 X62077 CL501 ApxI mRNA for ascorbate peroxidase 167.4 582.6 0.0933 0.0191 0.0058 0.2051 0.0416 AF144684 CL682 SecY 242.4 813.6 0.1606 0.0339 0.0066 0.2114 0.0339 Y15253 CL1298 Phospholipase C 100.6 424.4 0.0853 0.0191 0.0068 0.2238 0.0556 AY830931 CL1039 FVE mRNA 113.8 471.2 0.0374 0.0085 0.0043 0.2281 0.0522 AJ315851 CL232 2-Cys peroxiredoxin 174.6 530.4 0.0894 0.0239 0.0068 0.2677 0.0471 AJ243759 CL746 Outer envelope protein OEP37 139.7 391.3 0.2253 0.0640 0.0132 0.2841 0.0559

Median values and total numbers of sites 9340 27788 0.1353 0.0191 0.1305

Genes are sorted in order of increasing dS. An arbitrary cutoff of dS = 0.3 was applied to remove probable paralogs. S and N are the numbers of synonymous and nonsynonymous sites in each sequence. References for Supplementary Information

Cai Z, Guisinger M, Kim HG, Ruck E, Blazier JC, McMurtry V, Kuehl JV, Boore J, Jansen RK. 2008. Extensive reorganization of the plastid genome of Trifolium subterraneum () Is associated with numerous repeated sequences and novel DNA insertions. J Mol Evol 67: 696-704. Conant GC, Wolfe KH. 2008. GenomeVx: simple web-based creation of editable circular chromosome maps. Bioinformatics 24: 861-862. Gouy M, Guindon S, Gascuel O. 2010. SeaView version 4: A multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 27: 221-224. Lutziger I, Oliver DJ. 2000. Molecular evidence of a unique lipoamide dehydrogenase in plastids: analysis of plastidic lipoamide dehydrogenase from Arabidopsis thaliana. FEBS Lett 484: 12-16. Palmer JD, Osorio B, Thompson WF. 1988. Evolutionary significance of inversions in legume chloroplast DNAs. Curr Genet 14: 65-74. Sasaki Y, Kozaki A, Ohmori A, Iguchi H, Nagano Y. 2001. Chloroplast RNA editing required for functional acetyl-CoA carboxylase in . J Biol Chem 276: 3937-3940. Shinozaki K, Ohme M, Tanaka M, Wakasugi T, Hayashida N, Matsubayashi T, Zaita N, Chunwongse J, Obokata J, Yamaguchi-Shinozaki K, et al. 1986. The complete nucleotide sequence of tobacco chloroplast genome: its gene organization and expression. EMBO J 5: 2043-2049. Tesler G. 2002. GRIMM: genome rearrangements web server. Bioinformatics 18: 492-493. Zhang H, Yang Z, Shen Y, Tong L. 2003. Crystal structure of the carboxyltransferase domain of acetyl-coenzyme A carboxylase. Science 299: 2064-2067. Cullen australasicum max bracteata labialis montana Neonotonia wightii Pseudovigna argentea Pueraria phaseoloides Pachyrhizus erosus Dumasia villosa Phaseolus coccineus Macroptilium atropurpureum Vigna unguiculata Dolichos lablab Macrotyloma uniflorum Cologania lemmonii Psophocarpus tetragonolobus Erythrina sp. Bolusafra bituminosa Cajanus cajan Desmodium barbatum Desmodium psilocarpum Desmodium pauciflorum Lespedeza bicolor Lespedeza cuneata stipulacea Lespedeza intermedia Campylotropis macrocarpus legumes Kennedia nigricans Hardenbergia violacea Mucuna sp. Apios americana Clitoria ternatea Galactia striata Ophrestia radicosa Lathyrus latifolius Lathyrus sativus Lathyrus palustris Vicia faba Medicago truncatula Cicer arietinum Robinia pseudoacacia Sesbania emerus Lotus corniculatus lanata aurea Crotalaria pumila Swartzia flaemingii Caesalpinia andamanica Calliandra californica Brownea sp. Cercis canadensis Polygala californica Fagus sylvatica Populus tremula Brassica oleracea Arabidopsis thaliana Cleome viridiflora Carica papaya Citrus sinensis Oenothera elata Vitis vinifera other Apium graveolens Panax ginseng 0.10 Lactuca indica Nicotiana tabacum Atropa belladonna Pyrola rotundifolia Arctostaphylos uva ursi Spinacia oleracea Saccharum officinarum Zea mays Triticum aestivum monocots Oryza sativa Acorus gramineus Calycanthus floridus Nymphaea alba Amborella trichopoda

matK, dN matK, dS

Figure S1A Bituminaria bituminosa Cullen australasicum Glycine max Teramnus uncinatus Pueraria montana Neonotonia wightii Pueraria phaseoloides Pseudovigna argentea Pachyrhizus erosus Dumasia villosa Phaseolus coccineus Macroptilium atropurpureum Vigna unguiculata Dolichos lablab Macrotyloma uniflorum Cologania lemmonii Psophocarpus tetragonolobus Erythrina sp. Bolusafra bituminosa Cajanus cajan legumes Desmodium intortum Desmodium barbatum Desmodium pauciflorum Desmodium nudiflorum Lespedeza bicolor Lespedeza cuneata Kummerowia stipulacea Lespedeza intermedia Campylotropis macrocarpus Kennedia rubicunda Hardenbergia violacea Mucuna nigricans Apios americana Clitoria ternatea Galactia tashiroi Canavalia ensiformis Ophrestia radicosa Lathyrus sativus Lathyrus pratensis Vicia cracca Medicago truncatula Cicer arietinum Robinia pseudoacacia Sesbania sesban Lotus corniculatus Adesmia exilis Calpurnia aurea x watereri Goodia lotifolia Swartzia sp. Caesalpinia pulcherrima Brownea sp. Cercis canadensis Polygala cruciata Fagus sylvatica Populus tremuloides Brassica oleracea Arabidopsis thaliana Cleome hassleriana Carica papaya Citrus paradisi aurantiifolia other Oenothera elata Vitis vinifera eudicots Panax ginseng Apium graveolens Lactuca sativa 0.10 Nicotiana tabacum Atropa belladonna Pyrola rotundifolia Arctostaphylos alpina Spinacia oleracea Saccharum officinarum Zea mays monocots Triticum aestivum Oryza sativa Acorus gramineus Calycanthus floridus Nymphaea alba Amborella trichopoda

rbcL, dN rbcL, dS

Figure S1B Bituminaria bituminosa 203 Cullen australasicum 203 Glycine max 203 Teramnus uncinatus 203 Amphicarpaea bracteata 203 Pueraria lobata montana 203 Neonotonia wightii 204 Pueraria phaseoloides 202 Pseudovigna argentea 205 Pachyrhizus erosus 203 Dumasia villosa 203 Phaseolus coccineus 85 partial Macroptilium atropurpureum 195 Vigna unguiculata 195 Dolichos lablab 195 Macrotyloma uniflorum 204 Cologania lemmonii 207 ATT start Psophocarpus tetragonolobus 200 Erythrina sp. 205 Bolusafra bituminosa 203 Cajanus cajan 195 Desmodium intortum pseudogene Desmodium cuspidatum pseudogene Desmodium cuneatum pseudogene Desmodium barbatum 203 Desmodium nudiflorum 202 Desmodium pauciflorum 202 Lespedeza bicolor 202 TTG start Lespedeza cuneata 202 TTG start legumes Kummerowia stipulacea 202 TTG start Lespedeza intermedia 202 TTG start Campylotropis macrocarpus 202 TTG start Kennedia nigricans 203 Hardenbergia violacea 203 Mucuna sp. 194 Apios americana 194 Millettioids Clitoria ternatea pseudogene Galactia striata 194 Canavalia ensiformis 194 Ophrestia radicosa 134 partial Lathyrus cirrhosus 340 Lathyrus latifolius 340 Lathyrus sativus 309 * Lathyrus palustris 269 Vicia faba 219 IRLC Medicago truncatula 224 Cicer arietinum 223 Robinia hispida 200 Robinioids Sesbania emerus 203 Lotus corniculatus 200 Adesmia muricata 184 Crotalaria verrucosa 184 Laburnum anagyroides 184 Calpurnia aurea 184 Goodia latifolia 184 Swartzia pinnata 184 Calliandra inaequilatera 184 Caesalpinia pulcherrima 184 Brownea sp. 184 Cercis canadensis 184 Polygala lindheineri 184 Fagus sylvatica 182 partial Populus sp. 184 Brassica oleracea 185 Arabidopsis thaliana 184 Cleome hassleriana 184 Carica papaya 134 partial Citrus sinensis 134 partial other Oenothera elata 187 Vitis vinifera 186 eudicots Panax ginseng 184 Apium graveolens 146 partial Lactuca sativa 63 partial Nicotiana tabacum 184 Atropa belladonna 184 Pyrola chlorantha 178 partial Arctostaphylos uva ursi 178 partial Spinacia oleracea 184 Saccharum officinarum 185 Zea mays 185 Triticum aestivum 185 monocots Oryza sativa 185 Acorus gramineus 103 partial Calycanthus floridus 184 Nymphaea alba 184 0.10 Amborella trichopoda 235

ycf4, dN ycf4, dS

Figure S1C A B 0 1100 2200 3300 4400 0 0 4

0 1000 2000 3000 4000 4 4000 0 0 3 3 3000 0 0 2 2 2000 Lathyrus sativus Lathyrus Pisum sativum Pisum 0 0 1 1 1000 0 0 Pisum sativum Lathyrus sativus

590 729 309

trnQ accD psaI ψycf4 cemA trnQ accD ycf4 cemA

C 0 1000 2000 3000 4000 D 0 0 4 4 cemA

0 750 1500 2250 3000 0 0 3 3 ycf4 3000 309 0 2250 0 2 2 Lathyrus sativus Lathyrus 1500 729 0 accD 0 Lathyrus latifolius Lathyrus 1 1 750

Pisum sativum 0 Lathyrus latifolius 0 trnQ

590 340

trnQ accD psaI ψycf4 cemA accD ycf4 cemA

Figure S2 A J- B E+ K- D+ L+ E- K+

GCC 6 M+ Q+

I+ trnT

3 l

psbN p

rps11 rpoA r

rps8 rpl14 GGU CAU trnG rpl16 rps3 CAU psbD rps19 rpl23 rpl2 ycf2 petL ψ psbC petG trnI trnfM rps14 psaJ trnN psaA

petD psbZ B psbM rpl33 trnD petB psaB O- trnY a rps18 trnE psbH CAU psbT GUU s psbB p rps14 ndhD GUC H- psaA GUA UUC psaC trnW GCC trnfM trnP ndhE trnS CAU

CCA trnG trnL ycf3 UGG ycf3 trnI psbE rpl20 trnR UGA psbF rrn5 rrn4.5 CAA psbL 5’-rps12 UGU psbJ UGU ACG rps4 clpP rps4 rrn23 trnT trnT

GGA F- trnA GGA F- trnS trnC trnS trnI UGC ndhJ GAU rpoB GCA ndhJ UAA UAA ndhK rrn16 GAA GAA 3’-rps12 trnL ndhK trnL ndhC N- 40kb ndhC 40kb 20kb trnF UAC trnV 20kb trnF UAC trnV rps7 GAC trnV rpoC1 atpE atpE ndhB CAU atpB CAU atpB trnM trnM

rbcL rbcL B- rpoC2 ycf1

matK matK rps2 trnKUUU trnKUUU psbA P+ 60kb psbA atpI 60kb A+ rps15 A+ 0kb trnHGUG Pisum sativum 0kb trnHGUG Lathyrus sativus ndhH atpH 121020 bp ndhF 122169 bp ndhF atpF rpl32 ndhA rpl32 3’-rps12 trnLUAG atpA UCU rps7 ndhI ccsA trnR UCC ψ trnG ndhB ndhG rpl23 psbE R+ psbI psbF rpl2 psbL GCU psbK petA rps19 trnS H- psbJ UUG cemA rps3 N+ ycf4 trnQ ψ rpl16 ycf1 psaI accD rpl14 accD 80kb 80kb UUG rps8 100kb trnQ 100kb rpl36 ycf4 rps15 GCU rps11 ndhH trnS rpoA cemA G- GUC ndhA petA psbM GUA psbK trnD UUC psbI trnY ndhI atpA G+ trnE ndhG psbN petN ndhE UCC J+ psaC atpF petD UCU ndhD trnG atpH petL UGA trnR petG petB trnV atpI psaJ GGU rps18rpl33 trnA rps2 psbH trnS trnI P- psbT rrn16 trnT GAC

rrn23 psbB GAU UGC ACG rpoC2 CAA C+ B psbD o trnW trnP rpoC1 p trnL ccsA trnR r rrn5 rrn4.5 psbC trnL

CCA rpl20 UGG psbZ 5’-rps12 UAG clpP

petN trnC

ycf2

GUU D+ GCA trnN Q- R- B+ I- L+ M+ O- C+

C E+ D+ C+ F+

GCC

trnS

GGU

trnG psbZ

trnL psbC trnF GGA psbD trnT GCA UAA GAA petN

psaB

psaA trnC

4

trnM ycf3 1

CAU

s

UGA

rps4 p tRNAThr r UUC B+ GUA CAU GUC

trnS

trnfM ndhJ psbM ndhK trnE trnY ndhC trnD

trnV rbcL rpoB atpE G+ atpB UAC accD rpoC1 psaI ycf4 H+ cemA rpoC2 petA rps2 50kb atpI 25kb atpH psbJ UCU petL psbL atpF psbF trnR UCC petG psbE atpA trnG psaJ trnW psbI D rpl33 I+ CCA rps18 trnP GCU psbK UGG trnS UUG rpl20 trnQ 5’-rps12 rps16 clpP A+ A+ B+ C+ D+ E+ F+ G+ H+ I+ J+ K+ L+ M+ N+ O+ P+ Q+ R+ psbB matK tobacco-like 75kb UUU psbT trnK A+ F- E- D- C- B- G+ H+ I+ J+ K+ L+ M+ N+ O+ P+ Q+ R+ psbH psbN psbA 3 inversions petB 0kb trnHGUG A+ F- E- D- C- B- G+ H+ I+ J+ K+ L+ M+ O- N- P+ Q+ R+ Nicotiana tabacum rpl2 petD rpl23 A+ F- E- D- C- B- G+ H+ I- J+ K+ L+ M+ O- N- P+ Q+ R+ common ancestor rpoA 155943 bp trnICAU rps11 rpl36 ψinfA rps8 ycf2 J+ rpl14 rpl16 rps3 rpl22 rps19 A+ F- E- D- C- B- G+ H+ I- J+ K+ L+ M+ O- N- P+ Q+ R+ common ancestor rpl2 A+ F- E- D- C- G- B+ H+ I- J+ K+ L+ M+ O- N- P+ Q+ R+ rpl23CAU trnL trnI CAA A+ F- E- D- C- G- B+ H+ I- J+ K+ L+ M+ O- R- Q- P- N+ 3 inversions 100kb ndhB trnV A+ F- E- K- J- I+ H- B- G+ C+ D+ L+ M+ O- R- Q- P- N+ rps7 Lathyrus sativus GAC rrn16 3’-rps12 K+ 125kb trnI ycf2 trnA CAA GAU UGC trnL rrn23 rrn4.5 ndhB trnR rrn5 rps7 ACG L+ 3’-rps12 ycf1’ A+ F- E- D- C- B- G+ H+ I- J+ K+ L+ M+ O- N- P+ Q+ R+ common ancestor

trnL rpl32 A+ F- E- D- H- G- B+ C+ I- J+ K+ L+ M+ O- N- P+ Q+ R+ ccsA UAG trnN GUU A+ F- E- L- K- J- I+ C- B- G+ H+ D+ M+ O- N- P+ Q+ R+ GUU GAC ndhF M+ trnN A+ F- E- L- K- J- I+ C- B- G+ H+ P- N+ O+ M- D- Q+ R+ 5 inversions trnV rrn16 GAU UGC ndhD psaC trnI ndhE A+ F- E- L- K- Q- D+ M+ O- N- P+ H- G- B+ C+ I- J+ R+ ndhG ndhI

N+ ndhA trnA rrn23 r ndhH rrn5 ACG p rrn4.5 s

1 A+ F- Q+ K+ L+ E+ D+ M+ O- N- P+ H- G- B+ C+ I- J+ R+ ycf1 Pisum sativum

trnR 5

O+ R+ Q+ 6 5 1 7 8 2 3 4 9 10 11 6 Palmer et al. (1988) P+

E Ancestral angiosperm cpDNA. 79 protein genes. Tobacco-like gene order.

50 kb inversion (BCDEF; puts rps16 beside accD) loss of infA loss of rpl22

IR deletion loss of rps16

2 inversions (NO and I) loss of rpl23

14 inversions

5 inversions 3 inversions

loss of accD loss of ycf4 loss of psaI

outgroups Lotus Medicago Trifolium Pisum Lathyrus japonicus truncatula subterraneum sativum sativus Figure S3 0 0 30250 60500 90750 121000 0 30500 61000 91500 122000 0 0 37500 75000 112500 150000 0 0 0 0 0 0 0 1 2 0 2 2 5 1 1 1 0 0 0 0 5 0 5 7 5 2 0 1 1 9 9 1 0 accD 0 0 0 ycf4 0 psaI 0 5 0 accD 0 0 1 5 6 6 7 Lathyrus sativus Pisum sativum Lotus japonicus 0 0 0 5 0 0 2 5 5 0 0 7 3 3 3 0 0 0 Lathyrus sativus Pisum sativum Lotus japonicus

0 37500 75000 112500 150000 0 33500 67000 100500 134000 0

0 38750 77500 116250 155000 0 0 0 0 0 0 0 0 0 4 5 5 3 5 1 1 1 0 0 0 0 0 5 5 5 2 2 0 6 1 0 1 1 1 1 0 0 0 0 0 0 0 0 5 5 7 7 7 6 7 Oryza sativa Spinacia oleracea Nicotiana tabacum 0 0 0 0 0 5 5 5 7 7 3 8 3 3 3 0 0 0 Nicotiana tabacum Spinacia oleracea Oryza sativa Figure S4 L. latifolius TATTCCTTTAGCAATAGCTGTTATGGATTCTGAGTTTATAGCGGGTAGTATGGGATCCGTAGTGGGTGAGAAAATAACTCGGTTGATCGAATATGCTACCAATCAAGTTTTACCTCTTAT 120 L. cirrhosus TATTCCTTTAGCAATAGCTGTTATGGATTCTGAGTTTATAGCGGGTAGTATGGGATCCGTAGTGGGTGAGAAAATAACTCGGTTGATCGAATATGCTACCAATCAAGTTTTACCTCTTAT 120

L. latifolius TATCGTATGTGCGTCCGGAGGAGCGCGTATGCAAGAAGGAAGTTTGAGCTTAATGCAAATGGCTAAAATTTCTTCTGCTTTATATAATTATCAAATAAATCAAAAGTTATTCTATGTAGC 240 accD L. cirrhosus TATCGTATGTGCGTCCGGAGGAGCGCGTATGCAAGAAGGAAGTTTGAGCTTAATGCAAATGGCTAAAATTTCTTCTGCTTTATATAATTATCAAATAAATCAAAAGTTATTCTATGTAGC 240

2 differences / L. latifolius GATACTTACATCTCCGACTACTGGTGGGGTAACAGCTAGTTTTGCAATGTTGGGGGATATCATTATTGCGGAACCAAACGCAACCATTGCATTTGCGGGTAAAAGAGTAATTGAACAATT 360 526 bp aligned L. cirrhosus GATACTTACATCTCCGACTACTGGTGGGGTAACAGCTAGTTTTGCAATGTTGGGGGATATCATTATTGCGGAACCAAACGCAACCATTGCATTTGCGGGTAAAAGAGTAATTGAACAATT 360 = 0.4%

L. latifolius GTTAAATGTGGAAGTGCCTGAAGGTTCCCAATCGGCTGACCTTTTATTCGACAAGGGAGTATTGGATTCAGTCGTACCACGTCATTTTTTAAAGCAGTTTTTAACTGAGTTATTTCAGTT 480 L. cirrhosus GTTAAATGTGGAAGTGCCTGAAGGTTCCCAATCGGCTGACCTTTTATTCGACAAGGGAGTATTGGATTCAGTCGTACCACGTCATTTTTTAAAGCAGTTTTTAACTGAGTTATTTCAGTT 480

L. latifolius CCATGGTTTCGAACGTAGGAGTATAAATCAAAACGAATCACTATAAAATAAAAAATGTTTTTCAATTTGTTTGATGAGGAGATCCTAATACGAGGACTTGACACGGAAATATTTATGCAG 600 L. cirrhosus CCATGGTTTCGAACGTAAGAGTAAAAATCAAAACGAATCACTATAAAATAAAAACTGTTTTTCAATTTGTTTGATGAGGAGATCCTAATACGAGGACTTGACACGGAAATATTTATCGAG 600 ^ ^ ^ ^^ B1 L. latifolius TACAATAA------AAATAAACTAACTC------ATAAATAAAAAAGAAATTA------ATTATATAACAAAGAAAAAACATA 665 L. cirrhosus TACAATAATTTGAGACTCTCTTTTAAATAATAAATAAATAAATAAATTATATAACGATAAATAATAAAGAAATTAGAGAACGATAAATAATAATAAATTATATAAC------AAACAAA 713 ^^ ^ ^ ^ ^ A1 A2 Intergenic L. latifolius CAAATAAAATGAGTTCCAAAAGATTATGAAAATATCATA-TCTTTTGTACTTTCCACATTCATGATCCAATAATATCATGAACATAAAAAGATTATATCTTTTGTACTTTCCACATTCAT 784 L. cirrhosus CAAACAAAA------AAAGAATTA--AAGAT-TCACAATCTTTTG------GAGGTCAGGAAT------761 ^ ^^ ^ ^ ^ ^^ ^ ^ 19 differences / A3 A4 182 bp aligned L. latifolius GATCCAATAATATCATGAACATAAAAAGATTATATCTTTTGTACTTTCCACATTCATGATCCAATAATATCATGAACATAAAAAGATTATATCTTTTGTACTTTCCACATTCATGATCCA 904 L. cirrhosus ------761 = 10.4%

A5 1 A6 L. latifolius ATAATATCATGAACATAAAAAGATTATATCTTTTGTACTTTCCACATTCATGATCCAATAATATCATGAACATAAAAAGATTATgTCATGAACATAAAAAGATTATGAAAATATCATATC 1024 L. cirrhosus ------761

1 B2 L. latifolius TTTTaTACTTTCCACATTCATGATCCAATAATATCATGAACATAtAAAGAAATTAATTATATAACAAAGAAAAAACATACAAATAAAATGAGTTCCAAAAGATTATGAAAATATCATATC 1144 L. cirrhosus ------761

1 7 L. latifolius TTTTGTACTTTCCACATTCAACAATGCATCATGgACATAAAACAGTGGTTCCACAGGATAAAGCAAGGGTTCCAAAGGATTTGGCAGAATATCAGATCCTTCTTCTCAAAAACTGAAAAA 1264 L. cirrhosus ------ATCATGAACATAAAGCAGTGGTTTCACAGGATAAAGCAAGGGTGCCAAAGGATTTGGCAGAATATCAGATCCTTCTTCTCAAAAACTGAAACA 854 ^ ^ ^ ^ ^

L. latifolius ACAAATGGAGGACATGTTATTTCTTTTATGAACAAAGAGTTCCAAAAGATCTTTTCCAAGATCTTTTTCAAGATAAAGATCGTTTCCAAGATTTTGAAAAAAAGGAAAATAATTTTTGTT 1384 L. cirrhosus AAAAACGGAGGGCATGTTATTTCTTTTATGAACAAAGAGTTCAAAAAGATCTTGGAAAAGATCTTTTTCAAGATAAAGATCGTTTTCTTGATCTTGAAAAAAAGGAAAATAATTTTTGTT 974 ^ ^ ^ ^ ^^^^ ^ ^^ ^

L. latifolius ATTTTTTTGACCCTTTTTTGTAACGGAGCTTTAATCTTGATGATCATAAAAATGATCAATATAGGCCTTGCAATAACCCAAGTTACGCAAGATAACTTGGTCTCGTTTTTGATTGGCCTT 1504 L. cirrhosus ATTTTTTTGAACCTTTTTTTGAACGGAGCTTTAATCTTGATGATCATAAAAGTGATCAATATGAGCCTTGCAATAACCCAAGTTACGCAAGATAACTTGGTCTCGTTTTTGATTGTCCTT 1094 ^ ^^ ^ ^^ ^ ycf4

L. latifolius TTTCTCTTCTTGTTTCAAATTCAATTTGGAAGGCGTGTTGTATCCTATTTATTTCAAACCGGCATTAACGTAATTCGCTGGGTGACACGCAAAGAAGTGCTGGATTATTACGTCTCACAA 1624 L. cirrhosus TTTCTCTTTTTGTTTCAAAGTCAATTTTATAAGCGTGTTGTACCCTATTTATTTCAAACCGGCATTAACATAATTCGCTGGGTGACACGCAAAGAAGTGCTGGATTATTACGTCGCACAA 1214 ^ ^ ^^^ ^ ^ ^ ^ 54 differences / L. latifolius GACATTTGGATTGACACTATCCCGGGTTTCCGAAATGTTTGGTTTCTCTTGTGTTCTTATCTCAATTTTTCTTTAGGGTTCGATTCTCTATATCGTTTCTATATTTTCAATAAGATAAGC 1744 1023 bp aligned L. cirrhosus GACATTTGGATTGACACTATCCCGGGTTTCCGAAATGTTTGGTTTCTCGTGTGTTCTTATCTCAATTTTTATTTCGGGTTAGATTCTCTATCTCGTTTCTATATTTTCAATAAGGTAAGC 1334 = 5.3% ^ ^ ^ ^ ^ ^

L. latifolius ACTGAATTTTTCGAGTCCCCGATTGACAGGGTCTTCGATGTGCCAGTTAAAACTACGCTTTTTCTGCTTGGCAAAGGAATCGTAGGTATCGTTTTAGGTTTTTTGTTGGTATACAAAACC 1864 L. cirrhosus ACTGAATTTTTCGAGTCCCCGATTGACAGGGTCTTCGATGTGCCAGTTAATACTACGCTTTTTCTGCTTGGCAAAGGAATCGTAAGTCTCGTTTTAGGTTTTTTATTAGTATACAAAACC 1454 ^ ^ ^ ^ ^

L. latifolius TATTCCAATAGTAACAGCGGAACTAATTGTTTTGATAAAACAAAAGGAATGCTATATATTATTCGTTACGGATTCCCCTTCCCTGGAAAAACTAGTCGTACTGTTCTTCACATTCGGCTA 1984 L. cirrhosus TATTGCAATAATAACAGCGGAACTAATTGTTTTGATAAAACAAAAGGAATGCTATATATTATTCGTTACGAATTCTCCTTCCCTGGAAAAACTAGTCGTACTGTTCTTCACATTCCGCTA 1574 ^ ^ ^ ^ ^

L. latifolius AAAGATATTACGTGCCTCATACTCAAGACTGACCCAATATTTGGTGATGTCCTTTATATTCAAACCAGAGAACAGGGGATCTTGGCGGTTACACCTCCTAATAATAATGGTCGTATATAC 2104 L. cirrhosus AAAGATATTACGTGCCTCATACTAAGGAGTGATCCAATATTTGGTGATGTCCTTTATATTCAAACCAGAGAACAGGGGATCTTGGCGGTTACACCGCCTAATAATAATGGTCGTATATAC 1694 ^ ^ ^ ^ ^

L. latifolius ACCCTTATGGAACATGGTGCTGAATTGTCCAGATTCTTGAATATACCACTCCAAGTCCTTGAAGAGGTTTACTTCGAGGAAGAGGAAGAATAATTCAACCAAATTACGAATAAAAAAGGG 2224 Intergenic L. cirrhosus ACCCTTATGGAACATGGTGCTGAATTGTCCAGATTCTTGAATATACCACTCCAAGTCCTTGAAGAGGTTTACTTCGAGGAAGAGGAAGAATAATTCAACCAAATTACGAATAAAAAAGGG 1814

2 differences / L. latifolius AGTTTAGCCTATATCATATCAATTGATATATTATAGGCTAAACTGCTCTATTACTTGTATTTACTTGTAATAGAACTTATGCTTGATAGAAATAGTATTATAGATCAAATATATAGAGTT 2344 176 bp aligned L. cirrhosus AGTTTAGCCTATATCATATCAATTGATATATTATAGGCTAAACTGCTCTATTACTTGTATTTACTTGTAATAGAACTTATGCTTGATAGAAATATTATTATAGATCGAATATATAGAGTT 1934 ^ ^ = 1.1%

L. latifolius GAAAAATGAGATGAAACGGAGGACGAAAAATGGCAAAAAAGAAAGCATTCATTCCCCTTCTATGTCTTACATCTATAGTCTTTTTGCCCTGGTGCATCTCTTTTACATTTAAGAAAAGTC 2464 L. cirrhosus GAAAAATGAGATGAAACGGAGGACGAAAAATGGCAAAAAAGAAAGCATTCATTCCCCTTCTATGTCTTACATCTATAGTCTTTTTGCCCTGGTGCATCTCTTTTACATTTAAGAAAAGTC 2054

L. latifolius TGGAATCTTGGATTACTAATTGGTGGAATACGAAAGAATCCGAAATTTTCTTGAATACTATTCAAGAAAAAAGTATTTTAAAGAAATTCCTAGAGTTCGAGGAACTGTTCCTCTTGGATG 2584 L. cirrhosus TGGAATCTTGGATTACTAATTGGTGGAATACGAAAGAATCCGAAATTTTCTTGAATACTATTCAAGAAAAAAGTATTTTAAAGAAATTCCTAGAGTTCGAGGAACTGTTCCTCTTGGATG 2174 cemA

L. latifolius AAATGCTAAAGGAATACCCGGAAACACATTTACAAAACCTTCGTATCGAAATCTACAAAGAGACCATCCAATTGATTCAGACGAACAACCAAGATCATATCCATACGATTTTGCATTTCT 2704 L. cirrhosus AAATGCTAAAGGAATACCCGGAAACACATTTACAAAACCTTCGTATCGAAATCTACAAAGAGACCATCCAATTGATTCAGACGAACAACCAAGATCATATCCATACGATTTTGCATTTCT 2294 0 differences / 630 bp aligned L. latifolius TTACAAATATAATATGTTTCCTTATTCTAAGTGTTTATTCTATTAGGGGTAATCAAGAACTCATTATTCTTAACTCTTGGGTTCAAGAATTTCTATATAACTTAAGTGACACAATAAAAG 2824 = 0.0% L. cirrhosus TTACAAATATAATATGTTTCCTTATTCTAAGTGTTTATTCTATTAGGGGTAATCAAGAACTCATTATTCTTAACTCTTGGGTTCAAGAATTTCTATATAACTTAAGTGACACAATAAAAG 2414

L. latifolius CTTTTTCTATTCTTTTCTTAATTGATTTATCTGTAGGATACCATTCGACTGGTCTTTGGGAACTAATGATTTGGTCTGTCTATAAAGATTTTGGATTTACTCCGAATGATCATATTATAT 2944 L. cirrhosus CTTTTTCTATTCTTTTCTTAATTGATTTATCTGTAGGATACCATTCGACTGGTCTTTGGGAACTAATGATTTGGTCTGTCTATAAAGATTTTGGATTTACTCCGAATGATCATATTATAT 2534

L. latifolius CTTTTCTTGTTTCTATTTTGCCAGCCATTTTAGATACGATTTTAAAATACTGGATTTTC 3003 L. cirrhosus CTTTTCTTGTTTCTATTTTGCCAGCCATTTTAGATACGATTTTAAAATACTGGATTTTC 2593

Figure S5 A Trifolium repens LPD1 cDNA transit plastid LPD1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 Medicago truncatula LPD1 gene transit plastid LPD1 chr. 8, AC126780

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Medicago truncatula LPD2 gene transit plastid LPD2 chr. 2, AC149491

Trifolium repens LPD2-accD fusion cDNA transit plastid LPD2 (truncated) accD

B dS M. truncatula T. repens M. truncatula T. repens dN LPD1 LPD1 LPD2 LPD2-accD M. truncatula LPD1 0.1939 0.6453 0.5115 T. repens LPD1 0.0087 0.5454 0.5497 M. truncatula LPD2 0.0275 0.0341 0.3035 T. repens LPD2-accD 0.0372 0.0439 0.0180

C transit plastid LPD2 (truncated) accD

0 500 1000 1500 2000 2500 2600 bp

Figure S6 A,B,C D

transit peptide Tr_LPD1 -MQSSLSLSFSPTSSATTFSRSHHS--FPLS--TATKHLNLRFCGFRRQALGLGFSTSLNRNLQQQQLLPHRQYSAVVSAARS 78 Mt_LPD1 -MQSSLSLSFSPTSSA---ARSHHS--FPVS---TGKPLNLRFCGLRREALGLGFKTSINRNI---RQLPCRQHSATVSAARS 71 Mt_LPD2 MTLSPISLSFSSPTTI---SGPHHCSLFNTT--TTATSLTLRFCSLRREAFAF---TSLSHRSPRNHHL-RLTHLNAISAAIS 74 Tr_LPD2-accD MTLSPISLSSSYATTI---SQPRHCSLFNTTTSTTATSLNLRFCALNPQPFAF---TSLRRRS---HPF-RRTHFNAISPALS 73

Tr_LPD1 DNGSTSGSSFDYDLLIIGAGVGGHGAALHAVEKGLKTAIVEGDVVGGTCVNRGCVPSKALLAVSGRMRELKSDHHLKSLGLHV 161 Mt_LPD1 DNGSTTGS-FDYDLLIIGAGVGGHGAALHAVEKGLKTAIVEGDVVGGTCVNRGCVPSKALLAVSGRMRELKSDHHLKSLGLHV 153 Mt_LPD2 NNGTPPKS-FDYDLLIIGAGVGGHGAALHAVEKGLKTAIVEGDVVGGTCVNRGCVPSKALLAVSGRMRDLRNDHHLKSLGIQV 156 Tr_LPD2-accD DNGTAPKS-FDYDLLIIGAGVGGHGAALHAVEKGLRTAIVEGDVVGGTCVNRGCVPSKALLAVSGRMRELRNDHHLKSLGIQV 155

FAD binding domain Tr_LPD1 SSAGYDRQGVADHANNLASKIRSNLTNSLKAIGVDILTGFGTVVGPQKVKIGSSDNIVTAKDIIIATGSVPFVPKGVEVDGKT 244 Mt_LPD1 SSSGYDRQGVADHANNLASKIRSNLTNSLKAIGVDILTGFGTILGPQKVKIGSSNNVVTAKDIIIATGSVPFVPKGVEVDGKT 236 Mt_LPD2 SNAGYDRQAVADHANNLASKIRGNLTNSMKALGVDILTGFGTILGPQKVKIGSSNNVVTAKDIIIATGSVPFVPKGIEVDGKT 239 Tr_LPD2-accD SNAGYDRQAVADHANNLASKIRGNLTNSMKALGVDILTGFGTILGPQKVKIGSSNNVVTAKDIIIATGSVPFVPRGIEVDGKT 238 LPD Tr_LPD1 VITSDHALKLESVPEWIAIVGSGYIGLEFSDVYTALGSEVTFVEALDQLMPGFDPEISKLAQRVLVNPRNIDYHTGVFASKIT 327 Mt_LPD1 VITSDHALKLESVPDWIAIVGSGYIGLEFSDVYTALGSEVTFVEALDQLMPGFDPEISKLAQRVLVNPRNIDYHTGVFASKIT 319 Mt_LPD2 VITSDHALKLETVPDWIAIVGSGYIGLEFSDVYTALGSEVTFVEALDQLMPGFDPEISKLAQRVLINPRNIDYHTGVFASKIT 322 Tr_LPD2-accD VITSDHALKLETVPDWIAIVGSGYIGLEFSDVYTALGSEVTFVEALDQLMPGFDPEISKLAQRVLINPRNIDYHTGVFASKIT 321

NAD+ binding domain Tr_LPD1 PARDGKPVLIELIDAKTKEPKDTLEVDAALIATGRAPFTEGLGLENVDVATQRGFIPVDERMRVIDANGKLVPHLYCIGDANG 410 Mt_LPD1 PARDGKPVLIELIDAKTKEPKDTLEVDAALIATGRAPFTQGLGLENVDVATQRGFVPVDERMRVIDANGKLVPHLYCIGDANG 402 Mt_LPD2 PARDGKPVMIELIDAKTKEQKETLEVDAALIATGRAPFTQGLGLENIDVATQRGFVPVDERMRVIDANGNLVPHLYCIGDANG 405 Tr_LPD2-accD PARNGKPVLIELIDAKTKEQKDTLEVDAALIATGRAPFTQGLGLENIDVATQRGFVPVDERMRVIDANGNLVPHLYCIGDANG 404

interface domain Tr_LPD1 KMMLAHAASAQGISVVEQVTGRDHVLNHLSIPAACFTHPEISMVGLTEPQAREKGEKEGFEVSVAKTSFKANTKALAENEGEG 493 Mt_LPD1 KMMLAHAASAQGISVVEQVTGRDHVLNHLSIPAACFTHPEISMVGLTEPQAREKGEKEGFDVSVAKTSFKANTKALAENEGEG 485 Mt_LPD2 KMMLAHAASAQGISVVEQVTGRDHVLNHLSIPAACFTHPEISMVGLTEPQAREKGEKEGFDVSIAKTSFKANTKALAENEGEG 488 Tr_LPD2-accD KMMLAHAASAQGISVVEQVTGRDHVLNHLSIPAAGFTHPENSVVGLTEPQAKEKGEKEG------FKANTIALAENEGEG 478 intron conserved

Tr_LPD1 LAKLIYRPDNGEILGVHIFGLHAADLIHEASNAIALGTRI-QDIKFAVHAHPTLSEVLDELFKSAKVKAQASDQVKEPVAI* 573 Mt_LPD1 LAKLIYRPDNGEILGVHIFGLHAADLIHEASNAIALGTRI-QDIKFAVHAHPTLSEVLDELFKSAKVKEHASIPVSEPVAV* 565 Mt_LPD2 LAKLIYRPDNGEILGVHIFGLHAADLIHEASNAIALGTHIQQDIKFAVHAHPTLSEVLDELFKSAKVKAQASDPVKEPVSV* 569 Tr_LPD2-accD HAKLSYR------HIFGLPAADLIHEASNSIALGTRI-QDI NDTNVSDLVSSNDLKGIQDYSHLWVQCEDCYGLNYKRFL 551 Lj_accD ...(190 aa not shown)... KKSSERERTSIITSTNNLNDSDFTKIEGSNDLDETQKYKHLWIECENCYGLNYKKIL 247

carboxyltransferase domain (pfam01039) Tr_LPD2-accD KEKMNICEHCGYHLKMSSSDRLKLLIDPGTWDPMDEYMVSMDPIEFNSEEDPY--CLDKYQQKTGLLEAVQTGTGQLHGIPVA 632 Lj_accD KSKMNICEHCGYHLKMSSSDRIELSIDPGTWNPMDEDMISVDPIEFHSEEEPYKDRIDSYQKTTGLTEAVQTGTGHLNGIPVA 330 mRNA editing site accD Tr_LPD2-accD IGIMEFEFLGGSMGSVVGEKITRLIEYATKQHLPLIIVCASGGARMQEGSLSLMQMAKISSALYDYQINKKLFYVPILTSPTT 715 Lj_accD IGIMDFEFMGGSMGSVVGEKITRLVEYATNQLLPLIVVCASGGARMQEGSLSLMQMAKISSALYDYQLNKKLFYVSILTSPTT 413

Tr_LPD2-accD GGVTASFAMLGDIIIVEPDATIAFAGKRVIEQTLNLEVPEGLQSAEFLFEKGLFDAIVPRNLLKEVLSELFQFHGFFPLTQNE 798 Lj_accD GGVTASFGMLGDIIIAEPNAYIAFAGKRVIEQTLNKAVPEGSQAAEYLFHKGLFDSIVPRNPLKGILSELFQLHAFFPLIQNE 496

Tr_LPD2-accD KYAYSNF* 805 Lj_accD KESRS* 501

E

Figure S6 D, E Lathyrus sativus Pisum sativum IR loss clade Cicer arietinum Trifolium repens nuclear Trifolium pratense nuclear legumes Medicago truncatula Lotus japonicus Glycine max Euonymus americanus Citrus sinensis Carica papaya Morus indica Ficus sp. Moore 315 Nothofagus glauca Populus trichocarpa Gonystylus bancanus Liquidambar styraciflua Heuchera sanguinea Staphylea colchica Quercus nigra Dillenia indica Oxalis latifolia Camellia oleifera Coffea arabica Ilex cornuta Lactuca sativa Helianthus annuus Panax ginseng Ipomoea purpurea Ipomoea batatas Cuscuta reflexa Antirrhinum majus Epifagus virginiana Solanum tuberosum Nicotiana tabacum Daucus carota Ehretia acuminata Forsythia europaea Berberidopsis corallina Aucuba japonica Cornus florida Carpobrotus chilensis Phoradendron serotinum Silene zawadskii Heliosperma alpestre Spinacia oleracea Platanus occidentalis Ranunculus macranthus Megaleranthis saniculifolia Lemna minor Nandina domestica Buxus microphylla Trochodendron aralioides Phalaenopsis aphrodite Piper cenocladum Typha latifolia Elaeis guineensis Dioscorea elephantipes Drimys granadensis Calycanthus floridus var. glaucus Liriodendron tulipifera Ceratophyllum demersum Nymphaea alba Nuphar advena 0.05

Figure S6 F

6

3 l

psbN p

rps11

rpoA r

rps8 rpl14 CAU rpl16 rps3 CAU rps19 rpl23 rpl2 petL ψ petG trnI trnfM rps14 psaJ petD rpl33 psaB rps18 petB psbH psbT psbB psaA trnW GCC trnP

CCA trnG psbE UGG ycf3 psbF rpl20 psbL 5’-rps12 psbJ UGU clpP rps4 trnT

GGA

trnC trnS ndhJ rpoB GCA UAA GAA ndhK trnL ndhC 40kb 20kb trnF UAC trnV rpoC1 atpE CAU atpB trnM

rbcL rpoC2

matK rps2 trnKUUU psbA atpI 60kb Lathyrus sativus 0kb trnHGUG atpH 121020 bp ndhF atpF rpl32 3’-rps12 atpA rps7 trnRUCU UCC trnG ndhB

psbI GCU psbK trnS UUG trnQ ycf1 accD 80kb 100kb ycf4 rps15 ndhH cemA GUC ndhA petA psbM GUA trnD UUC trnY ndhI trnE ndhG petN ndhE psaC ndhD

UGA trnV GGU trnA trnI trnS rrn16 trnT GAC

rrn23 GAU UGC ACG CAA psbD

trnL ccsA trnR rrn5 rrn4.5 psbC trnL

psbZ UAG

ycf2

GUU

trnN

Figure S7A trnT GCC

GGU trnG psbD ycf2 psbC

trnN psaA

psbZ B psbM trnD

trnY a trnE CAU GUU s p ndhD rps14 GUC GUA UUC psaC trnfM ndhE trnS CAU trnL trnI ycf3 trnR UGA rrn5 rrn4.5 CAA UGU ACG rps4 rrn23 trnT

trnA GGA

trnI UGC trnS GAU ndhJ UAA rrn16 GAA 3’-rps12 trnL ndhK 40kb ndhC trnV 20kb trnF UAC rps7 GAC trnV atpE ndhB CAU atpB trnM

rbcL ycf1

matK trnKUUU psbA rps15 60kb Pisum sativum 0kb trnHGUG ndhH 122169 bp ndhF ndhA rpl32 trnLUAG ndhI ccsA ψ ndhG rpl23 psbE psbF rpl2 psbL rps19 psbJ petA cemA rps3 ycf4 ψ rpl16 psaI rpl14 accD 80kb UUG rps8 trnQ 100kb rpl36 GCU rps11 trnS rpoA

psbK psbI atpA psbN UCC atpF petD trnG UCU atpH petL trnR petG petB atpI psaJ rps18rpl33 rps2 psbH psbT psbB rpoC2

B

o trnW trnP rpoC1 p

r

CCA rpl20 UGG 5’-rps12

clpP

petN trnC

GCA

Figure S7B