Journal of Genetics (2019) 98:109 © Indian Academy of Sciences

https://doi.org/10.1007/s12041-019-1157-3

RESEARCH NOTE

Complete plastid genome of Suriana maritima L. (Surianaceae) and its implications in phylogenetic reconstruction of Fabales

QIANG LAI1,2, CHENGJIE ZHU3, SHIRAN GU1,2, TIEYAO TU1* and DIANXIANG ZHANG1

1Key Laboratory of Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, People’s Republic of China
2University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
3Shenzhen Academy of Metrology and Quality Inspection, Shenzhen 518055, People’s Republic of China
*For correspondence. E-mail: [email protected].

Received 30 May 2019; revised 8 September 2019; accepted 12 September 2019

Abstract. This paper reports, for the first time, the characteristics of the complete plastid genome of Surianaceae (Suriana maritima L.) in the order Fabales. The circular plastid genome is 163,747 bp in length and has a typical quadripartite organization containing 115 unique genes: 80 protein-coding genes, 31 tRNA genes and four rRNA genes. The plastid genome of S. maritima is characterized by the absence of an intron in the atpF gene, which has never been reported for any other member of the Fabales. The gene content and gene order of the Surianaceae plastid genome are similar to those of the basal lineages of the legume family (Cercidoideae, Detarioideae) and Quillajaceae, supporting a likely common ancestor for the three families. Phylogenetic analysis supported the sister relationship between Surianaceae and Leguminosae, strongly with the Bayesian method and moderately with the likelihood method. The complete plastid genome of Surianaceae could help resolve the long-standing unresolved interfamily relationships of Fabales when a more comprehensive sampling from Polygalaceae and Leguminosae is available for future studies.

Keywords. chloroplast genome; Leguminosae; plastome; Polygalaceae; Quillajaceae.

Introduction

Family Surianaceae, with five genera and eight species of and , is a small family of Fabales occurring exclusively in the pantropics (Christenhusz and Byng 2016). Suriana maritima L., also known as bay cedar, is the only species of the genus Suriana L., which is distributed on tropical coasts and islands in , America, , East and the Pacific Ocean (Claxton et al. 2005). S. maritima can be easily recognized by its actinomorphic, pentamerous, yellow flowers with 4–5 carpels, and nut-like . This species has a high tolerance for , salt, heat and wind, and is considered an ideal plant species for coastal landscapes (Liu et al. 2018). Unfortunately, the genetic and biogeographic diversification of the family is poorly understood, and its phylogenetic relationships with the other three families of Fabales (Leguminosae, Quillajaceae and Polygalaceae) have long been controversial (Bello et al. 2012). Knowledge of the genome of this species will be crucial for recovering the phylogenetic position of Surianaceae within the order Fabales and for better understanding its genetic and biogeographic diversification on the Pacific islands and coasts.

Materials and methods

We collected fresh and healthy tissues of S. maritima from the Xisha Islands (also known as the Paracel Islands) of China, and deposited the voucher specimens (TuTY2599-XS) in the herbarium of South China Botanical Garden (IBSC). Total genomic DNA was isolated with a modified CTAB method (Doyle and Doyle 1987) and fragmented to construct short-insert (300–500 bp) libraries following the manufacturer’s manual (Illumina). Paired-end (PE) sequencing was conducted on the Illumina HiSeq X-Ten instrument at Beijing Genomics Institute (BGI) in Wuhan, China. The clean sequencing data were filtered and assembled using the GetOrganelle pipeline (Jin et al. 2018), which has been considered the most efficient assembler of plastid genomes (Freudenthal et al. 2019). The GetOrganelle pipeline employs combined commands to recruit plastid-like reads using Bowtie2 (Langmead and Salzberg 2012) and to assemble the filtered reads using SPAdes (Bankevich et al. 2012). We manually corrected the de novo assembly graphs and generated the complete circular chloroplast genome using Bandage (Wick et al. 2015) based on the results from GetOrganelle. We used Geneious 9.1.8 (Kearse et al. 2012) to verify the accuracy of the assembly, and used plastid genome annotator (PGA) (Qu et al. 2019) for genome annotation, coupled with manual correction of start and stop codons and intron/exon boundaries in Geneious. The annotated plastome was deposited in GenBank (accession number: MK830069).

We applied Mauve 2.3.1 (Darling et al. 2004) to compare the plastome structures. Because the plastid genome of Polygalaceae is not yet available, the dataset for plastome structure comparisons included only three of the four families in Fabales, namely Surianaceae, Quillajaceae and Leguminosae. We used the genome of Quillaja saponaria Molina from GenBank to represent Quillajaceae, a small family with only two species from Brazil and Peru. For the much larger family Leguminosae, we used plastome data of nine species to represent four of the six subfamilies according to the classification of LPWG (2017) (table 1). We also included Trifolium aureum Pollich, which has a typical smaller plastome due to loss of IRa, as a representative species of the inverted repeat-lacking clade (IRLC) (Sveinsson and Cronk 2014), and Acacia ligulata Benth., which has a typical larger plastome caused by IRs expanding into the SSC, to represent the inverted repeat-expanding clade (IREC) (Dugas et al. 2015). The IRa region was not included in the plastome structure analysis because it was lost in T. aureum and has not been reported for Q. saponaria. To estimate the plastid genome length of Quillajaceae, we added an IR region according to the sequence of Q. saponaria (accession number: MH880827).

We used the microsatellite identification tool MISA (Thiel et al. 2003) to locate simple sequence repeat (SSR) loci in the plastid genome of S. maritima. The minimum (threshold) number of repeats was set to 10, 6, 5 and 4 for mono-, di-, tri- and tetranucleotide SSRs, respectively. The IRa was not included in this analysis.

To reconstruct the phylogeny of Surianaceae and related families, we included Prunus persica (L.) Batsch (Rosaceae), Cucumis hystrix Chakrav. (Cucurbitaceae) and Castanea mollissima Blume (Fagaceae) as outgroups in addition to the ingroup species mentioned above (table 1). We aligned 77 protein-coding genes using MAFFT 7.308 (Katoh and Standley 2013), adjusted the alignments manually where necessary, and concatenated them into a supermatrix in Geneious. We reconstructed the phylogeny by maximum likelihood (ML) using RAxML 8.2.10 (Stamatakis 2014), with robustness assessed by 1000 rapid bootstrap replicates under the GTR + G substitution model. We also reconstructed the phylogeny by Bayesian inference (BI) using MrBayes 3.2 (Ronquist and Huelsenbeck 2003), run for one million generations under the GTR + I + G substitution model.

Results and discussion

The circular complete chloroplast genome of S. maritima is 163,747 bp in length and presents a quadripartite organization: a large single copy (LSC) region of 90,899 bp and a small single copy (SSC) region of 20,284 bp, separated by two inverted repeat regions (IRa and IRb), each 26,282 bp in length. A total of 115 unique genes were recovered, consisting of 80 protein-coding genes, 31 tRNA genes and four rRNA genes. The overall GC content of the whole plastome is 35.3%. It was

Table 1. Sampled species and characteristics of plastid genomes.

Species name                  Order         Family/subfamily              GenBank accession  Unique genes  GC content (%)  Length (bp)
Suriana maritima L.           Fabales       Surianaceae                   MK830069           115           35.3            163,747
canadensis L.                 Fabales       Leguminosae/Cercidoideae      KF856619           107           36.2            158,995
Bauhinia acuminata L.         Fabales       Leguminosae/Cercidoideae      MF135595           112           36.5            155,548
Ceratonia siliqua L.          Fabales       Leguminosae/                  KJ468096           108           36.7            156,367
Acacia ligulata Benth.        Fabales       Leguminosae/Caesalpinioideae  NC_026134          112           35.4            174,233
bijuga (Colebr.) Kuntze       Fabales       Leguminosae/Detarioideae      KX673214           108           36.2            159,215
Tamarindus indica L.          Fabales       Leguminosae/Detarioideae      KJ468103           108           36.2            159,551
Lupinus luteus L.             Fabales       Leguminosae/Papilionoideae    NC_023090          112           36.6            151,894
Trifolium aureum Pollich      Fabales       Leguminosae/Papilionoideae    NC_024035          112           34.6            126,970
Astragalus nakaianus Y.N.Lee  Fabales       Leguminosae/Papilionoideae    NC_028171          110           34.1            123,633
Quillaja saponaria Molina     Fabales       Quillajaceae                  MH880827           109           36.5            132,838
Cucumis hystrix Chakrav.      Cucurbitales  Cucurbitaceae                 NC_023544          113           37.0            155,031
Prunus persica (L.) Batsch    Rosales       Rosaceae                      NC_014697          113           36.8            157,790
Castanea mollissima Blume     Fagales       Fagaceae                      NC_014674          113           36.8            160,799

higher in IRs (42.4%), moderate in LSC (32.6%), and much lower in SSC (28.7%).

The plastid genomes of angiosperms are relatively conserved in size, structure and gene content, ranging from 120 kb to 170 kb in length in most cases (Shaw et al. 2007). Our results suggested that all the investigated species of the three families have similar plastome contents. However, our study found S. maritima to be distinct from any other reported species of Fabales in having an atpF gene without introns. The introns of the atpF gene are usually conserved in angiosperms and were found lost only in several segregate clades of Malpighiales and (Daniell et al. 2008). Sequences from all species of Surianaceae in future studies will help answer whether the loss of the atpF intron is a synapomorphy of the whole family or just a distinctive character of S. maritima.

The plastid sequence of Surianaceae is slightly longer than that of Quillajaceae (163,747 bp vs 158,835 bp), and lies between those of the Leguminosae species (table 1). The plastome size of Surianaceae is more comparable to those of Quillajaceae and the basal lineages of Leguminosae than to those of the derived legume clades, of which Astragalus nakaianus (123,633 bp) and Acacia ligulata (174,233 bp) have the shortest and longest sequences, respectively, in the phylogenetic tree. In terms of species number, Astragalus L. is the largest genus (over 3000 spp) in angiosperms and Acacia Miller is the second largest genus (over 1000 spp) in the legume family. However, it is too early to conclude whether genome size variation correlates with species richness within Fabales until a much denser sampling covering all the major clades of the four families is available.

Plastome structure comparison suggested that the gene content and gene order of the S. maritima plastid genome are the same as those of the basal lineages of the legume family (Cercidoideae, Detarioideae) and Quillajaceae (figure 1), supporting a likely common ancestor for the three families. The plastome contents and structures of subfamilies Caesalpinioideae and Papilionoideae of Leguminosae, however, are distinct from those of other lineages of Fabales. The derived position of Caesalpinioideae–Papilionoideae supported recent evolution of the unique features of both plastome and morphology of the two subfamilies.

We found three types of SSRs in the plastome of S. maritima: 108 mononucleotide, 25 dinucleotide and two tetranucleotide repeats (table 2). The LSC region accounts for the highest proportion (81.5%) of all SSRs. The SSRs recovered in this study could be used to detect genetic polymorphisms at the population level within Surianaceae in future studies.

The interfamily relationships within Fabales have long been controversial and are far from being resolved, although there are only four families in this family-poor order. Based on sequence data from plastid genomes of three families, both Bayesian and likelihood analyses strongly supported the monophyly of Fabales. The sister relationship between Surianaceae and Leguminosae was strongly supported by Bayesian method and moderately supported by

Figure 1. Plastome alignment map of three families of Fabales. The polylines in the blocks indicate sequence similarity among these species. Lines linking gene blocks among species with same colour indicate orthologues. Gene blocks arranged above or below in the same species indicate opposite directions of sequences.
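As a consistency check on the region lengths and GC contents reported in the Results and discussion, the following minimal Python sketch sums the four regions of the quadripartite plastome and computes a length-weighted mean of the regional GC contents. The dictionary layout and variable names are illustrative assumptions and are not part of the authors' workflow; only the numbers are taken from the text.

```python
# Region lengths (bp) and GC contents (%) as reported in Results and discussion.
# The data structure itself is an illustrative assumption.
regions = {
    "LSC": {"length": 90_899, "gc": 32.6, "copies": 1},
    "SSC": {"length": 20_284, "gc": 28.7, "copies": 1},
    "IR": {"length": 26_282, "gc": 42.4, "copies": 2},  # IRa and IRb
}

# A quadripartite plastome is LSC + SSC + two IR copies.
total_length = sum(r["length"] * r["copies"] for r in regions.values())

# Length-weighted mean of the regional GC contents.
weighted_gc = (
    sum(r["length"] * r["copies"] * r["gc"] for r in regions.values())
    / total_length
)

print(total_length)
print(round(weighted_gc, 1))
```

Under these inputs the summed length should agree with the reported 163,747 bp, and the weighted GC content should round to the reported overall value of 35.3%.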

Table 2. List of SSRs in the plastid genome of S. maritima.

No.  Type   Start   End     Motif  Repeats  Location            Region
1    Mono   6       16      A      11       trnH-GUG/psbA       LSC
2    Mono   1369    1378    T      10       psbA/matK           LSC
3    Mono   2000    2010    T      11       matK                LSC
4    Mono   3541    3552    A      12       trnK-UUU            LSC
5    Mono   4351    4360    T      10       trnK-UUU/rps16      LSC
6    Mono   4610    4625    A      16       trnK-UUU/rps16      LSC
7    Mono   4628    4638    A      11       trnK-UUU/rps16      LSC
8    Mono   4810    4821    T      12       trnK-UUU/rps16      LSC
9    Mono   5441    5451    A      11       rps16               LSC
10   Mono   5875    5886    T      12       rps16               LSC
11   Mono   5989    5999    T      11       rps16               LSC
12   Mono   6636    6647    T      12       rps16/trnQ-UUG      LSC
13   Mono   6757    6766    T      10       rps16/trnQ-UUG      LSC
14   Mono   7072    7084    T      13       rps16/trnQ-UUG      LSC
15   Mono   7121    7130    A      10       rps16/trnQ-UUG      LSC
16   Mono   7530    7541    A      12       rps16/trnQ-UUG      LSC
17   Mono   7596    7605    A      10       rps16/trnQ-UUG      LSC
18   Mono   8456    8466    A      11       trnQ-UUG/psbK       LSC
19   Mono   8585    8598    A      14       trnQ-UUG/psbK       LSC
20   Mono   8849    8859    T      11       psbK/psbI           LSC
21   Mono   9182    9193    T      12       psbK/psbI           LSC
22   Mono   9198    9209    A      12       psbK/psbI           LSC
23   Mono   9755    9766    A      12       trnS-GCU/trnG-UCC   LSC
24   Mono   9809    9818    T      10       trnS-GCU/trnG-UCC   LSC
25   Mono   10103   10112   A      10       trnS-GCU/trnG-UCC   LSC
26   Mono   10773   10785   A      13       trnG-UCC            LSC
27   Mono   18745   18756   T      12       rps2/rpoC2          LSC
28   Mono   18760   18769   T      10       rps2/rpoC2          LSC
29   Mono   18771   18781   A      11       rps2/rpoC2          LSC
30   Mono   18806   18817   A      12       rps2/rpoC2          LSC
31   Mono   20911   20922   T      12       rpoC2               LSC
32   Mono   21020   21029   T      10       rpoC2               LSC
33   Mono   25135   25147   T      13       rpoC1               LSC
34   Mono   30202   30212   T      11       rpoB/trnC-GCA       LSC
35   Mono   30534   30545   T      12       trnC-GCA/petN       LSC
36   Mono   31381   31390   T      10       petN/psbM           LSC
37   Mono   31617   31626   T      10       petN/psbM           LSC
38   Mono   32204   32216   T      13       petN/psbM           LSC
39   Mono   32637   32646   T      10       psbM/trnD-GUC       LSC
40   Mono   32730   32741   T      12       psbM/trnD-GUC       LSC
41   Mono   33226   33235   T      10       trnD-GUC/trnY-GUA   LSC
42   Mono   34441   34452   T      12       trnT-GGU/psbD       LSC
43   Mono   35448   35459   A      12       trnT-GGU/psbD       LSC
44   Mono   39893   39904   T      12       psbZ/trnG-GCC       LSC
45   Mono   40075   40084   T      10       trnG-GCC/trnfM-CAU  LSC
46   Mono   40215   40224   A      10       trnG-GCC/trnfM-CAU  LSC
47   Mono   46033   46043   A      11       psaA/ycf3           LSC
48   Mono   46699   46709   T      11       ycf3                LSC
49   Mono   47750   47759   A      10       ycf3                LSC
50   Mono   48153   48163   T      11       ycf3                LSC
51   Mono   48167   48176   T      10       ycf3                LSC
52   Mono   48227   48236   A      10       ycf3                LSC
53   Mono   48538   48548   T      11       ycf3/trnS-GGA       LSC
54   Mono   48954   48963   T      10       trnS-GGA/rps4       LSC
55   Mono   49832   49842   T      11       rps4/trnT-UGU       LSC
56   Mono   51825   51835   A      11       trnT-UGU/trnL-UAA   LSC
57   Mono   52148   52158   A      11       trnT-UGU/trnL-UAA   LSC
58   Mono   53806   53818   T      13       trnF-GAA/ndhJ       LSC
59   Mono   55284   55293   T      10       ndhK/ndhC           LSC
60   Mono   56023   56033   T      11       ndhC/trnV-UAC       LSC
61   Mono   57738   57748   T      11       trnM-CAU/atpE       LSC
62   Mono   59615   59624   T      10       atpB                LSC
63   Mono   59865   59874   A      10       atpB/rbcL           LSC
64   Mono   62371   62380   A      10       rbcL/accD           LSC
65   Mono   62609   62619   A      11       rbcL/accD           LSC
66   Mono   64425   64436   A      12       accD/psaI           LSC
67   Mono   64753   64763   T      11       accD/psaI           LSC
68   Mono   66154   66163   T      10       ycf4/cemA           LSC
69   Mono   66165   66174   T      10       ycf4/cemA           LSC
70   Mono   69823   69833   T      11       petA/psbJ           LSC
71   Mono   71599   71609   A      11       psbE/petL           LSC
72   Mono   73131   73141   A      11       trnP-UGG/psaJ       LSC
73   Mono   75065   75076   T      12       rps18/rpl20         LSC
74   Mono   75216   75226   T      11       rpl20               LSC
75   Mono   75472   75490   T      19       rpl20/rps12         LSC
76   Mono   75515   75524   A      10       rpl20/rps12         LSC
77   Mono   76970   76981   T      12       clpP                LSC
78   Mono   77057   77068   T      12       clpP                LSC
79   Mono   77078   77087   A      10       clpP                LSC
80   Mono   77147   77158   T      12       clpP                LSC
81   Mono   77185   77198   T      14       clpP                LSC
82   Mono   78399   78409   A      11       clpP                LSC
83   Mono   81685   81696   T      12       petB                LSC
84   Mono   84561   84571   A      11       petD/rpoA           LSC
85   Mono   86706   86716   T      11       rps11/rpl36         LSC
86   Mono   87782   87791   T      10       rps8/rpl14          LSC
87   Mono   89679   89690   T      12       rpl16               LSC
88   Mono   89695   89704   A      10       rpl16               LSC
89   Mono   89988   89999   T      12       rps12               LSC
90   Mono   90884   90894   T      11       rpl22/trnH-GUG      LSC
91   Di     7894    7907    AT     7        trnQ-UUG/psbK       LSC
92   Di     8098    8113    AT     8        trnQ-UUG/psbK       LSC
93   Di     8314    8327    TA     7        trnQ-UUG/psbK       LSC
94   Di     9941    9954    AT     7        trnS-GCU/trnG-UCC   LSC
95   Di     9961    9974    TA     7        trnS-GCU/trnG-UCC   LSC
96   Di     11864   11881   AT     9        trnR-UCU/atpA       LSC
97   Di     12281   12292   TA     6        trnR-UCU/atpA       LSC
98   Di     15317   15328   AT     6        atpF/atpH           LSC
99   Di     17849   17860   TA     6        atpI/rps2           LSC
100  Di     17897   17908   TA     6        atpI/rps2           LSC
101  Di     38701   38712   TA     6        trnS-UGA/psbZ       LSC
102  Di     49857   49868   TA     6        rps4/trnT-UGU       LSC
103  Di     49882   49893   TA     6        rps4/trnT-UGU       LSC
104  Di     50610   50627   AT     9        trnT-UGU/trnL-UAA   LSC
105  Di     64598   64611   TA     7        accD/psaI           LSC
106  Di     64854   64865   TA     6        accD/psaI           LSC
107  Di     64993   65004   TA     6        accD/psaI           LSC
108  Di     86372   86383   TA     6        rps11/rpl36         LSC
109  Tetra  39766   39789   TATT   6        psbZ/trnG-GCC       LSC
110  Tetra  45962   45977   CTGA   4        psaA/ycf3           LSC
111  Mono   90992   91001   A      10       rpl22/trnH-GUG      IRb
112  Mono   115960  115969  A      10       trnR-ACG/trnN-GUU   IRb
113  Di     91223   91244   AT     11       trnH-GUG/rps19      IRb
114  Mono   119896  119906  A      11       ndhF/rpl32          SSC
115  Mono   120104  120113  A      10       ndhF/rpl32          SSC
116  Mono   120745  120756  A      12       rpl32/trnL-UAG      SSC
117  Mono   120854  120863  A      10       rpl32/trnL-UAG      SSC
118  Mono   121314  121331  T      18       rpl32/trnL-UAG      SSC
119  Mono   121429  121440  A      12       rpl32/trnL-UAG      SSC
120  Mono   121466  121478  T      13       rpl32/trnL-UAG      SSC
121  Mono   123052  123064  T      13       ccsA/ndhD           SSC
122  Mono   124675  124684  A      10       ndhD/psaC           SSC
123  Mono   125657  125666  A      10       ndhE/ndhG           SSC
124  Mono   129005  129016  A      12       ndhA                SSC
125  Mono   129362  129373  T      12       ndhA                SSC
126  Mono   131813  131822  A      10       rps15/ycf1          SSC
127  Mono   133740  133750  T      11       ycf1                SSC
128  Mono   134857  134868  T      12       ycf1                SSC
129  Mono   136258  136268  T      11       ycf1                SSC
130  Di     120864  120879  TA     8        rpl32/trnL-UAG      SSC
131  Di     120902  120915  AT     7        rpl32/trnL-UAG      SSC
132  Di     120926  120937  AT     6        rpl32/trnL-UAG      SSC
133  Di     120941  120956  AT     8        rpl32/trnL-UAG      SSC
134  Di     121000  121019  AT     10       rpl32/trnL-UAG      SSC
135  Di     121814  121829  AT     8        trnL-UAG/ccsA       SSC
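The SSR loci in table 2 were identified with MISA. The sketch below is not MISA; it is a simplified regular-expression approximation in Python that applies the same minimum repeat thresholds stated in Materials and methods (10, 6, 5 and 4 for mono-, di-, tri- and tetranucleotide motifs). The function name `find_ssrs`, the constant `MIN_REPEATS` and the toy sequence are hypothetical and serve only to illustrate how such thresholds translate into start and end coordinates like those in the table.

```python
import re

# Minimum repeat counts per motif length, as stated in Materials and methods.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeats) tuples, 1-based inclusive.

    Regex approximation only; this is not the MISA algorithm.
    """
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"); they are reported under the shorter motif.
            if any(
                motif == motif[:k] * (size // k)
                for k in range(1, size)
                if size % k == 0
            ):
                continue
            repeats = (match.end() - match.start()) // size
            hits.append((match.start() + 1, match.end(), motif, repeats))
    return sorted(hits)


# Made-up toy sequence containing one mono-, one di- and one tetranucleotide SSR.
toy_sequence = "G" + "A" * 11 + "C" + "AT" * 7 + "G" + "TATT" * 6 + "C"
for hit in find_ssrs(toy_sequence):
    print(hit)
```

Each tuple mirrors the start position, end position, repeat motif and repeat number columns of table 2. Assigning a location and region to each hit would additionally require the genome annotation, which this sketch does not attempt.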

Figure 2. The ML and BI tree inferred from a concatenation of 77 protein-coding genes. The numbers before the slash are ML bootstrap support values, numbers after the slash are BI posterior probabilities; L, Leguminosae.

likelihood method. All the other nodes among the species of Fabales are strongly supported (figure 2). We would expect a better resolved and more strongly supported phylogeny of Fabales in the near future when a more comprehensive sampling is available, such as samples of Polygalaceae and the subfamilies of and Duparquetioideae of Leguminosae.

Acknowledgements

This work was financially supported by the National Natural Science Foundation of China (grant 31270266), the Ministry of Science and Technology of China (2013FY111200), the Strategic Priority Research Programme of the Chinese Academy of Sciences (XDA13020500) and the Science and Technology Planning Project of Guangdong Province (2019B030316020).

References

Bankevich A., Nurk S., Antipov D., Gurevich A. A., Dvorkin M., Kulikov A. S. et al. 2012 SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477.
Bello M. A., Rudall P. J. and Hawkins J. A. 2012 Combined phylogenetic analyses reveal interfamilial relationships and patterns of floral evolution in the eudicot order Fabales. Cladistics 28, 393–421.
Christenhusz M. J. M. and Byng J. W. 2016 The number of known species in the world and its annual increase. Phytotaxa 261, 201–217.
Claxton F., Banks H., Klitgaard B. B. and Crane P. R. 2005 Pollen morphology of families Quillajaceae and Surianaceae (Fabales). Rev. Palaeobot. Palyno. 133, 221–233.
Daniell H., Wurdack K. J., Kanagaraj A., Lee S. B., Saski C. and Jansen R. K. 2008 The complete nucleotide sequence of the cassava (Manihot esculenta) chloroplast genome and the evolution of atpF in Malpighiales: RNA editing and multiple losses of a group II intron. Theor. Appl. Genet. 116, 723–737.
Darling A. C. E., Mau B., Blattner F. R. and Perna N. T. 2004 Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 14, 1394–1403.
Doyle J. J. and Doyle J. L. 1987 A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull. 19, 11–15.
Dugas D. V., Hernandez D., Koenen E. J. M., Schwarz E., Straub S. and Hughes C. E. et al. 2015 Mimosoid legume plastome evolution: IR expansion, tandem repeat expansions, and accelerated rate of evolution in clpP. Sci. Rep. 5, 16958.
Freudenthal J. A., Pfaff S., Terhoeven N., Korte A., Ankenbrand M. J. and Förster F. 2019 The landscape of chloroplast genome assembly tools (https://doi.org/10.1101/665869).
Jin J. J., Yu W. B., Yang J. B., Song Y., Yi T. S. and Li D. Z. 2018 GetOrganelle: a simple and fast pipeline for de novo assembly of a complete circular chloroplast genome using genome skimming data (https://doi.org/10.1101/256479).
Katoh K. and Standley D. M. 2013 MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780.
Kearse M., Moir R., Wilson A., Stones H. S., Cheung M., Sturrock S. et al. 2012 Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649.
Langmead B. and Salzberg S. L. 2012 Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359.
Liu J. F., Li S. C., Chen H. J., Tu T. Y. and Zhang D. X. 2018 A karyological study of Suriana maritima L. (Surianaceae) from Xisha Islands of South China Sea. Caryologia 71, 109–112.
LPWG-the legume phylogeny working group. 2017 A new subfamily classification of the Leguminosae based on a taxonomically comprehensive phylogeny. Taxon 66, 44–77.
Qu X. J., Moore M. J., Li D. Z. and Yi T. S. 2019 PGA: a software package for rapid, accurate, and flexible batch annotation of plastomes. Plant Methods 15, 50.
Ronquist F. and Huelsenbeck J. P. 2003 MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574.
Shaw J., Lickey E. B., Schilling E. E. and Small R. L. 2007 Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: the tortoise and the hare III. Am. J. Bot. 94, 275–288.
Stamatakis A. 2014 RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313.
Sveinsson S. and Cronk Q. 2014 Evolutionary origin of highly repetitive plastid genomes within the genus (Trifolium). BMC Evol. Biol. 14, 228.
Thiel T., Michalek W., Varshney R. K. and Graner A. 2003 Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor. Appl. Genet. 106, 411–422.
Wick R. R., Schultz M. B., Zobel J. and Holt K. E. 2015 Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31, 3350–3352.

Corresponding editor: MANOJ PRASAD