J Biosci (2021) 46:43 © Indian Academy of Sciences

DOI: 10.1007/s12038-021-00166-2

Chloroplast genome analysis of Angiosperms and phylogenetic relationships among members with particular reference to teak (Tectona grandis L.f)

P MAHESWARI, C KUNHIKANNAN and R YASODHA*
Institute of Forest Genetics and Breeding, Coimbatore 641 002, India

*Corresponding author (Email, [email protected])

MS received 14 October 2020; accepted 22 April 2021

The availability of a comprehensive phylogenetic tree for flowering plants, which include many economically important crops, is one of the essential requirements of biologists for diverse applications. This is the first study to use the chloroplast genomes of 3265 Angiosperm taxa to identify evolutionary relationships among the plants. Sixty genes from the chloroplast genome were concatenated and used to generate the phylogenetic tree. Overall, the phylogeny corresponded with the Angiosperm Phylogeny Group (APG) IV classification, with very few taxa occupying incongruous positions owing to either ambiguous or incorrect identification. Simple sequence repeats (SSRs) were identified in almost all the taxa, indicating the possibility of their use in various genetic analyses. A large proportion (95.6%) of A/T mononucleotide repeats was recorded, while the di-, tri-, tetra-, penta- and hexanucleotide repeats amounted to less than 5%. The ambiguous taxonomic status of Tectona grandis L.f was assessed by comparing its chloroplast genome with those of closely related Lamiaceae members through nucleotide diversity and contraction/expansion of the inverted repeat regions. Although the gene content was highly conserved, structural changes in the genome were evident. Phylogenetic analysis suggested that Tectona could qualify for a subfamily, Tectonoideae. Nucleotide diversity in intergenic and genic sequences revealed prominent hyper-variable regions such as rps16-trnQ, atpH-atpI, psc4-psbJ, ndhF, rpl32 and ycf1, which have high potential in DNA barcoding applications.

Keywords. Angiosperms; Chloroplast genome; Lamiaceae; Nucleotide diversity; Phylogeny; Tectona grandis

Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/s12038-021-00166-2.

http://www.ias.ac.in/jbiosci

1. Introduction

Chloroplasts are the sunlight-driven energy factories that sustain life on earth by generating carbohydrates and oxygen. Besides photosynthesis, chloroplasts carry out the biosynthesis of fatty acids, amino acids, phytohormones and metabolites, and the production of nitrogen sources (Daniell et al. 2016). Retrograde signalling of chloroplasts plays a significant role in biotic and abiotic stress responses in plants. The circular chloroplast (cp) genome, 120 to 160 kb in size, undergoes no recombination and is uniparentally inherited in angiosperms. It has been targeted for complete sequencing because of the importance of its gene content and its conserved nature. The major features of the angiosperm chloroplast genome include a quadripartite circular structure with two copies of inverted repeat (IR) regions separated by a large single copy (LSC) region and a small single copy (SSC) region (Jansen et al. 2005). The genome includes SSRs, single nucleotide polymorphisms (SNPs), insertions and deletions (InDels), small inversions and divergent hotspots. The cp genome, with a GC content of 35–38%, encodes 120–130 genes including protein-coding, tRNA and rRNA genes. Genes with introns number 10–25, and a few duplicated genes are also observed (Wicke et al. 2011). Knowledge of the cp genome continues to reveal many variations and add information on their functional and evolutionary significance.

Genome sequencing with unparalleled efficiency and precision provides enormous options to completely sequence the cp genome. De novo assemblies of cp genomes are frequently reported for many species that lack genome sequences, thus providing a novel resource for functional genomics, evolutionary analysis and molecular breeding (Daniell et al. 2016; Guyeux et al. 2019). The unique information in the cp genome allows wide-ranging applications in taxonomy, phylogeny, phylogeography and DNA barcoding (Byrne and Hankinson 2012; Dong et al. 2012; Yan et al. 2019). Although the gene content and gene order of the cp genome of plants are highly conserved (Daniell et al. 2016), several rearrangements occurred during the evolution of plants (Ali 2019). In a few plant families such as Fabaceae and Geraniaceae, the loss of one IR or of a few genes was observed, indicating atypical evolution and changes in gene function among them (Martin et al. 2014). Further, an accelerated mutation rate in certain regions of the cp genome was also recorded. Both genic and intergenic regions show single nucleotide polymorphism (SNP), indel and simple sequence repeat (SSR) variations across and within plant species (Wang et al. 2018; Zhang et al. 2018a). SSRs are small repeating units of DNA that harbour high levels of sequence variation and are used for species identification in many crop species (Shukla et al. 2018; Takahashi et al. 2018; Lu et al. 2018). Comparison of the cp genomes of two Apiaceae members, Angelica polymorpha Maxim and Ligusticum officinale Koch, showed the presence of a 418 bp deletion in the ycf4-cemA intergenic region of A. polymorpha, which was used to discriminate these taxa (Park et al. 2019).

Most studies on the cp genome are limited to solving specific problems at the genus or family level. High-resolution plant phylogenies have been reported to identify the relationship between the wild and cultivated taxa of economically important species (Carbonell-Caballero et al. 2015). Chloroplast genetic engineering has been proven as one of the successful options in crop genome modification (Oldenburg and Bendich 2015; Daniell et al. 2016). Further, hybridization compatibility for the generation of new cultivars can be assessed from phylogenetic relationships. The geographical origin and domestication history of a crop variety can be traced using the cp genome, which paves the way for conservation and utilization of unique genetic variations (Wang et al. 2019; Nock et al. 2019). Recently, the phylogeny of green plants was analysed with 1879 taxa (Gitzendanner et al. 2018) and with 3654 taxa including members of Chlorophyta, Charophyta, Rhodophyta, Bryophyta, Pteridophyta, Gymnospermae and Angiospermae (Yang et al. 2019). In the present study, cp genome sequences of 3265 Angiosperm taxa were analysed for their phylogenetic relationships and distribution of simple sequence repeats (SSRs). Further, an attempt was made to elucidate the cp genome of Tectona grandis (teak), an economically important timber tree species growing in 60 different tropical countries, within the family Lamiaceae through phylogenetics, characterization of IR expansion and contraction and identification of genetic hotspot regions.

2. Materials and methods

2.1 Sequence extraction and alignment

The complete chloroplast genome sequences belonging to Magnoliophyta (Angiospermae) were downloaded from NCBI on July 03, 2019 using the link https://www.ncbi.nlm.nih.gov/genome/browse#!/organelles/magnoliophyta. Two species of Gymnospermae, Taxus baccata L. and Pinus sylvestris L., were included as outgroups. The sequence sampling included all the major lineages of Magnoliophyta, belonging to 57 orders, 233 families and 3376 species. However, 111 taxa had duplicate genome entries or a lower number of genes and were hence removed from the analysis. A complete list of the 3265 taxa and the two outgroup taxa, with GenBank accession numbers, is available in supplementary table 1. SSRs in the plastome sequences were mined using MISA, a Perl script (http://pgrc.ipkgatersleben.de/misa/misa). MISA detects microsatellites in FASTA-formatted nucleotide sequences and generates output along with statistical data in two separate files. The MISA definition of microsatellites was set by unit size (x) and minimum number of repeats (y): 1/10, 2/7, 3/5, 4/5, 5/5, 6/5 (x/y). The maximal number of interrupting base pairs in a compound microsatellite was set to 100.

2.2 Phylogeny analysis

Chloroplast genome phylogeny construction was performed with 60 genes extracted using BioEdit v7.0.5 (Hall et al. 2011). The dataset included genes present in all the families, namely genes encoding small ribosomal proteins (rps2, rps3, rps4, rps8, rps11, rps14, rps15, rps18), large ribosomal proteins (rpl14, rpl20, rpl22, rpl32, rpl33, rpl36), DNA-dependent RNA polymerase (rpoA, rpoB), photosynthesis and energy production (psaA, psaB, psaC, psaI, psaJ, psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ, atpA, atpB, atpE, atpH, atpI, ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK, rbcL, petA, petG, petL, petN) and others (ycf4, matK, ccsA, cemA).

The extracted gene sequences were aligned using MAFFT v7.409 (Katoh and Standley 2013) with the FFT-NS-2 setting, and the output was saved in FASTA format. The aligned sequences were trimmed to equal lengths at the ends using the BioEdit sequence alignment editor version 7.0.5 and then concatenated with MEGA X version 10.0.5 (Kumar et al. 2018). Phylogenetic analysis was performed for the 3,265 species and the outgroups at taxa, family and order levels using the maximum parsimony method in PAUP* version 4.0b10 (Swofford 2002). The most parsimonious trees were recovered with a heuristic search strategy employing tree bisection and reconnection (TBR) with 100 random addition sequence replications. In an attempt to infer the taxonomic position of T. grandis, a subset of the data consisting of only Lamiaceae family members along with the two outgroup members was subjected to phylogeny analysis. All the phylogenetic trees were viewed and labelled using Interactive Tree Of Life (iTOL) v4 (https://itol.embl.de/) (Letunic and Bork 2019).

Gene distribution in the cp genomes of 10 Lamiaceae species selected as close relatives of Tectona was compared and visualized using mVISTA software in Shuffle-LAGAN mode (Frazer et al. 2004) with the annotation of T. grandis as a reference. The same alignment was used to calculate the nucleotide variability values (π) within Lamiaceae plastomes. The sliding window analysis was performed in DnaSP 6.10 (Rozas et al. 2017) with a step size of 200 bp and a window length of 600 bp, and the nucleotide diversity (π) values were plotted using Excel. Expansion and contraction of the IR regions among the selected Lamiaceae members were investigated and plotted using IRscope (Amiryousefi et al. 2018).

3. Results and discussion

3.1 Organization, gene content, and characteristics of the chloroplast genome

Chloroplast genomes of 3,265 taxa belonging to 56 orders and 223 families were analysed to understand the diversity and phylogenetic relationships among them. T. baccata and P. sylvestris were included as outgroups for phylogeny analysis. The maximum and minimum plastome sizes of 242,575 bp and 113,490 bp were observed in Pelargonium transvaalense Knuth and Aegilops cylindrica L., respectively, with an average size of 153,043 bp. The GC content ranged from 33.6% to 43.6%, with an average of 37.6%. The average gene content was 131, with the maximum in Perilla (266 genes) and the minimum (86 genes) in Halesia diptera Ellis (supplementary table 1). Sixty genes were considered for phylogeny analysis, with minimum, maximum and average sequence lengths of 119 bp, 5591 bp and 1366 bp, respectively (supplementary table 2). The alignment of the concatenated data showed a minimum and maximum size of 37,700 bp (Fargesia denudata Yi) and 54,545 bp (Carpinus tientaiensis W.C.Cheng), respectively, with a mean nucleotide composition of A = 28.1%, C = 17.4%, G = 20.3% and T = 31.6% (supplementary table 3).

Although cp genome information of green plants has been employed for phylogeny analysis, consolidated data on SSR distribution across different orders are not available in the published literature. The assembled chloroplast genomes of 3265 species were mined for the presence of SSRs, and a total of 163,940 SSRs from 3265 scaffolds were identified (table 1). Overall, 119 repeat types were detected, with two types of mononucleotides (A/T and G/C), three types of dinucleotides (AC/GT, AG/CT and AT/AT), 12 types of trinucleotides, 6 types of tetranucleotides, 31 types of pentanucleotides and 65 types of hexanucleotides (supplementary table 4). The major SSR types, those with more than 20 occurrences, are listed in table 2. Among the identified SSRs, mononucleotides were the most abundant SSR type, accounting for 95.6% of the total SSR motifs, followed by di- (2.93%), tri- (0.97%), tetra- (0.076%), penta- (0.13%) and hexanucleotide (0.23%) SSR motifs. Thus mononucleotides made up a large proportion, while the rest amounted to less than 5%. Although Poales had the largest number of SSRs, Asterales had diverse types. The A/T richness of the chloroplast genome has been identified in many previous studies (Cheon et al. 2019). It was suggested that such a high number of mononucleotide repeats in the chloroplast genome may contribute to heritable variations (Bi et al. 2018).

Table 1. Details on simple sequence repeats (SSRs) in 3265 taxa belonging to 56 orders

S.No. | Order | Total number of sequences examined | Total size of examined sequences (bp) | Total number of identified SSRs | Number of SSR-containing sequences | Number of sequences containing >1 SSR | Number of SSRs present in compound formation
1 | Acorales | 3 | 460489 | 195 | 3 | 3 | 22
2 | Alismatales | 17 | 2767927 | 1228 | 17 | 17 | 219
3 | Amborellales | 1 | 162686 | 38 | 1 | 1 | 1
4 | Apiales | 61 | 9372486 | 2408 | 61 | 61 | 135
5 | Aquifoliales | 2 | 317267 | 116 | 2 | 2 | 14
6 | Arecales | 39 | 6086377 | 2177 | 39 | 39 | 240
7 | Asparagales | 208 | 32005035 | 10935 | 208 | 208 | 1057
8 | Asterales | 316 | 49621443 | 10535 | 316 | 316 | 414
9 | Austrobaileyales | 9 | 1309677 | 532 | 9 | 9 | 73
10 | Berberidopsidales | 1 | 158794 | 65 | 1 | 1 | 10
11 | Brassicales | 93 | 14355912 | 6803 | 93 | 93 | 906
12 | Buxales | 2 | 319633 | 98 | 2 | 2 | 11
13 | Canellales | 1 | 160604 | 90 | 1 | 1 | 14
14 | Caryophyllales | 107 | 16219617 | 6545 | 107 | 107 | 621
15 | Celastrales | 3 | 472699 | 258 | 3 | 3 | 35
16 | Ceratophyllales | 1 | 156252 | 76 | 1 | 1 | 12
17 | Chloranthales | 4 | 633164 | 113 | 4 | 4 | 6
18 | Commelinales | 3 | 460188 | 142 | 3 | 3 | 10
19 | Cornales | 9 | 1434557 | 410 | 9 | 9 | 38
20 | Cucurbitales | 28 | 4406688 | 1489 | 28 | 28 | 130
21 | Dilleniales | 1 | 159266 | 54 | 1 | 1 | 11
22 | Dioscoreales | 29 | 4455210 | 1565 | 29 | 29 | 160
23 | Dipsacales | 30 | 4690662 | 1103 | 30 | 30 | 76
24 | Ericales | 150 | 23578334 | 7672 | 150 | 150 | 380
25 | Fabales | 143 | 20508628 | 10569 | 143 | 143 | 1496
26 | Fagales | 88 | 14110644 | 6094 | 88 | 88 | 694
27 | Garryales | 1 | 163586 | 31 | 1 | 1 | 4
28 | Gentianales | 42 | 6408173 | 1568 | 42 | 42 | 109
29 | Geraniales | 42 | 6944052 | 2147 | 42 | 42 | 160
30 | Huerteales | 1 | 161100 | 86 | 1 | 1 | 8
31 | Icacinales | 1 | 151994 | 47 | 1 | 1 | 4
32 | Lamiales | 234 | 36277144 | 9777 | 234 | 234 | 645
33 | Laurales | 27 | 4130657 | 1649 | 27 | 27 | 135
34 | Liliales | 75 | 11553406 | 3690 | 75 | 75 | 287
35 | Magnoliales | 35 | 5666544 | 1296 | 35 | 35 | 92
36 | Malpighiales | 145 | 22963979 | 12919 | 145 | 145 | 1630
37 | Malvales | 65 | 10478777 | 4161 | 65 | 65 | 445
38 | Myrtales | 105 | 16626435 | 6706 | 105 | 105 | 748
39 | Nymphaeales | 19 | 3046624 | 328 | 19 | 19 | 10
40 | Oxalidales | 2 | 298671 | 112 | 2 | 2 | 12
41 | Pandanales | 3 | 467176 | 238 | 3 | 3 | 38
42 | Paracryphiales | 1 | 157371 | 42 | 1 | 1 | 2
43 | Petrosaviales | 1 | 158020 | 123 | 1 | 1 | 12
44 | Piperales | 16 | 2597913 | 1285 | 16 | 16 | 137
45 | Poales | 533 | 73471619 | 15584 | 533 | 533 | 1068
46 | Proteales | 7 | 1130147 | 393 | 7 | 7 | 43
47 | Ranunculales | 125 | 19736988 | 5889 | 125 | 125 | 415
48 | Rosales | 108 | 17003495 | 7150 | 108 | 108 | 795
49 | Santalales | 3 | 415563 | 795 | 3 | 3 | 74
50 | Sapindales | 63 | 10035349 | 4357 | 63 | 63 | 452
51 | Saxifragales | 40 | 6209707 | 1914 | 40 | 40 | 152
52 | Solanales | 156 | 24378449 | 6465 | 156 | 156 | 337
53 | Trochodendrales | 2 | 330412 | 129 | 2 | 2 | 12
54 | Vitales | 45 | 7243675 | 2713 | 45 | 45 | 286
55 | Zingiberales | 18 | 2956340 | 896 | 18 | 18 | 74
56 | Zygophyllales | 1 | 136194 | 140 | 1 | 1 | 13
Total | | 3265 | 499683799 | 163940 | 3265 | 3265 | 14984

Table 2. The number of different repeat units in the 3265 cp genomes

S. no. | Repeat | Repeat type | Numbers identified
1 | Mono | A/T | 152563
1 | Mono | C/G | 3798
2 | Di | AC/GT | 20
2 | Di | AG/CT | 111
2 | Di | AT/AT | 4597
3 | Tri | AAC/GTT | 53
3 | Tri | AAG/CTT | 476
3 | Tri | AAT/ATT | 1305
3 | Tri | ACG/CGT | 40
3 | Tri | ACT/AGT | 27
3 | Tri | AGG/CCT | 25
3 | Tri | ATC/ATG | 132
3 | Tri | Others | 5
4 | Tetra | AAAT/ATTT | 63
4 | Tetra | ACCT/AGGT | 22
4 | Tetra | Others | 36
5 | Penta | AAAAT/ATTTT | 21
5 | Penta | AATAG/ATTCT | 30
5 | Penta | AATAT/ATATT | 50
5 | Penta | Others | 111
6 | Hexa | AATACT/AGTATT | 25
6 | Hexa | AAGTAC/ACTTGT | 88
6 | Hexa | AAGAGG/CCTCTT | 64
6 | Hexa | Others | 278

3.2 Phylogeny of Angiosperms

The application of phylogenetic reconstruction methods, the availability of unequivocal genomic data and clear computational algorithms continue to resolve several unanswered taxonomic problems in plant species. In the present study, the phylogenetic trees were congruent with the Angiosperm Phylogeny Group (APG) IV system of classification. Six orders, namely Crossosomatales, Picramniales, Metteniusales, Vahliales, Escalloniales and Bruniales, did not have any representative taxa. Some orders, such as Acorales, Amborellales, Aquifoliales, Berberidopsidales, Buxales, Canellales, Celastrales, Ceratophyllales, Chloranthales, Commelinales, Garryales, Huerteales, Icacinales, Oxalidales, Pandanales, Paracryphiales, Petrosaviales, Trochodendrales and Zygophyllales, were represented by 1–4 taxa. Different levels of diversification of angiosperms were recently proposed (Soltis et al. 2019), and the results obtained in this study showed a high level of correspondence with the ordinal-level phylogeny of APG IV (supplementary figure 1). The early diverging angiosperms, including the ANA grade, Magnoliids and Chloranthales, formed distinct clusters. The COM clade comprising Celastrales, Oxalidales and Malpighiales, together with Zygophyllales under Fabids, formed a separate cluster along with the nitrogen-fixing clade of Cucurbitales, Fabales, Fagales and Rosales. The early diverging Ceratophyllales, Ranunculales, Proteales, Trochodendrales, Buxales and Dilleniales formed a specific cluster. Similarly, superrosids and superasterids formed clearly distinguishable clusters. The monocot orders Arecales, Acorales, Alismatales, Asparagales, Commelinales, Dioscoreales, Liliales, Pandanales, Petrosaviales, Poales and Zingiberales could be identified individually.

All the families within the 56 orders were assessed for their taxonomic position and corresponded with the APG IV system of classification (supplementary figure 2). However, at the taxa level, a few species were placed discordantly in different families. Recently, the complete chloroplast genomes of 26 Gentianales species were analysed for phylogenetic relationships, and Gynochthodes nanlingensis (Y.Z.Ruan) Razafim. & B.Bremer was found to group under Apocynaceae (Zhou et al. 2018). The present results also confirmed the grouping of G. nanlingensis under Apocynaceae instead of Rubiaceae. Similarly, the genus Wightia of [...] has attracted attention in various studies (Zhou et al. 2014; Xia et al. 2019), and in the present study Wightia speciosissima (D. Don) Merr. was placed close to [...] members (supplementary figure 2). Similarly, the genus Pedicularis of the family Orobanchaceae was grouped in Lamiaceae, which needs further study. A recent report on the molecular phylogeny of [...] could not resolve the species complex (Li et al. 2019). Cypripedium macranthos Sw. of Orchidaceae was surrounded by Asparagaceae members, and Lagerstroemia villosa L., belonging to the family Lythraceae, was embedded among Combretaceae, which could be due to taxonomic misidentification (Kim et al. 2014). Similarly, the species Rivinia humilis L. of the family [...] and Monococcus echinophorus L. of the family Phytolaccaceae were placed conflictingly, which reflected the uncertainty of their taxonomic status (Lee et al. 2013). M. echinophorus was considered a member of the family Petiveriaceae (Walker 2018). The taxonomic position of Chaetachme aristata Planch was established in the family Cannabaceae (Yang et al. 2013; Zhang et al. 2018b) instead of [...], and in the species tree it was grouped along with Cannabaceae members. All such inconsistent placements of plant species demand the generation of cp genome information in all the plant taxa to suitably assess their phylogenetic status.

3.3 Taxonomy of Tectona

To date, the taxonomic status of T. grandis within Lamiaceae remains poorly understood phylogenetically. Complex morphological features of Tectona, such as an actinomorphic 5- to 7-lobed calyx and corolla, an enlarged and inflated persistent calyx, and a four-chambered stony endocarp with a small central cavity between the chambers, have posed problems for clear-cut classification. Many previous studies on the molecular phylogeny of Tectona showed its unique phylogenetic position in Lamiaceae (Wagstaff and Olmstead 1997; Li et al. 2016; Yasodha et al. 2018). In congruence with earlier studies, the dendrogram obtained in the present study with 60 genes in 39 members of Lamiaceae confirmed the distinctness of the genus Tectona (supplementary figure 3). All the lower-level clades of the phylogeny had 100% bootstrap values except the Tectona-Premna clade, which showed 99.3%, establishing its difference from Premna. It was also confirmed that Tectona showed proximity to Premnoideae, Ajugoideae and Scutellarioideae (Li et al. 2016).

Figure 1. Comparisons of the large single copy (LSC), inverted repeat a (IRa), small single copy (SSC), and inverted repeat b (IRb) boundaries among ten Lamiaceae chloroplast genomes. Boxes above the main line represent the genes at the IR/SC borders.

Based on the results of the phylogeny, 10 closely related taxa including T. grandis were selected for further characterization of the cp genome. IRscope analysis is usually recommended for comparative cp genome analysis at the species level to infer structural variations at the junctions of the large single copy (LSC), small single copy (SSC) and inverted repeat (IR) regions (Thode and Lohmann 2019). This study depicted the genetic architecture of ten Lamiaceae members in the vicinity of the sites connecting the IRs to the LSC and SSC regions to provide deeper insights into these junctions. The cp genome sequence of T. grandis (NC_020098.1) was 153,953 bp, comprising an LSC spanning 85,318 bp, an SSC spanning 17,741 bp, and two IR regions each 25,447 bp in length.
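The region lengths quoted for the T. grandis plastome can be cross-checked with simple arithmetic, since a quadripartite genome consists of one LSC, one SSC and two IR copies. The following is only a minimal sketch of that check; the function and variable names are illustrative and are not part of the original analysis.

```python
# Region lengths (bp) as quoted in the text for T. grandis (NC_020098.1).
LSC_BP = 85_318
SSC_BP = 17_741
IR_BP = 25_447            # length of one IR copy; the plastome carries two
REPORTED_TOTAL_BP = 153_953


def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Length of a quadripartite plastome: LSC + IRb + SSC + IRa."""
    return lsc + ssc + 2 * ir


def region_shares(lsc: int, ssc: int, ir: int) -> dict:
    """Fraction of the plastome occupied by each region (both IR copies pooled)."""
    total = plastome_length(lsc, ssc, ir)
    return {"LSC": lsc / total, "SSC": ssc / total, "IR": 2 * ir / total}


total_bp = plastome_length(LSC_BP, SSC_BP, IR_BP)
shares = region_shares(LSC_BP, SSC_BP, IR_BP)

print(f"sum of regions: {total_bp} bp (reported: {REPORTED_TOTAL_BP} bp)")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With the values above, the four regions add up to the reported genome length of 153,953 bp, so the quoted figures are internally consistent.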

Figure 2. Sliding window analysis of the whole chloroplast genomes of 10 Lamiaceae species (window length: 600 bp, step size: 200 bp). X-axis: position of the midpoint of a window; Y-axis: nucleotide diversity of each window.
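The sliding-window scan summarised in Figure 2 can be illustrated with a short sketch. It computes nucleotide diversity per window as the mean proportion of pairwise differences per site and flags windows at or above the 0.07 cut-off mentioned in the text. This is not the DnaSP implementation: the handling of gaps (columns with any non-A/C/G/T character are skipped), the 0-based midpoint and all function names are assumptions made for illustration, and the toy alignment is invented.

```python
from collections import Counter

HOTSPOT_THRESHOLD = 0.07   # cut-off for hyper-variable windows quoted in the text


def site_diversity(column):
    """Share of sequence pairs that differ at a single alignment column."""
    n = len(column)
    total_pairs = n * (n - 1) // 2
    identical_pairs = sum(c * (c - 1) // 2 for c in Counter(column).values())
    return (total_pairs - identical_pairs) / total_pairs


def sliding_window_pi(seqs, window=600, step=200):
    """Return (window midpoint, pi) pairs for equal-length aligned sequences.

    Assumption: columns holding anything other than A/C/G/T in any sequence
    are skipped, and pi is averaged over the remaining columns of the window.
    """
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    if len(seqs) < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    results = []
    for start in range(0, length - window + 1, step):
        columns = [tuple(s[i] for s in seqs) for i in range(start, start + window)]
        usable = [col for col in columns if all(base in "ACGT" for base in col)]
        if usable:
            pi = sum(site_diversity(col) for col in usable) / len(usable)
        else:
            pi = float("nan")
        results.append((start + window // 2, pi))
    return results


def hotspots(windows, threshold=HOTSPOT_THRESHOLD):
    """Windows whose pi is at or above the threshold."""
    return [(mid, pi) for mid, pi in windows if pi >= threshold]


# Toy alignment (hypothetical data), scanned with a 4 bp window and 2 bp step.
toy = ["ACGTACGT", "ACGTACGA", "ACGTTCGA"]
toy_windows = sliding_window_pi(toy, window=4, step=2)
print(toy_windows)
print(hotspots(toy_windows))
```

For a real analysis the defaults (`window=600`, `step=200`) match the settings given in the figure caption; the input would be the whole-plastome alignment of the ten Lamiaceae species.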

Among the closely related species of Tectona, the region between IRb and LSC was conserved as rpl22-rps19-rpl2, and the gene rps19 extended into IRb by 36 to 61 bp (supplementary figure 4). The gene ndhF was placed at the junction between IRb and SSC (JSB). The gene ndhF was present in the SSC region of T. grandis, whereas in Premna microphylla it was present 6 bp away from the IRb region. The JSA junction (SSC/IRa) and the JLA junction (LSC/IRa) were characterized by the presence of the ycf1 and trnH genes, respectively, except in T. grandis, where a tRNA was present at the JLA junction and loss of the trnH gene was observed. In Scutellaria baicalensis, trnH was present in the LSC region. Duplication of genes in the cp genome is widely reported (Raveendar et al. 2015), and duplicated rps19 and ycf1 among the analysed species were obvious. Chloroplast genome evolution is governed by IR expansions and contractions leading to variations in genome size (Könyves et al. 2018; Li et al. 2018). However, in this study, the LSC region (81,770 bp to 86,078 bp) showed wide variation, whereas less variation was observed in the IR and SSC regions. Further, within genera such as Holcoglossum, no variations in IR regions were reported (Li et al. 2019).

The level of sequence divergence among cp genomes provides basic information on similarity across individuals. Sequence divergence of T. grandis was assessed by comparative analysis among the closely related Lamiaceae members using mVISTA. The cp genomes showed sequence divergence below 50%, indicating low conservation in the non-coding regions of these chloroplast genomes (figure 1). The alignment revealed sequence divergence between species across the cp genome, signifying a lack of conservation of the genome. The single-copy regions, intergenic regions and genic regions were more divergent, except for a few genic regions such as rrn16, rrn23, ndhB and rps7 (figure 1). According to the chloroplast genome sequence alignment of the ten Lamiaceae taxa, about 29 hyper-variable regions were discovered with a nucleotide diversity (π) value of 0.07 and above (figure 2). Some of the important hotspot regions were rps16-trnQ, atpH-atpI, psc4-psbJ, ndhF, rpl32 and ycf1. These mutational hotspots would serve as potential loci for developing novel DNA barcodes for plant classification. Thus, the phylogeny and distinctness of the cp genome of Tectona confirmed the earlier proposal of a new subfamily Tectonoideae with Tectona as a monotypic genus (Li and Olmstead 2017).

4. Conclusion

The unprecedented availability of genomic data from organelles provides opportunities to discern the evolutionary relationships among plant species. In the current study, the chloroplast genome phylogeny of 3265 Angiosperm taxa revealed a phylogenetic and taxonomic status in congruence with the APG IV classification. The valuable data generated on the distribution of SSRs would give an impetus to their use in species identification and in studies of the evolution of several angiosperm taxa. Many plant species requiring a deeper understanding of phylogeny and taxonomy were identified, and the generation of more cp genome data in many of the unexplored plant species becomes essential. The phylogeny of Lamiaceae members and the variations in structure and gene content confirmed the unique status of Tectona in the family. Nucleotide diversity among the closely related species in Lamiaceae, with several variable hotspots, would be useful to develop DNA markers suitable for discrimination of species and inference of phylogenetic relationships.

Acknowledgements

The authors acknowledge the Department of Biotechnology (DBT), India, for financial support. The Junior Research Fellowship provided to PM by DBT is acknowledged.

References

Ali M 2019 Comparative chloroplast genomic analyses revealed extensive genomic arrangement in some core and non-core caryophyllales. Bangladesh J. Plant Taxon. 26 106–117
Amiryousefi A, Hyvonen J and Poczai P 2018 IRscope: an online program to visualize the junction sites of chloroplast genomes. Bioinformatics 34 3030–3031
Bi Y, Zhang M, Xue J, Dong R, Du Y and Zhang X 2018 Chloroplast genomic resources for phylogeny and DNA barcoding: a case study on Fritillaria. Sci. Rep. 8 1184
Byrne M and Hankinson M 2012 Testing the variability of chloroplast sequences for plant phylogeography. Aust. J. Bot. 60 569–574
Carbonell-Caballero J, Alonso R, Ibañez V, Terol J, Talon M and Dopazo J 2015 A phylogenetic analysis of 34 chloroplast genomes elucidates the relationships between wild and domestic species within the genus Citrus. Mol. Biol. Evol. 32 2015–2035
Cheon KS, Kim KA, Kwak M, Lee B and Yoo KO 2019 The complete chloroplast genome sequences of four Viola species (Violaceae) and comparative analyses with its congeneric species. PLoS One 14 e0214162
Daniell H, Lin CS, Yu M and Chang WJ 2016 Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 17 134
Dong WP, Liu J, Yu J, Wang L and Zhou SL 2012 Highly variable chloroplast markers for evaluating plant phylogeny at low taxonomic levels and for DNA barcoding. PLoS One 7 e35071
Frazer KA, Pachter L, Poliakov A, Rubin EM and Dubchak I 2004 VISTA: computational tools for comparative genomics. Nucl. Acids Res. 32 273–279
Gitzendanner MA, Soltis PS and Wong GKS 2018 Plastid phylogenomic analysis of green plants: a billion years of evolutionary history. Am. J. Bot. 105 291–301
Guyeux C, Charr JC, Tran HTM, Furtado A, Henry RJ, Crouzillat D, Guyot R and Hamon P 2019 Evaluation of chloroplast genome annotation tools and application to analysis of the evolution of coffee species. PLoS One 14 e0216347
Hall T, Biosciences I and Carlsbad C 2011 BioEdit: An important software for molecular biology. GERF Bull. Biosci. 2 60–61
Jansen RK, Raubeson LA, Boore JL, dePamphilis CW, Chumley TW, et al. 2005 Methods for obtaining and analyzing whole chloroplast genome sequences. Methods Enzymol. 395 348–384
Katoh K and Standley DM 2013 MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30 772–780
Kim HM, Oh SH, Bhandari GS, Kim CS and Park CW 2014 DNA barcoding of Orchidaceae in Korea. Mol. Ecol. Res. 14 499–507
Könyves K, Bilsborrow J, David J and Culham A 2018 The complete chloroplast genome of Narcissus poeticus L. (Amaryllidaceae: Amaryllidoideae). Mitochondrial DNA B Resour. 3 1137–1138
Kumar S, Stecher G, Li M, Knyaz C and Tamura K 2018 MEGA X: Molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35 1547–1549
Lee J, Kim SY, Park SH and Ali MA 2013 Molecular phylogenetic relationships among members of the Phytolaccaceae sensu lato inferred from internal transcribed spacer sequences of nuclear ribosomal DNA. Genet. Mol. Res. 12 4515–4525
Letunic I and Bork P 2019 Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucl. Acids Res. 47 W256–W259
Li B, Cantino PD, Olmstead RG, Bramley GLC, Xiang CL, Ma ZH, Tan YH and Zhang DX 2016 A large-scale chloroplast phylogeny of the Lamiaceae sheds new light on its subfamilial classification. Sci. Rep. 6 34343
Li B and Olmstead R 2017 Two new subfamilies in Lamiaceae. Phytotaxa 313 222–226
Li W, Liu Y, Yang Y, Xie X, Lu Y, Yang Z, Jin X, Dong W and Suo Z 2018 Interspecific chloroplast genome sequence diversity and genomic resources in Diospyros. BMC Plant Biol. 18 210
Li ZH, Ma X, Wang DY, Li YX, Wang CW and Jin XH 2019 Evolution of plastid genomes of Holcoglossum (Orchidaceae) with recent radiation. BMC Evol. Biol. 19 63
Lu X, Adedze YMN, Chofong GN, Gandeka M, Deng Z, Teng L, Zhang X, Sun G, Si L and Li W 2018 Identification of high-efficiency SSR markers for assessing watermelon genetic purity. J. Genet. 97 1295–1306
Martin GE, Rousseau-Gueutin M, Cordonnier S, Lima O, Michon-Coudouel S, Naquin D, De Carvalho JF, Ainouche M, Salmon A and Ainouche A 2014 The first complete chloroplast genome of the Genistoid legume Lupinus luteus: evidence for a novel major lineage-specific rearrangement and new insights regarding plastome evolution in the legume family. Ann. Bot. 113 1197–1210
Nock CJ, Hardner CM, Montenegro JD, Termizi AA, Hayashi S, Playford J, Edwards D and Batley J 2019 Wild origins of Macadamia domestication identified through intraspecific chloroplast genome sequencing. Front. Plant Sci. 10 334
Oldenburg DJ and Bendich AJ 2015 DNA maintenance in plastids and mitochondria of plants. Front. Plant Sci. 6 883
Park I, Yang S, Kim WJ, Song JH, Lee HS, Lee HO, Lee JH, Ahn SN and Moon BC 2019 Sequencing and comparative analysis of the chloroplast genome of Angelica polymorpha and the development of a novel indel marker for species identification. Molecules 24 1038
Raveendar S, Jeon YA, Lee JR, Lee GA, Lee KJ and Cho GT 2015 The complete chloroplast genome sequence of Korean landrace "Subicho" pepper (Capsicum annuum var. annuum). Plant Breed. Biotech. 3 88–94
Rozas J, Ferrer-Mata A, Sánchez-DelBarrio JC, Guirao-Rico S, Librado P, Ramos-Onsins SE and Sánchez-Gracia A 2017 DnaSP v6: DNA sequence polymorphism analysis of large datasets. Mol. Biol. Evol. 34 299–302
Shukla N, Kuntal H, Shanker A and Sharma SN 2018 Mining and analysis of simple sequence repeats in the chloroplast genomes of genus Vigna. Biotech. Res. Innov. 2 9–18
Soltis PS, Folk RA and Soltis DE 2019 Darwin review: angiosperm phylogeny and evolutionary radiations. Proc. R. Soc. Lond. B Biol. Sci. 286 20190099
Swofford DL 2002 PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods) Version 4.0b10. Sinauer Associates, Sunderland, Mass
Takahashi D, Sakaguchi S, Isagi Y and Setoguchi H 2018 Comparative chloroplast genomics of series Sakawanum in genus Asarum (Aristolochiaceae) to develop single nucleotide polymorphisms (SNPs) and simple sequence repeat (SSR) markers. J. For. Res. 23 387–392
Thode VA and Lohmann LG 2019 Comparative chloroplast genomics at low taxonomic levels: a case study using Amphilophium (Bignonieae, Bignoniaceae). Front. Plant Sci. 10 796
Wagstaff SJ and Olmstead RG 1997 Phylogeny of Labiatae and Verbenaceae inferred from rbcL sequences. Syst. Bot. 22 165–179
Walker JF 2018 Novel Phylogenomic Methods for Uncovering the Evolutionary History of the Hyperdiverse Clade Caryophyllales. Thesis submitted for Doctor of Philosophy (Ecology and Evolutionary Biology), University of Michigan pp 202
Wang DS, Wang ZS, Kang XY and Zhang JG 2019 Genetic analysis of admixture and hybrid patterns of Populus hopeiensis and P. tomentosa. Sci. Rep. 9 4821
Wang W, Chen S and Zhang X 2018 Whole-genome comparison reveals heterogeneous divergence and mutation hotspots in chloroplast genome of Eucommia ulmoides Oliver. Int. J. Mol. Sci. 30 E1037
Wicke S, Schneeweiss GM, dePamphilis CW, Müller KF and Quandt D 2011 The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Mol. Biol. 76 273–297
Xia Z, Wen J and Gao Z 2019 Does the Enigmatic Wightia belong to Paulowniaceae (Lamiales)? Front. Plant Sci. 10 528
Yan M, Zhao X, Zhao Y, Ren Y and Yuan Z 2019 The complete chloroplast genome sequence of pomegranate 'Bhagwa.' Mitochondrial DNA B Resour. 4 1967–1968
Yang MQ, Velzen RV, Bakker FT, Sattarian A, Li DZ and Yi TS 2013 Molecular phylogenetics and character evolution of Cannabaceae. Taxon 62 473–485
Yang T, Liao X, Yang L, Liu Y, Mu W, Sahu SK, Liu X, Strube ML, Zhong B and Liu H 2019 Comparative analyses of 3654 chloroplast genomes unraveled new insights into the evolutionary mechanism of green plants. bioRxiv 655241
Yasodha R, Ramesh V, Swathi B, Sakthi AR, Abel N, et al. 2018 Draft genome of a high value tropical timber tree, Teak (Tectona grandis L. f): insights into SSR diversity, phylogeny and conservation. DNA Res. 25 409–419
Zhang HL, Jin JJ, Moore MJ, Yi T and Li D 2018a Plastome characteristics of Cannabaceae. Plant Divers. 40 127–137
Zhang XZ, Zhou RC and Chen SY 2018b The complete chloroplast genome of Bambusa ventricosa (Bambusoideae: Bambuseae). Mitochondrial DNA Part B Resour. 3 988–989
Zhou Q-M, Jensen SR, Liu GL, Wang S and Li HQ 2014 Familial placement of Wightia (Lamiales). Plant Syst. Evol. 300 2009–2017

Corresponding editor: MANCHIKATLA VENKAT RAJAM