Peng et al. BMC Ecol Evo (2021) 21:71 BMC Ecology and Evolution https://doi.org/10.1186/s12862-021-01800-1

RESEARCH ARTICLE Open Access Exploring the evolutionary characteristics between cultivated and its wild relatives using complete chloroplast genomes Jiao Peng1,2, Yunlin Zhao1,2, Meng Dong2, Shiquan Liu2, Zhiyuan Hu2, Xiaofen Zhong2 and Zhenggang Xu1,2,3*

Abstract Background: Cultivated tea is one of the most important economic and ecological distributed worldwide. Cul- tivated tea sufer from long-term targeted selection of traits and overexploitation of habitats by human beings, which may have changed its genetic structure. The chloroplast is an organelle with a conserved cyclic genomic structure, and it can help us better understand the evolutionary relationship of . Results: We conducted comparative and evolutionary analyses on cultivated tea and wild tea, and we detected the evolutionary characteristics of cultivated tea. The chloroplast genome sizes of cultivated tea were slightly diferent, ranging from 157,025 to 157,100 bp. In addition, the cultivated were more conserved than the wild species, in terms of the genome length, gene number, gene arrangement and GC content. However, comparing var. sinensis and Camellia sinensis var. assamica with their cultivars, the IR length variation was approximately 20 bp and 30 bp, respectively. The nucleotide diversity of 14 sequences in cultivated tea was higher than that in wild tea. Detailed analysis on the genomic variation and evolution of Camellia sinensis var. sinensis cultivars revealed 67 single nucleotide polymorphisms (SNPs), 46 insertions/deletions (indels), and 16 protein coding genes with nucleo- tide substitutions, while Camellia sinensis var. assamica cultivars revealed 4 indels. In cultivated tea, the most variable gene was ycf1. The largest number of nucleotide substitutions, fve amino acids exhibited site-specifc selection, and a 9 bp sequence insertion were found in the Camellia sinensis var. sinensis cultivars. In addition, phylogenetic relation- ship in the ycf1 suggested that the ycf1 gene has diverged in cultivated tea. Because C. sinensis var. sinensis and its cultivated species were not tightly clustered. Conclusions: The cultivated species were more conserved than the wild species in terms of architecture and linear sequence order. The variation of the chloroplast genome in cultivated tea was mainly manifested in the nucleotide polymorphisms and sequence insertions. These results provided evidence regarding the infuence of human activities on tea. Keywords: Chloroplast genome, Cultivated tea, Evolution, ycf1, Camellia

Background From ancient times, numerous species have been taken from their habitats and introduced into cultiva- *Correspondence: [email protected] tion—that is, into various human-made systems [1]. 1 Research Center of Engineering Technology for Utilization Te cultivation process has played an important role in of Environmental and Resources Plant, Central South University of Forestry and Technology, Changsha 410004, Hunan, People’s Republic human history and cultivated environments often pre- of China sent strong ecological contrasts with wild environments Full list of author information is available at the end of the article [2]. Wild species are exposed to natural selection that

© The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat​ iveco​ mmons.​ org/​ licen​ ses/​ by/4.​ 0/​ . The Creative Commons Public Domain Dedication waiver (http://creat​ iveco​ ​ mmons.org/​ publi​ cdoma​ in/​ zero/1.​ 0/​ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Peng et al. BMC Ecol Evo (2021) 21:71 Page 2 of 17

operates to promote survival under abiotic and biotic number of full cp genomes available from Camellia spe- stresses, while cultivated species are subjected to artif- cies so far [7, 14, 17–21]. cial selection that emphasizes a steady supply, improved It has been proven that human interference has efects quality and increased yield. Te criteria for ftness are on the genetic structure, leaf nutrients and pollen mor- expected to change dramatically under both regimes. phology of Camellia [22–24]. For example, due to Terefore, alterations in vegetation phenology, growth human overexploitation of habitats and long-term tar- and reproductive traits occur because the plants are geted selection of traits, the genetic diversity of Camel- subjected to diferent levels of stress and distinctive lia germplasm resources has been signifcantly reduced selection pressures [3]. Pot experiments showed there [25]. Tus, it remains unclear what impact the artifcially were signifcant diferences in the fowering and pod set selected cultivated Camellia has had on the evolutionary between wild and cultivated types of soybean [4]. In addi- mechanism of the cp genome. tion, the compounds and microstructures have been sur- Current research often ignores material diferences veyed for many horticultural plants [5]. Te inadequate between cultivated and wild species. After sequencing the genetic information prevents us from fully understanding complete chloroplast genome of CSSA (MH042531), we the spreading process of cultivated plants. We need to wanted to explore evolutionary characteristics between compare the genetic diferences between cultivated spe- cultivated tea and its wild relatives [14]. To assess the var- cies and wild species in order to use these species more iations in the chloroplast genome in wild and cultivated efectively. species of Camellia, and to detect the evolutionary char- Camellia, containing approximately 280 species, is a acteristics of cultivated tea, we selected earlier published with high economic, ecological and phylogenetic Camellia chloroplast genomes and conducted compara- values in the family Teaceae [6, 7]. Camellia are native tive and evolutionary analysis. Tis can help us to bet- to Asia and have been cultivated for more than 1300 years ter understand the structure of the Camellia chloroplast [8]. Because their variety of uses, the cultivated species genomes and the phylogenetic relationships among spe- are now found all over the world [9, 10]. Camellia species cies, and provide more information about the infuence can provide many valuable products, including making of human activities on tea. We believe that this research tea with the young leaves and extracting edible oil from will encourage more researchers to pay attention to tea the seeds. Moreover, most Camellia species are also of resources. great ornamental value [11]. Te genus Camellia is com- posed of more than 110 taxa [12], of which Camellia sin- Results ensis (L.) O. Kuntze is the most important source of the Chloroplast genome features of cultivated tea beverage tea. Cultivated tea plant varieties mainly belong Te lengths of the whole genomes of cultivated tea to two major groups: Camellia sinensis var. sinensis (CSS; (CSSA, CSSL and CSAY) were slightly diferent, ranging Chinese type) and Camellia sinensis var. assamica (CSA; from 157,025 to 157,100 bp. However, compared with Assam type) [13]. Due to long-term cultivation and man- CSSA and CSSL, the genome of CSAY was diferent. ual selection, C. sinensis formed many local varieties, Both CSSA and CSSL contained 81 unique CDS genes, such as Camellia sinensis var. sinensis cv. Anhua (CSSA), 30 tRNA, 4 rRNA and 3 pseudogenes (ψycf1, ψycf2 and Camellia sinensis var. sinensis cv. Longjing43 (CSSL), ψycf15). Among them, atpF, ndhA, ndhB, petB, petD, Camellia sinensis var. assamica cv. Yunkang10 (CSAY) rpl2, rpl16, rpoC1, rps16, trnG-GCC​, trnI-GAU​, trnL- and so on. Wild tea plants are important genetic diversity UAA​, and trnV-UAC​ contained a single intron, while resources that can provide new traits for improved yield, clpP and ycf3 contained two introns. However, in CSAY, disease resistance and tolerance to diferent environ- orf42 and ycf15 were lost, and rps12 and trnA-UGC ​had mental conditions. For example, the leaves of CSSA, well an inserted intron sequence (Fig. 1). known for its specifc area, are the main sources of dark tea [14]. Te quality of dark tea products is related to the Comparison of chloroplast genomes between cultivated abundant cultivars, germplasm resources and geographi- tea and wild tea cal conditions [15]. In our study, frst, we compared CSS with its two cul- Te chloroplast (cp) genome is often used to ana- tivated species (CSSA and CSSL). Tese species were lyze the evolutionary process and the phylogenetic defned as the Chinese cultivated type. Ten, we com- status because of its high degree of conservation and pared CSA with its one cultivated species (CSAY). relatively compact gene alignment. Moreover, cp genome Tese species were defned as the Assam cultivated sequences are useful in the identifcation of closely type. Finally, we compared CSS, CSA and 12 wild but related, breeding-compatible plant species [16]. Although related species: Camellia azalea (CAZ), Camellia the cp genome is very useful, there are still a limited Peng et al. BMC Ecol Evo (2021) 21:71 Page 3 of 17

Camellia sinensis var. sinensis cv. Anhua 157,025 bp

Camellia sinensis var. sinensis cv. Longjing43 157,085 bp

Camellia sinensis var. assamica cv. Yunkang10 157,100 bp

Fig. 1 Gene map of the complete chloroplast genome of cultivated tea. The inner circle corresponds to the GC content, and the next circle corresponds to the GC skew. The next three circles correspond to the genes. Genes with clockwise arrows represent reverse strands, while genes with counterclockwise arrows represent forward strands. Blue, red and aqua colors of the blocks represent protein-coding genes, introns and RNA, respectively. The third circle corresponds to the shared genes among three cultivated tea. The fourth circle corresponds to the unique genes of Camellia sinensis var. sinensis Anhua and Camellia sinensis var. sinensis Longjing43. The ffth circle corresponds to the unique genes of Camellia sinensis var. assamica cv. Yunkang10

crapnelliana (CCR), Camellia cuspidate (CCU), Tese species were defned as the wild type (Tables 1 Camellia grandibracteata (CGR), Camellia impressin- and 2). ervis (CIM), (CPE), Camellia pitardii (CPI), Camellia pubicosta (CPU), Camellia reticulata Chloroplast genomic similarity (CRE), Camellia sinensis var. pubilimba (CSP), Camel- In the Chinese cultivated type, the average length across lia taliensis (CTA) and Camellia yunnanensis (CYU). the cultivated species was 62 bp smaller than CSS. In the Assam cultivated type, the genome length of CSAY Peng et al. BMC Ecol Evo (2021) 21:71 Page 4 of 17 CYU ​ 156,592 79,655 16,935 48,143 2813 9046 115 81 30 21 37.33 37.54 37.25 32.68 52.9 55.41 orf42, ycf1 CPE 157,121 80,650 15,196 49,427 2802 9046 115 81 30 18 37.29 37.56 36.41 32.92 52.86 55.39 rps12 ​ CCU 156,618 79,643 16,917 48,199 2813 9046 115 81 30 21 37.31 37.56 37.25 32.63 52.9 55.38 orf42, ycf1 ​ CCR 156,997 79,649 16,239 49,321 2742 9046 114 81 29 20 37.3 37.54 37.54 32.64 52.88 55.41 orf42, ycf1, trnG CRE 156,971 76,224 15,182 53,717 2802 9046 115 81 30 18 37.31 37.54 36.4 33.39 52.86 55.34 rps12 CPI 156,585 79,619 16,937 48,171 2812 9046 115 81 30 21 37.34 37.56 37.22 32.71 52.92 55.36 orf42, ycf1 CAZ 157,039 80,629 15,195 49,367 2802 9046 115 81 30 18 37.3 37.56 36.41 32.95 52.86 55.39 rps12 CPU 157,076 80,665 15,198 49,365 2802 9046 115 81 30 18 37.3 37.57 36.42 32.93 52.89 55.42 rps12 CIM 156,892 79,655 16,897 48,481 2813 9046 115 81 30 21 37.33 37.54 37.28 32.72 52.9 55.41 orf42, ycf1 ​ CTA 156,974 79,577 16,947 48,591 2813 9046 115 81 30 21 37.32 37.57 37.25 32.68 52.86 55.38 orf42, ycf1 ​ CGR 157,127 80,656 15,205 49,418 2802 9046 115 81 30 18 37.29 37.56 36.39 32.94 52.86 55.41 rps12 CSP 157,086 80,622 15,210 49,405 2802 9047 115 81 30 18 37.32 37.58 36.42 32.97 52.86 55.41 rps12 CSAY 157,100 79,092 17,902 48,268 2790 9048 113 79 30 22 37.29 37.47 37.91 32.46 52.97 55.39 orf42, ycf1, ycf15 CSA 157,028 79,093 17,902 48,200 2789 9044 113 79 30 22 37.3 37.47 37.91 32.48 52.99 55.40 orf42, ycf1, ycf15 CSSL 157,085 80,650 15,198 49,389 2802 9046 115 81 30 18 37.29 37.56 36.38 32.94 52.86 55.41 rps12 CSSA 157,025 80,620 15,196 49,361 2802 9046 115 81 30 18 37.3 37.57 36.38 32.94 52.86 55.41 rps12 CSS 157,117 80,542 15,192 49,535 2802 9046 115 81 30 18 37.3 37.58 36.41 32.93 52.86 55.39 rps12 species Camellia of seventeen genomic features Chloroplast 1 Table Species assamica cv. sinensis var. Camellia assamica, CSAY: sinensis var. Camellia CSA: Longjing43, sinensis cv. sinensis var. CSSL: Camellia Anhua, sinensis cv. var. sinensis CSSA: Camellia sinensis, sinensis var. CSS: Camellia , CRE: Camellia pitardii , CPI: Camellia azalea , CAZ: impressinervis Camellia pubicosta taliensis , CIM: Camellia , CPU: Camellia Camellia CTA: grandibracteata, Camellia pubilimba, CSP; sinensis var. Camellia CSP: Yunkang10, yunnanensis , CYU: petelotii Camellia , CPE: Camellia cuspidate Camellia , CCU: crapnelliana Camellia , CCR: reticulata Genome(bp) CDS (bp) Introns (bp) IGS (bp) tRNA (bp) rRNA (bp) Genes CDS genes tRNA genes Introns Genome GC CDS GC Introns GC IGS GC tRNA GC rRNA GC Gene losses Intron losses Peng et al. BMC Ecol Evo (2021) 21:71 Page 5 of 17

Table 2 Information regarding the complete chloroplast genomes of the research species Species Accession number Subgenus1 Section1 Types Sample location Location References

Camellia sinensis var. KJ806281 Thea Thea Wild Academy of Agri- Yunnan, China [66] sinensis cultural Science Camellia sinensis var. sinen- MH042531 Thea Thea Cultivar Hunan City University Hunan, China [14] sis cv. Anhua Camellia sinensis var. sinen- KF562708 Thea Thea Cultivar Huajiachi campus of Zheji- , China [17] sis cv. Longjing43 ang University Camellia sinensis var. MH394410 Thea Thea Wild Kunming Institute of Yunnan, China [21] assamica Botany, Kunming Camellia sinensis var. assa- MH019307 Thea Thea Cultivar Menghai County Yunnan, China [67] mica cv. Yunkang10 Camellia sinensis var. KJ806280 Thea Thea Wild Yunnan Academy of Agri- Yunnan, China [66] pubilimba cultural Science Camellia grandibracteata NC024659 Thea Thea Wild Yunnan Academy of Agri- Yunnan, China [66] cultural Science Camellia taliensis NC022264 Thea Thea Wild Kunming Institute of Yunnan, China [7] Botany Camellia impressinervis NC022461 Thea Archecamellia Wild Kunming Institute of Yunnan, China [7] Botany Camellia pubicosta NC024662 Thea Corallina Wild International Camellia Zhejiang, China [66] Species Garden Camellia azalea NC035574 Camellia Camellia Wild Yangchun County , China [19] Camellia pitardii NC022462 Camellia Camellia Wild Kunming Institute of Yunnan, China [7] Botany Camellia reticulata NC024663 Camellia Camellia Wild Kunming Institute of Yunnan, China [66] Botany Camellia crapnelliana NC024541 Camellia Heterogenea Wild Kunming Botanical Garden Yunnan, China [20] Camellia cuspidata NC022459 Thea Theopsis Wild Kunming Institute of Yunnan, China [7] Botany Camellia petelotii NC024661 Thea Archecamellia Wild International Camellia Zhejiang, China [66] Species Garden Camellia yunnanensis NC022463 Camellia Heterogenea Wild Kunming Institute of Yunnan, China [7] Botany

1 The taxonomic classifcation of Camellia is based on Ming’s research [47] was 72 bp larger than CSA. In the wild type, the aver- mVISTA and Blast Ring Image Generator (BRIG) were age length of the wild species was 156,923 bp, which used to compare the genomic sequence identity. Compar- was 194 bp and 105 bp variation compared with CSS ing CSS and CSA with their cultivated types, the regions and CSA, respectively. Tis showed that there was less with relatively low identity were psaA_ycf3, petL_petG length variation when comparing cultivated species with and ycf1_ndhF. Comparing CSS and CSA with other wild species (Table 1). Similarly, the number of genes wild types, the regions with relatively low identity were and the GC content of cultivated species were more sta- atpH_atpI, trnE-UCC​_trnT-GGU​, psaA_ycf3, ycf15_trnL- ble than that of wild species. After comparing the genes CAA​, ycf1_ndhF and ndhG_ndhI (Figs. 2 and 3). In con- and introns insertion or deletion among the Chinese clusion, at the genomic level, the cultivated species were cultivated type, Assam cultivated type and wild type, we more conserved than the wild species. found that introns of the rps12 gene were deleted in CSS and its two cultivated species. Te orf42, ycf1 and ycf15 The expansion and contraction of IR regions genes were deleted in CSA and CSAY. However, these Te locations of inverted repeat (IR) regions were events occurred randomly in wild species. Te difer- extracted via a self-BLASTN search, and the character- ences in the GC content of the CDS, intron and IGS in istics of the IR/Large single copy region (LSC) and IR/ the Chinese cultivated type and Assam cultivated type Small single copy region (SSC) boundary regions were were approximately 0.01–0.03%, and 0–0.02%, respec- analyzed. Te IRs boundary regions of the 17 complete tively, but we found that the diferences of the CDS, Camellia cp genomes were compared, showing slight dif- intron and IGS in the wild type were 0.02–1.05%. ferences in junction positions (Fig. 4). In order to detect Peng et al. BMC Ecol Evo (2021) 21:71 Page 6 of 17

C.sinensis var. sinensis

C.sinensis var. sinensis cv. Anhua

C.sinensis var. sinensis cv. Longjing43

C.sinensis var. assamica

C.sinensis var. assamica cv. Yunkang10

C.sinensis var. pubilimba

C.grandibracteata

C.taliensis

C.impressinervis

Camellia sinensis var. sinensis C.pubicosta

157117 bp C.azalea

C.pitardii

C.reticulata

C.crapnelliana

C.cuspidata

C.petelotii

C.yunnanensis

Fig. 2 The sequence identity of seventeen Camellia species. The inner circle is the reference genome. Next circles represent the sequence identity between C.sinensis var. sinensis and sixteen other species. The outermost circle corresponds to the protein-coding genes and intergenic spacer regions. Genes with clockwise arrows represent reverse strands, while genes with counterclockwise arrows represent forward strands possible IR border polymorphisms, frst of all, we com- of the compared cp genomes, the ycf1 gene spanned the pared the four IR boundaries of the Chinese cultivated SSC/IRa region and the length of ycf1 ranged from 963 type. No diference was found at the LSC/IRb or IRa/ to 1069 bp in IRa. Remarkably, most species have an ycf1 LSC border; meanwhile, only minor diferences were dis- pseudogene at the IRa/LSC junction, while this was not covered at the IRb/SSC and SSC/IRa borders. Next, we observed in CSA, CTA, CIM, CPI, CCR, CCU, or CYU. compared the four IR boundaries of the Assam cultivated Similar to most plants, the ndhF gene involved in pho- type, and the results were similar. Ten, we compared tosynthesis was located in the SSC region. However, the the cp genome boundaries of the wild type. Te rps19 ndhF gene was located at the IRb/SSC boundary of CRE, gene at the LSC/IRb boundary expanded 52 bp from and there was a 35 bp overlap between ndhF gene and the LSC region to the IRb side in CPU, while it stopped ψycf1gene. at 46 bp from the LSC region in the rest of the species. On the other side of the IRa/LSC boundary, the lengths Nucleotide diversity of the spacers between the IRa/LSC junction and the Comparisons based on the nucleotide diversity (Pi) val- rpl2 gene (in IRa) were 112 bp for CPU, while those of ues of the Chinese cultivated type, Assam cultivated type, the rest of the species were all 106 bp. Consistently, in all and wild type were presented, including the intergeneric Peng et al. BMC Ecol Evo (2021) 21:71 Page 7 of 17

rpoB trnE-UUC psbC trnH-GUG trnQ-UUG trnR-UCU trnY-GUA trnfM-CAU trnK-UUU trnK-UUU trnS-GCU atpA atpFatpH atpI rpoC2 rpoC1 trnC-GCA psbM psbD trnS-UGA psaB psbA matK rps16 psbK trnG-GCC rps2 petN trnD-GUC psbZ rps14 psbI C.sinensis var.sinensis KJ806281 trnT-GGU trnG-UCC 100% C.sinensis var. sinensis cv. Anhua MH042531 50% C.sinensis var.sinensis cv. Longjing43 KF562708 50% C.sinensis var.assamica MH394410 50% C.sinensis var.assamica cv. Yunkang10 MH019307 50% C.sinensis var.pubilimba KJ806280 50% C.grandibracteata NC024659 50% C.taliensis NC022264 50% C.impressinervis NC022461 50% C.pubicosta NC024662 50% C.azale NC035574 50% C.pitardii NC022462 50% C.reticulata NC024663 50% C.crapnelliana NC024541 50% C.cuspidata NC022459 50% C.petelotii NC024661 50% C.yunnanensis NC022463 50% 0k 4k 8k 12k 16k 20k 24k 28k 32k 36k trnS-GGA atpB psbF petL psaJ ndhC psbJ trnW-CCA rps18 psaB psaA ycf3 rps4 trnL-UAA ndhJ trnV-UAC rbcL accD psaI cemA psbE trnP-UGG rpl20 clpP psbBpsbN petB atpE trnT-UGU trnF-GAA ndhK trnM-CAU ycf4 petA psbL petG rpl33 rps12 psbT psbH C.sinensis var.sinensis KJ806281 C.sinensis var. sinensis cv. Anhua MH042531 100% C.sinensis var.sinensis cv. Longjing43 50% KF562708 50% C.sinensis var.assamica MH394410 50% C.sinensis var.assamica cv. Yunkang10 MH019307 50% C.sinensis var.pubilimba KJ806280 50% C.grandibracteata NC024659 50% C.taliensis NC022264 50% C.impressinervis NC022461 50% C.pubicosta NC024662 50% C.azale NC035574 50% C.pitardii NC022462 50% C.reticulata NC024663 50% C.crapnelliana NC024541 50% C.cuspidata NC022459 50% C.petelotii NC024661 50% C.yunnanensis NC022463 50% 40k 44k 48k 52k 56k 60k 64k 68k 72k 76k rpl36 rpl22 trnA-UGC rrn4.5 infA rrn5 petB rpoA rps8 rpl16 rps3 rpl2 trnI-CAU ycf2 ycf15 ndhB rps7 trnV-GAC trnI-GAU orf42 trnR-ACG ycf1 ndhF rpl32 ccsA petD rps11 rpl14 rps19 rpl23 trnL-CAA rps12 rrn16 trnA-UGC rrn23 trnN-GUU trnL-UAG C.sinensis var.sinensis KJ806281 C.sinensis var. sinensis cv. Anhua MH042531 100% var.sinensis cv. Longjing43 50% C.sinensis KF562708 50% C.sinensis var.assamica MH394410 50% C.sinensis var.assamica cv. Yunkang10 MH019307 50% C.sinensis var.pubilimba KJ806280 50% C.grandibracteata NC024659 50% C.taliensis NC022264 50% C.impressinervis NC022461 50% C.pubicosta NC024662 50% C.azale NC035574 50% C.pitardii NC022462 50% C.reticulata NC024663 50% C.crapnelliana NC024541 50% C.cuspidata NC022459 50% C.petelotii NC024661 50% C.yunnanensis NC022463 50% 80k 84k 88k 92k 96k 100k 104k 108k 112k 116k ndhH trnN-GUU trnI-GAU psaC trnA-UGC trnR-ACG ccsA ndhG ndhA rps15 ycf1 rrn5 rrn23 orf42 rrn16 rps12 ndhB ycf15 ycf2 trnI-CAU ndhD ndhE ndhI rrn4.5 trnA-UGC trnV-GAC rps7 trnL-CAA rpl23rpl2 C.sinensis var.sinensis KJ806281 C.sinensis var. sinensis cv. Anhua MH042531 100% C.sinensis var.sinensis cv. Longjing43 KF562708 50% C.sinensis var.assamica MH394410 50% C.sinensis var.assamica cv. Yunkang10 MH019307 50% C.sinensis var.pubilimba KJ806280 50% C.grandibracteata NC024659 50% C.taliensis NC022264 50% C.impressinervis NC022461 50% C.pubicosta NC024662 50% gene C.azale NC035574 50% NC022462 50% exon C.pitardii 50% C.reticulata NC024663 50% UTR C.crapnelliana NC024541 50% C.cuspidata NC022459 50% CNS C.petelotii NC024661 50% C.yunnanensis NC022463 50% 120k 124k 128k 132k 136k 140k 144k 148k 152k 156k Fig. 3 Alignment visualization of the seventeen Camellia chloroplast genome sequences using C.sinensis var. sinensis as a reference. The vertical scale indicates the percentage of identity, ranging from 50 to 100%. Arrows indicate the annotated genes and their transcriptional direction. The diferent colored boxes correspond to exons, tRNA or rRNA, and noncoding sequences (CNSs) regions (IGS), protein-coding genes and introns (Addi- Furthermore, although the average Pi values of the tional fle 1: Table S1, Fig. 5). In our study, the average Pi cultivated species were lower, we still found that the Pi values for the genes, introns and IGS in wild type were values of rps16, rps4, trnL-UAA​_intron, rps4_trnT-UGU​ approximately 6.6, 3.5 and 9.1 times that of the Chi- , ndhC_trnV-UAC,​ cemA_petA, rpl33_rps18, psbN_psbH, nese cultivated type. In addition, the Pi values for all rpl36_infA, rpl14_rpl16, rps7_rps12, ndhG_ndhI, trnV- regions in the Assam cultivated type were 0. Compar- GAC​_rps12, and rps12_rps7 in the Chinese cultivated ing Chinese cultivated type with wild type, the Pi values type were higher than those in wild species, and these of most genes, introns and IGS in the wild species were diference sequences were mainly located in the LSC higher than those of in the cultivated species. For exam- region (Fig. 5). ple, rps12, petD, rps19, trnI-CAU_rpl23, trnI-CAU_ycf2, trnI-GAU_rrn16, clpP_intron, rps16_intron, and atpF_ Phylogenetic analysis of cultivated tea and wild tea intron were highly variable in the wild species, but they We constructed three phylogenetic trees of cultivated and were not variable in the three cultivated species. For the wild tea, namely, the complete cp genomic tree (complete photosynthetic genes, except for ndhD, ndhF, ndhH and cp-Tree), all shared protein coding genes among all spe- psbC, the Pi values of the photosynthetic genes of three cies tree (SCDS-Tree) and the ycf1 gene tree (ycf1-Tree) cultivated tea were 0. Te Pi values of these genes were (Figs. 6, 7 and 8). All phylogenetic trees supported the smaller than that of the wild species. Tese results indi- hypothesis that the Tea subgenus could be divided into cate that these genes and noncoding regions were more two clades: clade I, including CSS, CSSL, CSSA, CSA, conserved among the cultivated species than among the CSAY, CGR, CPU and CSP, and clade II, including CPE wild species. CIM, CTA and CCU. Clade I was strongly supported, Peng et al. BMC Ecol Evo (2021) 21:71 Page 8 of 17

JLB( LSC-IRb) JSB( IRb-SSC) JSA( SSC-IRa) JLA( IRa-LSC) 13 bp 1381 bp 106 bp Ψ ycf1 trn-N rpl2 LSC IRb SSC IRa LSC Camellia sinensis var. sinensis 86662 bp 26090 bp 18275 bp 26090 bp rps19 trn-N ndhF ycf1 trn-H 233 bp 164 bp 46 bp 1381 bp 4553 bp 1069 bp 1bp 13 bp 1372 bp 106 bp

Ψycf1 trn-N rpl2 LSC IRb SSC IRa LSC Camellia sinensis var. sinensis cv. Anhua 86586 bp 26081 bp 18277 bp 26081 bp rps19 trn-N ndhF ycf1 trn-H 233 bp 46 bp 1372 bp 56 bp 1060 bp 1bp 4547 bp 13 bp 1381 bp 106 bp Ψycf1 trn-N rpl2 LSC IRb SSC IRa LSC Camellia sinensis var. sinensis cv. Longjing43 86642 bp 26080 bp 18283bp 26080 bp rps19 trn-N ndhF ycf1 trn-H 233 bp 46 bp 1381 bp 56 bp 4553 bp 1069 bp 1bp 1355 bp 106 bp

trn-N rpl2 LSC IRb SSC IRa LSC Camellia sinensis var. assamica 86632 bp 26068 bp 18260 bp 26068 bp rps19 trn-N ndhF ycf1 trn-H 233 bp 1355 bp 46 bp 5 bp 4585 bp 1043 bp 1bp 1381 bp 106 bp

trn N rpl2 IRb SSC IRa LSC LSC Camellia sinensis var. assamica cv. Yunkang10 86649 bp 26083 bp 18285 bp 26083 bp rps19 trn N ndhF ycf1 trn H 233 bp 46 bp 1382 bp 56 bp 1068 bp 1bp 4559 bp 9 bp 112 bp 1349 bp Ψ ycf1 trn N rpl2 LSC SSC IRb IRa LSC Camellia sinensis var. pubilimba 86678 bp 26071 bp 18266 bp 26071 bp rps19 trn N ndhF ycf1 trn H 233 bp 1349 bp 52 bp 5 bp 1037 bp 1bp 13 bp 4579 bp 1381bp 106 bp Ψ ycf1 trn N rpl2 LSC SSC IRb IRa LSC Camellia grandibracteata 86656 bp 26093 bp 18285 bp 26093 bp rps19 trn N ndhF ycf1 trn H 233 bp 46 bp 1381 bp 56 bp 4559 bp 1069 bp 1bp 1381 bp 106 bp trn N rpl2 LSC IRb SSC IRa LSC Camellia taliensis 86572 bp 25933 bp 18436 bp 25933 bp rps19 trn N ndhF ycf1 trn H 233 bp 56 bp 1069 bp 46 bp 1381 bp 4541 bp 1 bp 1355 bp 106 bp trn N rpl2 LSC IRb SSC IRa Camellia impressinervis 86578 bp 26041 bp 18232 bp 26041 bp LSC rps19 trn N ndhF ycf1 trn H 5 bp 233 bp 1355 bp 1043 bp 1bp 46 bp 4579 bp 106 bp 2 bp 1375 bp Ψ ycf1 trn N rpl2 LSC IRb SSC IRa LSC Camellia pubicosta 86649 bp 26074 bp 18279 bp 26074 bp rps19 trn N ndhF ycf1 trn H 233 bp 46 bp 1375 bp 56 bp 4553 bp 1069 bp 1bp 2 bp 1381 bp 106 bp Ψ ycf1 trn N rpl2 IRb SSC LSC IRa LSC Camellia azalea 86674 bp 26042 bp 18281 bp 26042 bp rps19 trn N ndhF ycf1 trn H 233 bp 1381 bp 1069 bp 46 bp 56 bp 4553 bp 1bp 1355 bp 106 bp trn N rpl2 LSC IRb SSC IRa LSC Camellia pitardii 86212 bp 26057 bp 18259 bp 26057 bp rps19 trn N ndhF ycf1 trn H 233 bp 1355 bp 46 bp 5 bp 4573 bp 1043 bp 1bp 4 bp 1361 bp 106 bp Ψ ycf1 trn N rpl2 LSC IRb SSC IRa LSC Camellia reticulata 86605 bp 26066 bp 18234 bp 26066bp rps19 trn N ndhF ycf1 trn H 233 bp 1361 bp 46 bp 4567 bp 1049 bp 1bp 39 bp 2241 bp 1275 bp 106 bp trn N rpl2 LSC IRb SSC IRa Camellia crapnelliana 86655 bp 25968 bp 18406 bp 25968 bp LSC rps19 trn N ndhF ycf1 trn H 233 bp 46 bp 1275 bp 68bp 963 bp 1bp 4659bp 1341 bp 100 bp trn N rpl2 LSC IRb SSC IRa LSC Camellia cuspidata 86294 bp 26025 bp 18302 bp 25999 bp rps19 trn N ndhF ycf1 trn H 233 bp 46 bp 1341 bp 1069 bp 1bp 56 bp 4541 bp 13 bp 1381 bp 106 bp Ψ ycf1 trn N rpl2 LSC IRb SSC IRa Camellia petelotii 86659 bp 26090 bp 18282 bp 26090 bp LSC rps19 trn N ndhF ycf1 trn H 233 bp 46 bp 1341 bp 56 bp4553 bp 1069 bp 1bp 106 bp 1381 bp trn N rpl2 LSC IRb SSC IRa Camellia yunnanensis 86236bp 26032 bp 18318bp 26006bp LSC rps19 trn N ndhF ycf1 trn H 233 bp 46 bp 1381 bp 56 bp 1069 bp 1bp 4553 bp Fig. 4 Comparison of IR boundary regions among the 17 Camellia chloroplast genomes, using C. sinensis var. sinensis as the reference. Boxes above or below the line are forward strands and reverse strands, respectively Peng et al. BMC Ecol Evo (2021) 21:71 Page 9 of 17

a Chinese cultivated type Assam cultivated type Wild Type Pi I I n n n n n n n 2 1 n n 3 2 3 0 6 4 6 2 5 8 2 1 9 4 6 F B C aI l2 f2 f1 f4 f3 pI aJ bI bJ s2 hJ s7 s8 s3 s4 tL tB on on tK on tN tD tG tA fA cL pF pP hF aC aB bE bL pE bZ bT nI cD bC bB aA pB oB pH pA bD bN bH bA bK oA l2 l3 l3 l2 l3 l1 l1 l2 ro ro ro ro ro ro ro ro bM s1 s1 s1 s1 s1 s1 s1 at tr tr rp tr ps onI oC oC rp rp ps rp rp ps rp yc ps at yc cl yc yc in pe at ndh I f2_2 pe at nd rb at pe pe pe psb pe at ro ccsA ps ps rp ps ps ps ps ac rp ps ps ps nd nt nt nt nt nt nt nt ps rp rp ndhE rp rp ps ps rp rp rp rp ps ps nt ndh ndh ma ndhD ndhG ndhA ndhH rp rp rp rp rp ndhK rp rp ps in cemA rp rp nt ntr yc _i _i 6_i 3_i 1_ in 6_ in _i _i A_ l2_intro pP_i pF_i hB_i l1 AC_i s1 AU AA f3 rp oC pP cl ycf at rp nd rp ndh cl yc -G rp V-U nI nL -U tr tr trn

b Chinese cultivated type Assam cultivated type Wild Type Pi I I I J J 9 8 2 0 6 3 2 5 6 1 8 7 4 5 2 4 F L L E E L C C B B C C C B C C B C G D G D A U A A U U A G U U H A U G N H U A A A A A A D l2 l2 f4 f1 M s3 s7 23 11 12 14 33 12 32 15 16 16 23 fA tp oB C2 sa sb 4. 5 4.5 rn tp dh I rn et tp AC CG AU AU AU et sb F sb sb T sb tp sa sb Z n2 sb sb N sb sb K sb A dh E oC poA dh G dhK dh H dh D n clpP yc in rn rn rn em rp psaB CC A UC GC rn CA UG GA UG GU UU GC CA AC UG GU GU UA GA GG UG GU GG UG GU rr _r ndh A _rps _p _a _n _rbc _p _ndh -A rr _pet _pet _pet _atp _p -U UG -U AG _p _p _p _p _p sa _rpl _p 9_rp 3_rp _rpl _rpl 23 _p _p _n _rpl _p _p _n dh _n dh _r _r _r 6_rp _n _rps _ _n 2_rp _n 2_ 5_ 6_ _c _r .5 1_ P- L-CA F- S- S- E- C- G- T- T- L- R- 4_rpl1 8_rpl2 1_rpl3 V-UA Y- N- H- N- 2_rps1 3_rps1 0_rps1 nI-C fA pH_a W- M- nI-G bJ _p hB_rps tB l2 l2 bE_p et aJ_rpl U_rps1 s1 l1 s8 s7 nR rnI- nL l3 s7 s1 nV-G bB GA_rps aB nQ f4 AU_ycf AU atpI_rps bN CA CU rnS- hE tr rn rn rn rn rn s2_rpo s14_ rn rn l1 rn rn rn rn mA GA hA psaI_yc rn at l2 l3 l2 rn rn rn rn UG UG GU AU in AA AC_rps AU rn rn tr petA_psb accD_p atpA_a atpF_a rp rp tr petL_p atpB ps clpP_psb ps pe ps bK psaA_ycf 3 rp rp ps rp _t nd rp ps bF ps bL rbcL_acc rp rp rp rnfM-C ps bH petD_r rps1 rps1 ps nd hI nd hG ps ps bT ndhJ _n _t rrn5 rrn4 psaC _t _t yc _t _t oC2_rp ndhF -C _t _t _t -G ps petN_p sb rp nd hD ccsA rp rp _t -C _trnA- _trnD- _trnV- rpoA_rps ce _t _t nd -G -U rp rp rps1 rp rps1 -G AA _t 6_tr ndhH_rps nd -U rpoC -C rrn4.5_ rrn2 3_ -G AC -U GC -G 2_ tr f2 2_ _t 6_ tr rp f3 nI s4_t l2_t l23_ f1 nR-ACG_r nS GC_trnI-GAU bC_t oB_t nI tG n1 nF nI l3 CA nC nS nR UA_trn AA UC nM tr yc UC_t CG nL-C AU_trnA- nT-G s1 AC_t nQ-U nH-G ps bI nV nA tr rrn5_ tr nV-G rp rp yc rp tr tr CC rr tr yc tr tr tr rp psbZ_t ps tr tr tr ndhB rp pe nfM-CA ndhC tr tr rrn23 tr rrn16 tr rps1 tr ps bM rp -U -U GU -G UU -G tr nA-U nI nL nE-U nT nY-G nN nR-A nW-C nD-G nV-U tr tr tr nG-U tr tr tr tr tr tr tr tr tr

Fig. 5 Comparative analysis of nucleotide variability (Pi) values among Chinese cultivated type, Assam cultivated type and wild type. X-axis: the names of protein-coding genes, introns or intergenic regions, Y-axis: nucleotide diversity of each window because the posterior probabilities or bootstrap values and ycf1-Tree, because most bootstrap values or pos- obtained by neighbor-joining (NJ), maximum parsi- terior probabilities were less than 50% for each lineage. mony (MP), Bayesian inference (BI) and maximum likeli- Tese results may be caused by unbalanced sampling. hood (ML) were very high for each lineage. Tese results Te cp-Tree showed some structural variations among suggested that the seven species in clade I were closely the Camellia cp genomes (Fig. 6). Te clade, which was related. All phylogenetic trees proved that CSS was the made up of CSS, CSSL, CSSA, CSA, CSAY, CGR, CPU, closest relative to CSSA and CSSL, and CSA was the clos- CSP and CPE, was characterized by the rps12 intron est relative to CSAY. In particular, in the ycf1-Tree, the deletion, the ψycf1 gene, and the ψycf15 gene (except for posterior probabilities or bootstrap values of these spe- CSA and CSAY). Te other species, except for CRE and cies were lower than those of the complete cp-Tree and CAZ, had lost the ψycf1 gene and the orf42 gene. the SCDS-Tree. Te value of CSSA was less than 50%. Tese results suggested that the ycf1 gene has diverged in Chloroplast genome variation and evolution in cultivated cultivated tea. tea In addition, we found confict among the three trees To explain the changes in the cp genome structure of (Figs. 6, 7 and 8). Te topological structures consisting of the cultivated tea group, we detected single nucleotide the Camellia subgenus (CPI, CRE, CAZ, CCR, and CYU) polymorphism (SNP) and insertion/deletion (indel) in and the Tea subgenus (CPE, CIM, CTA and CCU) were the cp genome of cultivated tea. In the Chinese culti- poorly supported by the complete cp-Tree, SCDS-Tree vated type, after comparing the whole cp genome of Peng et al. BMC Ecol Evo (2021) 21:71 Page 10 of 17

100/100/1/100 Camellia sinensis var. assamica 100/100/1/100 Pseudo ycf1 Camellia sinensis var. assamica cultivar Yunkang 100/100/1/100 Pseudo ycf15 Camellia grandibracteata

orf42 100/100/1/100 Camellia pubicosta rps12 intron 91/95/1/97 Camellia sinensis var. sinensis subgen. Thea 100/100/1/100 100/100/1/100 Camellia sinensis cultivar Longjing

71/-/0.9/- Camellia sinensis cultivar Anhua Camellia sinensis var. pubilimba Camellia petelotii

-/54/0.7/- 98/90/1/97 Camellia pitardii subgen. Camellia -/-/1/- Camellia reticulata -/-/0.6/- Camellia impressinervis

-/-/1/- Camellia taliensis subgen. Thea 100/50/1/100 -/-/0.8/- Camellia cuspidata 100/100/1/99 Camellia azalea

Camellia crapnelliana subgen. Camellia Camellia yunnanensis

100/50/1/100 Coffea arabica outgroup Coffea canephora

Fig. 6 The phylogenetic tree of Camellia species based on the complete cp genomes (complete cp-Tree). Cofea canephora and Cofea arabica were selected as the outgroup. Tree were constructed by neighbor-joining (NJ), maximum parsimony (MP), Bayesian inference (BI) and maximum likelihood (ML) with bootstrap values or posterior probabilities above the branches, respectively. Bootstrap values less than 50% are represented by "-". As indicated in the legend at the top left, the unique genes and introns of each species were plotted onto branches using colored squares

three species, 67 SNPs and 46 indels were found. Te was inserted into the IRb/SSC boundary region (Addi- LSC, IRb, SSC and IRa regions contained 43, 3, 13, and tional fle 3: Table S3). 8 SNPs and 37, 2, 5, and 2 indels, respectively (Addi- To have a clear view of the evolution of cultivated spe- tional fle 2: Table S2). Most of the SNPs and indels cies, we used their 80 shared protein coding genes to cal- were located in the noncoding region (IGS and intron). culate their nonsynonymous nucleotide substitution (Ka) Tere were 39 SNPs and 41 indels in this region, while rates, synonymous nucleotide substitution (Ks) rates and 28 SNPs and 5 indels were found in the protein cod- Ka/Ks ratio. First, we compared CSS and its cultivated ing region. Te two ycf1 genes, which are located at species. Te results showed that only 16 protein coding the junction of SSC and IRa, contained the most SNPs genes had synonymous or nonsynonymous mutations and indels, 6 and 2, respectively. For the photosynthetic (Fig. 9, Additional fle 4: Table S4). Among them, there genes, psbC, ndhD, ndhF and ndhH presented SNP var- were nonsynonymous mutations in matK, rps16, rpoC2, iations, while the psbI gene presented indel variation. rpoB, accD, clpP, rps8, ycf1, ndhD, ndhH and rps15. Te For the 14 sequences with higher Pi values in cultivated genes with the highest rate of nonsynonymous mutations species than in wild species, trnV-GAC_rps12 and were rps16, rps8 and rps15. Tere were synonymous ndhG_ndhI contained the most abundant SNPs, with 5 mutations in rpoB, psbC, rps4, ycf4, rpoA and ndhF. Te and 2 respectively (Fig. 5). In the Assam cultivated type, highest mutation rates were rps4, ycf4 and rpoA. Of the after comparing the whole cp genome of two species, 4 80 genes, 79 had a Ka / Ks value of 0, and only rpoB, had indels were found, but no SNPs. All indels were located a Ka/Ks value of 0.3004 < 0.5, suggesting very strong puri- in the IGS region. In particular, a long sequence (77 bp) fying selective pressure. Ten, we compared CSA and its Peng et al. BMC Ecol Evo (2021) 21:71 Page 11 of 17

100/100/1/100 Camellia sinensis var. assamica 100/100/1/100 Camellia sinensis var.assamica cv. Yu nkang10 100/100/1/100 Camellia grandibracteata Camellia pubicosta 90/100/1/100 subgen. Thea 100/98/1/99 Camellia sinensis var. sinensis 100/100/1/100 100/100/1/100 Camellia sinensis var. sinensis cv. Longjing43 Camellia sinensis var. sinensis cv. Anhua 51/-/1/62 Camellia sinensis var. pubilimba

-/-/0.2/65 Camellia reticulata subgen. Camellia -/-/0.2/54 Camellia petelotii subgen. Thea Camellia azalea 100/50/1/100 subgen. Camellia -/-/0.5/- Camellia pitardii -/-/0.3/51 Camellia cuspidata -/-/1/64 Camellia taliensis subgen. Thea -/-/0.3/- Camellia impressinervis -/-/0.5/- Camellia crapnelliana subgen. Camellia Camellia yunnanensis

100/50/1/100 Coffea arabica outgroup Coffea canephora Fig.7 The phylogenetic tree of Camellia species based on the all shared coding protein genes among all species (SCDS-Tree). Cofea canephora and Cofea arabica were selected as the outgroup. Tree were constructed by neighbor-joining (NJ), maximum parsimony (MP), Bayesian inference (BI) and maximum likelihood (ML) with bootstrap values or posterior probabilities above the branches, respectively. The bootstrap values less than 50% are represented by "-"

99/95/1/99 Camellia sinensis var. assamica 87/97/1/99 Camellia sinensis var. assamica cv. Yunkang10 97/98/1/99 Camellia grandibracteata

-/-/0.7/- Camellia pubicosta 95/84/1/92 Camellia sinensis var. sinensis subgen. Thea 61/63/0.7/- -/-/0.8/- Camellia sinensis var. sinensis cv. Longjing43 Camellia sinensis var. sinensis cv. Anhua Camellia sinensis var. pubilimba

-/-/0.9/- Camellia taliensis subgen. Thea -/-/0.3/- Camellia azalea subgen. Camellia -/50/1/100 Camellia impressinervis subgen. Thea -/-/0.6/- -/-/0.5/- Camellia pitardii -/-/0.5/- subgen. Camellia -/-/0.1/- Camellia reticulata Camellia cuspidata subgen. Thea

-/-/0.5/- -/-/0.2/- Camellia crapnelliana subgen. Camellia Camellia petelotii subgen. Thea Camellia yunnanensis subgen. Camellia 100/50/1/100 Coffea arabica outgroup Coffea canephora Fig. 8 The phylogenetic tree of Camellia species based on the ycf1 gene (ycf1-Tree). Cofea canephora and Cofea arabica were selected as the outgroup. Tree were constructed by neighbor-joining (NJ), maximum parsimony (MP), Bayesian inference (BI) and maximum likelihood (ML) with bootstrap values or posterior probabilities above the branches, respectively. The bootstrap values less than 50% are represented by "-" Peng et al. BMC Ecol Evo (2021) 21:71 Page 12 of 17

C. sinensis var. sinensis cv. Anhua vs C. sinensis var. sinensis a 0.006 C. sinensis var. sinensis cv. Longjing43 vs C. sinensis var. sinensis C. sinensis var. sinensis cv. Anhua vs C. sinensis var. sinensis cv. Longjing43 0.005

0.004

Ka 0.003

0.002

0.001

0 2 3 2 4 6 6 3 0 9 6 5 1 4 8 2 atpI rpl2 ycf2 ycf1 ycf4 infA ycf3 psbI psaI rps7 rps2 rps3 rps8 rps4 petL ndhI petB petA psbJ psaJ clpP ccsA atpA atpF petN atpE atpB rbcL petD psbZ ndhJ psbL petG psbT atpH rpoB psbA rpoA psaB psaA psbF psbE psbB psbK psaC psbC psbN ndhB ndhF ndhE ndhA psbD accD psbH rpl2 rpl2 rpl3 matK rpl1 rpl1 ndhK ndhC rpl3 rpl3 rpl2 ndhD ndhG ndhH psbM cemA rps1 rps1 rps1 rps1 rps1 rps1 rps1 rpoC2 rpoC1

b 0.009 0.008 0.007 0.006 0.005 Ks 0.004 0.003 0.002 0.001 0 2 1 6 4 6 2 3 2 3 0 1 9 5 6 4 8 2 atpI infA rpl2 ycf2 ycf1 ycf3 ycf4 psbI psaI rps2 rps8 rps3 rps7 rps4 petL psaJ clpP petB ndhI petA psbJ atpA atpF petN ccsA atpE atpB rbcL psbZ ndhJ petG psbT petD psbL atpH rpoB psbB rpoA psbA psaB psaA psbF psbE psbK psbC psbN psaC psbD psbH ndhB ndhF ndhE ndhA accD rpl3 rpl1 rpl1 rpl2 rpl2 rpl3 matK ndhK ndhC rpl3 rpl2 cemA psbM ndhD ndhG ndhH rps1 rps1 rps1 rps1 rps1 rps1 rps1 rpoC rpoC LSC IR SSC Fig. 9 Nonsynonymous nucleotide substitution (Ka) and synonymous nucleotide substitution (Ks) of homologous protein-coding genes from C. sinensis var. sinensis, C. sinensis var. sinensis cv. Longjing43 and C. sinensis var. sinensis cv. Anhua

cultivated species. However, no protein coding genes had and GC content) (Table 2, Figs. 2 and 3). For other land synonymous or nonsynonymous mutations, suggesting plant species, such as peanuts, cherries and radishes, the very strong purifying selective pressure (Additional fle 5: cp genome size and structure, as well as the gene content Table S5). and order, are highly conserved among the cultivated and Te site specifc selection events of 16 genes with syn- wild species [29–31]. onymous or non-synonymous mutations were analyzed We found that the IR regions of cultivated tea had by Bayesian Empirical Bayes (BEB), and we found that expanded or contracted. Te IR length of the CSSA and some amino acid sites of ycf1 and rps15 exhibited site- CSSL was approximately 20 bp smaller than that of the specifc selection (Additional fle 6: Table S6). In ycf1, CSS, accounting for 32% of the diference in the complete there were six sites under positive selection, and in rps15, genome length. Te IR length of the CSAY was approxi- there was one site under positive selection. For example, mately 30 bp larger than that of the CSA, accounting for in the rps15 gene, the codon ACC (threonine) of CSS was 42% of the diference in the complete genome length mutated to AAC (asparagine) in two cultivated species. (Fig. 4). In fact, the contraction and expansion of IRs is considered to be one of the important reasons for the Discussion cp genome length variation [32]. Further SNP and indel Understanding the genetic variation between cultivated analysis showed that ycf1 and trnV-GAC​_rps12 changed and wild species is crucial for introducing interesting in the Chinese cultivated type, while trnN-GUU_​ ndhF traits from wild species into cultivars [26]. Organelle and rrn5_trnR-ACG​ changed in the Assam cultivated genome sequencing has proven to be an efective way to type. In CSS and CSSL, a 9 bp sequence (TCC​TTC​TTC/ resolve phylogenetic relationships among closely related GAA​GAA​GGA) was inserted into the ycf1 gene (Addi- species [27, 28]. Here, we constructed and compared the tional fle 2: Table S2). Tis is suggested that ycf1 is one of complete cpDNA genome sequences of three cultivars the important reasons for the expansion or contraction of and fourteen wild species of Camellia. At the genomic the IRs of the Chinese cultivated type. Te same results level, cultivated species were more conserved than wild were also found in Zheng’s study [33]. He analyzed the species, in terms of both architecture and linear sequence cp genome length variation in 272 species and found that order (the length, genes number, genes arrangement, atpA, accD and ycf1 accounted for 13% of the diference Peng et al. BMC Ecol Evo (2021) 21:71 Page 13 of 17

in length. Terefore, ycf1, which is associated with plant 0.3004. In addition, some amino acids of ycf1 and rps15 survival, may play a key role in the cp genome size vari- exhibited site-specifc selection (Additional fle 4: Tables ations of cultivated tea. In CSAY, a 77 bp sequence was S4 and Additional fle 6: S6). rpoB is crucial for genetic inserted into the trnN-GUU_ndhF region (IRb/SSC information transmission, and it afects the transcrip- boundary region) (Additional fle 3: Table S3). Tis is the tion of DNA into RNA and the translation of RNA into main reason for the expansion or contraction of the IRs protein. Tey were also found to be under selective pres- of the Assam cultivated type. sure in beverage crops [13]. Te rps15 gene has a func- In addition to the variations in genome size, there were tion in chloroplast ribosome subunits [35]. ycf1, encoding also some nucleotide mutations in the cultivated spe- a component of the chloroplast’s inner envelope mem- cies. In this study, the nucleotide diversity of cultivated brane protein translocon, is one of the largest plastid tea was lower than that of wild tea (Fig. 5), but the unbal- genes [13], and it is also essential for almost all plant anced sampling between the 14 wild tea and 3 cultivated lineages [36]. Tese positively selected genes may have tea may lead to nucleotide diversity diference of cpDNA played key roles in the adaptation of cultivated tea to var- fragments. Te nucleotide diversity comparison of 358 ious environments. cultivated rice and 54 wild rice also presented similar Generally, the deletion or insertion of amino acids in results [34]. Nevertheless, we found that the nucleotide the encoded protein will afect the structure and func- diversity of 14 sequences in the Chinese cultivated tea tion of this gene [37–39]. In the Chinese cultivated type, was higher than that of wild tea (rps16, rps4, trnL-UAA​ 16 protein coding genes had nucleotide substitutions, _intron, rps4_trnT-UGU​, ndhC_trnV-UAC,​ cemA_petA, among which the ycf1 gene had the largest number of rpl33_rps18, psbN_psbH, rpl36_infA, rpl14_rpl16, nucleotide substitution. At the same time, in ycf1, fve rps7_rps12, ndhG_ndhI, trnV-GAC​_rps12, and amino acid sites exhibited site-specifc selection, and a rps12_rps7) (Fig. 5). Tese sequences suggested the vari- 9 bp sequence insertion was found in CSSA (Additional ations in the cp genomes of cultivated tea, and they are fle 4: Table S4 and Additional fle 6: S6, Fig. 9). potential molecular markers for distinguishing Camellia ycf1 has an open reading frame of unknown function, species and for the phylogenetic analysis of Camellia. but some studies have inferred that ycf1 is very important Previous studies have proven that human interfer- for plant survival [33, 40]. In tobacco, a chimeric gene ence had efects on the genetic structure, leaf nutrients conferring resistance to aminoglycoside antibiotics has and pollen morphology of Camellia. Yan et al. analyzed been transferred into ycf1 in the cp genome. Ten, the the genetic relationship of fve semi-wild tea which due plantlets were cultured in plant regeneration medium to lack of human management for a long time were stud- containing the antibiotic spectinomycin. After that, the ied by using genome-wide SNP. Tey found that human maintenance of a fairly constant ratio of wild-type ver- interference will afect the genetic structure of tea. After sus transformed genome copies was found. However, the the human interference stopped, the tea from fve dif- wild-type genome was still present in all samples whereas ferent geographical regions could be divided into three the transplastomic fragments were missing from several diferent groups because of the absence of free pollina- samples after culturing in antibiotic-free medium. Tis tion [22]. Xiong et al. made comparative analyses of the experiment proved that ycf1 encodes products that are nutrient content in the leaves of cultivated and wild C. essential for cell survival. ycf1 is also an important molec- nitidissima. Tey found that cultivated C. nitidissima ular marker of plants [41, 42], because it has higher vari- had signifcantly higher contents of essential amino acids ability than other known cp molecular markers (such as (26.05%) and total amino acids (33.27%) than wild C. the widely used rbcL and matk genes), for both the total nitidissima [23]. Shu et al. proved that there are obvious number of parsimony informative characters and the diferences in pollen morphology and exine morphology percent variability. between cultivated and wild species of Camellia [24]. Phylogenetic analysis of cultivated and wild tea showed Terefore, to explore specifc evolutionary characteristics that CSSA and CSSL were closely related to the CSS, and between cultivated tea and its wild relatives, we subse- CSAY was closely related to CSA (Figs. 6 and 7), which quently performed evolutionary research on cultivated supports the previous fnding that most of the cultivated tea. tea originated directly from CSS and CSA [43]. However, First, to have a clear view of the cp genomic adaptive in the ycf1-Tree, the posterior probabilities or bootstrap evolution of cultivated tea, we performed evolution- values of the cultivated tea branch were lower than that ary analysis on the protein-coding sequences. Te Ka/ of the complete cp-Tree and the SCDS-Tree, which sug- Ks ratio is very useful for measuring selective pressure at gested that the ycf1 gene has diverged in cultivated tea the protein level [35]. In the Chinese cultivated type, Ka/ (Figs. 6, 7 and 8). Similar results have been found in Cory- Ks value of 79 genes was 0, and only rpoB had a value of lus [44]. Te ycf1 gene of Corylus chinensis and Corylus Peng et al. BMC Ecol Evo (2021) 21:71 Page 14 of 17

avellana have a similar evolutionary history, which is dif- compared in these species. Genomic variation analyses ferent from that of Corylus heterophylla. Tis evolution showed that the cultivated species were more conserved of cultivated plants may be related to the utilization ef- than the wild species in terms of architecture and linear ciency of photosynthesis. Photosystem biogenesis regu- sequence order. In the Assam cultivated type, the varia- lator 1 (PBR1), the RNA binding protein encoded by the tion in the chloroplast genome was mainly manifested by nuclear genome, can improve the translation efciency sequence insertion of IGS regions. In the Chinese culti- of ycf1 in the Arabidopsis thaliana cp genome. Addition- vated type, the variation in the chloroplast genome was ally, the symbiosis and stability maintenance of the three mainly manifested by the nucleotide polymorphism and photosynthetic complexes are regulated [45]. However, at sequence insertion of some sequences. Tese nucleotide present, the efect of mutations in the single amino acid polymorphisms also led to the mutation of amino acid site and the insertion or deletion of the short sequence sites in some genes, among which ycf1 was the gene with on the function of ycf1 is still not clear, and cultivated the most mutation sites. In addition to amino acid muta- tea may provide important materials for this kind of tions, there was a 9 bp base insertion in the ycf1 gene. research. ycf1 is believed to be a critical gene for plant survival, and In the phylogenetic trees, CSS, CSA, CGR and CPU it may infuence photosynthesis and be related to plant formed a monophyletic clade with 100% bootstrap val- adaptation. Evolutionary processes analyses showed that ues. CSS, CSA and CGR were classifed into the sect. CSA and its cultivated species were tightly clustered, Tea, but CPU was classifed into the sect. Corallina while CSS and its cultivated species were not tightly clus- (Table 2). Tis indicates that CPU and sect. Tea plants tered. Te evolutionary relationship between CSS and have close genetic relationship. It also supports the result CSSL was closer than that with CSSA in the ycf1-Tree. of Huang’s research [18]. However, CTA belongs to sect. However, at present, the efect of the mutation in the sin- Tea, together with two species of sect. Archecamel- gle amino acid site and insertion or deletion of the short lia and one species of sect. Teopsis that were located sequence on the function of ycf1 are still not clear, and in another clade, which indicates that the phylogenetic cultivated tea may provide important materials for this direction of CTA is diferent from that of the other sect. kind of research. Tea species. CTA is often considered to be a wild rela- tive of cultivated tea [43]. Both are monoecious, insect- Methods pollinated and outcrossing species. However, there are Genomic materials collection of cultivated tea diferences in their morphological characters. For exam- Te complete cp genome of CSSA has been presented ple, CTA has the features of 5-locule ovaries and large and annotated in our previous study [14] with GenBank sepals and petals, whereas CSS has features of 3-locule accession number MH042531. Meanwhile, we searched ovaries and small sepals and petals [46, 47]. Based on the in the National Center for Biotechnology Information evidence of the chloroplast genome, we hypothesized (NCBI) dataset to fnd the published cultivated tea’s that CTA and CSS have diferent genetic polymorphism. complete cp genomes, and only CSSL and CSAY with In this study, CIM and CPE were not clustered into the accession numbers KF562708 and MH019307 have been same branch. Te of CIM is controversial. published [17]. Gene map of the three cultivated tea was CIM and CPE were classifed into the sect. Archecamel- generated using BRIG [49]. lia by Ming et al. [47], while Chang et al. [46] classifed CIM into the sect. Chrysantha. Terefore, we infer that Comparative analysis between cultivated tea and wild tea it is not acceptable to combine the sect. Archecamellia Te Basic Local Alignment Search Tool (BLAST) was and the sect. Chrysantha. In the subgenus Camellia, CPI used to fnd closely related cp genomes of CSSA in and CRE formed a clade, as did CAZ and CCR, and the NCBI. After the cp genome of Camellia was screened, bootstrap value was 97–100%. Among them, CPI, CRE 17 Camellia cp genomes with sampling information and CAZ are all sect. Camellia plants, while CCR is clas- remained, including 3 cultivated species (CSSA, CSSL sifed into sect. Heterogenea [47] or sect. Furfuracea [46]. and CSAY) and 14 wild species (Table 2). Previous stud- However, both morphological and molecular characteris- ies have shown that both CSSA and CSSL originated tics indicate that CCR is closely related to some plants in directly from CSS, while CSAY originated directly from sect. Camellia [48]. CSA [43, 49]. Terefore, we used CSS and CSA as the reference sequence to study the genomic variations and Conclusion evolution direction between cultivated tea and wild tea. In this work, the complete cp genomes of three culti- Tree methods were used for comparative genomic vated species and 14 wild species of Camellia were stud- analysis: (I) Te comparison of the cp genomic sequence ied. Genomic variation and evolutionary processes were identity was based on the method of Li [50] using Peng et al. BMC Ecol Evo (2021) 21:71 Page 15 of 17

mVISTA in Shufe-LAGAN mode and BRIG, respec- Evolutionary analysis of cultivated tea tively. (II) Te comparison of the expansion and contrac- After alignment of the cultivated and wild species, tion of IR regions was presented. First, we annotated and the number and position of SNPs and indels in the extracted the IR boundary of the Camellia cp genomes genomes were presented in DnaSP v6.10.04 according by Plastid Genome Annotator (PGA) [51]. Ten, the IR to the Wu’s method [63]. boundary regions were visualized by using Visio profes- Te Ka and Ks rates as well as the Ka/Ks ratio in the sional 2016. (III) Comparisons based on the Pi values homologous protein-coding genes were used to evalu- of the Chinese cultivated type, Assam cultivated type, ate the adaptive evolution of the cultivated species. After and wild type were performed according to the method aligning each gene using the ClustalW (Codons) program of Njuguna [52]. First, we used annotation information in MEGA7.0, the Ks, Ka and Ka/Ks values of each gene to extract intergenic regions, protein coding genes and were determined according to Dong’s method [64] with intron regions of 17 Camellia species in Tbtools v0.6666 the program from the PAML package [65]. For identifca- [53]. After comparing these sequences, 211 loci shared tion of site-specifc selection, four models, M1 (neutral), among Camellia species were found, including 80 pro- M2 (selection), M7 (beta) and M8 (beta & ω), were used tein coding genes, 117 intergenic regions, and 14 intron in codeml from the PAML package. Te BEB was used to regions. Each loci was divided into three datasets: (I) the calculate the posterior probabilities for site classes. Only sequences consisted of the Chinese cultivated type, (II) sites with posterior probabilities > 0.9 were selected. the sequences consisted of the Assam cultivated type; (III) the sequences consisted of wild type. Each sequence Abbreviations was aligned using clustal alignment with default set- BEB: Bayesian Empirical Bayes; BI: The Bayesian inference; BRIG: Blast Ring tings in MEGA7.0 [54]. Te Pi of these regions was cal- Image Generator; CAZ: Camellia azalea; CCR​: Camellia crapnelliana; CCU​ : Camellia cuspidate; CDS: Protein-coding regions; CGR​: Camellia grandibrac- culated using DnaSP v6.10.04 [55] to show divergence at teata; CIM: Camellia impressinervis; cp: Chloroplast; CPE: Camellia petelotii; sequence level. CPI: Camellia pitardii; CPU: Camellia pubicosta; CRE: Camellia reticulate; CSA: Camellia sinensis var. assamica; CSSA: Camellia sinensis var. sinensis cv. Anhua; CSSL: Camellia sinensis var. sinensis cv. Longjing43; CSP: Camellia sinensis var. Phylogenetic analysis of Camellia pubilimba; CSS: Camellia sinensis var. sinensis; CTA:​ Camellia taliensis; CYU​ : Camellia yunnanensis; IGS: Intergeneric regions; Indel: Insertion/deletion; IR: Tree datasets were used to construct the following phy- Inverted repeat; Ka: Nonsynonymous nucleotide substitution; Ks: Synonymous logenetic trees of Camellia: (I) the complete cp genomes, nucleotide substitution; LSC: Large single copy region; ML: The maximum like- (II) the all shared protein coding genes among all species lihood; MP: The maximum parsimony; NCBI: National Center for Biotechnology Information; NJ: The neighbor-joining; PBR1: Photosystem biogenesis regula- (SCDS), and (III) ycf1 gene sequences. First, all datasets tor 1; PGA: Plastid Genome Annotator; Pi: Nucleotide diversity; SNP: Single were aligned using MAFFT v7.380 [56] under the FFT- nucleotide polymorphism; SSC: Small single copy region. NS-2 default setting. Te alignments were used for phy- logenetic analysis. After that, according to the method Supplementary Information described by Xie et al. [57] and Zhang et al. [58], we The online version contains supplementary material available at https://​doi.​ used four methods to construct phylogenetic trees: NJ org/​10.​1186/​s12862-​021-​01800-1. method, MP method, BI method and ML method. Cof- fea canephora and Cofea arabica were selected as the Additional fle 1: Table S1. Comparative analysis of nucleotide variability outgroup. (Pi) values among the Chinese cultivated type, Assam cultivated type and wild type Te NJ analysis was reconstructed via MEGA7.0 [54] Additional fle 2: Table S2. Single nucleotide polymorphism (SNP) and under the default settings with 1000 bootstrap values. insertion/deletion (indel) information from comparisons among C. sinensis Te MP analysis was performed in PAUP 4.0a167 [59] var. sinensis, C. sinensis var. sinensis cv. Longjing43 and C. sinensis var. with heuristic searches with 1000 bootstrap replicates. sinensis cv. Anhua Te BI analysis was performed with Mrbayes 3.2.7 [60] Additional fle 3: Table S3. Single nucleotide polymorphism (SNP) and insertion/deletion (indel) information from comparisons between C. sinen- under the best substitution models and parameters. Te sis var. assamica and C. sinensis var. sinensis assamica cv. Yunkang10 analysis parameters were set as four chains that were run Additional fle 4: Table S4. Nonsynonymous nucleotide substitution (Ka) simultaneously for 10,000,000 generations or until the and synonymous nucleotide substitution (Ks) rates, as well as the Ka/Ks average standard deviation of the split frequencies fell ratio of homologous protein-coding genes from C. sinensis var. sinensis, C. below 0.01. Te best substitution models and parameters sinensis var. sinensis cv. Longjing43 and C. sinensis var. sinensis cv. Anhua were computed by jmodeltest 2.1.7 [61]. Te ML analy- Additional fle 5: Table S5. Nonsynonymous nucleotide substitution (Ka) and synonymous nucleotide substitution (Ks) rates, as well as the Ka/Ks sis was carried out in IQ-TREE [62] using the default set- ratio of homologous protein-coding genes from C. sinensis var. assamica tings, with 1000 bootstrap values for tree evaluation. Te and C. sinensis var. sinensis assamica cv. Yunkang10 best substitution models were computed by IQ-TREE. Additional fle 6: Table S6. Positive selection sites identifed among 16 All the best substitution models mentioned earlier were genes with synonymous or nonsynonymous mutations listed in Additional fle 7: Table S7. Peng et al. BMC Ecol Evo (2021) 21:71 Page 16 of 17

5. Aharoni A, Giri AP, Verstappen FWA, Bertea CM, Sevenier R, Sun Z, Additional fle 7: Table S7. The best substitution models in the phyloge- Jongsma MA, Schwab W, Bouwmeester HJ. Gain and loss of fruit favor netic analysis of Camellia compounds produced by wild and cultivated strawberry species. Plant Cell. 2004;16(11):3110–31. 6. Vijayan K, Zhang W-J, Tsou C-H. Molecular taxonomy of Camellia Acknowledgements () inferred from nrITS sequences. Am J Bot. 2009;96(7):1348–60. We sincerely appreciate Dr. Huang Hui—from Kunming Institute of Botany— 7. Yang J-B, Yang S-X, Li H-T, Yang J, Li D-Z. Comparative chloroplast for providing us with the samples collection information of Camellia species. genomes of Camellia species. PLoS ONE. 2013;8(8):e73053. We also thank Chen Yi for the help during the analysis process. 8. Gao JY, Cliford RP, Du YQ. Collected species of the genus Camellia—an illustrated outline. Zhejiang: Zhejiang Science and Technology Press; Authors’ contributions 2005. JP and ZXG conceived the study. All authors collected feld samples. MD, SLQ, 9. Moore MJ, Soltis PS, Bell CD, Burleigh JG, Soltis DE. Phylogenetic analysis ZHY, XZF analyzed the fnal data. YZL and ZXG acquired funds for this study. of 83 plastid genes further resolves the early diversifcation of . JP wrote the original manuscript, and all authors have read and approved the Proc Natl Acad Sci USA. 2010;107(10):4623–8. manuscript. 10. Wachira F, Tanaka J, Takeda Y. Genetic variation and diferentiation in tea (Camellia sinensis) germplasm revealed by RAPD and AFLP variation. J Funding Hortic Sci Biotechnol. 2001;76(5):557–63. This study was supported by the Key Projects of National Forestry and 11. Jianfei Z. , vol. 12. Beijing: Flora of China Editorial Commit- Grassland Bureau (201801), Forestry Science and Technology Project of tee; 2007. Hunan Province (XLK201920), Natural Science Foundation of Hunan Province 12. Ming TL. A systematic synopsis of the genus Camellia. Acta Bot Yun- (2019JJ50027), Postgraduate Scientifc Research Innovation Project of Hunan nanica. 1999;21(2):3–5. Province (CX20200711) and Scientifc Innovation Fund for Post-graduates of 13. Wei C, Yang H, Wang S, Zhao J, Liu C, Gao L, Xia E, Lu Y, Tai Y, She G, et al. Central South University of Forestry and Technology (CX20201010). The fund- Draft genome sequence of Camellia sinensis var. sinensis provides insights ing bodies played no role in the design of the study and collection, analysis, into the evolution of the tea genome and tea quality. Proc Natl Acad Sci and interpretation of data and in writing the manuscript. USA. 2018;115(18):E4151–8. 14. Dong M, Liu S, Xu Z, Hu Z, Ku W, Wu L. The complete chloroplast genome Availability of data and materials of an economic plant, Camellia sinensis cultivar Anhua, China. Mitochon- Raw sequences data of CSSA were submitted to National Center for Biotech- drial DNA B. 2018;3(2):558–9. nology Information (NCBI) database with accession number MH042531. Other 15. Chen SH, Deng YS, Gong ZH, Zhu HY. Analysis of the evolution of genomic data mentioned in the article can be accessed from NCBI and the technological innovation model of Anhua Dark Tea Industry. J Agric. details of accession number has been provided in Table 2. 2015;5(6):96–101. 16. Daniell H, Lin CS, Yu M, Chang WJ. Chloroplast genomes: diversity, evolu- Declarations tion, and applications in genetic engineering. Genome Biol. 2016;17:134. 17. Ye XQ, Zhao ZH, Zhu QW, Wang YY, Lin ZX, Ye CY, Fan LJ, Xu HR. Entire Ethics approval and consent to participate chloroplast genome sequence of tea (Camellia sinensis cv. Longjing 43): a Not applicable. molecular phylogenetic analysis. J Zhejiang Univ. 2014;40(4):404–12. 18. Huang H, Shi C, Liu Y, Mao S-Y, Gao L-Z. Thirteen Camellia chloroplast Consent for publication genome sequences determined by high-throughput sequencing: Not applicable. genome structure and phylogenetic relationships. BMC Evol Biol. 2014;14:151. Competing interests 19. Xu X, Zheng W, Wen J. The complete chloroplast genome of the long The authors declare that they have no competing interests. blooming and critically endangered Camellia azalea. Conserv Genet Resour. 2018;10(1):5–7. Author details 20. Yang JB, Li DZ, Li HT. Highly efective sequencing whole chloroplast 1 Hunan Research Center of Engineering Technology for Utilization of Environ- genomes of angiosperms by nine novel universal primer pairs. Mol Ecol mental and Resources Plant, Central South University of Forestry and Technol- Resour. 2014;14(5):1024–31. ogy, Changsha 410004, Hunan, People’s Republic of China. 2 Hunan Provincial 21. Zeng CX, Hollingsworth PM, Yang J, He ZS, Zhang ZR, Li DZ, Yang JB. Key Lab of Dark Tea and Jin‑Hua, Hunan City University, Yiyang 413000, Hunan, Genome skimming herbarium specimens for DNA barcoding and phy- People’s Republic of China. 3 Key Laboratory of National Forestry and Grass- logenomics. Plant Methods. 2018;14:43. land Administration on Management of Western Forest Bio‑Disaster, College 22. Yan G, Da-he Q, Chun Y, Yan L, Zheng-wu C, Juan C. Genetic diversity of Forestry, Northwest A & F University, Yangling 712100, , People’s of old tea plant resources in Jiuan City of Province, using Republic of China. genome-wide SNP. J Plant Genet Resour. 2019. https://​doi.​org/​10.​1186/​ s12870-​019-​1917-5. Received: 7 February 2020 Accepted: 22 April 2021 23. Xiong Z, Qi X, Wei X, Chen Z, Tang H, Chai S. Nutrient composi- tion in leaves of cultivated and wild . Pak J Bot. 2012;44(2):635–8. 24. Shu JL, Chen L, Wang HS, Wang PS, Xu M, Song WX. Pollen morphology, ultrastructure and evolution of tea plant and other genus Camellia plants. References J Tea Sci. 1998;18(1):6–15. 1. Zohary D. Domestication of crop plants. In: Levin SA, editor. Encyclopedia 25. Zhao DW, Yang JB, Yang SX, Kato K, Luo JP. Genetic diversity and domes- of biodiversity. New York: Elsevier; 2001. p. 217–27. tication origin of tea plant Camellia taliensis (Theaceae) as revealed by 2. Denison RF, Kiers ET, West SA. Darwinian agriculture: when can humans microsatellite markers. BMC Plant Biol. 2014;14:14. fnd solutions beyond the reach of natural selection? Q Rev Biol. 26. Amar MH, Magdy M, Wang L, Zhou H, Zheng B, Jiang X, Atta AH, 2003;78(2):145–68. Han Y. Peach chloroplast genome variation architecture and phylog- 3. Matesanz S, Gianoli E, Valladares F. Global change and the evolution of enomic signatures of cpDNA introgression in Prunus. Can J Plant Sci. phenotypic plasticity in plants. Ann N Y Acad Sci. 2010;1206:35–55. 2019;99(6):885–96. 4. Kuniyuki Saitoh KN. Toshiro Kuroda: characteristics of fowering 27. Ivanova Z, Sablok G, Daskalova E, Zahmanova G, Apostolova E, Yahubyan and pod set in wild and cultivated types of soybean. Plant Prod Sci. G, Baev V. Chloroplast genome analysis of resurrection tertiary relict 2004;7(2):172–7. Haberlea rhodopensis highlights genes important for desiccation stress response. Front Plant Sci. 2017;8:204. Peng et al. BMC Ecol Evo (2021) 21:71 Page 17 of 17

28. Ma PF, Zhang YX, Zeng CX, Guo ZH, Li DZ. Chloroplast phylogenomic 48. Jiang ZD. Preliminary study of molecular phylogenetics and biogeogra- analyses resolve deep-level relationships of an intractable bamboo tribe phy of the genus Camellia L. based on chloroplast DNA. 2017; Master’s Arundinarieae (Poaceae). Syst Biol. 2014;63(6):933–50. Thesis, Zhejiang SCI-TECH University, Hangzhou. 29. Cho M-S, Yoon HS, Kim S-C. Complete chloroplast genome of cultivated 49. Alikhan N-F, Petty NK, Ben Zakour NL, Beatson SA. BLAST Ring Image Gen- fowering cherry, Prunus xyedoensis “Somei-yoshino” in comparison with erator (BRIG): simple prokaryote genome comparisons. BMC Genomics. wild Prunus yedoensis Matsum. (Rosaceae). Mol Breed. 2018;38(9):112. 2011. https://​doi.​org/​10.​1186/​1471-​2164-​12-​402. 30. Wang J, Li Y, Li CJ, Yan CX, Zhao XB, Yuan CL, Sun QX, Shi CR, Shan SH. 50. Li C, Zhao Y, Xu Z, Yang G, Peng J, Peng XY. Initial characterization of the Twelve complete chloroplast genomes of wild peanuts: great genetic chloroplast genome of Vicia sepium, an important wild resource plant, resources and a better understanding of Arachis phylogeny. BMC Plant and related inferences about its evolution. Front Plant Sci. 2020;11:73. Biol. 2019;19(1):504. 51. Qu XJ, Moore MJ, Li DZ, Yi TS. PGA: a software package for rapid, accurate, 31. Yamagishi H, Terachi T, Ozaki A, Ishibashi A. Inter- and intraspecifc and fexible batch annotation of plastomes. Plant Methods. 2019;15:12. sequence variations of the chloroplast genome in wild and cultivated 52. Njuguna AW, Li Z-Z, Saina JK, Munywoki JM, Gichira AW, Gituru RW, Raphanus. Plant Breed. 2009;128(2):172–7. Wang Q-F, Chen J-M. Comparative analyses of the complete chloroplast 32. Palmer JD, Nugent JM, Herbon LA. Unusual structure of geranium chlo- genomes of nymphoides and menyanthes species (menyanthaceae). roplast DNA: a triple-sized inverted repeat, extensive gene duplications, Aquat Bot. 2019;156:73–81. multiple inversions, and two repeat families. Proc Natl Acad Sci USA. 53. Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, He Y, Xia R. TBtools: an 1987;84(3):769–73. integrative toolkit developed for interactive analyses of big biological 33. Zheng XM, Wang JR, Feng L, Liu S, Pang HB, Qi L, Li J, Sun Y, Qiao WH, data. Mol Plant. 2020;13(8):1194–202. Zhang LF, Chen YL, Yang QW. Inferring the evolutionary mechanism of 54. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics the chloroplast genome size by comparing whole-chloroplast genome analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–4. sequences in seed plants. Sci Rep. 2017;7(1):1555. 55. Rozas J, Ferrer-Mata A, Sánchez-DelBarrio JC, Guirao-Rico S, Librado P, 34. Cheng L, Nam J, Chu SH, Rungnapa P, Min MH, Cao Y, Yoo JM, Kang JS, Ramos-Onsins SE, Sánchez-Gracia A. DnaSP 6: DNA sequence polymor- Kim KW, Park YJ. Signatures of diferential selection in chloroplast genome phism analysis of large data sets. Mol Biol Evol. 2017;34(12):3299–302. between japonica and indica. Rice. 2019;12(1):65. 56. Katoh K, Standley DM. MAFFT multiple sequence alignment software 35. Gao CM, Deng YF, Wang J. The complete chloroplast genomes of Echina- version 7: improvements in performance and usability. Mol Biol Evol. canthus species (Acanthaceae): phylogenetic relationships, adaptive evo- 2013;30(4):772–80. lution, and screening of molecular markers. Front Plant Sci. 2019;9:1989. 57. Xie DF, Yu Y, Deng YQ, Li J, Liu HY, Zhou SD, He XJ. Comparative analysis of 36. Dong WL, Wang RN, Zhang NY, Fan WB, Fang MF, Li ZH. Molecular evolu- the chloroplast genomes of the Chinese endemic genus Urophysa and tion of chloroplast genomes of Orchid species: insights into phylogenetic their contribution to chloroplast phylogeny and adaptive evolution. Int J relationship and adaptive evolution. Int J Mol Sci. 2018;19(3):716. Mol Sci. 2018;19(7):1847. 37. Cai Z, Guisinger M, Kim HG, Ruck E, Blazier JC, McMurtry V, Kuehl JV, 58. Zhang Y-b, Yuan Y, Pang Y-x, Yu F-l, Yuan C, Wang D, Hu X. Phyloge- Boore J, Jansen RK. Extensive reorganization of the plastid genome of netic reconstruction and divergence time estimation of Blumea DC. Trifolium subterraneum (Fabaceae) is associated with numerous repeated (Asteraceae: Inuleae) in China based on nrDNA ITS and cpDNA trnL-F sequences and novel DNA insertions. J Mol Evol. 2008;67(6):696–704. sequences. Plants. 2019;8(7):210. 38. Dugas DV, Hernandez D, Koenen EJM, Schwarz E, Straub S, Hughes CE, 59. Swoford D. PAUP*; Version 4. Sunderland, MA: Sinauer Associates; 2003. Jansen RK, Nageswara-Rao M, Staats M, Trujillo JT, et al. Mimosoid legume 60. Huelsenbeck JP, Ronquist F. MRBAYES: Bayesian inference of phylogenetic plastome evolution: IR expansion, tandem repeat expansions, and accel- trees. Bioinformatics. 2001;17(8):754–5. erated rate of evolution in clpP. Sci Rep. 2015;5:16958. 61. Darriba D, Taboada GL, Doallo R, Posada D. jModelTest 2: more models, 39. Wu Y, Liu F, Yang DG, Li W, Zhou XJ, Pei XY, Liu YG, He KL, Zhang WS, Ren new heuristics and parallel computing. Nat Methods. 2012;9(8):772–772. ZY, Zhou KH, Ma XF, Li ZH. Comparative chloroplast genomics of Gos- 62. Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and sypium species: insights into repeat sequence variations and phylogeny. efective stochastic algorithm for estimating maximum-likelihood phy- Front Plant Sci. 2018;9:367. logenies. Mol Biol Evol. 2014;32(1):268–74. 40. Drescher A, Ruf S, Calsa T Jr, Carrer H, Bock R. The two largest chloroplast 63. Wu Z, Gu C, Tembrock LR, Zhang D, Ge S. Characterization of the whole genome-encoded open reading frames of higher plants are essential chloroplast genome of Chikusichloa mutica and its comparison with genes. Plant J. 2000;22(2):97–104. other rice tribe (Oryzeae) species. PLoS ONE. 2017;12(5):e0177553. 41. Dong WP, Xu C, Li CH, Sun JH, Zuo YJ, Shi S, Cheng T, Guo JJ, Zhou SL. 64. Dong M, Zhou XM, Ku WZ, Xu ZG. Detecting useful genetic markers ycf1, the most promising plastid DNA barcode of land plants. Sci Rep. and reconstructing the phylogeny of an important medicinal resource 2015;5:8348. plant, Artemisia selengensis, based on chloroplast genomics. PLoS ONE. 42. Neubig KM, Whitten WM, Carlsward BS, Blanco MA, Endara L, Williams 2019;14(2):e0211340. NH, Moore M. Phylogenetic utility of ycf1 in orchids: a plastid gene more 65. Yang Z, Nielsen R. Synonymous and nonsynonymous rate variation in variable than matK. Plant Syst Evol. 2009;277(1–2):75–84. nuclear genes of mammals. J Mol Evol. 1998;46(4):409–18. 43. Liu Y, Yang SX, Ji PZ, Gao LZ. Phylogeography of Camellia taliensis 66. Huang H, Shi C, Liu Y, Mao SY, Gao LZ. Thirteen Camelliachloroplast (Theaceae) inferred from chloroplast and nuclear DNA: insights into genome sequences determined by high-throughput sequencing: evolutionary history and conservation. BMC Evol Biol. 2012;12:92. genome structure and phylogenetic relationships. Bmc Evol Biol. 44. Wei YL, Wen ZF, Liu F, Zhang JW, Huang WG, Lan YP, Cheng LL, Cao QC, 2014;14:151. Hu GL. Bioinformatics analysis of ycf1 gene in Corylus. J Shanxi Agric Sci. 67. Zhang F, Li W, Gao CW, Zhang D, Gao LZ. Deciphering tea tree chloroplast 2018;46(08):1244–7, 1333. and mitochondrial genomes of Camellia sinensis var. assamica. Sci Data. 45. Yang XF, Wang YT, Chen ST, Li JK, Shen HT, Guo FQ. PBR1 selectively 2019;6:209. controls biogenesis of photosynthetic complexes by modulating translation of the large chloroplast gene Ycf1 in Arabidopsis. Cell Discov. 2016;2:16003. Publisher’s Note 46. Zhang HD. Thea—a section of beveragial tea trees of the genus Camellia. Springer Nature remains neutral with regard to jurisdictional claims in pub- Acta Sci Nat Univ Sunyatseni. 1981;1:87–99. lished maps and institutional afliations. 47. Ming TL. A revision of Camellia sect. Thea. Acta Bot Yunnanica. 1992;14(2):115–32.