
Physiol Mol Biol Plants (March 2020) 26(3):409–418 https://doi.org/10.1007/s12298-019-00736-7

RESEARCH ARTICLE

Chloroplast genome of an extremely endangered conifer Thuja sutchuenensis Franch.: gene organization, comparative and phylogenetic analysis

Tao Yu1 • Bing-Hong Huang2 • Yuyang Zhang1 • Pei-Chun Liao2 • Jun-Qing Li1

Received: 5 May 2019 / Revised: 24 October 2019 / Accepted: 22 November 2019 / Published online: 1 January 2020
© Prof. H.S. Srivastava Foundation for Science and Society 2020

Abstract Thuja sutchuenensis is a critically endangered tertiary relict species of Cupressaceae from southwestern China. We sequenced the complete chloroplast (cp) genome of T. sutchuenensis, showing a genome of 129,776 bp with 118 unique genes, including 82 unique protein-coding genes, 32 tRNA genes, and 4 rRNA genes. The genome structure, gene order, and GC content are similar to those of other typical gymnosperm cp genomes. Thirty-eight simple sequence repeats were identified in the T. sutchuenensis cp genome. We also found an apparent inversion between trnT and psbK between the genera Thuja and Thujopsis. In addition, positive selection signals were detected in seven genes with high Ka/Ks ratios. The reconstructed phylogeny based on locally collinear blocks of cp genomes among 21 gymnosperm species is similar to previous inferences. We also inferred a Late Miocene divergence between T. sutchuenensis and T. standishii, dated at ~11.05 Mya by cp genomes. These results will be helpful for future studies of Cupressaceae phylogeny as well as studies in population genetics, systematics, and cp genetic engineering.

Keywords Thuja sutchuenensis · Chloroplast genome · Sequence divergence · Non-synonymous substitution · Phylogenomics

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s12298-019-00736-7) contains supplementary material, which is available to authorized users.

Correspondence: Pei-Chun Liao ([email protected]); Jun-Qing Li ([email protected])

1 Forestry College, Beijing Forestry University, 35 Qinghua East Road, Haidian District, Beijing 100083, China
2 School of Life Science, National Taiwan Normal University, 88 Ting-Chow Rd., Sec. 4, Taipei 116, Taiwan

Introduction

The chloroplast (cp) is a semiautonomous organelle in plants, encoding several key proteins involved in photosynthesis and in interactions between plants and the surrounding environment (Saski et al. 2007; Daniell et al. 2016). With the rapid development of high-throughput sequencing technologies, an increasing number of sequenced cp genomes can easily be acquired from online databases such as the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/genomes/) (Daniell et al. 2016). The cp genome of almost all vascular plant species is between 120 and 160 kb in length and contains about 130 genes (Chumley et al. 2006). In most gymnosperm plants, the cp genome is inherited from the patrilineal lineage, and exhibits very little or no recombination (Yi et al. 2013). Due to the relatively simple features of the cp genome, cp sequences are commonly used as DNA barcodes for genetic identification, for systematic plant studies, and for studies of plant biodiversity, biogeography, adaptation, etc. (Wambugu et al. 2015; Brozynska et al. 2016).

Thuja sutchuenensis is a tertiary relict species of Cupressaceae (Tang et al. 2015). It is found in Sichuan and Chongqing Provinces in China, with an altitudinal range between 800 and 2100 m (mostly between 1000 and 1500 m) (Qiaoping et al. 2015). This species was first discovered by a French botanist, Paul Guillaume Farges, who collected specimens from 1892 to 1900 and described it in 1899. It was not seen again for almost 100 years. T. sutchuenensis was then classified as extinct in the wild (EW) until it was rediscovered in northeastern Chongqing in 1999 (Qiaoping et al. 2015). It was reassessed as critically endangered (CR) in 2003 (Qiaoping et al. 2015; Gao et al. 2018). Southwest China is rich in vegetation, but after decades of excessive deforestation and habitat destruction, the original forest of T. sutchuenensis was fragmented, and only a small area remains. Protection of this critically endangered plant requires adequate basic research.

The study of conservation genetics and the reconstruction of evolutionary history depend on suitable molecular markers. However, molecular research on T. sutchuenensis is scarce. Previous population genetics studies based on inter simple sequence repeats (ISSR) can roughly describe genetic diversity but cannot comprehensively illustrate evolutionary history, owing to the limitations of dominant markers and the small number of loci used (Liu et al. 2013), particularly considering the large size of the T. sutchuenensis genome [1C = 12.10 pg (Zonneveld 2012)]. The large genome size makes the development of ample nuclear markers difficult. In contrast, homologous genes are easier to obtain from the small cp genome, which is easier to sequence and provides comparable genetic information. In this study, we used an Illumina MiSeq platform to assemble the cp genome of T. sutchuenensis in order to (1) deepen the understanding of the cp genome structure of T. sutchuenensis and (2) develop effective molecular markers for T. sutchuenensis that can be applied to conservation genetics and evolutionary inference. We also reconstructed a phylogenomic tree with other published cp genomes of Cupressaceae species in order to acquire a more robust phylogenetic inference in the subfamily Cupressoideae.

Materials and methods

Plant materials and DNA sequencing

Young leaves from T. sutchuenensis in the Chinese Academy of Forestry were collected and dried immediately with silica gel for preservation. Whole genomic DNA was extracted using the plant genome DNA extraction kit (TIANGEN, Beijing, China) following the manufacturer's protocol. Paired-end libraries with an average insert size of approximately 350 bp were prepared for the Illumina MiSeq platform according to the standard Illumina method at Megagenomics Company (Beijing, China). Briefly, DNA was fragmented by ultrasound; the fragments were end-repaired into blunt-ended dsDNA with DNA polymerase, the 5′ ends were phosphorylated with T4 polynucleotide kinase, and an A overhang was added to the 3′ ends of the dsDNA. Sequencing adapters were then ligated, and fragments within a relatively narrow size range were selected according to the requirements of subsequent analysis. Finally, the target fragments were enriched to obtain the library for sequencing (Meyer and Kircher 2010). We generated 9.5 Gb of total data with a 150-bp average read length.

Genome assembly and annotation

High-quality data were filtered from the raw sequence data using NGSQC (Dai et al. 2010) with default settings. Clean reads were assembled using MITObim v1.8 (Hahn et al. 2013) and NOVOPlasty (Dierckxsens et al. 2016), with the cp genome of T. standishii as the reference (Qu et al. 2017). SAMtools (Li et al. 2009) was used to recheck the sequences based on the degree of coverage. Gaps between the plastomic contigs were filled using Sanger sequencing. The specific primers used for PCR are shown in Supplementary Table S1. We annotated protein-coding genes using CpGAVAS (Yong and Zheng 2012) and checked gene boundaries by comparison with the T. standishii cp genome using BLASTN at NCBI (http://www.ncbi.nlm.nih.gov). The error-corrected SQN file of the T. sutchuenensis cp genome sequence was submitted to GenBank (accession number: MH784400). The circular gene map of the cp genome (Fig. 1) was drawn with the OrganellarGenomeDRAW software (OGDRAW v1.2) (Lohse et al. 2007).

Identifying cp SSRs

Simple sequence repeats (SSRs) in the T. sutchuenensis and T. standishii cp genomes were detected with the MISA Perl script (http://pgrc.ipk-gatersleben.de/misa/). The minimum number of repeat units was set at 10 for mono-, 6 for di-, and 5 for tri- to hexanucleotides.

Genome-wide homologous comparison and divergence of coding gene sequences

Whole cp genomes of T. sutchuenensis, T. standishii, and Thujopsis dolabrata were aligned in MAUVE (Darling et al. 2004) under default settings to test for rearrangement events. Coding genes in each cp genome were identified and aligned with MAFFT (Katoh et al. 2005). To identify positive selection between the T. sutchuenensis and T. standishii cp genomes, the synonymous (Ks) and non-synonymous (Ka) substitution rates of each coding gene were calculated in DnaSP 5.0 (Librado and Rozas 2009).

Phylogenetic analysis and divergence time estimation

Twenty-one cp genomes of Cupressoideae were used to reconstruct the phylogenomic tree. Species of subfamilies

Fig. 1 Map of the chloroplast genome of Thuja sutchuenensis. Genes inside and outside of the circle are transcribed in the clockwise and counterclockwise directions, respectively. Genes belonging to different functional groups are color-coded. The dashed area in the inner circle indicates the GC content of the chloroplast genome (colour figure online)
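The GC content drawn in the inner circle of a map like Fig. 1, and the overall GC percentage reported for the genome, both come down to counting G and C bases. The following is a minimal sketch of that calculation, assuming the genome is available as a plain sequence string. The function names, window size, and toy sequence are illustrative assumptions; this is not the OGDRAW implementation and the input is not the T. sutchuenensis sequence.

```python
def gc_content(seq: str) -> float:
    """Return the GC fraction of a DNA sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq: str, window: int, step: int):
    """GC fraction in sliding windows, returned as (start_index, gc) pairs."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "AAGGCCTT"  # hypothetical toy sequence, not real data
    print(gc_content(toy))
    print(windowed_gc(toy, window=4, step=4))
```

A windowed profile of this kind is what a circular map typically plots; the whole-genome value is the same function applied to the full sequence.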

Taiwanioideae (Taiwania cryptomerioides and T. flousiana), Cunninghamioideae (Cunninghamia lanceolata), Sequoioideae (Metasequoia glyptostroboides), and Taxodioideae (Cryptomeria japonica) were selected as outgroups. Accession numbers of all cp genome sequences used are listed in Table S2. We used HomBlocks (Bi et al. 2017) to determine locally collinear blocks (LCBs) among cp genomes and used Circoletto (http://tools.bat.infspire.org/circoletto/) (Darzentas 2010) to visualize the correspondence between the locally collinear block sequences and the T. sutchuenensis gene sequences. PhyML (Guindon et al. 2009) and MrBayes (Huelsenbeck and Ronquist 2001) were used to reconstruct the phylogenetic tree based on maximum likelihood (ML) and Bayesian inference (BI), respectively. The best substitution model was determined according to the Akaike information criterion (AIC), as suggested by Modeltest in MEGA7 (Kumar et al. 2016). The approximate likelihood-ratio test (aLRT) was used to evaluate the supporting values of branches in the


ML tree (Anisimova and Gascuel 2006). For the BI tree, two parallel Markov chain Monte Carlo (MCMC) simulations of 10 million generations, sampled every 1000 generations with a 25% burn-in, were used to obtain the consensus tree. Divergence time was estimated using the BEAST program under a relaxed clock model (Drummond and Rambaut 2007). Three internal fossil calibration points were used: the crown of Cupressoideae (157.2 Mya), the Thuja–Thujopsis clade (58.5 Mya), and Juniperus (33.8 Mya) (Mao et al. 2012). MCMC procedures had a burn-in of 100 million iterations. The default settings were adopted for other parameters when performing the BEAST analysis (Drummond and Rambaut 2007). The chronogram was drawn using FigTree v1.4.0 (http://tree.bio.ed.ac.uk/).

Results and discussion

The overall features of cp DNA of T. sutchuenensis

The gene map for the T. sutchuenensis cp genome is shown in Fig. 1. The complete cp genome of T. sutchuenensis is 129,776 bp, which is 729 bp shorter than that of T. standishii. The cp genome contained 118 unique genes, including 82 unique protein-coding genes, 32 tRNA genes, and 4 rRNA genes (Tables 1 and 2). Among these genes, 14 harbored a single intron (trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, trnV-UAC, atpF, ndhA, ndhB, petB, petD, rpoC1, rpl2, rpl16) and two (ycf3 and rps12) harbored two introns (Table 3). The overall GC content of the T. sutchuenensis cp genome was 34.2%, which is similar to that of T. standishii. Other genomic features, including the gene content, gene order, introns, and intergenic spacers, were also similar.

The cp genome of each Thuja species has 38 SSRs, comprising mono- to pentanucleotide motifs. Most of the mononucleotide and dinucleotide motifs are composed of A/T repeats, which is consistent with the view that cpSSRs are attributed to AT richness (Liu et al. 2018). There are 24 SSRs shared between the cp genomes of T. sutchuenensis and T. standishii, which comprise 14 mononucleotide-, one dinucleotide-, two trinucleotide-, six tetranucleotide-, and one pentanucleotide-motif SSRs (Table 4; Fig. 2). Among these common SSRs, eight are in exons of coding genes, five are in introns, and the remaining nine are in non-coding regions. These SSRs can serve as potential markers for studies of genetic diversity.

Rearrangements of cp genomes in Thuja

The extent of structural rearrangements between cp genomes of the genera Thuja and Thujopsis can be visualized in the Mauve genome alignment. One large inversion was found between the two trnQ-UUG genes (Fig. 3). The inversion changes the order of trnT, rps4, trnS, ycf3, psaA, psaB, rps14, trnfM, trnG, psbZ, trnS, psbC, psbD, trnT, trnE, trnY, trnD, psbM, petN, trnC, rpoB, rpoC1, rpoC2, rps2, atpI, atpH, atpF, atpA, trnR, trnG, psaM, trnS, psbI and psbK in T. sutchuenensis and T. standishii relative to the sister genus Thujopsis. Inversion is an important mechanism in plant evolution that forms and maintains interspecific differentiation. Standing inversions may predominate over new mutations in maintaining local adaptation, although not absolutely (He and Knowles 2017). Inversions of the plastid genome are associated with the origin of plant groups; for example, two cp inversion events in the family Asteraceae likely occurred around the same time during the origin of Asteraceae (Kim et al. 2005). The long-fragment inversion found here characterizes the cp genome of Thuja and can be used as a very reliable phylogenetic marker (Jansen and Palmer 1987; Doyle et al. 1992, 1996; Kim et al. 2005). Inverted repeats (IRs) play critical roles in stabilizing cp genomes against major structural variations, and the loss of IRs could result in shorter intergenic spacers (Wu and Chaw 2014), more gene loss, and structural rearrangements (Zheng et al. 2016). One example of this is Pinaceae evolution, in which short repeats can increase the diversity of cpDNA and compensate for the reduced IRs (Wu et al. 2011). Thus, loss of the large IRs in Thuja and Thujopsis may be the main cause of rearrangements in gene block order.

Evolution of T. sutchuenensis and T. standishii

To pinpoint whether any genes of the cp genome underwent adaptive evolution in T. sutchuenensis, we compared the cp genomes of T. sutchuenensis and the closely related species T. standishii. Our analysis showed that the average Ka/Ks ratio of the 82 protein-coding genes was 0.256 between the two cp genomes, which is similar to previous studies of Haberlea rhodopensis cp genome gene evolution (Ivanova et al. 2017). Among these 82 genes, 72 show a low synonymous substitution rate (Ks < 0.01 between the two species), suggesting recent species divergence

Table 1 Chloroplast genome characteristics of T. sutchuenensis and T. standishii

Characteristic                    Thuja sutchuenensis   Thuja standishii
Genome size (bp)                  129,776               130,505
GC content                        34.4%                 34.2%
Number of protein-coding genes    82                    82
Number of tRNA genes              32                    32
Number of rRNA genes              4                     4

Table 2 Genes present in the T. sutchuenensis chloroplast genome

Group of genes              Gene names
Photosystem I               psaA, psaB, psaC, psaI, psaJ, psaM, ycf3**, ycf4
Photosystem II              psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Cytochrome b/f complex      petA, petB*, petD*, petG, petL, petN
ATP synthase                atpA, atpB, atpE, atpF*, atpH, atpI
Chlorophyll biosynthesis    chlB, chlL, chlN
NADH dehydrogenase          ndhA*, ndhB*, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
RubisCO large subunit       rbcL
RNA polymerase              rpoA, rpoB, rpoC1*, rpoC2
Ribosomal proteins (SSU)    rps2, rps3, rps4, rps7, rps8, rps11, rps12**,T, rps14, rps15, rps18, rps19
Ribosomal proteins (LSU)    rpl2*, rpl14, rpl16*, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36
Other genes                 clpP, matK, accD, ccsA, infA, cemA, ycf1, ycf2
Transfer RNAs               32 tRNAs (six contain a single intron)
Ribosomal RNAs              rrn4.5, rrn5, rrn16, rrn23

A single asterisk (*) following a gene name indicates an intron-containing gene, and double asterisks (**) following a gene name indicate two introns in the gene; T, trans-splicing of the related gene

Table 3 The genes with introns in the T. sutchuenensis chloroplast genome and the length of the exons and introns

Gene        Exon I (bp)   Intron I (bp)   Exon II (bp)   Intron II (bp)   Exon III (bp)
atpF        145           672             230
ndhA        558           759             549
ndhB        723           699             756
petB        6             788             642
petD        8             677             493
rpl16       9             860             411
rpl2        400           643             431
rpoC1       471           748             1674
rps12       114           –               232            526              26
ycf3        126           765             228            705              156
trnA-UGC    38            776             35
trnG-UCC    24            749             48
trnI-GAU    37            904             35
trnK-UUU    37            2447            35
trnL-UAA    35            494             50
trnV-UAC    35            521             37

or genetic constraint between T. sutchuenensis and T. standishii. Since Ka/Ks values are primarily gene-specific, we calculated the Ka/Ks of each gene one by one. Seven genes, accD, ycf2, ycf1, rpl22, rps18, rpoC1, and rps15, have an estimated Ka/Ks > 1. We further plotted Ka/Ks against Ks to assess the timing of the adaptive divergence of these genes. Overall, the distribution of these seven genes is not left-skewed in the Ka/Ks versus Ks plot (Fig. 4), which excludes a recent adaptive divergence of the cp genome between T. sutchuenensis and T. standishii.

The accD gene, which has a basic metabolic function in producing acetyl-CoA carboxylase, has the highest Ka/Ks ratio in the cp genome. It plays multiple ecological roles in the tolerance of and resistance to stresses, insect predation, and pathogens (Sablok et al. 2016), which may explain the signature of positive selection it presents. The genes ycf1 and ycf2 showed the second and third highest Ka/Ks. Although their functions are unclear, positive-selection signals in these genes were also reported in buckwheat (Cho et al. 2015), sesame (Zhang et al. 2013), Pinus (Parks et al. 2009), and orchids (Neubig et al. 2009). With their high evolutionary rates, ycf1 and ycf2 have further been suggested as molecular markers for phylogenetic reconstruction (Neubig et al. 2009; Parks et al. 2009). Three ribosomal protein genes


rpl22, rps18, and rps15 also reveal a high Ka/Ks ratio, and have been demonstrated to play versatile roles in plant growth and development (Fleischmann et al. 2011). Thuja sutchuenensis and T. standishii are distributed in southwestern China and southern Japan, respectively. Highly differential environmental and growth conditions between these two species (Kusumi et al. 2006; Tang et al. 2015) may facilitate the adaptive divergence of these cp genes. The mean annual temperature of the major habitat areas of T. sutchuenensis in the Dabashan Nature Reserve of Chengkou County is 14 °C, with 2.7 °C in the coldest month (January) (Tang et al. 2015). In contrast, T. standishii grows primarily in regions where the climate is relatively cool and humid, and the winter is snowy and much colder (Kusumi et al. 2006). Such climatic differences may lead to differential efficiency of physiological and pathological processes of the cp genome, particularly the ribosomal protein genes (Xiang et al. 2015), between these two Thuja species. However, we cannot preclude the possibility of false positives since these genes do not deviate too far from neutrality (Ka/Ks = 1).

Phylogenetic analysis and estimation of divergence time

The cp genome sequence is useful for systematics. Specific relationships within the Cupressoideae remain obscure due to a complicated evolutionary history. Numerous studies have tried to resolve intergeneric phylogenetic relationships (Mao et al. 2012; Yang et al. 2012; Qu et al. 2017). To acquire a reasonable phylogeny and molecular dating of Cupressoideae, we reconstructed a phylogenetic tree based on cp genomes. A total of 95,128 bp of LCB sequences was recognized by HomBlocks. The LCB sequence contains all of the coding genes, the rRNA genes, and a part of the tRNAs. The tRNAs of trnL-CAA, trnI-CAU, trnQ-UUG,

Table 4 List of simple sequence repeats (SSRs) shared by the T. sutchuenensis and T. standishii chloroplast genomes

SSR type   Base          Location
p1         A/T           rps4
p1         A/T           psbM–petN
p1         A/T           petN–trnC
p1         A/T           rpoC1 intron
p1         A/T           rps2–atpI
p1         A/T           atpI–atpH
p1         A/T           atpF intron
p1         A/T           atpF intron
p1         A/T           rpl32–ndhF
p1         A/T           rps19
p1         A/T           rps8
p1         A/T           rpl36–rps11
p1         A/T           petB intron
p1         A/T           ndhC–trnV
p2         TA/AT         rrn16–trnV
p3         TTC/GAA       matK–chlB
p3         GAA/TTC       rpoB
p4         TCCA/TGGA     trnK intron
p4         TACT/AGTA     rpoB
p4         TTTC/GAAA     ndhF
p4         CTAC/GTAG     rrn23
p4         CTTG/CAAG     trnI–rrn16
p4         TTAA/AATT     trnI–ycf2
p5         ATATT/AATAT   chlL

Fig. 2 Distribution of SSRs present in T. sutchuenensis and T. standishii chloroplast genomes


Fig. 3 Synteny and rearrangements detected in Thuja and Thujopsis chloroplast genomes using the Mauve multiple-genome alignment. Colored outlined blocks surround regions of the genome sequence that align with part of another genome. The colored bars inside the blocks are related to the level of sequence similarities. Lines link blocks with homology between two genomes. The red box appears as an inverted region (colour figure online)
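An inversion such as the one boxed in Fig. 3 can be thought of as one contiguous segment of the gene order appearing reversed in the other genome. The sketch below locates such a segment from two gene-order lists. It is only an illustration of the idea and not how Mauve works, since Mauve aligns sequences rather than gene lists. The gene names are borrowed from the text, but the two orders and the flanking labels are hypothetical.

```python
def find_single_inversion(order_a, order_b):
    """Return (start, end) indices, inclusive, of the segment of order_a that
    appears reversed in order_b, or None if the two orders are identical.

    Raises ValueError if the difference is not explained by one inversion.
    """
    if sorted(order_a) != sorted(order_b):
        raise ValueError("the two orders do not contain the same genes")
    differing = [i for i, (a, b) in enumerate(zip(order_a, order_b)) if a != b]
    if not differing:
        return None
    start, end = differing[0], differing[-1]
    if order_a[start:end + 1][::-1] != order_b[start:end + 1]:
        raise ValueError("difference is not a single inversion")
    return start, end


if __name__ == "__main__":
    # Hypothetical toy orders; "flankL" and "flankR" are placeholder labels.
    genome_a = ["flankL", "trnT", "rps4", "trnS", "ycf3", "flankR"]
    genome_b = ["flankL", "ycf3", "trnS", "rps4", "trnT", "flankR"]
    print(find_single_inversion(genome_a, genome_b))
```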

Fig. 4 Gene-specific Ks and Ka/Ks values between the chloroplast genomes of Thuja sutchuenensis and T. standishii
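The values plotted in Fig. 4 are read against the neutral expectation Ka/Ks = 1: ratios above 1 are taken as a signal of positive selection and ratios below 1 as purifying selection. The following minimal sketch shows that classification, including the case Ks = 0, where the ratio is undefined or unbounded. The function names are assumptions, the numbers in the usage example are made up for illustration, and no statistical test is applied, so values close to 1 should be treated with caution.

```python
import math


def ka_ks_ratio(ka: float, ks: float) -> float:
    """Return Ka/Ks; inf if Ks is 0 but Ka is positive, nan if both are 0."""
    if ka < 0 or ks < 0:
        raise ValueError("substitution rates must be non-negative")
    if ks == 0:
        return math.inf if ka > 0 else math.nan
    return ka / ks


def classify_selection(ka: float, ks: float) -> str:
    """Label a gene by comparing Ka/Ks with the neutral expectation of 1."""
    ratio = ka_ks_ratio(ka, ks)
    if math.isnan(ratio):
        return "undefined"
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical (Ka, Ks) pairs, not values from this study.
    for ka, ks in [(0.5, 0.25), (0.25, 0.5), (0.0, 0.0)]:
        print(ka, ks, classify_selection(ka, ks))
```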

trnN-GUU, trnT-GGU, trnQ-UUG in T. sutchuenensis were not included in the LCB sequence (Fig. S1 in the Supplementary materials). The reconstructed phylogenetic topology of the ML and BI analyses had high bootstrap support and can provide a proper evolutionary placement for T. sutchuenensis.

The highly supported clustering of T. sutchuenensis and T. standishii indicates efficient discrimination of the cp genome between Thuja and other genera. In our reconstructed phylogeny, the Thuja–Thujopsis cluster diverged earliest from the other genera in the Cupressoideae clade, followed in sequence by […], the clade […]–Platycladus, and the clade Juniperus (J)–Cupressus (C)–Hesperocyparis–Callitropsis (HCX; here the X, Xanthocyparis, is not included) (Fig. 5), which is similar to Qu et al.'s (2017) inference. Our inference, however, partially disagrees with the inference of "(J, (C, (HCX)))" by Qu et al. (2017), but we agree with their hypothesis that long-branch attraction may blur the phylogenetic relationships among these genera, particularly given the relatively low supporting values in the J–C cluster (Fig. 5).

Previous studies of T. sutchuenensis were based on several DNA segments (Jian-Hua and Xiang 2005; Yang et al. 2012), which may be heavily biased in estimating the true branch length. To compensate for this deficiency, we applied the LCBs of complete cp genome sequences to estimate the splitting times between T. sutchuenensis and its relatives. The divergence time within the Cupressaceae between subclades J and C-HCX was estimated at about 49.72 million years ago (Mya), and the split between C and HCX at about 46.29 Mya (Fig. 6), which is similar to the estimation by Qu et al. (2017). The splitting time between the genera Thuja and Thujopsis is 58.46 Mya (Fig. 6), which is also consistent with Qu et al.'s (2017) estimation of 58.5 Mya. Our study further estimated the divergence time of T. sutchuenensis and T. standishii, which is roughly at


Fig. 5 Phylogenetic tree constructed by maximum likelihood (ML) and Bayesian inference (BI) methods based on chloroplast genome locally collinear block sequences of 23 Cupressoideae. The phylogenetic tree was drawn using Cunninghamia lanceolata, Taiwania cryptomerioides, and T. flousiana as outgroups. Nodes with 100/100 support values are not shown. ML support values are given first
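The ML tree in Fig. 5 relies on a substitution model chosen by the Akaike information criterion, AIC = 2k − 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood; the model with the lowest AIC is preferred. The sketch below shows that selection step only. The model names, log-likelihoods, and parameter counts in the usage example are made-up placeholders, not results from this study or output of Modeltest or MEGA7.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_aic(candidates):
    """Pick the model with the lowest AIC.

    candidates maps a model name to (log_likelihood, n_params).
    Returns (best_name, {name: aic_score}).
    """
    scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    toy_candidates = {
        "JC": (-1000.0, 0),
        "HKY": (-990.0, 4),
        "GTR": (-989.0, 8),
    }
    best, scores = best_model_by_aic(toy_candidates)
    print(best, scores)
```

In this toy case the richest model has the best likelihood but is not selected, because the gain in fit does not offset the penalty for its extra parameters.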

the base of the Late Miocene (11.05 Mya, Fig. 6). Our dating is almost consistent, although slightly earlier, with the previous estimation based on nuclear ribosomal DNA internal transcribed spacer (9.53 ± 2.30 Mya) (Jian-Hua and Xiang 2005). This stage was roughly after the Maximum Transgression in the Tortonian Period (Late Miocene), when the climate was warmer than it is now (Ogg et al. 2016). Warmer climates may result in population decline for these cooler adapted species, accelerating the differentiation of populations and/or species. Although this inference still needs further verification, the estimates given are reasonable.

Conclusions

In this study, we present the whole cp genome of endangered species T. sutchuenensis. The genome size and genomic contents are similar to other Cupressaceae species. We also found an inversion between trnT and psbK

Fig. 6 Bayesian chronogram for Cupressoideae and related subfamilies. Estimates of divergence times for major clades are listed, and blue bars show the 95% highest posterior density (HPD) of relevant nodes (colour figure online)
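The blue bars in Fig. 6 are 95% highest posterior density (HPD) intervals. For a set of posterior samples of a node age, a common approximation of the HPD interval is the narrowest window that contains the requested fraction of the sorted samples. The sketch below implements that approximation. It assumes a unimodal posterior, the function name and sample values are illustrative, and it is not the code used by BEAST or FigTree.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval (low, high) containing `mass` of the samples."""
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("no samples given")
    # Number of samples the interval must cover (small guard against float noise).
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


if __name__ == "__main__":
    # Hypothetical node-age samples in Mya, not values from this study.
    toy_ages = [0, 10, 11, 30]
    print(hpd_interval(toy_ages, mass=0.5))
    print(hpd_interval(toy_ages, mass=1.0))
```

Unlike an equal-tailed interval, the HPD interval is not forced to be symmetric around the mean, which is why the bars in a chronogram can extend further on one side of a node than the other.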

between Thuja and Thujopsis. In addition, seven genes underlying positive selection were inferred by high Ka/Ks ratio. We also found 38 interspecific cpSSRs between T. sutchuenensis and T. standishii and estimated a splitting time at approximately 11.05 Mya according to the cp genome sequences. In addition to validating previous research results with cp genome, this study provides additional information that can facilitate fundamental research in systematics, population genetics, and applications for the DNA barcoding and genetic engineering of this species.

Acknowledgements This research was financially supported by the program "Reintroduction Technologies and Demonstration of Extremely Rare Wild Plant Population" of National Key Research and Development Program (2016YFC0503106) and subsidized by the Ministry of Science and Technology of Taiwan (Grant Nos.: MOST 105–2628-B-003–001-MY3 and MOST 105–2628-B-003–002-MY3) and National Taiwan Normal University (NTNU).

Data archiving The complete chloroplast genome sequence data of Thuja sutchuenensis has been submitted to GenBank of NCBI with accession number MH784400. Raw data used for the assembly of the chloroplast genome was submitted to NCBI, the SRA accession: PRJNA578239. The data will be available publicly after the acceptance of the manuscript.

References

Anisimova M, Gascuel O (2006) Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Syst Biol 55:539–552
Bi G, Mao Y, Xing Q, Cao M (2017) HomBlocks: a multiple-alignment construction pipeline for organelle phylogenomics based on locally collinear block searching. Genomics 110:18–22
Brozynska M, Furtado A, Henry RJ (2016) Genomics of crop wild relatives: expanding the gene pool for crop improvement. Plant Biotechnol J 14:1070–1085
Cho KS, Yun BK, Yoon YH et al (2015) Complete chloroplast genome sequence of tartary buckwheat (Fagopyrum tataricum) and comparative analysis with common buckwheat (F. esculentum). PLoS ONE 10:e0125332
Chumley TW, Palmer JD, Mower JP et al (2006) The complete chloroplast genome sequence of Pelargonium × hortorum: organization and evolution of the largest and most highly rearranged chloroplast genome of land plants. Mol Biol Evol 23:2175–2190
Dai M, Thompson RC, Maher C et al (2010) NGSQC: cross-platform quality analysis pipeline for deep sequencing data. BMC Genom 11(Suppl 4):S7
Daniell H, Lin CS, Ming Y, Chang WJ (2016) Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol 17:134
Darling ACE, Mau B, Blattner FR, Perna NT (2004) Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 14:1394–1403
Darzentas N (2010) Circoletto: visualizing sequence similarity with Circos. Bioinformatics 26:2620–2621
Dierckxsens N, Mardulyn P, Smits G (2016) NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucl Acids Res 45:e18
Doyle JJ, Davis JI, Soreng RJ et al (1992) Chloroplast DNA inversions and the origin of the grass family (Poaceae). Proc Natl Acad Sci USA 89:7722–7726
Doyle JJ, Doyle JL, Ballenger JA, Palmer JD (1996) The distribution and phylogenetic significance of a 50-kb chloroplast DNA inversion in the flowering plant family Leguminosae. Mol Phylogenetics Evol 5:429–438
Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214
Fleischmann TT, Scharff LB, Alkatib S et al (2011) Nonessential plastid-encoded ribosomal proteins in tobacco: a developmental role for plastid translation and implications for reductive genome evolution. Plant Cell 23:3137–3155
Gao J, Liu Y, Bogonovich M (2018) Habitat is more important than climate and animal richness at shaping latitudinal variation in plant diversity in China. Biodivers Conserv. https://doi.org/10.1007/s10531-018-1620-0
Guindon S, Dufayard JF, Hordijk W et al (2009) PhyML: fast and accurate phylogeny reconstruction by maximum likelihood. Infect Genet Evol 9:384–385
Hahn C, Bachmann L, Chevreux B (2013) Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads: a baiting and iterative mapping approach. Nucl Acids Res 41:e129
He Q, Knowles LL (2017) Rapid adaptation with gene flow via a reservoir of chromosomal inversion variation? bioRxiv: 150771. https://doi.org/10.1101/150771
Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17:754–755
Ivanova Z, Sablok G, Daskalova E et al (2017) Chloroplast genome analysis of resurrection tertiary relict Haberlea rhodopensis highlights genes important for desiccation stress response. Front Plant Sci 8:204
Jansen RK, Palmer JD (1987) A chloroplast DNA inversion marks an ancient evolutionary split in the sunflower family (Asteraceae). Proc Natl Acad Sci USA 84:5818–5822
Jian-Hua LI, Xiang QP (2005) Phylogeny and biogeography of Thuja L. (Cupressaceae), an Eastern Asian and North American disjunct genus. J Integr Plant Biol 47:651–659
Katoh K, Kuma K, Toh H, Miyata T (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucl Acids Res 33:511–518
Kim KJ, Choi KS, Jansen RK (2005) Two chloroplast DNA inversions originated simultaneously during the early evolution of the sunflower family (Asteraceae). Mol Biol Evol 22:1783–1792
Kumar S, Stecher G, Tamura K (2016) MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol 33:1870
Kusumi J, Sato A, Tachida H (2006) Relaxation of functional constraint on light-independent protochlorophyllide oxidoreductase in Thuja. Mol Biol Evol 23:941–948
Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map (SAM) format and SAMtools. Bioinformatics 25:1653–1654
Librado P, Rozas J (2009) DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 25:1451–1452
Liu J, Shi S et al (2013) Genetic diversity of the critically endangered Thuja sutchuenensis revealed by ISSR markers and the implications for conservation. Int J Mol Sci 14:14860–14871
Liu X, Li Y, Yang H, Zhou B (2018) Chloroplast genome of the folk medicine and vegetable plant Talinum paniculatum (Jacq.) Gaertn.: gene organization, comparative and phylogenetic analysis. Molecules 23:857
Lohse M, Drechsel O, Bock R (2007) OrganellarGenomeDRAW (OGDRAW): a tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr Genet 52:267–274
Mao K, Milne RI, Zhang L et al (2012) Distribution of living Cupressaceae reflects the breakup of Pangea. Proc Natl Acad Sci USA 109:7793–7798
Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc 2010:pdb.prot5448
Neubig KM, Whitten WM, Carlsward BS et al (2009) Phylogenetic utility of ycf1 in orchids: a plastid gene more variable than matK. Plant Syst Evol 277:75–84
Ogg JG, Ogg G, Gradstein FM (2016) A concise geologic time scale: 2016. Elsevier, Amsterdam
Parks M, Cronn R, Liston A (2009) Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. BMC Biol 7:84
Qiaoping X, Fajon A, Zhenyu L et al (2015) Thuja sutchuenensis: a rediscovered species of the Cupressaceae. Bot J Linn Soc 139:305–310
Qu XJ, Jin JJ, Chaw SM et al (2017) Multiple measures could alleviate long-branch attraction in phylogenomic reconstruction of Cupressoideae (Cupressaceae). Sci Rep 7:41005
Sablok G, Mudunuri SB, Edwards D, Ralph PJ (2016) Chloroplast genomics: expanding resources for an evolutionary conserved miniature molecule with enigmatic applications. Curr Plant Biol 7:34–38
Saski C, Lee SB, Fjellheim S et al (2007) Complete chloroplast genome sequences of Hordeum vulgare, Sorghum bicolor and Agrostis stolonifera, and comparative analyses with other grass genomes. Theor Appl Genet 115:591
Tang CQ, Yang Y, Ohsawa M et al (2015) Community structure and survival of tertiary relict Thuja sutchuenensis (Cupressaceae) in the subtropical Daba Mountains, Southwestern China. PLoS ONE 10:e0125307
Wambugu PW, Brozynska M, Furtado A et al (2015) Relationships of wild and domesticated rices (Oryza AA genome species) based upon whole chloroplast genome sequences. Sci Rep 5:13957
Wu CS, Chaw SM (2014) Highly rearranged and size-variable chloroplast genomes in conifers II clade (cupressophytes): evolution towards shorter intergenic spacers. Plant Biotechnol J 12:344–353
Wu CS, Lin CP, Hsu CY et al (2011) Comparative chloroplast genomes of Pinaceae: insights into the mechanism of diversified genomic organizations. Genome Biol Evol 3:309–319. https://doi.org/10.1093/gbe/evr026
Xiang Z, Wen-Juan L, Jun-Ming L et al (2015) Ribosomal proteins: functions beyond the ribosome. J Mol Cell Biol 7:92–104
Yang ZY, Ran JH, Wang XQ (2012) Three genome-based phylogeny of Cupressaceae s.l.: further evidence for the evolution of gymnosperms and Southern Hemisphere biogeography. Mol Phylogenetics Evol 64:452–470
Yi X, Gao L, Wang B et al (2013) The complete chloroplast genome sequence of Cephalotaxus oliveri (Cephalotaxaceae): evolutionary comparison of Cephalotaxus chloroplast DNAs and insights into the loss of inverted repeat copies in gymnosperms. Genome Biol Evol 5:688–698
Yong Y, Zheng X (2012) CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences. BMC Genom 13:715
Zhang H, Li C, Miao H, Xiong S (2013) Insights from the complete chloroplast genome into the evolution of Sesamum indicum L. PLoS ONE 8:e80508
Zheng W, Chen J, Hao Z, Shi J (2016) Comparative analysis of the chloroplast genomic information of Cunninghamia lanceolata (Lamb.) Hook with sibling species from the genera Cryptomeria D. Don, Taiwania Hayata, and Calocedrus Kurz. Int J Mol Sci 17:1084
Zonneveld BJM (2012) Conifer genome sizes of 172 species, covering 64 of 67 genera, range from 8 to 72 picogram. Nord J Bot 30:490–502

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.