Seven Complete Chloroplast Genomes from Symplocos: Genome Organization and Comparative Analysis
Sang-Chul Kim 1, Jei-Wan Lee 1,* and Byoung-Ki Choi 2

1 Department of Forest Bioresources, National Institute of Forest Science, Suwon 16631, Korea
2 Warm Temperate and Subtropical Forest Research Center, National Institute of Forest Science, 22, Donnaeko-Ro, Seogwipo-Si 63582, Korea
* Corresponding author.

Academic Editor: Filippos A. Aravanopoulos
Received: 4 March 2021; Accepted: 6 May 2021; Published: 12 May 2021

Citation: Kim, S.-C.; Lee, J.-W.; Choi, B.-K. Seven Complete Chloroplast Genomes from Symplocos: Genome Organization and Comparative Analysis. Forests 2021, 12, 608. https://doi.org/10.3390/f12050608

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: In the present study, the chloroplast genome sequences of four Symplocos species from South Korea (S. chinensis for. pilosa, S. prunifolia, S. coreana, and S. tanakana) were obtained by Ion Torrent sequencing and compared with three previously reported Symplocos chloroplast genomes from other species. The length of the Symplocos chloroplast genomes ranged from 156,961 to 157,365 bp. Overall, 132 genes, including 87 protein-coding genes, 37 tRNA genes, and eight rRNA genes, were identified in all Symplocos chloroplast genomes. Gene order and gene content were highly similar across the seven species. The coding regions were more conserved than the non-coding regions, and the large single-copy and small single-copy regions were less conserved than the inverted repeat regions. We identified five new hotspot regions (rbcL, ycf4, psaJ, rpl22, and ycf1) that can be used as DNA barcodes or as species-specific molecular markers for Symplocos. These four novel chloroplast genomes provide basic information on the Symplocos plastid genome and enable better taxonomic characterization of the genus.

Keywords: chloroplast; genome; next-generation sequencing; phylogenetics; simple sequence repeat

1. Introduction

Chloroplasts (CPs) are characteristic plant organelles that play an important role in photosynthesis. The CP genome is markedly similar across most land plant lineages in terms of gene order, gene content, structure, and intron content [1]. It typically harbors 101–118 genes, including 66–82 protein-coding genes, 29–32 tRNA genes, and four rRNA genes [2]. CPs contain independently replicated genomes, most of which have a quadripartite structure: a large single-copy (LSC, 80–90 kb) region and a small single-copy (SSC, 16–27 kb) region separated by a pair of inverted repeats (IRa and IRb, 20–28 kb in length) [1,3]. However, this typical structure is altered in some plant lineages. For instance, one IR has been lost in Cupressaceae [4] and Taxaceae [5], and in Pinaceae the IR is reduced to less than 1 kb [6,7]. In contrast, in Ericaceae, IR expansion has substantially reduced the size of the SSC region [8,9]. Additionally, rearrangements, gene loss, gene duplication, pseudogene formation, and intron gain/loss have occurred in the CP genomes of various plant lineages [10,11]. CP genomes are frequently used in taxonomic and evolutionary studies [12] because they are uniparentally inherited (mostly maternally, but paternally in conifers), have a well-conserved gene arrangement and content, and are small [13].

The genus Symplocos Jacquin consists of woody flowering plants found mainly in humid tropical forests, with approximately 300 species distributed in the New World and the Western Pacific Rim [14]. Symplocos was originally recognized as the sole genus of Symplocaceae Jacquin [7,15], but the Angiosperm Phylogeny Group [16,17] now recognizes two genera in the family (Cordyloblaste Moritzi and Symplocos). Although several molecular studies have supported the monophyly of Symplocos, they used only a few protein-coding gene sequences (rpl16, matK) and partial non-coding sequences (nr-ITS, trnL–trnF, trnC–trnD, and trnH–psbA), and no comparative genomic analyses of Symplocos species have been conducted to date [14,18,19]. Four Symplocos species are endemic to South Korea [20–22]. Of these, S. prunifolia Siebold & Zucc. and S. coreana (H. Lév.) Ohwi grow only on Jeju Island in South Korea [23], and S. prunifolia is classified as an endangered, rare plant [24]. Comparing the CP genomes of these species is therefore essential for discriminating them at the molecular level and supports their ongoing conservation. To date, the CP genome sequences of only three Symplocos species (S. paniculata [Thunb.] Miq., S. ovatilobata Noot., and S. costaricana Hemsl.) have been deposited in the National Center for Biotechnology Information (NCBI) database, and no complete CP genome sequence of a Korean Symplocos species has been reported. In the present study, we aimed to sequence the CP genomes of four South Korean Symplocos species.
These CP genomes will provide a basis for studying the evolutionary history of Symplocos species and enable accurate taxonomic identification of vulnerable species.

2. Materials and Methods

2.1. Sample Collection, DNA Extraction, and CP Genome Sequencing

Fresh leaf samples were obtained from four Symplocos species growing on Jeju Island, South Korea, and total genomic DNA was extracted using a Plant SV Mini Kit (GeneAll Biotechnology, Seoul, Korea) according to the manufacturer's instructions. Intact leaf specimens were deposited in the herbarium of the Warm Temperate and Subtropical Forest Research Center (WTFRC; Table 1). The extracted DNA was quantified using a spectrophotometer (ND-1000, NanoDrop Technologies, Wilmington, DE, USA). Genomic DNA libraries were prepared, amplified, and sequenced using the Ion Xpress™ Plus Fragment Library Kit (Thermo Fisher Scientific, Waltham, MA, USA), the Ion PI™ Hi-Q™ Sequencing 200 Kit (Thermo Fisher Scientific), and the Ion PI™ Chip v3 Kit (Thermo Fisher Scientific).

Table 1. Summary of the assembly data for Symplocos chloroplast genomes.

| Category                   | S. chinensis for. pilosa | S. coreana     | S. prunifolia  | S. tanakana    |
|----------------------------|--------------------------|----------------|----------------|----------------|
| Specimen number            | WTFRC10032701            | WTFRC10031678  | WTFRC10032813  | WTFRC10031670  |
| Accession number           | MW307951                 | MW307952       | MW307953       | MW307954       |
| Total bases (Gb)           | 7.24                     | 11             | 9.57           | 12.8           |
| Total reads                | 42,998,603               | 64,043,107     | 56,918,749     | 77,435,124     |
| Read length (bp)           | 177                      | 180            | 175            | 171            |
| Genome size, bp [GC (%)]   | 156,961 [37.5]           | 157,365 [37.5] | 157,204 [37.5] | 156,971 [37.5] |
| LSC, bp [GC (%)]           | 87,006 [35.5]            | 87,434 [35.5]  | 87,219 [35.4]  | 87,017 [35.5]  |
| SSC, bp [GC (%)]           | 17,817 [31.0]            | 17,879 [31.0]  | 17,795 [31.0]  | 17,814 [31.0]  |
| IR (each copy), bp [GC (%)] | 26,069 [43.1]           | 26,026 [43.1]  | 26,095 [43.1]  | 26,070 [43.1]  |

2.2. CP Genome Assembly and Annotation

CP DNA data were filtered using SPAdes [25]. The four CP genomes were assembled using Geneious 10.2.6 [26] and annotated using DOGMA [27], followed by manual editing of non-annotated portions such as exons and introns.
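Because the genomes have the quadripartite layout described in the Introduction, Table 1 can be checked internally: the genome size should equal LSC + SSC + 2 × IR (the IR row gives one copy), and the whole-genome GC content should be close to the length-weighted mean of the regional GC values. The sketch below is a minimal illustration of that arithmetic, assuming the Table 1 values are transcribed correctly; `TABLE_1`, `quadripartite_length`, and `weighted_gc` are hypothetical names introduced here, not part of the authors' pipeline.

```python
"""Consistency checks on the region lengths and GC values reported in Table 1."""

from typing import Dict, Tuple

# (length in bp, GC %) per region, transcribed from Table 1.
# "IR" is the length of ONE inverted-repeat copy; the genome carries two.
Region = Tuple[int, float]
TABLE_1: Dict[str, Dict[str, Region]] = {
    "S. chinensis for. pilosa": {"genome": (156_961, 37.5), "LSC": (87_006, 35.5),
                                 "SSC": (17_817, 31.0), "IR": (26_069, 43.1)},
    "S. coreana": {"genome": (157_365, 37.5), "LSC": (87_434, 35.5),
                   "SSC": (17_879, 31.0), "IR": (26_026, 43.1)},
    "S. prunifolia": {"genome": (157_204, 37.5), "LSC": (87_219, 35.4),
                      "SSC": (17_795, 31.0), "IR": (26_095, 43.1)},
    "S. tanakana": {"genome": (156_971, 37.5), "LSC": (87_017, 35.5),
                    "SSC": (17_814, 31.0), "IR": (26_070, 43.1)},
}


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Length of an LSC-IRb-SSC-IRa genome: the IR copy is counted twice."""
    return lsc + ssc + 2 * ir


def weighted_gc(row: Dict[str, Region]) -> float:
    """Whole-genome GC %, estimated as the length-weighted mean of the regions."""
    parts = [(row["LSC"], 1), (row["SSC"], 1), (row["IR"], 2)]
    gc_bases = sum(copies * length * gc / 100 for (length, gc), copies in parts)
    total_bp = sum(copies * length for (length, _gc), copies in parts)
    return 100 * gc_bases / total_bp


if __name__ == "__main__":
    for species, row in TABLE_1.items():
        total = quadripartite_length(row["LSC"][0], row["SSC"][0], row["IR"][0])
        print(f"{species}: LSC+SSC+2*IR = {total:,} bp "
              f"(reported {row['genome'][0]:,}); "
              f"weighted GC = {weighted_gc(row):.2f}% (reported {row['genome'][1]}%)")
```

Because the regional GC values are rounded to one decimal place, the weighted estimate can only be expected to match the reported 37.5% to within about 0.1 percentage points, whereas the length sums should match exactly.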
The tRNA sequences were confirmed using tRNAscan-SE 1.21 [28]. All annotations were checked against the reference genomes (MG719832, MF770705, and MF179496). Genome maps were drawn using OrganellarGenomeDRAW (OGDRAW) [29].

2.3. Genome Comparison

The CP genomes were aligned using MAFFT [30]. The complete CP genomes of the seven Symplocos species were compared using mVISTA [31]. Additionally, the CP genome junctions were visualized and compared using IRscope [32].

2.4. Simple-Sequence Repeat (SSR) and Long Repeat Sequence Analysis

SSRs within the seven CP genomes were detected using the MISA (MIcroSAtellite) Perl script [33]. The minimum number of repeat units was set to 10 for mononucleotide repeats, 5 for dinucleotide repeats, 4 for trinucleotide repeats, and 3 for tetra-, penta-, and hexanucleotide repeats. REPuter was used to identify forward, reverse, complementary, and palindromic repeats with a minimum repeat size of 30 bp and a sequence identity of 90% [34].

2.5. Divergent Hotspot Identification

The seven Symplocos CP genomes were aligned using MAFFT and Geneious 10.2.6. Nucleotide diversity was analyzed using DnaSP version 6.12.03 [35], with a window length of 800 bp and a step size of 200 bp.

2.6. Phylogenetic Analysis

The complete CP genome sequences of the seven Symplocaceae species and of four other Ericales species (Changiostyrax dolichocarpus (C.J.Qi) Tao Chen (MG722902), Pterostyrax hispidus Siebold & Zucc. (MG719840), Halesia carolina L. (MG719830), and Sinojackia xylocarpa Hu (MG719827), downloaded from the NCBI database) were used for maximum likelihood (ML) phylogenetic analysis. Eighty genes from the 11 species were aligned using MAFFT in Geneious 10.2.6. jModelTest 2 was used to determine the optimal substitution model [36]. ML analysis was performed using RAxML with a GTR+I+G model [37].

3. Results

3.1. General Features of the CP Genomes

Using the Ion Torrent system, we obtained the whole CP genome sequences of four South Korean Symplocos species [S. chinensis for. pilosa (Nakai) Ohwi, S. coreana, S. prunifolia, and S. tanakana Nakai]. Sequencing generated 7.24 Gb (S. chinensis for. pilosa), 11 Gb (S. coreana), 9.57 Gb (S. prunifolia), and 12.8 Gb (S. tanakana) of raw data, with an average read length of 175 bp. Genome length ranged from 156,961 bp in S. chinensis for. pilosa (MW307951) to 157,365 bp in S. coreana (MW307952). The SSC region length ranged from 17,795 bp for S.