Complete Plastid Genome of Suriana maritima L. (Surianaceae) and Its Implications in Phylogenetic Reconstruction of Fabales
Journal of Genetics (2019) 98:109
© Indian Academy of Sciences
https://doi.org/10.1007/s12041-019-1157-3

RESEARCH NOTE

Complete plastid genome of Suriana maritima L. (Surianaceae) and its implications in phylogenetic reconstruction of Fabales

QIANG LAI1,2, CHENGJIE ZHU3, SHIRAN GU1,2, TIEYAO TU1* and DIANXIANG ZHANG1

1Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, People’s Republic of China
2University of Chinese Academy of Sciences, Beijing 100049, People’s Republic of China
3Shenzhen Academy of Metrology and Quality Inspection, Shenzhen 518055, People’s Republic of China

*For correspondence. E-mail: [email protected].

Received 30 May 2019; revised 8 September 2019; accepted 12 September 2019

Abstract. The present paper reports for the first time the characteristics of the complete plastid genome of Surianaceae (Suriana maritima L.) in the order Fabales. The circular complete plastid genome is 163,747 bp in length with a typical quadripartite organization containing 115 unique genes, of which 80 are protein-coding genes, 31 are tRNA genes and four are rRNA genes. The plastid genome of S. maritima is characterized by the absence of an intron in the atpF gene, which has never been reported for any other species of the Fabales. Gene content and gene order in the plastid genome of Surianaceae are similar to those of the basal lineages of the legume family (Cercidoideae, Detarioideae) and of Quillajaceae, supporting a likely common ancestor for the three families. Phylogenetic analysis supported the sister relationship between Surianaceae and Leguminosae, with strong support from the Bayesian method and moderate support from the likelihood method.
The complete plastid genome of Surianaceae could help resolve the long-standing unresolved interfamily relationships of Fabales once a more comprehensive sampling of Polygalaceae and Leguminosae becomes available in future studies.

Keywords. chloroplast genome; Leguminosae; plastome; Polygalaceae; Quillajaceae.

Introduction

Family Surianaceae, with five genera and eight species of trees and shrubs, is a small family of Fabales occurring exclusively in the pantropics (Christenhusz and Byng 2016). Suriana maritima L., also known as bay cedar, is the only species of the genus Suriana L., which is distributed along tropical coasts and islands in Asia, America, Australia, East Africa and the Pacific Ocean (Claxton et al. 2005). S. maritima can be easily recognized by its actinomorphic, pentamerous, yellow flowers with 4–5 carpels, and its nut-like fruits. This species has a high tolerance for drought, salt, heat and wind, and is considered an ideal plant species for coastal landscapes (Liu et al. 2018). Unfortunately, the genetic and biogeographic diversification of the family is poorly understood, and its phylogenetic relationships with the other three families of Fabales (Leguminosae, Quillajaceae and Polygalaceae) have long been controversial (Bello et al. 2012). Knowledge of the genome of this species will be crucial for recovering the phylogenetic position of Surianaceae within the order Fabales and for better understanding its genetic and biogeographic diversification on the Pacific islands and coasts.

Materials and methods

We collected fresh and healthy leaf tissues of S. maritima from the Xisha Islands (also known as the Paracel Islands) of China, and deposited the voucher specimens (TuTY2599-XS) in the herbarium of South China Botanical Garden (IBSC). Whole genomic DNA was isolated with a modified CTAB method (Doyle and Doyle 1987). The isolated total genomic DNA was fragmented to construct short-insert (300–500 bp) libraries following the manufacturer’s manual (Illumina). Paired-end (PE) sequencing was conducted on the Illumina HiSeq X-Ten instrument at Beijing Genomics Institute (BGI) in Wuhan, China. The clean sequencing data were filtered and assembled using the GetOrganelle pipeline (Jin et al. 2018), which has been considered the most efficient assembler for plastid genomes (Freudenthal et al. 2019). The GetOrganelle pipeline employs combined commands to recruit plastid-like reads using Bowtie2 (Langmead and Salzberg 2012) and to assemble the filtered reads using SPAdes (Bankevich et al. 2012). We manually corrected the de novo assembly graphs and generated the complete circular chloroplast genome using Bandage (Wick et al. 2015) based on the results from GetOrganelle. We used Geneious 9.1.8 (Kearse et al. 2012) to verify the accuracy of the assembly, and used plastid genome annotator (PGA) (Qu et al. 2019) for genome annotation coupled with manual correction of start and stop codons and intron/exon boundaries in Geneious. The annotated plastome was deposited in GenBank (accession number: MK830069).

We applied Mauve 2.3.1 (Darling et al. 2004) to compare the plastome structures. Because the plastid genome of Polygalaceae is not available so far, the dataset for plastome structure comparisons included only three of the four families in Fabales, namely Surianaceae, Quillajaceae and Leguminosae. We used the genome of Quillaja saponaria Molina from GenBank to represent Quillajaceae, a small family with only two species from Brazil and Peru. For the much larger family Leguminosae, we included plastome data from nine species to represent four of the six subfamilies according to the classification of LPWG (2017) (table 1). We also included Trifolium aureum Pollich, which has a typically smaller plastome due to loss of IRa, as a representative species of the inverted repeat-lacking clade (IRLC) (Sveinsson and Cronk 2014), and Acacia ligulata Benth., which has a typically larger plastome caused by IRs expanding into the SSC, to represent the inverted repeat-expanding clade (IREC) (Dugas et al. 2015). The IRa region was not included in the plastome structure analysis because it was lost in T. aureum and has not been reported for Q. saponaria. To estimate the plastid genome length of Quillajaceae, we added an IR region according to the sequences of Q. saponaria (accession number: MH880827).

We used the microsatellite identification tool MISA (Thiel et al. 2003) to locate the simple sequence repeat (SSR) loci in the plastid genome of S. maritima. The minimum (threshold) number of repeat units was set to 10, 6, 5 and 4 for mono-, di-, tri- and tetranucleotide SSRs, respectively. The IRa was not included in this analysis.

To reconstruct the phylogeny of Surianaceae and related families, we included Prunus persica (L.) Batsch (Rosaceae), Cucumis hystrix Chakrav. (Cucurbitaceae) and Castanea mollissima Blume (Fagaceae) as outgroups in addition to the ingroup species mentioned above (table 1). We aligned 77 protein-coding genes using MAFFT 7.308 (Katoh and Standley 2013), adjusted the alignments manually wherever necessary, and concatenated them into a supermatrix in Geneious. We reconstructed the phylogeny by maximum likelihood (ML) using RAxML 8.2.10 (Stamatakis 2014), assessing tree robustness with 1000 rapid bootstrap replicates under the GTR + G substitution model. We also reconstructed the phylogeny by Bayesian inference (BI) using MrBayes 3.2 (Ronquist and Huelsenbeck 2003), run for one million generations under the GTR + I + G substitution model.

Table 1. Sampled species and characteristics of plastid genomes.

Species name                    Order         Family/subfamily              GenBank accession  Unique genes  GC content (%)  Length (bp)
Suriana maritima L.             Fabales       Surianaceae                   MK830069           115           35.3            163,747
Cercis canadensis L.            Fabales       Leguminosae/Cercidoideae      KF856619           107           36.2            158,995
Bauhinia acuminata L.           Fabales       Leguminosae/Cercidoideae      MF135595           112           36.5            155,548
Ceratonia siliqua L.            Fabales       Leguminosae/Caesalpinioideae  KJ468096           108           36.7            156,367
Acacia ligulata Benth.          Fabales       Leguminosae/Caesalpinioideae  NC_026134          112           35.4            174,233
Intsia bijuga (Colebr.) Kuntze  Fabales       Leguminosae/Detarioideae      KX673214           108           36.2            159,215
Tamarindus indica L.            Fabales       Leguminosae/Detarioideae      KJ468103           108           36.2            159,551
Lupinus luteus L.               Fabales       Leguminosae/Papilionoideae    NC_023090          112           36.6            151,894
Trifolium aureum Pollich        Fabales       Leguminosae/Papilionoideae    NC_024035          112           34.6            126,970
Astragalus nakaianus Y.N.Lee    Fabales       Leguminosae/Papilionoideae    NC_028171          110           34.1            123,633
Quillaja saponaria Molina       Fabales       Quillajaceae                  MH880827           109           36.5            132,838
Cucumis hystrix Chakrav.        Cucurbitales  Cucurbitaceae                 NC_023544          113           37.0            155,031
Prunus persica (L.) Batsch      Rosales       Rosaceae                      NC_014697          113           36.8            157,790
Castanea mollissima Blume       Fagales       Fagaceae                      NC_014674          113           36.8            160,799

Results and discussion

The circular complete chloroplast genome of S. maritima is 163,747 bp in length and presents a quadripartite organization: a large single copy (LSC) region of 90,899 bp and a small single copy (SSC) region of 20,284 bp, separated by two inverted repeat regions (IRa and IRb), each 26,282 bp in length. A total of 115 unique genes were recovered, consisting of 80 protein-coding genes, 31 tRNA genes and four rRNA genes. The overall GC content of the whole plastome is 35.3%. It was highest in the IRs (42.4%), intermediate in the LSC (32.6%) and lowest in the SSC (28.7%).

between the genome size variation and species richness within Fabales until a much denser sampling covering all the major clades

The plastid genomes are relatively conserved in angios-