Complete Chloroplast Genome Sequences from Yellowhorn (Xanthoceras sorbifolia) and Evolution Analysis Based on Codon Usage Bias
INTERNATIONAL JOURNAL OF AGRICULTURE & BIOLOGY
ISSN Print: 1560–8530; ISSN Online: 1814–9596
20F–065/2020/24–4–676–684
DOI: 10.17957/IJAB/15.1487
http://www.fspublishers.org

Full Length Article

Xiaonong Guo1,2,3*, Yaling Wang2 and Suomin Wang1

1State Key Laboratory of Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou 730020, PR China
2Key Laboratory of Biotechnology and Bioengineering of State Ethnic Affairs Commission, Biomedical Research Center, Northwest Minzu University, Lanzhou 730030, PR China
3Life Science and Engineering College, Northwest Minzu University, Lanzhou 730030, PR China
*For correspondence: [email protected]

Received 24 March 2020; Accepted 18 April 2020; Published 16 August 2020

Abstract

Yellowhorn (Xanthoceras sorbifolia Bunge) is an economically important tree with high genetic diversity. To explore the factors affecting the use of synonymous codons, and to analyze the relationship between codon usage patterns and evolutionary factors in yellowhorn, the complete chloroplast (cp) genome was determined and its codon usage bias was analyzed. The evolutionary forces acting on the yellowhorn cp genome were then inferred. The yellowhorn cp genome was 159,474 bp in size and comprised four regions. A total of 48 genes were selected for codon usage analysis; the codon usage bias in the yellowhorn cp genome was weak, and codons ending in A/T were preferred. The results of principal component analysis (PCA) and correspondence analysis (COA) indicated that base composition (such as GC content), reflecting mutation bias, might affect the codon bias. Furthermore, the ENC-plot, parity rule 2 (PR2) plot, and neutrality plot analyses together suggested that natural selection, in addition to notable mutation pressure, might dominate the codon usage bias in the yellowhorn cp genome. Our results will help in understanding the genetic architecture and mechanisms in yellowhorn, and will also contribute to enriching genetic resources and to the conservation of endemic yellowhorn species. © 2020 Friends Science Publishers

Keywords: Yellowhorn; Chloroplast genome; Synonymous codon usage; Evolutionary forces

To cite this paper: Guo X, Y Wang, S Wang, (2020). Complete chloroplast genome sequences from yellowhorn (Xanthoceras sorbifolia) and evolution analysis based on codon usage bias. Intl J Agric Biol 24:676‒684

Introduction

Yellowhorn (Xanthoceras sorbifolia Bunge), belonging to the family Sapindaceae, is a deciduous tree species distributed naturally on hills and slopes in Northern China (Wang et al. 2019a). As an economically important tree, its seeds have a high oil content (50–68% of the kernel) and a high content of unsaturated fatty acids (85–93%), mainly comprising linoleic acid, oleic acid, and nervonic acid (Ji et al. 2017). Moreover, yellowhorn exhibits strong resistance to cold (even below −40°C), salinity, and drought, implying important ecological value (Ruan et al. 2017). Yellowhorn can be used to treat rheumatism, gout, and childhood enuresis. Besides, triterpenoid saponins and barringenol-like triterpenoids extracted from different yellowhorn tissues have antitumor and anti-inflammatory effects and can be used to treat Alzheimer's disease (Ding et al. 2019). Yellowhorn is an andromonoecious plant; after years of natural hybridization and selection, evident differences can be found in the morphological and physiological indicators of flower and fruit, including size, color, yield and oil content (Bi and Guan 2014). Yellowhorn possesses high genetic diversity and exhibits abundant phenotypic variation even within a small range under the same management practices (Ruan et al. 2017). Unlike many other plant species, yellowhorn can survive for hundreds of years (Wang et al. 2019b). Thus, evolutionary analyses can expand the genetic engineering of yellowhorn resources and help in understanding the genetic diversity based on geographical distribution.

Codons are the basic genetic code units of mRNA, and their degeneracy exists in the process of encoding amino acids in all living organisms (Taylor and Coates 1989). The pattern of synonymous codon usage is not random and exhibits usage bias in different genomes (Sharp and Cowe 1991). Codon bias is unique to a given organism and is influenced by a series of factors, such as GC content, gene length, and gene expression level (Plotkin and Kudla 2010). The two main evolutionary forces, natural selection and mutation bias, can be reflected by codon usage bias (Akashi 1994; Hershberg and Petrov 2008). Thus, codon usage bias can provide clues about plant species evolution.

Yellowhorn is an economically important tree that exhibits strong stress resistance (Wang et al. 2017). A high-quality yellowhorn genome database has been generated recently by Liang et al. (2019). A comparative de novo transcriptome analysis of yellowhorn was conducted by Zhou and Zheng (2015). Although the genetic resources of yellowhorn are increasing, its synonymous codon usage is currently unknown and is very important to analyze. The cp genome data can provide molecular phylogenetic information for developing commercially important yellowhorn species. In the present study, we obtained the complete cp genome sequence of yellowhorn, which comprised a pair of inverted repeat (IR) regions, a large single-copy (LSC) region, and a small single-copy (SSC) region. We identified the complete characteristics of the yellowhorn cp genome, focusing on codon usage bias by using multivariate statistical analysis, and analyzed the evolutionary forces shaping codon preference. Our results will provide information to help in understanding the genetic architecture and mechanisms in yellowhorn, and will also contribute to enriching genetic resources and to the conservation of endemic yellowhorn species.

Materials and Methods

Experimental material

Yellowhorn was planted in Wenhuagong in the Loess Plateau Area (36°36′N, 103°48′E), Lanzhou city, Gansu province, China. Fresh leaves were collected on August 29, 2019, immediately frozen, and then kept at −80°C.

DNA extraction, sequencing and assembly

Genomic DNA was extracted by using the method of Li et al. (2018). The integrity and quality of the DNA were validated with a spectrophotometer (OD-1000, Shanghai, China). Following the instructions of the NEBNext® Ultra™ DNA Library Prep Kit for Illumina®, a library with 250 bp length was constructed and sequenced on an Illumina NovaSeq platform (Benagen Tech Solution Co., Ltd, Wuhan, China). In the quality control step, Illumina PCR adapter reads, low-quality reads, and reads with more than 5% unknown nucleotides ("Ns") were filtered from the paired-end raw reads. All good-quality paired clean reads were obtained using SOAPnuke software, version 1.3.0 (Chen et al. 2017). After performing bidirectional iterative derivation of the assembled reads with NOVOPlasty (k-mer = 39) (Dierckxsens et al. 2016), the whole circular genome sequence was obtained. All circularized sequences were searched by BLASTN (version: BLAST 2.2.30+, E-value ≤1e-5) against the reference database (Goodwin et al. 2015).

Phylogenetic analysis

A total of 31 complete cp genomes from different plant species were downloaded from the NCBI database, and the sequences were first aligned using MAFFT (Katoh et al. 2002). The phylogenetic tree was generated by MEGA-X (Kumar et al. 2018).

Codon usage bias analysis

Based on the cp genome data of yellowhorn, repeated sequences and sequences shorter than 300 bp were filtered out, and 48 CDSs were retained for the analysis of codon usage bias (Comeron and Aguadé 1998). The main indicators were calculated by using CodonW software version 1.3 (https://sourceforge.net/projects/codonw/), including RSCU (the relative synonymous codon usage value), ENC (the effective number of codons), CAI (the codon adaptation index), GC (G + C content of the gene), GC3s (the frequency of the nucleotides G + C at the 3rd position of synonymous codons), and the base compositions (A3s, T3s, G3s, and C3s) (Puigbò et al. 2008). The G + C content at the 1st, 2nd, and 3rd positions of codons (GC1, GC2, GC3) and GC12 (the average GC content of the 1st and 2nd positions) were calculated by the Cusp function from EMBOSS (http://imed.med.ucm.es/EMBOSS/) (Rice et al. 2000).

Identification of the optimal codon

Using ENC values as the preference standard, the 48 sequences of yellowhorn were ordered, and a 5% high-bias dataset and a 5% low-bias dataset were selected. The ΔRSCU of each codon was calculated as its RSCU value in the high-bias dataset minus its RSCU value in the low-bias dataset (Sharp and Li 1987). Finally, the optimal codons were inferred as those possessing the highest ΔRSCU.

Multivariate statistical analysis

The distribution of all the genes in a 59-dimensional vector space (one axis per sense codon that has synonymous alternatives) according to the RSCU values was analyzed by principal component analysis (PCA). The data were projected onto different axes, and the axes corresponding to the most important factors underlying codon usage variation were identified (Wold et al. 1987). Correspondence analysis (COA) was used to compare two or more categories of variable data and to visualize the major trends in codon usage among genes (Perriere and Thioulouse 2002). ENC-plot mapping analysis was used to identify the key factors affecting the codon usage bias. The ENC plot of the ENC values against the GC3s values was drawn in Excel 2016. The ideal relationship between ENC and GC3s can be observed from the standard curve.

Parity rule 2 (PR2) plot mapping analysis was constructed to show the relationship between the values A3/(A3 + T3) and G3/(G3 + C3), and the data were distributed into four quadrants in a scatter diagram (Sueoka 1999). Analysis of the relationship between GC12 and GC3 values of all genes was performed by using neutrality plot mapping. In the neutrality plot, the value of GC12 is used as the vertical coordinate, and the value of GC3 is used as the horizontal axis PCA analysis (Wei et al.
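As a rough illustration of the RSCU and ΔRSCU calculations described under "Codon usage bias analysis" and "Identification of the optimal codon", the following Python sketch computes RSCU from codon counts using the usual definition attributed to Sharp and Li (1987): the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. It is not the CodonW implementation used in the study. The function names and the two toy sequences are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code with bases in TCAG order (assumed here for illustration).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(cds):
    """Count in-frame codons of a coding sequence, ignoring ambiguous codons."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(counts):
    """RSCU for the 59 sense codons that have synonymous alternatives."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        if len(codons) < 2:  # Met and Trp have a single codon each
            continue
        total = sum(counts.get(c, 0) for c in codons)
        for c in codons:
            # observed count / (family total / family size); 0.0 if the family is absent
            values[c] = counts.get(c, 0) * len(codons) / total if total else 0.0
    return values


def delta_rscu(high_counts, low_counts):
    """RSCU in the high-bias set minus RSCU in the low-bias set, per codon."""
    high, low = rscu(high_counts), rscu(low_counts)
    return {c: high[c] - low[c] for c in high}


# Toy example (hypothetical sequences, not data from the study).
high = count_codons("GCTGCTGCTGCA")
low = count_codons("GCTGCAGCCGCG")
delta = delta_rscu(high, low)
```

In this sketch, the pooled codon counts of the high-bias and low-bias gene sets would be passed to `delta_rscu`, and the codons with the largest positive values would be the candidates for optimal codons. The 59 keys returned by `rscu` correspond to the 59 axes mentioned for the PCA.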
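The standard curve referred to for the ENC plot is not written out in the text. A commonly used form is Wright's (1990) expectation, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s; whether exactly this form was used in the study is an assumption. The sketch below tabulates that curve and reports how far a gene lies from it. The helper names are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage is shaped only by GC3s (Wright 1990 form)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_gap(observed_enc, gc3s):
    """Expected minus observed ENC; a positive gap means the gene lies below the curve."""
    return expected_enc(gc3s) - observed_enc


# Points of the standard curve at GC3s = 0.00, 0.01, ..., 1.00 for plotting.
curve = [(i / 100, expected_enc(i / 100)) for i in range(101)]
```

Plotting each gene's observed ENC against its GC3s on top of `curve` reproduces the layout of an ENC plot. Genes on or near the curve are usually read as being shaped mainly by base composition, and genes well below it as subject to additional constraints.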
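The coordinates used in the PR2 plot and the neutrality plot can be derived from position-specific base counts. The sketch below is a simplified illustration rather than the EMBOSS or CodonW procedure used in the study: it counts bases at all first, second, and third codon positions of a CDS, whereas PR2 analyses are often restricted to particular codon families. The function names and the example sequence are hypothetical.

```python
def position_counts(cds):
    """Base counts at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = [dict.fromkeys("ACGT", 0) for _ in range(3)]
    for i in range(usable):
        base = cds[i]
        if base in "ACGT":  # skip ambiguous bases such as N
            counts[i % 3][base] += 1
    return counts


def gc_fraction(position):
    """G + C fraction for one codon position (0.0 if the position is empty)."""
    total = sum(position.values())
    return (position["G"] + position["C"]) / total if total else 0.0


def plot_coordinates(cds):
    """Neutrality-plot (GC12, GC3) and PR2-plot coordinates for one gene."""
    p1, p2, p3 = position_counts(cds)
    at3 = p3["A"] + p3["T"]
    gc3 = p3["G"] + p3["C"]
    return {
        "GC12": (gc_fraction(p1) + gc_fraction(p2)) / 2,  # vertical axis, neutrality plot
        "GC3": gc_fraction(p3),                           # horizontal axis, neutrality plot
        "A3/(A3+T3)": p3["A"] / at3 if at3 else None,     # PR2 coordinate
        "G3/(G3+C3)": p3["G"] / gc3 if gc3 else None,     # PR2 coordinate
    }


# Toy example (hypothetical sequence, not data from the study).
coords = plot_coordinates("ATGGCAGCTGCC")
```

In a PR2 plot, the point (0.5, 0.5) corresponds to A = T and G = C at the third position, so the displacement of a gene from that centre gives the direction and extent of the bias. In the neutrality plot, the slope of a regression of GC12 on GC3 across genes is the quantity usually interpreted.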