Dissecting the Genome of Star Fruit (Averrhoa Carambola L.)
Total Page:16
File Type:pdf, Size:1020Kb
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Copenhagen University Research Information System Dissecting the genome of star fruit (Averrhoa carambola L.) Fan, Yannan; Sahu, Sunil Kumar; Yang, Ting; Mu, Weixue; Wei, Jinpu; Cheng, Le; Yang, Jinlong; Mu, Ranchang; Liu, Jie; Zhao, Jianming; Zhao, Yuxian; Xu, Xun; Liu, Xin; Liu, Huan Published in: Horticulture Research DOI: 10.1038/s41438-020-0306-4 Publication date: 2020 Document version Publisher's PDF, also known as Version of record Document license: CC BY Citation for published version (APA): Fan, Y., Sahu, S. K., Yang, T., Mu, W., Wei, J., Cheng, L., ... Liu, H. (2020). Dissecting the genome of star fruit (Averrhoa carambola L.). Horticulture Research, 7, [94]. https://doi.org/10.1038/s41438-020-0306-4 Download date: 23. Jun. 2020 Fan et al. Horticulture Research (2020) 7:94 Horticulture Research https://doi.org/10.1038/s41438-020-0306-4 www.nature.com/hortres ARTICLE Open Access Dissecting the genome of star fruit (Averrhoa carambola L.) Yannan Fan 1,SunilKumarSahu 1,TingYang1,WeixueMu1,JinpuWei1,LeCheng2, Jinlong Yang2, Ranchang Mu3, Jie Liu3, Jianming Zhao3, Yuxian Zhao4,XunXu1,5,XinLiu1,6 and Huan Liu 1,7 Abstract Averrhoa carambola is commonly known as star fruit because of its peculiar shape, and its fruit is a rich source of minerals and vitamins. It is also used in traditional medicines in countries such as India, China, the Philippines, and Brazil for treating various ailments, including fever, diarrhea, vomiting, and skin disease. Here, we present the first draft genome of the Oxalidaceae family, with an assembled genome size of 470.51 Mb. In total, 24,726 protein-coding genes were identified, and 16,490 genes were annotated using various well-known databases. The phylogenomic analysis confirmed the evolutionary position of the Oxalidaceae family. Based on the gene functional annotations, we also identified enzymes that may be involved in important nutritional pathways in the star fruit genome. Overall, the data from this first sequenced genome in the Oxalidaceae family provide an essential resource for nutritional, medicinal, and cultivational studies of the economically important star-fruit plant. Introduction diffusion into the bloodstream and controlling the blood 1234567890():,; 1234567890():,; 1234567890():,; 1234567890():,; The star-fruit plant (Averrhoa carambola L.), a member glucose concentration. of the Oxalidaceae family, is a medium-sized tree that is In addition to its use as a food source, star fruit is uti- distinguished by its unique, attractive star-shaped fruit lized as an herb in India, Brazil, and Malaysia and is widely (Supplementary Fig. 1). A. carambola is widely distributed used in traditional Chinese medicine preparations4 as a around the world, especially in tropical countries such as remedy for fever, asthma, headache, and skin diseases5. India, Malaysia, Indonesia, and the Philippines, and is Several studies have demonstrated the presence of various considered an important species; thus, it is extensively phytochemicals, such as saponins, flavonoids, alkaloids, cultivated in Southeast Asia and Malaysia1,2. It is also a and tannins, in the leaves, fruits, and roots of star-fruit popular fruit in the markets of the United States, Aus- plants6,7; these compounds are known to confer anti- tralia, and the South Pacific Islands3. Star fruits have a oxidant and specific healing properties. A study by Cab- unique taste, with a slightly tart, acidic (in smaller fruits) rini et al.5 indicated that the ethanolic extract from A. or sweet, mild flavor (in large fruits). Star fruit is a good carambola is highly effective in minimizing the symptoms source of various minerals and vitamins and is rich in of ear swelling (edema) and cellular migration in mice. A natural antioxidants such as vitamin C and gallic acid. flavonoid compound (apigenin-6-C-β-fucopyranoside) Moreover, the presence of high amounts of fiber in these isolated from A. carambola leaves showed anti- fruits aids in absorbing glucose, retarding glucose hyperglycemic action in rats, and might show potential for use in the treatment and prevention of diabetes8. Moreover, 2-dodecyl-6-methoxycyclohexa-2,5-diene-1,4- Correspondence: Huan Liu ([email protected]) dione (DMDD) extracted from the roots of A. carambola 1State Key Laboratory of Agricultural Genomics, China National GeneBank, BGI- exhibits potential benefits in the treatment of obesity, Shenzhen, 518120 Shenzhen, China fi ’ 2 insulin resistance, and memory de cits in Alzheimer s BGI-Yunnan, BGI-Shenzhen, 650106 Kunming, China 9,10 Full list of author information is available at the end of the article. disease . These authors contributed equally: Yannan Fan, Sunil Kumar Sahu © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a linktotheCreativeCommons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Fan et al. Horticulture Research (2020) 7:94 Page 2 of 10 Although A. carambola plays very significant roles in BWA-MEM software15. In total, 943,278,896 (99.12%) traditional medicine applications, there are very limited reads could be mapped to the genome, and 88.13% of studies on A. carambola at the genetic level, mainly due to them were properly paired (Supplementary Table 2). a lack of genome information. Therefore, filling this genomic gap will help researchers to fully explore and Genome annotation understand this agriculturally important plant. As a part A total of 68.15% of the assembled A. carambola gen- of the 10KP project11,12, the draft genome of A. carambola ome was composed of repetitive elements (Supplementary collected from the Ruili Botanical Garden in Yunnan, Table 3). Among these repetitive sequences, LTRs were China, was assembled in this study using an advanced 10X the most abundant, accounting for 61.64% of the genome. genomics technique to further elucidate the evolution of DNA class repeat elements represented 4.19% of the the Oxalidaceae family. A fully annotated genome of A. genome; LINE and SINE classes accounted for 0.28% and carambola will serve as a foundation for pharmaceutical 0.016% of the assembled genome, respectively. For gene applications of the species and the improvement of prediction, we combined homology-based and de novo- breeding strategies for the star-fruit plant. based approaches and obtained a non-redundant set of 24,726 gene models with 4.11 exons per gene on average. Results The gene length was 3457 bp on average, while the Genome assembly and evaluation average exon and intron lengths were 215 and 827 bp, Based on k-mer analysis, a total of 35,655,285,391 k- respectively. The gene model statistical data compared mers were used, with peak coverage of 75. The A. car- with seven other closely related species are shown in ambola genome was estimated to be ~475 Mb in size Supplementary Fig. 3. To evaluate the completeness of the (Supplementary Fig. 2). To perform genome assembly, a gene models for A. carambola, we used BUSCO with total of 156 Gb of clean reads were utilized by Supernova Embryophyta odb9. A total of 1281 (88.9%) complete v2.1.113. The final assembly contained 69,402 scaffold orthologs were detected from the predicted star fruit gene sequences, with an N50 of 2.76 Mb, and 78,313 contig sets (Supplementary Table 4). sequences, with an N50 of 44.84 Kb, for a total assembly Functions were assigned to 16,490 (66.69%) genes. size of 470.51 Mb (Table 1). Completeness assessment These protein-coding genes were then subjected to fur- was performed using Benchmarking Universal Single- ther exploration against the KEGG, NR, and COG protein Copy Orthologs (BUSCO) version 3.0.114 with Embry- sequence databases16, in addition to the SwissProt and ophyta odb9. The results showed that 1327 (92.20%) of TrEMBL databases17, and InterProScan18 was finally used the expected 1440 conserved plant orthologs were to identify domains and motifs (Supplementary Table 5, detected as complete (Supplementary Table 1). To further Supplementary Fig. 4). Noncoding RNA genes in the evaluate the completeness of the assembled genome, we assembled genome were also annotated. We predicted performed short read mapping using clean raw data with 759 tRNA, 1341 rRNA, 90 microRNA (miRNA), and 2039 small nuclear RNA (snRNA) genes in the assembled genome (Supplementary Table 6). Table 1 Statistics of genome assembly. Since star fruit is an important cultivated plant, the fi Contig Scaffold identi cation of disease resistance genes was one of the focuses of our study. Nucleotide-binding site (NBS) genes Size (bp) Number Size (bp) Number play an important role in pathogen defense and the cell cycle. We identified a total of 80 non-redundant NBS- N90a 5420 12,457 7033 3988 encoding orthologous genes in the star fruit genome N80 13,875 7548 39,210 608 (Supplementary Table 7). Among these genes, TIR (encoding the toll interleukin receptor) motif was found N70 23,325 5165 34,6109 131 to be significantly smaller than in other eudicot plants, N60 33,460 3619 1,307,770 60 except for cocoa. Unlike other plants, the leucine-rich N50 44,841 2503 2,757,598 35 repeat (LRR) motif was not the most or second most 19 Longest 717,770 – 14,768,062 – common motif in the NBS gene family in star fruit .