Zhang et al. Horticulture Research (2021) 8:183 Horticulture Research https://doi.org/10.1038/s41438-021-00621-z www.nature.com/hortres

ARTICLE Open Access Chromosome-scale assembly of the chrysotoxum genome enhances the understanding of orchid evolution Yongxia Zhang1, Guo-Qiang Zhang 2,3,DiyangZhang 4,Xue-DieLiu4,5,Xin-YuXu4, Wei-Hong Sun4,5,XiaYu4, ✉ Xiaoen Zhu1, Zhi-Wen Wang6, Xiang Zhao6, Wen-Ying Zhong6, Hongfeng Chen7,Wei-LunYin4,8, Tengbo Huang 1 , ✉ ✉ Shan-Ce Niu9 and Zhong-Jian Liu4

Abstract As one of the largest families of angiosperms, the family is diverse. Dendrobium represents the second largest genus of the Orchidaceae. However, an assembled high-quality genome of species in this genus is lacking. Here, we report a chromosome-scale reference genome of Dendrobium chrysotoxum, an important ornamental and medicinal orchid species. The assembled genome size of D. chrysotoxum was 1.37 Gb, with a contig N50 value of 1.54 Mb. Of the sequences, 95.75% were anchored to 19 pseudochromosomes. There were 30,044 genes predicted in the D. chrysotoxum genome. Two whole-genome polyploidization events occurred in D. chrysotoxum. In terms of the second event, whole-genome duplication (WGD) was also found to have occurred in other Orchidaceae members, which diverged mainly via gene loss immediately after the WGD event occurred; the first duplication was found to have occurred in most monocots (tau event). We identified sugar transporter (SWEET) gene family expansion, which might be related to the abundant medicinal compounds and fleshy stems of D. chrysotoxum. MADS-box genes were

1234567890():,; 1234567890():,; 1234567890():,; 1234567890():,; identified in D. chrysotoxum, as well as members of TPS and Hsp90 gene families, which are associated with resistance, which may contribute to the adaptive evolution of orchids. We also investigated the interplay among carotenoid, ABA, and ethylene biosynthesis in D. chrysotoxum to elucidate the regulatory mechanisms of the short flowering period of orchids with yellow flowers. The reference D. chrysotoxum genome will provide important insights for further research on medicinal active ingredients and breeding and enhances the understanding of orchid evolution.

Introduction of Orchidaceae, Dendrobium encompasses ~1450 species With more than 25,000 species, Orchidaceae is the with fleshy stems3. Many species of Dendrobium have largest angiosperm family1 and comprises 8–10% of high medicinal and commercial value, and the main – flowering . Orchids are renowned for their specia- medicinal active ingredients are in the stems4 9. There- lized flowers, which have a very wide variety of growth fore, studying the molecular mechanism of these active forms, and have been successful colonizers of a wide ingredients and breeding cultivars with increased contents variety of different habitats2. As one of the largest genera of natural products are the main objectives in Den- drobium scientific research and industrialization10. Guchui Shihu (鼓槌石斛) Dendrobium chrysotoxum,a Correspondence: Tengbo Huang ([email protected])or medicinal species, is listed in the Chinese Pharmacopoeia Shan-Ce Niu ([email protected]) or Zhong-Jian Liu ([email protected]) 1Guangdong Provincial Key Laboratory for Epigenetics, College of Life (2020, 2015, and 2010 edition) and contains an abundance Sciences and Oceanography, University, Shenzhen 518071, of erianin, gigantol, polysaccharides, and fluorenones, 2 – Laboratory for Orchid Conservation and Utilization, Orchid Conservation and among other compounds11 19 (Fig. 1). These compounds Research Center, The National Orchid Conservation Center, Shenzhen 518114, China show antipyretic, analgesic, antihyperglycemic, and Full list of author information is available at the end of the article

© The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a linktotheCreativeCommons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Zhang et al. Horticulture Research (2021) 8:183 Page 2 of 14

the functions of medicinal active ingredients, provides a reference for breeding new varieties and enhances the understanding of orchid evolution.

Results and discussion Genome sequencing and characteristics D. chrysotoxum has a karyotype of 2N = 2X = 38, with uniform chromosomes31. To completely sequence the D. chrysotoxum genome, 138.15 Gb of clean reads were generated by BGISEQ sequencing system (Supplementary Table 1). The estimated genome size was 1.38 Gb with 1.84% heterozygosity, as determined by K-mer analysis (Supplementary Fig. 1). To obtain a better assembly, PacBio technology was employed, and 132.64 Gb of Pac- Bio sequencing data were generated (Supplementary fl D. chrysotoxum Fig. 1 A owering plant. D. chrysotoxum is Table 1). The assembly size was 1.37 Gb with a corre- lithophytic on rocks or epiphytic on tree trunks with bright yellow flowers and fusiform, fleshy stems sponding contig N50 value of 1.54 Mb (Supplementary Table 2). The BUSCO32 assessment indicated that the completeness of the gene set of the assembled genome – antioxidant effects and enhance immune function11 19. was 90.3% (Supplementary Table 3). This indicates that Recently, preliminary clinical study results suggested that the D. chrysotoxum genome assembly was complete and gigantol could delay lens turbidity through the inhibition could be used for subsequent analysis. We further used of aldose reductase and aldose reductase mRNA expres- 125.96 Gb of reads from the Hi-C library. The assembled sion, which have good effects on diabetic cataracts16,17. scaffolds were ultimately clustered into 19 pseudomole- Erianin has been demonstrated to exhibit metabolic cules, which represented the 19 chromosomes in the inhibition20 and antitumor21, antiproliferative22, and haploid genome of D. chrysotoxum (Fig. 2a). The lengths antiangiogenic activity23. Moreover, it also inhibits high of the 19 pseudochromosomes ranged from 38.28 to glucose-induced retinal angiogenesis12. Polysaccharides 100.49 Mb with a scaffold N50 value of 67.80 Mb (Sup- isolated from D. chrysotoxum have potential utility in plementary Tables 4 and 5). In addition, contigs with a enhancing antioxidation, immune function, and/or length of 1.31 Gb were mapped onto the 19 pseudo- hypoglycemic activity11. The stems of D. chrysotoxum are chromosomes at a 95.75% anchor rate (Supplementary fusiform and rich in active medicinal substances, which Tables 4 and 5). The chromatin interaction data suggest makes it a suitable species for scientific research and that our Hi-C assembly is of high quality (Fig. 2b). industrial applications in Dendrobium. Compared with those of other orchid genome assemblies, With the improvement of sequencing technology and the contig N50 and scaffold N50 values of the D. chry- cost reduction, genome sequencing has become a neces- sotoxum genome were much higher (Table 1), and the sary method for obtaining comprehensive genetic infor- assembly completeness was higher than 90% (Table 1), mation and an effective method for screening candidate suggesting high genome quality and completeness. genes for specific traits, especially for identifying candi- date genes involved in the biosynthesis pathways of Gene prediction and annotation – medicinal compounds24 28. To date, only two Den- In D. chrysotoxum genome, 30,044 protein-coding genes drobium spp. genomes have been sequenced, and some were annotated (see Materials and methods; Supplementary candidate genes involved in polysaccharide metabolic Table6).Thecompletenessofthegenomewas95.64%, pathways have been identified in those two species24,29,30. indicating that the D. chrysotoxum genome annotation was However, these studies were largely limited due to their relatively complete (Supplementary Table 7). low-quality genome assemblies. Therefore, high-quality In addition to a high number of genes, the average reference genomes and additional Dendrobium species length of genes and introns was also larger in D. chryso- need to be sequenced to better understand the molecular toxum than in Phalaenopsis equestris, Gastrodia elata, mechanisms underlying the production of medicinal and D. catenatum24,33,34 and much higher than that in compounds and enable the breeding of new varieties. most other angiosperms (Supplementary Table 8). The In this study, we used PacBio sequencing and Hi-C average length of the coding DNA sequences (CDSs) in D. technology to generate a chromosome-level genome chrysotoxum was longer than those in other angiosperms, assembly. The specific genes of D. chrysotoxum were and a greater average intron length was also previously identified, which lays a foundation for further research on observed for P. equestris, G. elata, and D. Zhang et al. Horticulture Research (2021) 8:183 Page 3 of 14

a

75

50

25

25 50

100 Chr01 25 Chr19 50 75 Chr02

50 Chr03 25

25 Chr18

25

Chr04 75 50

50 75

Chr17 25

Chr05 25

50 75

50 25

Chr16

25 50

Chr06

75

Chr15

25

25 Chr14 Chr07

25 50

Chr13 Chr08 25 50

25

Chr12 25 Chr09

50 50

25 Chr11 Chr10 75

25

50 50

25

b

Fig. 2 Chromosomal features and intensity signal heat map of D. chrysotoxum chromosomes according to Hi-C output. a From inside outward: chromosome (purple), gene density (red), DNA type repeat sequence density (green), copy density (blue), and gypsy density (orange). All the data are shown with sliding windows of 500 kb, and the inner lines (green indicates the positive direction, and red indicates the opposite direction) represent syntenic blocks on homologous chromosomes. b Heat map of the intensity of the Hi-C chromosome. The heat map represents the contact matrices generated by aligning the Hi-C data to the chromosome-scale assembly of the D. chrysotoxum genome. A higher value on the scale bar indicates a higher contact frequency Zhang et al. Horticulture Research (2021) 8:183 Page 4 of 14

Table 1 Genome statistics and comparisons among orchid species whose genome has been sequenced

Species Gene number Contig N50 (bp) Scaffold N50 (bp) BUSCO assembly (%) CEGMA assembly (%)

D. chrysotoxuma 30044 1,540,953 67,798,029 90.30 – D. catenatum2 28910 51,736 1,055,340 92.46 – P. equestris33 29431 45,791 1,217,477 91.00 – A. shenzhenica2 21841 80,069 3,029,156 93.62 – D. officinale29 35567 25,122 76,489 - 91.50 aThis study

catenatum24,33,34; thus, a relatively long CDS might be a Evolution of gene families unique characteristic of Orchidaceae (Supplementary Fig. A high-confidence phylogenetic tree was constructed, 2; Supplementary Table 8). Regulatory elements are fre- and the divergence times were estimated based on 274 quently present in introns, and alternative splicing events single-copy genes from 17 different plant species (Sup- often occur among different introns and exons, diversi- plementary Fig. 5 and Supplementary Table 6). As fying the protein-coding aspect of the genome. All these expected, D. chrysotoxum was sister to D. catenatum, factors might contribute to genome structure evolution, forming an clade together with P. eques- genome size, gene function diversification, and gene tris, G. elata, and A. shenzhenica located at the bases of – expression patterns35 38. For example, intron transcrip- Orchidaceae branches (Supplementary Fig. 6). The tional delay in Drosophila is particularly important for Orchidaceae divergence was estimated to have occurred proper development of the embryo39,40. Thus, this char- 123 Mya; the divergence between subfamily Apostasioi- acteristic of orchids needs to be further analyzed and deae and subfamily Epidendroideae occurred 80 Mya; the researched. Moreover, 80 microRNAs, 1281 transfer divergence between D. chrysotoxum and D. catenatum RNAs, 2275 ribosomal RNAs, and 882 small nuclear occurred 11 Mya; and the divergence between Den- RNAs were identified in the D. chrysotoxum genome drobium and Phalaenopsis occurred 38 Mya (Fig. 3). (Supplementary Table 9). Then, the expansion and contraction of orthologous gene We estimated that the D. chrysotoxum genome com- families were analyzed. According to the results, 140 and prised 62.81% repetitive sequences (Supplementary Figs. 3 1112 gene families expanded and contracted, respectively, and 4; Supplementary Table 10), the percentage of which in the lineage leading to Orchidaceae. In D. chrysotoxum, was higher than 62% in P. equestris but lower than 78.1% 953 gene families were expanded, as opposed to 783 in D. in D. catenatum24,33. Transposable elements (TEs) are catenatum, 853 in P. equestris, 358 in G. elata, and 562 in important forms of repeats and constitute a substantial A. shenzhenica. At the same time, 1335 gene families were part of the D. chrysotoxum genome (61.22%); TEs are the contracted in D. chrysotoxum, as opposed to 644 in D. most abundant repeat subtypes in this species. In addi- catenatum, 1009 in P. equestris, 2748 in G. elata, and tion, repeats predicted de novo were much larger than 1615 in A. shenzhenica. A greater number of expanded those obtained based on Repbase11 database, suggesting gene families in D. chrysotoxum may lead to a larger that, compared with other plants species whose genome genome size than that in other sequenced orchid has been sequenced, D. chrysotoxum has many specific species2,24,33,34. repeats (Supplementary Table 10). Long terminal repeats The ancestral clade of Dendrobium had 464 expanded (LTRs) represented the highest proportion among all gene families and 216 contracted gene families. The D. subtypes of repeats, accounting for ~53.15% of the gen- chrysotoxum clade had 953 expanded gene families and ome, which was higher than the 46% for D. catenatum24 1335 contracted gene families. In the ancestral clade of (Supplementary Table 11). Dendrobium, there were 19 significantly expanded gene In addition, 27,575 (91.78%) predicted genes were families, including 236 genes from D. chrysotoxum. In the functionally annotated (Supplementary Table 12). Among D. chrysotoxum clade, 107 gene families were significantly them, 27,268 (90.76%) and 26,808 (89.23%) genes were expanded, including 1048 genes, and 43 gene families annotated to the TrEMBL and Nr databases, respectively were significantly contracted, including 59 genes. We also (Supplementary Table 12). The numbers of annotated conducted Gene Ontology (GO) enrichment analysis for genes were 22,735 (75.67%), 19,185 (63.86%), and 18,666 the expanded gene families, and the GO terms “cyto- (62.13%) in the InterPro, SwissProt, and KEGG databases, plasmic part” and “intracellular organelle” were found to respectively (Supplementary Table 12). be enriched (Supplementary Table 13). In addition, the Zhang et al. Horticulture Research (2021) 8:183 Page 5 of 14

D. catenatum 10.9 +783 / -644 +464 / -2166 D. chrysotoxum 38.9 +953 / -1335 Gene families +223 / -88 P. equestris Expansion/Contraction 51.5 +98 / -420 +853 / -1009 G. elata +140 / -1112 80.2 +358 / -2748 123.1 A. shenzhenica +11 / -438 +562 / -1615

Combined Change A. officinalis Across Lineages +1208 / -2398 B. distachyon 44.5 +557 / -709 135.4 +148 / -127 +57 / -121 O.sativa +936 / -650 53.5 +819 / -506 S. bicolor 100.4 +214 / -504 +855 / -625 A. comosus 155.3 121.8 +1171 / -1402 +6 / -421 +257 / -223 P. dactylifera 112.9 +1525 / -3397 +52 / -2 M. acuminata +4004 / -1063 181.8 S. polyrhiza +2 / -12 +499 / -4232 A. thaliana 108.9 +1793 / -1409 MRCA +376 / -134 193.2 P. trichocarpa (11252) 131.7 +591 / -594 +5057 / -277 V. vinifera +858 / -2118 A. trichopoda +634 / -1852

193 182 155 135 123 112 100 80 53 44 39 11 0 million years ago

Fig. 3 Phylogenetic tree showing divergence times and the evolution of gene families in D. chrysotoxum. The green and red numbers represent the numbers of expanded gene families and contracted gene families, respectively. The blue portions of the pie charts show that the copy numbers of gene families are constant. Divergence times are represented by light blue bars at internodes; the divergence time is at a 95% confidence interval and shown by the range of these bars. The expansion and contraction of gene families are represented by members of branches (see Materials and methods and Supplementary Fig. 6). The orange part of the pie chart at the top left represents the ratio of 11,252 gene families found in the most recent common ancestor (MRCA) that expanded or contracted during recent differentiation events bidirectional sugar transporter gene SWEET was identi- 21,881 collinear gene pairs between D. chrysotoxum and fied (Supplementary Fig. 7), whose product plays impor- P. equestris, 21,592 between D. chrysotoxum and P. aph- tant roles in sugar translocation between compartments41, rodite, 24,550 between D. chrysotoxum and D. catenatum, phloem loading for long-distance translocation42, pollen and 2800 between D. chrysotoxum and itself (Supple- nutrition43, and seed filling44. Further phylogenetic ana- mentary Table 14). Although D. chrysotoxum was lysis showed that 17 genes were expanded in clade II assembled to the chromosome level, its self-collinearity (Supplementary Fig. 7), suggesting that these SWEET was still very low compared to that of other sequenced genes might be associated with a fleshy stem that is orchid species. The collinearity between D. catenatum abundant in polysaccharides and other medicinal and D. chrysotoxum was fragmented, which may be the compounds. result of the quality of the D. catenatum genome, which was not at the chromosome level. The chromosomes of Synteny analysis and whole-genome duplication (WGD) Dendrobium and Phalaenopsis showed a good corre- Both the loss of a substantial fraction of genes and the sponding relationship, indicating that after the divergence increase in substitution rate complications were indicated of Dendrobium and Phalaenopsis, the chromosomes were by WGD in D. chrysotoxum, which is thought to have conserved, with few rearrangements. Syntenic figures occurred among different orchid species2. WGD is evi- show that the collinearity blocks were mainly in a 1:1 dent in many lineages and is a practical method for pattern, indicating that after the differentiation of D. genome expansion45. To determine the occurrence of chrysotoxum, no species-specific WGD events had WGDs in D. chrysotoxum, JCVI v0.9.1446 was used to occurred (Fig. 4; Supplementary Figs. 8–12). analyze the protein sequences of D. chrysotoxum, P. The distributions of synonymous substitutions per equestris, P. aphrodite, and D. catenatum with the default synonymous site (Ks) were estimated to infer poly- parameters and obtain collinear gene pairs. There were ploidization events that occurred in the D. chrysotoxum Zhang et al. Horticulture Research (2021) 8:183 Page 6 of 14

Fig. 4 Self-collinearity map of D. chrysotoxum (Guchui). The values on the X- and Y-axes are the numbers of cumulative genes on the 19 chromosomes

genome. There were two peaks in the distribution of Ks that the τ WGD had occurred in most monocot species45. for paralogous D. chrysotoxum genes: Ks = 1.0 and Furthermore, the peak of the Ks distribution in D. chry- 1.7–1.8 (Fig. 5). These results suggested that two poly- sotoxum was smaller than 0.2, suggesting that it originated ploidization events occurred in D. chrysotoxum. To fur- from background (tandem) duplications and likely did not ther verify the polyploidization events in D. chrysotoxum, signify additional recent WGDs2. Therefore, this study its genome was compared with that of P. equestris, A. found that D. chrysotoxum experienced two poly- shenzhenica, and D. catenatum. The peaks in Ks values ploidization events: an early WGD event was shared between both D. chrysotoxum/P. equestris and D. chryso- among all extant orchid species, and a later event that was toxum/D. catenatum were less than 1.0, suggesting that shared among most monocot species. the events occurred before the differentiation of these three species. There was a diverging peak in the Ks dis- MADS-box genes and the evolution of flowers tribution of D. chrysotoxum and A. shenzhenica at Ks = MADS-box genes are among the most important reg- 0.7–0.8, which was smaller than but close to the Ks peak ulators of plant floral development and compose major of the Orchidaceae (Ks = 1), indicating that extant orchid class of regulators mediating floral transition. In total, the species differentiated immediately after experiencing a D. chrysotoxum genome encodes 58 putative functional shared WGD event. Based on the evolution of gene MADS-box genes and 1 pseudogene (Table 2; Supple- families, species differentiation mainly occurred through mentary Table 15). Interestingly, the number of MADS- gene loss with little gene expansion, which confirmed that box genes was similar to that in other sequenced orchid the WGD event occurred in the most recent common species but smaller than that in most sequenced angios- ancestor of extant orchid species. The second peak in the perms2,24,33. D. chrysotoxum has 31 type II MADS-box Ks distributions within D. chrysotoxum (1.7–1.8) indicated genes, which is higher than that found in P. equestris (29) Zhang et al. Horticulture Research (2021) 8:183 Page 7 of 14

and A. shenzhenica (27), but smaller than that of D. genes had duplicated, as evidenced by higher seed pro- catenatum (35)2,24,33. Phylogenetic analysis (Supplemen- duction in D. chrysotoxum than in other sequenced orchid tary Fig. 13) showed that, except for those in the MIKC*, species. This must have been accompanied by duplication Bs, and OsMADS32 clades, most genes in the type II of the type I MADS-box gene Mα,asD. chrysotoxum has MADS-box clade were contracted. Bs genes are involved more Mα genes (19) than other sequenced orchid species in the differentiation and development of ovules47.InD. (Table 2), ensuring seed development. In addition, there chrysotoxum, there are four Bs members, more than the were no genes from the FLOWERING LOCUS C (FLC), number found in other sequenced orchid species. The Bs AGL12,orAGL15 clades in the D. chrysotoxum genome or other sequenced orchid genomes. In comparison with genes in the AGL12 and AGL15 clades, which are present in both rice and Arabidopsis, orthologous genes of FLC, AGL12, and AGL15 might have been specifically lost in orchids. Although AGL12-like genes (XAL1 in A. thali- D. chrysotoxum vs D. chrysotoxum (RBH) fl 48 D. chrysotoxum vs D. chrysotoxum (anchor) ana) are necessary for root development and owering , D. chrysotoxum vs A.shenzhenica D. chrysotoxum D.catenatum D. chrysotoxum and P. equestris have varying mechanisms vs 2 D. chrysotoxum vs G.elata that perform the same function , showing that D. chry- D. chrysotoxum P.equestris vs sotoxum is not a terrestrial orchid but is an epiphytic

Density orchid. The D. chrysotoxum genome has 26 putative functional type I genes and 1 pseudogene (Table 2), which might have resulted in a lower expansion rate or a higher con- traction rate compared with those of type II MADS-box genes in D. chrysotoxum (31 functional genes). Tandem gene duplication might play an important role in the 0 2 4 6 8 10 12 increasing number of type I genes in the α group (type I 0.0 0.5 1.0 1.5 2.0 2.5 3.0 Mα), suggesting that the type I genes have mainly been Ks value duplicated on a smaller scale from more-recent duplica- 49 β Fig. 5 Distribution of Ks values in the whole paranome of D. tions . Although members of the group of type I β chrysotoxum. The Ks distributions of paralogs using the reciprocal MADS-box genes (type I M ) do exist in A. thaliana, best hit (RBH) and anchor are shown as light blue and red histograms, poplar, and rice, they are absent in the D. chrysotoxum respectively. The Ks distribution for the one-to-one orthologs of D. genome. Interactions among these type I MADS-box – – – chrysotoxum G. elata, D. chrysotoxum A. shenzhenica, D. chrysotoxum genes are essential for initiating endosperm develop- D. catenatum, and D. chrysotoxum–P. equestris are shown as green, 50 2,24,33 gray, blue, and cyan curves, respectively. RBH reciprocal best blast hit; ment ; therefore, like in other sequenced orchids , “anchor” refers to colinear regions endosperm is also absent in D. chrysotoxum.

Table 2 MADS-box genes in D. chrysotoxum, A. shenzhenica, P. equestris, D. catenatum, and Arabidopsis thaliana

Category P. equestris33 D. catenatum24 D. chrysotoxum* A. shenzhenica2 A. thaliana37

Functional Pseudo Functional Pseudo Functional Pseudo Functional Pseudo Functional Pseudo

Type II (total) 29 1 35 11 31 0 27 4 45 5 MIKCc 28 1 32 9 28 0 25 3 43 4 MIKCa 1032402120 Mδ 0000000041 Type I (total) 22 8 28 1 26 1 9 0 62 36 Mα 1061511915 02023 Mβ 00000000175 Mγ 12 2 13 0 8 1 4 0 21 8 Total 51 9 63 12 58 1 36 4 107 41 aThis study Zhang et al. Horticulture Research (2021) 8:183 Page 8 of 14

G3P+Pyruvate MEP pathway GGDP Ethylene DXS OR PSY Receptors ClpB3 Phytoene CTRs C PDS A ZISO EIN2 R O ZDS T CRTISO E CCD1 Apo-carotenoids Lycopene N LYCE LYCB E LYCB LYCB S CCD4 PGM48 α -carotene β -catotene CCD7/8 CYP97A BCH BCH1/2 SLs X β- cryptoxanthin A Xanthophyll BCH esters N CYP97C XAT/PYP1 T zeaxanthin H lutein Methionine S-adenosyl-L-methionine O VDE ZEP SAMS ACS ‘Free’ P Xanthophylls 1-aminocyclopropane-1-carboxylic acid H antheraxanthin Y ACO VDE ZEP L Ethylene L Violaxanthin CCDs S Receptors NXS Apo-carotenoids NCEDs Neoxanthin Xanthoxin (C15) CTRs SDR (ABA2) Abscisic aldehyde AAO+MoCo (ABA3)

Responsive genes EIN2 Abscisic Acid (C15)

Fig. 6 Gene expression patterns among the interactions between carotenoids, the ABA and ethylene biosynthesis pathways, and regulatory mechanisms in D. chrysotoxum (adapted from Watkins and Pogson53, Yin et al.55, Nisar et al.101, Sun et al.102, and Finkelstein96). Increased gene expression during the four developmental stages of flowers is shown by thick green arrows, while decreased gene expression is shown by thick red arrows. The enzymes or genes are indicated alongside the arrows. The dashed lines suggest that there are multiple steps. Carotenes are shown in orange, and light gold indicates xanthophylls. ABA-related genes are shown in yellow, and ethylene-related genes are shown in light green. CCDs (orange arrows) regulate carotenoid accumulation. Plastoglobule-localized metallopeptidase 48 (PGM48) is hypothesized to regulate CCD4. The orange protein regulates PSY activity, acting as a chaperone. ClpB3 (red arrow) regulates the activity of DXS, acting as an enzyme in the MEP pathway

Floral color regulatory pathway in D. chrysotoxum protein, casein lytic proteinase B3 (ClpB3), deoxy-D- The flowering time of a single flower of D. chrysotoxum xylulose 5-phosphate synthase (DXS), phytoene desatur- was ~10 days51,52, the limit of which might be associated ase, ζ-carotene isomerase, ζ-carotene desaturase, car- with yellow flower color. All photosynthetic tissues in otenoid isomerase (CRTISO), β-lycopene cyclase, ε- each of the biological kingdoms can produce car- lycopene cyclase (LYCE), β-carotene hydroxylase, car- otenoids53. More than 1100 naturally occurring car- otene ε-hydroxylase (CYP97C), zeaxanthin epoxidase otenoids (http://carotenoiddb.jp/) are involved in many of (ZEP), violaxanthin de-epoxidase, neoxanthin synthase the red, orange, and yellow colors of flowers53. These (NXS), xanthophyll acyl-transferase (XAT), plastoglobule- compounds also play important roles in photosynthesis. localized metallopeptidase 48, and carotenoid cleavage Interestingly, carotenoids function as precursors for the dioxygenase (CCD). The expression of these genes biosynthesis of abscisic acid (ABA)53. Moreover, ethylene increased with flower development, except for LYCE, plays a role in senescing flowers54. Ethylene and ABA which is targeted for downregulation during biofortifica- regulate plant growth and development55 synergistically tion, ZEP, NXS, and XAT (Supplementary Table 16; Fig. or antagonistically. We therefore analyzed the network 6), suggesting that more carotenoids and fewer xantho- involving carotenoid, ABA, and ethylene biosynthesis and phylls were produced during flowering to senescence. regulation (Fig. 6). The substrates used to produce ABA were neoxanthin Eighteen genes or gene family members in the car- and violaxanthin, and the process was regulated by nine- otenoid biosynthesis pathway and related regulatory cis-epoxy carotenoid dioxygenases (NCEDs). The bio- mechanisms were identified (Supplementary Table 16). synthesis of ABA is catalyzed by the short-chain dehy- These genes encode phytoene synthase (PSY), orange drogenase/reductase-like (SDR1) enzyme abscisic Zhang et al. Horticulture Research (2021) 8:183 Page 9 of 14

aldehyde oxidase (AAO) and molybdenum cofactor of copalyl diphosphate synthase. Gymnosperm-specific (MoCo). The expression of NCED and SDR genes TPS-d performs several functions, such as those of increased gradually with the development of flowers, diterpene, monoterpene, and sesquiterpene synthases. while the expression of AAO and MoCo genes decreased TPS-e/f encodes copalyl diphosphate/kaurene synthases, gradually (Supplementary Table 17; Fig. 6). Furthermore, which are critical enzymes for gibberellic acid production. there were four AAO gene members detected in Arabi- Another angiosperm-specific TPS, TPS-g, encodes dopsis, while there was only one gene detected in D. monoterpene synthase enzymes that lack the R(R)X8W chrysotoxum (Supplementary Fig. 14). Taken together, motif. TPS-h has been observed only in Selaginella – these findings might indicate that relatively low amounts moellendorffii58 61. Phylogenetic analysis of the TPS gene of ABA (C15) were produced, which might improve family members and their expression in bud formation ethylene biogenesis. and initial flower opening, blooming, and withering are For ethylene biogenesis, genes encoding three kinds of shown in Fig. 7. In this study, the TPS gene number in D. enzymes were identified. The expression of Maker79017, chrysotoxum was 48, which was greater than that in D. encoding S-AdoMet synthetase (SAMS), Maker75695 and catenatum (42) (Fig. 7a). Moreover, there were 14 and 21 Maker66290, encoding ACC synthase (ACS), and genes in A. shenzhenica and P. equestris, respectively. The Maker29641, encoding ACC oxidase (ACO), increased TPS-b subfamily can be divided into monocot and eudicot gradually, suggesting that increased amounts of ethylene clades. More D. chrysotoxum TPS genes than D. catena- were produced during the development of flowers (Sup- tum ones clustered in the monocot A clade—14 (red gene plementary Table 18; Fig. 6). CONSTITUTIVE TRIPLE ID) and 4 (blue gene ID), respectively (Fig. 7a). Fewer TPS RESPONSE 1 and ETHYLENE INSENSITIVE 2 regulate genes were found in D. chrysotoxum than in D. catenatum the interaction between ethylene and the ABA pathway in the monocot B clade—7 (red gene ID) and 10 (blue and are partially dependent on the MHZ5/CRTISO- gene ID), respectively (Fig. 7a). The different distribution mediated ABA pathway in rice55. Therefore, we also patterns might contribute to the difference in terpenoid analyzed the expression patterns of the two genes in D. compositions between these two species, which needs chrysotoxum, but there were no obvious differences in any further validation. of the four stages of flower development (Supplementary To explore heat stress-related genes in D. chrysotoxum, Table 18; Fig. 6). we also analyzed heat stress-related gene families across In conclusion, carotenoid production increased gradu- orchid species. Only two Hsp90 genes (red gene ID) were ally, and the content of xanthophylls decreased gradually identified (clustering in group III), with high expression in yellow D. chrysotoxum flowers during flowering to during bud formation (Supplementary Fig. 15a, b). This senescence. Less xanthophyll was degraded into less ABA, number was lower than that for the other four species and less ABA led to more ethylene being produced. As a (six were identified in D. catenatum, seven in P. equestris, result, yellow flowers of D. chrysotoxum generally have a six in A. shenzhenica, and seven in A. thaliana). This relatively short flowering period. large gene loss might be related to resistance to heat stress. Identification of the terpene synthases (TPS) and Hsp90 gene families and adaptive evolution Conclusion Dendrobium spp., with epiphytic or lithophytic life- Although D. chrysotoxum has high ornamental and styles, frequently experience adverse environmental con- medicinal value, further molecular mechanism research ditions, such as chilling and water deficit56. During plant and development of medicinal compounds have been responses to environmental stresses, volatile terpenes play limited by a lack of omics data. In this study, a critical roles56. Moreover, terpenes also play an important chromosome-level reference genome of D. chrysotoxum role in the formation of orchid floral scents56. TPSs are with an assembled genome size of 1.37 Gb and 30,044 the key enzymes involved in terpene biosynthesis57. Dif- annotated protein-coding genes was obtained. Ks analysis ferent sizes of TPS families and subfamilies in plant spe- suggested that two polyploidization events occurred in D. cies have evolved to synthesize a specific set of terpene chrysotoxum: a recent WGD shared among other orchid compounds57. There are seven subfamilies in the TPS species and an ancient polyploidization event shared family: TPS-a, TPS-b, TPS-c, TPS-d, TPS-e/f, TPS-g, and among most monocots (τ event). Phylogenetic analysis of TPS-h57. Among them, TPS-a encodes a sesquiterpene the SWEET gene family in D. chrysotoxum showed that synthase that is found in both dicotyledonous and gene expansion occurred in clade II of the SWEET gene monocotyledonous plants. Angiosperm-specific TPS-b family, which might be related to fleshy stems containing encodes a monoterpene synthase with an R(R)X8W motif an abundance of polysaccharides. Floral color regulation that catalyzes the isomerization cyclization reaction. TPS- analysis showed that fewer xanthophylls degraded into c belongs to the ancestral clade and catalyzes the activity ABA, which led to more ethylene production, thus Zhang et al. Horticulture Research (2021) 8:183 Page 10 of 14

Fig. 7 Analysis of TPS genes in D. chrysotoxum.aPhylogenetic analysis of TPS genes in D. chrysotoxum, D. catenatum, A. shenzhenica, A. thaliana, and P. equestris. Ash A. shenzhenica, Maker D. chrysotoxum, Dca D. catenatum, Peq P. equestris,ATA. thaliana. b Expression patterns of TPS genes in buds and in the initial flower opening, blooming, and withering stages of D. chrysotoxum accelerating the senescence of D. chrysotoxum flowers. critical for industrial development. Our results provide The analysis of D. chrysotoxum helped elucidate the the first high-quality genome of Dendrobium and give mechanism through which fleshy stems produce an important insights into the molecular mechanism abundance of polysaccharides and other medicinal com- underlying the production of medicinal active ingredients, pounds, as well as flowering time regulation, which is breeding, and orchid evolution. Zhang et al. Horticulture Research (2021) 8:183 Page 11 of 14

Materials and methods the interaction signals generated by the valid mapped read DNA preparation and sequencing pairs between each bin. Fresh leaves of wild D. chrysotoxum were collected for genome sequencing. A modified cetyltrimethylammonium Genome annotation bromide protocol was used to extract the genomic DNA. Repetitive sequences are an important part of a genome To estimate genome size and heterozygosity, 143.78 Gb of and are divided into two types, namely, tandem repeats raw data from paired-end libraries (PE150) constructed and interspersed repeats. Two methods, de novo predic- from a MGISEQ-2000 sequencer were generated. After tion and homology-based searches, were used to annotate data filtering was carried out by SOAPnuke v1.6.5 software repeat sequences in the genome. RepeatMasker v4.0.7 and with the parameters -n 0.02 -l 20 -q 0.4 -Q 2 -i -G RepeatProteinMask v4.0.7 software69 (http://www. --seqType 0 –rmdup, clean data (138.15 Gb) were repeatmasker.org) were used to identify repetitive obtained (Supplementary Table 1). Then, a SMRTbell sequences based on the Repbase v21.12 database69 (http:// Template Prep Kit 1.0 (PacBio, Menlo Park, CA, USA) and www.girinst.org/repbase). For de novo prediction, a a PacBio Sequel system were used to construct and repetitive sequence database was constructed using sequence the DNA libraries, respectively, for PacBio long- RepeatModeler v1.0.870 and LTR_FINDER v1.0671. read sequencing. A total of 132.64 Gb of sequencing data RepeatMasker software and Tandem Repeats Finder (coverage of 96.12%) were generated, with an N50 read v4.0972 were subsequently used to predict repeat length of 19.5 kb (Supplementary Table 1). Furthermore, sequences and identify tandem repeats in the genome, all libraries with a 500 bp insert size were sequenced on a respectively. The annotation of high-quality protein-cod- NovaSeq platform (2 × 150 bp). We ultimately produced ing genes was carried out by integrating homology-based, 169.25 Gb of data and 125.96 Gb of clean data for Hi-C de novo and transcriptome-based predictions. For analysis. The transcriptomes of flowers of D. chrysotoxum homology-based prediction, protein sequences from six were obtained from Huang’s doctoral thesis62 to assist species (Arabidopsis thaliana, Oryza sativa, Sorghum gene annotation. bicolor, Zea mays, G. elata, and P. equestris) were used to align D. chrysotoxum genome sequences via Exonerate Genome assembly v2.2.073. Then, the complete sequences of 3000 genes Genome size and heterozygosity were measured using from the homology-based prediction method were used to Jellyfish v.2.2.6 and GenomeScope (http://qb.cshl.edu/ produce a training model through Augustus v3.2.374 and genomescope)63 based on a 17-K-mer distribution. Canu64 SNAP v2006-07-2875 software. The RNA-seq data of D. wasusedtoassemblethePacBiosequencingreads,withthe chrysotoxum were mapped to genome sequences through following parameters: minOverlapLength = 700; minRea- HISAT2 and StringTie software76,77. Finally, Maker dLength = 1000; and corOutCoverage = 50. Then, Arrow v2.31.878 was used to annotate and integrate the results software was used to polish the assembly, and Pilon v1.2365 generated by the above methods. BUSCO v332 was then was further used for correction of the assembly based on used to evaluate the completeness and quality of the gene short reads, with the following parameters: fixed bases; models. mindepth 10; minqual 20; and diploid. Finally, the com- Functional annotation of the predicted gene models was pleteness and quality of the final assembled genome were carried out by BLAST v2.2.3179 software and aligned evaluated with BUSCO v332. against the contents of the SwissProt80, TrEMBL (http:// www.uniprot.org/), KEGG (http://www.genome.jp/kegg/), Hi-C library construction and chromosome assembly InterPro81, Nr, and GO (The Gene Ontology Consortium) The raw reads produced by the NovaSeq sequencing databases82. For noncoding RNA annotations, tRNAscan- platform were filtered by SOAPnuke66 (v1.6.5, https:// SE 1.3.1 (http://lowelab.ucsc.edu/tRNAscan-SE/)83 was github.com/BGI-flexlab/SOAPnuke) software with the used to annotate tRNA sequences. BLASTN79 was used to following parameters: -n 0.02 -l 20 -q 0.4 -i –rmdup. search for rRNA, and miRNA and snRNA sequences were Then, the obtained clean reads were compared with the predicted by Infernal 1.1 (http://infernal.janelia.org/) preassembled contigs using Juicer67 software. After fil- software84. tering the results and removing the misaligned reads, 3D- DNA68 software was used to preliminarily cluster, WGD analysis sequence, and direct the pseudochromosomes. Juicer-box Ks distribution analysis was used to infer the occurrence was used to adjust, reset, and cluster the pseudochro- of WGD events in D. chrysotoxum and those between D. mosomes to improve the chromosome assembly quality. chrysotoxum and A. shenzhenica, D. catenatum, and P. For the evaluation of the Hi-C assembly results, the final equestris. BLASTP79 was used to search for putative pseudochromosome assemblies were divided into 100 kb paralogous and orthologous genes within and between bins of equal lengths, and a heat map was used to visualize genomes by alignment of each genome pair. MCScanX Zhang et al. Horticulture Research (2021) 8:183 Page 12 of 14

v1.5.285 was used to identify colinear regions, and then The sequences of genes encoding SAMS, ACS, and CodeML in the PAML package86 was used to calculate the ACO, all of which are involved in the ethylene biosynth- Ks value of each salicoid duplicated gene pair. We used esis pathway, in Arabidopsis97 were used as queries for CAFÉ87 to evaluate the significance of each expanded and searching proteins by BLASTP79 software. contracted gene family (P < 0.01). For gene families, a phylogenetic tree was constructed with PhyML98 based on the alignment of sequences from SWEET gene family analyses D. chrysotoxum, D. catenatum, A. shenzhenica, P. equestris, To identify SWEET proteins, proteomic datasets of four and A. thaliana. The tree was generated by the maximum orchid species (D. chrysotoxum, A. shenzhenica, D. cate- likelihood method based on the Jones–Taylor–Thornton natum, and P. equestris) and A. thaliana were con- (JTT) matrix-based model99, and the fast likelihood-based structed. The MtN3_slv domain PF03083 model profile method was used for phylogenetic tests with SH-like from the Pfam database88 was used for performing local branch supports. searches of proteome datasets containing five species via the HMMER program89. The SWEET protein sequences Gene expression analysis were aligned with MAFFT90. The alignment was then Transcriptome data from flowers at four developmental used for phylogenetic tree reconstruction by PhyML stages (flower buds, initial flowering stage, blooming – 3.091 93 with the default parameters. period, and withering flowers), stems, and leaves were obtained (BioProject PRJNA691441), and Salmon MADS-box gene family analysis v1.3.0100 was used to quantify gene expression, with the The sequences of the MADS-box proteins of A. thali- default settings. ana and the HMM profile (PF00319) were obtained from the Arabidopsis information resource (TAIR) (https:// TPS and Hsp90 gene family identification www.arabidopsis.org/) and the Pfam database88, respec- The HMM profiles for PF01397 (Terpene_synth) and tively. Then, the sequences of the MADS-box gene family PF03936 (Terpene_synth_C) were downloaded from the members in the D. chrysotoxum genome were obtained Pfam database (pfam.xfam.org/), and both profiles were using HMMER 3.2.1 software89 and BLASTP83 methods. used to carry out HMM searches against the information The obtained amino acid sequences were used for of the protein databases for five species (D. chrysotoxum, TBLASTN79 analysis of the D. chrysotoxum tran- Dendrobium catenatum, P. equestris, Apostasia shenzhe- scriptomic assemblies. SMART94 was subsequently used nica, and A. thaliana). The sequences aligned with to confirm the obtained sequences by domain analysis. MAFFT90 were used for phylogenetic tree construction MEGA X95 was then used for the alignment of the can- through PhyML79. The tree was generated by the max- didate genes, and the CIPRES website (https://www.phylo. imum likelihood method based on the JTT matrix-based org/portal2/) was used for phylogenetic tree construction. model99 and the bootstrap method for phylogenetic tests iTOL (https://itol.embl.de) was subsequently used to with 1000 replications. Similarly, the HMM profile for visualize the phylogenetic trees. PF00183 (Hsp90) was downloaded from the Pfam data- base (pfam.xfam.org/), and the subsequent steps were the Identification of genes involved in the carotenoid, ABA, same as those for TPS gene family identification. and ethylene biosynthesis pathways and regulatory mechanisms in D. chrysotoxum Acknowledgements This project was supported by the Innovation Research Team The sequences of all 17 genes or gene family members Fund (2014ZT05S078); National Natural Science Foundation of China (grants involved in the carotenoid biosynthesis pathway and reg- 31571252 and 31772322); Guangdong Special Support Program for Young ulatory mechanisms in A. thaliana, Triticum aestivum,and Talents in Innovation Research of Science and Technology (2019TQ05N940); 53 Shenzhen Peacock Grant (827/000189); Science and Technology Program of Pantoea ananatis were used as queries to search against Guangdong Province, China (2019B121202006); Program of Forestry the protein database of D. chrysotoxum. The obtained amino Administration of Guangdong Province (E036011002); Department for Wildlife acid sequences were aligned using MAFFT90.Wethen and Forest Plant Protection of the National Forest and Grassland Administration (2019073010); National Key Research and Development manually inspected the aligned sequences and removed any Program of China (No. 2018YFD1000400); Special Research Foundation of obviously inconsistent sequences. Hebei Agricultural University (YJ201848); and Natural Science Foundation of Four genes or gene family members involved in the Hebei Province (C2019204295). ABA biosynthesis pathway or regulatory mechanisms in 96 79 Author details Arabidopsis were obtained . BLASTP was used to 1Guangdong Provincial Key Laboratory for Plant Epigenetics, College of Life search for homologous genes by querying the protein Sciences and Oceanography, , Shenzhen 518071, China. 2 database of D. chrysotoxum. After aligning the amino acid Laboratory for Orchid Conservation and Utilization, Orchid Conservation and 90 Research Center, The National Orchid Conservation Center, Shenzhen 518114, sequences with MAFFT software, we removed any China. 3School of Food Science and Technology, University, Foshan obviously inconsistent sequences. 528225, China. 4Key Laboratory of National Forestry and Grassland Zhang et al. Horticulture Research (2021) 8:183 Page 13 of 14

Administration for Orchid Conservation and Utilization at College of Landscape 18. Ren,Z.Y.etal.Functionalanalysisof a novel C-glycosyltransferase in the Architecture, Fujian Agriculture and Forestry University, Fuzhou 350002, China. orchid Dendrobium catenatum. Hortic. Res. 7,111(2020). 5College of Forestry, Fujian Agriculture and Forestry University, Fuzhou 350002, 19. Robles-Rivera, R. R. et al. Adjuvant therapies in diabetic retinopathy as an China. 6PubBio-Tech, Wuhan 430070, China. 7Key Laboratory of Plant Resources early approach to delay its progression: the importance of oxidative stress Conservation Sustainable Utilization, South China Botanical Garden, Chinese and inflammation. Oxid.Med.CellLongev.2020,1–23 (2020). Academy of Sciences, 510650, China. 8College of Biological 20. Gong, Y. et al. Erianin induces a jnk/sapk-dependent metabolic inhibition in Sciences and Technology, Beijing Forestry University, Beijing 100083, China. human umbilical vein endothelial cells. In Vivo 18,223–228 (2004). 9College of Horticulture, Hebei Agricultural University, Baoding 071000, China 21. Cushman, M. et al. Synthesis and evaluation of stilbene and dihydrostilbene derivatives as potential anticancer agents that inhibit tubulin polymerization. Data availability J. Med. Chem. 34,2579–2588 (1991). All the data from this study have been deposited in the NCBI database under 22. Ma, G. et al. Iuhibitory effects of Dendrobium chrysotoxutm and its constiuents BioProject ID PRJNA664445. on the mouse HePA and ESC. J. China Pharm. Univ. 25,188–189 (1994). 23. Gong, Y. Q. et al. In vivo and in vitro evaluation of erianin, a novel anti- – Conflict of interest angiogenic agent. Eur. J. Cancer 40, 1554 1565 (2004). The authors declare no competing interests. 24. Zhang, G. et al. The Dendrobium catenatum Lindl. genome sequence pro- videsinsightsintopolysaccharidesynthase,floral development and adaptive evolution. Sci. Rep. 6, 19029 (2016). 25. Hu, M. J. et al. Chromosome-scale assembly of the Kandelia obovata gen- Supplementary information The online version contains supplementary ome. Hortic. Res. 7,75(2020). material available at https://doi.org/10.1038/s41438-021-00621-z. 26. Kang, M. et al. A chromosome-scale genome assembly of Isatis indigotica,an important medicinal plant used in traditional Chinese medicine: an Isatis Received: 6 January 2021 Revised: 23 April 2021 Accepted: 1 June 2021 genome. Hortic. Res. 7, 18 (2020). 27. Chen, F. et al. The sequenced angiosperm genomes and genome databases. Front. Plant Sci. 9, 418 (2018). 28. Chen, F. et al. Genome sequences of horticultural plants: past, present, and future. Hortic. Res. 6, 112 (2019). References 29. Yan,L.etal.ThegenomeofDendrobium officinale illuminates the biology of 1. Christenhusz, M. J. M. & Byng, J. W. The number of known plants species in the important traditional Chinese orchid herb. Mol. Plant 8,922–934 (2015). the world and its annual increase. Phytotaxa 261, 201–217 (2016). 30. Si, C. et al. DoRWA3 from Dendrobium officinale plays an essential role in 2. Zhang, G. et al. The Apostasia genome and the evolution of orchid. Nature acetylation of polysaccharides. Int. J. Mol. Sci. 21, 6250 (2020). 549,379–383 (2017). 31. Zheng, S. G. et al. Genome-wide researches and applications on Dendrobium. 3.Pridgeon,A.M.etal.Genera Orchidacearum: Epidendroideae (Part three) Planta 248,769–784 (2018). (Oxford University, 2014). 32.Simão,F.A.etal.BUSCOonlinesupplementary information: assessing 4. Luo, J.-P., Deng, Y.-Y. & Zha, X.-Q. Mechanism of polysaccharides from genome assembly and annotation completeness with single-copy orthologs. Dendrobium huoshanense. on streptozotocin-induced diabetic cataract. Bioinformatics 31,3210–3212 (2015). Pharm. Biol. 46,243–249 (2008). 33. Cai,J.etal.ThegenomesequenceoftheorchidPhalaenopsis equestris. Nat. 5. Leitch, I. J. et al. Genome size diversity in orchids: consequences and evo- Genet. 47,65–72 (2015). lution. Ann. Bot. 104, 469–481 (2009). 34. Yuan, Y. et al. The Gastrodia elata genome provides insights into plant 6. Ng,T.B.etal.ReviewofresearchonDendrobium, a prized folk medicine. adaptation to heterotrophy. Nat. Commun. 9, 1615 (2018). Appl. Microbiol. Biotechnol. 93,1795–1803 (2012). 35. Sena, J. S. et al. Evolution of gene structure in the conifer Picea glauca:a 7. Pan, L. H. et al. Comparison of hypoglycemic and antioxidative effects of comparative analysis of the impact of intron size. BMC Plant Biol. 14,95 polysaccharides from four different Dendrobium species. Int. J. Biol. Macromol. (2014). 64, 420–427 (2014). 36. DeLaTorre,A.R.etal.Insights into conifer giga-genomes. Plant Physiol. 166, 8.Tian,C.-C.,Zha,X.-Q.&Luo,J.-P.ApolysaccharidefromDendrobium 1724–1732 (2014). huoshanense prevents hepatic inflammatory response caused by carbon 37. Castillo-Davis, C. I. et al. Selection for short introns in highly expressed genes. tetrachloride. Biotechnol. Biotechnol. Equip. 29, 132–138 (2014). Nat. Genet. 31, 415 (2002). 9. Huang, K. et al. Purification, characterization and biological activity of poly- 38. Keane, P. A. & Seoighe, C. Intron length coevolution across mammalian saccharides from Dendrobium officinale. Molecules 21, 701 (2016). genomes. Mol. Biol. Evol. 33,2682–2691 (2016). 10. Lu, J. et al. High-density genetic map construction and stem total poly- 39. Swinburne, I. A. & Silver, P. A. Intron delays and transcriptional timing during saccharide content-related QTL exploration for Chinese endemic Den- development. Dev. Cell 14,324–330 (2008). drobium (Orchidaceae). Front. Plant Sci. 9, 398 (2018). 40. Artieri, C. G. & Fraser, H. B. Transcript length mediates developmental 11. Zhao, Y. et al. Antioxidant and anti-hyperglycemic activity of polysaccharide timing of gene expression across Drosophila. Mol. Biol. Evol. 31, isolated from Dendrobium chrysotoxum Lindl. J. Biochem. Mol. Biol. 40, 2879–2889 (2014). 670–677 (2007). 41. Lin,I.W.etal.Nectarsecretionrequiressucrosephosphatesynthasesandthe 12. Li, S. et al. Elution-extrusion counter-current chromatography separation of sugar transporter SWEET9. Nature 508,546–549 (2014). five bioactive compounds from Dendrobium chrysototxum Lindl. J. Chro- 42. Chen, L. Q. et al. Sucrose efflux mediated by SWEET proteins as a key step for matogr. A 1218,3124–3128 (2011). phloem transport. Science 335,207–211 (2012). 13. Hu,J.,Fan,W.,Dong,F.,Miao,Z.&Zhou,J.ChemicalcomponentsofDen- 43. Sun,M.X.etal.Arabidopsis RPG1 is important for primexine deposition and drobium chrysotoxum. Chin. J. Chem. 30, 1327–1330 (2012). functions redundantly with RPG2 for plant fertility at the late reproductive 14. Yu, Z. et al. Dendrobium chrysotoxum Lindl. alleviates diabetic retinopathy by stage. Plant Reprod. 26,83–91 (2013). preventing retinal inflammation and tight junction protein decrease. J. 44. Chen, L. Q. et al. A cascade of sequentially expressed sucrose transporters in Diabetes Res. 2015, 518317 (2015). the seed coat and endosperm provides nutrition for the Arabidopsis embryo. 15. Yu, Z. et al. Erianin inhibits high glucose-induced retinal angiogenesis via Plant Cell 27,607–619 (2015). blocking ERK1/2-regulated HIF-1alpha-VEGF/VEGFR2 signaling pathway. Sci. 45. Jiao, Y., Li, J., Tang, H. & Paterson, A. H. Integrated syntenic and phylogenomic Rep. 6, 34306 (2016). analyses reveal an ancient genome duplication in monocots. Plant Cell 26, 16. Wu, J. et al. Gigantol from Dendrobium chrysotoxum Lindl. binds and inhibits 2792–2802 (2014). aldose reductase gene to exert its anti-cataract activity: an in vitro 46. Tang, H. et al. jcvi: JCVI Utility Libraries (Zenodo, 2015). https://doi.org/10.5281/ mechanistic study. J. Ethnopharmacol. 198, 255–261 (2017). zenodo.31631. 17. Lim,V.,Schneider,E.,Wu,H.&Pang,I.H.Cataractpreventiveroleofisolated 47. Sang, X. et al. CHIMERIC FLORAL ORGANS1, encoding a monocot-specific phytoconstituents: findings from adecade of research. Nutrients 10,1580 MADS box protein, regulates floral organ identity in rice. Plant Physiol. 160, (2018). 788–807 (2012). Zhang et al. Horticulture Research (2021) 8:183 Page 14 of 14

48. Tapia-López, R. et al. An AGAMOUS-related MADS-box gene, XAL1 (AGL12), 73. Slater, G. S. & Birney, E. Automated generation of heuristics for biological regulates root meristem cell proliferation and flowering transition in Arabi- sequence comparison. BMC Bioinform. 6, 31 (2005). dopsis. Plant Physiol. 146,1182–1192 (2008). 74. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. 49. Parenicová,̆ L. et al. Molecular and phylogenetic analyses of the complete Nucleic Acids Res. 34,W435–W439 (2006). MADS-box transcription factor family in Arabidopsis:newopeningstothe 75. Johnson, A. D. et al. SNAP: a web-based tool for identification and annotation MADS world. Plant Cell 15,1538–1551 (2003). of proxy SNPs using HapMap. Bioinformatics 24,2938–2939 (2008). 50. Masiero,S.etal.TheemergingimportanceoftypeIMADSboxtranscription 76. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome factors for plant reproduction. Plant Cell 23,865–872 (2011). from RNA-seq reads. Nat. Biotechnol. 33,290–295 (2015). 51. Niu,S.C.etal.Morphologicaltypeidentification of self-incompatibility in 77. Kim, D. et al. Graph-based genome alignment and genotyping with HISAT2 Dendrobium and its phylogenetic evolution pattern. Int. J. Mol. Sci. 19,2595 and HISAT-genotype. Nat. Biotechnol. 37,907–915 (2019). (2018). 78. Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome database 52. Wu, R. H. Studies on the Germplasm Resources of Dendrobium in China and management tool for second generation genome projects. BMC Bioinform. their Genetic Relationships. Doctoral dissertation, Chin. Acad. For. (2007). 12, 491 (2011). 53. Watkins, J. L. & Pogson B. J. Prospects for carotenoid biofortification targeting 79. Kent, W. J. BLAT—theBLAST-likealignmenttool.Genome Res. 12,656–664 retention and catabolism. Trends Plant Sci. 25,501–512 (2020). (2002). 54. Johnson, P. R. & Ecker, J. R. The ethylene gas signal transduction pathway: a 80. Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its sup- molecular perspective. Annu.Rev.Genet.32, 227 (1998). plement TrEMBL in 2003. Nucleic Acids Res. 31,365–370 (2003). 55. Yin, C. C. et al. Ethylene responses in rice roots and coleoptiles are differ- 81. Jones,P.etal.InterProScan5:genome-scale protein function classification. entially regulated by a carotenoid isomerase-mediated abscisic acid path- Bioinformatics 30,1236–1240 (2014). way. Plant Cell 27,1061–1081 (2015). 82. Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. 56. Yu, Z. et al. Genome-wide identification and expression profile of TPS gene Genet. 25,25–29 (2000). family in Dendrobium officinale and the role of DoTPS10 in linalool bio- 83. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of synthesis. Int. J. Mol. Sci. 21, 5419 (2020). transfer RNA genes in genomic sequence. Nucleic Acids Res. 25,955–964 57. Jiang,S.Y.,Jin,J.J.,Sarojam,R.&Ramachandran,S.Acomprehensivesurvey (1997). on the terpene synthase gene family provides new insight into its evolu- 84.Nawrocki,E.P.&Eddy,S.R.Infernal1.1:100-foldfasterRNAhomology tionary patterns. Genome Biol. Evol. 11, 2078–2098 (2019). searches. Bioinformatics 29,2933–2935 (2013). 58. Chen, F. The family of terpene synthases in plants: A mid-size family of genes 85. Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of for specialized metabolism that is highly diversified throughout the king- gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012). dom. Plant J. 66,212–229 (2011). 86. Yang, Z. PAML: a program package for phylogenetic analysis by maximum 59. Tholl, D. Biosynthesis and biological functions of terpenoids in plants. Adv. likelihood. Bioinformatics 13,555–556 (1997). Biochem. Eng. Biotechnol. 148,63–106 (2015). 87. De Bie, T. et al. CAFE: a computational tool for the study of gene family 60. Bohlmann, J., Meyer-Gauen, G. & Croteau, R. Plant terpenoid synthases: evolution. Bioinformatics 22, 1269–1271 (2006). molecular biology and phylogenetic analysis. Proc. Natl Acad. Sci. USA 95, 88. El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids 4126–4133 (1998). Res. 47,D427–D432 (2019). 61. Jiang, S. Y. et al. Comprehensive survey on the terpene synthase gene family 89. Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 provides new insight into its evolutionary patterns. Genome Biol. Evol. 11, (2011). 2078–2098 (2019). 90. Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software 62. Huang, X. L. Research on molecular regulation mechanism of forma- version 7: improvements in performance and usability. Mol. Biol. Evol. 30, tion of floral color and floral fragrance of Dendrobium chrysotoxum 772–780 (2013). based on transcriptome sequencing. Doctoral dissertation, Chin. Acad. 91. Guindon,S.,Lethiec,F.,Duroux,P.&Gascuel,O.PHYMLOnline—aweb For. (2019). server for fast maximum likelihood-based phylogenetic inference. Nucleic 63. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Acids Res. 33,W557–W559 (2005). Smudgeplot for reference-free profiling of polyploid genomes. Nat. Com- 92. Wu, S. S. et al. The genome sequence of star fruit (Averrhoa carambola). mun. 11,1–10 (2020). Hortic. Res. 7,95(2020). 64. Koren, S. et al. Canu: scalable and accurate long-read assembly via 93. Chen, S. P. et al. The Phoebe genome sheds light on the evolution of adaptive k-mer weighting and repeat separation. Genome Res .27, magnoliids. Hortic. Res. 7, 146 (2020). 722–736 (2017). 94. Letunic, I., Doerks, T. & Bork, P. SMART: recent updates, new developments 65. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial and status in 2015. Nucleic Acids Res. 43, D257–D260 (2015). variant detection and genome assembly improvement. PLoS ONE 9, e112963 95. Kumar, S. et al. MEGA X: molecular evolutionary genetics analysis across (2014). computing platforms. Mol. Biol. Evol. 35, 1547–1549 (2018). 66. Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for 96. Finkelstein, R. Abscisic Acid synthesis and response. Arabidopsis Book 11, integrated quality control and preprocessing of high-throughput sequen- e0166–e0166 (2013). cing data. GigaScience 7, 120 (2017). 97. Lin, Z., Zhong, S. & Grierson, D. Recent advances in ethylene research. J. Exp. 67. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop- Bot. 60, 3311–3336 (2009). resolution Hi-C experiments. Cell Syst. 3,95–98 (2016). 98. Guindon, S. et al. New algorithms and methods to estimate maximum- 68. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, Hi-C yields chromosome-length scaffolds. Science 356,92–95 (2017). 307–321 (2010). 69. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. 99. Jones,D.T.,Taylor,W.R.&Thornton,J.M.Therapidgenerationofmutation Cytogenet. Genome Res. 110,462–467 (2005). data matrices from protein sequences. Comput.Appl.Biosci.8,275–282 70. Price,A.L.,Jones,N.C.&Pevzner,P.A.Denovoidentification of repeat (1992). families in large genomes. Bioinformatics 21,i351–i358 (2005). 100. Patro, R. et al. Salmon provides fast and bias-aware quantification of tran- 71. Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full- script expression. Nat. Methods 14,417–419 (2017). length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007). 101. Nisar, N. et al. Carotenoid metabolism in plants. Mol. Plant 8,68–82 (2015). 72. Benson, G. et al. Tandem repeats finder: a program to analyze DNA 102. Sun, T. et al. Carotenoid metabolism in plants: the role of plastids. Mol. Plant sequences. Nucleic Acids Res. 27, 573–580 (1999). 11,58–74 (2017).