Zhang et al. BMC Genomics (2021) 22:536 https://doi.org/10.1186/s12864-021-07852-3

RESEARCH ARTICLE Open Access

Genome sequence of Apostasia ramifera provides insights into the adaptive evolution in orchids

Weixiong Zhang1†, Guoqiang Zhang2,3,4†, Peng Zeng1†, Yongqiang Zhang2,3,4,5, Hao Hu1, Zhongjian Liu5 and Jing Cai6*

Abstract

Background: The Orchidaceae family is one of the most diverse among flowering plants and serves as an important research model for evolution, especially “evo-devo” study on floral organs. Recently, sequencing of several orchid genomes has greatly improved our understanding of the genetic basis of orchid biology. To date, however, most sequenced genomes are from the Epidendroideae subfamily. To better elucidate orchid evolution, greater attention should be paid to other orchid lineages, especially basal lineages such as Apostasioideae.

Results: Here, we present a genome sequence of Apostasia ramifera, a terrestrial orchid species from the Apostasioideae subfamily. The genomes of A. ramifera and other orchids were compared to explore the genetic basis underlying orchid species richness. Genome-based population dynamics revealed a continuous decrease in population size over the last 100 000 years in all studied orchids, although the epiphytic orchids generally showed larger effective population size than the terrestrial orchids over most of that period. We also found more genes of the terpene synthase gene family, resistant gene family, and LOX1/LOX5 homologs in the epiphytic orchids.

Conclusions: This study provides new insights into the adaptive evolution of orchids. The A. ramifera genome sequence reported here should be a helpful resource for future research on orchid biology.

Keywords: Orchidaceae, Apostasia ramifera, Comparative genomics, Adaptive evolution

Background

The Orchidaceae family is one of the largest among flowering plants, with many species exhibiting great ornamental value due to their colorful and distinctive flowers. At present, there are more than 28 000 orchid species assigned to 763 genera [1]. According to their phylogeny, orchids can be divided into five subfamilies, i.e., Apostasioideae, Vanilloideae, Cypripedioideae, Epidendroideae, and Orchidoideae. It has been proposed that whole-genome duplication occurred in the ancestor of all orchid species, which contributed to their survival under significant climatic change [2, 3]. Orchids are a diverse and widespread family of flowering plants. Notably, several orchid species with specialized floral structures, such as labella and gynostemia, appear to have co-evolved with animal pollinators to facilitate reproductive success. In addition to their role in research on evolution and pollination biology, orchids are invaluable to the horticultural industry due to their elegant and distinctive flowers [4].

* Correspondence: [email protected]
Weixiong Zhang, Guoqiang Zhang, and Peng Zeng are co-first authors
†Weixiong Zhang, Guoqiang Zhang and Peng Zeng contributed equally to this work.
6School of Ecology and Environment, Northwestern Polytechnical University, 710129 Xi’an, China
Full list of author information is available at the end of the article

© The Author(s). 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

The genome sequences of several orchid species have been published recently, thereby greatly improving our understanding of orchid biology and evolution. The first reported orchid genome (Phalaenopsis equestris) showed evidence of an ancient whole-genome duplication event in the orchid lineage and revealed that expansion of MADS-box genes may be related to the diverse morphology of orchid flowers [2]. The subsequent publication of other orchid genome sequences, such as that of Dendrobium officinale, Dendrobium catenatum, Phalaenopsis aphrodite, Apostasia shenzhenica, and Vanilla planifolia, has provided data for further investigations on the genetic mechanisms underlying orchid species richness [3, 5–8].

The Apostasioideae subfamily consists of terrestrial orchid species [9]. Species within Apostasioideae exhibit various primitive traits, such as radially symmetrical flowers and no labella, supporting the placement of this subfamily as a sister clade to all other orchids [10]. These primitive features are considered ancient characteristics of the orchid lineage [10]. Thus, Apostasioideae species can serve as an important outgroup for evolutionary study of all other orchid subfamilies. Recently, Zhang et al. [3] published the A. shenzhenica genome and identified an orchid-specific whole-genome duplication event as well as changes in the MADS-box gene family associated with different orchid characteristics. This is the first (and only) genome reported for the Apostasioideae subfamily, with most currently published genomes belonging to the Epidendroideae subfamily. Obtaining genomes for other orchid lineages, especially basal lineages, will greatly facilitate our understanding of orchid evolution. Here, we performed de novo assembly and analysis of the Apostasia ramifera genome sequence, the second Apostasia genome after A. shenzhenica. Comparative genomics were carried out with six other published orchid genomes to provide insight into orchid evolution.

Results

Genome sequencing and assembly

The genomic DNA of A. ramifera was sequenced using the Illumina Hiseq 2000 platform. Sequencing of five libraries with different insert sizes ranging from 250 to 5 000 bp generated more than 57 Gb of clean data, accounting for 156X of the genome sequence (Additional file 1, Table S1). Based on the clean reads, we generated a 365.59-Mb long assembly with a scaffold N50 of 287.45 kb (Table 1 and Additional file 1, Table S2). To assess the quality of the final assembly, clean reads were mapped to the genome sequence, resulting in a mapping ratio of 99.7 %. The completeness of the gene regions in the assembly was examined by BUSCO (Benchmarking Universal Single-Copy Orthologs) assessment [11]. In total, 94.9 % (1 304/1 375) of the universal single-copy orthologs were found in our assembly (Additional file 1, Table S3).

Table 1 Statistics related to A. ramifera genome assembly
Feature              Summary
Genome Size          365 588 417 bp
Scaffold N50         287 449 bp
Contig N50           30 765 bp
Longest Scaffold     1 388 560 bp
GC Rate              33.38 %
Repeat Content       44.99 %
BUSCO Assessment     94.9 %
Gene Number          22 841

Genome annotation

Using both de novo and library-based repetitive sequence annotation, 164.49 Mb of repetitive elements were uncovered, accounting for 44.99 % of the total assembly (Additional file 1, Table S4). The proportion of repetitive DNA in A. ramifera was similar to that in A. shenzhenica (43.74 %) but less than that in P. equestris (62 %) and D. catenatum (78 %). Among the repetitive sequences, transposable elements (TEs) were the most abundant (43.1 %), among which long terminal repeats (LTR) were dominant, accounting for 24.07 % of the total genome (Additional file 1, Table S5 and Fig. S1).

The protein-coding gene models were predicted through a combination of de novo and homology-based annotation. In total, 22 841 putative genes were identified in the A. ramifera genome, similar to that in A. shenzhenica (21 831) but less than that in V. planifolia (28 279), P. equestris (29 545), and D. catenatum (29 257) (Additional file 1, Table S6). Further functional annotation of the predicted genes was carried out by homology searches against various databases, including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), SwissProt, TrEMBL, nr database, and InterPro. Results showed that 19 551 (85.6 %) predicted genes could be annotated (Additional file 1, Table S7). In addition, we identified 40 microRNA, 616 transfer RNA, 1 450 ribosomal RNA, and 108 small nuclear RNA genes in the A. ramifera genome (Additional file 1, Table S8).

Synteny comparison based on gene annotations of A. ramifera and A. shenzhenica identified 927 synteny blocks with an average block size of 12.89 genes (Additional file 1, Table S9). A total of 11 950 gene pairs were covered by these synteny blocks, accounting for 61 and 66 % of the genome sequences of A. ramifera and A. shenzhenica, respectively (Additional file 1, Table S9). The high co-linearity between their genomes suggested a close relationship between these two species.
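For readers less familiar with the assembly statistics quoted in the genome sequencing and assembly results, the following Python sketch shows how an N50 value and an average sequencing depth are commonly computed. The function names and the toy scaffold lengths are illustrative assumptions and are not part of the authors' pipeline; only the 57 Gb and 365 588 417 bp figures come from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be a non-empty collection")


def sequencing_depth(total_bases, genome_size):
    """Average depth of coverage = sequenced bases / genome size."""
    return total_bases / genome_size


# Toy scaffold lengths (hypothetical values, for illustration only).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_scaffolds)

# Figures quoted in the text: >57 Gb of clean data, 365 588 417 bp assembly.
depth = sequencing_depth(57e9, 365_588_417)
print(toy_n50, round(depth))
```

With the rounded 57 Gb figure, the depth comes out close to the 156X reported above; the exact value depends on the precise number of clean bases.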

Gene family identification

Gene family identification was carried out for the predicted protein-coding genes in A. ramifera, together with genes from other species, including P. equestris, P. aphrodite, D. officinale, D. catenatum, A. shenzhenica, V. planifolia, Asparagus officinalis, and Oryza sativa. A total of 19 422 putative genes in the A. ramifera assembly were assigned to 13 251 gene families (Fig. 1A and Additional file 1, Table S10). The remaining 3 419 genes could not be grouped with other genes and were considered orphans. Among the compared species, 266 gene families were only shared by orchid species. KEGG and GO enrichment analyses of those orchid-specific gene families revealed various significantly enriched pathways and terms, including ‘Stilbenoid, diarylheptanoid and gingerol biosynthesis’ (ko00945), ‘Zeatin biosynthesis’ (ko00908), ‘Flavonoid biosynthesis’ (ko00941), ‘Circadian rhythm - plant’ (ko04712), ‘Regulation of gene expression’ (GO:0010468), and ‘Aromatic compound biosynthetic process’ (GO:0019438) (Additional file 1, Table S11 and S12). Furthermore, a total of 1 145 gene families were specifically expanded in Apostasia (see Methods), and were significantly enriched in several pathways, such as ‘Ribosome biogenesis in eukaryotes’ (ko03008), ‘mRNA surveillance pathway’ (ko03015) and ‘Plant-pathogen interaction’ (ko04626) (Additional file 1, Table S13 and S14).

Phylogenetic analysis

We constructed a phylogenetic tree using MrBayes with gene sequences of 381 single copy genes shared by 16 plant species, including A. ramifera. The divergence times among these species were estimated using PAML MCMCTree based on our phylogeny. Results showed that the Apostasia species separated from other orchids 82 million years ago (Fig. 1B), consistent with previously published results [3]. The divergence time between A. ramifera and A. shenzhenica was estimated to be 8 million years ago (Fig. 1B). Gene family expansions and contractions on each phylogenetic branch of the 16 species were estimated using CAFE [12] (Fig. 1B). We further carried out GO/KEGG enrichment analyses on the significantly expanded gene families in A. ramifera and found some functionally enriched pathways and terms, including ‘Zeatin biosynthesis’ (ko00908), ‘Glycerophospholipid metabolism’ (ko00564), ‘Flavin adenine dinucleotide binding’ (GO:0050660), and ‘UDP-N-acetylmuramate dehydrogenase activity’ (GO:0008762) (Additional file 1, Table S15 and S16). In addition, the significantly contracted gene families were enriched in ‘Homologous recombination’ (ko03440), ‘Glycosphingolipid biosynthesis’ (ko00604), ‘Transferase activity, transferring phosphorus-containing groups’ (GO:0016772), and ‘Transferase activity’ (GO:0016740) (Additional file 1, Table S17 and S18).

History of orchid population size

Population size history is important for understanding the underlying mechanisms leading to current patterns of species and population diversity [13]. Several investigations on orchid population size have been published [14, 15]. Here, the pairwise sequential Markovian coalescent (PSMC) model, which uses the coalescent approach to estimate population size changes [13], was applied to infer population size history based on the genome

Fig. 1 Gene family and phylogenetic relationship analysis. (A) Venn diagram showing distribution of shared gene families among five orchid species, i.e., A. ramifera (Ara), A. shenzhenica (Ash), P. equestris (Peq), D. catenatum (Dca), and V. planifolia (Vpl). (B) Phylogenetic tree showing relationship and divergence times for 16 species, with a time axis running from 200 to 0 million years ago. Purple bars at internal nodes represent 95 % confidence interval of divergence times. Numbers of expanded and contracted gene families are presented as green and red values, respectively. MRCA, most recent common ancestor
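The KEGG and GO enrichment results reported for the orchid-specific, expanded, and contracted gene families rest on an over-representation test. This part of the article does not say which test or software was used, so the sketch below assumes a one-sided hypergeometric test, which is a common choice; the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of background genes
    annotated:  background genes carrying the term
    sample:     genes in the set of interest (e.g. an expanded family set)
    hits:       genes in that set carrying the term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 5 annotated, 2 sampled, both hits.
p = enrichment_p_value(population=10, annotated=5, sample=2, hits=2)
print(f"{p:.4f}")
```

In a real analysis the p-values for all tested terms would normally be corrected for multiple testing before a term is called significantly enriched.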

sequences of seven orchid species, i.e., A. ramifera, A. shenzhenica, P. equestris, P. aphrodite, D. officinale, D. catenatum, and V. planifolia. For the Apostasia species, population size changed between 10 000 and 250 000 years ago, with similar population dynamics (Fig. 2). Earlier history could not be recovered because the low-level heterozygosity of the genome sequences of A. ramifera and A. shenzhenica provided limited information on ancient changes in population size. For the other orchids, population size histories showed similar patterns, especially D. catenatum, D. officinale, and P. equestris (Fig. 2). First, a period of population growth was observed for each of these orchid species. Then, all orchid populations experienced a severe contraction (bottleneck) over the last 100 000 years, from which they have not recovered (Fig. 2). During the reporting period (10 000 to 250 000 years ago), the Apostasia species had the smallest population size compared to other orchid species. The population size of Vanilla was slightly higher than that of Apostasia, but lower than that of all Epidendroideae orchids.

Gene family evolutionary analysis

MADS-box transcription factors

In plants, MADS-box transcription factors are involved in various developmental processes, such as floral development, flowering control, and root growth. All MADS-box gene family members are categorized as type I or type II based on their gene tree. Using HMMER software and a MADS-box domain profile (PF00319), we identified 30 putative MADS-box genes in the A. ramifera genome, fewer than that detected in the other sequenced orchids (Additional file 1, Table S19). Phylogenetic analysis of the putative MADS-box genes revealed that 23 belonged to the type II MADS-box clade (Fig. 3A), fewer again than that found in other orchids, e.g., A. shenzhenica (27 members) [3], V. planifolia (30 members, Additional file 1, Fig. S2A), P. equestris (29) [2], and D. catenatum (35) [5]. Compared to P. equestris, there were fewer members in the A-class, B-class, E-class, and AGL6-class in A. ramifera and V. planifolia (Additional file 1, Table S19). In contrast, there were more SVP-class, ANR1-class, and AGL12-class members in A. ramifera and V. planifolia than in P. equestris (Additional file 1, Table S19).

Type I MADS-box transcription factors are involved in plant reproduction and endosperm development [16]. Here, we identified seven and six type I MADS-box genes in A. ramifera and V. planifolia, respectively (Fig. 3B and Additional file 1, Fig. S2B and Table S19). Phylogenetic analysis showed that genes in the Mβ-class were absent in A. ramifera and V. planifolia (Fig. 3B and Additional file 1, Fig. S2B).

Terpene synthase (TPS) gene family

In plants, TPS family members are responsible for the biosynthesis of terpenoids, which are involved in various physiological processes in plants such as primary metabolism and development [17]. The architecture of the TPS gene family is proposed to be modulated by natural selection for adaptation to specific ecological niches [18]. We used both terpene_synth and terpene_synth_C domains to search for TPS genes in the orchid genomes. A small TPS gene family size was observed in the two Apostasia species compared with the other orchids studied (Fig. 4). Only eight and six copies of TPS genes were found in A. shenzhenica and A. ramifera, respectively (Fig. 4 and Additional file 1, Table S20). A small TPS family size in Apostasia may indicate a loss of chemical

Fig. 2 Population size histories of seven orchid species, including P. aphrodite (yellow), D. catenatum (green), P. equestris (purple), D. officinale (dark blue), V. planifolia (pink), A. shenzhenica (light blue) and A. ramifera (red), between 10 000 and 10 million years ago. The vertical axis shows effective population size (×10^4) and the horizontal axis shows years (g = 4, μ = 0.5 × 10^-8). Generation times of orchids were assumed to be four years, and mutation rate per generation was 0.5 × 10^-8
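The generation time (g = 4 years) and mutation rate (μ = 0.5 × 10^-8 per generation) given in the Fig. 2 caption are what turn PSMC's scaled output into years and effective population sizes. A minimal sketch of that conversion follows, assuming the standard PSMC scaling, N0 = θ0 / (4μs), where s is the bin size in bp (100 by default in PSMC). The function name and the θ0, time, and λ values in the example are made up for illustration and are not taken from the paper.

```python
def psmc_scale(theta0, scaled_time, scaled_lambda,
               mu=0.5e-8, generation_years=4, bin_size=100):
    """Convert PSMC scaled output to (years, effective population size).

    theta0:        per-bin scaled mutation rate estimated by PSMC
    scaled_time:   time in units of 2*N0 generations
    scaled_lambda: population size relative to N0
    mu:            mutation rate per site per generation (value from Fig. 2)
    """
    n0 = theta0 / (4 * mu * bin_size)
    years = 2 * n0 * scaled_time * generation_years
    ne = scaled_lambda * n0
    return years, ne


# Hypothetical PSMC values, for illustration only.
years, ne = psmc_scale(theta0=0.002, scaled_time=0.5, scaled_lambda=2.0)
print(round(years), round(ne))
```

Because both axes scale inversely with μ, and time also scales with g, a different assumed mutation rate or generation time shifts the whole curve without changing its shape.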

Fig. 3 Phylogenetic analysis of MADS-box genes in A. ramifera. (A) Type II MADS-box genes. (B) Type I MADS-box genes. Neighbor-joining gene trees were constructed using MADS-box genes from A. ramifera and Arabidopsis. Genes from A. ramifera are marked in red. Different MADS-box classes are indicated. Numbers above branches are bootstrap support values of at least 50

diversity of terpenoid compounds. To resolve the phylogenetic relationship of TPS genes in orchids, a gene tree was constructed using the TPS gene sequences derived from orchids and Arabidopsis. Phylogenetic analysis showed that four TPS subfamilies were found in Apostasia (Fig. 4). In Apostasia, members of both TPS-c and TPS-f subfamilies, which encode enzymes responsible for the synthesis of 20-carbon diterpenes, were lost (Fig. 4 and Additional file 1, Table S20). In addition, fewer members of TPS-a and TPS-b subfamilies were observed in Apostasia compared with other orchids (Fig. 4 and Additional file 1, Table S20). Genes from these two subfamilies are reportedly involved in the biosynthesis of 10- and 15-carbon volatile terpenoids [19], which are the components of floral scent.

Pathogen resistance genes

Pathogen resistance-related genes are closely associated with plant fitness and adaptive evolution [20]. Here, the NB-ARC domain profile was used to search for R genes in the predicted gene models of A. ramifera and other orchids, including A. shenzhenica, V. planifolia, P. equestris, P. aphrodite, D. catenatum, and D. officinale. We identified 71 R genes in A. ramifera and 66 in A. shenzhenica, considerably fewer than that found for P. equestris (114), P. aphrodite (109), D. officinale (172), D. catenatum (182), and V. planifolia (86) (Fig. 5). Thus, the size of the R gene family varied greatly among the different Orchidaceae genera (Fig. 5).

In Apostasia, in addition to the small R gene family size, we also discovered lower copy numbers in both the NAC and WRKY gene families (Fig. 5), which are known to play important roles in plant immune response [21, 22]. We identified 55 and 64 NAC transcription factor members in A. ramifera and A. shenzhenica, respectively, markedly fewer than that found in Dendrobium, Phalaenopsis, and Vanilla (77 to 113) (Fig. 5). We also identified 56 and 50 WRKY transcription factors in A. ramifera and A. shenzhenica, respectively, again fewer than that found in other orchids (64 to 83) (Fig. 5).

Apostasia LOX1/LOX5 genes may contribute to lateral root development, an important trait for terrestrial growth

LOX1 and LOX5 are involved in the development of lateral roots in Arabidopsis, and loss of these two genes causes a significant increase in lateral root emergence [23]. Here, we searched the homologs of LOX1 and LOX5 in six published orchid genomes using protein sequences from Arabidopsis as the query, and then constructed a gene tree to elucidate the phylogenetic relationship among these genes. We detected multiple copies of LOX1/LOX5 homologs in

Fig. 4 Phylogenetic tree for TPS genes predicted in six orchid species and Arabidopsis. The legend distinguishes the TPS-a, TPS-b, TPS-c, TPS-e, TPS-f, and TPS-g subfamilies and the source species (Arabidopsis thaliana, Apostasia ramifera, Apostasia shenzhenica, Dendrobium catenatum, Phalaenopsis equestris, Phalaenopsis aphrodite, and Vanilla planifolia). Numbers above branches are bootstrap support values of at least 50
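The MADS-box, TPS, and R gene candidates were collected by searching predicted proteins with domain profiles (PF00319, terpene_synth and terpene_synth_C, NB-ARC). As a rough sketch of the bookkeeping that follows such a search, the Python below parses rows laid out like HMMER `hmmsearch --tblout` output and keeps targets under an E-value threshold. The 1e-5 threshold, the function name, the sample rows, and the choice to require both TPS domains in the same gene are assumptions for illustration; this part of the article does not state the cutoffs or the exact rule the authors applied.

```python
def parse_tblout(lines, max_evalue=1e-5):
    """Map each query profile to the set of target genes passing the cutoff.

    Expects whitespace-separated rows in the hmmsearch --tblout layout:
    target, target accession, query, query accession, full-sequence E-value, ...
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, query, evalue = fields[0], fields[2], float(fields[4])
        if evalue <= max_evalue:
            hits.setdefault(query, set()).add(target)
    return hits


# Made-up rows in tblout layout (gene IDs and scores are hypothetical).
sample = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "geneA - Terpene_synth   PF01397 1.2e-40 140.1 0.1",
    "geneA - Terpene_synth_C PF03936 3.0e-55 188.7 0.3",
    "geneB - Terpene_synth_C PF03936 4.5e-12  45.0 0.0",
    "geneC - Terpene_synth   PF01397 2.0e-02  10.2 0.0",
]
hits = parse_tblout(sample)
# Genes carrying both domains (one possible reading of "both ... domains").
both = hits.get("Terpene_synth", set()) & hits.get("Terpene_synth_C", set())
print(sorted(both))
```

Counting the members of each resulting set per species gives family sizes of the kind compared in Fig. 4 and Fig. 5.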

Fig. 5 Number of members of R genes and NAC and WRKY gene families in different orchids. These gene families are marked in blue, green, and yellow, respectively. Sizes of circles are directly proportional to number of members in gene family (size legend: 40, 80, 120, 160). Values shown in the figure:

Species                   R     NAC   WRKY
Phalaenopsis aphrodite    109   77    64
Phalaenopsis equestris    114   82    65
Dendrobium officinale     172   113   83
Dendrobium catenatum      182   91    71
Vanilla planifolia        86    92    77
Apostasia shenzhenica     66    64    50
Apostasia ramifera        71    55    56
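The contrast drawn in the text between Apostasia and the other orchids can be made concrete by averaging the counts shown in Fig. 5. The sketch below is purely descriptive: the counts are those given in the figure, while the grouping into "Apostasia" and "others", the function name, and the use of simple means are illustrative choices and not an analysis performed in the paper.

```python
from statistics import mean

# (R, NAC, WRKY) gene counts per species, as given in Fig. 5.
COUNTS = {
    "Phalaenopsis aphrodite": (109, 77, 64),
    "Phalaenopsis equestris": (114, 82, 65),
    "Dendrobium officinale": (172, 113, 83),
    "Dendrobium catenatum": (182, 91, 71),
    "Vanilla planifolia": (86, 92, 77),
    "Apostasia shenzhenica": (66, 64, 50),
    "Apostasia ramifera": (71, 55, 56),
}


def mean_counts(species):
    """Average (R, NAC, WRKY) counts over the given species names."""
    rows = [COUNTS[name] for name in species]
    return tuple(mean(column) for column in zip(*rows))


apostasia = [name for name in COUNTS if name.startswith("Apostasia")]
others = [name for name in COUNTS if not name.startswith("Apostasia")]
print(mean_counts(apostasia), mean_counts(others))
```

With only two Apostasia genomes, such averages summarize the figure but do not by themselves establish statistical significance.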

Fig. 6 LOX gene tree showing LOX1/LOX5 genes in orchids. Phylogenetic analysis was conducted using LOX gene sequences from A. ramifera, A. shenzhenica, D. catenatum, P. equestris, P. aphrodite, V. planifolia, and Arabidopsis. Branches leading to orchid LOX1/LOX5 genes are marked in green. Numbers above branches are bootstrap support values of at least 50

the epiphytic orchid genomes (Fig. 6 and Additional file 1, Table S21). However, only one homologous gene was found in A. ramifera, and the LOX1/LOX5 homologs were completely lost in A. shenzhenica (Fig. 6 and Additional file 1, Table S21). We also found one copy of the LOX1/LOX5 genes in the hemi-epiphytic orchid V. planifolia (Fig. 6 and Additional file 1, Table S21).

Discussion

With worldwide distribution, orchids are one of the largest flowering plant families and their extraordinary diversity provides an excellent opportunity to explore plant evolution. Certain evolutionary adaptations in orchids, e.g., pollinium, labella and epiphytism, are proposed to have played key roles in their adaptive evolution and radiation. However, the genetic basis underlying those innovations remains incompletely known. In the current study, we sequenced the genome of A. ramifera, a terrestrial orchid from the basal Apostasioideae lineage, and carried out comparative genomic analyses of seven orchid genomes including that of A. ramifera. Several gene families related to adaptations in orchids (e.g., MADS-box, pathogen resistance, TPS, and LOX genes) were compared among different orchid lineages.

MADS-box transcription factors

Compared with other orchids, we found smaller gene families in the B- and E-classes of type II MADS genes in Apostasia and Vanilla. Genes in these classes of type II MADS are involved in floral development [24]. Furthermore, it has been proposed that small size in these gene families may be related to the maintenance of the ancestral state in Apostasia flowers, which exhibit radial symmetry and no specialized labellum [3]. However, small gene families in the B- and E-classes of the type II MADS family were also found in V. planifolia, which has bilaterally symmetrical flowers and a specialized labellum. These results indicate that members in the B- and E-classes may not contribute to the different flower morphologies found among Apostasioideae and other orchids.

Recent research has suggested that genes from the MIKC* family are involved in pollen development [25, 26]. Here, we found a MIKC* P-subclass member in the A. ramifera genome. Furthermore, P- and S-subclasses

members of MIKC* were identified in A. shenzhenica, while P-subclass genes were lost in P. equestris [3]. It has been proposed that loss of P-subclass genes is associated with the evolution of pollinia [3]. However, both P- and S-subclass members have been identified in the genome assembly of P. aphrodite [7] and V. planifolia (Additional file 1, Fig. S2). Thus, loss of MIKC* genes in some orchids might not be relevant to the evolution of pollinia.

A lack of Mβ genes has been reported in some orchid genomes, including A. shenzhenica, P. equestris, and D. catenatum [2, 3, 5]. Here, we found that the Mβ gene was also absent in A. ramifera and V. planifolia. Zhang et al. [3] suggested that loss of Mβ-class type I MADS-box transcription factors is related to the absence of endosperm in the seeds of all orchids. However, Mβ genes have been discovered in the genome of P. aphrodite and transcriptome of Orchis italica [7, 27]. Thus, instead of Mβ genes, other genes or mechanisms may contribute to the absence of endosperm in orchid seeds.

TPS gene family

In comparison to that in Apostasia, more members in the TPS-a and TPS-b clades of the TPS gene family were found in Vanilla, Dendrobium, and Phalaenopsis. Members of these clades are involved in the biosynthesis of volatile terpenoids, which are the components of floral scent [19]. In addition, it has been proposed that expansion of TPS subfamilies may promote the emergence of novel compounds [18]. As the flowers of orchids in the Epidendroideae and Vanilloideae subfamilies are highly adapted to animal pollination via many pollination syndromes, including the development of various volatile compounds, this result may provide new insight into the genetic basis of adaptation to insect pollination in epiphytic orchids. Gene duplication and divergence are more effective ways of evolving new enzymatic functions than de novo evolution of a new gene [18]. Thus, more members of the TPS-a and TPS-b subfamilies may facilitate the emergence of novel volatile compounds, which may contribute to their adaptation to diverse animal pollinators via the production of diverse flower scents.

Lateral root development

For higher land-based plants, roots play a significant role in their successful colonization of the terrestrial environment by providing mechanical support as well as water and nutrient uptake from the soil (or air for epiphytic plants) [28]. Root architecture, i.e., the spatial organization of roots, also has a significant impact on the functional performance of the root system and is important for plant survival [28, 29]. Environmental factors, such as water and nutrient availability, contribute to the shaping of root architecture [30]. Significant root system differences have been reported between Apostasia and other orchids [3]. Among them, branch roots have been found in Apostasia but not in epiphytic orchids, such as Phalaenopsis [3]. In land plants, the formation of lateral roots plays a crucial role in root architecture, uptake, and anchoring. Following their adaptation to soil-free environments, however, various orchids have lost the ability to develop lateral roots, instead forming specialized root structures, such as spongy epidermis, to help preserve nutrients. Zhang et al. [3] reported that variation in the copy number of ANR1 subfamily MADS-box genes results in different lateral root formation between A. shenzhenica and epiphytic P. equestris and D. catenatum. However, as the development of lateral roots is a complicated process that involves intricate regulation and phytohormone interactions [31, 32], the genetic mechanisms controlling the emergence of lateral roots in orchids await further investigation. In this study, we found fewer copies of the LOX1/LOX5 homologous genes in the Apostasia species and hemi-epiphytic V. planifolia than that in the epiphytic orchids. Given the function of LOX1 and LOX5 in Arabidopsis [23], we propose that copy number variation in these genes may contribute to the differences in lateral root development between terrestrial and epiphytic orchids. In addition, according to the phylogenetic relationship of LOX genes in orchids, there are six different subclades of LOX genes in the common ancestor of orchids. The variation in copy number among the different orchid lineages may be due to the various degrees of gene retention, rather than gene duplication.

Conclusions

In this study, we performed de novo assembly and analysis of the genome of A. ramifera, a terrestrial orchid from the Apostasioideae subfamily. We revealed the population size histories of different orchid species and discovered a continuous decrease in population size from the genomes of these species over the last 100 000 years. In addition, the gene family size and subfamily architecture of TPS genes varied greatly among species from different orchid subfamilies, which may be associated with the adaptive evolution of orchids. Genes associated with pathogen resistance were significantly reduced in the genomes of Apostasia compared with that of other orchids. In Apostasia, we also found genes that were likely involved in the regulation of lateral root development, which is an important trait for terrestrial growth. The A. ramifera genome sequence reported here should be an important resource for further investigations on orchid biology. Comparative genomics analysis of A. ramifera and other orchids should provide new insights into the adaptive evolution of these species.

Methods

Sample preparation and sequencing
The A. ramifera samples were collected from Jianfeng Mountain, Hainan Province, China. No permission was required to collect these samples. The formal identification of plant material was conducted by Prof. Zhongjian Liu. A voucher specimen of the material was deposited at the National Orchid Conservation Center of China under deposition number Liu.JZ6475. For genome sequencing, we collected fresh leaves from A. ramifera. Extraction of genomic DNA was carried out using the modified cetyltrimethylammonium bromide protocol [33]. Five DNA libraries with different insert sizes were constructed using an Illumina library construction kit (NEB DNA Library Rapid Prep Kit for Illumina) and then sequenced using the Illumina HiSeq 2000 platform. After filtering the raw reads according to sequencing quality and adaptor contamination, a total of 57.4 Gb of clean data were retained for assembling the Apostasia genome.

Genome assembly
The estimated genome size of A. ramifera was 332.24 Mb according to k-mer frequency distribution. Only one peak was observed in the k-mer distribution, indicating high homozygosity of the Apostasia genome. For genome assembly, SOAPdenovo2 [34] was used for contig construction and scaffolding, and GapCloser was used for extending the length of the final contigs. In total, 57.4 Gb of clean reads derived from the DNA libraries with five insert sizes (Additional file 1, Table S1) were used by SOAPdenovo2 assembler and GapCloser for de novo genome assembly.

Gene and non-coding RNA prediction
Because previously published work on the V. planifolia draft genome [8] did not include gene prediction, we carried out protein-coding gene prediction for the A. ramifera and V. planifolia genomes. Firstly, we used AUGUSTUS [40] and GlimmerHMM [41] to generate the de novo predicted gene sets for our assembly and the V. planifolia genome (BioProject: PRJNA507095). Protein sets derived from five plant genomes, including Arabidopsis thaliana, Phalaenopsis equestris, Oryza sativa, Sorghum bicolor, and Zea mays, were then applied to search against the Apostasia and Vanilla genomes using TBLASTN with an E-value cutoff of 1e-5 and minimum query coverage of 25 %. GeneWise [42] was used to annotate the gene structures. The RNA-seq datasets (SRR1509356, SRR1509370, and SRR1509674) for V. planifolia were downloaded from NCBI SRA, and were de novo assembled by Trinity software. Vanilla transcripts were applied to annotate the V. planifolia genome using the PASA program. The annotation results derived from different methods were then integrated to generate integrated protein-coding gene sets for A. ramifera and V. planifolia with the MAKER [43] program.

Non-coding RNAs do not translate into protein sequences but exert significant roles in cellular metabolism, and include microRNAs (miRNAs), transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), and small nuclear RNAs (snRNAs). Here, we applied previously described methods to search for non-coding RNAs in the Apostasia genome [3]. The miRNA- and snRNA-coding genes were predicted using INFERNAL [44] and the tRNA-coding genes were identified using tRNAscan-SE [45]. Genes encoding rRNAs were annotated by searching the genome with the rRNA sequences of Arabidopsis.
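As an illustration of the homology-search filter described under Gene and non-coding RNA prediction (TBLASTN hits kept at an E-value cutoff of 1e-5 and a minimum query coverage of 25 %), the following Python sketch applies both thresholds to a list of hits. The `Hit` record, the `filter_hits` function, the inclusive comparisons, and the per-hit definition of query coverage are assumptions made for this example; they are not taken from the authors' pipeline, which may compute coverage differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One protein-to-genome alignment (hypothetical record layout)."""
    query_id: str
    subject_id: str
    q_start: int   # 1-based start on the query protein, inclusive
    q_end: int     # 1-based end on the query protein, inclusive
    q_len: int     # full length of the query protein
    evalue: float


def query_coverage(hit: Hit) -> float:
    """Fraction of the query protein covered by this single alignment."""
    return (hit.q_end - hit.q_start + 1) / hit.q_len


def filter_hits(hits, max_evalue=1e-5, min_coverage=0.25):
    """Keep hits that pass both the E-value and the query coverage thresholds."""
    return [
        h for h in hits
        if h.evalue <= max_evalue and query_coverage(h) >= min_coverage
    ]


# Made-up hits, for illustration only.
example_hits = [
    Hit("protA", "scaffold1", 1, 100, 200, 1e-10),   # coverage 0.50, kept
    Hit("protB", "scaffold1", 1, 40, 200, 1e-10),    # coverage 0.20, dropped
    Hit("protC", "scaffold2", 1, 150, 200, 1e-3),    # E-value too high, dropped
]
kept = filter_hits(example_hits)
print([h.query_id for h in kept])
```

Hits that pass such a filter would then be handed to a structure-aware aligner (GeneWise in this study) to define exon boundaries.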

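The k-mer-based genome size estimate mentioned under Genome assembly is commonly obtained by dividing the total number of k-mers by the depth at the main peak of the k-mer frequency histogram. The Python sketch below shows that calculation on a toy histogram. The function name, the `min_depth` error cutoff, and all counts are hypothetical and do not reproduce the A. ramifera data or the authors' exact procedure.

```python
def estimate_genome_size(histogram, min_depth=3):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored
    (an assumption of this sketch).
    Returns (estimated_size, peak_depth).
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # Main peak: the depth shared by the largest number of distinct k-mers.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences among the retained depths.
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth, peak_depth


# Toy histogram with a single peak at depth 20 (made-up counts).
toy_histogram = {1: 500, 2: 100, 18: 10, 19: 30, 20: 60, 21: 30, 22: 10}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.0f} bp")
```

A single peak, as reported for A. ramifera, corresponds to one dominant depth; a strongly heterozygous genome would typically show an additional peak at about half that depth.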
Repeat annotation
Repeat sequences consist of tandem repeats, such as small and micro-satellite DNA, and interspersed repeats (also known as transposable elements, TEs). In the A. ramifera genome, tandem repeat sequences were identified by TRF software [35]. Identification of TEs was conducted by homology searches of the RepBase database [36] and de novo prediction. Briefly, RepeatMasker [37] and RepeatProteinMask [38] were applied to identify TEs in the Apostasia genome with a RepBase-derived library of known repeat elements. For de novo prediction, we used RepeatModeler and LTR-FINDER [39] to construct a de novo repetitive element library for the A. ramifera genome. RepeatMasker was then applied to search the genome for TEs with the constructed database. Finally, these results were combined, and the redundant sequences were removed to generate a complete repeat annotation.

Functional annotation
Functional analysis of the predicted genes in the Apostasia genomes was performed by searching their protein-coding regions against sequences derived from publicly available databases, including Gene Ontology (GO) [46, 47], Kyoto Encyclopedia of Genes and Genomes (KEGG) [48], SwissProt [49], TrEMBL [49], non-redundant (nr) protein database, and InterProScan [50].

Gene family identification
Gene family clustering was conducted using OrthoFinder [51] with complete protein sets from seven species, including P. equestris, P. aphrodite, D. officinale, D. catenatum, A. shenzhenica, A. officinalis, and O. sativa, as well as the predicted protein sequences from A. ramifera. To limit the disturbance of alternative splicing variants on gene family clustering, the longest transcript of each gene was selected for analysis. Gene families in which the number of genes from Apostasia (including A. ramifera and A. shenzhenica) was 1.5 times

higher than that from other orchids were considered expanded in Apostasia.

Phylogenetic analysis
To build a high-confidence phylogenetic tree, we constructed a multi-species protein set containing protein sequences from A. ramifera and 15 other species, including 11 monocots (Spirodela polyrhiza, D. catenatum, P. equestris, A. shenzhenica, A. officinalis, Ananas comosus, Musa acuminata, Phoenix dactylifera, Brachypodium distachyon, S. bicolor, and O. sativa), three eudicots (Vitis vinifera, A. thaliana, and Populus trichocarpa), and the outgroup Amborella trichopoda. Protein sequences that contained fewer than 50 amino acids were removed from the constructed dataset. The pairwise similarities between protein sequences were calculated through all-against-all BLASTP with the following cutoff criteria: (i) E-value < 1e-5, (ii) query coverage > 30 %, and (iii) alignment identity > 30 %. The results were then entered into OrthoMCL [52] (v2.0.9) to construct orthologous groups. In total, 381 single-copy gene families shared by all 16 species were applied to construct a species tree using MrBayes [53] with the GTR + invgamma model. PAML MCMCTree [54] was used to estimate the species divergence times with the following time calibrations: (i) O. sativa and B. distachyon divergence time (40–54 million years ago) [55], (ii) P. trichocarpa and A. thaliana divergence time (100–120 million years ago) [56], (iii) lower boundary of monocot and eudicot divergence time (140 million years ago) [57], and (iv) upper boundary for angiosperm divergence time (200 million years ago) [58]. Gene family expansions or contractions were identified using CAFÉ [12].

Heterozygosity analysis and estimation of effective population size
Identification of heterozygous loci was performed via a previously described method [59]. Briefly, clean reads were aligned to the genome sequence of A. ramifera using the BWA tool [60]. Duplicate reads were then removed by Picard. SAMtools [61] was used for calling heterozygous loci, and bcftools was used for generating consensus sequences. The effective population sizes of the orchid species were estimated using the PSMC program [13]. The parameters for PSMC analysis were set to default except for -g 4 and -u 0.5 × 10⁻⁸.

Identification of MADS-box, TPS, NAC, WRKY, R, and LOX genes
The hidden Markov model profiles [62] were applied to search for MADS-box (Pfam Accession: PF00319), TPS (Pfam Accession: PF01397 and PF03936), NAC (Pfam Accession: PF02365), WRKY (Pfam Accession: PF03106), and R (Pfam Accession: PF00931) genes using HMMER [63] (v3.2.1). MADS-box genes in A. thaliana reported in [3] were used to reconstruct gene trees with the MADS-box genes identified in A. ramifera and V. planifolia. EvolView [64] was used to visualize the number of members in the NAC, WRKY and R gene families for the selected species. For the TPS genes, the protein sequences that possessed both Pfam domains and contained more than 500 amino acids were considered as functional genes and used for further analysis. To identify LOX genes, protein sequences of the LOX gene family in A. thaliana (Gene ID: AT1G55020, AT1G72520, AT1G67560, AT1G17420, AT3G22400, and AT3G45140) were used to search for homologous genes in orchids. The identified protein sequences of each gene family were aligned using MUSCLE [65] (v3.8.31) with default settings. MEGA7 [66] was then used to construct an unrooted neighbor-joining tree for each gene family with 500 bootstrap replicates.

Abbreviations
TEs: Transposable elements; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes; PSMC: Pairwise sequential Markov coalescent; TPS: Terpene synthase.

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s12864-021-07852-3.

Additional file 1.

Acknowledgements
Not applicable.

Authors’ contributions
G.Z., H.H., Z.L., and J.C. designed and managed the project. J.C. and W.Z. wrote and revised the manuscript. W.Z., G.Z., P.Z., and Y.Z. contributed to genome sequencing and data analysis. The final manuscript was read and approved by all authors.

Funding
This study was funded by the Science and Technology Development Fund Macau SAR (File no. 031/2017/A1) to H.H., Talents Team Construction Fund of Northwestern Polytechnical University (NWPU) to J.C., Fundamental Research Funds for the Central Universities (3102019JC007) to J.C., and National Thousand Youth Talents Plan to J.C. The funders played no role in the study.

Availability of data and materials
Raw data and the genome assembly from this study were deposited in NCBI under the BioProject ID: PRJNA635894. The datasets supporting the conclusions of this article are included within the article and its additional files.

Declarations

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Author details
1State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, 999078 Macau, China. 2Key Laboratory of National Forestry and Grassland Administration for Orchid

Conservation and Utilization, 518114 Shenzhen, China. 3Shenzhen Key Laboratory for Orchid Conservation and Utilization, 518114 Shenzhen, China. 4National Orchid Conservation Center of China and Orchid Conservation and Research Center of Shenzhen, 518114 Shenzhen, China. 5Key Laboratory of NFGA for Orchid Conservation and Utilization, Fujian Agriculture and Forestry University, 350002 Fuzhou, China. 6School of Ecology and Environment, Northwestern Polytechnical University, 710129 Xi’an, China.

Received: 19 June 2020 Accepted: 23 June 2021

References
1. Christenhusz MJM, Byng JW. The number of known plants species in the world and its annual increase. Phytotaxa. 2016;261(3).
2. Cai J, Liu X, Vanneste K, Proost S, Tsai WC, Liu KW, et al. The genome sequence of the orchid Phalaenopsis equestris. Nat Genet. 2015;47(1):65–72.
3. Zhang GQ, Liu KW, Li Z, Lohaus R, Hsiao YY, Niu SC, et al. The Apostasia genome and the evolution of orchids. Nature. 2017;549(7672):379–83.
4. Wei S, Shih C-C, Chen N-H, Tung S-J, editors. Value chain dynamics in the Taiwan orchid industry. I International Orchid Symposium 878; 2010.
5. Zhang GQ, Xu Q, Bian C, Tsai WC, Yeh CM, Liu KW, et al. The Dendrobium catenatum Lindl. genome sequence provides insights into polysaccharide synthase, floral development and adaptive evolution. Sci Rep. 2016;6:19029.
6. Yan L, Wang X, Liu H, Tian Y, Lian J, Yang R, et al. The Genome of Dendrobium officinale Illuminates the Biology of the Important Traditional Chinese Orchid Herb. Mol Plant. 2015;8(6):922–34.
7. Chao YT, Chen WC, Chen CY, Ho HY, Yeh CH, Kuo YT, et al. Chromosome-level assembly, genetic and physical mapping of Phalaenopsis aphrodite genome provides new insights into species adaptation and resources for orchid breeding. Plant Biotechnol J. 2018;16(12):2027–41.
8. Hu Y, Resende MF, Bombarely A, Brym M, Bassil E, Chambers AH. Genomics-based diversity analysis of Vanilla species using a Vanilla planifolia draft genome and Genotyping-By-Sequencing. Scientific reports. 2019;9(1):1–16.
9. Kocyan A, Qiu Y-L, Endress P, Conti E. A phylogenetic analysis of Apostasioideae (Orchidaceae) based on ITS, trnL-F and matK sequences. Plant Syst Evol. 2004;247(3–4):203–13.
10. Kocyan A, Endress PK. Floral structure and development of Apostasia and Neuwiedia (Apostasioideae) and their relationships to other Orchidaceae. Int J Plant Sci. 2001;162(4):847–67.
11. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210–2.
12. De Bie T, Cristianini N, Demuth JP, Hahn MW. CAFE: a computational tool for the study of gene family evolution. Bioinformatics. 2006;22(10):1269–71.
13. Li H, Durbin R. Inference of human population history from individual whole-genome sequences. Nature. 2011;475(7357):493–6.
14. Gillman MP, Dodd M. The variability of orchid population size. Botanical journal of the Linnean Society. 1998;126(1–2):65–74.
15. Alexandersson R, Ågren J. Population size, pollinator visitation and fruit production in the deceptive orchid Calypso bulbosa. Oecologia. 1996;107(4):533–40.
16. Masiero S, Colombo L, Grini PE, Schnittger A, Kater MM. The emerging importance of type I MADS box transcription factors for plant reproduction. Plant Cell. 2011;23(3):865–72.
17. Pazouki L, Niinemets Ü. Multi-substrate terpene synthases: their occurrence and physiological significance. Front Plant Sci. 2016;7:1019.
18. Karunanithi PS, Zerbe P. Terpene synthases as metabolic gatekeepers in the evolution of plant terpenoid chemical diversity. Frontiers in plant science. 2019;10:1166.
19. Chen F, Tholl D, Bohlmann J, Pichersky E. The family of terpene synthases in plants: a mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom. Plant J. 2011;66(1):212–29.
20. Tian D, Traw M, Chen J, Kreitman M, Bergelson J. Fitness costs of R-gene-mediated resistance in Arabidopsis thaliana. Nature. 2003;423(6935):74–7.
21. Yuan X, Wang H, Cai J, Li D, Song F. NAC transcription factors in plant immunity. Phytopathology Research. 2019;1(1):1–13.
22. Pandey SP, Somssich IE. The role of WRKY transcription factors in plant immunity. Plant physiology. 2009;150(4):1648–55.
23. Vellosillo T, Martínez M, López MA, Vicente J, Cascón T, Dolan L, et al. Oxylipins produced by the 9-lipoxygenase pathway in Arabidopsis regulate lateral root development and defense responses through a specific signaling cascade. Plant Cell. 2007;19(3):831–46.
24. Tsai W-C, Chen H-H. The orchid MADS-box genes controlling floral morphogenesis. The Scientific World Journal. 2006;6:1933–44.
25. Liu Y, Cui S, Wu F, Yan S, Lin X, Du X, et al. Functional conservation of MIKC*-Type MADS box genes in Arabidopsis and rice pollen maturation. Plant Cell. 2013;25(4):1288–303.
26. Kwantes M, Liebsch D, Verelst W. How MIKC* MADS-box genes originated and evidence for their conserved function throughout the evolution of gametophytes. Molecular biology evolution. 2012;29(1):293–302.
27. Valoroso MC, Censullo MC, Aceto S. The MADS-box genes expressed in the inflorescence of Orchis italica (Orchidaceae). PloS one. 2019;14(3).
28. Smith S, De Smet I. Root system architecture: insights from Arabidopsis and cereal crops. The Royal Society; 2012.
29. Li X, Zeng R, Liao H. Improving crop nutrient efficiency through root architecture modifications. Journal of integrative plant biology. 2016;58(3):193–202.
30. Kiba T, Krapp A. Plant nitrogen acquisition under low availability: regulation of uptake and root architecture. Plant Cell Physiol. 2016;57(4):707–14.
31. Du Y, Scheres B. Lateral root formation and the multiple roles of auxin. J Exp Bot. 2018;69(2):155–67.
32. Fukaki H, Tasaka M. Hormone interactions during lateral root formation. Plant molecular biology. 2009;69(4):437.
33. Murray M, Thompson WF. Rapid isolation of high molecular weight plant DNA. Nucleic acids research. 1980;8(19):4321–6.
34. Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 2012;1(1):2047–217X-1-18.
35. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research. 1999;27(2):573–80.
36. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005;110(1–4):462–7.
37. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current protocols in bioinformatics. 2009;25(1):4.10.1–4.4.
38. Tempel S. Using and understanding RepeatMasker. Mobile Genetic Elements: Springer; 2012. pp. 29–51.
39. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic acids research. 2007;35(suppl_2):W265-W8.
40. Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic acids research. 2006;34(suppl_2):W435-W9.
41. Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20(16):2878–9.
42. Birney E, Clamp M, Durbin R. GeneWise and genomewise. Genome research. 2004;14(5):988–95.
43. Holt C, Yandell M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinform. 2011;12(1):491.
44. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25(10):1335–7.
45. Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic acids research. 1997;25(5):955–64.
46. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.
47. Consortium GO. The gene ontology resource: 20 years and still GOing strong. Nucleic acids research. 2019;47(D1):D330-D8.
48. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research. 2000;28(1):27–30.
49. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic acids research. 2000;28(1):45–8.
50. Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, et al. InterPro: the integrative protein signature database. Nucleic acids research. 2009;37(suppl_1):D211-D5.
51. Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome biology. 2015;16(1):157.
52. Li L, Stoeckert CJ, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome research. 2003;13(9):2178–89.

53. Huelsenbeck JP, Ronquist F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001;17(8):754–5.
54. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Molecular biology evolution. 2007;24(8):1586–91.
55. Initiative IB. Genome sequencing and analysis of the model grass Brachypodium distachyon. Nature. 2010;463(7282):763.
56. Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). science. 2006;313(5793):1596–604.
57. Chaw S-M, Chang C-C, Chen H-L, Li W-H. Dating the monocot–dicot divergence and the origin of core eudicots using whole chloroplast genomes. Journal of molecular evolution. 2004;58(4):424–41.
58. Magallón S, Hilu KW, Quandt D. Land plant evolutionary timeline: gene effects are secondary to fossil constraints in relaxed clock estimation of age and substitution rates. Am J Bot. 2013;100(3):556–73.
59. Chaw S-M, Liu Y-C, Wu Y-W, Wang H-Y, Lin C-YI, Wu C-S, et al. Stout camphor tree genome fills gaps in understanding of flowering plant genome evolution. Nature plants. 2019;5(1):63–73.
60. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. bioinformatics. 2009;25(14):1754–60.
61. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.
62. El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, et al. The Pfam protein families database in 2019. Nucleic acids research. 2019;47(D1):D427-D32.
63. Eddy SR. Accelerated profile HMM searches. PLoS computational biology. 2011;7(10).
64. Subramanian B, Gao S, Lercher MJ, Hu S, Chen W-H. Evolview v3: a webserver for visualization, annotation, and management of phylogenetic trees. Nucleic acids research. 2019;47(W1):W270-W5.
65. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research. 2004;32(5):1792–7.
66. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Molecular biology evolution. 2016;33(7):1870–4.

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.