bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Sequencing of notoginseng genome reveals genes involved in

disease resistance and ginsenoside biosynthesis

Guangyi Fan1,2,*, Yuanyuan Fu2,3,*, Binrui Yang1,*, Minghua Liu4,*, He Zhang2, Xinming Liang2,

Chengcheng Shi2, Kailong Ma2, Jiahao Wang2, Weiqing Liu2, Libin Shao2, Chen Huang1, Min

Guo1, Jing Cai1, Andrew KC Wong5, Cheuk-Wing Li1, Dennis Zhuang5, Ke-Ji Chen6, Wei-Hong

Cong6, Xiao Sun3, Wenbin Chen2, Xun Xu2, Stephen Kwok-Wing Tsui4,7,†, Xin Liu2,†, Simon

Ming-Yuen Lee1,†.

1State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical

Sciences, University of Macau, Macao, China.

2BGI-Qingdao, BGI-Shenzhen, Qingdao 266000, China.

3State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering,

Southeast University, Nanjing 210096, China

4School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China

5System Design Engineering, University of Waterloo, Ontario, Canada

6Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, 100091, China

7Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Hong Kong, China

*These authors contributed equally to this work.

†Correspondence authors: Simon Ming-Yuen Lee ([email protected]), Wenbin Chen

([email protected]) and Stephen Kwok-Wing Tsui ([email protected]).

Abstract

Panax notoginseng is a traditional Chinese herb with high medicinal and economic value. There has been considerable research on the pharmacological activities of ginsenosides contained in Panax spp.; however, very little is known about the ginsenoside biosynthetic pathway. We reported the first de novo genome of 2.36 Gb of sequences from P. notoginseng with 35,451 protein-encoding genes. Compared to other , we found notable gene family contraction of disease-resistance genes in P. notoginseng, but notable expansion for several ATP-binding cassette (ABC) bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

transporter subfamilies, such as the Gpdr subfamily, indicating that ABCs might be an additional mechanism for the to cope with biotic stress. Combining eight transcriptomes of roots and aerial parts, we identified several key genes, their transcription factor binding sites and all their family members involved in the synthesis pathway of ginsenosides in P. notoginseng, including dammarenediol synthase, CYP716 and UGT71. The complete genome analysis of P. notoginseng, the first in genus Panax, will serve as an important reference sequence for improving breeding and cultivation of this important nutraceutical and medicinal but vulnerable plant species. bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Introduction

Genus Panax in the , contains some medicinally and economically important species including P. g i n s e n g , P. quinquefolius and P. notoginseng (Sanqi in Chinese)1. Because P. notoginseng is an important Chinese traditional medicine, two draft genomes of P. notoginseng were published in 2017. Chen etc.’s genome assembly was about 2.39 Gb with a contig N50 of 16 kb and scaffold N50 of 96 kb and Zhang etc.’s was a ~1.85 Gb assembly with scaffold N50 of 158 kb and contig N50 of 13.2 kb2,3. There was a notable difference of genome assembly size (~540 Mb) between two genomes. In general, the estimation of genome size using flow cytometry analysis is considered as a golden standard approach and the estimated result of diploid P. notoginseng was about 2.31 Gb3. Besides, these two papers also annotated the repetitive sequences and protein-coding genes. Chen etc. annotated ~75.94% repetitive sequences (~1.71 Gb) and 36,790 genes, and Zhang etc. annotated 61.31% repetitive sequences (~1.13 Gb) and 34,369 genes in the assembly, respectively. Zhang etc.’s genome has a longer scaffold N50 size, but shorter contig N50 size, total length and proportion of repetitive sequences comparing to Chen etc.’s genome, suggesting Zhang etc.’s assembly might miss a large amount of repetitive sequences. Thus, due to the high proportion of repetitive sequences in this species, it is a big challenge to obtain a good assembly for this complex genome. Panax notoginseng is susceptible to a wide range of pathogens and identification of the genes conferring disease resistance has been a major focus of research4, which greatly reduces its economic benefits. The main class of disease-resistance genes (R-genes) consist of a NBS, a C-terminal LRR and a putative coiled coil domain (CC) at the N-terminus or a N-terminal domain with homology to the mammalian toll interleukin 1 receptor (TIR) domain5. Generally, the LRR domain is often involved in protein–protein interactions, with an important role in recognition specificity as well as ligand binding6. Besides, ATP-binding cassette (ABC) proteins have also been reported to play a crucial role in the regulation of resistance processes in plants7, most of which modulate the activity of heterologous channels or have intrinsic channel bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

activity. The reason underlying the poor resistance of P. notoginseng is still unclear. Saponin constituents, also known as ginsenosides, are the major active ingredients in genus Panax8,9 and have diverse pharmacological activities, such as hepatoprotection, renoprotection, estrogen-like activities and protection against cerebro-cardiovascular ischemia and dyslipidaemia in experimental models10-14. Xuesaitong (in Chinese) is a prescription botanical drug, manufactured from saponins in the root of P. notoginseng10, used for the prevention and treatment of cardiovascular diseases, and is among the top best-selling prescribed Chinese medicines in China15,16. Compound Danshen Dripping Pill, a prescription botanical drug for cardiovascular disease17, comprises P. notoginseng, Salvia miltiorrhiza and synthetic borneol, and has successfully completed Phase II and is undergoing Phase III clinical trials in the United States. Although different ginseng species have been one of the most important traded herbs for human health around the world, the industry is facing elevating challenges, such as increasing disease vulnerability of the cultivated plant and an as yet undetermined disease preventing continuous cultivation on the same land, leading to a trend of decreasing production18. Plant metabolites play very important roles in plant defense mechanisms against stress and disease, and some have important nutritional value for human consumption. For instance, capsaicinoids and caffeine present in pepper and coffee, respectively, have important health benefits. The recently completed whole-genome sequencing of coffee and pepper provide new clues in understanding evolution and transcriptional control of the biosynthesis pathway of these metabolites19,20. Ginsenosides, the oligosaccharide glycosides of a series of dammarane- or oleanane-type triterpenoid glycosides9, are a group of secondary metabolites of isoprenoidal natural products,

specifically present in genus Panax. Based on the structure differentiation of the sapogenins (aglycones of saponin), dammarane-type ginsenosides can be classified into two subtypes: protopanaxadiol (PPD) and protopanaxatriol (PPT) types9. Up to now, over 100 structurally diversified ginsenosides have been isolated. Interestingly, the two types (PPD and PPT) of ginsenosides have been reported to exhibit opposing biological activities; for instance, pro-angiogenesis and anti-angiogenesis21,22. The bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

whole plant of P. notoginseng contains both PPD- and PPT-type ginsenosides but the aerial parts (e.g. leaf and flower) contain a higher abundance of PPD compared to roots23,24. Herein, we characterize the genome of P. notoginseng, the population of which has a relatively uniform genetic background. This was aimed to determine the genetic causes of weakened resistance of P. notoginseng as well as to understand evolution and regulation of the triterpenoid saponin biosynthesis pathway leading to the production of specialized ginsenosides in genus Panax.

Results

Genome assembly and annotation We de novo sequenced and assembled the P. notoginseng genome from Wenshan City of China using ~256.07 Gb data (~104.09-fold) generated by Illumina HiSeq2000 and ~13 Gb (~6-fold, average read length of ~9kb) SMRT sequence data generated by Pacbio RSII sequencing system (Supplementary Table 1). The final genome assembly spanned about 2.36 Gb (~95.93% of the estimated genome size) with scaffold N50 of 72.37 kb and contig N50 of 16.42 kb (Supplementary Table 2, 3 and Supplementary Fig. 1). To evaluate our assembly, we firstly checked the links within scaffolds based on the pair-end relationships of sequencing reads and the result revealed that pair-end information supported the links of scaffolds well (Supplementary Fig. 2). Secondly, we obtained the unmapped sequencing reads against our genome assembly and re-assembled them into ~359 Mb sequences which contained ~95.05% repetitive sequences (Supplementary Table 4). Using KAT sect tool25, we also calculated the sequencing depth distribution of flanking sequences of gap regions, we observed that ~71% of sequences nearby junction regions have higher sequencing depth (Supplementary Fig. 3), indicating that majority of the gap regions were repetitive sequences. Besides, leaves and roots of P. notoginseng were collected from 1-, 2- and 3-year-old P. notoginseng and flowers collected from 2- and 3-year-old P. notoginseng. An average of ~6.06 Gb raw data were obtained from each sample sequenced using Illumina HiSeq 2000 (Supplementary Table 5). More than bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

95.42% of transcripts (coverage ratio >= 90%) could be unambiguously mapped into assembly sequences revealing the good integrity of genome assembly (Supplementary Table 6). Of the genome, 72.45% of repetitive sequences ratio was estimated by Jellyfish26 but only 1.24 Gb (51.03%) was found to be repetitive sequences, which is more than that of carrot genome (~46%)27, with long terminal repeats the most abundant (~95.08% of transposable elements) (Supplementary Table 7). We totally predicted 35,451 protein-encoding genes in the P. notoginseng genome by integrating the evidences of ab initio prediction, homologous alignment and transcripts (Supplementary Table 8). Assessing the P. notoginseng genome assembly and annotation completeness with single-copy orthologs approach (BUSCO)28 showed that 895 (93%) complete single-copy genes containing 246 (25%) complete duplicated genes were validated, similar to that of cotton29 and Dendrobium officinale30. In combination with the RNA-seq mapping results above, it was unambiguous that our gene set was available for further analysis. Genome evolution of Panax species Panax notoginseng is the first sequenced genome of genus Panax, thus we compared its genome to other related species to reveal genome evolution of Panax species. To analyze P. notoginseng evolution, we first constructed the phylogenetic tree using published closely-related species (Supplementary Fig. 4). We estimated the species divergence time between P. notoginseng and D. carota to be about 71.9 million years ago (Supplementary Fig. 4). Compared to closely related species (D. carota, S. tuberosum, S. lycopersicum and C. annuum), we found 1,144 unique gene families (containing 2,714 genes) with no orthologs in other species (Supplementary Fig. 5), which were possibly related to specific features of P. notoginseng. Thus, we performed functional enrichment of these genes to find several interesting Gene Ontology (GO) terms likely related to saponin biosynthesis, such as UDP-glucosyltransferase (UGT) activity (P = 0.003, Supplementary Fig. 6). To determine features and specific functions in P. notoginseng, we identified a total of 1,520 expanded gene families (Supplementary Fig. 4) and further performed Kyoto Encyclopedia of Genes and Genomes (KEGG) and GO functional enrichment bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

analyses for these genes. We found several significant gene families related to saponin biosynthesis, such as sesquiterpenoid and triterpenoid biosynthesis (P = 0.006, Supplementary Table 9). Resistance genes Most cloned R-genes in the plant encode nucleotide-binding site and leucine-rich-repeat (NBS-LRR) domains. In our P. notoginseng genome assembly, 129 NBS-LRR-encoding genes were detected, which was notably fewer than Arabidopsis thaliana (183), D. carota (170), S. tuberosum (441), S. lycopersicum (282) and C. annuum (778), and similar to that of C. sativus (89)31 (Supplementary Table 10). Considering R genes were known to be small and in repetitive areas and our assembly with much gaps in the repetitive regions, the total number of R gene of P. notoginseng would be likely to underestimated. However, the number of genes in the TIR_NBA_LRR subfamily was still relatively expanded according to the phylogenetic analysis using P. notoginseng, tomato and carrot, which possibly be related to the resistance character of P. notoginseng (Fig. 1a and 1b). ABC transporter gene family We identified a total of 153 ATP-binding cassette (ABC) genes in P. notoginseng, which were classified into nine subfamilies (Supplementary Table 11 and Supplementary Fig. 7). Generally, compared with A. thaliana (12), Daucus carota

(5), Solanum lycopersicum (10), S. tuberosum (13) and Capsicum annuum (10), there were notably fewer members of subfamily A in P. notoginseng (3); whereas, there were notably more of subfamily B in P. notoginseng (28, 28, 33, 38, 34 and 47 for subfamily B, respectively) (Supplementary Table 11). In detail, two genes were notably contracted for subfamily A (Supplementary Fig. 8) – a subfamily reportedly involved in cellular lipid transport32. For subfamily B, we found a specific expanded clade in P. notoginseng containing 11 genes, three duplication expansions and one contraction (Supplementary Fig. 7 and Supplementary Fig. 9). In particular, duplication expansions were Pno037415.1 and Pno028859.1 (AT2G36910.1 homologs); Pno009402.1 and Pno037963.1 (AT3G28860.1 homologs); and Pno034123.1 and Pno018232.1 (AT5G39040.1 bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

homologs). One contraction was Pno003332.1 (AT4G28620.1, AT4G28630.1 and AT5G58270.1 homolog). Interestingly, the expanded genes have been reported as involved in aluminum resistance and auxin transport in stems and roots, respectively33,34. The contracted gene was found to be involved in iron export35. In the Gpdr subfamily, we also found a notable expansion (P. notoginseng: 10 vs. A. thaliana: 1, AT1G15520.1) (Fig. 1c). The homolog of AT1G15520.1 is known to be related to pleiotropic drug resistance and abscisic acid (ABA) uptake transport36,37. Key genes involved in ginsenoside biosynthesis of P. notoginseng Dammarenediol synthase Ginsenosides are committed to be synthesized from dammarenediol-II after hydroxylation via cytochrome P450 (CYP450)38 and then glycosylation by glycosyltransferase (GT)39. Dammarenediol-II is synthesized from farnesyl-PP, which is synthesized via the terpenoid backbone biosynthesis pathway. There are three enzymes involved from farnesyl-PP to dammarenediol-II: farnesyl-diphosphate farnesyltransferase (FDFT1, K00801), squalene monooxygenase (SQLE, K00511) and dammarenediol synthase (DDS, K15817) (Fig. 2a). In total, we identified three SQLE, two FDFT1 and one DDS in P. notoginseng. Combining with the results of real-time PCR, DDS (Pno034035.1) was found with extensive higher expression in the root than aerial parts and in 3-year old plant both root and aerial parts were found with higher expression of DDS (Pno034035.1). We also found that expressions of one FDFT1 (Pno029444.1) and one SQLE (Pno022472.1) were higher in roots compared with flowers and leaves, and one SQLE (Pno009162) gene showed the opposite pattern according to RNA-seq, while the real-time PCR results suggested an increasing expression year after year of all these three genes (Fig. 2b). Phylogenetic analysis showed that DDS of P. notoginseng and P. ginseng had a very high (98.96%) amino acid sequence identity, differing in only eight amino acids, and with fewer genes than other species (Supplementary Fig. 10). We further found there were three amino-acid residue insertions (L194, A195 and E196) in the DDS sequences of P. notoginseng and P. ginseng, which were located in the cyclase-N domain (Fig. 2c).

40 We used the SWISS-MODEL to conduct structural modeling of DDS protein using bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

the human oxidosqualene cyclase (OSC) in complex with Ro 48-8077141(SMTL: 1w6j.1) as the template. When the locations of the amino acids were mapped to the protein 3D structure with amino-acid identity of 40%, two residue insertions (L194 and A195) were located between two DNA helices, which are the presumed substrate-binding pocket and the catalytic residues (Fig. 2d). These three amino acids of DDS sequences of P. notoginseng probably played an important role in the Dammarenediol-II biosynthesis. CYP450s CYP450s and GTs have been demonstrated to be involved in hydroxylation or glycosylation of aglycones for triterpene saponin biosynthesis. We identified a total of 268 CYP450 genes in P. notoginseng using 243 CYP450 homolog genes of A. thaliana, which is consistent with the report that there are around 300 CYP450 genes in genomes of flowering plants42. To compare with other species, we also identified CYP450 genes of four other genomes—D. carota (325), S. tuberosum (465), S. lycopersicum (252) and C. annuum (256)—using the same parameters as for P. notoginseng (Supplementary Table 12 and Supplementary Fig. 11). We further classified the subfamilies of CYP450 genes for each species based on the category of CYP450 in A. thaliana and nine CYP716 genes were identified in P. notoginseng (Fig. 3a). CYP716A47 was reported to produce PPD43 and CYP716A53v2 was reported to be involved in the synthesis of PPT in P. g i n s e n g (Fig. 3b). Moreover, considering the role of different types of notoginsenoside in different parts/tissues of P. notoginseng— that is, PPT was relatively higher in roots while PPD was higher in aerial parts (leaves and flowers) (Fig. 3c)— we sequenced the transcriptomes of leaves, roots and flowers collected from different ages of P. notoginseng. We found Pno012347.1 (CYP716A47 homolog), Pno002960.1 (CYP716A53v2 homolog) and Pno011760.1 were always more highly expressed in roots than in leaves and flowers for all ages; while Pno021283.1 (CYP716A47 homolog) was the opposite (Fig. 3d). Thus, Pno012347.1 and Pno021283.1 were the candidate genes involved in PPD biosynthesis and Pno002960.1 was the candidate gene involved in PPT biosynthesis in P. notoginseng. The expression levels of three CYP716 genes (Pno012347.1, Pno002960.1 and bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Pno011760.1) validated using real-time PCR were consistent with RNA-seq. Meanwhile, we identified another 22 CYP450 genes (including CYP78, CYP71, CYP72 and CYP86), which had the same expression pattern as three CYP716A53v2-homolog genes (Fig. 3d), indicated they may be involved into PPT biosynthesis. UGTs Three UGTs (UGT71A27, UGT74AE2 and UGT94Q2) that participate in the biosynthesis of PPD-type ginsenosides and three UGTs (UGTPg1, UGTPg100 and UGTPg101) participating in the formation of PPT-type saponins have previously been identified in P. g i n s e n g . We identified a total of 160 UGT genes in the P. notoginseng genome based on 116 homologous UGT genes of A. thaliana44. We classified these 160 UGTs of P. notoginseng into 12 subfamilies containing 19 UGT71A homologs and 14 UTG74 homologs (Supplementary Fig. 12). UGTPg1, UGTPg100, UGTPg101, UGTPg102 and UGTPg103 of P. g i n s e n g were UGT71A27 homologs; however, UGTPg102 and UGTPg103 were found to have no detectable activity on PPT due to the lack of several key amino acids in their proteins (Fig. 4a and b). We identified 38 differentially expressed UGT genes between roots and aerial parts (flowers and leaves), including 23 expressed more highly in roots (HR) and 15 in aerial tissues (HA) (Fig. 4c and d). Interestingly, there were three UGT71A27 homolog (Pno020280.1, Pno027722.1 and Pno026280.1) genes in the HR, and HA included four UGT74 genes (Pno031515.1, Pno033394.1, Pno002810.1 and Pno000722.1) and one UGT71A27 homolog (Pno013844.1) gene (Fig. 4c and d). Furthermore, when comparing the UGT71A27 homolog genes of P. notoginseng with UTGPg1, UGTPg100, UGTPg101, UGTPg102 and UGTPg103, we checked the known key amino-acid residues determining the function of UGTs. We found that the key amino-acid residues UGTPg100-A142T, UGTPg100-L186S, UGTPg100-G338R and UGTPg1-H82C were conserved in both Pno026280.1 and Pno27722.1. However, UGTPg1-H144F is a specific mutation F in the Pno026280.1 (Supplementary Fig. 13). All these candidate genes probably play important roles in the biosynthesis of ginsenoside. bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Identification of transcription factor binding sites We have predicted the transcription factor binding site (TFBS) in the upstream regions of ten genes encoding enzymes involved in the ginsenoside biosynthesis, including HMG-CoA synthase (HMGS), HMG-CoA reductase (HMGR), mevalonate kinase (MK), phosphomevalonate kinase (PMK), mevalonate diphosphate decarboxylase (MDD), isopentenylpyrophosphate isomerase (IPI), geranylgeranyl diphosphate synthase (GGR), farnesyl diphosphate synthase (FPS), squalene epoxidase (SE) and dammarenediol-II synthase (DS) (Supplementary Fig. 14). DNA-binding with one finger 1 (Dof1) and prolamin box binding factor (PBF) binding sites were detected in all these ten genes which used the same binding site motif (5'-AA[AG]G-3') (Supplementary Table 13). Therefore, the expression levels of Dof1 and PBF with these ten genes in different developmental stages of roots and leaves from P. notoginseng were compared (Supplementary Table 14). Most of genes’ expression levels were higher in the 3 roots than that in the 3 leaves, expecting PMK and Dof1 (Supplementary Fig. 15). Dof1 belong to the Dof family, which is plant-specific transcription factor 45. Yanagisawa’s studies suggested that Dof1 in maize is related with the light-regulated plant-specific gene expression 46,47. PBF encodes Dof zinc finger DNA binding proteins, which could enhance the DNA binding of the bZIP transcription factor Opaque-2 to O2 binding site elements 48. In three stages of roots, the trend of PBF expression level was consistent with that of DS, HMGR, SE, HMGS, MK, MDD, FPS and GGR (Supplementary Table 15). Whereas, in different developmental stage of leaves, there was almost no significant change in genes expression levels in these genes, excluding Dof1 and DS with a rising trend (Supplementary Table 16). We inferred that the PBF and Dof1 were probably associated with the transcriptional controls of these genes, involved in triterpene saponins biosynthesis pathway, in different parts, the root and the leaf of P. notoginseng, respectively.

Discussion

Compared with other phylogenetically-related plant species, there was notable gene bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

family contraction of R-genes in P. notoginseng. The encoded resistance proteins play important roles in plant defense by controlling the host plant’s ability to detect a pathogen attack and facilitate a counter attack against the pathogen5. Whether P. notoginseng has evolved other defenses and/or anti-stress mechanisms to compensate for the contraction of R-genes remains to be determined in data mining of the first draft genome of this genus. Given that the notable contraction of R-genes of P. notoginseng may provide clues to its poor stress and disease resistance, we also observed expansion and contraction of different subfamilies of the ABC transporter gene family in P. notoginseng. The substrates of ABC proteins include a wide range of compounds such as peptides, lipids, heavy metal chelates and steroids49. Interestingly, one of the contracted gene candidates (Pno003332.1) in P. notoginseng was reported to be involved in iron export35. A previous experimental study showed that optimal iron concentration could significantly determine the plant biomass and stabilize arsenic content in soil to reduce arsenic contamination50 where arsenic contamination of P. notoginseng is a

serious agricultural problem due to arsenic pollution in the environment50. Phenolics, alkaloids and terpenoids are three major classes of chemicals involved in plant defenses51. Another possible complementary mechanism to cope with reduced pathogen resistance, which possibly resulted from R-gene contraction, is through evolving synthesis of defensive secondary plant metabolites. Pepper and tomato, two species phylogenetically close to P. notoginseng, produce alkaloids such as capsaicinoids and tomatine, respectively, which have been found to function as deterrents against pathogens20,52. Azadiracht, a meliacane-type triterpenoid with very similar skeleton to dammarane from the neem tree (Azadirachta indica)53, has insecticidal properties. Ginsenosides are the oligosaccharide glycosides of a series of dammarane- or oleanane-type triterpenoid glycosides9. Palazón et al. reported that during in vitro culture of ginseng hair root lines, the addition of methyl jasmonate, a signaling molecule specifically expressed by plants in response to insect and pathogenic attacks, could enhance overall ginsenoside production and conversion of PPD-type (e.g. Rb1, Rb2, Rc and Rd) to PPT-type (e.g. Re, Rf and Rg1) ginsenoside54. bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

In addition, natural ginsenosides have antimicrobial and antifungal action, as shown in numerous laboratory studies55,56. The exact natural roles of ginsenosides in plants, particularly controlling plant growth and defense against pathogenic infection, remain unclear and require further investigation. DDS is the critical determinant for the initial step of producing the dammarane-type sapogenin (aglycone of saponin) template. This can be further subjected to structural diversification by CYP450 and UGT enzymes. Along the triterpenoid saponin biosynthesis pathway, DDS of the OSC group is responsible for the cyclization of 2,3-oxidosqualene to dammarenediol-II57. A highly-conserved DDS homolog of P. ginseng was also identified in P. notoginseng. Upstream of DDS in the triterpenoid saponin biosynthesis pathway, three SQLE and two FDFT1 genes were identified and shown to exhibit significant differential expression in different parts in P. notoginseng. Han et al. identified that CYP716A47 produces PPD through the hydroxylation of C-12 in dammarenediol-II43, CYP716A53v2 hydroxylates the C-6 of PPD to produce PPT58, and CYP716A52v2 is involved in oleanane-type ginsenoside biosynthesis59 in P. g in s e n g . We were interested in determining how the transcriptional regulation of different CYP450 genes respectively correlated with synthesis of higher content of PPT- and PPD-type ginsenosides in roots and aerial parts (leaves and flowers). Combined the previous research and transcriptome analysis, we identified the key genes which produces PPD and PPT in the P. notoginseng. UGTs are GTs that use uridine diphosphate (UDP)-activated sugar molecules as donors. Several types of UGTs, which catalyze the synthesis of specific sapogenins to ginsenosides, were found in previous studies of P. g i n s e n g 60. For example, UGTPg1

61 was demonstrated to region-specifically glycosylate C20-OH of PPD and PPT . UGTPg100 specifically glycosylates PPT to produce bioactive ginsenoside Rh1, and UGTPg101 catalyzes PPT to produce F161. Up to now, very few of the UGTs characterized that are able to glycosylate triterpenoid saponins have been found in P. ginseng62-65 and P. notoginseng66,67. However, in the present study, an almost complete collection of 160 UGT genes, representing many new homologs and bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

putative UGT members, was disclosed in the P. notoginseng genome. In our study, we found the lack of correlation between qPCR and RNA-seq in the Pno026280, Pno002772 and Pno020280. In order to eliminate error because of improper normalization method, we used other two quantification methods. Using the raw RNA-seq reads counting and normalized by quantiles (limma) 68and DESeq69 showed the completely consistent tendency with FPKM results (Supplementary Table 17). The probable reasons are the nonspecific binding of qPCR primers to cDNA template and the bias in the RNA-seq sequencing due to sequencing library preparation, namely, it was due to nature of these two methods (RNA-seq vs. qPCR). In conclusion, using a combination of data from genome and transcriptomes, we identified contraction and expansion in numerous important genome events during the evolution of this plant. The contraction of disease-resistance genes families figured out the vulnerability to disease of P. notoginseng on genetic level and the expansion of several ABC transporter subfamilies indicated the potential of ABCs as an additional mechanism for the plant to cope with biotic stress. Combining eight transcriptomes of roots and aerial parts, several key genes and all their family members involved in the synthesis pathway of ginsenosides were also identified including dammarenediol synthase, CYP716 and UGT71. These findings provide new insight into its poor resistance and the biosynthesis of high-valued pharmacological molecules. Finally, it is essential that a better chromosome level reference genome assembly using long reads sequencing platform and Hi-C technology, serving as an important platform for improving breeding and cultivation of this vulnerable plant species to increase the medicinal and economic value of Panax species.

Methods

K-mer analysis We used the total length of sequence reads divided by sequencing depth to calculate the genome size. To estimate the sequencing depth, the copy number of 17-mers present in sequence reads was counted and the distribution of sequencing depth of the bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

assembled genome was plotted. The peak value in the curve represents the overall sequencing depth, and can be used to estimate genome size (G):

G = Number17-mer/Depth17-mer. Genome assembly The first genome assembly version (v1.0) using SOAPdenovo70 (v2.04) based on the short length reads data sequencing by Hiseq2000, was highly fragmented. It contained more than 600Mb gaps with missing sequence because of presence of high proportion of repetitive DNA sequences in the genome. The contig N50 sizes of v1.0 was 4.2kb. For the short contig N50 size, therefore, we further generated ~13Gb SMRT sequence data (~6-fold of whole genome size) with average sub-read length of ~9kb. Considering the inadequate depth of our Pacbio third generation sequencing, we used the PBJelly71 to fill the gap and upgrade the genome assembly with long length reads. Concretely, we generated our protocol file containing full paths of the reference assembly above, the output directory and the input files. Then, we executed multiple steps (including setup, mapping, support, extraction, assembly, output) in a consecutive order. From the statistics of the result of PBJelly, we found that the gap number was lowered to 533Mb while sizes of contig N50 was increased to 10.7kb. Next, we used the libraries with the insert sizes of 500bp and 800bp as the gap-filling input data of the GapCloser (v1.0, http://soap.genomics.org.cn/). After running, the gap number was decreased to 493Mb and the length of contig N50 was increased to 16.4kb. Finally, the sizes of contig and scaffold N50 of the final version (v1.1) were 16.4kb and 70.6kb, respectively. In addition, the total length of the genome changed from 2.1Gb to 2.4Gb with 493Mb gap sequences. The detailed parameters of SOAPdenovo were “pregraph -s lib.cfg.1 -z 2800000000 -K 45 -R -d 1 -o SAN_63 -p 12; contig -g SAN_63 -R -p 24 ; map -s lib.cfg.1 -g SAN_63 -k 45 -p 24; scaff -g SAN_63 -F -p 24”. We also adopted ABySS72 to estimate the optimized K value for genome assembly from 35 to 75 using about 20-fold high quality sequencing reads. The result revealed the K=45 is the best K value (Supplementary Table 18), which is completely consistent with the best K value of SOAPdenovo. Transcript de novo assembly bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Leaves and roots of P. notoginseng were collected from 1-, 2- and 3-year-old P. notoginseng and flowers collected from 2- and 3-year-old P. notoginseng. All of these 8 samples were randomly collected from the fileld of commercial planting base in Yanshan county of Wenshan Zhuang and Miao minority autonomous prefecture in Yunnan province. After cleansing, the leaves, roots and flowers were collected separately, cut into small pieces, immediately frozen in liquid nitrogen, and stored at -80 °C until further processing. Subsequently, mRNA isolation, cDNA library construction and sequencing were performed orderly. Briefly, total RNA was extracted from each tissue using TRIzol reagent and digested with DNase I according to the manufacturer’s protocol. Next, Oligo magnetic beads were used to isolate mrna from the total rna. By mixing with fragmentation buffer, the mrna was broken into short fragments. The cDNA was synthesized using the mRNA fragments as templates. The short fragments were purified and resolved with EB buffer for end repair and single nucleotide A (adenine) addition, and then connected with adapters. Suitable fragments were selected for PCR amplification as templates. During the quality control steps, an Agilent 2100 Bioanalyzer and ABI StepOnePlus Real-Time PCR System were used for quantification and qualification of the sample library. Each cDNA library was sequenced in a single lane of the Illumina HiSeqTM 2000 system using paired end protocols. Before performing the assembly, firstly, raw sequencing data that generated by Illumina HiSeq2000, was subjected to quality control (QC) check including the analysis of base composition and quality. After QC, raw reads which contained the sequence of adapter (adapters are added to cDNAs during library construction and part of them may be sequenced), more than 10% unknown bases or 5% low quality bases were filtered into clean reads. Finally, Trinity73 was used to perform the assembly of clean reads (detail information could be retrieved in Trinity website http://trinityrnaseq.sourceforge.net/). Based on the Trinity original assembly result, TGICL74 and Phrap75 were used to acquire sequences that cannot be extended on either end, the sequences are final assembled transcripts. We mapped all transcripts that were de novo assembled from nine transcriptome short reads by Trinity into the genome assembly using Blat76, and then calculated their coverage and mapping rate. bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

The sequencing depth distribution was examined by aligning all short reads against assembly using SOAP277 and then the depth of each base was calculated. Gene expression analysis All the QC and filter processes of raw sequencing transcriptome data were the same as that of de novo assembly. All the clean reads are mapped to reference gene sequences by using SOAP277 with no more than 5 mismatches allowed in the alignment. We also performed the QC of alignment results because mRNAs were firstly broken into short segments by chemical methods during the library construction and then sequenced. If the randomness was poor, read preference for specific gene region could influence the calculation of gene expression. The gene expression level of each gene was calculated by using RPKM method78 (reads per kilobase transcriptome per million mapped reads) based on the unique alignment results. In the previous test, comparative analysis between RPKM and FPKM showed that the correlation between RPKM and QPCR was better than the correlation between FPKM and QPCR. Referring to the previous study79, we have developed a stringent algorithm to identify differentially expressed genes between two samples. The probability of a gene expressed equally between two samples was calculated based on the Poisson distribution. P-values corresponding to differential gene expression test were generated using Benjamini, Yekutieli.2001 FDR method80. Moreover, because of some debates in the normalization method by gene size like RPKM and then two new quantification methods were conducted to validate the existing results. Concretely, based on the tophat mapping results, R package ‘Rsubread’ was used to do the raw reads counting then we used ‘limma’ with the normalize parameter "quantile". Additionally, DESeq also was conducted. Finally, aiming at the differential expressed genes mentioned in the article based on RNA-seq (RPKM), we found the completely same tendency. Identification of repetitive sequences Transposable elements (TEs) in the P. notoginseng genome were annotated with structure-based analyses and homology-based comparisons. In detail, RepeatModeler22 was used to de novo find these TEs based on features of structures. bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

RepeatMasker and RepeatProteinMask were applied using RepBase81 for TE identification at the DNA and protein level, respectively. Overlapped TEs belonging to the same repeat class were checked and redundant sequences were removed. TRF software82 was applied to annotate tandem repeats in the genome. Gene prediction and annotation We predicted protein-encoding genes by homolog, de novo and RNA-seq evidence; and results of the former two methods were integrated by the GLEAN program83 then filtered by threshold level of 20% percent overlap with at least 3 homolog species support. Subsequently, the inner auto pipeline was used to integrated RNA-seq data within and then checked manually. Firstly, protein sequences from five closely related species—A. thaliana, S. lycopersicum, S. tuberosum, Capsicum annuum and Daucus carota—were applied for homolog prediction by respectively mapping them to the genome assembly using TblastN software with E-value -1e-5. Aiming at these target areas, we used Genewise84 to cluster and filter pseudogenes. Then Fgenesh85, AUGUSTUS86 and GlimmerHMM87 were used for de novo prediction with parameters trained on A. thaliana. We merged three de novo predictions into a unigene set. De novo gene models that were supported by one more de novo methods were retained. For overlapping gene models, the longest one was selected and finally, we got de novo-based gene models. Using de novo gene set (30,660-41,495) and five homolog-based results as gene models (24,358 to 36,116) integration was done using the GLEAN program. Finally, we got the GLEAN gene set (referred to as G-set, 31,678) with parameter “value 0.01, fixed 0”. Combined with the RNA transcriptome sequencing data, we used Tophat288 to map and Cufflinks89 to assemble reads into transcripts and finally integrated the transcripts into the gene set using inner pipeline. The process of inner pipeline was: (1) The assembly of transcriptome data. RNA-seq reads were mapped to P. notoginseng genome using tophat, and then cufflinks pipeline was used to conduct the assembly of transcripts. (2) Prediction of the ORF in transcripts (CDS length >=300bp, score >-15). (3) Comparison of transcripts and Glean results to count the overlap region (identity >=95%, total/len >=90%). (4) Integration of the existing results. Aiming at the results, we checked manually by bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

discarding genes without any functional annotation information, any homolog support or cufflinks results support. The final gene set (35,451) was obtained. Then, with the known protein databases (SwissProt90, Trembl, KEGG91, InterPro92 and GO93), we annotated these functional proteins in the gene set by aligning with template sequences in the databases using BLASTP (1e-5) and InterProScan. Identification of R-genes Most R-genes in plants encode NBS-LRR proteins. According to the conservative structural characteristics of domains, we used HMMER (V3, http://hmmer.janelia.org/software) to screen the domains in the Pfam NBS (NB-ARC) family. Then we compared all NBS-encoding genes with TIR HMM (PF01582) and LRR 1 HMM (PF00560) data sets using HMMER (V3). For the CC domains, we used the MARCOIL program94 with a threshold probability of 0.9 and double-checked using paircoil295 with a P-score cut-off of 0.025. Gene cluster analysis We used OrthoMCL96 to identify the gene family and so obtained the single-copy gene families and multi-gene families that were conserved among species. Then we constructed a phylogenetic tree based on the single-copy orthologous gene families using PhyML97. The different molecular clock (divergence rate) might be explained by the generation-time hypothesis. Here, we used the MCMCTREE program98 (a BRMC approach from the PAML package) to estimate the species divergence time. ‘Correlated molecular clock’ and ‘JC69’ model in the MCMCTREE program were used in our calculation. Analysis of key gene families The reference genes of interest in A. thaliana (such as ABC, CYP450 and UGT) were found in the TAIR10 functional descriptions file. Then, the target genes of P. notoginseng were identified and classified using BlastP with cut-off value of 1e-5 and constructing gene trees using PhyML (-d aa -b 100). The tree representation was constructed using MEGA software. Real-time PCR analysis Isolated RNA from different P. notoginseng tissues were reverse-transcribed to bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

single-strand cDNA using the Super Script™ III First-Strand Synthesis System (Invitrogen™, USA). Quantitative reactions were performed on the Real-Time PCR Detection System (VIIA 7TM Real-Time PCR System, Applied Biosystems, USA) using FastStart Universal Probe Master and Universal ProbeLibrary Probes (Roche, Switzerland). All primers used in this study are listed in Supplementary Table 19. ΔΔ The relative gene expression was calculated with the 2- CT method. For each sample, the mRNA levels of the target genes were normalized to that of 18S rRNA.

Availability of supporting data and materials

The genome assembly and sequencing data have been deposited into NCBI Sequence Read Archive (SRA) under project number PRJNA299863 and into GigaDB.

Declarations

Acknowledgements

This work was supported by the Science and Technology Development Fund (FDCT) of Macao SAR (Ref. No. 134/2014/A3), Research Committee of University of Macau (MYRG139(Y1-L4)-ICMS12-LMY and MYRG2015-00214-ICMS-QRCM), the Overseas and Hong Kong, Macau Young Scholars Collaborative Research Fund by the Natural National Science Foundation of China (Grant no. 81328025), Technology Innovation Program Support by Shenzhen Municipal Government (NO.JSGG20130918102805062) and Basic Research Program Support by Shenzhen Municipal Government (NO.JCYJ2014072916361729 and NO. JCYJ20150831201123287).

Additional files

Additional file 1: Supplementary information includes supporting figures and supporting tables. bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Competing interests

The authors declare that they have no competing interests.

Author’s contributions

S. M.Y.L, X.X and X.L designed the project. G.F, B.R.Y, Y. F, H.Z, X.L, C.S, K.M, J.H, R.G, L.S, S.D, Q.X, W.L, M.L, A.K.W, D.Z and M.C analyzed the data. W.C, X.L, G.F, Y. F, J.C and B.R.Y wrote the manuscript. B.R.Y, M.G and C. H prepared the samples and conducted the experiments.

Authors’ Affiliations

(1) State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macao, China. (2) BGI-Shenzhen, Shenzhen 518083, China. (3) State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering, Southeast University, Nanjing 210096, China (4) Faculty of Science and Technology, Department of Civil and Environmental Engineering, University of Macau, Macao, China

Figure legends

Fig. 1. The gene families related to resistance of P. n o to gi ns e ng . (a) The evolution of R-genes with NBS domains among S. lycopersicum (Sl), D. carota (Dc) and P. notoginseng (Pn). The numbers in circles represent the gene numbers of a species and the numbers in clades represent the gene number of expanded gene families (E), contracted gene families (C) and extinct gene families (D). NBS represents a gene that only contains the NBS domain, and NBS_LRR, TIR_NBS_LRR and CC_NBS_LRR follow a similar representation. (b) Phylogenetic tree of R genes of P. notoginseng, D. carota and Oryza sativa (outgroup). Blue label is P. notoginseng, orange is D. carota and purple is Oryza sativa. (c) The gene trees of ABC Gpdr subfamily of P. bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

notoginseng and other five plants. The red represents genes of P. notoginseng, the green is pepper, the orange is potato, dark blue is tomato, the cyan is carrot and the blue is Arabidopsis. .

Fig. 2. The preliminary steps of ginsenoside biosynthesis and several key gene families. (a) The dammarenediol-II synthesis pathway from glycolysis involves terpenoid backbone biosynthesis and finally catalysis by DDS. (b) Comparison of gene expressions of three key enzyme gene families between the roots and aerial parts (flowers and leaves) and the qPCR results. R1 represents one-year-old roots of P. notoginseng, R2: two-year-old roots, R3: three-year-old roots from plant B, L1: one-year-old leaves, L2, two-year-old leaves, L3: three-year-old leaves, F2: two-year-old flowers, F3: three-year-old flowers. (c) Protein sequence multiple comparisons of DDS among P. notoginseng, P. g i n s e n g and other representative plants. Stars represent amino acids conserved in all protein sequences, red circles represent amino acids that are the same in P. notoginseng and P. ginseng, but missing in all other plants. (d) The protein 3D structure of DDS of P. notoginseng constructed by SWISS-MODEL. The arrows indicate two amino acids specific to P. notoginseng and P. ginseng.

Fig. 3. The CYP450 gene families of P. notoginseng. (a) Phylogenetic analysis of CYP716 subfamily among P. notoginseng, P. g i n s e n g and other plants. Red circles represent P. notoginseng, purple diamonds represent P. ginseng, green circles represent S. tuberosum, yellow triangles represent D. carota, purple squares represent S. lycopersicum and green diamonds represent A. thaliana. The lengths of clades in the gene tree show the protein similarity between two clades. (b) Synthesis pathway of protopanaxadiol (PPD) and protopanaxatriol (PPT) of P. notoginseng. (c)The relative amount of PPD was more than that of PPT in the aerial parts of P. notoginseng, but the opposite in roots. (d) The differentially expressed CYP450 genes of P. notoginseng between the aerial parts and roots and the qPCR results. The pink bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

and green genes in the heatmap represent CYP716 subfamily genes that were involved in the synthesis pathway in P. ginseng.

Fig. 4. The UGT gene families of P. n o to g i n s e n g . (a) The synthesis pathway of different PPT-type ginsenosides that were catalyzed by different UGT genes. (b) The synthesis pathway of different PPD-type ginsenosides that were catalyzed by different UGT genes. (c) Highly expressed UGT genes of P. notoginseng in roots than in aerial parts and the qPCR results. (d) Highly expressed UGT genes of P. notoginseng in aerial parts than in roots and the qRCR results. The purple, green and blue genes in the heatmap represent gene families that were involved in the synthesis pathway in P. ginseng. bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

References 1. Briskin, D.P. Medicinal plants and phytomedicines. Linking plant biochemistry and physiology to human health. Plant Physiol 124, 507-14 (2000). 2. Zhang, D. et al. The Medicinal Herb Panax notoginseng Genome Provides Insights into Ginsenoside Biosynthesis and Genome Evolution. Mol Plant 10, 903-907 (2017). 3. Chen, W. et al. Whole-Genome Sequencing and Analysis of the Chinese Herbal Plant Panax notoginseng. Mol Plant 10, 899-902 (2017). 4. Ou, X. et al. [Status and prospective on nutritional physiology and fertilization of Panax notoginseng]. Zhongguo Zhong Yao Za Zhi 36, 2620-4 (2011). 5. Gururani, M.A. et al. Plant disease resistance genes: Current status and future directions. Physiological and Molecular Plant Pathology 78, 51-65 (2012). 6. Knepper, C. & Day, B. From perception to activation: the molecular-genetic and biochemical landscape of disease resistance signaling in plants. Arabidopsis Book 8, e012 (2010). 7. Andolfo, G. et al. Genetic variability and evolutionary diversification of membrane ABC transporters in plants. BMC Plant Biol 15, 51 (2015). 8. Leung, K.W. & Wong, A.S. Pharmacology of ginsenosides: a literature review. Chin Med 5, 20 (2010). 9. Yang, W.Z., Hu, Y., Wu, W.Y., Ye, M. & Guo, D.A. Saponins in the genus Panax L. (Araliaceae): a systematic review of their chemical diversity. Phytochemistry 106, 7-24 (2014). 10. Ng, T.B. Pharmacological activity of sanchi ginseng (Panax notoginseng). J Pharm Pharmacol 58, 1007-19 (2006). 11. Son, H.Y., Han, H.S., Jung, H.W. & Park, Y.K. Panax notoginseng Attenuates the Infarct Volume in Rat Ischemic Brain and the Inflammatory Response of Microglia. J Pharmacol Sci 109, 368-79 (2009). 12. Li, H. et al. Total saponins of Panax notoginseng modulate the expression of caspases and attenuate apoptosis in rats following focal cerebral ischemia-reperfusion. J Ethnopharmacol 121, 412-8 (2009). 13. Yang, C.Y. et al. Anti-diabetic effects of Panax notoginseng saponins and its major anti-hyperglycemic components. J Ethnopharmacol 130, 231-6 (2010). 14. Xiang, H. et al. The antidepressant effects and mechanism of action of total saponins from the caudexes and leaves of Panax notoginseng in animal models of depression. Phytomedicine 18, 731-8 (2011). 15. Yang, X., Xiong, X., Wang, H., Yang, G. & Wang, J. Xuesaitong soft capsule (chinese patent medicine) for the treatment of unstable angina pectoris: a meta-analysis and systematic review. Evid Based Complement Alternat Med 2013, 948319 (2013). 16. Wang, L. et al. A network study of chinese medicine xuesaitong injection to elucidate a complex mode of action with multicompound, multitarget, and multipathway. Evid Based Complement Alternat Med 2013, 652373 (2013). 17. Luo, J., Song, W., Yang, G., Xu, H. & Chen, K. Compound Danshen (Salvia miltiorrhiza) dripping pill for coronary heart disease: an overview of systematic reviews. Am J Chin Med 43, 25-43 (2015). 18. Baeg, I.H. & So, S.H. The world ginseng market and the ginseng (Korea). J Ginseng Res 37, 1-7 (2013). bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

19. Denoeud, F. et al. The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. Science 345, 1181-4 (2014). 20. Kim, S. et al. Genome sequence of the hot pepper provides insights into the evolution of pungency in Capsicum species. Nat Genet 46, 270-8 (2014). 21. Yue, P.Y. et al. Pharmacogenomics and the Yin/Yang actions of ginseng: anti-tumor, angiomodulating and steroid-like activities of ginsenosides. Chin Med 2, 6 (2007). 22. Sengupta, S. et al. Modulating angiogenesis: the yin and the yang in ginseng. Circulation 110, 1219-25 (2004). 23. Yang, W.Z. et al. Rapid chemical profiling of saponins in the flower buds of Panax notoginseng by integrating MCI gel column chromatography and liquid chromatography/mass spectrometry analysis. Food Chem 139, 762-9 (2013). 24. Wan, J.B. et al. Chemical investigation of saponins in different parts of Panax notoginseng by pressurized liquid extraction and liquid chromatography-electrospray ionization-tandem mass spectrometry. Molecules 17, 5836-53 (2012). 25. Mapleson, D., Garcia Accinelli, G., Kettleborough, G., Wright, J. & Clavijo, B.J. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics (2016). 26. Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764-70 (2011). 27. Iorizzo, M. et al. A high-quality carrot genome assembly provides new insights into carotenoid accumulation and asterid genome evolution. Nat Genet 48, 657-66 (2016). 28. Simao, F.A., Waterhouse, R.M., Ioannidis, P., Kriventseva, E.V. & Zdobnov, E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210-2 (2015). 29. Zhang, T. et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat Biotechnol 33, 531-7 (2015). 30. Yan, L. et al. The Genome of Dendrobium officinale Illuminates the Biology of the Important Traditional Chinese Orchid Herb. Mol Plant 8, 922-34 (2015). 31. Huang, S. et al. The genome of the cucumber, Cucumis sativus L. Nat Genet 41, 1275-81 (2009). 32. Rea, P.A. Plant ATP-binding cassette transporters. Annu Rev Plant Biol 58, 347-75 (2007). 33. Martinoia, E. et al. Multifunctionality of plant ABC transporters--more than just detoxifiers. Planta 214, 345-55 (2002). 34. Geisler, M. & Murphy, A.S. The ABC of auxin transport: the role of p-glycoproteins in plant development. FEBS Lett 580, 1094-102 (2006). 35. Chen, S., Sanchez-Fernandez, R., Lyver, E.R., Dancis, A. & Rea, P.A. Functional characterization of AtATM1, AtATM2, and AtATM3, a subfamily of Arabidopsis half-molecule ATP-binding cassette transporters implicated in iron homeostasis. J Biol Chem 282, 21561-71 (2007). 36. Kang, J. et al. PDR-type ABC transporter mediates cellular uptake of the phytohormone abscisic acid. Proc Natl Acad Sci U S A 107, 2355-60 (2010). 37. Campbell, E.J. et al. Pathogen-responsive expression of a putative ATP-binding cassette transporter gene conferring resistance to the diterpenoid sclareol is regulated by multiple defense signaling pathways in Arabidopsis. Plant Physiol 133, 1272-84 (2003). 38. Shibuya, M. et al. Identification of beta-amyrin and sophoradiol 24-hydroxylase by expressed bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

sequence tag mining and functional expression assay. FEBS J 273, 948-59 (2006). 39. Choi, D.W. et al. Analysis of transcripts in methyl jasmonate-treated ginseng hairy roots to identify genes involved in the biosynthesis of ginsenosides and other secondary metabolites. Plant Cell Rep 23, 557-66 (2005). 40. Schwede, T., Kopp, J., Guex, N. & Peitsch, M.C. SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res 31, 3381-5 (2003). 41. Thoma, R. et al. Insight into steroid scaffold formation from the structure of human oxidosqualene cyclase. Nature 432, 118-22 (2004). 42. Nelson, D. & Werck-Reichhart, D. A P450-centric view of plant evolution. Plant J 66, 194-211 (2011). 43. Han, J.Y., Hwang, H.S., Choi, S.W., Kim, H.J. & Choi, Y.E. Cytochrome P450 CYP716A53v2 catalyzes the formation of protopanaxatriol from protopanaxadiol during ginsenoside biosynthesis in . Plant Cell Physiol 53, 1535-45 (2012). 44. Yonekura-Sakakibara, K. & Hanada, K. An evolutionary view of functional diversity in family 1 glycosyltransferases. Plant J 66, 182-93 (2011). 45. Yanagisawa, S. A novel DNA-binding domain that may form a single zinc finger motif. Nucleic Acids Res 23, 3403-10 (1995). 46. Yanagisawa, S. Dof1 and Dof2 transcription factors are associated with expression of multiple genes involved in carbon metabolism in maize. Plant J 21, 281-8 (2000). 47. Yanagisawa, S. & Sheen, J. Involvement of maize Dof zinc finger proteins in tissue-specific and light-regulated gene expression. Plant Cell 10, 75-89 (1998). 48. Vicente-Carbajosa, J., Moose, S.P., Parsons, R.L. & Schmidt, R.J. A maize zinc-finger protein binds the prolamin box in zein gene promoters and interacts with the basic leucine zipper transcriptional activator Opaque2. Proc Natl Acad Sci U S A 94, 7685-90 (1997). 49. Theodoulou, F.L. Plant ABC transporters. Biochimica et Biophysica Acta (BBA) - Biomembranes 1465, 79-103 (2000). 50. Yan, X.L., Lin, L.Y., Liao, X.Y., Zhang, W.B. & Wen, Y. Arsenic stabilization by zero-valent iron, bauxite residue, and zeolite at a contaminated site planting Panax notoginseng. Chemosphere 93, 661-7 (2013). 51. Freeman, B.C. & Beattie, G.A. An Overview of Plant Defenses against Pathogens and Herbivores, (The Plant Health Instructor., 2008). 52. Friedman, M. Tomato glycoalkaloids: role in the plant and in the diet. J Agric Food Chem 50, 5751-80 (2002). 53. Aerts, R. & Mordue, A.J. Feeding Deterrence and Toxicity of Neem Triterpenoids. Journal of Chemical Ecology 23, 2117-2132 (1997). 54. Palazón, J. et al. Elicitation of different Panax ginseng transformed root phenotypes for an improved ginsenoside production. Plant Physiology and Biochemistry 41, 1019-1025 (2003). 55. Bernards MA, Y.L., Nicol RW. The allelopathic potential of ginsenosides. , ( Springer, Netherlands, 2006). 56. Nicol RW, Traquair JA & MA., B. Ginsenosides as host resistance factors in American ginseng (Panax quinquefolius). Canadian Journal of Botany 80, 557-562. (2002). 57. Kushiro, T., Ohno, Y., Shibuya, M. & Ebizuka, Y. In vitro conversion of 2,3-oxidosqualene into dammarenediol by Panax ginseng microsomes. Biol Pharm Bull 20, 292-4 (1997). 58. Han, J.Y., Kim, H.J., Kwon, Y.S. & Choi, Y.E. The Cyt P450 enzyme CYP716A47 catalyzes the bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

formation of protopanaxadiol from dammarenediol-II during ginsenoside biosynthesis in Panax ginseng. Plant Cell Physiol 52, 2062-73 (2011). 59. Han, J.Y., Kim, M.J., Ban, Y.W., Hwang, H.S. & Choi, Y.E. The involvement of beta-amyrin 28-oxidase (CYP716A52v2) in oleanane-type ginsenoside biosynthesis in Panax ginseng. Plant Cell Physiol 54, 2034-46 (2013). 60. Jung, S.C. et al. Two ginseng UDP-glycosyltransferases synthesize ginsenoside Rg3 and Rd. Plant Cell Physiol 55, 2177-88 (2014). 61. Wei, W. et al. Characterization of Panax ginseng UDP-Glycosyltransferases Catalyzing Protopanaxatriol and Biosyntheses of Bioactive Ginsenosides F1 and Rh1 in Metabolically Engineered Yeasts. Mol Plant (2015). 62. Yan, X. et al. Production of bioactive ginsenoside compound K in metabolically engineered yeast. Cell Res 24, 770-3 (2014). 63. Sun, C. et al. De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis. BMC Genomics 11, 262 (2010). 64. Chen, S. et al. 454 EST analysis detects genes putatively involved in ginsenoside biosynthesis in Panax ginseng. Plant Cell Rep 30, 1593-601 (2011). 65. Li, C. et al. Transcriptome analysis reveals ginsenosides biosynthetic genes, microRNAs and simple sequence repeats in Panax ginseng C. A. Meyer. BMC Genomics 14, 245 (2013). 66. Liu, M.H. et al. Transcriptome analysis of leaves, roots and flowers of Panax notoginseng identifies genes involved in ginsenoside and alkaloid biosynthesis. BMC Genomics 16, 265 (2015). 67. Luo, H. et al. Analysis of the transcriptome of Panax notoginseng root uncovers putative triterpene saponin-biosynthetic genes and genetic markers. BMC Genomics 12 Suppl 5, S5 (2011). 68. Ritchie, M.E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43, e47 (2015). 69. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol 11, R106 (2010). 70. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012). 71. English, A.C. et al. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One 7, e47768 (2012). 72. Simpson, J.T. et al. ABySS: a parallel assembler for short read sequence data. Genome Res 19, 1117-23 (2009). 73. Grabherr, M.G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29, 644-52 (2011). 74. Pertea, G. et al. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 19, 651-2 (2003). 75. Vogel, J.P. et al. EST sequencing and phylogenetic analysis of the model grass Brachypodium distachyon. Theor Appl Genet 113, 186-95 (2006). 76. Kent, W.J. BLAT--the BLAST-like alignment tool. Genome Res 12, 656-64 (2002). 77. Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966-7 (2009). bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

78. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5, 621-8 (2008). 79. Audic, S. & Claverie, J.M. The significance of digital gene expression profiles. Genome Res 7, 986-95 (1997). 80. Yoav Benjamini, D.Y. The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29, 1165-1188 (2001). 81. Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110, 462-7 (2005). 82. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573-80 (1999). 83. Elsik, C.G. et al. Creating a honey bee consensus gene set. Genome Biol 8, R13 (2007). 84. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res 14, 988-95 (2004). 85. Salamov, A.A. & Solovyev, V.V. Ab initio gene finding in Drosophila genomic DNA. Genome Res 10, 516-22 (2000). 86. Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, W435-9 (2006). 87. Majoros, W.H., Pertea, M. & Salzberg, S.L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878-9 (2004). 88. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105-11 (2009). 89. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28, 511-5 (2010). 90. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 28, 45-8 (2000). 91. Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27-30 (2000). 92. Zdobnov, E.M. & Apweiler, R. InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847-8 (2001). 93. Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25-9 (2000). 94. Delorenzi, M. & Speed, T. An HMM model for coiled-coil domains and a comparison with PSSM-based predictions. Bioinformatics 18, 617-25 (2002). 95. McDonnell, A.V., Jiang, T., Keating, A.E. & Berger, B. Paircoil2: improved prediction of coiled coils from sequence. Bioinformatics 22, 356-8 (2006). 96. Li, L., Stoeckert, C.J., Jr. & Roos, D.S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13, 2178-89 (2003). 97. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59, 307-21 (2010). 98. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24, 1586-91 (2007).

bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

LOC_Os11g45620.1_Osativa_NBS CC_NBS_LRR TIR_NBS_LRR a E3/C5/D0 E5/C0/D0 b 25 18 Sl Sl LOC_Os12g10490.1_Osativa_NBS

L

LOC_Os02g39884.1_Osativa_NBS

Pno008534.1_Dcarota_NBS

Pno029364.1_Dcarota_NBS_LRR

LOC_Os08g30634.1_Osativa_NBS

LOC_Os01g52270.1_Osativa_NBS LOC_Os01g25340.1_Osativa_NBS SBN_avitasO_1.04004g10sO_COL 13 01sO_CO CC_NBS_LRR 23 LOC_Os11g42590.1_Osativa_NBS_LRR LOC_Os07g02530.1_Osativa_NBS

LOC_Os05g45700.1_Osativa_CC_NBS 1.06140g01s O _ C O L

LOC_Os01g52304.1_Osativa_NBS

_

LOC_Os04g30530.1_Osativa_NBS

SBN_avitasO_1.05251g10sO_COL SBN_avitasO_1.03004g10sO_COL LOC_Os11g40160.1_Osativa_NBS

LOC_Os07g02640.1_Osativa_NBS E0/C9/D3 LOC_Os05g45670.1_Osativa_CC_NBS

LOC_Os05g45690.1_Osativa_NBS SBN_avitasO_1.07482g11sO_COL LOC_Os01g52330.1_Osativa_NBS E23/C0/D0 LOC_Os10g04550.1_Osativa_NBS

LOC_Os01g52290.1_Osativa_CC_NBS vitasO

4 s O _1.08440g LOC_Os10g04540.1_Osativa_NBS 42 a LOC_Os10g04380.1_Osativa_NBS

t MRCA Dc SBN_avitasO_1.02004g10sO_COL SBN_CC_avitasO_1.08501g20sO_COL MRCA LOC_Os10g04560.1_Osativa_NBS Dc LOC_Os01g52380.1_Osativa_CC_NBS LOC_Os07g02624.1_Osativa_NBS

LOC_Os01g52320.1_Osativa_NBSLOC_Os01g52340.1_Osativa_CC_NBS SBN_a

SBN_avitasO_1.01260g11sO_COL LOC_Os08g15880.1_Osativa_NBS_LRR LOC_Os02g41760.1_Osativa_NBS

LOC_Os07g02630.2_Osativa_NBSLOC_Os07g02560.1_Osativa_CC_NBS LOC_Os07g02570.1_Osativa_NBSLOC_Os07g02610.1_Osativa_NBS LOC_Os05g16180.1_Osativa_NBS LOC_Os01g52280.1_Osativa_NBS SBN_avi

SBN_avitasO_2.02260g11sO_COL 16 LOC_Os07g02660.1_Osativa_NBS LOC_Os06g17910.1_Osativa_NBS_LRR Pno022320.1_Dcarota_NBS LOC_Os11g45090.1_Osativa_CC_NBS_LRRLOC_Os11g45060.1_Osativa_CC_NBS_LRR LOC_Os07g02620.1_Osativa_CC_NBS 19 LOC_Os09g14410.1_Osativa_CC_NBS_LRR

O_COL E4/C1/D0 Pno037984.1_SAN.3W5_NBS LOC_Os10g33440.2_Osativa_NBS_LRR LOC_Os11g44990.1_Osativa_CC_NBS_LRRLOC_Os09g14450.1_Osativa_NBS_LRR E0/C3/D1 LOC_Os11g45130.1_Osativa_NBS_LRR TIR_NBS_LRR Pno010928.1_Dcarota_NBS S B N _avitas O _1.05440g01s O _ C O L LOC_Os11g44960.1_Osativa_CC_NBS_LRRLOC_Os11g45190.1_Osativa_CC_NBS_LRR LOC_Os11g45050.1_Osativa_CC_NBS_LRR LOC_Os11g29110.1_Osativa_NBS_LRR LOC_Os05g23250.1_Osativa_NBS_LRR LOC_Os11g40780.1_Osativa_CC_NBS_LRRLOC_Os11g45180.1_Osativa_CC_NBS_LRR S B N _avitas O _1.09440g01s O _ C O L LOC_Os07g29820.1_Osativa_CC_NBS_LRR Pno035093.1_SAN.3W5_TIR_NBS_LRR LOC_Os08g24380.1_Osativa_NBS_LRR

Pno001279.1_Dcarota_NBS_LRR LOC_Os10g25487.1_Osativa_NBS_LRRLOC_Os07g29810.1_Osativa_NBS_LRR S B N _avitas O _1.07440g01s O _ C O L 07540g01s

LOC_Os01g39990.1_Osativa_CC_NBS LOC_Os05g16200.1_Osativa_CC_NBS_LRRLOC_Os11g44970.1_Osativa_CC_NBS_LRR 3260g11s O _ C O L LOC_Os09g11020.1_Osativa_CC_NBS_LRR LOC_Os11g28950.1_Osativa_CC_NBS_LRR 4 22 Pn LOC_Os02g25900.2_Osativa_CC_NBS_LRR Pno008399.1_SAN.3W5_TIR_NBS_LRRPno029849.1_Dcarota_TIR_NBS_LRR LOC_Os11g47780.1_Osativa_NBS_LRR Pno015522.1_Dcarota_NBS 9 LOC_Os10g04510.1_Osativa_NBS Pno009620.1_SAN.3W5_NBS LOC_Os09g16000.1_Osativa_NBS_LRR Pn LOC_Os05g12770.1_Osativa_NBS_LRRLOC_Os09g13820.1_Osativa_NBS_LRRLOC_Os11g13940.1_Osativa_NBS_LRR Pno006400.1_SAN.3W5_NBS Pno021436.1_Dcarota_CC_NBS LOC_Os01g05620.1_Osativa_NBS_LRR LOC_Os10g04090.1_Osativa_NBS_LRR LOC_Os11g45160.1_Osativa_CC_NBS Pno007709.4_SAN.3W5_NBS E6/C0/D0 LOC_Os07g17230.1_Osativa_NBS_LRR Pno005478.1_SAN.3W5_NBSPno009962.1_SAN.3W5_TIR_NBS_LRR LOC_Os09g15850.1_Osativa_NBS_LRRLOC_Os12g09739.1_Osativa_NBS_LRRLOC_Os01g05600.1_Osativa_NBS_LRR Pno017996.1_SAN.3W5_NBS LOC_Os02g10550.1_Osativa_NBS Pno037009.1_SAN.3W5_NBS_LRR Pno017997.1_SAN.3W5_NBS_LRR Pno025159.1_Dcarota_NBS_LRR LOC_Os05g23990.1_Osativa_CC_NBS_LRRLOC_Os03g48370.1_Osativa_NBS_LRR S B N _ C C _avitasPno021433.1_Dcarota_CC_NBS O _1.09260g11s O _ C O L LOC_Os12g03080.1_Osativa_NBS_LRR LOC_Os06g43670.1_Osativa_CC_NBS_LRR Pno019949.1_SAN.3W5_NBS LOC_Os03g48320.1_Osativa_NBS_LRR Pno027464.1_SAN.3W5_NBS E0/C8/D2 Pno012172.2_SAN.3W5_NBS Pno012118.1_SAN.3W5_NBS LOC_Os01g72390.1_Osativa_NBS_LRRLOC_Os01g71114.1_Osativa_CC_NBS Pno015729.1_Dcarota_CC_NBS LOC_Os11g10610.1_Osativa_NBS_LRRLOC_Os11g10550.1_Osativa_NBS_LRR Pno038507.1_SAN.3W5_NBS Pno000355.1_SAN.3W5_NBS_LRR Pno000327.1_SAN.3W5_NBS_LRRPno005216.1_Dcarota_TIR_NBS_LRR avitasO_1. Pno018662.1_SAN.3W5_TIR_NBS LOC_Os03g40194.1_Osativa_NBS_LRR Pno021434.1_Dcarota_NBS_LRR

LOC_Os08g16460.1_Osativa_NBS_LRR S B N _avitas O _1.00360g11sPno038512.2_SAN.3W5_TIR_NBS_LRR O _ C O L LOC_Os02g20210.1_Osativa_NBS_LRR Pno027725.2_SAN.3W5_TIR_NBSPno027959.2_SAN.3W5_TIR_CC_NBS_LRR

S B N _ avitas O _ 1. LOC_Os12g10410.1_Osativa_NBS LOC_Os10g04060.1_Osativa_NBS_LRRLOC_Os01g25740.1_Osativa_NBS_LRRLOC_Os01g25630.1_Osativa_NBS LOC_Os11g36410.1_Osativa_NBS_LRR Pno008885.1_Dcarota_CC_NBS LOC_Os03g63200.1_Osativa_NBS_LRR LOC_Os01g25810.1_Osativa_NBS_LRRLOC_Os10g04120.1_Osativa_NBS_LRR Pno004262.1_SAN.3W5_TIR_NBS SBN_ Pno013743.1_SAN.3W5_TIR_NBS Pno029757.1_Dcarota_CC_NBSPno011333.1_Dcarota_CC_NBS LOC_Os03g63150.1_Osativa_NBS_LRR Pno011772.1_SAN.3W5_TIR_NBSPno027541.2_SAN.3W5_TIR_NBS_LRR NBS_LRR LOC_Os11g10770.1_Osativa_NBS Pno005106.3_SAN.3W5_TIR_NBS_LRRPno012120.2_SAN.3W5_TIR_NBS_LRRPno017200.1_SAN.3W5_NBS_LRR Pno007629.4_SAN.3W5_TIR_NBS_LRR LOC_Os03g63240.1_Osativa_NBS_LRR Pno012383.1_SAN.3W5_NBS_LRR LOC_Os12g30760.1_Osativa_CC_NBS_LRRPno008886.1_Dcarota_CC_NBS E11/C9/D0 LOC_Os01g15580.1_Osativa_NBS_LRR Pno020939.1_Dcarota_NBS_LRR LOC_Os01g25760.1_Osativa_NBS_LRR Pno031076.1_Dcarota_TIR_NBS_LRR LOC_Os12g39620.2_Osativa_NBS_LRR Pno037129.1_SAN.3W5_TIR_NBS_LRR Pno001710.1_Dcarota_CC_NBS_LRRLOC_Os03g10910.1_Osativa_CC_NBS_LRRPno004567.3_Dcarota_CC_NBS_LRR LOC_Os08g43000.1_Osativa_CC_NBS_LRR Pno012994.1_SAN.3W5_NBS_LRR Pno005190.1_Dcarota_TIR_NBS_LRR Pno033541.2_SAN.3W5_TIR_NBS_LRRLOC_Os09g14010.1_Osativa_CC_NBS_LRRPno008799.1_Dcarota_CC_NBS LOC_Os08g43050.1_Osativa_CC_NBS_LRR LOC_Os12g25170.1_Osativa_NBS Pno027619.1_SAN.3W5_TIR_NBS LOC_Os04g39460.1_Osativa_NBS_LRR E33/C6/D0 LOC_Os01g25720.1_Osativa_NBS Pno018199.2_SAN.3W5_NBS_LRR Pno017653.1_SAN.3W5_TIR_NBS NBS Pno008834.1_Dcarota_CC_NBS LOC_Os08g42930.1_Osativa_NBS LOC_Os12g30720.1_Osativa_CC_NBS_LRR Pno008835.1_Dcarota_CC_NBS_LRR Pno025990.1_SAN.3W5_NBS_LRR LOC_Os05g31550.1_Osativa_NBS_LRRLOC_Os05g31570.1_Osativa_NBS_LRRLOC_Os05g31530.1_Osativa_NBS_LRR Pno010933.2_SAN.3W5_NBS_LRRLOC_Os03g14900.1_Osativa_NBS_LRR Pno005814.1_Dcarota_NBS Pno025161.1_Dcarota_NBS Pno015673.2_SAN.3W5_TIR_NBS_LRR LOC_Os09g14100.1_Osativa_NBS_LRRLOC_Os09g10054.1_Osativa_CC_NBS_LRR Pno024486.1_Dcarota_CC_NBS_LRRPno023914.3_Dcarota_NBSPno008833.1_Dcarota_CC_NBS_LRRPno008804.1_Dcarota_CC_NBS_LRR Pno012381.6_SAN.3W5_TIR_NBS Pno008860.1_SAN.3W5_NBS_LRRPno008770.1_Dcarota_NBSPno008798.1_Dcarota_NBS_LRRPno008889.1_Dcarota_CC_NBS_LRRPno008887.1_Dcarota_CC_NBS 75 Pno019359.1_Dcarota_NBS Pno037047.1_SAN.3W5_NBS_LRRPno007252.3_SAN.3W5_NBS_LRRPno022348.2_SAN.3W5_NBS_LRRLOC_Os04g43440.1_Osativa_NBS_LRRLOC_Os09g14060.1_Osativa_CC_NBS_LRRPno023355.2_Dcarota_CC_NBS Sl Pno011255.1_Dcarota_NBS Pno025137.1_Dcarota_NBS_LRRPno008803.1_Dcarota_CC_NBS_LRRPno008837.1_Dcarota_CC_NBS_LRRPno025158.1_Dcarota_NBSPno025176.1_Dcarota_NBS_LRRLOC_Os08g43010.1_Osativa_NBS_LRR LOC_Os03g10900.1_Osativa_NBS_LRR 158 Pno008832.1_Dcarota_CC_NBSPno025160.1_Dcarota_NBS LOC_Os11g29090.1_Osativa_CC_NBS_LRRPno008924.1_Dcarota_CC_NBSPno008884.1_Dcarota_CC_NBS_LRR Sl Pno031532.1_Dcarota_NBS_LRR LOC_Os01g72680.1_Osativa_CC_NBS_LRRPno004568.1_Dcarota_CC_NBSPno001278.1_Dcarota_NBSPno011927.1_Dcarota_NBS Pno008769.1_Dcarota_CC_NBSPno001281.1_Dcarota_NBSPno012228.1_Dcarota_NBS_LRR Pno025162.1_Dcarota_CC_NBS_LRRPno025134.1_Dcarota_NBS_LRRPno025076.1_Dcarota_NBS_LRRPno031533.2_Dcarota_CC_NBS_LRR Pno035502.1_SAN.3W5_TIR_NBS_LRR Pno038150.1_SAN.3W5_CC_NBS_LRRPno006010.1_SAN.3W5_CC_NBS_LRRPno011331.1_Dcarota_NBS Pno025119.1_Dcarota_NBS_LRR Pno022290.1_SAN.3W5_TIR_NBS_LRR Pno008888.1_Dcarota_CC_NBS_LRRPno011335.2_Dcarota_NBSPno010202.1_Dcarota_NBS Pno025136.1_Dcarota_NBS_LRR Pno033247.1_SAN.3W5_NBS_LRRLOC_Os01g57870.2_Osativa_CC_NBS_LRRPno008879.1_Dcarota_CC_NBS Pno005785.1_Dcarota_NBS_LRRPno025154.1_Dcarota_NBS Pno006206.1_Dcarota_CC_NBS_LRRPno034563.1_SAN.3W5_CC_NBS_LRRLOC_Os06g37820.1_Osativa_NBS_LRR Pno031534.1_Dcarota_NBS Pno008772.1_Dcarota_CC_NBSPno001277.1_Dcarota_NBS_LRRPno014162.1_SAN.3W5_NBS Pno031525.1_Dcarota_NBS Pno006946.4_SAN.3W5_NBS Pno013526.1_Dcarota_NBS_LRR Pno025118.1_Dcarota_CC_NBS Pno031316.1_SAN.3W5_NBSLOC_Os04g08390.1_Osativa_NBS_LRRLOC_Os04g12460.1_Osativa_NBS_LRRLOC_Os04g26350.1_Osativa_NBS_LRR Pno025577.1_SAN.3W5_NBS_LRR Pno017175.1_Dcarota_CC_NBS_LRRPno011422.1_Dcarota_NBS_LRR LOC_Os11g36760.1_Osativa_NBSPno013378.1_SAN.3W5_NBS Pno015415.1_Dcarota_NBS_LRR Pno009112.1_Dcarota_CC_NBS_LRRPno013527.1_Dcarota_NBS_LRR 73 LOC_Os12g29280.1_Osativa_NBS_LRR LOC_Os11g10120.1_Osativa_NBS 130 LOC_Os12g29290.1_Osativa_NBS_LRR Pno009401.1_SAN.3W5_NBS_LRR LOC_Os11g30050.1_Osativa_NBS_LRRLOC_Os10g07400.1_Osativa_NBS_LRRLOC_Os06g47800.1_Osativa_NBSPno032328.1_SAN.3W5_NBS_LRR Pno037961.1_SAN.3W5_NBS Pno025157.1_Dcarota_NBS Pno019575.1_SAN.3W5_NBS_LRR LOC_Os11g30060.1_Osativa_NBS_LRRLOC_Os08g12740.2_Osativa_NBS_LRR Pno001776.1_SAN.3W5_NBS LOC_Os09g30380.1_Osativa_NBSLOC_Os08g38970.1_Osativa_NBS LOC_Os11g29920.1_Osativa_CC_NBSLOC_Os11g29970.1_Osativa_NBS_LRRLOC_Os11g29990.1_Osativa_NBS_LRR Pno015201.1_Dcarota_NBS E12/C9/D6 LOC_Os11g30210.1_Osativa_NBS_LRRPno012753.2_SAN.3W5_NBS_LRR LOC_Os01g58530.1_Osativa_NBS_LRR Pno008370.1_Dcarota_NBSLOC_Os01g55530.1_Osativa_NBSLOC_Os02g46680.1_Osativa_NBS LOC_Os09g14490.1_Osativa_NBSPno025872.1_SAN.3W5_NBS_LRRPno013308.1_SAN.3W5_NBS_LRR Pno010835.1_SAN.3W5_NBSPno035972.3_SAN.3W5_NBSPno010639.1_Dcarota_NBS E25/C12/D6 LOC_Os01g33684.2_Osativa_CC_NBS_LRRPno032593.2_SAN.3W5_NBS_LRRPno010879.1_Dcarota_NBS_LRR Pno001399.1_SAN.3W5_NBS LOC_Os01g20720.1_Osativa_NBS_LRRLOC_Os11g29980.1_Osativa_NBSLOC_Os11g30110.1_Osativa_NBSPno013491.1_SAN.3W5_NBS_LRR Pno017613.1_Dcarota_NBS 52 LOC_Os03g38250.1_Osativa_NBS_LRRLOC_Os03g38330.1_Osativa_NBS_LRR MRCA Dc 72 LOC_Os04g43340.1_Osativa_CC_NBS_LRRLOC_Os07g17250.1_Osativa_NBS_LRR Pno009205.1_SAN.3W5_NBS MRCA Dc LOC_Os03g20840.1_Osativa_NBS_LRR Pno033896.2_SAN.3W5_NBS LOC_Os05g41310.1_Osativa_NBS_LRR Pno037646.1_SAN.3W5_NBSLOC_Os01g16380.1_Osativa_NBS LOC_Os09g20040.1_Osativa_NBS_LRRPno017588.1_SAN.3W5_CC_NBS_LRRPno016111.1_Dcarota_NBS_LRRLOC_Os05g41290.1_Osativa_CC_NBS_LRR LOC_Os07g02590.1_Osativa_NBSLOC_Os04g49890.1_Osativa_NBS Pno018367.1_SAN.3W5_NBS_LRRPno007914.1_Dcarota_NBS_LRR LOC_Os11g45780.1_Osativa_NBSLOC_Os11g29000.1_Osativa_NBS_LRRPno014363.1_Dcarota_NBS LOC_Os09g20020.1_Osativa_NBS_LRR Pno025394.1_SAN.3W5_NBS_LRR LOC_Os02g18140.1_Osativa_NBSLOC_Os10g36270.1_Osativa_NBS_LRRPno021696.1_SAN.3W5_NBS_LRR Pno013215.1_SAN.3W5_NBS LOC_Os02g18070.1_Osativa_NBSLOC_Os09g20030.1_Osativa_NBS_LRRPno013128.2_SAN.3W5_NBS_LRR 55 LOC_Os12g30590.1_Osativa_NBS Pno034919.1_SAN.3W5_NBS_LRR Pno020607.1_Dcarota_NBSLOC_Os01g71950.1_Osativa_NBS 65 LOC_Os04g24700.1_Osativa_CC_NBS_LRRLOC_Os02g18000.1_Osativa_NBS_LRRLOC_Os02g18080.1_Osativa_NBS_LRR LOC_Os02g20460.1_Osativa_NBS_LRRPno024806.1_SAN.3W5_NBSPno010192.1_SAN.3W5_NBS LOC_Os06g15750.1_Osativa_CC_NBS_LRRLOC_Os04g30690.1_Osativa_NBSLOC_Os04g30610.1_Osativa_NBSLOC_Os04g30660.1_Osativa_NBS LOC_Os05g12570.1_Osativa_CC_NBS_LRR LOC_Os02g10900.1_Osativa_NBS E3/C15/D6 Pno010122.1_SAN.3W5_NBSPno035773.1_SAN.3W5_NBS LOC_Os11g39590.1_Osativa_NBS LOC_Os07g33730.2_Osativa_NBS_LRR Pno030045.1_Dcarota_NBS E6/C51/D20 LOC_Os02g27540.1_Osativa_NBS_LRRLOC_Os02g27680.1_Osativa_NBSLOC_Os12g03750.1_Osativa_NBS_LRRLOC_Os03g37720.1_Osativa_NBS_LRR LOC_Os04g52690.3_Osativa_NBS LOC_Os07g33720.1_Osativa_NBS_LRRLOC_Os07g33740.1_Osativa_NBS_LRRLOC_Os07g33690.1_Osativa_NBS_LRR LOC_Os08g31870.1_Osativa_CC_NBSLOC_Os10g30580.1_Osativa_NBSLOC_Os03g05730.1_Osativa_NBSPno005859.1_SAN.3W5_NBSPno017688.1_Dcarota_NBSPno014993.1_Dcarota_NBSPno020824.1_Dcarota_NBSPno009373.1_Dcarota_NBS Pno038012.1_SAN.3W5_NBSLOC_Os06g01980.1_Osativa_NBSPno030388.1_Dcarota_CC_NBS LOC_Os11g06230.1_Osativa_NBSPno031714.1_SAN.3W5_NBS LOC_Os04g53160.1_Osativa_NBS_BED_LRRLOC_Os04g11430.1_Osativa_NBS LOC_Os08g28470.1_Osativa_NBS LOC_Os04g22090.1_Osativa_NBS_LRRLOC_Os04g53120.1_Osativa_NBS_BED_LRR Pno005998.1_SAN.3W5_NBS Pn LOC_Os11g29520.1_Osativa_NBS_LRRLOC_Os04g53050.1_Osativa_NBS_LRR LOC_Os06g02430.1_Osativa_NBSPno009845.1_SAN.3W5_NBS 45 LOC_Os02g02640.1_Osativa_CC_NBS_LRRLOC_Os04g52970.1_Osativa_NBS_LRR LOC_Os11g39020.1_Osativa_CC_NBS 52 Pn LOC_Os02g02670.1_Osativa_CC_NBSLOC_Os04g53030.1_Osativa_NBS Pno028348.1_Dcarota_NBSPno030442.1_SAN.3W5_NBSLOC_Os08g45010.1_Osativa_NBS LOC_Os06g45690.1_Osativa_NBS_LRRLOC_Os11g24170.1_Osativa_NBS Pno013210.1_SAN.3W5_NBSLOC_Os11g16500.1_Osativa_NBS Pno021671.1_SAN.3W5_NBSPno029293.1_Dcarota_NBS E8/C13/D5 LOC_Os08g20000.1_Osativa_CC_NBS_LRRLOC_Os08g19980.1_Osativa_NBS_LRR LOC_Os08g39570.1_Osativa_NBS_LRR E16/C16/D13 Pno025552.1_SAN.3W5_NBSPno018589.1_SAN.3W5_NBS LOC_Os11g15700.1_Osativa_NBSLOC_Os12g29710.1_Osativa_NBS_LRRLOC_Os12g29690.1_Osativa_NBS_LRRLOC_Os06g49380.5_Osativa_NBS_LRRLOC_Os06g49390.1_Osativa_NBS LOC_Os12g28050.1_Osativa_NBS LOC_Os06g49360.1_Osativa_NBSLOC_Os04g53496.1_Osativa_NBS_LRRLOC_Os07g04900.1_Osativa_NBS Pno030226.1_Dcarota_NBSPno005919.1_SAN.3W5_NBS LOC_Os02g16270.2_Osativa_NBS_LRRLOC_Os02g16060.1_Osativa_CC_NBS CA12g19150 LOC_Os01g02250.1_Osativa_NBS LOC_Os02g16330.1_Osativa_CC_NBSLOC_Os01g02280.1_Osativa_NBS LOC_Os02g16250.1_Osativa_NBSLOC_Os01g57340.1_Osativa_NBS_LRRLOC_Os01g57310.2_Osativa_NBS_LRRLOC_Os01g57280.1_Osativa_NBS_LRR LOC_Os06g48990.2_Osativa_NBS_LRR Dcarota029797.1 Pno007540.1_Dcarota_NBS_LRRPno017303.1_SAN.3W5_NBS_LRR Dcarota031690.2 Dcarota031692.1 LOC_Os01g57270.2_Osativa_NBS_LRRLOC_Os05g30220.1_Osativa_NBS_LRR LOC_Os11g35210.1_Osativa_NBS_LRRLOC_Os02g04530.1_Osativa_NBS_LRRLOC_Os04g02030.1_Osativa_NBS_LRRLOC_Os11g44580.1_Osativa_NBS LOC_Os04g02450.1_Osativa_NBS_LRRLOC_Os06g16790.1_Osativa_NBSLOC_Os02g30150.1_Osativa_NBS_LRR Solyc12g019640.1.1 LOC_Os04g25900.1_Osativa_CC_NBSLOC_Os04g30930.1_Osativa_NBS_LRR LOC_Os12g32710.1_Osativa_NBS_LRRLOC_Os12g32790.1_Osativa_NBS CA03g18610 LOC_Os12g32680.1_Osativa_NBSLOC_Os12g10460.1_Osativa_NBS Sotub09g028050.1.1 LOC_Os12g10340.1_Osativa_CC_NBS Dcarota001031.1 LOC_Os12g32660.2_Osativa_NBS LOC_Os12g32670.1_Osativa_CC_NBSLOC_Os12g10180.1_Osativa_NBS_LRRLOC_Os05g34220.1_Osativa_NBS_LRR

Solyc09g091670.2.1 LOC_Os12g10710.1_Osativa_NBS_LRR Solyc08g067610.2.1

Sotub12g032470.1.1 Sotub08g016110.1.1 LOC_Os12g10390.1_Osativa_NBSLOC_Os06g20050.1_Osativa_NBS_LRRLOC_Os06g41670.1_Osativa_NBSLOC_Os03g36920.1_Osativa_NBS_LRR Solyc12g100180.1.1 LOC_Os12g10400.1_Osativa_CC_NBSLOC_Os05g34230.1_Osativa_NBS_LRRLOC_Os06g41640.1_Osativa_NBS_LRRLOC_Os06g41660.1_Osativa_NBS_LRR LOC_Os11g46210.1_Osativa_NBS_LRRLOC_Os11g39320.1_Osativa_NBS_LRRLOC_Os10g04180.1_Osativa_NBS_LRRLOC_Os11g29030.1_Osativa_NBS

CA08g05620 LOC_Os12g10360.1_Osativa_NBSLOC_Os06g41480.1_Osativa_NBS_LRRLOC_Os04g02110.2_Osativa_NBS_LRR LOC_Os11g29050.1_Osativa_NBS_LRRLOC_Os11g39160.1_Osativa_NBS_LRRLOC_Os11g39260.1_Osativa_NBS Sotub12g032480.1.1 LOC_Os12g10330.1_Osativa_NBS

Pno023723.1 03730g60AC

CA12g21950 Pno000157.1 CA08g05600 Solyc12g100190.1.1 CA06g02280 LOC_Os06g06380.1_Osativa_NBS_LRRLOC_Os11g43420.1_Osativa_CC_NBS_LRR AT1G15520.1 LOC_Os06g06400.1_Osativa_NBS_LRRLOC_Os06g06390.1_Osativa_NBS_LRRLOC_Os11g42660.1_Osativa_NBS_LRR c CA03g01180 LOC_Os07g40810.1_Osativa_NBS_LRRLOC_Os11g43320.1_Osativa_NBS_LRRLOC_Os11g43410.1_Osativa_NBS_LRR LOC_Os01g16370.1_Osativa_NBS_LRR CA09g05790 CA08g05490 LOC_Os03g30910.1_Osativa_NBS_LRRLOC_Os11g43480.1_Osativa_NBSLOC_Os11g39730.1_Osativa_NBS_LRR LOC_Os01g16390.1_Osativa_NBS_LRRLOC_Os01g16400.1_Osativa_NBS_LRRLOC_Os11g43390.1_Osativa_NBSLOC_Os11g43250.1_Osativa_CC_NBS_LRR Solyc12g019620.1.1 LOC_Os04g41370.1_Osativa_NBS_LRR LOC_Os07g34750.1_Osativa_NBS_LRRLOC_Os11g37040.1_Osativa_NBS_LRR Sotub06g016450.1.1 LOC_Os11g43140.1_Osativa_CC_NBS_LRR Sotub12g020340.1.1 LOC_Os06g16450.1_Osativa_CC_NBS_LRRLOC_Os11g38440.1_Osativa_NBS_LRR CA12g15710 LOC_Os11g42070.2_Osativa_NBS_LRRLOC_Os11g42090.1_Osativa_NBS_LRR LOC_Os05g31610.1_Osativa_CC_NBS_LRRLOC_Os11g38520.1_Osativa_NBS_LRRLOC_Os11g37050.1_Osativa_NBS_LRR 07220g60AC LOC_Os01g23380.1_Osativa_NBS_LRR LOC_Os11g38580.1_Osativa_CC_NBS_LRR CA12g15430 CA12g21970 LOC_Os01g58510.1_Osativa_NBS LOC_Os08g07920.1_Osativa_CC_NBSLOC_Os06g06860.1_Osativa_NBS_LRR Pno000131.1_SAN.3W5_NBS_LRR LOC_Os02g35210.1_Osativa_NBS_LRRLOC_Os06g06850.1_Osativa_NBS_LRRLOC_Os08g07930.1_Osativa_NBS_LRR Pno006054.1 LOC_Os12g37770.1_Osativa_CC_NBS_LRRLOC_Os12g37740.1_Osativa_NBS_LRRLOC_Os11g14380.1_Osativa_CC_NBS_LRRLOC_Os07g27370.1_Osativa_NBS_LRRLOC_Os08g28540.1_Osativa_NBS_LRRLOC_Os08g28460.1_Osativa_NBS_LRR CA12g19370 LOC_Os12g37760.1_Osativa_NBS_LRRLOC_Os11g15500.1_Osativa_NBS_LRRLOC_Os11g15190.1_Osativa_NBS_LRRLOC_Os08g28570.1_Osativa_NBS_LRRLOC_Os08g07940.1_Osativa_NBS_LRRLOC_Os12g30080.1_Osativa_NBS LOC_Os08g31800.1_Osativa_NBS_LRRLOC_Os11g17330.1_Osativa_NBS_LRRLOC_Os02g17304.1_Osativa_NBS_LRRLOC_Os12g30070.1_Osativa_NBS_LRR

Pno000657.1_SAN.3W5_CC_NBS_LRR LOC_Os11g17120.1_Osativa_NBSLOC_Os08g07774.1_Osativa_NBS_LRR

CA12g15720 1.2.0 2 6 7 6 0 g 8 0 cylo S LOC_Os08g28600.1_Osativa_NBSLOC_Os08g31780.1_Osativa_CC_NBS_LRRLOC_Os08g07890.1_Osativa_NBS_LRR 1.1.0 0 1 6 1 0 g 8 0 b uto S LOC_Os11g39280.1_Osativa_NBSPno033371.1_SAN.3W5_NBS_LRR LOC_Os10g04342.1_Osativa_CC_NBSLOC_Os10g04110.1_Osativa_NBS_LRRLOC_Os11g41170.1_Osativa_NBS_LRR Sotub12g029970.1.1CA03g18600 Pno007800.1_SAN.3W5_CC_NBS_LRR LOC_Os07g01530.1_Osativa_NBS LOC_Os08g29854.1_Osativa_NBS_LRRLOC_Os11g13430.1_Osativa_CC_NBS_LRRLOC_Os11g13410.1_Osativa_NBSLOC_Os04g11780.1_Osativa_NBS_LRR LOC_Os08g32880.1_Osativa_NBS_LRR LOC_Os11g13440.1_Osativa_CC_NBS_LRRLOC_Os11g13510.1_Osativa_CC_NBS Solyc12g098210.1.1 LOC_Os01g58520.1_Osativa_NBS_LRRPno021587.1_Dcarota_NBS LOC_Os12g17410.1_Osativa_NBS_LRRLOC_Os03g46550.1_Osativa_NBS_LRRLOC_Os11g34920.1_Osativa_NBS_LRR LOC_Os11g42060.1_Osativa_NBS_LRR Pno014029.1_SAN.3W5_NBS LOC_Os11g16470.1_Osativa_NBS_LRRLOC_Os11g34880.1_Osativa_NBS_LRR LOC_Os02g19750.1_Osativa_NBS_LRR LOC_Os08g29809.1_Osativa_NBS_LRRLOC_Os11g43700.1_Osativa_NBS_LRR Pno003408.1_SAN.3W5_CC_NBS_LRR RRL_SBN_avitasO_1.87970g01sO_COL LOC_Os06g05359.1_Osativa_NBS_LRR LOC_Os11g16530.2_Osativa_NBS_LRRLOC_Os12g17480.1_Osativa_NBS_LRRLOC_Os12g17490.1_Osativa_NBS_LRR LOC_Os01g72410.1_Osativa_NBS_LRR Pno005152.1_SAN.3W5_NBS_LRR AT1G66950.1 LOC_Os04g41380.1_Osativa_NBS_LRR Pno023561.1_Dcarota_CC_NBS LOC_Os12g09240.1_Osativa_CC_NBS_LRRLOC_Os12g17420.1_Osativa_NBS_LRRLOC_Os12g33160.1_Osativa_CC_NBS_LRRLOC_Os12g36690.1_Osativa_CC_NBS_LRRLOC_Os10g03570.1_Osativa_CC_NBS LOC_Os09g34150.1_Osativa_NBS_LRRLOC_Os06g17880.1_Osativa_NBS_LRR LOC_Os12g17090.2_Osativa_NBS_LRRLOC_Os01g24820.1_Osativa_NBS_LRR Pno009113.1_Dcarota_NBS_LRR LOC_Os12g06920.1_Osativa_NBS_LRRLOC_Os12g28100.1_Osativa_NBSLOC_Os01g41890.1_Osativa_NBS CA00g14820 LOC_Os03g40250.1_Osativa_NBS_LRR LOC_Os06g03500.1_Osativa_NBS LOC_Os08g10440.1_Osativa_NBS_LRRLOC_Os12g36720.1_Osativa_CC_NBS_LRR AT2G36380.1 Pno019576.1_SAN.3W5_CC_NBS_LRR LOC_Os06g17930.1_Osativa_NBS_LRR LOC_Os12g36730.1_Osativa_CC_NBSLOC_Os11g34970.1_Osativa_NBS_LRR Pno027905.1_SAN.3W5_NBS RRL_SBN_avitasO_1.06422g60sO_COL LOC_Os02g18510.1_Osativa_NBS_LRRLOC_Os07g12680.1_Osativa_NBS LOC_Os06g17950.1_Osativa_NBS_LRR LOC_Os10g41900.1_Osativa_NBS_LRRLOC_Os12g13550.1_Osativa_NBS_LRRLOC_Os11g37790.1_Osativa_CC_NBSLOC_Os11g37880.1_Osativa_NBS_LRRLOC_Os02g19890.1_Osativa_NBS_LRR Pno004835.1_Dcarota_CC_NBS CA12g19390 Solyc11g067000.1.1 Pno010945.1_SAN.3W5_NBS _1.576910 G C C RRL_SBN_avitasO_1.47640g01sO_COL LOC_Os10g17690.1_Osativa_NBS_LRRLOC_Os11g37740.1_Osativa_NBS_LRR Pno011401.1_Dcarota_NBSPno028713.1_Dcarota_NBS LOC_Os07g09900.1_Osativa_NBS_LRRLOC_Os08g10430.1_Osativa_NBS_LRRLOC_Os01g21240.1_Osativa_NBS CA12g05580 nUrhC LOC_Os12g09710.1_Osativa_NBS_LRR LOC_Os11g47447.1_Osativa_NBS_LRR LOC_Os02g27500.1_Osativa_NBS_LRR LOC_Os08g01580.1_Osativa_NBSLOC_Os11g37870.1_Osativa_NBS_LRRLOC_Os11g37780.1_Osativa_NBS_LRR Sotub11g024810.1.1 LOC_Os09g34160.1_Osativa_NBS_LRR LOC_Os12g28070.1_Osativa_NBS_LRRLOC_Os12g28040.1_Osativa_NBSLOC_Os12g17430.1_Osativa_NBS Pno010222.1_Dcarota_NBS LOC_Os06g17900.1_Osativa_NBS_LRR LOC_Os12g31160.1_Osativa_CC_NBSLOC_Os08g10260.1_Osativa_NBSLOC_Os06g33360.1_Osativa_NBSLOC_Os11g37850.1_Osativa_NBS_LRR LOC_Os04g02120.1_Osativa_CC_NBS Pno009040.2_SAN.3W5_NBS LOC_Os06g17920.1_Osativa_NBS_LRR LOC_Os11g37759.1_Osativa_NBS_LRRLOC_Os07g19320.1_Osativa_NBS_LRRLOC_Os11g37860.1_Osativa_NBS_LRR Pno011400.2_Dcarota_NBS RRL_SBN_avitasO_1.07061g80sO_COL Solyc11g007300.1.1 RRL_SBN_avit Pno023557.3_Dcarota_NBS_LRR LOC_Os07g17420.1_Osativa_NBS LOC_Os08g07330.2_Osativa_NBS_LRRLOC_Os08g07340.1_Osativa_NBS_LRRLOC_Os11g03650.1_Osativa_NBS_LRRLOC_Os11g37774.1_Osativa_NBS_LRRLOC_Os11g32170.1_Osativa_CC_NBS_LRR LOC_Os05g50790.1_Osativa_NBS LOC_Os11g11940.1_Osativa_NBS_LRR LOC_Os11g45840.1_Osativa_NBSLOC_Os11g11920.1_Osativa_NBS_LRR Sotub11g011460.1.1 LOC_Os08g42700.1_Osativa_NBS_LRR CA12g19380 Pno023560.1_Dcarota_NBS_LRR RRL_SBN_avitasO_1.03354g11sO_COL LOC_Os08g42670.1_Osativa_CC_NBS_LRR LOC_Os11g46140.1_Osativa_CC_NBS_LRR Sotub05g026400.1.1 LOC_Os05g40160.1_Osativa_NBS SBN_avitasO_1.08553g11sO_COL LOC_Os11g27430.1_Osativa_NBS LOC_Os11g45970.1_Osativa_NBS Pno008200.1_Dcarota_NBS_LRR LOC_Os08g42710.1_Osativa_NBS_LRRLOC_Os08g13800.1_Osativa_NBS_LRR LOC_Os08g14830.1_Osativa_NBS_LRR Dcarota019012.1 LOC_Os10g22290.1_Osativa_NBS LOC_Os11g11550.2_Osativa_NBS LOC_Os05g40150.1_Osativa_NBS_LRRLOC_Os12g28250.1_Osativa_NBSLOC_Os11g32210.1_Osativa_CC_NBS_LRR RRL_SBN_CC_avitasO_1.01833g10sO_COL LOC_Os11g11790.1_Osativa_NBS LOC_Os07g26430.1_Osativa_NBS_LRR Solyc05g053590.2.1 Dcarota015421.1 Pno025205.1_SAN.3W5_NBS LOC_Os11g11810.1_Osativa_NBS_LRRLOC_Os09g07590.1_Osativa_NBS Pno013736.1_SAN.3W5_CC_NBS_LRR Pno036671.1_SAN.3W5_NBS_LRR SBN_avitasO_1.03403g60sO_COL Pno034790.1 Pno020282.2_SAN.3W5_NBS LOC_Os11g45790.1_Osativa_CC_NBS LOC_Os05g15040.1_Osativa_NBS_LRR LOC_Os11g45930.1_Osativa_NBS_LRRLOC_Os08g07950.1_Osativa_NBS_LRR Sotub05g026410.1.1 Pno016435.1_SAN.3W5_NBSPno004830.1_Dcarota_NBS_LRRPno017046.1_Dcarota_NBS_LRRPno008568.1_Dcarota_NBS_LRRPno025204.1_SAN.3W5_NBS_LRR LOC_Os11g11580.1_Osativa_NBS Pno019525.1_SAN.3W5_NBS_LRR LOC_Os12g18374.1_Osativa_CC_NBS LOC_Os12g18360.1_Osativa_NBS LOC_Os09g07580.1_Osativa_NBS_LRR Solyc11g007280.1.1CA12g05570 Pno025331.1_Dcarota_NBS_LRRPno022394.1_SAN.3W5_NBS LOC_Os10g22510.1_Osativa_NBS_LRR LOC_Os10g22484.1_Osativa_NBS LOC_Os11g27440.1_Osativa_NBS_LRR Pno012621.1_Dcarota_CC_NBS Pno004841.3_Dcarota_NBSPno004880.1_Dcarota_NBS LOC_Os11g39190.1_Osativa_NBS_LRRLOC_Os11g39310.1_Osativa_NBS_LRR LOC_Os08g05440.1_Osativa_NBS Pno010221.1_Dcarota_NBSPno016436.1_SAN.3W5_NBSPno004842.1_Dcarota_NBS a LOC_Os11g46200.1_Osativa_NBS_LRR LOC_Os11g46070.3_Osativa_CC_NBS Pno025342.1_Dcarota_NBS_LRRPno025337.1_Dcarota_CC_NBS_LRRPno025328.1_Dcarota_NBSPno004853.1_Dcarota_NBS LOC_Os12g37280.1_Osativa_NBS_LRR LOC_Os11g39330.1_Osativa_NBS_LRR Solyc05g053600.2.1 Pno004985.1_Dcarota_NBS sO_1.00412g01sO_COL Sotub11g011480.1.1 Pno015998.1_SAN.3W5_NBSPno004988.1_Dcarota_NBSPno004978.1_Dcarota_NBS LOC_Os12g31200.1_Osativa_NBS_LRRLOC_Os11g45980.1_Osativa_NBS_LRR Pno014749.1_Dcarota_NBS LOC_Os07g09940.1_Osativa_NBS LOC_Os11g46080.1_Osativa_NBS Pno004879.1_Dcarota_NBSPno004976.1_Dcarota_NBSPno009467.1_SAN.3W5_NBS LOC_Os08g14850.1_Osativa_NBS_LRR Pno013733.1_SAN.3W5_NBS Pno004845.1_Dcarota_CC_NBSPno004846.2_Dcarota_NBSPno004971.1_Dcarota_NBSPno004970.1_Dcarota_NBS Pno014736.1_Dcarota_NBS_LRR R R L_ SLOC_Os12g37290.1_Osativa_NBS B N _ C C _atoracLOC_Os08g14810.1_Osativa_NBS_LRR D Pno025329.1_Dcarota_CC_NBS_LRRPno004969.1_Dcarota_NBS Pno004872.1_Dcarota_NBS_LRR LOC_Os11g41540.1_Osativa_NBS LOC_Os11g39290.1_Osativa_NBS_LRR Solyc05g053570.2.1 LOC_Os11g46130.1_Osativa_NBS Solyc11g007290.1.1 tas O _ 2 4.A N R m .hseLOC_Os08g30660.1_Osativa_NBS n e gf.y Srh C Pno025332.1_Dcarota_NBS_LRRPno025333.1_Dcarota_NBS Pno025716.1_Dcarota_NBS_LRR Pno004982.1_Dcarota_NBSPno022573.1_Dcarota_NBS_LRRPno003582.1_Dcarota_NBS_LRRPno005833.1_Dcarota_NBS LOC_Os07g09910.1_Osativa_NBS Pno004987.1_Dcarota_NBSPno004980.1_Dcarota_NBS CA00g72550 Pno004981.1_Dcarota_NBSPno018824.1_Dcarota_NBS_LRRPno015581.1_Dcarota_NBS_LRRPno024175.1_Dcarota_NBSPno009164.1_SAN.3W5_NBS_LRRLOC_Os10g10360.1_Osativa_NBS Sotub11g011470.1.1 Pno037645.3_SAN.3W5_NBS_LRR Pno004876.1_Dcarota_NBS Pno003581.1_Dcarota_NBS_LRR CA05g15890 Pno009165.1_SAN.3W5_NBS_LRRPno021423.1_SAN.3W5_NBS_LRR Pno008982.2_SAN.3W5_NBS_LRRPno012250.1_Dcarota_NBS_LRR Pno028006.1_SAN.3W5_NBS Pno028763.2_Dcarota_NBS Pno026650.1_Dcarota_NBS LOC_Os04g12830.1_Osativa_NBS CA00g72540 Pno026649.1_Dcarota_NBSPno012251.1_Dcarota_NBS_LRRPno021422.2_SAN.3W5_NBS_LRRLOC_Os03g50150.1_Osativa_NBS_LRRLOC_Os01g70080.1_Osativa_NBS_LRR

CA05g15900 Pno029804.1_Dcarota_NBS_LRR R R L_ S B N _avitas O _1.04663g10s O _ C O L Pno008981.1_SAN.3W5_NBS_LRRPno009166.1_SAN.3W5_NBS_LRR LOC_Os04g46300.1_Osativa_NBS_LRR Pno037647.1_SAN.3W5_NBS_LRR LOC_Os12g32590.1_Osativa_NBS

R R L_ S B N _avitas O _1.43570g01s O _ C O L Pno009477.1_Dcarota_NBS L_ S B N _avitasLOC_Os02g57310.1_Osativa_CC_NBS O _62.A N RLOC_Os07g12670.1_Osativa_NBS_LRR m.hsenegf. LOC_Os09g09490.1_Osativa_NBS_LRRLOC_Os06g48520.1_Osativa_NBS_LRR LOC_Os11g17014.1_Osativa_NBS_LRR RL_SBN_avi

LOC_Os04g21890.1_Osativa_NBS_LRR R Pno026645.1_Dcarota_NBS_LRRPno030282.1_Dcarota_NBS_LRR LOC_Os07g08890.1_Osativa_NBS_LRR Sotub05g026420.1.1 Pno026647.1_Dcarota_NBS_LRR R

R Pno004975.1_Dcarota_NBS R R L_ S B N _avitas O _1.09823g80s O _ C O L Sotub05g028440.1.1 LOC_Os12g14330.1_Osativa_NBS_LRR CA00g72560 LOC_Os11g39600.1_Osativa_NBS LOC_Os12g31620.1_Osativa_NBS_LRR

Pno026646.1_Dcarota_NBS_LRR LOC_Os08g09110.1_Osativa_CC_NBS_LRR LOC_Os04g02860.1_Osativa_CC_NBS LOC_Os11g39550.1_Osativa_NBS Pno004877.1_Dcarota_NBS_LRR R R L_ S B N _avitas O _92.A N R m.hsenegf.n Urh C Solyc05g053610.2.1 Pno022256.1_Dcarota_NBS_LRR Pno028646.1_Dcarota_NBS Pno008964.1_Dcarota_NBS LOC_Os09g09750.1_Osativa_NBS_LRR LOC_Os08g19694.1_Osativa_NBS_LRR LOC_Os09g30220.1_Osativa_NBS_LRR

LOC_Os08g09430.1_Osativa_NBS_LRRLOC_Os08g16120.2_Osativa_NBS_LRR CA00g72520 LOC_Os06g37840.1_Osativa_NBS

CA05g15880 LOC_Os11g11770.1_Osativa_NBS_LRR LOC_Os03g26260.1_Osativa_NBS_LRR LOC_Os11g12320.1_Osativa_NBS_LRR

LOC_Os02g09790.1_Osativa_CC_NBS_LRR

LOC_Os11g12040.1_Osativa_NBS_LRR CA00g72530 LOC_Os07g31800.1_Osativa_NBS_LRR LOC_Os11g11960.1_Osativa_NBS_LRR CA00g21660 LOC_Os11g11950.1_Osativa_NBS_LRR LOC_Os11g12000.1_Osativa_NBS_LRR LOC_Os11g12330.1_Osativa_NBS_LRR Pno026651.1_Dcarota_NBS_LRR LOC_Os11g11990.1_Osativa_CC_NBS_LRR LOC_Os04g14220.2_Osativa_NBS_LRR

Dcarota009144.1 LOC_Os11g12050.1_Osativa_NBS_LRR Pno013216.1_SAN.3W5_TIR_NBS LOC_Os11g12340.2_Osativa_CC_NBS_LRR CA04g01140 LOC_Os11g12300.1_Osativa_NBS Solyc05g055330.2.1 LOC_Os11g38000.1_Osativa_NBS Sotub09g028040.1.1 Sotub05g028430.1.1 Dcarota000276.1

CA03g01170 LOC_Os11g42040.1_Osativa_CC_NBS_LRR Dcarota016237.1 Pno027624.1 Dcarota009141.1 Dcarota016246.1 Dcarota009142.1 Pno000515.1 ABC_Gpdr Dcarota007547.1 CA12g07800 Pno017868.1 CA02g18760 Dcarota026297.1 AT2G29940.1 Dcarota026296.1 Dcarota007372.1 Dcarota026298.1 Pno015467.1 Pno031281.1 Dcarota007545.1 CA06g18790 Dcarota007544.1 Solyc06g076930.1.1 Dcarota007543.1 Sotub06g032710.1.1 Pno000514.1 CA02g18750 Pno017869.1 Solyc02g081870.2.1 Pno008727.1 Sotub02g025180.1.1

Dcarota007546.1 Dcarota015854.1 Dcarota007282.1 Pno018622.1 CA00g80000 Pno022767.1 Pno001585.1 Pno027911.1 Dcarota017069.1 Pno038308.1

Dcarota030905.1 AT3G30842.1Pno011842.2 CA07g18980 CA01g06810 Dcarota010401.1 Dcarota016908.1CA01g29850 Sotub05g017530.1.1 Solyc05g018510.2.1AT2G26910.1 Dcarota020679.1 CA01g08790 CA06g12240 CA06g12250 CA08g05500 CA07g15670 CA09g03590

AT1G15210.1 AT1G59870.1 Sotub06g024720.1.1Pno029236.1

Solyc06g065670.2.1 Solyc03g120980.2.1 Sotub03g035980.1.1 Sotub12g022330.1.1 CA00g46320 Pno013181.2

Dcarota016190.1 Dcarota010649.1

Pno010065.1 CA11g11430Dcarota029966.1 Pno002516.1

Dcarota005727.1 Dcarota007583.1 Dcarota015172.1 Pno004247.1 Dcarota002084.1 CA12g03090

1.552300GCC Dcarota007582.1 Sotub01g038890.1.1 Solyc01g101070.2.1 CA00g29040

Pno012205.1 CA00g62180 Dcarota029967.1 Pno038386.1

AT3G53480.1 Dcarota018248.1 CA00g62170 Dcarota003482.1 Pno036103.1 Dcarota002930.1 1.811110atoracD AT2G37280.1

AT4G15215.1 Pno004379.1 Dcarota005262.1 AT4G15236.1

Dcarota023397.1

Dcarota005260.1 AT4G15230.1

1.104510 G C C Dcarota026956.1 1.595830 G C C

Dcarota011123.1Dcarota011120.1

CA06g15670 a Acetoacetyl-CoA 2. 3. 1. 9 1. 1. 1. 34 Mevalonate-5P Isopentenyl-PP

GlycolysisbioRxiv preprint doi: https://doi.org/10.1101/3620462. 3. 3. 10 ; this version posted July2. 7.6, 1.2018. 36 The copyright2. holder 7. 4. 2 for this preprint4. 1. (which 1. 33 was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Acetyl-CoA 3-Hydroxy-3-methy Mevalonate Mevalonate-5PP -lglutaryl-CoA

K15817(DDS) K00511(SQLE) K00801(FDFT1) K00801(FDFT1) DammarenediolII 4. 2. 1. 125 1. 14. 13132 2. 5. 1. 21 2. 5. 1. 21 2. 5. 1. 10

(S)-Squalene-2,3-epoxide Squalene Presqualene-PP Farnesyl-PP b 3000 Pno022472.1(SQLE) 2500 Pno009162.1(SQLE) 2000 Pno029444.1(FDFT1) d Pno034035.1(DDS) 1500 Pno005189.1(FDFT1) 1000 Pno035240.1(SQLE) 500

Log(RPKM) 150

100

50

0 R1 R2 R3 L1 L2 L3 F2 F3 α helix (Front) 40

Pno022472.1(SQLE) 35 Pno009162.1(SQLE) Pno029444.1(FDFT1) 30 Pno034035.1(DDS) Pno005189.1(FDFT1) 195A Pno035240.1(SQLE) 25 194L 20

Relative Quantities 15

10

5

0 R1 R2 R3 L1 L2 L3 F2 F3 c α helix (Front) aa:194-196 α helix (Back) a d

Lg(RPKM) Relative Quantities Solyc07g064450.2.1 −4 −2 0 2 4 6 8 10 12 14

0 2 4 6 8 Achn148311 bioRxiv preprint

Achn303911

Achn333001 Achn130041 1R 3L 2L 2F3 F2 L3 L2 L1 R3 R2 R1

R1 PGSC0003DMT400038240 PGSC0003DMT400038256 R2

PGSC0003DMT400008432 doi:

R3 certified bypeerreview)istheauthor/funder.Allrightsreserved.Noreuseallowedwithoutpermission. PGSC0003DMT400056719 L1 https://doi.org/10.1101/362046 L2 Pno038964.1 L3

PGSC0003DMT400093941 F2 Pno002960.1 Solyc07g042880.1.1PGSC0003DMT400035313 Solyc05g021390.2.1 F3 PGSC0003DMT400054457CYP716A53v2 PGSC0003DMT400019561 Achn329931 Pno005525.1 Pno017963.1 Pno032385.1 Pno011050.1 Pno008576.1 Pno011440.1 Pno020138.2 Pno006076.1 Pno008188.1 Pno001661.1 Pno037530.1 Pno033513.1 Pno022529.1 Pno027966.1 Pno009840.2 Pno031133.1 Pno012725.1 Pno011760.1 Pno002960.1 Pno012347.1 Pno012456.2 Pno033997.1 Pno020287.1 Pno008582.1 Pno035711.1 CYP716 Achn023471 Solyc02g069600.2.1 Achn329911

Achn368931 Achn038071 Pno011760.1

; Dc020454.1

this versionpostedJuly6,2018. Achn223051 Achn329921

Pno021283 Pno011760 Pno002960 Pno012347 Pno009048.1 Achn213111 CYP716 CYP716 CYP716

CYP71 CYP71 CYP- CYP71 CYP76 CYP96 CYP72 CYP83 CYP72 CYP51 CYP86 CYP86 CYP94 CYP82 CYP78 CYP78 CYP- CYP77 CYP51 CYP- CYP81 CYP87 AT5G36110.1

Achn201691 Achn313551 Achn201701 AT5G36140.1

PGSC0003DMT400067171

Lg(RPKM) Achn115321

−2 Achn363971 Solyc06g065420.1.1

0 2 4 6 8 CYP716A47

Pno021283.1 Pno011353.1 Pno007340.1

Pno012347.1 Pno032234.1

PGSC0003DMT400067089

Solyc06g065430.2.1

Dc032270.1

PPTS>PPDS

PPDS>PPTS Dc032272.1

R1 Dc014537.1 R2 Dc024338.1 The copyrightholderforthispreprint(whichwasnot R3 L1 L2 L3

F2 c F3 Pno021283.1 Pno002917.1 Pno020120.1 Pno008682.1 Pno020413.1 Pno017577.2 Pno006032.1 Pno015535.1 Pno008683.1 Pno000020.2 Pno037322.1 Pno019492.1 Pno027184.1 Pno023041.1 Pno014418.1 Pno002334.1 Pno021801.1 Pno021401.1 Pno005453.1 Pno032082.1 Pno038184.1 Pno028322.1 Pno012265.1 Pno011544.1 b Protopanaxatriol (PPT) Protopanaxadiol (PPD) 0 20 Dammarenediol-II −5 CYP-- CYP716 CYP71 CYP77 CYP715 CYP72 CYP79 CYP71 CYP707 CYP71 CYP71 CYP707 CYP71 CYP718 CYP94 CYP98 CYP90 CYP82 CYP71 CYP76 CYP71 CYP82 CYP77 CYP87 Value CYP716A53v2 CYP716A47 5 0 a b

bioRxiv preprint doi: https://doi.org/10.1101/362046; this version posted July 6, 2018. The copyright holder for this preprint (which was not UGTPg100 UGTPg1certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. UGTPg101 UGTPg101 Protopanaxadiol (PPD)

Protopanaxatriol (PPT) ginsenoside F1 ginsenoside Rg1 UGT71A27 UGT74AE2 c 8

6

4

2 ginsenoside Rh2 Compound K

0 0 12 −5 0 5 −2 Value

−4 UGT94Q2 Pno017659.1 UGT-- Pno026280 UGT74AE2 Pno026280.1 UGT71A Pno0027722 Pno033035.1 UGT74 10 Pno036629.1 UGT72B Pno020280 Pno007320.1 UGT-- 9 Pno020571.1 UGT-- Pno010455.1 UGT73 8 Pno018210.1 UGT-- Pno027722.1 UGT71A 7 Pno008958.1 UGT74 Pno015224.1 UGT-- 6 ginsenoside Rg3 ginsenoside F2 Pno012461.1 UGT71 Pno014832.1 UGT72E 5 Pno014565.1 UGT-- Pno016510.1 UGT85A 4 Pno018314.1 UGT85A 3 Pno003277.1 UGT-- Relative Quantities Pno003692.1 UGT-- UGT94Q2 Pno003883.1 UGT-- 2 Pno020280.1 UGT71A 1 Pno033219.1 UGT-- Pno013811.1 UGT-- 0 Pno024024.1 UGT85A R1 R2 R3 L1 L2 L3 F2 F3 L1 L2 L3 F2 F3 R1 R2 R3 d 6

4 ginsenoside Rd

2

0 0 8 −2 −5 0 5 Value Pno013844.1 UGT71A Pno033394 Pno019712.1 UGT75 Pno013844 Pno019168.1 UGT-- 12 Pno031515 Pno002810 Pno020304.3 UGT85A Pno00732 Pno013982.1 UGT80 10 Pno026117.1 UGT-- Pno034813.1 UGT73B 8 Pno003567.1 UGT-- Pno024781.1 UGT-- 6 Pno031515.1 UGT74 Pno033394.1 UGT74 4 Pno002810.1 UGT74

Relative Quantities Pno000722.1 UGT74 2 Pno036328.1 UGT85 Pno018315.1 UGT85A 0 R1 R2 R3 L1 L2 L3 F2 F3 L1 L2 L3 F2 F3 R3 R1 R2