This document is downloaded from DR‑NTU (https://dr.ntu.edu.sg) Nanyang Technological University, Singapore.

β‑Ginkgotides: Hyperdisulfide‑constrained peptides from biloba

Wong, Ka Ho; Tan, Wei Liang; Xiao, Tianshu; Tam, James P.

2017

Wong, K. H., Tan, W. L., Xiao, T., & Tam, J. P. (2017). β‑Ginkgotides: Hyperdisulfide‑constrained peptides from . Scientific Reports, 7, 6140‑. https://hdl.handle.net/10356/86680 https://doi.org/10.1038/s41598‑017‑06598‑x

© 2017 The Author(s) (Nature Publishing Group). This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Downloaded on 07 Oct 2021 19:42:21 SGT www.nature.com/scientificreports

OPEN β-Ginkgotides: Hyperdisulfde- constrained peptides from Ginkgo biloba Received: 10 May 2017 Ka H. Wong, Wei Liang Tan, Tianshu Xiao & James P. Tam Accepted: 14 June 2017 Hyperdisulfde-constrained peptides are distinguished by their high stability and diverse functions. Published: xx xx xxxx Thus far, these peptides have been reported from animals only but their occurrence in are rare. Here, we report the discovery, synthesis and characterization of a hyperdisulfde-constrained peptides family of approximately 2 kDa, β-ginkgotides (β-gB1 and β-gB2) from Ginkgo biloba. Proteomic analysis showed β-ginkgotides contain 18‒20 amino acids, of which 16 residues form a conserved six-cysteine core with a highly clustered cysteine spacing of C‒CC‒C‒CC, an arrangement that has not been reported in cysteine-rich peptides. Disulfde mapping revealed a novel disulfde connectivity of CysI‒IV, CysII‒ VI and CysIII‒V. Oxidative folding of synthetic β-gB1 to the native form was obtained in 70% yield. The synthetic β-gB1 displays a compact structure with no regular secondary structural elements, as determined by NMR spectroscopy. Transcriptomic analysis showed precursor βgb1 has a four-domain architecture and revealed an additional 76 β-ginkgotide-like peptides in 59 diferent , but none in angiosperms. Phylogenetic clustering analysis demonstrated β-ginkgotides belong to a new cysteine-rich peptide family. β-Ginkgotide is resistant to thermal, chemical and proteolytic degradation. Together, β-ginkgotides represent the frst-in-class hyperdisulfde-constrained peptide family from plants with a novel scafold that could be useful for engineering metabolically stable peptidyl therapeutics.

Plants have evolved complex and efective defense mechanisms that release pathogenesis-related biomolecules against pathogen attacks1. Tese chemicals range from small-molecule metabolites to large biomolecules and are natural products that have been essential for drug discovery. Among the pathogenesis-related biomolecules, cysteine-rich peptides (CRPs), which fall in the chemical space of 2–6 kDa, are putative active compounds in medicinal herbs. A common characteristic of such plant CRPs is their stable structures, which ofen contain three to fve intramolecular disulfdes as cross-braces to render them exceptionally resistant to thermal and proteolytic degradation. In addition, the CRPs in this chemical space are promising therapeutic candidates because they pos- sess a large footprint that could increase their on-target specifcity and minimize their of-target side reactions2. Tese exceptional features, coupled with their underrepresentation as active compounds in herbal medicine, have prompted our laboratory to develop an herbalomics program to discover the CRPs in medicinal herbs that can be potential drug candidates1, 3–6. Ginkgo biloba, commonly known as the maidenhair tree or YinXing in Chinese, is a deciduous native to Southwestern China. Te G. biloba nut is used as a traditional Chinese medicine for relieving symp- toms of asthma, cough, vaginal discharge and urination discomfort7. Apart from their medicinal use, G. biloba nuts are commonly used as a functional food in Asian cuisines. Major phytochemicals in G. biloba nuts include alkyl phenols, cyanophoric glycosides, favonoids and terpene lactones8, 9. Lipid extracts of G. biloba nuts were shown to reduce apolipoprotein B secretion and enhance low-density lipoprotein receptor expression in human hepatoma cells10. Furthermore, oral supplementation of G. biloba nuts for four weeks has been shown to reduce hepatic cholesterol and triacylglyceride level in mice10. Nonetheless, limited information is available regarding the occurrence and therapeutic potential of CRPs in G. biloba nuts, an area that is of interest to our laboratory. Recently, we have characterized eleven proline-rich 8C-hevein-like peptides, ginkgotides from G. biloba leaves. Te leaf-derived ginkgotides display potent anti-fungal activity against common phyto-pathogenic strains5. CRPs are grouped into families based on their cysteine motifs and disulfde connectivities. Based on their disulfde connectivities, the CRPs can be broadly classifed into two types: cystine knots which are exemplifed by

School of Biological Sciences, Nanyang Technological University, Singapore, 637551, Singapore. Correspondence and requests for materials should be addressed to J.P.T. (email: [email protected])

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 1 www.nature.com/scientificreports/

Figure 1. Animal hyperdisulfde-constrained peptides. (A) M-1 conotoxin mr3e from the marine cone snail Conus marmoreus (PDB: 2EFZ), (B) θ-defensins RTD-1 from Rhesus macaque (PDB: 2LYF) and (C) minicollagen-1 from the jellyfsh Hydra sp. (PDB: 1SOP).

knottins, and symmetrics which are represented by thionins1. Recently, our laboratory discovered two new fami- lies of CRPs with novel disulfde connectivity, the jasmintides11 and the lybatides. Te jasmintides, which contain 27 amino acids with a molecular weight of 3.1 kDa, are six-cysteine CRPs (6C-CRPs) from Jasminum sambac with a disulfde connectivity of CysI‒V, CysII‒IV and CysIII‒VI11. Lybatides, which are 33 amino acids in length, are eight-cysteine CRPs from the Lycium barbarum root cortex with a disulfde connectivity of CysI‒VI, CysII‒VIII, CysIII‒VII and CysIV‒V. Tus far, the knottins, with a disulfde connectivity of CysI‒IV, CysII‒V and CysIII‒VI, are the most frequently encountered 6C-CRPs in plants. In addition, the structure of CRPs becoming increasingly constrained with decreasing molecular weight. CRPs become hyperdisulfde-constrained when their cysteine residues account for >30% of total amino acids within the cystine core. Tus far, hyper-disulfde-constrained CRPs had been found in animal kingdom and their occurrence in planta have not been reported. Figure 1 exem- plifes the animal hyperdisulfde-constrained peptides, including M-1 conotoxin mr3e from the marine cone snail Conus marmoreus12, θ-defensins RTD-1 from Rhesus macaque13 and minicollagen-1 from the jellyfsh Hydra sp.14. Herein, we report the discovery, characterization and synthesis of a new family of hyperdisulfde-constrained peptides ginkgotides, β-ginkgotides β-gB1 and β-gB2, from the nuts of G. biloba. Te nut-derived ginkgotides and leaf-derived, proline-rich 8C-hevein-like peptide family of ginkgotides are renamed the α-ginkgotides based on the order of their discovery5. Te β-ginkgotides difer from the α-ginkgotides in length, sequence composition, cysteine spacing and disulfde connectivity. Strikingly, the β-ginkgotides are small, 18‒20 amino acids in length, and hyperdisulfde-constrained, with six cysteine residues arranged in a novel CC‒C‒CC‒C motif. A structure determination by NMR showed that β-gB1 displays a novel disulfde connectivity and a compact structure with no helical or beta strand secondary structural elements. Transcriptomic data mining found an additional 76 β-ginkgotide-like peptides in gymnosperms but not in angiosperms. A neighbor-joining clustering analysis revealed that β-ginkgotides represent a new family of CRPs that are only distributed in gymnosperms. Taken together, the discovery of β-ginkgotides expands the occurrence and distribution of CRPs in planta and provides insight into the molecular diversity of CRPs in modern gymnosperms and angiosperms. Results and Discussion β-Ginkgotide: Identifcation and Purifcation. A mass-spectrometry-driven approach was employed to identify putative CRPs that are present in medicinal plants. G. biloba nuts (5 g) were extracted with boiling water (50 mL), and the extraction was fractionated by solid-phase extraction using a C18 column and profled by MALDI-TOF MS in the mass range between 2 and 6 kDa. Figure 2A shows a mass spectrum with peaks between 2.0 and 2.8 kDa. A compound with a molecular mass [M + H]+ of 2375.19 Da was isolated and designated β-gink- gotide β-gB1. In a scaled up purifcation using 2 kg of G. biloba nuts, β-gB1 and an additional β-ginkgotide β-gB2 with a molecular mass [M + H]+ of 2083.04 Da were isolated using a series of chromatography steps. Te extrac- tion yields of β-gB1 and β-gB2 were approximately 3 and 0.1 mg, respectively, per kg of dried G. biloba nuts. To determine the number of cysteine residues that are present in the β-ginkgotides, a mass shif experiment was performed. β-Ginkgotide β-gB1 and β-gB2 were reduced by dithiothreitol and subsequently alkylated by iodoacetamide. Te disulfde content of β-ginkgotide was determined by comparing the mass diference before and afer the S-alkylation using MALDI-TOF MS. Te β-ginkgotide displayed a mass shif of 348 Da, and each S-alkylated Cys causes a mass increase of 58 Da, suggesting that these peptides contained six Cys residues (Figure S1, Supporting information).

Primary Sequence Determination. De novo sequencing was performed to determine the primary sequence of the S-alkylated β-gB1 using MALDI-TOF/TOF MS/MS. Te amino acid assignment was based on the b- and y-ions detected during the tandem MS fragmentation (Fig. 2B). Te MS/MS profle of β-gB1 revealed a putative sequence of YETGCXRCCYXDEYGCXRCC, where X represents the isobaric residues Ile/Leu or Lys/Gln. Teir assignment was confrmed with the cDNA sequence from OneKP (accession number: SGTW- 2035618) to give the complete sequence of β-gB1 as YETGCKRCCYLDEYGCIRCC. Furthermore, the predicted molecular mass (2373.91 Da) was in agreement with the experimental molecular mass of β-gB1 (2374.19 Da) that was determined from the MS spectrum. Te amino acid sequence of β-gB2 was determined in the same manner. β-Ginkgotide β-gB2 is a truncated β-gB1 with two amino acids (Tyr1 and Glu2) omitted at the N-terminus. Te β-ginkgotide β-gB1 was chosen as the representative for further characterization because of its high abundance.

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 2 www.nature.com/scientificreports/

Figure 2. Mass spectra of Ginkgo biloba nut. (A) MALDI-TOF MS profle of the G. biloba nuts in a mass range of 2000−2800 Da. G. biloba nuts (5 g) were extracted with boiling water (50 mL) and fractionated by solid- phase extraction C18 column. (B) MALDI-TOF/TOF MS/MS profle of the G. biloba nuts in a mass range of 10–2500 Da. Te purifed β-ginkgotide β-gB1 was reduced by 20 mM dithiothreitol and alkylated with 200 mM iodoacetamide. Assignment of isobaric amino acids such as Leu/Ile were confrmed by the transcriptome data.

Disulfide Connectivity. The disulfide connectivity was determined by partially reducing β-gB1 with tris(2-carboxyethyl)phosphine to generate intermediates with one or two disulfde bonds remaining intact. Tese intermediates were immediately alkylated by N-ethylmaleimide (NEM) under acidic conditions (pH 3.5) to prevent disulfde bond scrambling and were subsequently purifed by RP-HPLC. Te number of NEM-labeled cysteine (S-NEM) residues were determined by MALDI-TOF MS (Fig. 3A). Each NEM-labeled cysteine causes a mass increase of 126.15 Da; intermediates that displayed shifs of 252.30 and 504.60 Da thus had one and two reduced disulfde bonds, respectively. Figure 3B shows four fractions that were collected from the HPLC, includ- ing fraction 1, which had six S-NEM and no intact disulfde bonds (0S-S); fraction 2, which had four S-NEM and one intact disulfde bond (1S-S); fraction 3, which had native β-gB1 with three intact disulfde bonds (3S-S); and fraction 4, which had two S-NEM and two disulfde bonds (2S-S). Fractions 2 and 4 were then fully reduced by dithiothreitol and alkylated with iodoacetamide. Te positions of the NEM- and acetamido-labeled cysteines (S-AA) were determined by tandem MS fragmentation and employed to deduce the peptide’s disulfde connectiv- ity. Figure 3C–F summarize our disulfde mapping results. Te 2S-S intermediate revealed the CysIII‒V disulfde linkage, and the 1S-S intermediate revealed both CysII‒VI and CysIII‒V. Te third disulfde linkage, from CysI‒ IV, was obtained by deduction. Currently, the disulfde connectivity of plant CRPs with six cysteine residues can be arbitrarily classifed top- ologically into three families. Tey are (1) cystine knot peptides (CysI‒IV, CysII‒V and CysIII‒VI)15, (2) sym- metrics (CysI‒VI, CysII‒V and CysIII‒IV)16 and (3) jasmintides (CysI‒V, CysII‒IV and CysIII‒VI)11. Cystine knot peptides, including cystine knot α-amylase inhibitors (CKAIs), 6C-hevein-like peptides, carboxypeptidase inhibitors and cyclotides, represent the majority of known disulfde connectivities of plant CRPs by far17. Tese peptides share a similar topology of three disulfde bonds; one disulfde bond penetrates through a macrocycle formed by the other two disulfde bonds, which interlock the peptide backbone to form a knotted structure15. Te symmetric family of disulfde connectivities is found in thionin and thionin-like peptides, which also contain a symmetric cysteine motif. Te symmetry disulfde pattern in thionins consists of the frst cysteine connecting to the last one, the second cysteine to the second to last one, and so on1. Te third family, the jasmintides, is a varia- tion of the symmetrics, but the presence of consecutive cysteine residues in a CC motif at their C-terminus alters the symmetric disulfde connectivity. In the jasmintides, the disulfde bonds CysII‒IV and CysIII‒VI cross-brace with each other at the C-terminus, while the disulfde bond CysI‒V brings the C-terminus and N-terminus into close proximity11. Finally, the disulfde connectivity of CysI‒IV, CysII‒VI and CysIII‒V that is found in β-gB1 represents the fourth family of disulfde connectivities that has not been reported in any plant CRP families. Te reductive unfolding order refects the exposure of the disulfde bonds to the solvent and reducing rea- gents. Te solvent-exposed outer disulfde bonds are more labile to reducing agents than those buried inside the

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 3 www.nature.com/scientificreports/

Figure 3. Disulfde mapping of β-ginkgotide β-gB1. (A) MALDI-TOF MS profle of partially alkylated β-gB1. Te purifed β-gB1 was partly S-reduced by tris(2-carboxyethyl)phosphine for 17 min and S-alkylated by N-ethylmaleimide (NEM) for 1 hr. (B) Chromatogram of partially alkylated β-gB1. Te reaction mixture was separated by HPLC and four major fractions were collected. Tese fractions were profled by MALDI-TOF MS to determine the number of disulfde bonds present. (C) MALDI-TOF MS profle of fraction 4 with two-N- ethylmaleimide alkylated intermediate and (D) fully reduced species. (E) MALDI-TOF MS profle of fraction 2 with four-N-ethylmaleimide alkylated intermediate and (F) fully reduced species. Te two- and four-N- ethylmaleimide alkylated intermediate species were further reduced with dithiothreitol and alkylated with iodoacetamide. Te fully alkylated β-ginkgotide was desalted and sequenced by MALDI-TOF/TOF MS/MS.

cystine core. Interestingly, the unfolding order in β-gB1 was CysIII‒V and CysII‒VI, followed by CysI‒IV, which is diferent from that of allotide Ac2, a CKAI from Allamanda cathartica leaves, and cliotide T24, a cyclotide from Clitoria ternatea fowers and leaves; in these cases, the most buried disulfde bond was the CysII‒V and the CysIII‒VI, respectively3. Te solvent-exposed outer disulfde bonds in β-gB1 (CysIII‒V and CysII‒VI) are labile to reducing agents compared with the CysI‒IV bond, which is buried inside the core.

Oxidative Folding of β-gB1. To determine the oxidative folding conditions, the reduced form of synthetic β-gB1 was obtained by solid-phase peptide synthesis and purifed by reversed-phase HPLC. Synthetic β-gB1 was subjected to oxidative folding in 13 conditions with diferent concentrations of co-solvent, redox reagents and duration, as shown in Table S1, Supporting information. All experiments were carried out in ammonium bicar- bonate bufer at a pH of 8.518. For the redox reagents, the use of the standard redox pair of reduced and oxidized glutathione (4:400 mM) produced a higher yield of 38.8% (Run 2) than the use of the less expensive redox pair of cysteineamine and cystamine (4:400 mM), which gave a yield of 28.9% (Run 1). Adding dimethyl sulfoxide (10% v/v) to the folding solution increased the oxidative folding yield from 38.8 (Run 3) to 55.3% (Run 4). Tis result agreed with our previous experience in the oxidative folding of cyclotides in which dimethyl sulfoxide served as a mild oxidizer and a strong aprotic polar solvent that minimized peptide aggregation18, 19. However, the use of isopropanol (20% v/v) as an organic co-solvent to enhance the conformational stability to aid folding was unsuccessful. Te folding yield was reduced to 17.4% (Runs 5 and 6), suggesting that the addition of organic solvent is unnecessary. To evaluate the efects of the ratios of redox reagents on the folding yield, three diferent concentrations of reduced and oxidized glutathione (4:400, 4:800 and 2:400 mM) were used. An increase of the reduced-to-oxidized glutathione ratio increased the folding yield (Runs 9 and 13), while the highly reducing con- ditions that were present at a ratio of 2:400 mM produced the highest yield, 73.4% (Run 13). Te oxidative folding apparently became efcient once the correct mix of redox reagents were determined, as the incubation time (2, 6 and 24 hr) had little efect; prolonging the folding reaction beyond 2 hr did not show signifcant improvement in yield. Together, the optimized folding conditions for β-gB1 are as follows: oxidative folding was initiated at room temperature by adding 0.1 mg of synthetic, reduced β-gB1 to 1 mL of ammonium bicarbonate bufer (0.1 M at pH 8.5) that contained 2 mM reduced glutathione, 400 μM oxidized glutathione and 10% (v/v) dimethyl sulfoxide, which was then incubated for 2 hr. Te folding yield of the synthetic β-gB1 under such optimized conditions was 76.7%. Te synthetic β-gB1 co-eluted with the native β-gB1 (Figure S2, Supporting information), suggesting that they folded similarly. In addition, NMR structural characterization was used to confrm the disulfde connectivity of synthetic β-gB1.

NMR Structure. To evaluate whether the synthetic β-gB1 has the same peptide fold as the native protein, their 2D NOESY spectra were measured. As shown in Fig. 4A, most of the NOE cross peaks from both the

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 4 www.nature.com/scientificreports/

Figure 4. NMR spectra and structures of β-gB1 (PDB: 5XIV). (A) Sequence and disulfde connectivity of β-gB1. (B) Te 20 lowest energies ensembles of β-gB1. (C) Overlapped 2D NOESY spectra of native (green) and synthetic β-gB1 (red) displayed by Sparky 3.115. (D) Te ribbon representation of the synthetic β-gB1 structure. Te disulfde bonds are formed between CysI–CysIV, CysII–CysVI, CysIII–CysV. (E) Surface topology of the synthetic bL1 structure. Te distribution of electrostatic charges was illustrated in red (negatively charged), blue (positively charged) and yellow (hydrophobic). Te positively charged surface (K6, R7 and R18) was opposed to the negatively charged surface (E2, D12 and E13).

synthetic and native forms (highlighted in green and red, respectively) overlapped, suggesting that the synthetic β-gB1 shared the same peptide fold as the native β-ginkgotide. Subsequently, the synthetic β-gB1 was employed for NMR characterization, which was performed using a combination of 2D 1H-1H-TOCSY and 1H-1H-NOESY (Tables S2 and S3, Supporting information). An unambiguous assignment was achieved based on the NOE cross peaks between the amide protons (HNi) and the side chain protons from the previous residue (Hαi-1, Hβi-1 and HΥi-1 if any), as well as the patterns of the side chain peaks in the TOCSY spectrum. Te NMR structure of β-gB1 was determined from a total of 206 NMR-derived distance restraints and 9 dihedral angle restraints. Te ensem- ble of the 20 lowest-energy structures had a root-mean-square deviation of 0.74 ± 0.30 Å and 1.77 ± 0.44 Å for the backbone and heavy atoms, respectively. Figure 4B shows the backbone traces of 20 ensembles with the lowest energies. Te spin-spin systems of β-gB1 were identifed and all proton resonances were unambiguously assigned. Figure S3 (Supporting information) shows the complete amide proton assignment of the NOESY spectrum. A

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 5 www.nature.com/scientificreports/

PROCHECK analysis indicated that all residues were distributed in the allowed region of the Ramachandran map. Overlapped NOESY spectra of the native and synthetic β-gB1 showed fve slightly shifed amide proton peaks (Fig. 4C), corresponding to residues D12, E13, Y14, G15 and I17, all of which are located in close proximity and in an acidic face of the molecule (Fig. 4E). Te peak shif could be due to the interaction between these resi- dues and impurities, most likely cationic secondary metabolites derived from the plant sample during our puri- fcation process. As shown in Fig. 4D, the structure of synthetic β-gB1 consisted of three turns with no defned secondary structure. To determine the disulfde connectivity of the synthetic β-gB1, the averaged overall energies of 15 diferent disulfde bond combinations were assessed by CNSsolve 1.3 and are summarized in Table S4, Supporting infor- mation. Figure S4 (Supporting information) shows the NOESY spectrum with NOEs that confrm the assigned pattern of disulfde bonds Te diferent disulfde bond patterns were assumed for structure calculations in which the same distance and dihedral angle restraints were uploaded for simulated annealing. In each combination, 20 structures with the lowest overall energies were selected. Te energies of the structures of the 15 disulfde combinations illustrated that pattern 1 had a lower energy (357.55 ± 5.56 kcal/mol) than all the other disulfde bond patterns, which displayed energies ranging from 466.59 ± 10.69 to 1004.41 ± 62.54 kcal/mol (Table S4, Supporting information). In comparison, the structural energies for the disulfde connectivity of the cystine knot peptide (pattern 2), thionin (pattern 15) and jasmintide (pattern 9) were 727.12 ± 178.71, 466.59 ± 10.69 and 471.87 ± 17.20 kcal/mol, respectively, suggesting that pattern 1 is the most stable disulfde arrangement and signifcantly diferent from other reported CRP families. As illustrated in Fig. 4D, the disulfde bond between Cys5 and Cys16 connects the N-terminus to the-C terminus. Te disulfde bonds Cys8‒Cys20 and Cys9‒Cys19 bring the C-terminus into proximity with the cystine core. Tese results suggested that β-gB1 possesses a novel disulfde connectivity of CysI‒IV, CysII‒VI, and CysIII‒V, which agrees with the disulfde connectivity that was determined by chemical means using the aforementioned partial alkylation. Te three-dimensional structure revealed that β-gB1 is bipolar and dumbbell-shaped, with a positive and a negative patch located at opposite ends, connected by a hydrophobic bar. We speculate that this arrangement may facilitate the interaction of β-ginkgotides with charged glycoproteins. Tis hypothesis was supported by our preliminary experiments, which shows that β-ginkgotide binds to heparin beads (data not shown). In addition, our recent studies suggest that such hyperdisulfde-constrained peptides are cell permeable and could interact with intercellular proteins20, 21. Figure 4E shows the surface topology of β-gB1; positive, negative and hydrophobic residues were highlighted in blue, red and yellow, respectively. Te positively charged residues (Lys6, Arg7 and Arg18) were located on the opposite surface of the negatively charged (Glu2, Asp12 and Glu13) residues, while the hydrophobic residues are concentrated in the core of the molecule. Te pairwise structural alignment algorithm TM-align was used to determine whether β-gB1 folded similarly to members of the other CRP families, and its results were summarized in Table S5, Supporting information. A TM-score <0.3 means that the two candidates have random structural similarity, while a value >0.5 means that they shared the same protein fold. β-Ginkgotide β-gB1 has an average TM-score value of 0.14, 0.14, 0.15, 0.16 and 0.18 compared to the thionins, carboxypeptidase inhibitors, CKAIs, 6C-hevein-like peptides and jasmintides, respectively. Together, these results suggested that β-gB1 has a novel protein fold that is signifcantly diferent from those of the other reported CRP families, mostly due to the pres- ence of an unusual cysteine spacing and the unique disulfde connectivity.

Comparison of Cysteine Frameworks in 6C-CRP Families. Te term “cysteine framework” refers to the cysteine spacing and disulfde connectivity in CRPs. Cysteine spacing defnes the positions of the cysteines within a peptide and the number of amino acids between cysteine residues, whereas the disulfde connectivity defnes how the cysteine residues connect in a three-dimensional space. Figure 5 compares diferent schematic disulfde-constrained frameworks of the β-ginkgotides and the 6C-CRPs, CRPs with six cysteine residues. Te cysteine framework of β-ginkgotides impacts their structure in two ways. First, it alters their loop size. Te amino acids between successive cysteine residues divide the peptide backbone into segments, commonly referred as loops. Cystine knot peptides comprised the majority of CRPs have a cysteine spacing of either C‒C‒CC‒C‒C, found in 6C-hevein-like peptides and CKAIs as 4-loop CRPs, or C‒C‒C‒C‒C‒C, found in carboxypeptidase inhibitors as 5-loop CRPs. Cyclotides which are head-to-tail cyclized knottins with a C‒C‒C‒C‒C‒C cysteine spacing are 6-loop CRPs. Compared to these four to six looped cystine-knot peptides, β-ginkgotides, with a cysteine spacing of C‒CC‒C‒CC, only have three loops because of the presence of two CC motifs (two adjacent cysteine residues). Te presence of a CC motif can be found in various CRPs such as CKAIs, thionins and jas- mintides; however, the novel cysteine spacing in β-gB1 has not yet been observed in other known plant CRPs. For example, the CC motif in 6C-hevein-like peptides and CKAIs is located at CysIII and CysIV, whereas in thionins, the CC motif is located near the N-terminus, at CysI and CysII. Interestingly, the cysteine spacing of jasmintides is a mirror image of that of thionins, whose CC motif is located near the C-terminus, at CysV and CysVI. Since each family CRPs share a common structural fold, the size and sequence variations of the displaying loops play an important role in diversifying their biological functions. Second, the cysteine framework in β-ginkgotides impacts their cystine-core constraint. β-Ginkgotides have a relatively small intercysteinyl distance, with 16 amino acid residues between the frst and sixth cysteine residue and an average cysteine content of 37.5% (number of cysteines per cysteine core). Tat translates to more than one out of every three residues that are located in the cysteine core (starting from the frst to the last cysteine) being a cysteine. In comparison, the number of intercysteinyl residues in carboxypeptidase inhibitors, 6C-hevein-like peptides, CKAIs, cyclotides, thionins and jasmintides are 27, 30, 25, 22–26, 35–38 and 22 residues, with an aver- age cysteine content of 22.2, 24.0, 20.7, 25.2, 16.5 and 27.3%, respectively. Tus, the 37.5% average cysteine con- tent in β-ginkgotides, compared with the known range of 16.5–27.3% of the other 6C-CRP families, suggests that the β-ginkgotides are hyperdisulfde-constrained.

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 6 www.nature.com/scientificreports/

Figure 5. Cysteine framework comparison of plant CRPs with six cysteine residues. β-Ginkgotide β-gB1 displayed a novel cysteine spacing of C–CC–C–CC, with the presence of two CC motif forming three intercysteinyl loops as compared to other CRPs consisting four to six intercystinyl loops. β-Ginkgotide β-gB1displayed a novel disulfde connectivity of CysI–CysIV, CysII–CysVI, CysIII–CysV, which difered from other known CRPs such as cystine knot peptides (CysI‒CysIV, CysII‒CysV and CysIII‒CysVI), thionins (CysI‒CysVI, CysII‒CysV and CysIV‒CysVI) and jasmintides (CysI‒CysV, CysII‒CysIV and CysIII‒CysVI). Intercysteinyl cysteine content refers to the percentage of cysteine present within the cystine core (between the frst and last cysteine), in which β-ginkgotide β-gB1displayed the highest content among others.

To determine the occurrence of the β-ginkgotide disulfde connectivity among diferent kingdoms, a pub- lic database search was performed, including searches of the Protein Data Bank22, Cybase23, PhytAMP24, YADAMP25, Knottin Database26, ATDB27, ConoServer28 and the ArachnoServer29. A total of 65 6C-CRPs in the mass range of 2.0‒3.0 kDa with a confrmed disulfde framework were found. Conotoxins with a disulfde spacing of CC‒C‒C‒CC and a cystine knot connectivity were the most common framework, followed by the cyclotide disulfde spacing of C‒C‒C‒C‒C‒C, with a cystine knot, and the θ-defensin disulfde spacing of CC‒C‒C‒CC, with a symmetric disulfde connectivity. However, no cysteine framework similar to that of β-ginkgotide was found in these databases, suggesting that the β-ginkgotide disulfde connectivity is novel not only among CRPs in plants but also among those in animals.

Transcriptomic Data Mining of β-Ginkgotide Homologs In Planta. Te full-length β-ginkgotide precursor sequence, namely, β-gb1, which encoded β-gB1 and β-gB2, was obtained from the OneKP transcrip- tome database30. A tBALSTn using the β-gB1 sequence as a query was performed to determine the distribution of its homologs in planta. Te search results revealed 101 sequence homologs from 67 diferent plants. A manual fltering was employed to remove the precursor sequences that consisted of incomplete sequences with a signal peptide <10 amino acids, identical sequences from the same plant and sequences with an odd number of cysteine residues. Subsequently, a total of 76 putative β-ginkgotide homologs were identifed in 59 plants (Table S6, Supporting information). Ten of the putative mature peptides were expressed in multiple plants, which were composed of identical mature peptides that had diferent propeptides and/or C-terminal tails. For example, aC1 was expressed in ten diferent plants from the same plant family (Cupressaceae), including Austrocedrus chilensis, Calocedrus decurrens, Fokienia hodginsii, Juniperus scopulorum, Microbiota decussata, Neocallitropsis pancheri, Papuacedrus papuana, Pilgerodendron uviferum, Callitris gracilis and Callitris macleayana (Table S6, Supporting information). Tis fnding is consistent with the occurrences of the natural products, that is, the same product can be found in diferent species or families. To determine the similarity between the β-ginkgotide-like peptides and the other CRPs, their full precursor sequences were aligned using Kalign and analyzed using a neighbor-joining clustering algorithm. Te clustering results were displayed as a phylogenetic tree in Fig. 6, with each sequence assigned to its corresponding CRP fam- ily, plant family and plant clade. Two major clusters were observed; the β-ginkgotides and β-ginkgotide-like pep- tides (highlighted in dark brown in Fig. 6A) were separated from other CRPs such as the 6C-hevein-like peptides, CKAIs and cyclotides. As shown in Fig. 6B and C, the β-ginkgotides and β-ginkgotide-like peptides are found only in gymnosperms and are distributed among six families, Pinaceae, Podocarpaceae, Taxaceae, Cupressaceae, Gnetaceae and Welwitschiaceae. In contrast, all the other 6C-CRPs are found in both monocotyledons and dicot- yledons of angiosperms such as Poaceae, Solanaceae and Fabaceae. Figure 7 illustrates the evolution of modern seed plants based on fossil records and genetic analyses31, 32. Christenhusz et al. classifed gymnosperms into four distinct subclasses, including Ginkgoidae, , and Cycadidae33. Modern seed plants were evolved from a common ancestor, the progymnosperms, and were separated into two major clades, Cordaitales and Lyginopteridales, in the period 354 million years ago. Cordaitales later evolved into Ginkgopsida, Pinopsida and Gnetopsida, whereas Cycadopsida and angio- sperms evolved from Lyginopteridales34. Data mining showed that the β-ginkgotides and β-ginkgotide-like peptides are distributed only in the Ginkgoidae, Gnetidae and Pinidae but are absent in the and the angiosperms, suggesting that these compounds may have potentially originated from the same ancestor, Cordaitales35, 36. Tese results are in agreement with the previously described genetic analyses37 and provide an

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 7 www.nature.com/scientificreports/

Figure 6. Phylogenetic tree of β-ginkgotide β-gB1, β-ginkgotide-like peptides and other reported CRPs with six cysteine residues. Te precursor sequences β-gB1, β-ginkgotide-like peptides, jasmintides, 6C-hevein-like peptides, thionins, carboxypeptidase inhibitors, cystine knot α-amylase inhibitors and cyclotides were aligned by Kalign and the phylogenetic tree was generated by iTOL. Classifcation based on (A) the number of cysteine residues in the mature domain, (B) plant’s family and (C) plant’s clade.

Figure 7. Occurrence of β-ginkgotide-like peptides in planta. Modern seed plants were evolved from a common ancestor progymnosperms and were separated into two major clades, Cordaitales and Lyginopteridales. Ginkgotide bL1 and bL1-like peptides were distributed only in Ginkgopsidae, Gnetidae and Pinidae (highlighted in green) but absent in Cycadidae and angiosperms.

explanation for the absence of the β-ginkgotide-like peptides in angiosperms. Taken together, the transcriptomic data mining and neighbor-joining clustering analysis demonstrated that the β-ginkgotides belong to a new family of CRPs, are distributed only in gymnosperms and are absent in angiosperms.

β-Ginkgotide Biosynthesis. Figure 8A displays the selected precursor sequences of the β-ginkgotides and β-ginkgotide-like peptides, CKAIs, 6C-hevein-like peptides, carboxypeptidase inhibitors cyclotides, thionins and jasmintides. Te β-ginkgotides and β-ginkgotide-like peptides shared the same four-domain architecture that includes an endoplasmic reticulum signal peptide, a propeptide, a mature peptide and a C-terminal tail. Te pres- ence of the signal peptides suggests that they are secretory peptides, similar to the case in other known CRPs3–5, 11, 38. Te release of the mature peptide involves the removal of the signal peptide by a signal peptidase and the

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 8 www.nature.com/scientificreports/

Figure 8. Precursor sequences alignment of β-ginkgotide β-gb1 and twelve selected β-ginkgotide-like peptides. (A) Te precursors are divided into four major domains, including a signal peptide, a propeptide, a mature peptide and a C-terminal tail. (B) Te mature peptides of 76 β-ginkgotide-like peptides were summarized as a Weblogo. Te cysteine, positive and negative charged residues were highlighted with yellow, blue and red, respectively.

Figure 9. Summary of precursor sequences of β-gB1-like peptides and other known CRPs with six cysteine residues. Te number on top of each domain represents the average number of amino acids.

cleavage of the propeptide and the C-terminal tail by an endopeptidase. Te cleavage site between the propeptide and the mature peptide, where the peptide bond between the His and the Tyr was cleaved, was highly conserved. Subsequently, the C-terminal tail was removed, and the mature peptide was released for further post-translational modifcation in the Golgi apparatus and to be packed into vesicles for secretion. Figure 8B illustrates the sequence logo that was obtained from the aligned sequences of the β-gB1 and 76 β-gB1-like peptides. Te overall height of a stack represents the sequence conservation, while the height of each symbol within a stack represents the relative frequency with which the amino acid occurred39. Loop 2 was the longest, with six amino acids; fve of these were highly conserved including Tyr10, Leu11, Asp12, Glu13 and Gly15. Both loops 1 and 3 comprised two amino acids. Interestingly, the majority of positive residues (Lys and Arg) were located in loop 2, whereas the negative residues (Asp and Glu) were located in loop 3. A distinguishing characteristic of the β-ginkgotides and β-ginkgotide-like peptides, compared with other CRPs, was their relatively short full precursor sequence (80 aa) (Fig. 9). Te β-ginkgotide and β-ginkgotide-like peptides have a relative longer C-tail (range: 2‒29 aa; mean: 11 aa) than those of carboxypeptidase inhibitors (range: 7 aa; mean: 7 aa) and CKAIs (no tail), but the lengths are similar to those of the 6C-hevein-like peptides (range: 29‒30 aa; mean 29.7 aa). Furthermore, the N-terminal repeated region in the cyclotides was absent in the β-ginkgotides and their homologs. Tese results suggest that the precursor sequence arrangement is conserved among β-ginkgotides and β-ginkgotide-like peptides but is signifcantly diferent in other CRPs. Tis knowledge of the β-ginkgotide and β-ginkgotide-like peptide precursors provides insight into their biosynthesis and is ben- efcial in developing a bacterial expression system and transgenic crops in the near future.

Stability of β-Ginkgotide. To assess whether β-gB1 is relevant as a potential active component in tradi- tional medicine, stability assays that simulated a decoction preparation process and mimicked a digestive system were employed and were illustrated in Figure S5, Supporting information. UPLC and MS data revealed that more than 75 and 90% of the β-gB1 remained intact afer being boiled in water at 100 °C and incubated under acidic conditions at 37 °C for 1 hr, respectively. Likewise, β-gB1 was resistant to the endopeptidase trypsin and the exo- peptidase carboxypeptidase A for up to 6 hr, with >90% of the peptide remaining.

Cytotoxicity, Hemolytic and Antibacterial Efects of β-Ginkgotide. β-gB1 was used as a repre- sentative example to examine its antimicrobial, hemolytic and cytotoxic activities. Radial difusion assays were used to determine the efect of β-gB1 on human pathogenic fungi and bacteria. No signifcant inhibitory efect

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 9 www.nature.com/scientificreports/

of β-gB1was observed against Escherichia coli, Candida albicans and Candida tropicalis at concentrations up to 100 µM. Similarly, no hemolytic efect of β-gB1 was observed against erythrocytes obtained from human type O blood at concentrations up to 100 µM. Lastly, Vero and BHK cell lines were used for studying the in vitro cytotoxic efect that was exerted by β-gB1. Te peptide was found to be nontoxic to both cell lines at concentrations up to 100 µM. Together with its high stability against harsh treatments, these results suggest that β-gB1 is nontoxic and can be used as a scafold for peptide engineering. Conclusion This study describes the discovery, characterization and synthesis of a new CRP family, the β-ginkgotides from G. biloba. The β-ginkgotides differ from other known families of CRPs by the following features: (1) short mature domain with 18‒20 amino acids and six cysteine residues, (2) unique cysteine spacing with CC motifs such as C‒CC‒C‒CC, (3) a novel disulfde connectivity of CysI‒IV, CysII‒VI and CysIII‒V, (4) a hyperdisulfde-constrained, compact structure with only loops and no regular secondary structure elements, and (5) they are gymnosperm-specifc. Data mining found an additional 76 β-ginkgotide-like peptides in planta. In addition, a cluster- ing analysis confirmed that the β-ginkgotides and β-ginkgotide-like peptides represent the first-in-class hyperdisulfde-constrained peptide family. Overall, our discovery of this family expands the existing families of the CRPs in the chemical space of 2–6 kDa and provides new insights to their sequence diversity, cysteine arrangement, structure, biosynthesis, occurrences and distribution in planta. Materials and Methods Plant Materials. G. biloba seeds were purchased from a local herb distributor (Hung Soon Medical Trading Ltd., Singapore) and authenticated by an experienced traditional Chinese medicine practitioner from the Nanyang Technological University Traditional Chinese Medicine Clinic, Singapore. Te authentication cri- teria were based on the macroscopic and microscopic characteristics described in the Pharmacopoeia of the People’s Republic of China7, 40, 41. A voucher sample (NTUH-SGB20161010-01) was deposited at the Nanyang Technological University Herbarium, School of Biological Sciences, Singapore.

Purifcation of β-Ginkgotide. G. biloba seeds (2 kg) were homogenized in 20 L of MilliQ water and incu- bated for 2 hr at room temperature. Te homogenate was centrifuged at 9000 rpm for 20 min, and the supernatant was loaded onto a reversed-phase fash column with 500 g of C18 powder (Grace, MD, USA) packed in a Büchner funnel (250 × 22 mm). Te elution was carried out using increasing concentrations of ethanol (20, 40, 60 and 80% v/v). Te 20 and 40% eluents were loaded onto a Sepharose Fast Flow SP (GE Healthcare Life Sciences, Little Chalfont, United Kingdom) fash column. Te column was percolated with 20 mM potassium dihydrogen phos- phate at a pH of 3.0 and eluted with 20 mM potassium dihydrogen phosphate with 1 M sodium chloride at a pH of 3.0. To obtain the purifed peptides, multiple rounds of preparative reversed-phase- (RP-) and strong cation exchange-high-performance liquid chromatography (SCX-HPLC) were performed as previously described5, 42. Te purifcation was performed on a Prominence UFLC system (Shimadzu, Kyoto, Japan) attached to an Aeris peptide XB-C18 column (Phenomenex, CA, USA; particle size 5 µm, 250 × 22 mm) or a polysulfoethyl A column (PolyLC, MA, USA; particle size 10 µm, 250 × 22 mm). β-Ginkgotide Sequence Determination. Te β-ginkgotide primary amino acid sequence was deter- mined as described previously5. Briefy, lyophilized β-ginkgotide (10 µg) was reconstituted in 30 µL of 20 mM dithiothreitol (DTT) and incubated at 37 °C for 1 hr. Te reduced peptide was alkylated with 200 mM iodoaceta- mide at 37 °C for 1 hr in the dark, desalted using a C18 Zip-tip, and lyophilized. Te S-alkylated β-ginkgotide was analyzed by an ABI 4800 matrix-assisted laser desorption/ionization time-of-fight mass spectrometer (MALDI-TOF MS) (Applied Biosystem, MA, USA). Te sample (0.5 µL) was mixed with 0.5 µL of matrix consist- ing of a saturated solution of α-cyano-4-hydroxycinnamic acid in 50/50 (v/v) acetonitrile/water with 0.1% trifuo- roacetic acid. Te mixture was spotted onto a MALDI plate and dried at room temperature. Te MALDI-TOF MS was operated in positive ion refector mode, acquiring 2000 shots (20 positions per spot; 100 shots per position) per spectrum with a laser intensity at 4500. Te accelerating and grid voltages were set at 20 and 16 kV, respec- tively. Te data were acquired between 2000 and 5000 Da using AB 4000 Series Explorer v3.5.1 and were analyzed using Data Explorer v4.9 (Applied Biosystem, MA, USA). Te primary sequence was obtained by interpreting the b- and y-ions formed during the tandem MS/MS fragmentation. Te assignments of isobaric residues Ile/Leu and Lys/Gln were based on transcriptomic data obtained from the One thousand plants database (OneKP; accession: SGTW-2035618)43.

Disulfde Mapping. β-Ginkgotide β-gB1 was partially reduced in 400 µL of 100 mM citrate bufer (pH 3.0) with 50 mM tris(2-carboxyethyl)phosphine at 37 °C for 17 min. Subsequently, 250 mM N-ethylmaleimide (NEM) was added and incubated at 37 °C for 1 hr. Te reaction mixture with the partially alkylated peptide was moni- tored using MALDI-TOF MS and purifed by RP-HPLC. Te two- and four-NEM alkylated intermediate species were further reduced with 20 mM dithiothreitol at 37 °C for 1 hr and alkylated with 40 mM iodoacetamide at 37 °C for 1 hr. Te fully alkylated β-gB1 was desalted and sequenced by MALDI-TOF/TOF MS/MS.

Stability Assay. Heat. Te β-ginkgotide β-gB1 (28 µg) was incubated in a heat block at 100 °C for 2 hr. At each time-point (0 and 1 hr), 20 µL of the treated sample was aliquoted and quenched in an ice bath for 10 min.

Acid. Te β-ginkgotide β-gB1 (28 µg) was incubated in 1 M hydrochloride acid (pH 2.0) at room temperature for 2 hr. At each time-point (0 and 1 hr), 20 µL of the treated sample was aliquoted and quenched by adding 2 µL of 1 M sodium hydroxide.

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 10 www.nature.com/scientificreports/

Trypsin. Te β-ginkgotide β-gB1 (28 µg) was added to 100 µL of 100 mM ammonium bicarbonate bufer (pH 7.8) and incubated in a water bath at 37 °C for 6 hr. At each time-point (0 and 6 hr), 20 µL of the treated sample was aliquoted and quenched by adding 5 µL of 1 M hydrochloric acid.

Carboxypeptidase A. Te β-ginkgotide β-gB1 (28 µg) was added to a solution of 50 mM Tris-HCl (pH 7.5) and 100 mM sodium chloride with 100 nM carboxypeptidase B. Te mixture was incubated in a water bath at 37 °C for 6 hr. At each time-point (0 and 6 hr), 20 µL of the treated sample was aliquoted and quenched by adding 5 µL of 1 M hydrochloric acid.

Analytical HPLC. Te reaction mixture was analyzed by a Nexera UHPLC system (Shimadzu, Kyoto, Japan) equipped with an Aeris peptide XB-C18 column (Phenomenex, CA, USA; particle size 3.6 µm, 100 × 4.6 mm). Te eluents were monitored with MALDI-TOF MS, and the area under the peak before and afer the treatment was determined to evaluate stability.

Synthesis and Oxidative Folding of β-Ginkgotide. β-Ginkgotide β-gB1, obtained from GL Biochem (Shanghai, China), was synthesized by solid phase peptide synthesis using Fmoc (N-(9-fuorenyl)methoxycar- bonyl) chemistry and purifed by C18-reversed-phase HPLC to 85% purity. Optimization of the folding condi- tions was performed by varying the concentrations of co-solvent and redox reagents as well as the duration and was summarized in Table S1 (Supporting information). Te fnal peptide concentration was 42.12 μM. Afer 2 hr, the reactions were quenched by acidifcation with 2 M hydrochloric acid, and the products were profled using RP-UPLC. Native β-gB1 that was purifed from the plant was used as a standard to monitor the folding process. Te yield was determined by comparing the area under the peak of β-gB1 before and afer the folding reaction.

NMR Structure Determination. 2D NOESY and TOCSY experiments were used to achieve an unambig- uous peak assignment and determine the structure of β-gB1. All the NMR spectra were acquired on a 600 MHz NMR spectrometer (Bruker, IL, USA) equipped with a cryogenic probe. Te temperature of the NMR experiment was 298 K. Te protein concentration of the β-gB1 sample was 1 mM and contained 5% D2O and 95% H2O (pH 3.5). Te mixing times were 80 ms and 200 ms for TOCSY and NOESY respectively. Te spectrum width was set to 12 ppm with the 1H carrier frequency set at 4.700 ppm. Te spectra were processed using NMRPipe sof- ware44. Te NOE cross-peaks were assigned using Sparky sofware on the 2D NOESY and TOCSY spectra45. Te structure determination was performed using the method of simulated annealing as defned by the script of the CNSsolve 1.3 sofware46. Te distance restraints were loaded for the structure calculation and were frst divided into three classes based on the intensities of the NOE peaks: strong, 0 < d ≤ 1.8 Å; medium, 1.8 < d ≤ 3.4 Å; and weak, 3.4 < d ≤ 5 Å. Dihedral angle restraints were determined based on the J coupling splitting, JHN-Hα, in the 1D-NMR spectrum. Te φ angle was considered to be between −100 and −160° when the splitting was more than 8 Hz. Te structure was verifed using the PROCHECK program47 and displayed using Chimera version 1.6.248 or Pymol version 1.849. Te NMR structure of β-gB1 was deposited in the Protein Data Bank with an acces- sion number of 5XIV with a BMRB number of 36079.

Biological Assay. Cytotoxicity. Human umbilical vein endothelial cells (HUVECs) were seeded onto 96-well plates in DMEM medium (10% FBS and 1% penicillin and streptomycin) and grown at 37 °C with a 5% CO2 atmosphere until confuence. Afer 24 hr, various concentrations of β-ginkgotide β-gB1 (1–100 µM) were added, and the plates were incubated for 24 hr. Te cell viability was assessed by 3-(4,5-dimethylthiazol-2-yl)-2,5- diphenyltetrazolium (MTT) assay. Triton X-100 solution (1%) was used as a positive control4.

Hemolytic effect. Erythrocytes were isolated from fresh blood type O+ by centrifugation for 15 min at 10000 rpm. Te erythrocytes were diluted with phosphate-bufered saline, mixed with various concentrations of β-ginkgotide β-gB1 (1–100 µM) and incubated for 4 hr at 37 °C. Te hemolytic efect was examined using a microplate reader set at 415 nm3.

Antibacterial efect. Radial difusion assays were performed on gram-negative Escherichia coli (ATCC 25922) and gram-positive Staphylococcus aureus (ATCC 12600) as described previously11, 38.

Data Mining and Bioinformatics Analysis. Te transcriptomic database search was performed on public domains GenBank and OneKP30. Te cleavage site of the signal peptide in the precursor sequence was deter- mined by SignalP 4.050. Te primary amino acid sequence and precursor sequence alignments were performed using Kalign51. Neighbor-joining clustering analysis was performed on MEGA 6.052, and a phylogenetic tree was constructed using a bootstrap test of 1000 replicates. Te phylogenetic tree was displayed using iTOL53. Te sequence logo were generated with WebLogo39. Te pairwise structural alignment was performed using TM-align algorithm54. References 1. Tam, J. P., Wang, S., Wong, K. H. & Tan, W. L. Antimicrobial peptides from plants. Pharmaceuticals 8, 711–757, doi:10.3390/ ph8040711 (2015). 2. Majumdar, S. & Siahaan, T. J. Peptide-mediated targeted drug delivery. Med Res Rev 32, 637–658, doi:10.1002/med.20225 (2012). 3. Nguyen, G. K. et al. Discovery and characterization of novel cyclotides originated from chimeric precursors consisting of albumin-1 chain a and cyclotide domains in the Fabaceae family. Te Journal of biological chemistry 286, 24275–24287, doi:10.1074/jbc. M111.229922 (2011). 4. Nguyen, P. Q. et al. Allotides: Proline-Rich Cystine Knot alpha-Amylase Inhibitors from Allamanda cathartica. J Nat Prod 78, 695–704, doi:10.1021/np500866c (2015).

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 11 www.nature.com/scientificreports/

5. Wong, K. H. et al. Ginkgotides: Proline-Rich Hevein-Like Peptides from Gymnosperm Ginkgo biloba. Frontiers in Plant Science 7, 1639 (2016). 6. Kini, S. G., Wong, K. H., Tan, W. L., Xiao, T. & Tam, J. P. Morintides: cargo-free chitin-binding peptides from Moringa oleifera. BMC Plant Biol 17, 68, doi:10.1186/s12870-017-1014-6 (2017). 7. Pharmacopoeia of the People’s Republic of China, C. P. C., People’s Medical Publishing House, Beiging, China Ginkgo Semen (2010). 8. Liu, Y., Wang, Z. & Zhang, J. Dietary Chinese Herbs: Chemistry, Pharmacology and Clinical Evidence (Springer Vienna, 2015). 9. Miwa, H., Iijima, M., Tanaka, S. & Mizuno, Y. Generalized Convulsions Afer Consuming a Large Amount of Gingko Nuts. Epilepsia 42, 280–281, doi:10.1046/j.1528-1157.2001.33100.x (2001). 10. Mahadevan, S., Park, Y. & Park, Y. Modulation of cholesterol metabolism by Ginkgo biloba L. nuts and their extract. Food Research International 41, 89–95, doi:10.1016/j.foodres.2007.10.002 (2008). 11. Kumari, G. et al. Cysteine-Rich Peptide Family with Unusual Disulfde Connectivity from Jasminum sambac. J Nat Prod 78, 2791–2799, doi:10.1021/acs.jnatprod.5b00762 (2015). 12. Du, W. H. et al. Solution structure of an M-1 conotoxin with a novel disulfide linkage. The FEBS journal 274, 2596–2602, doi:10.1111/j.1742-4658.2007.05795.x (2007). 13. Hunter, H. N., Fulton, D. B., Ganz, T. & Vogel, H. J. Te solution structure of human hepcidin, a peptide hormone with antimicrobial activity that is involved in iron uptake and hereditary hemochromatosis. Te Journal of biological chemistry 277, 37597–37603, doi:10.1074/jbc.M205305200 (2002). 14. Milbradt, A. G., Boulegue, C., Moroder, L. & Renner, C. The two cysteine-rich head domains of minicollagen from Hydra nematocysts difer in their cystine framework and overall fold despite an identical cysteine sequence pattern. J Mol Biol 354, 591–600, doi:10.1016/j.jmb.2005.09.080 (2005). 15. Gracy, J. et al. KNOTTIN: the knottin or inhibitor cystine knot scafold in 2007. Nucleic acids research 36, D314–319, doi:10.1093/ nar/gkm939 (2008). 16. Stec, B. Plant thionins–the structural perspective. Cellular and molecular life sciences: CMLS 63, 1370–1385, doi:10.1007/s00018- 005-5574-5 (2006). 17. Iyer, S. & Acharya, K. R. Tying the knot: the cystine signature and molecular-recognition processes of the vascular endothelial growth factor family of angiogenic cytokines. Te FEBS journal 278, 4304–4322, doi:10.1111/j.1742-4658.2011.08350.x (2011). 18. Wong, C. T., Taichi, M., Nishio, H., Nishiuchi, Y. & Tam, J. P. Optimal oxidative folding of the novel antimicrobial cyclotide from Hedyotis bifora requires high alcohol concentrations. Biochemistry 50, 7275–7283, doi:10.1021/bi2007004 (2011). 19. Tam, J. P., Wu, C. R., Liu, W. & Zhang, J. W. Disulfde bond formation in peptides by dimethyl sulfoxide. Scope and applications. Journal of the American Chemical Society 113, 6657–6662, doi:10.1021/ja00017a044 (1991). 20. Huang, Y. H., Chaousis, S., Cheneval, O., Craik, D. J. & Henriques, S. T. Optimization of the cyclotide framework to improve cell penetration properties. Front Pharmacol 6, 17, doi:10.3389/fphar.2015.00017 (2015). 21. Wong, C. T. et al. Orally active peptidic bradykinin B1 receptor antagonists engineered from a cyclotide scafold for infammatory pain treatment. Angew Chem Int Ed Engl 51, 5620–5624, doi:10.1002/anie.201200984 (2012). 22. Berman, H. M. et al. Te Protein Data Bank. Nucleic acids research 28, 235–242 (2000). 23. Mulvenna, J. P., Wang, C. & Craik, D. J. CyBase: a database of cyclic protein sequence and structure. Nucleic acids research 34, D192–194, doi:10.1093/nar/gkj005 (2006). 24. Hammami, R., Ben Hamida, J., Vergoten, G. & Fliss, I. PhytAMP: a database dedicated to antimicrobial plant peptides. Nucleic acids research 37, D963–968, doi:10.1093/nar/gkn655 (2009). 25. Piotto, S. P., Sessa, L., Concilio, S. & Iannelli, P. YADAMP: yet another database of antimicrobial peptides. Int J Antimicrob Agents 39, 346–351, doi:10.1016/j.ijantimicag.2011.12.003 (2012). 26. Gelly, J. C. et al. Te KNOTTIN website and database: a new information system dedicated to the knottin scafold. Nucleic acids research 32, D156–159, doi:10.1093/nar/gkh015 (2004). 27. He, Q. et al. ATDB 2.0: A database integrated toxin-ion channel interaction data. Toxicon 56, 644–647, doi:10.1016/j. toxicon.2010.05.013 (2010). 28. Kaas, Q., Yu, R., Jin, A. H., Dutertre, S. & Craik, D. J. ConoServer: updated content, knowledge, and discovery tools in the conopeptide database. Nucleic acids research 40, D325–330, doi:10.1093/nar/gkr886 (2012). 29. Herzig, V. et al. ArachnoServer 2.0, an updated online resource for spider toxin sequences and structures. Nucleic acids research 39, D653–657, doi:10.1093/nar/gkq1058 (2011). 30. OneKP. Te one thousand plants project, https://sites.google.com/a/ualberta.ca/onekp/home (2015). 31. Savard, L. et al. Chloroplast and nuclear gene sequences indicate late Pennsylvanian time for the last common ancestor of extant seed plants. Proceedings of the National Academy of Sciences of the United States of America 91, 5163–5167 (1994). 32. Burleigh, J. G. & Mathews, S. Phylogenetic signal in nucleotide data from seed plants: implications for resolving the seed plant tree of life. American journal of botany 91, 1599–1613, doi:10.3732/ajb.91.10.1599 (2004). 33. Christenhusz, M. et al. A new classifcation and linear sequence of extant gymnosperms. Phytotaxa (2011). 34. Wang, D. & Liu, L. A new Late genus with seed plant afnities. BMC Evol Biol 15, 28, doi:10.1186/s12862-015-0292-6 (2015). 35. Wang, X. Q. & Ran, J. H. Evolution and biogeography of gymnosperms. Mol Phylogenet Evol 75, 24–40, doi:10.1016/j. ympev.2014.02.005 (2014). 36. Mathews, S. & Kramer, E. M. Te evolution of reproductive structures in seed plants: a re-examination based on insights from developmental genetics. New Phytol 194, 910–923, doi:10.1111/j.1469-8137.2012.04091.x (2012). 37. Linkies, A., Graeber, K., Knight, C. & Leubner-Metzger, G. Te evolution of seeds. New Phytol 186, 817–831, doi:10.1111/j.1469- 8137.2010.03249.x (2010). 38. Kini, S. G. et al. Studies on the Chitin-Binding Property of Novel Cysteine-Rich Peptides from Alternanthera sessilis. Biochemistry 3, 6639–6649, doi:10.1021/acs.biochem.5b00872 (2015). 39. Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res 14, 1188–1190, doi:10.1101/gr.849004 (2004). 40. Wong, K. H., Razmovski‐Naumovski, V., Li, K. M., Li, G. Q. & Chan, K. Te quality control of two Pueraria species using Raman spectroscopy coupled with partial least squares analysis. Journal of Raman Spectroscopy 46, 361–368 (2015). 41. Wong, K. H., Razmovski-Naumovski, V., Li, K. M., Li, G. Q. & Chan, K. Comparing morphological, chemical and anti-diabetic characteristics of Puerariae Lobatae Radix and Puerariae Thomsonii Radix. J Ethnopharmacol 164, 53–63, doi:10.1016/j. jep.2014.12.050 (2015). 42. Wong, K. H., Li, G. Q., Li, K. M., Razmovski-Naumovski, V. & Chan, K. Optimisation of Pueraria isofavonoids by response surface methodology using ultrasonic-assisted extraction. Food Chemistry 231, 231–237 (2017). 43. Matasci, N. et al. Data access for the 1,000 Plants (1KP) project. Gigascience 3, 17, doi:10.1186/2047-217X-3-17 (2014). 44. Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J Biomol NMR 6(3), 277–293, doi:10.1007/BF00197809 (1995). 45. Goddard, T. D. & Kneller, D. G. Sparky 3 (University of California, San Francisco). 46. Brunger, A. T. et al. Crystallography & NMR system: A new sofware suite for macromolecular structure determination. Acta Crystallogr D Biol Crystallogr 54, 905–921, doi:10.1107/S0907444998003254 (1998).

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 12 www.nature.com/scientificreports/

47. Laskowski, R. A., Rullmannn, J. A., MacArthur, M. W., Kaptein, R. & Tornton, J. M. AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J Biomol NMR 8, 477–486 (1996). 48. Huang, C. C., Couch, G. S., Pettersen, E. F. & Ferrin, T. E. Chimera: An Extensible Molecular Modeling Application Constructed Using Standard Components. Pacifc Symposium on Biocomputing 1, 724 (1996). 49. Schrodinger, L. L. C. Te PyMOL Molecular Graphics System, Version 1.8 (2015). 50. Petersen, T. N., Brunak, S., von Heijne, G. & Nielsen, H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods 8, 785–786, doi:10.1038/nmeth.1701 (2011). 51. Lassmann, T., Frings, O. & Sonnhammer, E. L. Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. Nucleic acids research 37, 858–865, doi:10.1093/nar/gkn1006 (2009). 52. Kumar, S., Stecher, G. & Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol Biol Evol 33, 1870–1874, doi:10.1093/molbev/msw054 (2016). 53. Letunic, I. & Bork, P. Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic acids research 39, W475–478, doi:10.1093/nar/gkr201 (2011). 54. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic acids research 33, 2302–2309, doi:10.1093/nar/gki524 (2005). Acknowledgements This research was supported by a Competitive Research Grant from the National Research Foundation in Singapore (NRF-CRP8-2011-05). We would like to thanks Ms Debby M. Siow, Ms Yue Q. Loh and Ms Wei L. Liew from Nanyang Technological University in performing the preliminary screening. Author Contributions J.P.T., K.H.W., and W.L.T. conceived and designed the experiments. K.H.W., W.L.T. and T.X. performed the experiments, analyzed the data, and wrote the manuscript. J.P.T. revised the manuscript. All authors read and approved the fnal version of the manuscript. Additional Information Supplementary information accompanies this paper at doi:10.1038/s41598-017-06598-x Competing Interests: Te authors declare that they have no competing interests. Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional afliations. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Cre- ative Commons license, and indicate if changes were made. Te images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not per- mitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© Te Author(s) 2017

Scientific RePorTs | 7: 6140 | DOI:10.1038/s41598-017-06598-x 13