
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy 282

Exploring evolutionary and chemical space using chemoinformatic tools and traditional methods in pharmacognosy

ASTRID HENZ RYEN

ACTA UNIVERSITATIS UPSALIENSIS
UPPSALA 2020
ISSN 1651-6192
ISBN 978-91-513-0843-2
urn:nbn:se:uu:diva-399068

Dissertation presented at Uppsala University to be publicly examined in C4:305, BMC, Husargatan 3, Uppsala, Friday, 14 February 2020 at 09:15 for the degree of Doctor of Philosophy (Faculty of Pharmacy). The examination will be conducted in English. Faculty examiner: Associate Professor Fernando B. Da Costa (School of Pharmaceutical Sciences of Ribeirão Preto, University of São Paulo).

Abstract

Henz Ryen, A. 2020. Exploring evolutionary and chemical space using chemoinformatic tools and traditional methods in pharmacognosy. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy 282. 80 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-513-0843-2.

The number of new drugs coming to the market is declining while interest in lead discovery from natural resources is seeing a revival. Although methods for isolation and identification of natural products have advanced tremendously, methods for selection of potential leads have fallen behind. As part of the Marie Curie ITN “MedPlant: Phylogenetic exploration of medicinal plant diversity” this thesis contributed to the exploration of chemical diversity in angiosperms and the development of new tools to analyze and define the chemical potential of a plant.

In Paper I, it was demonstrated that physicochemical properties of selected specialized metabolites change in different plant groups. Changes in properties were assessed using ChemGPS-NP and diversity was quantified by calculating the volume occupied by the compounds in chemical space. By discussing the results against the background of possible underlying evolutionary mechanisms, it was concluded that evolutionary processes are reflected in chemical property space. These results hold great value for further studies on the evolution of chemical diversity and biochemical traits in plants. The methods developed can be used e.g. to define and predict the chemical diversity of related taxa, providing a strategy for a guided plant selection in search for new drug leads.

In Paper II, the scaffold and molecular diversity of over 5,200 sesquiterpene lactones (STLs) was investigated, using different chemoinformatic tools. Quantity and distribution of skeleton classes were determined and it was shown that different plant families possess specific sets of molecular frameworks, with considerable variation in their frequency. Clustering analysis enabled qualitative division of STLs into smaller groups with similar structural features, pointing out the differentiation of various plant groups. Including the study results, the dataset offers a compelling resource for chemosystematics, natural product research and drug lead discovery focused on STLs. It provides the basis for phylogenetic implementations due to the detailed taxonomic annotation. Since STLs represent a source for new drugs, it is of high value for a guided search for plant derived drug leads.

In Paper III, Lindera benzoin was subjected to phytochemical and pharmacological investigations. Phytochemical investigations led to the isolation of three new sesquiterpenes. As Native American tribes used this shrub for various medicinal purposes, e.g. cold remedy or diaphoretic, the isolated compounds were evaluated in vitro for their anti-inflammatory activity. In cellular assays, they reduced pro-inflammatory prostaglandin E2 production in A549 cells in a dose-dependent manner, which may rationalize the traditional use of this plant.

Keywords: angiosperm chemistry, physicochemical properties of plant specialized metabolites, chemical diversity, chemical space, sesquiterpene lactones, scaffold diversity, molecular framework, cluster analysis, ECFP6, Lindera benzoin, sesquiterpenes, anti-inflammatory

Astrid Henz Ryen, Department of Medicinal Chemistry, Farmakognosi, Box 574, Uppsala University, SE-751 23 Uppsala, Sweden.

© Astrid Henz Ryen 2020

ISSN 1651-6192
ISBN 978-91-513-0843-2
urn:nbn:se:uu:diva-399068 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-399068)

What is blue and lies under the mushroom? Smurf poop.

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Henz Ryen A., Backlund A. (2019) Charting angiosperm chemistry: Evolutionary perspective on specialized metabolites reflected in chemical property space. J. Nat. Prod. 82: 798-812.

II Henz Ryen A., Buonfiglio R., Backlund A., Kogej T. (2019) Structural classification and scaffold diversity of sesquiterpene lactones in the angiosperms. (Manuscript, submitted)

III Henz Ryen A., Göls T., Steinmetz J., Tahir A., Jakobsson P.-J., Backlund A., Urban E., Glasl-Tazreiter S. (2019) Bisabolane sesquiterpenes from the leaves of Lindera benzoin reduce prostaglandin E2 formation in A549 cells. (Manuscript, submitted)

Reprints were made with permission from the respective publishers.

List of Papers not included in this thesis

Yang L., Chai C.-Z., Yan Y., Duan Y.-D., Henz A., Zhang B.-L., Backlund A., Yu B.-Y. (2017) Spasmolytic Mechanism of Aqueous Licorice Extract on Oxytocin-Induced Uterine Contraction through Inhibiting the Phosphorylation of Heat Shock Protein 27. Molecules, 22: 1392.

Lai K.-H., Lu M.-C., Du Y.-C., El-Shazly M., Wu T.-Y., Hsu Y.-M., Henz A., Yang J.-C., Backlund A., Chang F.-R., Wu Y.-C. (2016) Cytotoxic Lanostanoids from Poria cocos. J. Nat. Prod. 79: 2805-13.

Buonfiglio R., Engkvist O., Varkonyi P., Henz A., Vikeved E., Backlund A., Kogej T. (2015) Investigating Pharmacological Similarity by Charting Chemical Space. J. Chem. Inf. Model. 55: 2375-90.

Contents

Background information
Introduction
  Chemical diversity in plants
  Phylogenetic approaches in natural product research
  Chemoinformatic tools
    ChemGPS-NP
    Molecular framework
    2D fingerprints and clustering analysis
  Pharmacognosy
Aims and significance
Material and Methods
  Datasets
  Chemical diversity in chemical property space
    Volume calculation in chemical property space
  Scaffold and molecular diversity of STLs
    STL skeletons
    Molecular framework
    Scaffold diversity analysis
    Extended-Connectivity Fingerprints and clustering analysis
  Phytochemical and pharmacological investigation of Lindera benzoin
    Phytochemical investigation of Lindera benzoin
    Pharmacological investigation
Results and Discussion
  Chemical diversity in chemical property space
    Towards a guided plant selection in drug lead discovery (I)
  Scaffold and molecular diversity of STLs
    STL skeletons
    Scaffold diversity analyses
    Clustering analysis
    Towards a guided plant selection in drug lead discovery (II)
  Phytochemical and pharmacological investigation of Lindera benzoin
    New bisabolane sesquiterpenes from the leaves of Lindera benzoin
    Pharmacological investigation
Conclusion and future perspectives
Popular scientific summary
Acknowledgments
References

Abbreviations

BEs       Betalains
CAS       CAS Registry Number
CSR       Cyclic System Retrieval
CYP       Cytochrome P450
ECFPs     Extended-Connectivity Fingerprints
ECFP_6    Extended-Connectivity Fingerprints with a diameter of 6
FLs       Flavonoids
FPP       Farnesyl pyrophosphate
HPLC      High performance liquid chromatography
HRMS      High resolution mass spectrometry
MF        Molecular framework
mPGES-1   Microsomal prostaglandin E synthase-1
NMR       Nuclear magnetic resonance
NPs       Natural products
OI        Overlap index
PC        Principal component
PCA       Principal component analysis
PG        Prostaglandin
SMILES    Simplified Molecular Input Line Entry Specification
SSE       Scaled Shannon Entropy
STLs      Sesquiterpene lactones
TAs       Tropane alkaloids
TI        Tanimoto index
TPS       Terpenoid synthase
QSAR      Quantitative structure-activity relationship
VS        Virtual screening

Background information

The number of new drugs coming to the market is declining and interest in lead discovery from natural resources is seeing a revival. However, although methods for isolation and identification of natural products have advanced tremendously in recent decades, methods for selection of potential leads have fallen behind.

The Marie Curie Initial Training Network (ITN) “MedPlant: Phylogenetic exploration of medicinal plant diversity” supported a new generation of researchers in biodiversity driven drug lead discovery. The project has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration, under grant agreement no. 606895 [2]. The network ran for four years, from October 2013 to 2017, and enabled 13 PhD and 2 postdoctoral researchers to work collaboratively across disciplines to develop new approaches and technologies for selection and sustainable use of biodiversity resources for lead discovery and to develop new plant derived leads (http://medplant.eu).

This PhD project was part of the MedPlant Key Research Area “Evolution of Chemical Diversity” with the aim to explore and correlate phylogenetic and chemical diversity in order to develop new tools for chemosystematics, prediction of biosynthetic pathways, drug lead discovery, as well as the appreciation and sustainable use of biodiversity resources.


Introduction

Chemical diversity in plants

Evolution has shaped natural products biochemistry in distinctive ways and has led to the massive chemical diversity of natural products (NPs). Since the mid-19th century, several hypotheses have been formulated and models proposed to explain the occurrence, distribution, and evolution of NP diversity [e.g. 3-9]. Just as for other evolutionary processes, the evolution of specialized metabolism in plants is likely to be driven by perpetually changing environmental niches. Due to biotic and abiotic stress, plants are under constant selection pressure and hence must adapt to maintain and increase population fitness [9, 10].

In evolutionary theory it is generally accepted that gene duplication followed by divergence is a major reason for the evolution of metabolism. New genes arise through a mutation in one copy of the gene, which can lead to a new product in the pathway, whereas the original pathway is maintained by another copy of the original gene. Mutations that increase the fitness of an organism are favored and eventually become fixed within a population [6]. Biological diversity is underpinned by chemical diversity and thus the principles of evolution must also apply to the chemical diversity found in nature [11]. It is hypothesized that the chemodiversity found in plants emerges primarily due to rapidly evolving specialized metabolic systems. The continuous branching and extension of existing pathways appear to be crucial for metabolic diversification. Gene duplication followed by mutation leads to sub- or neofunctionalization of existing enzymes, which thereafter show, for example, a broader substrate recognition, resulting in a larger NP diversity and changes in intrinsic properties of new substances [9, 11, 12]. Mutations can also lead to an enhanced level of enzyme catalytic promiscuity, allowing single enzymes to catalyze multiple reactions and biosynthesize multiple products. So-called “hub metabolites” are typically co-opted by functionally diverse enzymes and serve as chemical hubs from which new metabolic pathways emerge. Often, metabolic pathways branching from hubs are taxonomically distributed [9]. Furthermore, enzymes from different pathways can be recruited and incorporated in new pathways, leading to an expansion of chemodiversity [12]. Figure 1 illustrates schematic examples of contemporary theories in specialized metabolism.

[Figure 1 (schematic). Panels: I. Existing pathway and pathways after mutation (a-g); II. Substrate permissiveness; III. Catalytic promiscuity (main product + 14 minor products; main product + 23 minor products); IV. Hub metabolite (FPP, > 300 different skeletons).]

Figure 1. Development of chemical diversity in plant specialized metabolism. Figure redrawn and modified after Firn and Jones (2003) [8] and Weng et al. (2012) [9]. I. Development of biosynthetic pathways after a mutation event. a) A mutation gives rise to sub- or neofunctionalization of existing enzymes (E1 → E1’). A new pathway branches off resulting in the production of a new metabolite (B’). If the original downstream enzyme (E2) has a broad substrate tolerance, further new products will be made (C’); otherwise the pathway might be shortened. b) The existing pathway is extended by incorporating a new enzyme, which in turn leads to a new metabolite. c) Enzymes from different existing pathways are recruited and a new pathway is established. d) Broadened substrate recognition of a single enzyme results in multiple products. e) Substrate permissiveness enables a single enzyme to accept different substrates and to produce a multitude of products. f) Catalytic promiscuity allows a single enzyme to catalyze multiple reactions and synthesize multiple products from one substrate. g) A single substrate serves as metabolic hub for functionally diverse enzymes. New metabolic pathways branch from this hub, leading to structurally and functionally different metabolites. In addition, the same product (C) might be produced by more than one route. II. Substrate permissiveness: The same enzyme recognizes different substrates and produces different metabolites. It shows increased activity towards one of the substrates. III. Catalytic promiscuity: The enzymes E7’’ and E7’’’ produce a multitude of products from the same substrate FPP. IV. Hub metabolite: The substrate FPP serves as chemical hub and is co-opted by functionally diverse enzymes, which results in a high number of metabolite skeletons. A-Z represent different specialized metabolites; E, enzyme; OPP, pyrophosphate; FPP, farnesyl pyrophosphate.

Divergent and convergent evolution, as well as co-evolution in ecological niches, have resulted in the present distribution of specialized metabolites. Some metabolites are taxonomically restricted while others show a broader distribution amongst phylogenetically distantly related taxa [13-15]. Since the 1950s, many attempts have been made to classify angiosperms using phytochemical characters, i.e. compounds have been used for systematics, and relationships between taxa started to be seen from an evolutionary perspective. Outstanding papers and books have been published by e.g. Cronquist [4], Hegnauer [16], Harborne [17], Smith [18] and many others, and the terms chemosystematics and chemotaxonomy were established. For further reading regarding the history of chemosystematics, please refer to Reynolds [19]. It became apparent that chemical diversity and its distribution among taxa could not be explained without considering evolutionary relationships [6, 8, 20, 21].

Phylogenetic approaches in natural product research

Phylogenetics is the field of study concerned with the evolutionary relationships of living and extinct taxa. Today, phylogenetic analyses are omnipresent across many scientific fields and have proven useful for diverse applications. Phylogenetic trees, or phylogenies, are applied in many disciplines such as molecular biology, genetics, developmental biology or ecology [22]. The increased availability of molecular data allows analyses of genes and gene clusters for the study of evolution and biosynthesis. It can aid in the discovery of novel or unknown compounds and assist the process of genetically engineering pathways to obtain new biologically active and pharmaceutically relevant compounds [23].

Identifying a correspondence between phylogenetics and chemical diversity helps to understand the evolution of chemical traits. Regarding phytochemistry, this can be used to predict the chemical potential of a plant, which also includes functional predictions, e.g. biosynthetic pathways, gene evolution, gene functions, gene gain/loss etc. The assumption of a correlation between phylogenetic relationships and chemical diversity/biosynthetic pathways has led to several studies in which significant phylogenetic signals were found. Phylogenies have been used to understand e.g. the distribution of certain compounds across organisms and the evolution of biosynthetic pathways, and as a predictive tool in plant-derived drug lead discovery to select candidate taxa [24-28]. Thus, evolutionary patterns and processes can be inferred, and lineages can be identified that have certain chemical characters or are of metabolic interest. Eventually this can aid the discovery of novel compounds, predict their properties and possible biological activities, and advance the understanding of chemical trait evolution.

In this thesis, existing phylogenies are used as a framework to define different plant groups in order to analyze the distribution of distinct structural features or physicochemical properties of their compounds and to explore their individual chemical diversity. Furthermore, they are used to demonstrate possible applications combining chemoinformatic tools and phylogenetics.

Chemoinformatic tools

Chemoinformatics has established itself as an important discipline. It covers a variety of scientific fields in chemistry and computer science with the goal of making better decisions faster in the areas of drug lead identification and optimization [29]. Among others, virtual screening (VS), library design, and high-throughput screening (HTS) have become methods of choice in industry when handling large amounts of compounds or data [30]. With regard to computational chemistry, the introduction of SMILES strings (Simplified Molecular Input Line Entry Specification), which describe the molecular structure in a compact way using short ASCII strings, has simplified data handling and enables a multitude of chemoinformatic applications [31]. To analyze the large amount of generated data, visualization of chemical space also plays an essential role in contemporary chemoinformatics [32].

Chemical space can be defined as suggested by Reymond et al. [33]: “All the known molecules form the ‘available chemical space’”. The chemical space of drug-like molecules has been estimated to consist of at least 10^60 organic molecules below 500 Da [33, 34]. While it is impossible to handle such a number, it is generally considered that similar molecules reside close to each other in chemical space. Additionally, it has been largely accepted that similar compounds are likely to have similar biological activities [35]. Considering this, similarity-based methods play a substantial role in drug discovery. In silico tools such as VS cover different methods, for example structure-based and ligand-based approaches. Structure-based approaches can be used when the 3D structure of the biological target is available and can be applied e.g. for molecular docking to simulate the interactions between a compound and the binding site of the target [36]. In the absence of such structural information, ligand-based approaches are applicable. The latter follow the similarity principle to screen for potentially active compounds and include common approaches such as molecular fingerprints, pharmacophore methods, machine learning, scaffold analysis, or quantitative structure-activity relationships (QSAR) [30, 37].

In this project, several chemoinformatic tools are used to investigate physicochemical properties, molecular similarity and chemical diversity of angiosperm chemistry. Principles of major applications are introduced in the following paragraphs.

ChemGPS-NP

In 2001, Oprea and Gottfries first introduced ChemGPS and the concept of chemography, ‘the art of navigating chemical space’ [38]. This Chemical Global Positioning System was later extended to natural products, which led to ChemGPS-NP, to identify the biologically relevant chemical space [39]. This tool makes it possible to navigate the chemical property space of natural products based on 35 selected molecular descriptors and principal component analysis (PCA). It allows fast and efficient analyses of compounds based on their physicochemical properties in eight dimensions. The first three dimensions or principal components (PCs) represent, in order, PC1 = size, shape, and polarizability, PC2 = aromaticity and conjugation related properties, and PC3 = lipophilicity, polarity, and H-bond donor capacity, and explain approximately 71% of the variability found. Novel structures are positioned in chemical property space via PCA score prediction. The aim of this method is to provide a visualization of the ‘absolute’ position of a molecule or group of molecules in chemical space.

Figure 2 describes the workflow using ChemGPS-NPweb [40] from a single compound expressed as SMILES to its final positioning in the ChemGPS-NP global map. Clusters and trends in chemical space can easily be identified in the ChemGPS-NP global map, allowing an immediate comparison of the compounds with regard to compound similarity, chemical diversity and distribution in chemical space. In this context, the closer compounds reside to each other in chemical property space, the more similar they are in their properties. The relative distance between the molecules and thus their similarity can be measured through the calculation of Euclidean Distances (ED) [32].

In this thesis, ChemGPS-NP is employed to investigate the distribution and diversity of physicochemical properties of selected types of specialized metabolites in chemical property space.

[Figure 2 (workflow diagram): compounds (1. artemisinin, 2. resveratrol, 3. taxol) → SMILES strings → physico-chemical properties (35 molecular descriptors; job submitted at http://chemgps.bmc.uu.se) → principal component analysis (PCA) with the ChemGPS-NP model (8 dimensions) → ChemGPS-NP scores for PC1-PC8 → visualization in the ChemGPS-NP global map, with Euclidean distance (ED) between compounds. Principal components (PC) shown: PC1 size, shape, polarizability; PC2 aromaticity & conjugation; PC3 lipophilicity, polarity; PC4 flexibility & rigidity; …]

SMILES strings shown in the figure:
[1] CC(C1C2(C3O4)C(CCC(C)(OO2)O3)C(C)CC1)C4=O
[2] C1=CC(=CC=C1C=CC2=CC(=CC(=C2)O)O)O
[3] CC1=C2C(C(=O)C3(C(CC4C(C3C(C(C2(C)C)CC1OC(=O)C(C(C5=CC=CC=C5)NC(=O)C6…

ChemGPS-NP scores shown in the figure:
MOLID  PC1        PC2        PC3        PC4        PC5        PC6        PC7        PC8
[1]    -2.710756  -1.096161  0.14238    -1.071657  -0.615396  1.081968   -0.265112  -0.764884
[2]    -2.015571  3.347868   -0.631368  -0.346318  0.57871    -0.64132   -0.577217  -0.20416
[3]    7.781041   1.106286   1.67121    -0.035571  -0.249858  -0.070196  1.218009   1.22652

Figure 2. Workflow using ChemGPS-NP. A given molecule is translated into a SMILES string and submitted to the online tool ChemGPS-NPweb. For each compound, 35 molecular descriptors are calculated and condensed by the ChemGPS-NP model into eight principal components. The resulting scores can be used to chart compounds in the ChemGPS-NP global map to assess their physicochemical properties in chemical space.
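As a minimal sketch of the distance calculation described above, the following Python snippet computes pairwise Euclidean distances between the eight ChemGPS-NP scores listed in Figure 2 for artemisinin, resveratrol and taxol. The names `scores` and `euclidean_distance` are illustrative choices and not part of ChemGPS-NP; only the Python standard library is assumed.

```python
import math
from itertools import combinations

# ChemGPS-NP scores (PC1-PC8) as listed in Figure 2.
scores = {
    "artemisinin": [-2.710756, -1.096161, 0.14238, -1.071657,
                    -0.615396, 1.081968, -0.265112, -0.764884],
    "resveratrol": [-2.015571, 3.347868, -0.631368, -0.346318,
                    0.57871, -0.64132, -0.577217, -0.20416],
    "taxol": [7.781041, 1.106286, 1.67121, -0.035571,
              -0.249858, -0.070196, 1.218009, 1.22652],
}


def euclidean_distance(a, b):
    """Euclidean distance between two score vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("score vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Distance for every unordered pair of compounds.
distances = {
    (m1, m2): euclidean_distance(scores[m1], scores[m2])
    for m1, m2 in combinations(scores, 2)
}

# Print the pairs from most similar (smallest ED) to least similar.
for (m1, m2), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{m1:12s} {m2:12s} ED = {d:.2f}")
```

With these scores, artemisinin and resveratrol are the closest pair, while taxol lies far from both, largely because of its high PC1 (size-related) score.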

Molecular framework

Also referred to as “Bemis and Murcko scaffolds”, the molecular framework (MF) defines the “shape” of a given molecule [41]. To obtain the MF, all atoms or side chains that are not part of rings or of the linkers between rings are removed. This chemoinformatic approach is a useful tool for screening large databases for particular frameworks, and allows further classification of the structural data and analysis of the scaffold diversity of a given set of molecules. Depending on the intended usage, the MF can be defined in different ways. For example, atom type, hybridization, and bond order can be considered or disregarded. Figure 3 gives an example of different possible MFs for the compound parthenolide.

In this project, the MF is applied to perform a comprehensive scaffold diversity analysis of a set of selected specialized metabolites, using a number of established metrics to measure and quantify chemical diversity [42].

[Figure 3: structures of parthenolide, its atomic framework, and its graph framework.]

Figure 3. Molecular frameworks of parthenolide. Atomic framework: single bonded side chains have been removed. Graph framework: atom type, hybridization, and bond order are not taken into consideration.
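The pruning idea behind the graph framework can be sketched in a few lines of Python. The example below is an illustration only, not the tool used in this thesis: it treats a molecule as a plain graph of hypothetical atom indices and bonds, ignores atom type and bond order, and repeatedly strips terminal atoms until only rings and the linkers between them remain. The function name `graph_framework` and the toy molecule are assumptions made for the example.

```python
def graph_framework(bonds):
    """Return (atoms, bonds) left after pruning all side chains.

    `bonds` is an iterable of (atom_index, atom_index) pairs. Atom types
    and bond orders are ignored, as in a graph framework.
    """
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Terminal atoms (degree 0 or 1) cannot be part of a ring or a linker.
    leaves = [atom for atom, nbrs in neighbours.items() if len(nbrs) <= 1]
    while leaves:
        atom = leaves.pop()
        if atom not in neighbours:
            continue  # already removed via an earlier duplicate entry
        for nbr in neighbours.pop(atom):
            neighbours[nbr].discard(atom)
            if len(neighbours[nbr]) <= 1:
                leaves.append(nbr)  # removal exposed a new terminal atom

    kept_bonds = {frozenset((a, b)) for a in neighbours for b in neighbours[a]}
    return set(neighbours), kept_bonds


# Toy molecule (hypothetical): two six-membered rings (atoms 0-5 and 8-13)
# joined by a two-atom linker (6, 7), plus a one-atom side chain (14).
ring_a = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
linker = [(0, 6), (6, 7), (7, 8)]
ring_b = [(8, 9), (9, 10), (10, 11), (11, 12), (12, 13), (13, 8)]
side_chain = [(10, 14)]

atoms, kept = graph_framework(ring_a + linker + ring_b + side_chain)
print(sorted(atoms))  # the side-chain atom 14 is no longer present
```

An acyclic input is pruned away completely, which matches the idea that a molecule without rings has no ring-based framework. Variants that retain atom types or bond orders, such as the atomic framework in Figure 3, would need a richer molecule representation than this sketch uses.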

2D fingerprints and clustering analysis

The use of 2D fingerprints remains the method of choice for studying molecular similarity [37]. Molecular fingerprints are represented by binary strings that encode the presence or absence of structural fragments, and were originally designed to assist similarity searching, clustering, and classification [35]. A variety of fingerprints have been developed for different purposes. For example, path-based fingerprints such as Daylight fingerprints consist of 2048 bits and encode all possible connectivity pathways of a molecule up to a predefined length [43], whereas circular fingerprints such as Extended-Connectivity Fingerprints (ECFPs) were explicitly designed to capture molecular features relevant to molecular activity and precise atom environment substructural features [44]. In turn, key-based fingerprints such as Molecular ACCess System (MACCS) are based on predefined substructure keys [45]. The degree of resemblance between compounds is assessed by similarity measures; the most widely used in fingerprint-based similarity calculations is the Tanimoto index (TI) [46]. To perform a clustering analysis, i.e. to arrange compounds into groups based on their similarity, a broad range of algorithms exists. No technique in pattern recognition yields the best classification in all scenarios. In a comparison of a number of supervised classifiers, k-Nearest Neighbors (k-NN) frequently gave the best accuracy [47]. However, depending on the purpose, other algorithms might be better suited in chemoinformatics, especially when handling large datasets, for example variants of k-means or leader algorithms [48]. Figure 4 presents the general workflow of a 2D fingerprint-based clustering analysis.

In this project, molecular fingerprints and clustering analysis are employed to assess the molecular diversity of a set of defined specialized metabolites in different plant families.

[Figure 4 (schematic): Principles of 2D fingerprint-based clustering analysis]

• Binary strings encoding the presence (1) or absence (0) of substructural fragments are generated for each molecule.
• Similarity metric: Tanimoto index, ranging from 0 to 1 (where 1 is the highest similarity). For A bits on in the first fingerprint, B bits on in the second, and C bits in common:

$$\text{Tanimoto index} = \frac{C}{A + B - C}$$

With the example values in the figure (A = 5, B = 3, C = 3) this gives $3 / (5 + 3 - 3) = 0.6$.
• Clustering analysis based on a defined algorithm and a Tanimoto index cutoff: compounds with high molecular similarity group together. The figure shows example structures from clusters 1, 2 and 5 and a bar chart of cluster size per cluster.

Figure 4. General procedure of 2D fingerprint-based clustering analysis. Molecular fingerprints are represented by binary strings, encoding the presence (1) or absence (0) of structural fragments. The similarity between the compounds is calculated via the Tanimoto index. This is followed by the clustering analysis, using an appropriate algorithm with a similarity cutoff to form groups of similar compounds.
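The procedure summarized in Figure 4 can be illustrated with a short Python sketch. It assumes that fingerprints are given as sets of "on" bit positions, and it uses a simple sequential leader algorithm as one possible clustering scheme; this is not necessarily the algorithm or the cutoff used in this thesis. The function names and the toy fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto index of two fingerprints given as sets of 'on' bit positions."""
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    return common / union if union else 0.0  # two empty fingerprints -> 0.0


def leader_clustering(fingerprints, cutoff):
    """Put each fingerprint into the first cluster whose leader is similar enough.

    Each cluster is a list of indices into `fingerprints`; the first index
    of a cluster is its leader. The result depends on the input order.
    """
    clusters = []
    for index, fp in enumerate(fingerprints):
        for cluster in clusters:
            if tanimoto(fingerprints[cluster[0]], fp) >= cutoff:
                cluster.append(index)
                break
        else:
            clusters.append([index])  # no leader was similar enough
    return clusters


# Example values from Figure 4: 5 bits on, 3 bits on, 3 bits in common.
fp_a = {0, 2, 3, 5, 7}
fp_b = {2, 3, 5}
example_ti = tanimoto(fp_a, fp_b)

# Toy fingerprints (hypothetical bit positions) forming two groups.
toy_fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}]
clusters = leader_clustering(toy_fingerprints, cutoff=0.5)

print(example_ti)
print(clusters)
```

Raising the cutoff makes the clusters tighter and more numerous. In the toy data, the last two fingerprints have a Tanimoto index of exactly 0.5, so any cutoff above that value would split them into separate clusters.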

Pharmacognosy

Pharmacognosy can be defined as “the study of the physical, chemical, biochemical, and biological properties of drugs, drug substances or potential drugs or drug substances of natural origin as well as the search for new drugs from natural sources” [49]. According to Newman and Cragg [50], up to 50% of the drugs approved in the period from 1981 to 2014 are either natural products or derived from them, demonstrating their still highly significant role in drug discovery. In particular, natural products derived from plants represent a major source of new drugs, new drug leads or new chemical entities (NCE) [51-53]. The selection of a suitable plant source covers a range of multidisciplinary approaches, from random screening, through ethnopharmacology and chemotaxonomic knowledge, to computational methods. The discovery of pharmacologically active plant-derived substances is guided by a set of well-chosen bioassays [54]. Often, pharmacological testing starts with a test of the crude plant extract, followed by isolation and characterization of the constituents responsible for the activity of the extract [55]. Chemical characterization of a plant extract and the identification of isolated compounds involve a range of highly sensitive and reproducible analytical methods, such as (ultra) high performance liquid chromatography (HPLC), high resolution mass spectrometry (HRMS), and nuclear magnetic resonance (NMR) spectroscopy [54].

Besides the application of chemoinformatic tools to investigate chemical diversity in the angiosperms, traditional methods for the discovery of pharmacologically active plant-derived natural products are employed in this thesis to generate novel information in pharmacognosy. This includes the selection of a promising plant source, extraction, isolation, and structure elucidation of new compounds, as well as testing the isolated compounds and the purified plant extract for biological activity.

Aims and significance

Phytochemistry represents a major source of new drugs, new drug leads, and new chemical entities [52, 53, 56]. It has been estimated that among the known plant species only around 15% have been investigated phytochemically and only 6% pharmacologically [57]. Hence, a targeted selection of plants is not just desirable for drug discovery, but necessary [58].

As part of the MedPlant Key Research Area “Evolution of Chemical Diversity”, this thesis contributed to the exploration of chemical diversity in the angiosperms and the development of new tools to analyze and define the chemical potential of a plant. The main focus of Paper I & II was to analyze chemical diversity using different chemoinformatic tools, whereas in Paper III traditional methods were applied to generate novel information in pharmacognosy. The specific aims were:

• Establish a comprehensive in-house database of selected specialized metabolites with special focus on sesquiterpene lactones (STLs) (prerequisite for Paper I & II)

• Analyze physicochemical properties of selected specialized metabolites in different plant groups and measure chemical diversity in chemical property space; discuss whether differences and changes in their properties can be linked to contemporary evolutionary mechanisms (Paper I)

• Assess scaffold and molecular diversity of STLs in different plant groups qualitatively and quantitatively, using various chemoinformatic tools, and determine the distinct distribution of different STL classes (Paper II)

• Investigate Lindera benzoin phytochemically and pharmacologically (Paper III)

Based on the combination of phylogeny and in silico applications, the methods developed in this project will serve as tools for chemosystematics and to predict the chemical and possible pharmacological potential of a plant. For future work, this enables a guided selection of the next promising plant source, which eventually can be applied for a guided drug lead discovery from plants.

Material and Methods

Datasets

The collection of a sufficient amount of chemical data was a major aspect of this thesis and a prerequisite for the first two projects. Data were collected through continuous surveys of the literature, mostly published up until 2015, and were extracted from various open access sources as well as proprietary databases, such as the Dictionary of Natural Products [59], the Dictionary of Flavonoids [60], and SciFinder [61].

For Paper I, data for four different types of specialized metabolites were collected (number of compounds):

• sesquiterpene lactones (STLs, 5,157)
• tropane alkaloids (TAs, 285)
• betalains (BEs, 132)
• flavonoids (FLs, 12,328)

Systematic searches have been performed up to the taxonomic rank of the in order to cover the distribution of specialized metabolites throughout the angiosperms. The taxonomy of the proposed plants of origin was checked using the “International Plant Names Index” [62], and the systematic classifications were assigned using APG IV [63].

Each compound was annotated with its plant family of origin and the CAS Registry Number (Chemical Abstracts Service, Columbus, OH, USA), and structural data were stored in the form of canonical SMILES strings. The dataset comprised 17,902 unique compounds and, since numerous compounds have been reported from more than one family, it included more than 20,000 entries. Even though erroneous entries or publications can never be fully excluded, considerable efforts have been made to verify the structural accuracy of the compounds in the dataset by visual inspection. Finally, each compound was annotated with the ChemGPS-NP scores for PC1-PC8 (Principal Component), which will be explained in more detail below. The complete dataset is available in the Supporting information of Paper I (https://pubs.acs.org/doi/10.1021/acs.jnatprod.8b00767).
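The difference between unique compounds and dataset entries described above can be made concrete with a small sketch. The three records and their SMILES strings are hypothetical placeholders and not data from Paper I; only the per-type compound counts are taken from the text.

```python
from collections import defaultdict

# Hypothetical toy records: (canonical SMILES, plant family of origin).
entries = [
    ("C=C1CCCC1", "Asteraceae"),
    ("C=C1CCCC1", "Apiaceae"),   # same compound reported from a second family
    ("O=C1OCCC1", "Lauraceae"),
]

# Group the families by compound; each key is one unique compound.
families_by_compound = defaultdict(set)
for smiles, family in entries:
    families_by_compound[smiles].add(family)

n_entries = len(entries)
n_unique = len(families_by_compound)
print(n_entries, n_unique)  # 3 entries, 2 unique compounds

# The compound counts listed for Paper I add up to the stated total.
counts = {"STLs": 5157, "TAs": 285, "BEs": 132, "FLs": 12328}
print(sum(counts.values()))  # 17902
```

Keying on the canonical SMILES string only works if all structures were canonicalized with the same tool, which is an assumption of this sketch.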

[Figure 5: angiosperm phylogeny at the level of orders, annotated with the plant families used as subsets. Families labeled in the figure: Schisandraceae (PII), Magnoliaceae, Aristolochiaceae, Canellaceae, Winteraceae, Chloranthaceae, Zingiberaceae, Amaryllidaceae, Proteaceae, Rosaceae, Coriariaceae (PII), Picrodendraceae (PII), Erythroxylaceae, Rutaceae, Caryophyllales (various families), Theaceae, Convolvulaceae, Solanaceae, Lamiaceae. Legend, type of specialized metabolite: sesquiterpene lactones (STLs), tropane alkaloids (TAs), flavonoids (FLs), betalains (BEs).]

Figure 5. Distribution of analyzed specialized metabolites in the angiosperms. Plant families used as subsets are displayed next to the respective type of specialized metabolite and are connected to their order of origin in the angiosperm phylogeny. Colors have been chosen in congruence with the ChemGPS-NP plots generated in Paper I. Compound subsets that were not further analyzed are annotated with a blank symbol next to their respective plant order. In the case of FLs, over 100 subsets were analyzed, but only selected families are highlighted. Note that the listed families correspond solely to the highlighted metabolite annotation. Family subsets added in Paper II have been labeled with (PII) and are highlighted with grey, filled circles. Adapted with permission from Henz Ryen & Backlund, 2019, ©2019 American Chemical Society.

For Paper II, the existing STL dataset was extended by additional compounds from different plant families and comprised 5,271 STLs. Additionally, the compounds were annotated with plants of origin to the taxonomic rank of the genus and, in most cases, also with the specific epithet. Data were collected with considerable effort, aiming to be as complete as possible and to include all families and STL skeleton classes. Including taxonomic annotation, this dataset comprised 8,619 entries in total, since numerous compounds have been reported from several different plant taxa. For future use of this dataset, the SMILES strings have been transformed into uniform canonical SMILES strings using the open-source chemical toolbox Open Babel, version 2.3.1 [64]. Figure 5 represents the angiosperm phylogeny based on APG IV. The phylogeny has been annotated with the occurrence of specialized metabolites, and the plant family subsets used in Paper I and II have been highlighted.

Chemical diversity in chemical property space

In Paper I, the physicochemical properties of selected specialized metabolites were analyzed in different plant families. In order to evaluate how the physicochemical properties of specialized metabolites differ across families, and to investigate their distribution and diversity in chemical space, ChemGPS-NP has been employed to chart angiosperm chemistry. ChemGPS-NP prediction scores were calculated using ChemGPS-NPweb [40] and charted in chemical property space using the R package rgl [65, 66]. In this way, different clusters and trends can be identified and changes in physicochemical properties can be analyzed efficiently. Furthermore, a method was developed to measure the degree of chemical diversity from the distribution of metabolites in chemical property space: volume calculation in chemical property space.

Volume calculation in chemical property space

Compounds that reside close to each other in chemical property space are similar in their physicochemical properties; conversely, compounds that are distant from each other are less similar in their properties. Accordingly, compound sets that comprise similar metabolites and do not show strong variability in their structural features reside close to each other and occupy only a small volume. Compound sets comprising more complex and diverse molecules, on the other hand, vary more strongly in their physicochemical properties and thus occupy a larger volume since they spread out in chemical space. To measure the degree of chemical diversity of a given dataset, expressed by the spread of the compounds, the volume occupied in chemical property space was computed using the quickhull algorithm for convex hulls [67].

The convex hull of a set of points is the smallest convex set that contains the points. Thus, the size of the volume enclosed by the convex hull reflects the spread of the compounds in chemical space and therefore their chemical diversity. In Paper I, the convex hull was computed for different subsets of specialized metabolites, using the R package geometry [68] with the function convhulln. An example of two datasets with different degrees of chemical diversity is presented in Fig. 6.

[Figure 6 panels: chemical diversity as the volume occupied in chemical property space (axes PC1, PC2, PC3). Sesquiterpene lactones: V = 135.31 (n = 5,157). Flavonoids: V = 2,915.49 (n = 12,328).]

Figure 6. Volume calculation in chemical property space in the form of a convex hull. V = unit volume; n = number of compounds. Flavonoids spread out in chemical property space and occupy a large volume, thus showing a large chemical diversity. Sesquiterpene lactones, on the contrary, reside in a comparably smaller volume and exhibit less diversity. Reprinted with permission from Henz Ryen & Backlund, 2019, ©2019 American Chemical Society.
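The idea of measuring diversity as the size of a convex hull can be sketched in two dimensions. The example below is an illustration only: it computes the area of a 2D hull with Andrew's monotone chain algorithm, whereas Paper I computed 3D volumes with quickhull via the R package geometry. All function names and point sets are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical compound sets projected onto two principal components.
tight_set = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
spread_set = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]

print(polygon_area(convex_hull(tight_set)))   # 1.0
print(polygon_area(convex_hull(spread_set)))  # 12.0
```

As with the 3D volume, interior points do not change the result, so the measure reflects the spread of a compound set rather than its size n.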

Scaffold and molecular diversity of STLs

In Paper II, an in-depth analysis of the chemical diversity of STLs was performed to investigate scaffold and molecular diversity in different plant families. This included the determination of skeleton classes and their distribution among plant families, an extensive scaffold diversity analysis based on the molecular framework, and the assessment of molecular diversity and similarity via 2D fingerprint and clustering analysis.

STL skeletons

STLs can be categorized into major groups based on their skeleton, for example germacranolides (10-membered ring), eudesmanolides (6/6-bicyclic compounds), guaianolides and pseudoguaianolides (5/7-bicyclic compounds) [69]. For Paper II, the skeleton was determined for each compound for an initial classification of STLs. The above four skeletons were considered together with other relevant classes [70, 71] and/or more frequently found STL skeletons in the dataset. Skeleton subtypes based on stereochemical features were not considered since only 2D structures were investigated. Less frequent seco-derivatives have been assigned to their major class if reasonable, and were otherwise labeled as “other”. Dimers formed by STLs belonging to the same skeleton class were assigned to their respective class, whereas dimers containing different skeletons were placed in the category “other”. Unusual and/or unique skeletons were labeled as “other” as well. The relative and absolute frequency of occurrence of the different skeleton classes was recorded and the distribution among plant species was calculated. Figure 7 presents example compounds containing the STL skeletons annotated in the study.

[Figure 7 structures, with the carbon numbering C-1 to C-15 shown on the first structure: costunolide (germacranolide), isoallantolactone (eudesmanolide), dihydrocostuslactone (guaianolide), helenalin (pseudoguaianolide), istanbulin A (eremophilanolide), elemasteriactinolide (elemanolide), chloranthalactone A (lindenanolide), drimenin (drimanolide), tomentosin (xanthanolide), anisatin (seco-prezizaane), tutin (picrotoxane), isosecotanapartholide (open ring structure = “others”).]

Figure 7. STL skeletons in the angiosperms. Represented are the well-known STL skeleton classes as well as further relevant skeletons and/or more frequently found STL skeletons. Guaianolides and pseudoguaianolides share similar skeletons but differ in the position of a methyl group, which is at C-4 in guaianolides and at C-5 in pseudoguaianolides. The same applies to eudesmanolides and eremophilanolides, where the methyl group is at C-10 in eudesmanolides and at C-5 in eremophilanolides.

Molecular framework

To investigate the chemical diversity of STLs produced by a plant family, their molecular frameworks (MF) were analyzed. In Paper II, only single-bonded entities were removed (e.g. atoms attached with a double bond to a linker or ring, such as the oxygen of a carbonyl or the nitrogen of an imine, are kept). The MFs have been generated using an AstraZeneca in-house script based on oetoolkit [72]. As shown in Fig. 8, atom types, hybridizations, and bond orders are preserved; thus, the compounds can be reduced to their main framework, yet enough relevant information is kept to separate the resulting scaffolds in a meaningful way. For example, α-methylene-γ-lactones can still be distinguished from other lactones since the double bond is not removed from the framework. Other algorithms would remove important structural differences that should be considered for this class of compounds and would produce indistinguishable scaffolds.

[Figure 8 columns: original compound; MF algorithm used in Paper II (only single bonded entities are removed); different MF algorithm (all side chains are removed); graph framework. The resulting scaffolds are marked with ✔ or ✗.]

Figure 8. Different molecular frameworks. The method used in Paper II preserves enough important structural information to distinguish between different STL scaffolds. Other algorithms that remove double bonded entities or even atom types strip relevant information of the structures and produce indistinguishable scaffolds.

Scaffold diversity analysis

Based on the MF, a comprehensive analysis of the scaffold diversity was performed. Each plant family subset was assessed using the following complementary metrics: MF frequency counts, Cyclic System Retrieval (CSR) curves (cyclic system = MF), and Scaled Shannon Entropy (SSE). These metrics have previously been used to measure and quantify scaffold diversity and are described in detail by several authors [73-76].

In brief, the MF frequency counts represent the number of unique MFs in a dataset, as well as the number of MF-singletons, i.e. the number of MFs that are present in only a single compound. The ratio of MFs and MF-singletons relative to the number of compounds in a dataset allows a first assessment of the diversity of scaffolds. Furthermore, the distribution of the MFs was analyzed as a measure of diversity. CSR curves were computed for each dataset and further characterized, and the SSE was calculated to quantify the specific distribution of the compounds in the n most populated MFs. To perform the analyses, the online platform Consensus Diversity Plots Version 2 (CDPs-V.2) was used, which is freely available at https://consensusdiversityplots-difacquim-unam.shinyapps.io/RscriptsCDPlots/ [42]. To depict the most populated MFs and their frequencies at the same time, the “Molecule Cloud” was employed [77]. In this cloud, the size of the compounds, in this case the MFs, corresponds to their frequencies in a dataset. Lastly, the MF similarity between families was evaluated. The overlap of identical MFs was measured using the Jaccard/Tanimoto index (TI) [78], expressed as Overlap index (OI) and evaluated in the form of a similarity matrix.
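Three of the metrics described above can be sketched in a few lines. The sketch assumes that each compound is represented by a label for its molecular framework, and it uses the common definition of the Scaled Shannon Entropy, i.e. the Shannon entropy of the compound distribution over the n most populated MFs divided by log2(n). It does not reproduce the CDPs-V.2 platform, CSR curves are omitted, and all names and example data are hypothetical.

```python
from collections import Counter
from math import log2


def mf_counts(frameworks):
    """Number of unique MFs and number of MF-singletons in a dataset."""
    counts = Counter(frameworks)
    singletons = sum(1 for c in counts.values() if c == 1)
    return len(counts), singletons


def scaled_shannon_entropy(frameworks, n):
    """SSE over the n most populated MFs: 0 = one MF dominates, 1 = even spread."""
    top = [c for _, c in Counter(frameworks).most_common(n)]
    if len(top) < 2:
        return 0.0  # assumption: entropy is defined as 0 for fewer than two MFs
    total = sum(top)
    entropy = -sum((c / total) * log2(c / total) for c in top)
    # If fewer than n MFs exist, scale by the number actually present.
    return entropy / log2(len(top))


def overlap_index(frameworks_a, frameworks_b):
    """Jaccard/Tanimoto index on the sets of MFs of two families."""
    a, b = set(frameworks_a), set(frameworks_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical family subsets; each item is the MF label of one compound.
family_a = ["MF1"] * 4 + ["MF2"] * 4
family_b = ["MF1"] * 7 + ["MF3"]

print(mf_counts(family_b))                  # (2, 1)
print(scaled_shannon_entropy(family_a, 2))  # 1.0
print(scaled_shannon_entropy(family_b, 2))  # between 0 and 1
print(overlap_index(family_a, family_b))    # one shared MF out of three
```

In this toy example both families contain two MFs, but the compounds of the second family are concentrated in a single framework, which lowers its SSE.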

Extended-Connectivity Fingerprints and clustering analysis

Extended-Connectivity Fingerprints (ECFPs) are widely adopted fingerprints and have been well validated for various uses in building classification models, e.g. Bayesian modeling, clustering, and similarity-based virtual screening [44, 79]. In Paper II, clustering analysis was performed based on ECFPs to identify series of similar STLs. Thus, the chemical diversity of STLs was assessed in its entirety (i.e. all side chains were considered).

Described as the current 2D gold standard fingerprint in scaffold-based selection [80], ECFPs with a diameter of 6 (ECFP_6) were computed to describe molecular structural features. The similarity between the compounds was computed via the Tanimoto index [46]. K-means clustering [81] was performed with a TI cutoff of 0.4 using an AstraZeneca in-house Pipeline Pilot workflow [82]. The obtained clusters were revised by visual inspection, assisted by the calculation of the average pairwise similarity between the compounds belonging to two different clusters. An average pairwise similarity greater than TI = 0.8 usually indicated that the clusters could be merged into one. If possible, singletons have been “rescued”, i.e. they were reassigned to a cluster based on their core molecular scaffold. Finally, the compounds were tagged with their associated plant origin to investigate the distribution of the plant families and their respective genera among the clusters. If meaningful, clusters were also merged to better reflect phylogenetic relationships, which can evade the molecular description used by the cluster algorithm. Figure 9 schematically summarizes the workflow and the methods employed in Paper II.

[Figure 9: Clustering analysis Paper II]

• 2D fingerprint: ECFP_6 (diameter 6), executed for each atom in a molecule
• Clustering analysis: k-means clustering with Tanimoto index cutoff > 0.4
• First cluster revision: visual inspection; merge clusters if the average pairwise similarity between compounds belonging to two different clusters is > 0.8; singleton rescue
• Taxonomic annotation of compounds in the clusters and final revision: merge clusters to better reflect phylogenetic relationships
• Clusters (after first revision), followed by taxonomic annotation and final revision, give the final clusters

Figure 9. Workflow of clustering analysis in Paper II. Top: Iteration process of ECFP_6: with each iteration, larger circular substructures are identified around the central atom. The appended number is the effective diameter of the largest feature and is equal to twice the number of iterations performed. Hence, three iterations result in the largest possible fragment with a width of six bonds. The process is repeated over each atom and thus captures substructural information from all parts of the molecule. The final set contains a mixture of substructures of differing size for each atom in the molecule. Additionally, information on attached bonds is stored, where the atom type “A” represents an atom of any type other than hydrogen. Bottom: The clustering analysis is performed using the k-means algorithm and the TI as similarity metric. In the last step, the compounds inside the clusters are annotated with their plant of origin, which results in the final clusters. For example, cluster #16 contains STLs solely produced by two genera belonging to the family Lauraceae, whereas cluster #90 contains STLs produced by many genera belonging to different plant families.
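The first cluster revision, in which two clusters are merged when their average pairwise similarity exceeds TI = 0.8, can be sketched as follows. This is an illustration under stated assumptions and not the Pipeline Pilot workflow used in Paper II: fingerprints are represented as sets of "on" bit positions, and all function names and example clusters are hypothetical.

```python
from itertools import product

MERGE_THRESHOLD = 0.8  # average pairwise TI above which clusters were merged


def tanimoto(bits_a, bits_b):
    """Tanimoto index of two fingerprints given as sets of 'on' bit positions."""
    common = len(bits_a & bits_b)
    union = len(bits_a) + len(bits_b) - common
    return common / union if union else 0.0


def mean_intercluster_similarity(cluster_a, cluster_b):
    """Average TI over all pairs with one compound from each cluster."""
    if not cluster_a or not cluster_b:
        raise ValueError("both clusters must contain at least one compound")
    pairs = list(product(cluster_a, cluster_b))
    return sum(tanimoto(x, y) for x, y in pairs) / len(pairs)


def should_merge(cluster_a, cluster_b, threshold=MERGE_THRESHOLD):
    """Flag a pair of clusters as merge candidates for visual inspection."""
    return mean_intercluster_similarity(cluster_a, cluster_b) > threshold


# Hypothetical clusters; each fingerprint is a frozenset of 'on' bits.
cluster_1 = [frozenset(range(0, 10)), frozenset(range(0, 9))]
cluster_2 = [frozenset(range(0, 11))]
cluster_3 = [frozenset(range(20, 30))]

print(should_merge(cluster_1, cluster_2))  # True: mean TI is about 0.86
print(should_merge(cluster_1, cluster_3))  # False: no bits in common
```

Since the text states that a value above 0.8 only "usually" indicated a merge, the function marks candidates and leaves the final decision to visual inspection.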

Phytochemical and pharmacological investigation of Lindera benzoin

In Paper III, traditional methods were applied to generate novel information in pharmacognosy. To broaden the phytochemical knowledge of Lindera species, the North American “spicebush” Lindera benzoin was investigated, which included the extraction, isolation and structure elucidation of compounds produced by this plant. Furthermore, the plant extract and the isolated compounds were tested in vitro for their anti-inflammatory activity. The selection of this particular plant species was guided by the methods developed in Paper I and II, and the decision for an appropriate bioassay was made based on literature research. Both will be discussed in more detail in Results and Discussion.

Phytochemical investigation of Lindera benzoin

The aerial parts of L. benzoin were collected at the Uppsala University Botanical Garden, Sweden (July 2017, Accession number: 2001-1309). The dried, pulverized leaves were extracted three times at room temperature with CH2Cl2. The filtered and dried crude extract was re-dissolved in a small volume of CH2Cl2 and partitioned against aqueous MeOH 50% (1 + 1.5 v/v). After evaporation of CH2Cl2, the aqueous-methanolic layer was filtered to remove insoluble residues and dried in the rotary evaporator, resulting in the purified plant extract. To isolate single compounds, further fractionation of the purified extract was performed via flash chromatography on a PuriFlash 4250 instrument with UV and ELSD detection (Interchim) using a PuriFlash 15 C18 HQ Column and gradient elution with water and methanol. Two of twelve fractions resulted in pure compounds.

The isolated compounds were subjected to further analyses for characterization. In brief, CD spectra were obtained on a Jasco J-1500 spectrophotometer and UV spectra were recorded on a Shimadzu UV spectrophotometer UV-1800. NMR spectra were recorded on a Bruker Avance 500 NMR spectrometer (UltraShield). HRMS spectra were acquired with an ESI-Qq-TOF mass spectrometer (micrOTOF-Q II, Bruker Compass, maXis HD).

Analytical HPLC was performed using a Dionex Ultimate 3000 (Thermo Fisher Scientific) HPLC system with a Phenyl-Hexyl column (5 µm, dimensions 250 x 2 mm). Chromatograms were recorded by a UV/Vis diode array detector (UV/Vis-DAD) and an evaporative light scattering detector (ELSD).

Pharmacological investigation

The compounds and the purified plant extract were tested in vitro for their anti-inflammatory activity in cellular assays, investigating the inhibition of prostaglandin production in A549 lung carcinoma cells.

A549 cell assay

A549 cells (ATCC, Manassas, VA, USA) were cultured in DMEM High Glucose cell culture medium (Dulbecco's Modified Eagle Medium) containing 10% fetal bovine serum, 100 U/ml penicillin, 100 µg/ml streptomycin, and 1 mM sodium pyruvate, at 37 °C in a humidified atmosphere containing 5% CO2. Cells were seeded on 24 well plates (1 ml) at a density of 70,000 cells/well and incubated overnight. Fresh medium was added after 24 h and the cells were stimulated with 5 ng/ml IL-1β (interleukin-1β, Sigma-Aldrich) and treated with the isolated compounds and the purified plant extract. The compounds were tested at 1 mM, 100 µM, 10 µM, and 1 µM concentrations, and the purified plant extract was tested at concentrations of 320 µg/ml, 32 µg/ml, 3.2 µg/ml, and 0.32 µg/ml. In addition, the cells were treated with either 1 µM COX-2 inhibitor NS-398 (Sigma-Aldrich), 10 µM mPGES-1 inhibitor “Compound III” (CIII) (NovaSAID AB, Sweden), or vehicle control (0.1% v/v of dimethyl sulfoxide, DMSO, Sigma-Aldrich). After 21 h, 600 µl cell culture supernatants were collected and subsequently stored at -20 °C for prostanoid analysis.

Solid-phase extraction

The thawed cell culture supernatants were spiked with 50 µl deuterated standard containing five different prostanoids and thromboxane: PGE2-d4, PGD2-d4, PGF2α, 6-keto-PGF1α-d4, 15-deoxy-Δ12,14-PGJ2-d4, and TXB2-d4 (Cayman Chemical Company) in 100% MeOH. Samples were acidified with formic acid (FA) to a final concentration of 0.2%, pH <3. Solid-phase extraction was performed as previously described [83]. The extracted samples were dried and stored at -20 °C for LC-MS/MS analysis.

Prostanoid analysis using LC-MS/MS

Prostanoid profiling was performed by liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) using an Acquity H-class UPLC (Waters) coupled to an Acquity triple quadrupole detector mass spectrometer (Waters). The extracted samples were reconstituted in 50 µl of 20% acetonitrile (ACN) and 10 µl was injected into the UPLC. Separation of analytes was performed on an Acquity UPLC BEH C18 column (50 x 2.1 mm, 1.7 µm, Waters, Ireland) using a linear gradient with 0.05% FA/MQ water and 0.05% FA/acetonitrile. The analytes were quantified by multiple reaction monitoring in negative mode for all prostanoids (Acquity TQ detector, Waters). Raw data were processed and analyzed using MassLynx software, version 4.1 (Waters), and quantified with internal standard calibration.

Cell viability assay

To exclude that the observed anti-inflammatory effects are due to possible cytotoxicity of the compounds, a cell viability assay was performed. A549 cells were cultured and treated as described above, with the differences that they were seeded on 96 well plates (200 µl) at a density of 10,000 cells/well, and that the cells were treated with six different concentrations ranging from 1 mM down to 10 nM for the compounds, and from 320 µg/ml down to 3.2 ng/ml for the purified plant extract. The cell viability assay was performed using the CellTiter-Glo® 2.0 Assay (Promega) according to the manufacturer’s instructions, and luminescence was recorded on a GloMax®-Multi Detection System (Promega).
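The concentration ranges given for the viability assay are consistent with six tenfold dilution steps, which can be checked with a short calculation. The tenfold step is an assumption based on the four concentrations listed for the A549 cell assay; the text itself only states the end points and the number of concentrations. The helper name `tenfold_series` is hypothetical.

```python
def tenfold_series(top, steps):
    """Return `steps` concentrations, each one tenth of the previous one."""
    return [top // 10**i for i in range(steps)]


# Integer units avoid floating-point rounding:
# compounds in nM (1 mM = 1,000,000 nM), extract in pg/ml (320 µg/ml = 320,000,000 pg/ml).
compounds_nM = tenfold_series(1_000_000, 6)
extract_pg_per_ml = tenfold_series(320_000_000, 6)

print(compounds_nM)       # [1000000, 100000, 10000, 1000, 100, 10]
print(extract_pg_per_ml)  # last value: 3200 pg/ml = 3.2 ng/ml
```

Under this assumption the lowest concentrations are 10 nM and 3.2 ng/ml, which matches the stated ranges.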

Results and Discussion

Chemical diversity in chemical property space

The aim of Paper I and further related analyses was to investigate the distribution and diversity of physicochemical properties of selected types of specialized metabolites in chemical property space, and to discuss possible underlying evolutionary mechanisms that could explain the observed distribution in chemical space. Changes in chemical properties were assessed using ChemGPS-NP and visual inspection, and the diversity was quantified by calculating the volume the compounds occupy in chemical property space.

At first, the distribution of the four selected metabolite types (STLs, FLs, TAs, and BEs) was compared in chemical property space. Unsurprisingly, these metabolites differed in their physicochemical properties and occupied distinct areas in chemical space. As represented in Fig. 10, the different compound sets explore the first dimension and vary considerably in their molecular size (PC1, positive direction = increase in molecular size). FLs spread out in chemical space and occupy a large volume, whereas STLs and TAs occupy rather small and well-defined volumes. With increasing molecular size, STLs and TAs tend to show higher lipophilicity (PC3, positive direction = increase in lipophilicity), whereas FLs and BEs become more hydrophilic. In contrast to STLs, TAs also vary in their degree of aromaticity (PC2, positive direction = increase in aromaticity). BEs show only little variability in their lipophilic properties but explore especially PC1.

Using the quickhull algorithm for convex hulls [67] to compute the volume occupied in chemical space, the chemical diversity was measured for each metabolite dataset in the first three dimensions. As already indicated by visual inspection, the FLs showed the largest chemical diversity with a calculated unit volume (V) of 2,915.49, followed by STLs = 135.31, TAs = 92.92, and BEs with the smallest volume of 64.22.
To evaluate how different types of specialized metabolites change in physicochemical properties in different plant families, the datasets were further divided into subsets based on the origin of the compounds. Charting the different metabolites on the ChemGPS-NP global map according to their occurrence in plant family and subsequently calculating their corresponding volume allowed interpreting possible evolutionary patterns in chemical property space. Figure 11 represents the distribution of the STL subsets in

[Figure 10 panels: (a) ChemGPS-NP global map, axes PC1, PC2, PC3; (b) rotated view. ChemGPS-NP global map: PC1 = size, shape, polarizability; PC2 = aromaticity and conjugation related properties; PC3 = lipophilicity, polarity and H-bond donor capacity. Specialized metabolites and typical compounds: betalains (gomphrenin-I), flavonoids (epimedin C), sesquiterpene lactones (costunolide), tropane alkaloids (atropine).]

Figure 10. Charting specialized metabolites in the ChemGPS-NP global map. (a) First three dimensions showing primarily PC1 = size, PC2 = aromaticity, and PC3 = lipophilicity. Color-code specialized metabolites: betalains = red; flavonoids = orange; sesquiterpene lactones = blue; tropane alkaloids = green. (b) Rotated view of the ChemGPS-NP global map. Typical compounds of each major group have been highlighted: gomphrenin-I = dark red, epimedin C = violet, costunolide = turquoise, atropine = light rose. Adapted with permission from Henz Ryen & Backlund, 2019, ©2019 American Chemical Society.

[Figure 11 panels: ChemGPS-NP global maps (axes PC1, PC2, PC3) with the structures of compounds 1-9.
(a) Origin of sesquiterpene lactones: Aristolochiaceae (n = 21, V = 0.15); Canellaceae (n = 34, V = 0.4); Chloranthaceae (n = 197, V = 26.04); Lauraceae (n = 135, V = 2.86); Magnoliaceae (n = 47, V = 0.35); Winteraceae (n = 10, V = 0.23); Zingiberaceae (n = 23, V = 0.21); all plant families (transparent).
(b) Origin of sesquiterpene lactones: Apiaceae (n = 244, V = 24.64); Asteraceae (n = 4,474, V = 105.53); Lamiaceae (n = 36, V = 4.65); all plant families (transparent).]

Figure 11. ChemGPS-NP global maps of sesquiterpene lactones. In general, STLs produced by members of the magnoliid-clade and monocots cluster in a well-defined volume, whereas STLs produced by members of the expand the volume occupied in chemical property space and show a stronger variability in their proper- ties. First three dimensions: PC1 = size, PC2 = aromaticity, PC3 = lipophilicity. Color-code according to plant family origin, n = number of compounds, V = unit volume. Positions of representative compounds are highlighted. Compounds: 1: parthenolide; 2: zedoarolide B; 3: 7α-acetylugandensolide; 4: bilindestenolide; 5: linderalactone; 6: henriol A; 7: scapiformolactone D; 8: ixerin N 6′-O-acetate; 9: thapsigargin. Reprinted with permission from Henz Ryen & Backlund, 2019, ©2019 American Chemical Society.

36 chemical property space as an example of such an assessment. Global maps of other specialized metabolite types are presented in Paper I. It was determined that physicochemical properties of the selected special- ized metabolites change in different plant families. Each of the metabolite groups showed different behavior in chemical property space, which in most of the cases could be linked to contemporary evolutionary processes. Conse- quently, evolutionary mechanisms were reflected in chemical property space. Regardless of metabolite type, higher chemical diversity could gener- ally be explained by several mechanisms that are considered driving forces leading to chemical diversity in plants (cf. Introduction). This includes for example an enhanced level of enzyme catalytic promiscuity, continuous branching and extensions of existing pathways with multifunctional en- zymes, broadened substrate recognition or the biosynthesis of multiple prod- ucts of a single enzyme [9, 12, 84]. The diversification rate, i.e. the rate at which the number of lineages in a clades grows [85], could influence chemi- cal diversity as well. A higher diversification rate can result in a higher number of gene duplications and mutations, leading to a higher number of possible homologous enzymes with catalytic promiscuity. Thus, plant fami- lies belonging to such clades could for example produce a larger variety of compounds, which reflects as a larger volume occupied in chemical property space. The results and potential mechanisms responsible for the observed diversity and distribution were discussed in detail in Paper I. The most im- portant potential mechanism for each metabolite group can be summarized as follows:

Sesquiterpene lactones:

Divergent evolution

The chemical diversity of STLs differed considerably between the families. In general, plant families that belong to the magnoliids clade and the Zingiberaceae (monocots) produced less diverse STLs and occupied only a small volume in chemical space. Plant families belonging to the eudicots, on the other hand, expanded this volume and produced more complex STLs that varied more strongly in their properties, especially in size and lipophilicity. Exceptions were STLs formed by members of the Chloranthaceae, a family for which the systematic placement remains disputed [63]. This family formed two separate clusters in chemical space (cf. Fig. 11a).

Even though STLs show a patchy distribution among the angiosperms, a convergent evolution of terpenoid synthases (TPS), which play a crucial role in the formation of terpenoids, can with high probability be ruled out since independent functional specialization has occurred after the separation of angiosperms and gymnosperms [13, 86]. Instead, their patchy distribution can be linked to on/off switching of genes, silent gene clusters, or the “birth-and-death” evolution of multigene families [7, 10, 87].

Besides the known high catalytic plasticity of TPS that leads to a high diversity of different terpenoids, lineage-specific TPSs could be connected to a variation in diversity. The angiosperm-specific subfamily TPS-a is further grouped into dicot- and monocot-specific TPS (TPS-a-1 and -2, respectively) [88]. Although no classification of TPS-a could be found in literature for the magnoliids, the specific TPSs could initially influence the inherent variability of STLs in the members of the different clades and thus explain the differences in chemical diversity.

After inspection of the molecular structures, it was revealed that the changes in properties derive especially from side chain modification and that enzymes for subsequent modifications are therefore important for STL diversity. Cytochrome P450 71 (CYP71) contributes particularly to sesquiterpene diversity [89], and it has been shown that magnoliids and monocots lack one of the oldest members of the CYP71 clan, and that magnoliids in addition lack other monocot-specific CYPs [90]. This absence could explain the lower diversity of STLs found in members of the magnoliids and the Zingiberaceae.

Besides possessing a specific set of enzymes, the observed patterns in chemical space could furthermore be linked to plant metabolic gene clusters and different mechanisms of pathway assembly. It was shown that new pathways arise in monocots by mixing and matching individual TPS and CYP genes by dynamic genome rearrangements, whereas in eudicots, microsyntenic blocks of TPS/CYP gene pairs duplicate and provide templates for the evolution of new pathways [91]. Assuming that magnoliids have a similar pathway assembly as monocots, this provides a complementary explanation why members belonging to the eudicots show a significantly higher chemical diversity.
Lastly, plant families such as Apiaceae and Asteraceae belong to clades with a higher diversification rate [85], which could explain their elevated metabolic diversification. Moreover, the outstanding STL diversity in the Asteraceae could be a good example of continuous metabolic diversification by extension of existing pathways with additional enzymes and, further on, the combination of various existing pathways, including a broad substrate tolerance and catalytic promiscuity of enzymes involved in STL formation [8, 84]. Ultimately, this could lead to the large number of STLs and their high chemical diversity in the Asteraceae.

Tropane alkaloids:

Convergent evolution of similar compounds

The four analyzed family subsets of the TAs (Convolvulaceae, Erythroxylaceae, Proteaceae, and Solanaceae) showed a similar behavior in chemical property space regarding their aromaticity but varied strongly in their molecular size. The Proteaceae produced the lowest molecular weight compounds and resided in a small and well-defined volume, reflecting low variability. TAs produced by the three other families expanded the volume occupied and showed greater chemical diversity.

As demonstrated by Jirschitzka et al. [92], species of Solanaceae and Erythroxylaceae have recruited enzymes from completely different protein groups to perform the same reaction in the tropane ring biosynthesis. Thus, TA biosynthesis evolved independently in Solanaceae and Erythroxylaceae, which supports the hypothesis that TAs have evolved more than once in the angiosperms, i.e. a convergent evolution of these compounds took place. This chemical convergence is also reflected in chemical property space. As the different families produce similar compounds despite different pathways, they reside in similar volumes and show similar behavior in chemical property space.

The strong differences regarding molecule size between the families could be ascribed to extension of existing pathways and different subsequent modification of the TAs. Moreover, the Convolvulaceae, Erythroxylaceae, and Solanaceae are members of plant orders with a high diversification rate [85], which might provide the basis for higher chemical diversity. Lastly, granatane alkaloids (GAs) are produced in addition to TAs by all analyzed families except the Proteaceae. Even though the synthesis of these two alkaloids is hypothesized to be similar, a whole new set of enzymes could have been recruited for the formation of GAs [93]. This in turn could lead to the incorporation of new enzymes into the biosynthetic pathways of TAs, enzymes that are lacking in the Proteaceae, which is reflected in the lower diversity of their TAs.

Flavonoids and betalains:

Convergent evolution of different compounds that fulfill similar biological functions

Analyzing the two different metabolite types separately, no apparent evolutionary pattern could be seen in chemical property space. Concerning flavonoids, this could be because their biosynthesis is very well preserved in land plants. Regardless of subgroup of flavonoids, from chalcones to more complex anthocyanins, all enzymes involved in the formation of flavonoids are already present in the first angiosperms and, with minor exceptions, all major genes encoding for these enzymes are conserved in all angiosperms studied [94, 95]. However, the overall distribution of flavonoids in chemical space reflected an increased chemical diversity that can emerge from a conserved pathway, which was extended by a range of different enzymes since the first green plants colonized land [96]. Besides this, their uniform core structure and subsequent modification mechanism [97] are less suited to analyze chemical diversity in terms of physicochemical properties. For example, glycosylation at either of two adjacent carbons is visible regarding molecular structure, but physicochemical properties will change in the same way. Hence, it is not surprising that no patterns were observed between the families.

Whereas anthocyanins, a subclass of flavonoids, are responsible for the color of fruits and flowers in most of the angiosperms, another class of pigments, the betalains, has replaced anthocyanins in a number of members belonging to the plant order Caryophyllales [98, 99]. These nitrogen-containing compounds are structurally and biosynthetically different from anthocyanins but are likely to have similar physiological and ecological roles besides their functions as pigments. By charting anthocyanins and betacyanins, a subgroup of betalains, this possible similarity in function was reflected in an overlap of physicochemical properties and a similar behavior in chemical space. As betacyanins possess similar properties to anthocyanins, this could speak for a theory of convergent evolution: structurally different compounds that arise from different biochemical pathways but fulfill the same or very similar functions [13].

Taken together, it was shown that proposed evolutionary processes in plant specialized metabolism and the resulting metabolic diversification are reflected in chemical property space. These results serve as a basic concept to further study the evolution of chemical diversity and biochemical traits in plants. ChemGPS-NP can be used to assess and define the chemical potential of a defined group of plants and detect possible trends in physicochemical properties. Knowing that evolutionary processes are reflected in chemical space, a combination of ChemGPS-NP with phylogenetics holds great predictive value. In Paper I, a small subset of STLs was used to demonstrate a potential application, which will be summarized in the following paragraph. In addition, a method will be introduced to define an area in chemical space with desired properties, which can be used to identify similar compounds from distantly related taxa.

Towards a guided plant selection in drug lead discovery (I)

Presuming a correlation between chemical diversity and phylogeny, i.e. that related taxa will produce similar metabolites, enables the prediction of the chemical potential of closely related taxa for which chemical data are missing. In the same way, this can include functions of their metabolites, for example biological activity.

Chemical diversity was assessed for the genera Eupatorium, Liatris and Mikania, belonging to the tribe Eupatorieae, Asteraceae. Their STLs showed some overlap in chemical space, but predominantly differed in distribution and trends in physicochemical properties (Fig. 12). Charting STLs with antiprotozoal activity isolated from Mikania micrantha [100] showed that these compounds are very likely genus-specific among the three analyzed genera, as they resided in an area almost exclusively populated by STLs from Mikania (dataset available in Supporting information of Paper I).

[Figure 12: 3D scatter plot with axes PC1, PC2 and PC3; positions of compounds 20 to 22 are marked. Legend, origin of sesquiterpene lactones: Eupatorium (n = 178, V = 2.58), Mikania (n = 143, V = 3.86), Liatris (n = 32, V = 1.32), M. micrantha (antiprotozoal activity), M. trachypleura.]

Figure 12. ChemGPS-NP global map of STLs retrieved from the three genera Eupatorium, Mikania, and Liatris, tribe Eupatorieae (Asteraceae). The three genera partly overlap in physicochemical properties but mostly show clear tendencies and different behaviors in chemical space. First three dimensions: PC1 = size, PC2 = aromaticity, PC3 = lipophilicity. Color-code according to genus or species origin, n = number of compounds, V = unit volume. Antiprotozoal compounds from M. micrantha cluster together as they have similar properties (highlighted in red). STLs from the related M. trachypleura reside in the vicinity of the active compounds (pink). 20: deoxy-mikanolide, 21: mikanolide, 22: dihydromikanolide. Adapted with permission from Henz Ryen & Backlund, 2019, ©2019 American Chemical Society.

In search for similar compounds with antiprotozoal activity, compounds that reside close to the active compounds in chemical property space could be tested for their activity, as compounds with similar properties are likely to have similar biological activities [35]. Retrieving the nearest neighbors in the high-dimensional ChemGPS-NP space can easily be achieved by calculating Euclidean distances (ED) [32]. Concerning the selection of a promising plant source for drug lead discovery, closely related species should be targeted next for phytochemical investigations and bioassays, since it can be assumed that they produce similar compounds. As discussed in Paper I, a related species was identified that indeed produces similar or even the same compounds. As can be seen in Fig. 12, the compounds isolated from M. trachypleura [101] reside in the vicinity of the active compounds from M. micrantha and reflect the rationale to search for compounds with desired properties in related species. Hence, further related species, which have not been subjected to phytochemical analyses, could be selected to test for antiprotozoal properties.

To extend the search, other genera could be targeted that produce compounds that reside in a similar volume as the active compounds. In this case, however, Eupatorium and Liatris could be excluded from further analyses because they mostly produce STLs with different properties and show different tendencies in chemical property space. Thus, with the data present, it seems unlikely to find STLs with desired properties in these two genera, which limits the search to related Mikania species. This example of a phylogenetic implementation demonstrates that a promising plant source for drug lead discovery can be detected based on close relationship and that other taxa can be excluded from further investigations, based on their behavior in chemical space.
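The nearest-neighbor retrieval described above can be illustrated with a minimal sketch. It assumes that each compound is represented by a tuple of principal-component scores; the function name `nearest_neighbours`, the compound labels and all coordinates are hypothetical and do not come from the dataset of Paper I.

```python
import math

def nearest_neighbours(query, compounds, k=3):
    """Return the k compounds closest to `query` as (name, distance) pairs."""
    # math.dist computes the Euclidean distance for any number of dimensions.
    distances = [(name, math.dist(query, coords)) for name, coords in compounds.items()]
    distances.sort(key=lambda pair: pair[1])
    return distances[:k]

# Hypothetical principal-component scores (PC1, PC2, PC3, ...); not real data.
compounds = {
    "candidate_A": (1.2, -0.8, 0.5, 0.1),
    "candidate_B": (4.0, 2.5, -1.0, 0.3),
    "candidate_C": (1.0, -1.0, 0.7, 0.0),
    "candidate_D": (-3.5, 0.2, 2.2, -0.6),
}
active = (1.1, -0.9, 0.6, 0.0)  # hypothetical active reference compound

for name, ed in nearest_neighbours(active, compounds, k=2):
    print(f"{name}: ED = {ed:.3f}")
```

Ranking by distance rather than applying a fixed cut-off avoids having to choose a threshold in advance; a cut-off could be added once a meaningful ED scale for the dataset is known.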
As discussed in Paper I, compounds with similar physicochemical properties can also be found in distantly related taxa. The following implementation demonstrates how to define a volume in chemical space with desired properties, and that valuable plant sources for further investigation can also be identified irrespective of their relationships. Chagas disease, a potentially life-threatening illness caused by the protozoan parasite Trypanosoma cruzi, is one of the neglected tropical diseases [102]. The STL dataset used in Paper I has been charted in Fig. 13a and STLs with confirmed activity against T. cruzi have been highlighted (CAS, PC1-3 and plant of origin provided in Table S1, Supporting information). As can be seen in comparison to the whole dataset, the trypanocidal STLs are rather small to medium sized molecules (PC1) with lipophilic properties. They do not show strong variability in their lipophilicity (PC3) and have no aromatic properties (PC2). They occupy a small and well-defined volume, V = 1.03, in comparison to the whole STL dataset, V = 135.31. By calculating the convex hull of these compounds, not only can their chemical diversity be assessed, but the hull can also be used to define the volume in chemical space that is occupied by STLs with activity against T. cruzi. Thus, the physicochemical properties of potential trypanocidal compounds can be defined and the search for new compounds can be narrowed down to these specific properties.

[Figure 13, panels a) to c): 3D scatter plots with axes PC1, PC2 and PC3; panel b) shows the STLs inside the convex hull (V = 1.03). Legend, antiprotozoal activity of STLs: Chagas, malaria, leishmaniasis. Origin of STLs: Helianthus (Asteraceae), Ferula (Apiaceae), Salvia (Lamiaceae), [illegible] (Lauraceae), Magnolia (Magnoliaceae).]

Figure 13. Selection of compounds and plant genera as promising sources for drug lead discovery using a convex hull in chemical property space. a) ChemGPS-NP global map of STLs and antiprotozoal STLs (Chagas, red). STLs active against T. cruzi reside in a small and well-defined volume, representing smaller compounds without strong variability in their lipophilicity. First three dimensions: PC1 = size, PC2 = aromaticity, and PC3 = lipophilicity. b) All STLs inside the convex hull defined by the STLs active against T. cruzi. c) All STLs inside the hull including annotation of different antiprotozoal activity and STLs of five example genera from distantly related families.

Whether a compound resides inside a convex hull can be tested using the function inHull included in the R package gMOIP [117]. The results of this test are expressed as -1 = outside hull, 0 = on hull, and 1 = inside hull. Hence, all compounds inside the defined convex hull can be filtered. Figure 13b shows the convex hull including all compounds inside the hull. In total, 1490 STLs (CAS) were retrieved (992 unique positions, data not shown).

Several of the compounds in the hull have reported activities against other diseases caused by protozoan parasites, malaria and leishmaniasis, and have been highlighted in Fig. 13c (data in Table S2 and S3). This implies that selecting compounds inside a volume, which is defined based on activity data, can represent a promising selection approach for new drug leads. Hence, other compounds residing in this volume represent candidate compounds that are likely to have suitable properties against T. cruzi and further diseases caused by protozoa.

Next, the compounds were matched with their plant of origin (data from Paper II), which showed that the STLs inside the convex hull belong to 197 genera from eight different families. In Fig. 13c, five genera from distantly related families have been highlighted as examples. It can be assumed that genera that produce STLs residing in the defined volume represent promising plant sources to search for new drug leads. This approach of filtering genera that produce compounds in a volume with desired properties sets the basis for a guided search for a promising plant source, irrespective of phylogenetic relationships. Of course, the numbers of filtered STLs and genera in this example are too high as a starting point for further studies, but they can be reduced by more specific criteria.
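The unit volumes quoted above (e.g. V = 1.03 for the trypanocidal STLs) can be read as the volume enclosed by the convex hull of a compound set in the first three dimensions. The following is a minimal sketch under that assumption. It uses a brute-force facet search in plain Python, is only practical for small point sets, and is not the procedure used in Paper I; all function names are hypothetical and the coordinates are toy values, not ChemGPS-NP scores.

```python
import math
from itertools import combinations, product

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def unit(a):
    length = math.sqrt(dot(a, a))
    return (a[0] / length, a[1] / length, a[2] / length)

def supporting_planes(points, eps=1e-9):
    """Unique planes (outward unit normal n, offset d) with n.p <= d for every point p."""
    planes = []
    for a, b, c in combinations(points, 3):
        n = cross(sub(b, a), sub(c, a))
        if math.sqrt(dot(n, n)) <= eps:       # collinear triple defines no plane
            continue
        n = unit(n)
        d = dot(n, a)
        side = [dot(n, p) - d for p in points]
        if all(s >= -eps for s in side):      # flip so that the normal points outwards
            n, d = (-n[0], -n[1], -n[2]), -d
        elif not all(s <= eps for s in side):
            continue                          # plane cuts through the point cloud
        if not any(math.dist(n, m) < 1e-6 and abs(d - e) < 1e-6 for m, e in planes):
            planes.append((n, d))
    return planes

def polygon_area_2d(pts):
    """Area of the 2D convex hull of pts (monotone chain, then shoelace formula)."""
    pts = sorted(set(pts))
    if len(pts) < 3:
        return 0.0
    def turn(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for chain, seq in ((lower, pts), (upper, reversed(pts))):
        for p in seq:
            while len(chain) >= 2 and turn(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
    ring = lower[:-1] + upper[:-1]
    twice = sum(ring[i][0] * ring[(i + 1) % len(ring)][1]
                - ring[(i + 1) % len(ring)][0] * ring[i][1] for i in range(len(ring)))
    return abs(twice) / 2

def hull_volume(points, eps=1e-9):
    """Sum the pyramids spanned by the centroid and each hull facet."""
    centre = tuple(sum(p[i] for p in points) / len(points) for i in range(3))
    volume = 0.0
    for n, d in supporting_planes(points, eps):
        # Orthonormal basis (u, v) of the facet plane, used to project the facet to 2D.
        axis = min(((1, 0, 0), (0, 1, 0), (0, 0, 1)), key=lambda e: abs(dot(n, e)))
        u = unit(cross(n, axis))
        v = cross(n, u)
        facet = [(dot(p, u), dot(p, v)) for p in points if abs(dot(n, p) - d) <= eps]
        volume += polygon_area_2d(facet) * (d - dot(n, centre)) / 3
    return volume

# Toy example: the eight corners of a unit cube plus one interior point.
cube = [tuple(map(float, p)) for p in product((0, 1), repeat=3)]
print(round(hull_volume(cube + [(0.5, 0.5, 0.5)]), 6))
```

Because interior points do not change the hull, such a volume depends only on the most extreme compounds of a set. Datasets of several thousand compounds would call for a dedicated computational-geometry library rather than this brute-force search.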
As a next step, it would be useful to investigate the overall spread of the compounds from the filtered genera. This determines whether they produce more compounds in, or close to, the volume with desired properties or whether they show other tendencies in chemical space. This could narrow down the search for suitable genera even more. Not least, keeping evolutionary processes in mind could help to preliminarily exclude filtered candidate taxa that are not expected to produce further compounds with certain properties, e.g. compounds with higher molecular weight. To not miss out on similar compounds not enclosed by (or directly on) the convex hull, the “tolerance” settings of the test can be changed. With this, a point that lies slightly outside of a hull can be treated as lying on the surface of the hull, so that more compounds are included.
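The inside/on/outside test and the effect of a tolerance setting can be illustrated with a small sketch. It is written in Python and does not call gMOIP; the functions `supporting_planes` and `in_hull`, the genus labels and the coordinates are hypothetical, and only the return codes (-1, 0, 1) follow the convention described above. The tolerance is applied here to the distance from the supporting planes of the hull, which is an assumption about how such a setting might behave, and the reference points are assumed to span all three dimensions.

```python
from itertools import combinations, product

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def supporting_planes(points, eps=1e-9):
    """Planes (outward unit normal n, offset d) with n.p <= d for every point p."""
    planes = []
    for a, b, c in combinations(points, 3):
        n = cross(sub(b, a), sub(c, a))
        length = dot(n, n) ** 0.5
        if length <= eps:                     # collinear triple defines no plane
            continue
        n = (n[0] / length, n[1] / length, n[2] / length)
        d = dot(n, a)
        side = [dot(n, p) - d for p in points]
        if all(s <= eps for s in side):
            planes.append((n, d))
        elif all(s >= -eps for s in side):    # flip so that the normal points outwards
            planes.append(((-n[0], -n[1], -n[2]), -d))
    return planes

def in_hull(point, planes, tol=1e-9):
    """Return 1 (inside), 0 (on the hull within tol) or -1 (outside)."""
    worst = max(dot(n, point) - d for n, d in planes)   # largest signed plane distance
    if worst > tol:
        return -1
    return 0 if worst >= -tol else 1

# Hypothetical "active" compounds spanning a unit cube in (PC1, PC2, PC3).
active = [tuple(map(float, p)) for p in product((0, 1), repeat=3)]
planes = supporting_planes(active)

# Hypothetical candidates: (genus of origin, coordinates).
candidates = [
    ("genus_A", (0.5, 0.5, 0.5)),    # well inside
    ("genus_B", (1.0, 0.5, 0.5)),    # exactly on a facet
    ("genus_C", (1.05, 0.5, 0.5)),   # slightly outside
    ("genus_D", (3.0, 3.0, 3.0)),    # far outside
]

strict = sorted({g for g, xyz in candidates if in_hull(xyz, planes) >= 0})
relaxed = sorted({g for g, xyz in candidates if in_hull(xyz, planes, tol=0.1) >= 0})
print(strict)
print(relaxed)
```

With the relaxed tolerance, the candidate lying slightly outside the hull is reported as on the hull and its genus is retained. Note that in this sketch a larger tolerance also reclassifies points just inside the surface from 1 to 0.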

Key results and observations from Paper I and further related analyses:
• physicochemical properties of selected specialized metabolites change in different plant groups
• evolutionary processes in plant specialized metabolism and the resultant metabolic diversification are reflected in chemical property space
• the methods developed can be used to:
   o measure the chemical diversity of a set of compounds, e.g. a type of metabolite or compounds belonging to a selected plant group
   o define the chemical potential of a taxon in terms of physicochemical properties
   o predict the chemical and possible pharmacological potential of (related) taxa → guided plant selection in drug lead discovery
   o define a volume in chemical space with desired properties and retrieve compounds exclusively thereof → guided plant selection in drug lead discovery

Scaffold and molecular diversity of STLs

The aim of Paper II and further related analyses was to analyze scaffold and molecular diversity of STLs in different families of the angiosperms, both in a qualitative and quantitative manner, using different chemoinformatic tools. For this purpose, a dataset was compiled that comprised over 5,200 STLs (CAS) and, including taxonomic annotations, more than 8,600 entries. The transformation of isomeric SMILES to uniform canonical SMILES resulted in 4,394 unique molecular structures, and the generation of MF-scaffolds for each compound resulted in 1,238 unique MFs. A quantitative summary of entries in total, number of genera, CAS, unique canonical SMILES and unique MFs per plant family subset is provided in Table 1. Subsets that contained fewer than ten CAS were not considered significant to make general statements about the chemical diversity of a plant family. They have been included in the analyses but are not discussed further.

Table 1. Quantitative summary of the plant family subsets used in Paper II

Family             Entries  Genera   CAS  Canonical SMILES  MF SMILES
Apiaceae               318      14   232               212         76
Aristolochiaceae        38       1    21                18         15
Asteraceae            7337     250  4472              3688        965
Canellaceae             47       5    34                30         11
Chloranthaceae         308       3   196               181         89
Coriariaceae            20       1    13                12          5
Lamiaceae               39       2    36                33         16
Lauraceae              183       5   134               118         71
Magnoliaceae            86       2    47                41         23
Picrodendraceae         15       1    15                15          7
Schisandraceae         143       1    94                85         43
Winteraceae             14       3    10                 9          7
Zingiberaceae           30       1    23                19         13
Asparagaceae             1       1     1                 1          1
[illegible]              2       1     2                 2          1
Limeaceae                1       1     1                 1          1
Malvaceae                9       4     9                 9          4
Menispermaceae           5       1     5                 5          3
Nyctaginaceae            2       1     2                 2          1
[illegible]              9       1     9                 8          6
Putranjivaceae           1       1     1                 1          1
Rosaceae                 2       2     2                 2          2
Rubiaceae                2       1     2                 2          1
Rutaceae                 3       1     3                 3          1
Xanthorrhoeaceae         4       1     4                 3          1

Absolute numbers of: dataset entries including annotation of plant sources down to the taxonomic rank of the species (Entries), genera per plant family (Genera), CAS Numbers (CAS), unique canonical SMILES strings and MF SMILES strings. Families with fewer than ten CAS Numbers are italicized.

In total, STLs were compiled from at least 1020 species belonging to 305 genera from 25 families in the angiosperms. As can be seen from Table 1, the families vary in their number of entries (including taxonomic annotation), different STLs (CAS Numbers) and further associated records. The main STL-producing family is Asteraceae with over 7,300 entries and more than 4,400 different STLs. Chloranthaceae and Apiaceae follow with over 300 entries and around 200 different STLs. The distribution of STLs has further been examined down to the rank of the species, and a range of dominant genera could be identified that contained a high number of entries and different STLs. This included for example Artemisia, [illegible], Eupatorium, Inula (Asteraceae), and Chloranthus (Chloranthaceae) (over 200 entries and 150 different CAS). A quantitative summary as presented above is provided for the 305 genera in Table S4 (Supporting information).

STL skeletons

The amount and distribution of STL skeleton classes was determined per plant family using the unique STLs (5,271 CAS). This revealed the diversity of skeletons in general and the proportions of skeleton classes relative to other STL-producing families. As represented in Fig. 14a, the carbocyclic skeletons germacranolide, guaianolide, and eudesmanolide represent the majority of the skeletons among the compiled STLs (STLs_total); however, this occurrence does not reflect the distribution among the analyzed plant families. Instead, the frequency of occurrence varies considerably and reveals that many plant families predominantly produce certain STL skeleton classes, and that some skeleton classes show a very restricted occurrence across families. Zingiberaceae and Apiaceae both predominantly produce guaianolides, whereas Aristolochiaceae, Magnoliaceae, Lauraceae, and Lamiaceae in particular produce germacranolides. Because the Asteraceae represent a large part of the entire dataset, they reflect approximately the distribution of the complete dataset with predominantly germacranolides and guaianolides. In addition, some families produce uncommon skeletons, such as xanthanolides or elemanolides. Very rare skeletons, seco-prezizaane, drimanolide, or picrotoxane, are solely found in Schisandraceae, Canellaceae, Winteraceae, Coriariaceae, and Picrodendraceae, respectively.

The distinct distribution of different classes was captured to the taxonomic rank of the species (Fig. 14b), i.e. the absolute occurrences of all STLs across the species belonging to a family (8,619 entries). With 572 reported species, the germacranolides show the broadest distribution among the angiosperms (STLs_total), which is in accordance with literature [70]. A similarly broad distribution is seen with guaianolides. In many cases, the skeletons show similar proportions in species distribution as observed in skeleton occurrence per family. However, the distributions also show deviations.
For example, the eudesmanolides found in the Lauraceae and the lindenanolides from Chloranthaceae are less widely distributed across their species than what might have been expected based on their high frequency of occurrence. This means that certain types of skeletons are mainly produced by comparably few species. Due to the detailed taxonomic annotation of the dataset, such distribution analyses can also be performed on the taxonomic rank of either genus or species. An example thereof is presented in Fig. S1 (Supporting information). The summary of skeleton distribution across families and their respective genera is available on request.

[Figure 14, panels a) "Skeletons per plant family" and b) "Species distribution": stacked bar charts, X-axis = family, Y-axis = occurrence (0 to 100 %), colored by skeleton class: germacranolide, guaianolide, eudesmanolide, pseudoguaianolide, eremophilanolide, lindenanolide, elemanolide, xanthanolide, seco-prezizaane, drimanolide, picrotoxane, other.]

Figure 14. Frequency of occurrence and distribution of different STL skeletons. a) Relative (percentage, Y-axis) and absolute frequencies of occurrence (numbers in bars) of different skeleton classes per family (5,271 CAS). b) Distribution of different skeleton classes among species (= all entries excluding “NA_spec”). The sequence of families is according to phylogenetic relationships (cf. Fig. 5). The occurrence of different STL skeletons varies considerably between the families.

Scaffold diversity analyses

Analyzing the scaffold diversity showed that the plant families differ in their degree of diversity and possess specific sets of molecular frameworks with a considerable variation in their frequency of occurrence. Depending on the metric used to evaluate scaffold diversity, different observations were made.

The detailed results of MF frequency counts, CSR curves and SSE values are summarized in Table 1 of Paper II. Based on MF frequency counts and CSR curves, the plant families showed different levels of diversity. To give an immediate overview of the scaffold diversity of each family, Fig. 15 presents the CSR curves, which assess the complete distribution of the MFs among the compounds. To obtain these curves, the MFs are first ordered by their frequency of occurrence (most to least common). The fraction of MFs is plotted on the X-axis and the fraction of compounds that contain those MFs is plotted on the Y-axis. Thus, a dataset with maximum diversity would contain a different MF-scaffold for each STL and the curve would correspond to a diagonal. Vice versa, minimum diversity would be represented by a dataset in which all compounds have the same MF-scaffold and the curve would be a vertical line.
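The construction of a CSR curve described above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow used in Paper II: the function names, the toy scaffold labels, and the use of the area under the curve as a summary value are assumptions made for this example.

```python
from collections import Counter


def csr_curve(scaffolds):
    """Return the CSR curve as (fraction of MFs, fraction of compounds) points.

    `scaffolds` holds one scaffold identifier per compound.
    """
    # Order the MFs by frequency of occurrence, most to least common.
    counts = sorted(Counter(scaffolds).values(), reverse=True)
    n_scaffolds = len(counts)
    n_compounds = sum(counts)
    points = [(0.0, 0.0)]
    covered = 0
    for rank, count in enumerate(counts, start=1):
        covered += count
        points.append((rank / n_scaffolds, covered / n_compounds))
    return points


def area_under_curve(points):
    """Trapezoidal area under the curve; 0.5 corresponds to the diagonal."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: the labels are placeholders, not MFs from the dataset.
diverse = ["mf1", "mf2", "mf3", "mf4"]        # one scaffold per compound
skewed = ["mf1"] * 7 + ["mf2", "mf3", "mf4"]  # one dominant scaffold

print(area_under_curve(csr_curve(diverse)))
print(area_under_curve(csr_curve(skewed)))
```

In this sketch the maximally diverse toy set follows the diagonal, whereas the skewed set rises steeply at the first scaffold, which mirrors the reading of the curves given in the text.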

Figure 15. CSR curves for all family datasets. Curves that rise more steeply (e.g. Asteraceae, sand) suggest lower diversity, whereas curves closer to the diagonal (e.g. Aristolochiaceae, orange) indicate larger scaffold diversity.

In Fig. 15, clear differences in MF diversity can be seen between the plant families. Whereas the curves for families such as Asteraceae and Apiaceae rise more steeply and indicate a lower diversity, the curves for Aristolochiaceae, Winteraceae, and Zingiberaceae suggest a larger diversity of these datasets since they are closer to the diagonal. The results of MF frequency counts and the quantitative analysis of the CSR curves are discussed in detail in Paper II. The key finding is that differences in scaffold diversity exist between the families. In general, intermediate diversity was observed. Thus, it can be inferred that the known high structural diversity of STLs is likely to be driven by various substitutions on the different MFs rather than by the diversity of the MF-scaffolds themselves.

SSE values, however, indicated high diversity almost throughout and partly contradicted the low diversity measured for some plant families by the other metrics (discussed in more detail in Paper II). This metric measures the specific distribution of molecules among the most populated MFs of each dataset. Thus, it is less affected by differences in dataset size and enables a more refined and comparable assessment of scaffold diversity. For example, the datasets of Lauraceae and Magnoliaceae showed the highest diversity when considering their ten most populated MFs, whereas the previous metrics indicated intermediate diversity for these datasets. The Apiaceae and Asteraceae datasets were classified as datasets with low diversity by the previous metrics, and the SSE values for the Apiaceae reflected low diversity as well. In the case of the Asteraceae, however, the SSE values reflected continuously high diversity up to their 60 most populated MFs. Such discrepancies emphasize the importance of complementary methods to assess scaffold diversity from different angles.

Besides the determination of the inherent scaffold diversity of each plant family, the MFs were further analyzed by visual inspection. Figure 16 provides an overview of the most common MFs for each plant family. As pointed out above with regard to the distribution of STL skeleton classes, the absolute distribution of MFs can vary among the species as well. Thus, the three most common MFs are illustrated per family dataset in two ways: the most populated MFs with respect to their unique STLs, and the MFs that show the highest absolute occurrence among their species. In cases of equal frequencies, MFs were chosen in favor of depicting different MFs. Only two scaffolds are presented if all further MFs were singletons.

It becomes clear from Fig. 16 that the most common MFs are quite different between the plant families. Similar MFs of two families that belong to the same skeleton class differ, e.g., in the positions of double bonds. Often, the most populated MFs are also among the three MFs with the highest absolute occurrence among species. In some families, however, e.g. Magnoliaceae or Zingiberaceae, other MFs are found more frequently across their species. This indicates that even if a plant family possesses the ability to produce a variety of different scaffolds, certain pathways seem to predominate at the genus or species level.
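To make the SSE metric discussed earlier in this section more concrete, the following sketch assumes that SSE denotes the scaled Shannon entropy as it is commonly defined in scaffold diversity studies: the Shannon entropy of the compound distribution over the n most populated MFs, divided by log2(n). The function name, the `n_top` parameter and the toy data are hypothetical; the exact settings used in Paper II may differ.

```python
import math
from collections import Counter


def scaled_shannon_entropy(scaffolds, n_top=10):
    """Scaled Shannon entropy over the `n_top` most populated scaffolds.

    Returns a value between 0 (all compounds share one scaffold) and
    1 (compounds are evenly distributed over the considered scaffolds).
    """
    counts = sorted(Counter(scaffolds).values(), reverse=True)[:n_top]
    n = len(counts)
    if n < 2:
        return 0.0  # a single scaffold carries no diversity
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts)
    return entropy / math.log2(n)


# Hypothetical toy data, not taken from the STL dataset.
even = ["a", "b", "c", "d"]
skewed = ["a"] * 97 + ["b", "c", "d"]

print(scaled_shannon_entropy(even))
print(scaled_shannon_entropy(skewed))
```

Because only the most populated scaffolds enter the calculation and the result is scaled by log2(n), values from datasets of different size remain comparable, which is the property the text attributes to this metric.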

Figure 16. Overview of the most populated MFs per plant family. Left: MFs with the highest occurrence among the unique STLs (panel title: Most common MFs - Occurrence in unique STLs). Right: MFs with the highest absolute occurrence (= distribution among species; panel title: Most common MFs - Absolute occurrence). Families shown, from top to bottom: Schisandraceae, Canellaceae, Winteraceae, Aristolochiaceae, Magnoliaceae, Lauraceae, Chloranthaceae, Zingiberaceae, Coriariaceae, Picrodendraceae, Asteraceae, Apiaceae, Lamiaceae.

To depict the most populated MFs and their frequency of occurrence at the same time, the “Molecule Cloud” [77] was employed. In this cloud, the size of each MF image corresponds to its frequency in the dataset, i.e. the larger the picture of an MF, the higher its frequency of occurrence. As an example, the 30 most abundant MFs are illustrated for the plant families Apiaceae and Lauraceae in Fig. 17. Molecule clouds of all plant families are provided in the Supplementary data of Paper II (Fig. S2 – S15) together with the distribution of MFs in the form of histograms. In the case of the Apiaceae, a few larger molecule images and several smaller images appear, showing that some MFs occur rather often, which reflects low scaffold diversity. In the Lauraceae, high scaffold diversity is reflected, since the scaffolds are more evenly distributed among the compounds, resulting in images of similar size.

Using the Molecule Cloud to visualize the most populated MFs gave a simplified overview of the different MFs and STL skeletons in the plant families. In addition, it allowed further STL subgroups to be pinpointed. For example, whereas the Lauraceae produce a variety of isomers and subgroups of different STL classes, with both cis- and trans-fused lactones, the Apiaceae seem to be rather restricted. As determined in the skeleton class analysis, members of the Apiaceae predominantly produce STLs with a guaian-skeleton. It is striking that the Apiaceae apparently produce almost exclusively guaianolides with cis-fused lactones (6,12-guaianolides). In fact, only three guaianolides with a trans-fused lactone (8,12-guaianolides) are recorded in this family subset. This could provide another explanation as to why the Apiaceae showed lower scaffold diversity in the different scaffold diversity metrics.

Calculation of the MF overlap between the plant families confirmed that the plant families show only low similarity to each other and in fact produce family-specific MFs, even though these MFs belong to the same skeleton class. As presented in the form of a similarity matrix in Fig. 18, hardly any similarity was found between the MFs of the different plant families. In general, the overlap indices (OI) are mostly low, which points towards the uniqueness of the MFs in each family. A slight overlap was found both between phylogenetically more closely related families (e.g. Magnoliaceae and Lauraceae, OI = 0.08) and between distantly related families (Zingiberaceae and Lamiaceae, OI = 0.07). Unsurprisingly, Canellaceae and Winteraceae, and Picrodendraceae and Coriariaceae, only showed overlap with each other, whereas the Schisandraceae did not show any overlap with other families.
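A pairwise overlap matrix of this kind can be sketched as follows. The formula of the OI is not given in this summary, so the sketch assumes one plausible set-based definition, the Jaccard index (shared MFs divided by all distinct MFs of both families), which also ranges from 0 to 1; the definition used in Paper II may use a different normalization. Family names and MF identifiers below are placeholders.

```python
def overlap_index(mfs_a, mfs_b):
    """Assumed overlap index: Jaccard index of two sets of MF identifiers."""
    a, b = set(mfs_a), set(mfs_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def overlap_matrix(family_mfs):
    """Return {(family_1, family_2): overlap index} for all family pairs."""
    families = sorted(family_mfs)
    return {(f, g): overlap_index(family_mfs[f], family_mfs[g])
            for f in families for g in families}


# Hypothetical MF sets; identifiers are placeholders, not MF-IDs of the dataset.
family_mfs = {
    "family_A": {"mf1", "mf2", "mf3"},
    "family_B": {"mf2", "mf3", "mf4"},
    "family_C": {"mf9"},
}

matrix = overlap_matrix(family_mfs)
print(matrix[("family_A", "family_B")])
print(matrix[("family_A", "family_C")])
```

With canonical MF-SMILES strings as identifiers, the same comparison works directly on string sets, since identical frameworks then map to identical strings.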

Figure 17. Molecule Clouds of the family (a) Apiaceae and (b) Lauraceae. The 30 most populated MFs are depicted for each dataset. Larger molecule images represent MFs that occur several times, whereas smaller images stand for MFs that are seldom found. Colors have been added at random for better visibility.

Similarity Matrix - Molecular Framework Overlap (plant family versus plant family, colored by Overlap Index). Values labeled in the matrix rows: Canellaceae 0.38, Magnoliaceae 0.08, Zingiberaceae 0.07, Coriariaceae 0.33.

Figure 18. Overlap index matrix. The OI quantifies the similarity between the MF sets of two plant families, ranging between 0 and 1. Values closer to 1 express a higher similarity (= Max, dark blue), whereas 0 means no common MF (= Min, light blue).

Clustering analysis

The clustering analysis revealed similarities and differences between the compounds. The metrics used, ECFP_6 and a TI of 0.4, made it possible to qualitatively divide the STLs into smaller groups with similar structural features. The 4,394 unique compounds (uniform canonical SMILES) separated into 518 clusters, of which 71 were singletons. More than half of the clusters contained 2 – 10 compounds. This reflected a large diversity of STLs, as many clusters were formed and the majority of them contained comparatively few compounds.

To investigate the distribution of the plant families and their genera among the clusters, the unique compounds were tagged with their associated plant origin (= 8,619 entries). First, the distribution among the clusters can give indications about the molecular diversity found in each plant family. A broad distribution suggests a higher diversity, whereas a limited distribution suggests a lower diversity. Furthermore, mixed clusters, i.e. clusters occupied by STLs belonging to several families or genera, can reveal similarities. Vice versa, clusters formed by compounds exclusively produced by certain taxa can reveal their uniqueness. The distribution of the families, their overlap with other families, as well as examples of genus distribution are discussed in detail in Paper II. Below, the results are summarized to give an overview of the most important findings.

Figure 19 shows the distribution of the families among the clusters, including excerpts of clusters and example compounds. In general, the distribution among the clusters correlates with the number of STLs produced by a plant family. For example, the Asteraceae show the broadest distribution, as they contain the highest number of unique structures, whereas families with very few STLs, such as Winteraceae, only occur in a small number of clusters (cf. pie chart Fig. 19a). However, a higher number of compounds does not necessarily correspond to a higher chemical diversity, as has already been shown in the scaffold diversity analysis. Schisandraceae are distributed among only 25 clusters, even though they produce 85 unique STLs (canonical SMILES); in comparison, the Magnoliaceae reflect a higher chemical diversity of their 41 STLs, as they are distributed among 29 clusters.

Figure 19. Clustering analysis. a) Overview of plant family distribution among the 518 clusters (8,619 entries). Cluster size reflects the number of entries in each cluster. Color-coding according to plant family. b) Excerpt of exemplary clusters including representative STLs. Cluster #74 and #112 represent overlapping clusters comprising compounds from several families, whereas cluster #110 is a separate cluster with compounds solely found in the Schisandraceae. c) Distribution of plant genera among the clusters. Excerpt of example clusters (highlighted). Color-coding according to plant genus. STLs exclusively produced by genera belonging to the tribe Vernonieae (Asteraceae) formed the overlapping cluster #57. Unusual guaianolides solely found in the genus Achillea (Asteraceae) formed the separated cluster #61.

Little similarity was found between the plant families, expressed by rather few overlapping clusters. In total, almost 80% of the clusters belonged to distinct plant families, which demonstrated the uniqueness of their STLs. Mixed clusters formed by STLs from several plant families often contained STLs with common, more frequent skeletons, namely eudesman-, germacran-, or guaianolides (e.g. cluster #112, Fig. 19b). The combination Asteraceae-Magnoliaceae was found more frequently, and inspection of their compounds revealed that this increased overlap is due to the fact that they share similar STLs with an α-methylene-γ-lactone (e.g. cluster #74). In contrast, most of the other families that produce ‘common’ STLs contain α,β-unsaturated lactones or further different types of lactones, which will be discussed in more detail below.

Another common overlap was obtained between Asteraceae and Apiaceae, and closer inspection revealed that this overlap is almost exclusively based on STLs different from the guaian-skeleton. Most likely, and as is continuously being investigated, there exists an Apiaceae-type STL biosynthesis, which includes different putative precursors and enzymes, resulting in different guaianolides [103, 104]. The cluster separation of guaianolides supports this view and implies that the clustering analysis makes it possible to detect a possible differentiation of biosynthetic pathways. As expected, the Schisandraceae did not show any overlap with other families due to their seco-prezizaane-type skeletons (e.g. cluster #100). Likewise, the drimanolide-producing families Canellaceae and Winteraceae only overlapped with each other or formed separate clusters, as did the picrotoxane-producing families Coriariaceae and Picrodendraceae.

Further described in Paper II is the investigation of separate family clusters on the taxonomic rank of the genus. Clusters were identified which contained STLs from closely related genera or solely from one genus, emphasizing their separation and uniqueness. Examples are shown in Fig. 19c. These clear separations highlighted the chemical diversity within a plant family and pinpointed similarities and differences of their genera. Likewise, clusters with very few molecules or singletons represented unique and more uncommon structures. Thus, the clustering analysis based on ECFP_6 enables the identification of important structural features, such as different lactones and other functional groups, but also of the smallest structural variations. Together with the detailed taxonomic annotation, it provides a classification of the compounds in its entirety, i.e. considering molecular criteria and specific plant origin.
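The idea of grouping compounds by fingerprint similarity and then checking which clusters are shared between families can be illustrated with a small, self-contained sketch. Several points are assumptions made for this example: fingerprints are represented as plain sets of integer feature identifiers standing in for ECFP_6 features, the value 0.4 is interpreted as a minimum Tanimoto similarity, and a simple leader algorithm is used because the clustering algorithm is not specified in this summary. All data are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of feature IDs."""
    if not fp_a and not fp_b:
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def leader_cluster(fingerprints, threshold=0.4):
    """Assign each compound to the first cluster whose leader is similar enough.

    A stand-in for the clustering step, not the method used in Paper II.
    """
    leaders, labels = [], []
    for fp in fingerprints:
        for cluster_id, leader in enumerate(leaders):
            if tanimoto(fp, leader) >= threshold:
                labels.append(cluster_id)
                break
        else:
            # No sufficiently similar leader: the compound starts a new cluster.
            leaders.append(fp)
            labels.append(len(leaders) - 1)
    return labels


def families_per_cluster(labels, families):
    """Collect the set of plant families represented in each cluster."""
    members = {}
    for label, family in zip(labels, families):
        members.setdefault(label, set()).add(family)
    return members


# Hypothetical feature sets and family tags, one per compound entry.
fingerprints = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {1, 2, 6, 7}, {10, 11, 13}]
families = ["family_A", "family_B", "family_C", "family_A", "family_C"]

labels = leader_cluster(fingerprints, threshold=0.4)
members = families_per_cluster(labels, families)
mixed = sorted(c for c, fams in members.items() if len(fams) > 1)
print(labels, mixed)
```

In this toy example one cluster contains compounds from two families (a mixed cluster), while the others are family-specific, which corresponds to the distinction between overlapping and separate clusters drawn in the text.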

The results obtained from the various analyses to assess the chemical diversity of STLs can be ascribed to a range of theories on evolutionary processes concerning chemical diversity in plants and metabolic diversification. As the goal of Paper II was a structural classification and scaffold diversity analysis, potential biological reasons behind the observed diversity and distribution of STL types were out of scope. However, as described in the introduction and discussed in more detail in Paper I, the results are likely to be caused by the continuous branching and extension of existing pathways, enhanced levels of enzyme catalytic promiscuity, or broadened substrate recognition of the enzymes involved [8, 9, 89]. The concept of metabolic pathway branching from “hub metabolites” [9] could explain why most of the plant families that produce “common” STL skeleton classes are able to produce the same set of different classes, as they all derive from the same precursor, FPP. The branching from hubs also provides a good explanation for the low overlap of the families in both the MF and the clustering analyses. The potential hub metabolite FPP and its subsequent intermediates are further recruited by several, most likely lineage-specific enzymes for modification (e.g. CYPs), which leads to structurally and functionally different STLs. As a result, lineage-specific MFs and STLs are produced, which was reflected in the analyses.

The compiled STL dataset includes CAS Numbers, canonical SMILES strings, and the plant origin of each compound (family, genus, specific epithet), together with the results of the study. Each entry is annotated with MF-IDs, MF-SMILES strings, skeleton class, and cluster size and ID. Furthermore, each compound has been annotated with its lactone type. The complete dataset has been deposited in the Dryad digital repository: https://doi.org/10.5061/dryad.1h282n8 and is available on request. This dataset represents the latest detailed compilation of STLs and offers a compelling resource for chemosystematics, natural product research and drug lead discovery focusing on STLs. Due to the detailed taxonomic annotation, it provides the basis for phylogenetic implementations, and since STLs are of pharmacological importance and represent a source of new drugs, it is of high value for a guided search for plant-derived drug leads. In Paper II, some of the results were further utilized to demonstrate a potential implementation combining the clustering analysis and phylogenetics, which is briefly summarized below. Further possible applications of the dataset are exemplified.

Towards a guided plant selection in drug lead discovery (II)

The clustering analysis including taxonomic annotation enabled pinpointing of the distribution of different STL types and revealed similarities and distinct branching points of genera or even species. Thus, it represents a useful tool to bridge phylogenetics and phytochemistry, since it reveals common and unique compounds in different plant groups. Figure 20a shows an excerpt of three clusters containing different guaianolides found in genera belonging to the family Apiaceae. The example compounds illustrated for each cluster mainly differ by substitutions at various positions, as compared to the reference compound thapsigargin. The distribution of the genera among the clusters is captured in the form of a histogram, which reveals tendencies and separations. Hence, the observed clusters highlight the distinct distribution of genus-specific or species-specific chemical features and facilitate the establishment of possible chemosystematic markers. These in turn can be used for character annotation on a phylogeny in order to define the chemical potential of the members of a taxon, which has been implemented as shown in Fig. 20b. With the exception of Laser trilobum and Laserpitium siler, only the species belonging to the Thapsia clade produce highly substituted guaianolides with a hydroxyl at position C-7, belonging to the compound group called thapsigargins (cluster #65) [105]. Apart from the outgroup (and Thapsia scabra, see Paper II), the clade containing Laserpitium prutenicum and Daucus littoralis is defined by cluster #20. These species produce compounds with fewer or other substituents in comparison to the compounds in the other two clusters, e.g. an α,β-unsaturated ketone at C-2. Daucus littoralis shows a clear separation, as it does not produce compounds found in other clusters. Thapsia villosa, on the other hand, can give indications about a possible change in the biosynthetic pathway in the Thapsia clade, since it is the only Thapsia species that produces STLs found in cluster #13 and #65.

Figure 20. Analysis of clusters and their distribution over a phylogeny. a) Representative structures are depicted for each cluster (cluster 20, cluster 13, cluster 65), with absence or presence of substituents highlighted in red and green, respectively, using thapsigargin as reference compound. b) Character annotation on phylogeny: the clusters with the same color-coding are used as chemosystematic markers to demonstrate the distribution of chemical structure types on a phylogeny, denoting presence of compound types (cluster 20 = red, cluster 13 = blue, cluster 65 = green). The maximum likelihood tree is based on the publication by Banasiak et al. (2016) and has been simplified considering only branches with a bootstrap support over 85%. The genus Ferula has been used as outgroup in concordance with previously published studies. Taxa shown: Ferula spp., Ammodaucus leucotrichus, Laserpitium latifolium, Laserpitium krapfii, Thapsia garganica, Thapsia gymnesica, Thapsia transtagana, Thapsia scabra, Thapsia maxima, Thapsia villosa, Laserpitium siler, Laserpitium archangelica, Laser trilobum, Laserpitium prutenicum, Daucus littoralis. Clade labels: Outgroup, Laserpitium s.str., Thapsia, Siler, Laser, Silphiodaucus, Daucus II.

In the search for a promising plant source that might provide new potential drug leads, the compounds in the clusters can in addition be annotated with their biological activity. Since the clustering analysis reveals the specific distribution of structurally similar STLs in different genera and their respective species, it can also be used for prediction purposes. Taking the above data as an example, it becomes obvious that mostly members of the Thapsia clade produce the complex thapsigargins (cluster #65). In addition, it appears that they almost exclusively produce this particular scaffold. STLs of this type mostly have sensitizing properties [105] but, more importantly, this cluster contains thapsigargin, the active moiety of the anticancer prodrug mipsagargin [106]. Structurally similar compounds are likely to have similar biological activities [35], and since it can be assumed that related taxa produce similar metabolites, they represent the next promising plant sources for drug lead discovery. In this case, the search for similar compounds can be narrowed down to members of the Thapsia clade which have not previously been investigated phytochemically, and later expanded to members of the Laser and Siler clades. Simultaneously, species or genera can be excluded from the search for desired STLs. For example, members belonging to the clade Daucus II can be ruled out from further phytochemical investigations since they most likely produce other scaffolds. This strategy allows a group of plants to be shortlisted as the next promising plant candidates.

With this simplified example, the feasibility of combining the clustering analysis with phylogeny was demonstrated. To define the chemical potential of a taxon more precisely, track possible changes in biosynthetic pathways, or resolve uncertain systematic placement of species, further clusters and related genera can be included in such combined analyses.
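Using clusters as chemosystematic markers amounts to building a presence/absence table of clusters per taxon, which can then be mapped onto the tips of a tree. The sketch below shows this step only; the function names and the species placeholders are hypothetical, and the entries are invented for illustration rather than taken from the dataset (only the cluster numbers 13, 20 and 65 echo the example above).

```python
def presence_matrix(records):
    """Build {taxon: {cluster_id: present?}} from (taxon, cluster_id) pairs."""
    records = list(records)
    clusters = sorted({cluster for _, cluster in records})
    taxa = sorted({taxon for taxon, _ in records})
    seen = set(records)
    return {taxon: {cluster: (taxon, cluster) in seen for cluster in clusters}
            for taxon in taxa}


def taxa_restricted_to(matrix, cluster_id):
    """Taxa whose compounds occur in `cluster_id` and in no other cluster."""
    return sorted(taxon for taxon, row in matrix.items()
                  if row.get(cluster_id) and sum(row.values()) == 1)


# Hypothetical entries: placeholder species, illustrative cluster membership.
records = [
    ("species_A", 65), ("species_A", 13),
    ("species_B", 20),
    ("species_C", 65),
]

matrix = presence_matrix(records)
print(matrix["species_A"])
print(taxa_restricted_to(matrix, 65))
```

Taxa restricted to a cluster of interest, and unstudied relatives of such taxa, would be the natural candidates for the shortlist described in the text, while taxa restricted to other clusters could be deprioritized.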

As demonstrated using the convex hull, the selection of a promising plant source can also take place irrespective of phylogenetic relationships. Instead, the clusters can be used to filter structurally similar STLs and genera that produce STLs with a desired biological activity. Annotating the STLs with any activity of interest reveals clusters containing active STLs. Structurally similar compounds for which no activity data exist could be selected for further tests. To find new plant sources that potentially produce similar compounds, members of the genera within these clusters that have not been investigated yet can be targeted for phytochemical and pharmacological analyses.

Only briefly discussed in Paper II are the different types of lactones of the STLs. Each compound was annotated with either α-methylene-γ-lactone, α-methyl-γ-lactone (α,β-unsaturated lactone), α-methyl-γ-lactone, or δ-lactone. STLs with different lactones, e.g. if additional side chains were present on the lactone ring or methyl groups were functionalized, have been annotated with “other lactone”. Figure 21 shows the distribution of the different lactones among the STLs in each plant family. On very rare occasions, STLs possess two types of lactones, but as they represent a minute fraction, they have been excluded from the figure.

Lactones per plant family

O O

O O

O O

O

Occurrence O

O O

R O O O O R R

Figure 21. Frequency of occurrence and distribution of different lactones. The families differ in the types of lactones they produce. Clear tendencies are visible in each family. Color code according to lactone type: violet = α-methylene-γ-lactone, green = α-methyl-γ-lactone (α,β-unsaturated lactone), yellow = α-methyl-γ-lactone, orange = δ-lactone (of any kind), teal = other types of lactones (three examples).

A typical structural feature of STLs is the α-methylene-γ-lactone, which is considered the major moiety responsible for their anti-inflammatory and cytotoxic properties [69]. Almost 60% of the STLs (CAS) possess this type of lactone (Fig. 21, STLs_total), most of which are found in the Asteraceae. When searching for new STLs with this lactone feature, it becomes obvious that most of the families could be excluded from further investigations, as they produce different types of lactones. Regarding diversity, the Lauraceae is, besides the Asteraceae, the only family to produce a variety of different lactones, although a tendency towards α-methyl-γ-lactone is seen. All other families are more restricted and mostly produce one type, or lactones other than the four defined ones.

Besides the α-methylene-γ-lactone, other functional groups such as conjugated cyclopentenones or epoxides are responsible for the reactivity of STLs as alkylating centers. Furthermore, various side chains, lipophilicity, molecular geometry and electronic features are involved in their biological activity [reviewed by 71]. The STL dataset allows for further chemoinformatic applications and simple filtering of specific STL characters since the molecular data is stored as SMILES strings. Substructure search can be used to filter the dataset for STLs with desired moieties. As an example, the dataset was filtered for STLs having a conjugated cyclopentenone, using Open Babel [64]. In total, 408 STLs from six different plant families were retrieved. Bifunctional STLs, i.e. STLs with two alkylating centers, can show higher cytotoxicity [107]. Among the filtered STLs, 181 contained both a conjugated cyclopentenone and an α-methylene-γ-lactone. With one exception in the Chloranthaceae, these compounds were exclusively found in the Asteraceae, more specifically in 62 different genera (data not shown).
Other criteria could be added to further reduce the number of genera to be targeted in the search for new STLs with both moieties. For example, individual physicochemical properties can be calculated to filter STLs with properties relevant for activity.

Another starting point for selecting STLs of interest can be the type of skeleton and MF. In a QSAR study by Schmidt et al. [108], STLs of the furanoheliangolide type, a subtype of germacranolides, were predicted to have high in vitro activity against Trypanosoma brucei rhodesiense (African trypanosomiasis, sleeping sickness), which was subsequently confirmed experimentally. Following this lead, 14 different MFs similar to the potent skeletons in the study, with different double bond positions and different lactones, were selected from the STL dataset (Fig. S2). Filtering the dataset for all STLs that possess these MFs resulted in a list of 116 STLs, produced by 20 genera belonging to four different tribes of the Asteraceae (quantitative summary in Table S5). Hence, filtering STLs based on their MFs and the associated genera that produce desired scaffolds can also be used for a guided plant selection for further studies.
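The two filtering strategies above (by moiety and by MF) follow the same pattern: select records, then count the genera and tribes that produce them. The Python sketch below illustrates that pattern under stated assumptions: all identifiers, frameworks, genera and tribes are placeholders, and the moiety labels are assumed to have been assigned beforehand, for example by a substructure search on the SMILES strings. It does not reproduce the Open Babel workflow used in the thesis.

```python
# Hypothetical moiety labels and records; nothing here is taken from the dataset.
LACTONE = "alpha-methylene-gamma-lactone"
ENONE = "conjugated cyclopentenone"

stls = [
    {"id": "stl_101", "mf": "MF_a", "genus": "Genus_A", "tribe": "Tribe_1",
     "moieties": {LACTONE, ENONE}},
    {"id": "stl_102", "mf": "MF_b", "genus": "Genus_A", "tribe": "Tribe_1",
     "moieties": {LACTONE}},
    {"id": "stl_103", "mf": "MF_c", "genus": "Genus_B", "tribe": "Tribe_2",
     "moieties": {ENONE}},
    {"id": "stl_104", "mf": "MF_a", "genus": "Genus_B", "tribe": "Tribe_2",
     "moieties": {LACTONE, ENONE}},
    {"id": "stl_105", "mf": "MF_d", "genus": "Genus_C", "tribe": "Tribe_2",
     "moieties": {LACTONE}},
]


def filter_by_framework(rows, frameworks):
    """Keep STLs whose molecular framework is in the selected set."""
    return [r for r in rows if r["mf"] in frameworks]


def filter_by_moieties(rows, required):
    """Keep STLs that carry every required moiety (set inclusion)."""
    return [r for r in rows if required <= r["moieties"]]


def summarize(rows):
    """Return (number of STLs, number of genera, number of tribes)."""
    genera = {r["genus"] for r in rows}
    tribes = {r["tribe"] for r in rows}
    return (len(rows), len(genera), len(tribes))


framework_hits = filter_by_framework(stls, {"MF_a", "MF_c"})
bifunctional = filter_by_moieties(stls, {LACTONE, ENONE})
print("framework filter:", summarize(framework_hits))
print("bifunctional filter:", summarize(bifunctional))
```

Because both filters return plain lists of records, they can be chained, e.g. to restrict the bifunctional STLs to a chosen set of frameworks before counting genera.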

In summary, this detailed dataset provides the basis for a multitude of chemoinformatic or chemosystematic analyses and will help phytochemists to further explore the diversity and distribution of chemical characters in STL-producing taxa. Chemotaxonomic classification and prediction of the occurrence of chemical characters in STL-producing taxa have successfully been performed on smaller datasets using chemoinformatic tools in combination with self-organizing maps, neural networks and k-nearest neighbors [109-111]. Since STLs are of pharmacological importance and represent a source of new drugs, the dataset serves as a starting point for a guided search for plant-derived drug leads focusing on STLs.

Key results and observations from Paper II and further related analyses
• latest detailed compilation of STLs in the angiosperms with over 5200 different STLs and over 8600 entries
• clear differences and tendencies of skeleton class production per plant family
• significant differences in scaffold and molecular diversity between the plant families
• different plant families possess specific sets of MF-scaffolds with a considerable variation in their frequency of occurrence
• low similarity between the MFs of different plant families, even though they produce STLs belonging to the same skeleton class
• determination of sesquiterpene lactone groups with similar structural features via clustering analysis
• low molecular similarity between the families
• the results and the dataset can be further utilized to:
  o define the chemical potential of a taxon, based on structural features
  o detect chemical similarities between different plant groups or the separation/uniqueness of a taxon
  o detect possible changes in biosynthetic pathways
  o detect structurally unique compounds
  o develop chemosystematic markers
  o filter compounds with structural features of interest
  o predict the chemical and possible pharmacological potential of (related) taxa → guided plant selection in drug lead discovery

Phytochemical and pharmacological investigation of Lindera benzoin

The overall aim of Paper III was to generate novel information in the field of pharmacognosy. The selection of the plant material under study was guided by the methods developed in Paper I and II in an attempt to isolate STLs with predicted scaffolds. Analyses of STL skeleton distribution and clustering at the genus level using the data from Paper II indicated a distinct distribution of different STL skeleton classes in the Lauraceae, and thus this family was selected for deeper analyses. In the search for possible new drug leads from natural sources, traditional laboratory methods were used, including extraction, isolation and structure elucidation of novel compounds, followed by biological assays.

After comparison with the literature, it was concluded that STLs are exclusively produced by members belonging to the “core Laureae”, a clade that consists of the genus Litsea and all possibly related genera [112]. Hence, in the search for new STLs, all genera in the Lauraceae that do not belong to this particular clade can be excluded from further analyses. Records for five corresponding genera were filtered from the STL dataset, namely Actinodaphne, Laurus, Lindera, Litsea, and Neolitsea. Of these, Actinodaphne and Litsea were excluded from further analyses since fewer than ten compounds have been reported from these genera, which was not considered sufficient to make statements about their general chemical diversity.

Comparing the skeleton classes of the three remaining genera, Laurus, Lindera, and Neolitsea, it became obvious that they produce different types of skeletons, with tendencies towards certain skeleton classes. For example, Lindera is the only genus to produce lindenanolides, Neolitsea predominantly produces germacranolides, whereas Laurus mainly produces eudesmanolides (Fig. 22a).

[Figure 22 graphic. Structure labels in panel b: “common” germacranolide, furanogermacranolide, lindenanolide, guaianolide, elemanolide, eudesmanolide.]

Figure 22. a) Skeleton distribution in three genera of the Lauraceae: Laurus, Lindera and Neolitsea. Color code according to skeleton. b) Example compounds of STLs with skeletons produced by these genera.

In the clustering analysis, Neolitsea and Lindera formed separate clusters but also overlapped with each other, whereas Laurus separated from the other two related genera. Closer inspection of the molecular structures revealed that even though the three genera produce STL skeletons belonging to the same class, their scaffolds differ noticeably. For example, the major reason why Laurus forms a separate cluster is that members of this genus exclusively produce STLs with α-methylene-γ-lactone, whereas Neolitsea and Lindera, with a few exceptions, produce STLs with α,β-unsaturated lactones or other types of lactones. More importantly, it was determined that Neolitsea and Lindera do not produce the more “common” germacranolides but rare rearranged germacranolides with either two lactones or a furan moiety, so-called furanogermacranolides [113] (Fig. 22b). Laurus, on the other hand, only produces “common” germacranolides.

Based on the collected data, the chemical potential of the three genera regarding STLs was defined, i.e. their skeleton class, lactone type, and physicochemical properties. The major hypotheses from this procedure were:
• Members of Lindera predominantly produce lindenanolides, germacranolides and elemanolides → Lindera is the only genus to produce STLs with a lindenane skeleton
• Members of Neolitsea predominantly produce germacranolides and elemanolides
• Germacranolides of Neolitsea and Lindera are exclusively rearranged germacranolides
• Members of Laurus predominantly produce germacranolides, guaianolides and especially eudesmanolides
• Germacranolides of Laurus have a “common” scaffold
• STLs of Laurus contain exclusively α-methylene-γ-lactones
• STLs of Neolitsea and Lindera mostly contain α,β-unsaturated or other lactone types

Insights from Paper I added:
• STLs in the Lauraceae have similar physicochemical properties, since they group together in a small and well-defined volume
• STLs in the Lauraceae are not likely to have any further (complex) substituents

Under the assumption that the compiled data are accurate and reflect reality, the occurrence of specific STL types in related species should be predictable. Furanogermacranolides have been shown to have anti-inflammatory properties [114, 115], and thus members of Lindera and Neolitsea represent interesting sources to potentially identify new anti-inflammatory furanogermacranolides. After literature research, several members of Neolitsea and Lindera were shortlisted based on phylogenetic relationships as potential candidate plants for phytochemical investigations. Unfortunately, it was impossible to obtain sufficient amounts of dried plant material from the desired species, since most species on the shortlist are distributed throughout tropical and subtropical regions [116, 113]. Hence, the final choice was made for a species at hand: Lindera benzoin (L.) Blume. This shrub is one of the two Lindera species native to eastern North America, and a large specimen growing in the Uppsala University Botanical Garden provided enough plant material for further phytochemical investigation. Since a Lindera species was analyzed, it was hypothesized that this plant is likely to produce eudesmanolides, lindenanolides, or rearranged germacranolides as its major STLs.

New bisabolane sesquiterpenes from the leaves of Lindera benzoin

The dried leaves were extracted as described in Material and Methods, and fractionation of the purified extract resulted in the isolation of three new bisabolane sesquiterpenes: one pure compound and a diastereomeric mixture. The structures of these new compounds were elucidated via MS and extensive NMR measurements and identified as 6-(2-hydroxy-6-methylhept-5-en-2-yl)-3-(hydroxymethyl)-4-oxocyclohex-2-en-1-yl acetate (1) and 3-(hydroxymethyl)-6-(5-(2-hydroxypropan-2-yl)-2-methyltetrahydrofuran-2-yl)-4-oxocyclohex-2-en-1-yl acetate (2 and 3) (Fig. 23).


Figure 23. Bisabolane sesquiterpenes isolated from the leaves of Lindera benzoin: compound 1, and the diastereomers 2 and 3.

Structure elucidation and the comparison of 13C and 1H NMR data are described in detail in Paper III. Corresponding NMR spectra are available on request. In brief, compound 1 showed almost identical spectroscopic data to the known bisabolane sesquiterpene antheminone A [118], with the only difference being the presence of an acetoxy group at position C-1 in 1 (Fig. 23) instead of a hydroxyl group. Nearly identical shifts and coupling constants for the cyclohexene moieties were found for the diastereomeric mixture 2 and 3 when compared to the 1H and 13C NMR spectra of 1. However, the signals of the side chain at position C-6 were quite different and were attributed to an oxidative ring closure of the parent compound 1, resulting in formation of a tetrahydrofuran ring.

This first phytochemical investigation did not yet result in the isolation of STLs with the predicted eudesmanolide, lindenanolide, or rearranged germacranolide skeletons, but this does not imply that the sought-after compounds are absent from the plant. The isolated compounds represent merely a minute part of the substances produced by this plant, and thus further analyses are needed to determine whether the compounds are present within the plant.

Analyzing the physicochemical properties of the new sesquiterpenes in ChemGPS-NP and comparing them to the STLs produced by Lindera species shows that the isolated compounds reside in close vicinity of the volume occupied by Lindera STLs (Fig. 24). Except that the new compounds are less aromatic (PC2, negative direction), their size (PC1) and lipophilicity (PC3) are in accordance with the previously known STLs from Lindera species.
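One way to put a number on “close vicinity” in the first three ChemGPS-NP dimensions is a Euclidean distance in PC space. The Python sketch below is only an illustration of that idea: the coordinates are invented, and the use of nearest-neighbor and centroid distances is an assumption made here, not the procedure reported in the thesis, which relied on inspection of the chemical space map.

```python
import math

# Hypothetical (PC1, PC2, PC3) scores = (size, aromaticity, lipophilicity).
# These numbers are invented and do not come from ChemGPS-NP output.
reference_stls = [(0.5, -1.0, 0.2), (0.8, -0.6, 0.4), (0.3, -0.9, 0.1)]
new_compounds = {"1": (0.6, -1.8, 0.3), "2+3": (0.7, -1.9, 0.2)}


def centroid(points):
    """Arithmetic mean of a list of equally sized coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest_distance(point, references):
    """Euclidean distance from a point to its closest reference compound."""
    return min(math.dist(point, ref) for ref in references)


center = centroid(reference_stls)
for name, coords in new_compounds.items():
    print(
        f"compound {name}: "
        f"nearest reference = {nearest_distance(coords, reference_stls):.2f}, "
        f"distance to centroid = {math.dist(coords, center):.2f}"
    )
```

Comparing per-axis differences (for instance along PC2 only) would additionally show which property drives the separation.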

[Figure 24 graphic: three-dimensional plot with axes PC1, PC2 and PC3; legend: STLs from Lindera species; sesquiterpenes isolated from L. benzoin (points labeled 1 and 2, 3).]

Figure 24. ChemGPS-NP global map of STLs from Lindera species and the isolated bisabolane sesquiterpenes from L. benzoin. First three dimensions: PC1 = size, PC2 = aromaticity, and PC3 = lipophilicity. The isolated compounds reside in the vicinity of the volume occupied by Lindera STLs.

This implies that the extraction method was well chosen to isolate compounds with properties similar to those of the desired STLs. Consequently, the remaining fractions, which could not be investigated further at this point, should be subjected to further analyses in the future. In an attempt to identify additional compounds in the remaining fractions, further analyses were kindly performed by Prof. Hesham El-Seedi. The fractions were subjected to HRMS, and the spectral data and isotopic patterns were compared to the list of compounds reviewed in [113]. Errors in exact masses were measured in ppm and accepted if < 5 ppm. Compounds were identified by matching the measured isotopic patterns with simulated isotopic patterns. The agreement between the patterns was expressed as mSigma values, with mSigma < 50 defined as a good match. This analysis revealed masses and isotopic patterns of tentative compounds in the remaining fractions that indeed possess the predicted STL skeletons, namely lindenane and eudesmane skeletons. Furthermore, STLs with an elemane skeleton and rare rearranged lindenane skeletons, which have been isolated from other Lindera species, were identified. A summary of the identified compounds is provided in the Supporting information (Table S6, Fig. S3-11), including molecular formula, experimental and calculated masses, mSigma values, and mass spectra (further identified compounds are available on request). Hence, this analysis demonstrated that L. benzoin most likely produces STLs with the predicted skeletons. This needs to be confirmed by isolation and full structural determination via NMR.
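The two acceptance criteria mentioned above can be written down compactly. The following Python sketch shows the standard ppm mass error calculation together with the thresholds stated in the text (< 5 ppm, mSigma < 50). The m/z values and mSigma scores are invented for illustration, and requiring both criteria at once is an assumption about how they were combined.

```python
def ppm_error(measured_mz, calculated_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - calculated_mz) / calculated_mz * 1e6


def is_accepted(measured_mz, calculated_mz, msigma, max_ppm=5.0, max_msigma=50.0):
    """Accept a tentative identification if both thresholds are met (assumed AND)."""
    return abs(ppm_error(measured_mz, calculated_mz)) < max_ppm and msigma < max_msigma


# Hypothetical candidates: (label, measured m/z, calculated m/z, mSigma).
candidates = [
    ("candidate_a", 247.1335, 247.1329, 12.0),
    ("candidate_b", 247.1350, 247.1329, 12.0),
    ("candidate_c", 247.1335, 247.1329, 80.0),
]

for label, measured, calculated, msigma in candidates:
    error = ppm_error(measured, calculated)
    verdict = "accepted" if is_accepted(measured, calculated, msigma) else "rejected"
    print(f"{label}: {error:+.1f} ppm, mSigma {msigma:.0f} -> {verdict}")
```

The absolute value is taken before comparison so that masses measured slightly too low are treated the same way as masses measured slightly too high.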

Pharmacological investigation

Lindera species are widely used in traditional medicine and are a great source of structurally diverse molecules with pharmacological potential [113]. A broad range of biological activities of extracts and pure compounds has been reported, e.g. cytotoxicity against various cancer cell lines [119], anti-inflammatory [120], or antimicrobial effects [121]. Only a small number of phytochemical investigations have been conducted on L. benzoin [122-125]; hence, its chemical composition and pharmacological potential remain for the most part unknown. The ethnobotanical literature revealed that Native American tribes such as the Cherokee, Iroquois, or Creek used this plant for various medicinal purposes, such as a cold remedy, diaphoretic, or analgesic [126-128]. Accordingly, it was hypothesized that the plant extract of L. benzoin and the isolated compounds could have anti-inflammatory properties.

The isolated compounds and the purified plant extract were tested in vitro for their anti-inflammatory activity in cellular assays. The inhibition of pro-inflammatory prostaglandin (PG) E2 production was investigated in A549 lung carcinoma cells as described in the Material and Methods section, and it was shown that both the compounds 1 and 2 + 3 and the purified plant extract significantly reduced PGE2 formation. Compounds 2 + 3 were slightly less potent (Fig. 25). Analysis of other prostaglandins (see Paper III) suggested that neither the compounds nor the extract are selective microsomal prostaglandin E synthase-1 (mPGES-1) inhibitors but are likely to have an effect upstream in the arachidonic acid cascade or on other pathways involved in inflammation. A cell viability test confirmed that the inhibitory activity was not the result of cytotoxic effects (Supplementary data of Paper III, Fig. 6).

Figure 25. Effect of the tested compounds and the purified extract on PGE2 production in A549 cells. A549 cells were stimulated with 5 ng/ml IL-1β and treated with compound 1, 2+3, the purified extract, or CIII and NS-398 at the indicated concentrations for 21 hours. Prostaglandins were analyzed from supernatants by LC-MS/MS. Results are presented as mean ± SD (n = 3) for one representative experiment. Statistical significance was tested with ANOVA with post-hoc correction for multiple comparisons, followed by individual Student’s t-tests. The asterisks (* p<0.05, ** p<0.01) indicate significance compared to vehicle control. The experiment was repeated three times.

The results were consistent with other studies investigating anti-inflammatory effects of extracts from Lindera species [129, 130]. Furthermore, bisabolane sesquiterpenoids have in recent years been identified as a promising class of anti-inflammatory agents in turmeric (Curcuma longa) [131, 132], which supports the findings in Paper III. The observed anti-inflammatory activity of both the purified extract of L. benzoin and the bisabolane sesquiterpenes could rationalize the traditional use of this plant by Native American tribes for medicinal purposes, such as a cold remedy or diaphoretic. In addition, the results reflect a cross-cultural usage of medicinal plants of related taxa from widely separated parts of the globe [27], since Lindera species are (or have been) used in both traditional Chinese medicine and Native American medicine.

Key results and observations from Paper III and further related analyses:
• three new bisabolane sesquiterpenes isolated from Lindera benzoin
• further results suggest that STLs with predicted skeletons are present within L. benzoin, indicating that the strategies applied for the guided selection were effective
• further analyses are needed to confirm the presence of these STLs
• anti-inflammatory properties of the isolated bisabolane sesquiterpenes and the purified plant extract
• significant reduction of pro-inflammatory prostaglandin E2 formation in A549 cells, likely not through selective mPGES-1 inhibition
• rationalization of traditional medicinal plant use

Conclusion and future perspectives

The work presented in this thesis contributes to the exploration of chemical diversity in the angiosperms. A variety of chemoinformatic tools were applied and further implemented to develop new methods to analyze and define the chemical potential of a plant and to demonstrate possible applications for a guided plant selection in drug lead discovery.

In Paper I, it was demonstrated that physicochemical properties of selected specialized metabolites differ across different plant groups. Changes in physicochemical properties were assessed using ChemGPS-NP and visual inspection. Chemical diversity was quantified by calculating the volume occupied by the compounds in chemical space. By discussing the results against the background of possible underlying evolutionary mechanisms, it was concluded that evolutionary processes and the resultant metabolic diversification are reflected in chemical property space.

In Paper II, the scaffold and molecular diversity of over 5,200 taxonomically annotated STLs was investigated. The quantity and distribution of skeleton classes were determined, and it was shown that different plant families possess specific sets of molecular frameworks, with considerable variation in their frequency of occurrence. Even though many plant families produce STLs belonging to the same skeleton class, their molecular frameworks were nevertheless quite different. Clustering analysis enabled qualitative division of STLs into smaller groups with similar structural features, highlighting the differentiation between various plant groups. In general, only low molecular similarity was found between the plant families.

The results obtained in Paper I and II made it possible to define the chemical potential of a taxon regarding both physicochemical properties and structural features. Similarities and differences were highlighted and were ascribed to a range of evolutionary processes giving rise to chemical diversity in plants and metabolic diversification.
Hence, a correspondence between chemical diversity and phylogenetic relationships was assumed. This correspondence, as well as the uncovered chemical similarity between distantly related taxa, enabled the demonstration of potential applications for a guided plant selection for drug lead discovery by combining chemoinformatic tools and phylogenetics.

In addition, results of the chemoinformatic analyses were used to define and predict the chemical potential of a taxon. Even though the predicted compounds were not isolated in the first trial in Paper III, further analyses indicated that the desired compounds are very likely present in the plant. This suggests that the strategies applied for the guided plant selection were effective. In addition, the results of this study led to the identification of new anti-inflammatory sesquiterpenes and broadened the phytochemical and pharmacological knowledge of Lindera.

Future work would first concern the continuation of Paper III with the aim of identifying the predicted STLs in the remaining fractions isolated from L. benzoin. This includes further use of chromatography techniques in order to obtain a better separation of the fractions and enable the isolation of pure compounds. Techniques such as size-exclusion chromatography (SEC), normal phase HPLC (NP-HPLC), or supercritical fluid chromatography (SFC) could be applied [133]. The isolated compounds would again need to be subjected to spectroscopic analyses, and their structures confirmed via NMR. Furthermore, additional isolated compounds should also be tested for their anti-inflammatory properties as well as for synergistic effects.

The methods developed in Paper I and II represent a way to investigate the chemical diversity of defined plant groups comprehensively and provide detailed insights into the chemical potential of a plant. The applied metrics can be modified, e.g. other dimensions can be investigated in ChemGPS-NP, and different molecular fingerprints or clustering algorithms can be used. Hence, the methods are applicable to any kind of specialized metabolite of interest and can be further adjusted depending on the goal of the study.

The comprehensive STL dataset analyzed in Paper II offers a valuable resource for chemosystematics, natural product research and drug lead discovery focused on STLs. Due to the detailed taxonomic annotation, it provides the basis for phylogenetic implementations and a guided search for plant-derived drug leads, as has been demonstrated. Moreover, the dataset can be investigated from different angles, depending on the study goal: STLs can be filtered by plant of origin or by their carbocyclic skeleton class, small groups of similar STLs from the clustering analysis can be selected, and so can groups of STLs that contain the same molecular framework or the same lactone.
Additionally, the provided SMILES strings allow for a multitude of chemoinformatic applications.

Further work regarding a guided search for plant-derived drug leads focused on STLs can include a detailed annotation of the known biological activities of these compounds. Combined with robust molecular phylogenies, this allows the biological activity of STLs to be mapped onto a phylogeny. The calculation of phylogenetic signals helps to reveal evolutionary patterns. Defining so-called “hot nodes” can help to narrow down the number of species that could be selected for phytochemical investigation [134]. In this approach, nodes with phylogenetic clustering are identified in a phylogeny, i.e. nodes in which species of a certain category are significantly overrepresented.

In this case, the category could be a specific biological activity of STLs. The D statistic [135], on the other hand, measures the strength of a phylogenetic signal and determines whether binary characters are randomly distributed over a phylogeny. Further information can be added as characters, such as different classifications of medicinal plant use, in order to reveal different phylogenetic patterns [136]. These methods can be used to highlight lineages that have not yet, or only poorly, been investigated phytochemically and hence represent the next promising plant sources in plant-derived drug discovery.
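To make the interpretation of such a signal measure concrete, the sketch below shows only the final scaling step, assuming that the D statistic referred to here follows the commonly used formulation for binary traits: the observed sum of sister-clade differences is placed between the mean value expected when tip states are shuffled at random (D = 1) and the mean value expected under a Brownian threshold model (D = 0). The three input sums would come from tree-based calculations and simulations that are not shown; the function name and the numbers are hypothetical.

```python
def d_statistic(sum_observed, mean_sum_random, mean_sum_brownian):
    """Scale the observed sum of sister-clade differences between two null models.

    Returns a value near 1 for a randomly distributed binary character and
    near 0 for a character as clumped as expected under Brownian evolution.
    """
    return (sum_observed - mean_sum_brownian) / (mean_sum_random - mean_sum_brownian)


# Hypothetical sums for a binary character such as "active STL reported: yes/no".
print(d_statistic(sum_observed=7.0, mean_sum_random=10.0, mean_sum_brownian=4.0))
```

Values below 0 would indicate a character that is more strongly clumped on the phylogeny than the Brownian expectation, and values above 1 a character that is more dispersed than random.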

Popular scientific summary

Medicinal plants have been used since ancient times and have formed the backbone of modern medicine. Important drugs discovered from plants include morphine, the pain-relieving and sleep-inducing agent isolated from poppy, and quinine, one of the most effective drugs to treat malaria, which was first isolated from the cinchona tree. Today, up to 50% of drugs on the market are either natural products (NPs) or derived from them, showcasing the importance of NPs in drug discovery. While our ability to isolate and identify NPs has advanced tremendously in recent decades, approaches to select promising plants as starting material have fallen behind. This is one problem addressed by this thesis.

The goal of this project was to understand the chemical diversity of plants better and to develop strategies for a guided search to find a promising plant source to discover new medicines.

To begin, large datasets were created that contained information about the distribution of certain types of NPs in flowering plants. In order to analyze the chemical diversity across different plant groups, modern computational methods, called chemoinformatics, were combined with phylogenetics (the knowledge of the evolutionary history and relationships among the groups of plants). The chemical compounds produced by the plants were analyzed with regard to their chemical properties and chemical structures.

The properties of NPs were investigated with ChemGPS-NP. This tool allows the comparison of compounds based on their chemical properties, i.e. how big, flexible or water soluble they are. The distribution, similarities and differences between the compounds can be easily visualized by charting their properties in so-called “chemical space”.

Furthermore, the plant compounds were analyzed by looking at their structural features, e.g. what kind of chemical backbone or substructure they possess.
So-called chemical “fingerprints” were used to define their backbone/substructure, and a clustering analysis was performed to group similar compounds together.

These analyses revealed that each plant group contained NPs with distinct structural features and chemical properties. Each of the plant groups showed different behavior in chemical space: some produced compounds with restricted chemical properties, whereas others showed broader chemical diversity. It was also found that some plant groups produced unique types of chemical backbones and particular structural features, whereas others produced similar backbones.

If it is assumed that there is a correlation between phylogenetics and chemical diversity, then phylogenetics can be used to predict the chemical potential of a plant. By combining the results of the chemoinformatic analyses with phylogenetic data, strategies for a guided plant selection were developed.

In the search for possible new drug leads from natural sources, the strategies were implemented to predict the types of compounds produced by a specific plant (spicebush). Compounds structurally similar to those predicted were found within the plant, indicating that the strategies applied for the guided search were effective. Hence, these strategies represent useful tools to select promising plant sources for the discovery of new plant-derived drugs.

Acknowledgments

This work would not have been possible without the help and support of the people I met on the way and I would like to express my sincere gratitude.

Anders, my main supervisor. Thank you for giving me the chance to be part of “MedPlant”, it was a great experience. Thank you for your help with Paper I - it was a hard piece of work. You were always generous, especially when it comes to beer.

Thierry, my second supervisor. Thank you for running calculations of any kind whenever I asked for it, answering my questions in detail and replying fast to emails. Thank you also for all the effort you spent working on Paper II. You were there when I needed you.

Rosa: I consider myself lucky that I had you as a teacher and as a friend. I don’t think you are aware but besides the amount of work, you contributed so much by patiently answering my over the top panic-messages.

Sabine: You and your working group were the best thing that happened to me during this time! I will never forget your outstanding supervision. If there is someone I look up to, it is you. A special thanks goes also to Thomas for helping out with anything at any time in the lab! I thank you and all the co-authors for their contribution to Paper III.

Julia: “5 minutes before 12” you simply put your own work aside and completely focused on me. I have never seen anything like that – you were my knight in shining armor. Thank you for your aid to finalize the work on Paper III.

Hesham: Thank you so much for spending extra time and effort to run additional analyses for me. It made me truly happy.

Ulf: Thank you for getting me into contact with the people from KI, and your concern and help in the last moments. You will bring this group forward.

I would like to thank the organizers and all members of the Marie Curie ITN “MedPlant: Phylogenetic exploration of medicinal plant diversity”. Every summer school was a blast and broadened my horizon.

A special thanks goes to the PhD-students and colleagues that simply made my days. Elisabet, without you I would not have made it. You are pretty much the only person that knows exactly how I felt during this time. One day you told me that I cry “graciously” and the next day you called me “hackerb*tch” – both made me genuinely happy. Karin: you can brighten up the darkest days. I am so happy that we found each other and I will make sure that we will never run out of Ahoj-Brause! Luuuke! You have no clue how grateful I am. Thank you for all your help, explaining NMR, running extra analyses with me, and of course proofreading the puke. Camilla, thank you for all the fun we share outside of work (I’m thinking about a very specific memory here). Erik, I was never a picky eater but you definitely taught me to eat everything without even questioning what it might be (“So I just ate cock balls?!”). Taj, we need more people like you. Thank you Scott for crushing English with me and proofreading Paper II. Lindon, thanks for checking up on me and listening to my nagging.

I would also like to thank colleagues (past and present) in the research group of Pharmacognosy. Thank you Blazej, Christina, Sunithi, Paco, Adam, and Quentin, for multicultural talks during fika and being encouraging. Thank you Momo for all the good times in !

And: thank you very much Birgitta, Curt and Anders K.

I also wish to acknowledge my family.

Mum and Dad: Without your support I would never have come this far. I owe you my deepest gratitude.

Dad: In the end, probably only a² + b² = c² will stick. But at least I have a doctorate now. The next project is M + A.

Mum: The feeling of being punished without even knowing what for. That is how the last few years felt almost all of the time. But if there is one thing I learned from you, it is to grit my teeth and persevere!

Sebastian: Your Sorgenfresser (worry eater) had to take quite a beating. I wish with all my heart that things will go very differently for you. Thank you for all the nonsense we send back and forth.

Nini. Never. Again.

My husband Martin. Love of my life. Thank you for being my tank. With you by my side I can do anything!


Supporting information

Volume calculation and filtering of compounds with activity against protozoan parasites within the defined volume

Table S1. STLs with confirmed activity against T. cruzi, charted in the ChemGPS-NP global map. Based on these compounds the convex hull was calculated.

CAS              PC1     PC2     PC3  Family        Genus         Spec_epit.
3533-47-9     -2.405  -0.920  -0.490  Asteraceae    Ambrosia      tenuifolia
6617-45-4     -2.607  -0.926  -0.396  Asteraceae    Ambrosia      tenuifolia
23971-84-8    -2.776  -0.959   0.553  Asteraceae    Anthemis      auriculata
882677-14-7   -2.393  -1.091  -0.088  Asteraceae    Anthemis      auriculata
882677-15-8   -1.899  -1.093   0.425  Asteraceae    Anthemis      auriculata
36150-07-9    -2.920  -0.435   0.207  Asteraceae    Artemisia     douglasiana
6754-13-8     -2.568  -0.751  -0.441  Asteraceae    Gaillardia    megapotamica
60066-35-5    -1.001  -0.891  -0.330  Asteraceae    Lychnophora   pohlii
69883-97-2    -1.366  -0.889  -0.187  Asteraceae    Lychnophora   NA_spec
71939-83-8    -0.820  -0.964  -0.130  Asteraceae    Lychnophora   pohlii
77448-64-7    -1.232  -0.776   0.445  Asteraceae    Lychnophora   pohlii
81767-50-2    -1.415  -0.698   0.245  Asteraceae    Lychnophora   pohlii
92484-33-8    -1.920  -0.875  -0.253  Asteraceae    Mikania       hoehnei
16836-47-8    -2.930  -0.451   0.117  Asteraceae    Munnozia      maronii
165171-06-2   -0.710  -1.425  -0.292  Asteraceae    Neurolaena    lobata
165171-07-3   -0.710  -1.425  -0.292  Asteraceae    Neurolaena    lobata
67506-30-3    -0.226  -1.426   0.242  Asteraceae    Neurolaena    lobata
24694-79-9     0.064  -1.223   0.584  Asteraceae    Smallanthus   sonchifolius
33880-85-2     0.265  -1.284   0.291  Asteraceae    Smallanthus   sonchifolius
72023-37-1    -0.138  -1.161   0.888  Asteraceae    Smallanthus   sonchifolius
615584-08-2   -2.669  -0.755  -0.036  Asteraceae    Xanthium      strumarium
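The hull-based filtering behind Tables S1 to S3 can be illustrated with a minimal sketch. It is not the procedure used in the thesis: instead of a dedicated convex-hull routine it uses a brute-force plane test in pure Python, which is only practical for small point sets such as the one in Table S1. The function names `hull_planes` and `inside_hull` are hypothetical, and the coordinates are the PC1 to PC3 values copied from Table S1.

```python
from itertools import combinations
from math import sqrt

# (CAS, PC1, PC2, PC3) of the T. cruzi-active STLs, copied from Table S1.
T_CRUZI_ACTIVE = [
    ("3533-47-9", -2.405, -0.920, -0.490), ("6617-45-4", -2.607, -0.926, -0.396),
    ("23971-84-8", -2.776, -0.959, 0.553), ("882677-14-7", -2.393, -1.091, -0.088),
    ("882677-15-8", -1.899, -1.093, 0.425), ("36150-07-9", -2.920, -0.435, 0.207),
    ("6754-13-8", -2.568, -0.751, -0.441), ("60066-35-5", -1.001, -0.891, -0.330),
    ("69883-97-2", -1.366, -0.889, -0.187), ("71939-83-8", -0.820, -0.964, -0.130),
    ("77448-64-7", -1.232, -0.776, 0.445), ("81767-50-2", -1.415, -0.698, 0.245),
    ("92484-33-8", -1.920, -0.875, -0.253), ("16836-47-8", -2.930, -0.451, 0.117),
    ("165171-06-2", -0.710, -1.425, -0.292), ("165171-07-3", -0.710, -1.425, -0.292),
    ("67506-30-3", -0.226, -1.426, 0.242), ("24694-79-9", 0.064, -1.223, 0.584),
    ("33880-85-2", 0.265, -1.284, 0.291), ("72023-37-1", -0.138, -1.161, 0.888),
    ("615584-08-2", -2.669, -0.755, -0.036),
]


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def hull_planes(points, eps=1e-9):
    """Return (unit outward normal, offset) for every supporting plane of the 3D hull.

    Brute force: a plane through three points bounds the hull if all other
    points lie on one side of it. Assumes the points are not all coplanar.
    """
    planes = []
    for a, b, c in combinations(points, 3):
        n = _cross(_sub(b, a), _sub(c, a))
        length = sqrt(_dot(n, n))
        if length < 1e-12:  # duplicate or collinear points span no plane
            continue
        n = tuple(x / length for x in n)
        side = [_dot(n, _sub(p, a)) for p in points]
        if all(s <= eps for s in side):
            planes.append((n, _dot(n, a)))
        elif all(s >= -eps for s in side):
            planes.append((tuple(-x for x in n), -_dot(n, a)))
    return planes


def inside_hull(point, planes, eps=1e-9):
    """True if the point satisfies every half-space inequality of the hull."""
    return all(_dot(n, point) <= offset + eps for n, offset in planes)


ACTIVE_POINTS = [row[1:] for row in T_CRUZI_ACTIVE]
PLANES = hull_planes(ACTIVE_POINTS)

# Example query: the ChemGPS-NP coordinates of one further compound.
candidate = (-0.228, -1.268, 0.175)
print("inside hull:", inside_hull(candidate, PLANES))
```

For larger datasets a proper hull algorithm would be the more sensible choice; the brute-force test is used here only because it keeps the example dependency-free and easy to follow.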

Table S2. STLs inside the convex hull (defined by STLs with activity against T. cruzi) with reported anti-leishmanial activity

CAS              PC1     PC2     PC3  Family        Genus         Spec_epit.
882677-15-8   -1.899  -1.093   0.425  Asteraceae    Anthemis      auriculata
1347761-21-0  -0.228  -1.268   0.175  Asteraceae    Calea         zacatechichi
499099-53-5   -0.410  -1.202  -0.017  Asteraceae    Calea         zacatechichi
63194-22-9    -0.226  -1.260   0.214  Asteraceae    Calea         zacatechichi
73723-67-8    -0.408  -1.194   0.022  Asteraceae    Calea         zacatechichi
94663-21-5    -1.097  -1.114   0.214  Asteraceae    Calea         zacatechichi
110268-35-4   -1.245  -0.877   0.095  Asteraceae    Elephantopus  mollis
354816-69-6   -1.301  -0.928   0.011  Asteraceae    Elephantopus  mollis
354816-71-0   -1.266  -0.935  -0.048  Asteraceae    Elephantopus  mollis
50656-66-1    -1.251  -0.900   0.054  Asteraceae    Elephantopus  mollis
59979-56-5    -1.276  -1.053   0.002  Asteraceae    Tithonia      diversifolia
59979-57-6    -1.319  -1.067   0.076  Asteraceae    Tithonia      diversifolia
75247-16-4    -1.078  -1.105  -0.237  Asteraceae    Tithonia      diversifolia
21871-10-3    -0.979  -0.924  -0.237  Asteraceae    Vernonia      amygdalina
27428-86-0    -1.092  -0.983  -0.213  Asteraceae    Vernonia      amygdalina
580-49-4      -1.983  -1.034   0.244  Asteraceae    Xanthium      macrocarpum
615584-08-2   -2.669  -0.755  -0.036  Asteraceae    Xanthium      strumarium

Table S3. STLs inside the convex hull (defined by STLs with activity against T. cruzi) with reported anti-malarial activity

CAS              PC1     PC2     PC3  Family        Genus         Spec_epit.
882677-15-8   -1.899  -1.093   0.425  Asteraceae    Anthemis      auriculata
105229-54-7   -1.123  -1.134  -0.186  Asteraceae    Artemisia     afra
1347761-21-0  -0.228  -1.268   0.175  Asteraceae    Calea         zacatechichi
63194-22-9    -0.226  -1.260   0.214  Asteraceae    Calea         zacatechichi
94663-21-5    -1.097  -1.114   0.214  Asteraceae    Calea         zacatechichi
83182-58-5    -0.210  -1.327   0.235  Asteraceae    Cyanthillium  cinereum
82460-97-7    -2.925  -0.448   0.142  Asteraceae    Dicoma        anomala
21871-10-3    -0.979  -0.924  -0.237  Asteraceae    Distephanus   angulifolius
89354-71-2    -1.004  -1.060  -0.201  Asteraceae    Distephanus   angulifolius
1404062-33-4  -1.249  -1.087   0.524  Asteraceae    Inula         montbretiana
377077-93-5   -1.072  -0.975  -0.314  Asteraceae    Koyamasia     calcarea
71939-83-8    -0.820  -0.964  -0.130  Asteraceae    Koyamasia     calcarea
80795-28-4    -0.820  -0.964  -0.130  Asteraceae    Koyamasia     calcarea
59979-56-5    -1.276  -1.053   0.002  Asteraceae    Tithonia      diversifolia
21871-10-3    -0.979  -0.924  -0.237  Asteraceae    Vernonia      amygdalina
27428-86-0    -1.092  -0.983  -0.213  Asteraceae    Vernonia      amygdalina
854073-41-9   -1.055  -1.064   0.148  Asteraceae    Vernoniopsis  caudata
854073-42-0   -1.049  -1.064   0.156  Asteraceae    Vernoniopsis  caudata
90042-12-9    -1.082  -1.197   0.169  Asteraceae    Vernoniopsis  caudata
615584-08-2   -2.669  -0.755  -0.036  Asteraceae    Xanthium      strumarium
82461-08-3    -1.950  -1.031  -0.219  Magnoliaceae  Liriodendron  tulipifera

Quantitative summary of the STL dataset

Table S4. Quantitative summary per plant genus

                                              Species   Species                   Unique
                                              (incl.    (excl.    Unique  Unique  canonical
Family            Genus              Entries  NA_spec)  NA_spec)  CAS     SMILES  SMILES MF
Apiaceae          Ammodaucus         1        1         1         1       1       1
Apiaceae          Anthriscus         1        1         1         1       1       1
Apiaceae          Athamanta          8        1         1         8       6       5
Apiaceae          Cuminum            2        1         1         2       2       2
Apiaceae          Daucus             10       1         1         10      10      2
Apiaceae          Ferula             104      11        11        75      73      31
Apiaceae          Laser              15       1         1         15      15      7
Apiaceae          Laserpitium        35       6         6         35      31      9
Apiaceae          Meeboldia          8        1         1         8       7       5
Apiaceae          Melanoselinum      25       1         1         25      21      10
Apiaceae          Rouya              6        1         1         6       5       4
Apiaceae          Smyrnium           43       7         7         30      25      15
Apiaceae          Talassia           2        1         1         2       2       1
Apiaceae          Thapsia            58       6         6         44      44      13
Aristolochiaceae  Aristolochia       38       8         8         21      18      15
Asparagaceae      Asparagus          1        1         1         1       1       1
Asteraceae        Acanthospermum     49       3         3         48      42      2
Asteraceae        Achillea           238      22        21        163     133     56
Asteraceae        Achyropappus       4        1         1         4       4       3
Asteraceae        Ageratina          18       6         5         18      17      10
Asteraceae        Ainsliaea          23       7         7         21      19      14
Asteraceae        Ajania             31       3         3         31      31      22
Asteraceae        Allagopappus       18       1         0         18      18      4
Asteraceae        Amberboa           15       1         1         15      15      7
Asteraceae        Amblyopappus       1        1         1         1       1       1
Asteraceae        Ambrosia           126      12        11        63      57      33
Asteraceae        Amphoricarpos      41       2         1         27      26      4
Asteraceae        Anacyclus          2        1         1         2       2       2
Asteraceae        Andryala           8        2         1         7       7       5
Asteraceae        Anthemis           65       8         7         52      51      9
Asteraceae        Anvillea           6        2         2         4       3       3
Asteraceae        Arctium            4        1         1         4       4       3
Asteraceae        Arctotheca         4        1         1         4       4       4
Asteraceae        Arctotis           66       8         7         52      46      21
Asteraceae        Argyranthemum      3        1         0         3       3       3
Asteraceae        Arnica             102      7         6         70      54      17
Asteraceae        Artemisia          339      41        40        267     220     115
Asteraceae        Aspilia            11       2         2         11      11      1
Asteraceae        Aster              5        3         3         5       5       4
Asteraceae        Asteriscus         15       2         2         12      7       7
Asteraceae        Athanasia          5        1         0         5       5       3
Asteraceae        Athroisma          5        1         1         5       5       2
Asteraceae        Atractylodes       10       2         2         7       5       3
Asteraceae        Aucklandia         4        1         1         4       4       4
Asteraceae        Austroeupatorium   3        1         1         3       3       3
Asteraceae        Baccharoides       3        2         2         3       3       2
Asteraceae        Bahia              4        2         2         4       4       2
Asteraceae        Bahiopsis          11       2         2         11      9       6
Asteraceae        Baileya            14       4         3         10      10      7
Asteraceae        Balduina           2        1         1         2       2       1
Asteraceae        Balsamorhiza       14       1         1         14      14      5
Asteraceae        Baltimora          6        1         1         6       6       5
Asteraceae        Barrosoa           1        1         1         1       1       1
Asteraceae        Bartlettina        5        1         1         5       5       4
Asteraceae        Bedfordia          8        1         1         8       6       6
Asteraceae        Bejaranoa          27       3         2         26      22      15
Asteraceae        Berkheya           3        2         2         3       3       2
Asteraceae        Bishopanthus       16       1         1         16      16      11
Asteraceae        Blainvillea        43       4         4         42      39      12
Asteraceae        Blumea             8        2         2         7       7       2
Asteraceae        Bothriocline       5        1         1         5       5       3
Asteraceae        Brachanthemum      8        1         1         8       6       5
Asteraceae        Brachylaena        25       1         0         25      25      16
Asteraceae        Brickellia         11       3         3         11      11      7
Asteraceae        Brocchia           33       1         1         33      28      19
Asteraceae        Calea              85       12        11        81      79      28
Asteraceae        Calomeria          1        1         1         1       1       1
Asteraceae        Camchaya           3        1         1         3       3       2
Asteraceae        Carpesium          133      10        10        90      77      46
Asteraceae        Carphochaete       5        1         1         5       5       3
Asteraceae        Carthamus          1        1         1         1       1       1
Asteraceae        Centaurea          183      33        32        98      86      22
Asteraceae        Centaurothamnus    3        1         1         3       3       3
Asteraceae        Centipeda          23       1         1         23      23      7
Asteraceae        Centratherum       11       1         1         11      11      7
Asteraceae        Chaenactis         5        1         1         5       5       3
Asteraceae        Chamaemelum        16       2         2         16      15      8
Asteraceae        Cheirolophus       38       2         1         30      27      6
Asteraceae        Chiliadenus        2        1         1         2       1       1
Asteraceae        Chondrilla         1        1         1         1       1       1
Asteraceae        Chresta            6        2         2         6       5       5
Asteraceae        Chromolaena        7        2         2         7       6       2
Asteraceae        Chrysanthemum      19       4         4         19      19      15
Asteraceae        Chrysolaena        23       3         3         21      18      5
Asteraceae        Cichorium          66       4         4         39      36      23
Asteraceae        Cirsium            1        1         1         1       1       1
Asteraceae        Clibadium          6        2         2         6       6       1
Asteraceae        Cosmos             12       3         3         11      11      7
Asteraceae        Cota               24       3         3         18      15      14
Asteraceae        Cotula             2        1         1         2       2       2
Asteraceae        Cousinia           20       3         2         15      15      5
Asteraceae        Crepidiastrum      43       4         4         42      39      32
Asteraceae        Crepis             91       17        17        44      37      18
Asteraceae        Critonia           7        2         2         7       7       7
Asteraceae        Cronquistia        12       1         1         12      12      6
Asteraceae        Cronquistianthus   18       3         3         17      11      4
Asteraceae        Cyanthillium       16       1         1         16      15      2
Asteraceae        Cyathocline        6        2         2         6       6       6
Asteraceae        Cyclolepis         10       1         1         10      10      3
Asteraceae        Cynara             35       5         5         26      23      9
Asteraceae        Cyrtocymura        3        1         1         3       3       2
Asteraceae        Decachaeta         13       3         3         13      12      12
Asteraceae        Dendranthema       13       2         2         11      11      8
Asteraceae        Dendroseris        3        1         1         3       3       3
Asteraceae        Diaspananthus      10       1         1         10      10      7
Asteraceae        Dicoma             3        2         2         3       3       3
Asteraceae        Dimerostemma       8        5         5         8       8       5
Asteraceae        Distephanus        5        1         1         5       5       5
Asteraceae        Disynaphia         46       2         2         46      42      23
Asteraceae        Dittrichia         35       2         2         33      29      20
Asteraceae        Doellingeria       6        1         1         6       4       3
Asteraceae        Dugaldia           11       1         1         11      8       5
Asteraceae        Dugesia            1        1         1         1       1       1
Asteraceae        Echinops           12       5         5         6       6       6
Asteraceae        Eirmocephala       1        1         1         1       1       1
Asteraceae        Elephantopus       50       6         6         41      35      20
Asteraceae        Encelia            7        4         4         3       3       2
Asteraceae        Eremanthus         69       12        11        55      50      23
Asteraceae        Erigeron           9        2         2         9       8       4
Asteraceae        Eriocephalus       38       1         0         38      38      22
Asteraceae        Eriophyllum        23       3         3         22      22      8
Asteraceae        Eupatorium         201      23        22        178     166     50
Asteraceae        Ferreyranthus      25       2         2         25      25      19
Asteraceae        Fitchia            5        1         1         5       5       2
Asteraceae        Gaillardia         57       5         4         50      40      8
Asteraceae        Geigeria           51       2         1         47      45      38
Asteraceae        Glebionis          10       1         1         10      8       5
Asteraceae        Gnephosis          15       1         0         15      15      9
Asteraceae        Gochnatia          36       5         5         34      28      17
Asteraceae        Gonospermum        44       5         4         32      28      14
Asteraceae        Grangea            3        1         1         3       3       2
Asteraceae        Guizotia           1        1         1         1       1       1
Asteraceae        Gutenbergia        24       1         1         24      24      12
Asteraceae        Gymnanthemum       3        1         1         3       3       3
Asteraceae        Gynoxys            2        1         1         2       2       2
Asteraceae        Handelia           4        1         1         4       4       4
Asteraceae        Hedosyne           4        1         1         4       4       2
Asteraceae        Helenium           85       12        12        68      54      29
Asteraceae        Helianthus         177      18        17        115     108     35
Asteraceae        Heliomeris         15       1         0         15      15      6
Asteraceae        Helipterum         39       1         0         39      37      8
Asteraceae        Helminthotheca     11       3         3         11      11      9
Asteraceae        Helogyne           43       2         2         41      36      7
Asteraceae        Hemisteptia        10       1         1         10      10      3
Asteraceae        Heterocoma         5        1         1         5       5       4
Asteraceae        Hirtellina         4        1         1         4       4       4
Asteraceae        Hulsea             1        1         0         1       1       1
Asteraceae        Hyaloseris         5        2         2         4       4       4
Asteraceae        Hymenoclea         9        1         1         9       8       5
Asteraceae        Hymenoxys          84       13        12        51      40      23
Asteraceae        Hypochaeris        7        3         3         6       6       6
Asteraceae        Inezia             3        1         1         3       2       2
Asteraceae        Inula              366      23        23        256     219     85
Asteraceae        Inulanthera        18       4         4         18      15      10
Asteraceae        Iva                7        3         3         7       7       5
Asteraceae        Ixeridium          30       1         1         30      29      19
Asteraceae        Ixeris             38       3         3         36      32      24
Asteraceae        Jefea              2        2         2         1       1       1
Asteraceae        Jurinea            23       5         5         17      16      7
Asteraceae        Kaunia             31       3         3         29      25      16
Asteraceae        Koanophyllon       11       1         1         11      10      8
Asteraceae        Koyamasia          9        1         1         9       8       4
Asteraceae        Lactuca            225      15        15        79      75      47
Asteraceae        Lapsana            5        1         1         5       5       4
Asteraceae        Launaea            8        3         3         7       7       2
Asteraceae        Leontodon          14       2         2         10      10      7
Asteraceae        Lepidaploa         6        2         2         6       6       4
Asteraceae        Leucanthemopsis    5        2         2         5       5       5
Asteraceae        Leucophyta         3        1         1         3       3       3
Asteraceae        Leuzea             5        1         1         5       5       3
Asteraceae        Liatris            35       7         7         32      32      12
Asteraceae        Lidbeckia          1        1         1         1       1       1
Asteraceae        Ligularia          49       9         9         46      44      17
Asteraceae        Lourteigia         3        1         1         3       3       1
Asteraceae        Lychnophora        65       11        10        43      38      17
Asteraceae        Matricaria         5        1         1         5       5       5
Asteraceae        Melampodium        28       5         5         26      26      12
Asteraceae        Mikania            180      18        17        143     125     54
Asteraceae        Milleria           23       1         1         23      20      8
Asteraceae        Minasia            9        1         1         9       9       4
Asteraceae        Montanoa           77       8         8         76      70      33
Asteraceae        Munnozia           1        1         1         1       1       1
Asteraceae        Neohintonia        2        1         1         2       2       1
Asteraceae        Neurolaena         44       4         4         20      18      4
Asteraceae        Notoseris          15       3         3         12      12      8
Asteraceae        Oldenburgia        1        1         1         1       1       1
Asteraceae        Oncosiphon         6        2         2         6       6       6
Asteraceae        Onoseris           1        1         1         1       1       1
Asteraceae        Otanthus           3        1         1         3       3       1
Asteraceae        Pappobolus         40       3         2         34      30      14
Asteraceae        Parthenium         98       5         4         79      75      26
Asteraceae        Pegolettia         25       3         3         21      20      14
Asteraceae        Pentanema          12       1         1         12      12      8
Asteraceae        Pentzia            28       2         1         28      26      17
Asteraceae        Perityle           16       2         2         13      12      4
Asteraceae        Petasites          36       2         2         36      27      4
Asteraceae        Peucephyllum       2        1         1         2       2       2
Asteraceae        Picradeniopsis     1        1         1         1       1       1
Asteraceae        Picris             60       10        9         38      37      25
Asteraceae        Piptocoma          14       1         1         14      14      7
Asteraceae        Piptothrix         17       3         3         14      13      4
Asteraceae        Pleiotaxis         5        1         1         5       5       3
Asteraceae        Pluchea            13       1         1         13      12      7
Asteraceae        Podachaenium       16       1         1         16      16      11
Asteraceae        Podanthus          9        2         2         7       7       3
Asteraceae        Polychrysum        2        1         1         2       2       1
Asteraceae        Polymnia           2        1         1         2       2       2
Asteraceae        Postia             9        1         1         9       9       6
Asteraceae        Psephellus         9        2         2         7       7       5
Asteraceae        Pseudelephantopus  20       1         1         20      19      8
Asteraceae        Psilostrophe       18       3         2         17      12      5
Asteraceae        Ptilostemon        1        1         1         1       1       1
Asteraceae        Pulicaria          39       4         4         31      25      17
Asteraceae        Ratibida           13       3         3         13      13      8
Asteraceae        Reichardia         9        3         3         9       9       6
Asteraceae        Rhaponticum        43       7         7         23      23      6
Asteraceae        Richterago         1        1         1         1       1       1
Asteraceae        Rolandra           14       1         1         14      10      5
Asteraceae        Rudbeckia          33       5         5         31      30      21
Asteraceae        Saussurea          120      15        15        99      89      38
Asteraceae        Scalesia           9        1         0         9       9       7
Asteraceae        Schistostephium    55       4         3         54      47      18
Asteraceae        Schkuhria          46       3         3         45      41      15
Asteraceae        Scorzonera         15       2         1         15      15      11
Asteraceae        Scorzoneroides     9        3         3         8       8       8
Asteraceae        Senecio            42       12        11        34      28      10
Asteraceae        Seriphidium        10       2         2         9       7       7
Asteraceae        Serratula          8        2         2         8       7       7
Asteraceae        Sigesbeckia        16       1         1         16      16      5
Asteraceae        Smallanthus        51       6         6         40      39      11
Asteraceae        Sonchus            40       9         9         35      35      27
Asteraceae        Sphaeranthus       5        1         1         5       4       2
Asteraceae        Sphagneticola      14       1         1         14      11      2
Asteraceae        Squamopappus       4        1         1         4       4       3
Asteraceae        Stachycephalum     5        1         1         5       5       4
Asteraceae        Steiractinia       3        1         1         3       3       1
Asteraceae        Stevia             159      18        17        148     134     61
Asteraceae        Stizolophus        6        1         1         6       5       2
Asteraceae        Stokesia           7        1         1         7       5       2
Asteraceae        Symphyotrichum     2        1         1         2       2       2
Asteraceae        Syncretocarpus     3        1         1         3       3       2
Asteraceae        Synedrella         11       1         1         11      11      3
Asteraceae        Synotis            8        1         1         8       5       5
Asteraceae        Tanacetopsis       9        1         1         9       9       8
Asteraceae        Tanacetum          86       9         9         60      50      38
Asteraceae        Taraxacum          69       10        9         32      32      23
Asteraceae        Telekia            10       1         1         10      10      6
Asteraceae        Tetraneuris        28       4         4         28      26      10
Asteraceae        Tithonia           79       4         4         72      63      25
Asteraceae        Trichogonia        51       3         2         51      39      20
Asteraceae        Urolepis           7        1         1         7       5       3
Asteraceae        Urospermum         11       2         2         9       8       8
Asteraceae        Ursinia            84       5         4         79      70      42
Asteraceae        Vernonanthura      70       5         5         65      61      17
Asteraceae        Vernonia           94       10        9         87      85      34
Asteraceae        Vernoniopsis       3        1         1         3       3       2
Asteraceae        Viguiera           136      20        19        96      85      28
Asteraceae        Volutaria          10       3         3         8       8       5
Asteraceae        Wamalchitamia      1        1         1         1       1       1
Asteraceae        Warionia           14       1         1         14      12      11
Asteraceae        Wedelia            21       4         3         21      18      5
Asteraceae        Wunderlichia       14       2         2         14      13      9
Asteraceae        Xanthium           89       9         8         54      39      24
Asteraceae        Zaluzania          14       6         6         9       9       5
Asteraceae        Zexmenia           3        1         0         3       3       1
Asteraceae        Zinnia             76       11        10        62      57      17
Burseraceae       Trattinnickia      2        2         2         2       2       1
Canellaceae       Canella            3        1         1         3       3       2
Canellaceae       Cinnamodendron     3        1         1         3       3       2
Canellaceae       Cinnamosma         15       3         3         12      12      7
Canellaceae       Pleodendron        1        1         1         1       1       1
Canellaceae       Warburgia          25       3         3         23      20      7
Chloranthaceae    Chloranthus        233      10        10        151     137     69
Chloranthaceae    Hedyosmum          19       4         4         17      17      14
Chloranthaceae    Sarcandra          56       1         1         56      55      33
Coriariaceae      Coriaria           20       4         4         13      12      5
Lamiaceae         Glechoma           9        2         2         8       8       4
Lamiaceae         Salvia             30       5         5         29      26      14
Lauraceae         Actinodaphne       5        1         1         5       5       5
Lauraceae         Laurus             33       2         1         31      29      16
Lauraceae         Lindera            62       2         2         57      51      34
Lauraceae         Litsea             1        1         1         1       1       1
Lauraceae         Neolitsea          82       10        10        48      42      26
Limeaceae         Limeum             1        1         1         1       1       1
Magnoliaceae      Liriodendron       14       1         1         14      12      11
Magnoliaceae      Magnolia           72       19        19        37      32      21
Malvaceae         Abutilon           2        1         1         2       2       2
Malvaceae         Bombax             2        1         1         2       2       1
Malvaceae         Ceiba              2        1         1         2       2       1
Malvaceae         Heritiera          3        1         1         3       3       1
Menispermaceae    Anamirta           5        1         1         5       5       3
Nyctaginaceae     Boerhavia          2        1         1         2       2       1
Orchidaceae       Dendrobium         9        1         1         9       8       6
Picrodendraceae   Picrodendron       15       1         1         15      15      7
Putranjivaceae    Drypetes           1        1         1         1       1       1
Rosaceae          Crataegus          1        1         1         1       1       1
Rosaceae          Prunus             1        1         1         1       1       1
Rubiaceae         Lasianthus         2        1         1         2       2       1
Rutaceae          Melicope           3        1         1         3       3       1
Schisandraceae    Illicium           143      16        16        94      85      43
Winteraceae       Drimys             5        1         1         5       4       2
Winteraceae       Pseudowintera      4        1         1         4       4       4
Winteraceae       Zygogynum          5        3         3         4       4       4
Xanthorrhoeaceae  Asphodeline        4        2         1         4       3       1
Zingiberaceae     Curcuma            30       7         7         23      19      13
25 families       305 genera         8619     1074      1020
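As an illustration of how per-genus counts of the kind listed in Table S4 could be derived from a flat list of records, a minimal sketch follows. It is not the workflow used for the thesis. It assumes that NA_spec marks entries without an assigned species epithet, which is consistent with rows such as Allagopappus (one species including NA_spec, none excluding it). The record keys, the function name and the toy records are hypothetical, and the "Unique canonical SMILES MF" column is left out because its derivation is not described in this section.

```python
from collections import defaultdict


def summarise_by_genus(records):
    """Count entries and unique identifiers per (family, genus).

    Each record is a dict with the hypothetical keys 'family', 'genus',
    'species' (None stands in for NA_spec), 'cas' and 'smiles'.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["family"], rec["genus"])].append(rec)

    summary = {}
    for key, rows in sorted(groups.items()):
        species = {r["species"] for r in rows}
        summary[key] = {
            "entries": len(rows),
            "species_incl_na": len(species),            # NA counted once
            "species_excl_na": len(species - {None}),   # NA dropped
            "unique_cas": len({r["cas"] for r in rows}),
            "unique_smiles": len({r["smiles"] for r in rows}),
        }
    return summary


# Toy records with placeholder identifiers, not thesis data.
records = [
    {"family": "FamA", "genus": "GenusA", "species": "alpha",
     "cas": "CAS-1", "smiles": "SMI-1"},
    {"family": "FamA", "genus": "GenusA", "species": "alpha",
     "cas": "CAS-2", "smiles": "SMI-2"},
    {"family": "FamA", "genus": "GenusA", "species": None,
     "cas": "CAS-2", "smiles": "SMI-2"},
    {"family": "FamA", "genus": "GenusB", "species": None,
     "cas": "CAS-3", "smiles": "SMI-3"},
]

for (family, genus), counts in summarise_by_genus(records).items():
    print(family, genus, counts)
```

In the toy data, GenusB reproduces the pattern of a genus whose only entries lack a species epithet: one species when NA_spec is included and none when it is excluded.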

Skeleton distribution in lower taxonomic ranks

[Figure: pie chart "UniqueCount(CAS) per Skeleton class" with slice labels 1794 (34,0 %), 1484 (28,2 %), 730 (13,8 %), 363 (6,9 %) and 273 (5,2 %); bar charts "Skeletons per genus" and "Skeleton distribution in Inula species" (axis: Occurrence, 0-100 %). Color by Skeleton class: germacranolide, guaianolide, eudesmanolide, pseudoguaianolide, eremophilanolide, lindenanolide, elemanolide, xanthanolide, seco-prezizaane, drimanolide, picrotoxane, other.]

Figure S1. Frequency of occurrence and distribution of STL skeletons on lower taxonomic ranks. Selected genera of Asteraceae show clear tendencies to predominantly produce certain types of skeletons. For analysis on an even lower taxonomic

20 % n 20 % g g e h e c e e e d e a e e a e a e e e e e e c e a e h e c e e e d e a e e a e a t t other r other r n n A 45 A h a a a a a a a a a a a a a a a a a a a a a a a a a a t t t t d

1794 s 1794 s u a a a c a n a e a e a c a c a a a a e a a a c a n a e a e d d e e o o

12 c i 12 a a n o n o i i r n l r n n o o r r C o C o e e a e e c e c t e a a e e c e e a e e c e c t W t C W t C o 10 % e e e e e e e e e e e e e 10 % e e e e e e e e e e e e e i l o i i i a 5 drimanolide a 5 drimanolide l l

e 22 a l r l o e 7 t 30 % 10 % 7 t o o 17 u 17 u c r d c a c a _ c c a c r d c a c a _ r r i i h h C b r 10 s e e o n r b r s s c c c c c c c c c c c c c c c c c c c c c c c c c c M r M r c c a i a o a a s i a a a i a o a a s o t t o

i i 4 i i 4 i h Z i h Z l l _ _

r r e m 468 n g u r r e m 468 e c c g r L h g r L c c t n s t

p rank, single genera can be filtered. As and example, the genus Inula has been select- p rank, single genera can be filtered. As and example, the genus Inula has been select- r r h 30 % i i h 30 % c c c a a a a c a a a a a a a a a a a a a a a a a a a a a a a a a a a n o i T c n o i T 528 u 528 s r s s s

18 i 18 i i i l i i i i i l i i i i i O O A A L L L r r r r r r r r r r r r C C C 10 % picrotoxane 40 % 20 % C 10 % picrotoxane C M l l 5 l l

P S 5 S W P S 5 q S S C A A C A 0 % r 0 % r

Z 7 Z 7 h h A A i L L h p h p t t P P 7 d e u e d e 7 d e u e d e n e o e o m m a a t t t t c i c i

T 7 T

n 0 % n l l U n a b n A 51 1688 n a b n A

Family n n Family n n

Family r a Family r a i s i s

other n other n

e e e e e e e e e e e e e 20 % 5 e e e e e e e e e e e e e o o i i

ed 0 to% illustrate the specific distributiona of skeletons among its species. The species ed 0 to% illustrate the specific distributiona of skeletons among its species. The species a a a L e S a L e S l i l l l a g a g r

20 % o L 20 % o L g g e h e c e e e d e a e e a e a e e e e e e c e a e h e c e e e d e a e e a e a t t r r A A

45 h a a a a a a a a a a a a a a a a a a a a a a a a a a t t t t d s s

a a a c a n a e a e a c a c a 1794a a a e a a a c a n a e a e d d o o c i a a n o n o i i r n l r n n o o C o C o e e a e e c e c t e a a e e c e e a e e c e c t l l W t C W t C e e e e e e e e e e e e e e e e e e e e e e e e e e i l o i i i a a l l e a l r l o e t 30 % 10 % t o o

c r d c a c a _ c 17 c a c r d c a c a _ r r e e e e e e e e e e e e e e e e e e e e e e e e e e i i h h b r s e e o n r b r s s a a c c c c c c c c c c c c c c c c c c c c c c c c c c M r M r a i a o a a s i a a a i a o a a s o t t o i i i i i h Z i h Z l l _ _ r r e m n g u r r e m t t g r L h g r L a a a a a a a a a a a a a a a a a a a a a a a a a a c c t n s t p p r r h i i h c c c a a a a c a contain a distincta a a skeletona a a a a classesa a a anda do not reflect the overall distribution in Inula. contain a distincta a a skeletona a a a a classesa a a anda do not reflect the overall distribution in Inula. n o i T c n o i T s r s s s i i

18 i 18 i l i i i i i l i i i i i A A o o L L L r r r r r r r r r r r r C C C C

10 % e e e e e e e e e e e e e 10 % e e e e e e e e e e e e e C M l l l l

P S 5 S W P S 5 S S C A A C A t t r 10 % r Z 7 Z 7 h h

A 17 A L L h p h p t t P P c c c c c c c c c c c c c c c c c c c c c c c c c c d e u e d e d e u e d e e o e o m m _ _ a a t t t t c i c i T T

n 0 % n a a a a a a a a a a a a a a a a a a a a a a a a a a n a b n A n a b n A

Family n n Family n n

Familys Familys 18 r a 18 r a l i i i i i l i i i i i i s i s n n r r r r r r 20 % r r r r r r

l l 5 l l o o i i

0 % r 0 % r a a h h a L e S a L e S l i l l l a g a g r L L o L o L h p h p g g e a e a e e e e e e c e a e h e c e e e d e a e e a e a t t r r d e u e d e d e u e d e A A h t t t e o e o d m m a a s s a e a c a c a a a a e a a a c a n a e a e t t t t d d o o c i a a o n o c i c i i i n l r T T n n n n C o C o l l e c t 0 % e a a e e c e e a e e c e c t n a b n A n a b n A W t C W t C l o i n n n n i i a r a r a l l a l r l o e i s i s n n o o c a _ c c a c r d c a c a _ r e e e e e e e e e e e e e e e e e e e e e e e e e e i i o o h h i i s e e o n r b r a a s s a a a L e S a L e S M r M r l i l a s i a a a i a o a a s l l t t o a g a g r i i i o L o L i h Z i h Z l g g e a e a e e e e e e c e a e h e c e e e d e a e e a e a m n g u r r e m t t r r A A h L h g r L a a a a a a a a a a a a a a a a a a a a a a a a a a c c t t t n s t d p p r r i i h s s c c a e a c a c a a a a e a a a c a n a e a e a a a a c a d d o o c i T c n o i T a a o n o r s i i i i n l r i n n A A o o C o C o e c t e a a e e c e e a e e c e c t L L L W t C W t C C C C e e e e e e e e e e e e e e e e e e e e e e e e e e l o i C M i i a S S W P S l l S S a l r l o e A C A t 10 % t Z o o A 17 A c a _ c c a c r d c a c a _ r i i h h s e e o n r b r P P s s c c c c c c c c c c c c c c c c c c c c c c c c c c M r M r a s i a a a i a o a a s t t o i i i i h Z i h Z l _ _ m n g u r r e m L h g r L c c n s t p p r r i i h c c a a a a c a a a a a a a a a a a a a a a a a a a a a a a a a a a

T c Family n o i T r s Family s Family s

i 18 i i l i i i i i l i i i i i A A L L L r r r r r r r r r r r r C C C C M l l l l S S W P S S S A C A r r Z h h A A L L h p h p t t P P d e u e d e d e u e d e e o e o m m a a t t t t c i c i T T

n 0 % n n a b n A n a b n A

n n Family n n

r Familya r Familya i s i s n n o o i i a a a L e S a L e S i l l l a g a g r o L o L g g e e e e e e c e a e h e c e e e d e a e e a e a r r A A h t t d s s a c a c a a a a e a a a c a n a e a e d d o o c i a a n o i i n l r n n C o C o e a a e e c e e a e e c e c t W t C W t C l o i i i a l l a l r l o e o o

c c a c r d c a c a _ r i h h s e e o n r b r s s M r M r i a a a i a o a a s t t o i i i h Z i h Z l n g u r r e m h g r L c c n s t p r r i i h c c a a a c a c n o i T r s i i i A L L C C C C M S W P S S S A C A Z A A

P P Family Family Family

Filtering of STLs via MF of interest

[Structure drawings not reproduced: isoatriplicolide tiglate, followed by the similar MFs.]

Figure S2. MFs similar to the trypanocidal furanoheliangolides tested in the study by Schmidt et al. Isoatriplicolide tiglate, one of the four tested furanoheliangolides, is shown as an example compound. Below it, the 14 similar MFs are illustrated; these were used to filter the STL dataset for all STLs that possess these particular MFs.

Table S5. Quantitative summary of filtered STLs with similar MFs to trypanocidal furanoheliangolides

Genus              Number of species   Number of CAS
Austroeupatorium    1                   1
Bejaranoa           2                  12
Calea               3                  20
Centratherum        1                   9
Chresta             1                   1
Decachaeta          1                   1
Disynaphia          1                   1
Eremanthus          8                  17
Helianthus          3                  34
Heliomeris          1                   5
Koyamasia           1                   8
Lourteigia          1                   3
Lychnophora         9                  20
Minasia             1                   8
Neurolaena          4                  17
Pappobolus          1                  14
Piptocoma           1                   9
Tithonia            1                   2
Trichogonia         1                  12
Viguiera           14                  35
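The filtering and counting behind a summary of this kind can be sketched as follows. This is a minimal illustration, not the workflow used in the thesis: the record layout, the field names (`genus`, `species`, `cas`, `mf`) and all values are hypothetical placeholders, and it assumes that "Number of CAS" means the number of distinct CAS registry numbers per genus.

```python
from collections import defaultdict


def summarize_by_genus(records, mfs_of_interest):
    """Keep records whose MF is in mfs_of_interest and count, per genus,
    the distinct species and the distinct CAS numbers that remain."""
    species = defaultdict(set)
    cas = defaultdict(set)
    for rec in records:
        if rec["mf"] in mfs_of_interest:
            species[rec["genus"]].add(rec["species"])
            cas[rec["genus"]].add(rec["cas"])
    return {g: (len(species[g]), len(cas[g])) for g in sorted(species)}


# Hypothetical placeholder records; none of these values come from the dataset.
records = [
    {"genus": "GenusA", "species": "GenusA alpha", "cas": "CAS-0001", "mf": "MF-1"},
    {"genus": "GenusA", "species": "GenusA alpha", "cas": "CAS-0002", "mf": "MF-2"},
    {"genus": "GenusA", "species": "GenusA beta", "cas": "CAS-0002", "mf": "MF-2"},
    {"genus": "GenusB", "species": "GenusB gamma", "cas": "CAS-0003", "mf": "MF-9"},
    {"genus": "GenusC", "species": "GenusC delta", "cas": "CAS-0004", "mf": "MF-1"},
]

summary = summarize_by_genus(records, {"MF-1", "MF-2"})
for genus, (n_species, n_cas) in summary.items():
    print(f"{genus:<10}{n_species:>3}{n_cas:>4}")
```

Using sets means a compound reported from several species of one genus is counted once for that genus, which is one plausible reading of the table but not confirmed by the source.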

Abstract of tentative sesquiterpenoids

Table S6. Identified sesquiterpenoids by HRMS and isotope pattern matching

CAS             Calculated         Experimental       Molecular   mSigma
                molecular weight   molecular weight   formula
2019176-68-0    246.1256           247.1328           C15H18O3    11.8
2221-88-7       214.1358           215.1429           C15H18O     23.1
1346458-56-7    248.1412           249.1484           C15H20O3    11.1
20267-91-8      248.1412           249.1484           C15H20O3    12
24173-83-9      214.1358           215.1432           C15H18O     17.6
26146-27-0      230.1307           231.1380           C15H18O2     6.8
17910-10-0      244.1463           245.1537           C16H20O2    10.9
1618-84-4       232.1463           233.1536           C15H20O2    10.5
246167-10-2     324.1573           325.1624           C17H24O6    37.8
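The relationship between the two mass columns can be checked with a short calculation. The sketch below assumes that the experimental values are m/z of singly protonated ions, [M+H]+, which is suggested by the positive-mode (+MS) spectra and by the offset of about one proton mass but is not stated in the table. The function names are illustrative, and the mSigma isotope-pattern score is not reproduced here.

```python
import re

# Monoisotopic masses (u) of the elements occurring in Table S6.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
PROTON = 1.00727646688  # mass of a proton (u)


def monoisotopic_mass(formula):
    """Monoisotopic mass of a neutral molecule from a formula such as 'C15H18O3'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def ppm_error(observed, theoretical):
    """Relative deviation of an observed m/z from its theoretical value, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


# Two rows from Table S6: (CAS, molecular formula, experimental value).
rows = [
    ("2019176-68-0", "C15H18O3", 247.1328),
    ("26146-27-0", "C15H18O2", 231.1380),
]
for cas, formula, observed in rows:
    neutral = monoisotopic_mass(formula)
    protonated = neutral + PROTON  # assumed [M+H]+ ion
    print(f"{cas}: M = {neutral:.4f}, [M+H]+ = {protonated:.4f}, "
          f"error = {ppm_error(observed, protonated):+.2f} ppm")
```

Under this assumption, the calculated column corresponds to the neutral monoisotopic mass and the experimental column to the observed ion.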

[Mass spectrum not reproduced: F 13_GD2_01_3013.d, +MS, 2.5 min, #722, spectral background subtracted. Axes: intensity (×10^4) against m/z. Labelled peaks include 231.1378, 233.1534 and 247.1328.]

Figure S3. CAS: 2019176-68-0, eudesmanolide

[Mass spectrum not reproduced: F 26_GD4_01_3017.d, +MS, 4.0 min, #1143, spectral background subtracted. Axes: intensity (×10^5) against m/z. Labelled peaks include 159.0803 and 215.1429.]

Figure S4. CAS: 2221-88-7, eudesmane skeleton

[Mass spectrum not reproduced: F 13_GD2_01_3013.d, +MS, 2.5 min, #718, spectral background subtracted. Axes: intensity (×10^5) against m/z. Labelled peaks include 213.1273, 231.1378 and 249.1484.]

Figure S5. CAS: 1346458-56-7, elemanolide

[Mass spectrum not reproduced: F 14_GD3_01_3015.d, +MS, 2.6 min, #742, spectral background subtracted. Axes: intensity (×10^4) against m/z. Labelled peaks include 247.1327 and 249.1484.]

Figure S6. CAS: 20267-91-8, elemanolide

[Mass spectrum not reproduced: F 14_GD3_01_3015.d, +MS, 3.9 min, #1126, spectral background subtracted. Axes: intensity (×10^6) against m/z. Labelled peaks include 159.0806, 199.1483 and 215.1432.]

Figure S7. CAS: 24173-83-9, lindenane skeleton

[Mass spectrum not reproduced: F 14_GD3_01_3015.d, +MS, 3.1 min, #890, spectral background subtracted. Axes: intensity (×10^5) against m/z. Labelled peaks include 189.0910, 213.1274 and 231.1380.]

Figure S8. CAS: 26146-27-0, lindenane skeleton

[Mass spectrum not reproduced: F 14_GD3_01_3015.d, +MS, 3.1 min, #890, spectral background subtracted. Axes: intensity (×10^4) against m/z. Labelled peaks include 232.1426, 235.1693 and 245.1537.]

Figure S9. CAS: 17910-10-0, lindenane skeleton

[Mass spectrum not reproduced: F 14_GD3_01_3015.d, +MS, 3.7 min, #1066, spectral background subtracted. Axes: intensity (×10^5) against m/z. Labelled peaks include 215.1431 and 233.1536.]

Figure S10. CAS: 1618-84-4, lindenane skeleton

[Mass spectrum not reproduced: F 14_GD3_01_3015.d, +MS, 2.7 min, #782, spectral background subtracted. Axes: intensity (×10^4) against m/z. Labelled peaks include 307.1538, 309.1694, 323.1485 and 325.1644.]

Figure S11. CAS: 246167-10-2, seco and rearranged lindenane skeleton

Acta Universitatis Upsaliensis
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy 282
Editor: The Dean of the Faculty of Pharmacy

A doctoral dissertation from the Faculty of Pharmacy, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy”.)

Distribution: publications.uu.se
urn:nbn:se:uu:diva-399068

ACTA UNIVERSITATIS UPSALIENSIS
UPPSALA 2020