
NATURAL PRODUCTS IN DRUG DISCOVERY 355 doi:10.2533/chimia.2007.355 CHIMIA 2007, 61, No. 6

Chimia 61 (2007) 355–360 ©Schweizerische Chemische Gesellschaft ISSN 0009–4293

Cheminformatic Analysis of Natural Products and their Chemical Space

Stefan Wetzel^a, Ansgar Schuffenhauer^b, Silvio Roggo^b, Peter Ertl^b, and Herbert Waldmann*^a

Abstract: Cheminformatic methods allow the detailed characterization of particular and characteristic properties of natural products (NPs) and comparison with related characteristics of drugs and other compounds. An overview of the most important properties of natural products and analogues and their difference with respect to drugs and synthetic compounds is presented. Moreover, different approaches to charting the chemical space populated by natural products are reviewed and their underlying principles are delineated. Some insights about NP chemical space are described together with possible applications of methods charting chemical space. Strengths and weaknesses of the different approaches will be discussed with respect to possible applications in compound collection design.

Keywords: Chemical space · ChemGPS · Natural products · SCONP

* Correspondence: Prof. Dr. H. Waldmann^a
Tel.: +49 231 133 2400
Fax: +49 231 133 2499
E-Mail: [email protected]
^a Max-Planck Institut für Molekulare Physiologie, Abteilung IV – Chemische Biologie, Otto-Hahn-Strasse 11, D-44229 Dortmund, and Universität Dortmund, Chemische Biologie, Dortmund
^b Novartis Institutes for BioMedical Research, CH-4002 Basel

1. Introduction

Natural products (NPs) have been selected during evolution to bind to various proteins during their life-cycle, e.g. in biosynthesis and degradation and while exerting their mode of action. Thus NP structures are good starting points for the discovery and development of protein ligands and, therefore, ultimately for drugs. Indeed, many current drugs originate from NPs, although their structures were chemically modified in many cases.[1] After a recession of NPs in the pharmaceutical industry in the 1990s, recent years have witnessed a comeback of NPs in drug discovery. NPs were often viewed as being ‘too complex’ and ‘sufficiently examined’ but once again are recognized as a valuable source of interesting and diverse structures.[2,3] NPs in general constitute biologically validated starting points for library design, and many of their core structures have been recognized to be privileged structures.[4,5] Known examples of privileged structures in NPs that have successfully been applied in library design are the benzodiazepine[4] and the indole scaffolds.[6,7]

With the renewed focus on NPs in drug discovery, interest in their chemical and structural properties and in ways to explore them in drug discovery has increased as well. Modern cheminformatics methods allow the rapid prediction of a plethora of chemical and physico-chemical properties of available NP structures.[8] Nowadays, many of these properties are routinely used in library design and optimization. The most prominent and widely used example may be Lipinski’s ‘Rule-of-Five’. The Rule-of-Five denotes a set of property rules describing orally bioavailable drug space. They were empirically derived from known orally available drugs.[9] Chemical properties constitute one way to describe the part of chemical space occupied by a compound set; structural features are another. Several approaches have been taken to map and navigate chemical space reflecting chemical properties and structure.

2. Natural Product Properties

The systematic evaluation of NP properties yields a better understanding of what distinguishes NPs from drugs or combinatorial chemistry compounds. This knowledge may be applied in the synthesis of compounds with NP-like properties. There are several publications describing the results of statistical analyses of NP properties. In 1999 Henkel et al. published a first comparison of molecular properties of natural products, synthetic compounds and drugs.[10] Two years later, Lee and Schneider[11] published a paper evaluating trade drugs and natural products, also addressing the general notion that natural products may not be very drug-like. In 2003 Feher and Schmidt[12] published one of the most comprehensive comparisons of NPs, drugs and combinatorial chemistry compounds and compared these three groups using 40 different properties. Ertl and Schuffenhauer analyzed properties and structural features of more than 130,000 NP structures and identified substructures typical for particular classes of source organisms.[13] One of the most recent analyses of natural product properties and scaffolds was conducted by Grabowski and Schneider in 2007.[14] In the following paragraphs the results of these publications will be summarized.

Henkel et al. analyzed two natural product databases, the Dictionary of Natural Products (DNP)[15] and the Bioactive Natural Product Database.[16] Synthetic compounds were taken from the Available Chemicals Directory (ACD)[17] and the Bayer AG in-house collection. The analysis revealed that the molecular weight distribution was comparable for drugs and NPs, but synthetic compounds were found to have a lower molecular weight. The average NP was found to contain three stereogenic centers, three times as many as the average drug molecule. The authors also reported that NPs contain more oxygen but significantly less nitrogen compared to the other compound classes. Additionally, Henkel et al. performed an analysis of pharmacophoric groups, i.e. structural units prone to interact with biological macromolecules. These include different functional groups like acids or alcohols and their isosteres as well as structural motifs, e.g. aryl or alkyl moieties. As expected, NPs incorporate more oxygen-containing groups while drugs and synthetic compounds are abundant with nitrogen-containing ones. Since the statistics on heteroatoms as well as on pharmacophoric groups were reported as the fraction of molecules incorporating them, no average number per molecule can be given. In general, differences in the pharmacophoric groups of NPs and drugs tend to be smaller than between NPs and synthetic compounds. Particularly interesting may be the largest differences, which can be found for arenes. NPs contain half as many aryl groups as synthetics. Alcohols are three times more frequent in NPs compared to synthetic compounds. Henkel et al. also did a detailed analysis of the NP compound classes represented in the DNP. The result showed a clear predominance of alkaloids and terpenes over most other natural product classes in the DNP. Although this may have changed by now, it should be kept in mind that the results of every analysis inherently reflect the dataset which was analyzed.

Lee and Schneider’s paper in 2001 compared a set of parameters mainly related to the Rule-of-Five for trade drugs and natural products.[11] In this survey, heteroatom frequencies were given as average atoms per molecule. The natural product molecules contained an average of 1.4 nitrogen atoms per molecule, about one less than the drug set, while the average number of oxygen atoms per molecule was reported to be about four in both sets. The calculated logP values indicate that natural products are more lipophilic than drug molecules (2.9 vs 2.1). One of the most surprising findings, however, was the low rate of Rule-of-Five violations. Only 10% of the NPs violated the rules although these rules have been derived exclusively from orally available drugs. A comparable rate of violation was found in the drug set, suggesting that, at least with respect to the Rule-of-Five, small natural product molecules were found to be more drug-like than they were thought to be.

A more comprehensive study on the properties of natural products in comparison to those of drugs and libraries resulting from combinatorial chemistry was conducted by Feher and Schmidt in 2003.[12] They analyzed the drugs from the Dictionary of Drugs and all other structures from the catalogues of compound library vendors. Altogether, the authors calculated more than 40 molecular properties for all test sets. They found the average number of stereogenic centers to be 6.2 for NPs and 2.3 for drugs. Notably, these averages are twice as high as those found by Henkel et al. four years earlier, while the ratio of roughly 3:1 remained the same. The NPs contained on average two more oxygen atoms than the drugs, which is in agreement with Henkel et al. although not with the results of Lee and Schneider. Half as many nitrogen atoms were found in NPs compared to drugs, and far fewer sulphur and halogen atoms. NPs contain about two rings more than drugs while at the same time their degree of ring fusion is twice as high. NPs were found to be generally more unsaturated; however, they contain considerably fewer aromatic rings. The number of rotatable bonds found in NPs is two less than in drugs. Together with the high degree of unsaturation and ring fusion, this indicates that NPs are on average more rigid than drugs. Feher and Schmidt also distinguished between genuine natural products and ‘seminaturals’, i.e. derivatives. They found the seminatural compounds to be more drug-like in their properties, especially with respect to the number of stereogenic centers, ring fusion and heteroatom distribution.

Grabowski and Schneider[14] compared the properties and scaffolds of drugs, pure natural products (PNP), NP derivatives and, particularly interesting, of marine natural products in their 2007 publication. While most of their findings are comparable to those of Feher and Schmidt, Grabowski and Schneider found that 10% of the drug molecules but 18% of the pure natural products and 30% of the marine natural products analyzed by them showed at least two violations of Lipinski’s Rule-of-Five. The violation rate for drugs matches the value published earlier by Lee and Schneider, but for the pure natural products the figure is nearly twice as high. The higher rate found for marine natural products may be due to the higher molecular weight and the increased number of acceptors per molecule. Notably, the marine natural products analyzed contain fewer rings and fewer atoms in aromatic rings but more rotatable bonds than PNPs, which indicates higher flexibility.

The analysis of Ertl and Schuffenhauer[13] on the largest set of NP structures studied so far (more than 130,000 molecules) confirmed the results obtained by previous studies for smaller sets. Additionally, the authors identified the scaffolds and substituents which are typical for NPs and which may serve as inspiration for the design of NP-like combinatorial libraries. Study of NPs produced by different classes of organisms (bacteria, fungi, plants, and animals) could identify some substructural features typical for these groups.

From the data described one can conclude that NPs differ from drugs in several properties although the majority of them do not violate Lipinski’s Rule-of-Five. Properties which distinguish NPs from drugs are heteroatom distribution, number of rings, fraction of aromatic rings and degree of ring fusion. Certainly some of these differences may reflect empirical knowledge and synthetic methodology applied in drug discovery. But others may be explained by the limitations of the biosynthesis of NPs, e.g. the availability of certain elements. The very low nitrogen content in NPs from plants as shown by Henkel et al.,[10] for example, may partly be due to nitrogen being a growth-limiting factor for many plants. In contrast, the higher halogen content found in NPs originating from algae may reflect the higher availability of halogens in the marine environment. Some comparisons of NP properties to those of drugs may be partly biased because drugs are almost exclusively small molecules while some NP classes also incorporate much larger structures, e.g. macrocycles with different properties.[18] In general, one should keep in mind that natural products are often viewed as a particular but more or less homogeneous and related class of compounds like, for example, drugs or synthetic compounds. But in fact they consist of many diverse subclasses. The properties and scaffold architecture of NPs vary between these subclasses and depend on the organism of origin, the biosphere this organism lives in and the role of the NPs in nature, e.g. as metabolite, poison, neurotransmitter etc. Our knowledge about these subclasses also varies considerably. While, for example, plant NPs have been studied extensively for decades, exploration of marine NPs has been less frequent and less comprehensive. Average NP properties give some insight into their distinguishing features and highlight some prominent differences to other classes of compounds; however, they are less well suited to describe and explore the diversity of NP chemical space.

2.1. Navigating Natural Product Chemical Space

NP chemical space encompasses the regions of chemical space which are occupied by natural products. Visualizations of NP chemical space facilitate analysis and exploration of the chemical diversity in Nature. The knowledge gleaned from such an endeavour certainly is interesting in its own right but can also serve as a source of inspiration for the design of new compound libraries. Several approaches to charting chemical space have been taken so far, exploring different aspects of chemical space. Some visualize the chemical property space based on the properties of the compounds populating it, and others are based on chemical structure, i.e. they explore chemical structure space. Both approaches employ different methodology and thus yield different but complementary results.

Feher and Schmidt depicted the chemical space of drugs, compounds originating from combinatorial chemistry and natural products based on their properties in a 2D scatter plot diagram using principal component analysis (PCA). PCA is a mathematical method which reduces an n-dimensional vector space to a space of smaller dimension, mostly two or three, while keeping the characteristics that contribute most to the variance of the data. The new base vectors are formed by linear combinations of the n base vectors of the previous vector space, and each axis consists of the sum of fractions of the different properties initially calculated, the so-called ‘loading’.

In order to obtain a more recent insight into differences between NPs and other molecules, we performed an analysis of representative sets of natural products, bioactive molecules and synthetic organic compounds using substructural features instead of properties. The natural products were taken from the Dictionary of Natural Products (DNP).[19] An in silico deglycosylation using the Molinspiration toolkit[20] yielded 113,664 unique aglycons from which 15,000 were selected to form the representative set. The 15,000 bioactive molecules were representatively selected from the World Drug Index[21] and the MDDR database[22] (together about 120,000 structures), and the same number of organic molecules was selected from a large internal database of commercially available compounds. As a descriptor, two-atom fragments were used which were present in at least 0.3% of the molecules. This yielded a matrix with 110 columns for the corresponding fragment frequencies. After normalization, a PCA was performed to arrive at three principal components which were then displayed in the 3D scatter plot diagram shown in Fig. 1.

Fig. 1. Scatter plot of the combined chemical space of natural products (green), bioactive molecules (blue) and synthetic organic molecules (orange). Natural products and synthetic compounds occupy different parts of chemical space, and drugs are somewhere in between. Moreover, the space occupied by drugs and natural products is significantly larger and more diffuse than the space characteristic for the synthetic compounds.

The observation that combinatorial chemistry space is much smaller than natural product or drug space and more sharply defined matches Feher and Schmidt’s observation. However, in their dataset there was almost no difference between drug and natural product space. In the diagram shown in Fig. 1 the two sets only share a small overlap and can otherwise be distinguished. This difference may be explained by the fact that Feher and Schmidt used simple calculated properties to characterize molecules, while in the analysis described above a large set of substructure features was used.

One obvious limitation of the PCA approach is its dependency on the dataset and the descriptors used. This is due to the PCA method, which employs the characteristics contributing most to the variance of the dataset as the first principal component, those contributing second most as the second component etc.

The ChemGPS approach published by Oprea and Gottfries in 2001[23] also used PCA and more than 60 descriptors. However, the authors proposed a method to overcome the limited comparability by applying a dedicated reference set of compounds to determine the loading of the PCA. These reference compounds were chosen to have extreme properties which place them at the outer rims of the chemical space of interest. The positions of the compounds mapped can then be interpolated from the references. The overall approach very much resembles the Navstar global positioning system (GPS) where satellites in geo-stationary orbits (far from earth) are used as references for triangulation of a position on earth, hence the name ‘ChemGPS’. Moreover, the loading of the three principal components was optimized in a way that each of the three axes can be translated into interpretable descriptors, e.g. size, hydrophobicity and flexibility. This is a considerable advantage compared to standard PCA since many of their principal components do not directly relate to interpretable chemistry. The maps of chemical space generated by this method resemble the 3D scatter plot in Fig. 1.

ChemGPS was applied to natural products by Larsson et al.[24] using natural cyclooxygenase (COX) inhibitors as an example. The authors found that ChemGPS was able to discriminate clusters of compounds with different activity, e.g. COX2 protein inhibition, COX2 mRNA inhibition and COX1 enzyme inhibition. They were also able to identify some properties which may be important with respect to these individual activities. However, some outliers were identified whose properties had to be extrapolated rather than interpolated from the reference set. The authors concluded that for charting natural product space a new training set with more NP-like properties would fine-tune the approach towards ChemGPS-NP.[25]

In conclusion, PCA-based approaches and, in particular, ChemGPS are well suited for the comparison of compound sets with respect to their properties or structural features captured by the descriptors used. The ability to rapidly process and display large numbers of compounds makes it valuable for the comparison of large libraries or databases. However, comparing the chemical space depicted by different PCA models is difficult, even more so if different sets of descriptors have been used. Additionally, conversion of positions in chemical space into chemistry and chemical structure is often difficult so that their use in synthesis planning or library design is limited.

Recently, Waldmann and co-workers developed a different approach to navigating chemical space which is based on chemical structure rather than on calculated properties.[26] Their structural classification of natural products (SCONP) is based on a hierarchical classification of scaffolds which consist of ring systems and their aliphatic linkers but not other chains.[27] Dissecting these scaffolds into smaller structures according to a set of rules yields a series of one-to-many parent-child relationships with the smaller scaffold being the parent of the larger one. The rules used to guide this iterative dissection process reflect synthetic and medicinal knowledge. Each parent-child assignment was guided by the following rules until only one possible parent scaffold was left:

i) The parent scaffold has to be a substructure of the child scaffold.
ii) The parent scaffold has to have fewer rings than the child scaffold.
iii) Breaking of ring bonds is forbidden.
iv) The parent with the highest number of heteroatoms is chosen.
v) The largest parent scaffold is selected.
vi) The more frequent parent scaffold, i.e. the one representing more NPs, is selected.

The relationships can be displayed in a tree-like diagram which is depicted in Fig. 2.

Fig. 2. Scaffold tree generated from the Dictionary of Natural Products 14.2. For clarity, only scaffolds are shown which are present in at least 0.2% of molecules in the dataset.

The scaffold tree diagram shown in Fig. 2 provides an overview over natural product structure space contained in the Dictionary of Natural Products (DNP). It depicts the more frequent scaffolds, i.e. those representing at least 0.2% of the dataset, and their relationships in a genealogy-like fashion. At the given threshold the carbocyclic section contains most scaffolds, followed by the O-heterocycles and the N-heterocycles. While in the carbocyclic section many scaffolds with three and four rings can be found, the O-heterocyclic section contains mainly one- and two-ring scaffolds. Among the N-heterocycles only some branches extend beyond the first ring. Koch et al.[26] found the scaffolds with three rings to be most abundant across the whole dataset with the two- and four-ring scaffolds being within one standard deviation. Together, these three scaffold types account for more than 50% of all scaffolds found within the DNP. An analysis of the molecular volume distribution of the NPs incorporating a two-, three- or four-ring scaffold showed that it ranges from 100 to 500 Å³ with a maximum at 250 Å³. This is found to be comparable to molecular volumes of the drugs in the World Drug Index and well in the range of the size of protein cavities determined by Klebe and coworkers.[28] The scaffold tree shown in Fig. 2 is exclusively built from scaffolds which are represented in NPs themselves. This leads to ‘holes’ in branches where the intermediate scaffolds do not occur in NPs present in the DNP. These missing links, however, may provide promising opportunities for synthetic molecules with interesting biological properties.

One should keep in mind that due to historical reasons, there is a bias for plant ingredients in the DNP although that might slowly change with new entries originating from other species. The scaffold hierarchy created from one molecule depends on the rest of the data set because only scaffolds present in at least one molecule are allowed. While one might argue that thus only ‘true’ natural product scaffolds enter the scaffold tree, comparisons with trees from different data sets are very difficult if not impossible.

A new rule set was recently introduced by Schuffenhauer et al.[29] This set includes modifications to make the dissection of scaffolds independent of the dataset, i.e. the parent generated from one scaffold will always be the same irrespective of the dataset used. This change in methodology may infer scaffold constructs which are not present in molecules in the dataset but provides the links otherwise missing, e.g. in the natural product scaffold tree. Thus, comparisons between datasets become possible, an apparent application of such a tool.

Fig. 3. Six of the thirteen rules guiding the parent-child assignment in the newly developed scaffold tree generating algorithm:
1. Reduce number of acyclic linker bonds.
2. Retain bridged rings, spiro rings and nonlinear ring fusion patterns with preference.
3. Retain rings with size other than 3, 5 or 6 atoms with preference.
4. Retain aromatic ring(s) while dissecting fully aromatic systems.
5. Retain rings with higher number of heteroatoms.
6. If the number of heteroatoms is equal, the priority to retain is N > O > S.
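The original SCONP parent-selection rules (i) to (vi) listed earlier in this section can be read as a chain of tie-breakers applied until a single candidate remains. The following is only a minimal sketch of that reading, not the published implementation. It assumes that rules (i) and (iii) were already enforced when the candidate parents were enumerated, uses the atom count as a stand-in for "largest", and the `Scaffold` record with its field names is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scaffold:
    """Hypothetical summary record for a scaffold (not a real toolkit type)."""
    name: str
    n_rings: int
    n_heteroatoms: int
    n_atoms: int      # assumed proxy for scaffold size (rule v)
    n_products: int   # number of NPs represented by this scaffold (rule vi)


def select_parent(child, candidates):
    """Pick one parent for `child` from pre-enumerated candidate parents.

    Assumption: every candidate is already a substructure of the child
    obtained without breaking ring bonds (rules i and iii).
    """
    # Rule ii: the parent must have fewer rings than the child.
    pool = [c for c in candidates if c.n_rings < child.n_rings]
    if not pool:
        return None

    # Rules iv, v, vi are applied in order until one candidate is left.
    tie_breakers = (
        lambda s: s.n_heteroatoms,  # rule iv: most heteroatoms
        lambda s: s.n_atoms,        # rule v: largest scaffold
        lambda s: s.n_products,     # rule vi: most frequent scaffold
    )
    for key in tie_breakers:
        best = max(key(s) for s in pool)
        pool = [s for s in pool if key(s) == best]
        if len(pool) == 1:
            break

    # If a tie survives all rules, the first survivor is returned;
    # this fallback is an arbitrary choice made for the sketch.
    return pool[0]


if __name__ == "__main__":
    child = Scaffold("child", n_rings=3, n_heteroatoms=2, n_atoms=14, n_products=10)
    candidates = [
        Scaffold("A", 2, 1, 10, 50),
        Scaffold("B", 2, 2, 9, 5),
        Scaffold("C", 2, 2, 10, 7),
    ]
    print(select_parent(child, candidates).name)  # prints: C
```

In the toy example, candidate A is the most frequent scaffold but is eliminated by rule (iv) before frequency is ever considered, which illustrates why the order of the rules matters.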

The new rules themselves are based mostly on medicinal and synthetic chemistry knowledge and are more detailed than the rules used in the initial classification by Waldmann et al. They include, among others, rules which for example keep macrocycles intact, retain unusual structural motifs like spiro- or bridged compounds or conserve aromaticity. Six of the 13 rules in the new rule set are shown in Fig. 3.

This rule set was first applied in the analysis of the pyruvate kinase screen dataset[30] available in PubChem.[31] The resulting scaffold tree is shown in Fig. 4. Color intensity reflects the fraction of active molecules represented by the individual scaffold. Scaffolds representing a high fraction of bioactive molecules can easily be identified from Fig. 4. Interestingly, in some cases also the next smaller scaffold contains a similar fraction of active molecules. This gives a quick overview over the performance of a library and allows rapid identification of starting points for closer examination and data mining, e.g. structure–activity relationship (SAR) analyses. Schuffenhauer et al. also gave a first example for the use of the scaffold tree to find the common core for SAR analysis as implemented for example in Pipeline Pilot. The use of chemical structure, which is the universal language of chemists, renders the results from the analysis directly and intuitively accessible to chemists.

Fig. 4. Scaffold tree based on the pyruvate kinase assay dataset from PubChem containing 602 active compounds and 50000 inactive molecules. Only scaffolds are shown that represent at least 0.02% of the molecules in the dataset and for which at least 5% of the molecules were active on pyruvate kinase.

3. Conclusions

While many natural products adhere to the Rule-of-Five, they nonetheless differ from other compound sets, especially drugs, in many other properties, e.g. heteroatom composition, number of rings or stereogenic centers. In general, NPs are found to contain more oxygen and fewer nitrogen atoms than drugs but are more lipophilic. They contain on average more stereogenic centers, their degree of unsaturation is higher than in drugs, and they incorporate fewer aromatic rings. Although NPs in general contain more rings than drugs, most of them are non-aromatic and part of a single fused ring system. Since natural products consist of several subclasses with diverse properties, their properties depend strongly on the chosen data set. The databases are subject to change over time and thus the properties regarded as NP-like may change as well. Notably, many seminatural compounds, mainly natural product derived analogues, were found to be much closer to drugs than to pure natural products in many of their properties.

Several approaches towards charting chemical space in general and natural product space in particular have been presented. ChemGPS and the SCONP scaffold tree differ substantially in their methodology and results, but can both be applied to a variety of problems and may often give complementary results. ChemGPS is favorable for the assessment and comparison of large datasets according to their physico-chemical properties and is amenable to very large numbers of compounds. Some chemical parameters describing the molecules of interest can also be derived. In contrast, the scaffold tree may be of better use in library design. SCONP delineated the natural product scaffold space of the most frequent scaffolds. It can easily be seen that carbocycles are more abundant than O-heterocycles and N-heterocycles. Among the carbocycles also more complex scaffolds consisting of three and four rings can be identified while in the heterocyclic compounds these seem to be less frequent. Some insight can be gained about the frequency of individual scaffolds in NP chemical space which may be interesting for library design. The structural basis establishes an immediate link to chemistry and makes the approach intuitively accessible to chemists. The pyruvate kinase tree example demonstrates the value of the scaffold tree as a data mining tool linking information to structures. The natural product scaffold tree not only allows exploration of Nature’s diversity but also indicates opportunities for synthetic organic chemistry by identifying holes in the branches which Nature has probably not filled.

The question remains how to transform the information gleaned from the exploration of chemical space into synthetic chemistry and, last but not least, compounds. One possible approach to utilize the knowledge extracted from chemical space analysis has recently been presented by Waldmann and co-workers.[32] Utilizing the SCONP approach, the authors explored the biologically relevant and prevalidated structural space populated by nature so far. Scaffolds identified from the SCONP scaffold tree shown in Fig. 2 were used as templates for the design of NP-derived compound collections. Four new classes of phosphatase inhibitors could be discovered from these compound collections by biochemical screening. Waldmann and co-workers created the term ‘biology-inspired synthesis’ (BIOS) for this approach, i.e. synthetic efforts of molecules enriched in biochemical and biological activity by exploration of the chemical space relevant to nature.

Together with other tools available for the design of natural product based libraries[33] and modern synthetic organic chemistry methods, the exploration of larger parts of Nature’s chemical space seems possible at reasonable effort.

Acknowledgements
SW is grateful for a Novartis graduate scholarship.

Received: April 26, 2007

[1] D. J. Newman, G. M. Cragg, K. M. Snader, J. Nat. Prod. 2003, 66, 1022.
[2] J.-Y. Ortholand, A. Ganesan, Curr. Opin. Chem. Biol. 2004, 8, 271.
[3] I. Paterson, E. A. Anderson, Science 2005, 310, 451.
[4] B. E. Evans, K. E. Rittle, M. G. Bock, R. M. DiPardo, R. M. Freidinger, W. L. Whitter, G. F. Lundell, D. F. Veber, P. S. Anderson et al., J. Med. Chem. 1988, 31, 2235.
[5] G. Mueller, Drug Discovery Today 2003, 8, 681.
[6] C. Rosenbaum, P. Baumhof, R. Mazitschek, O. Muller, A. Giannis, H. Waldmann, Angew. Chem., Int. Ed. 2004, 43, 224.
[7] C. Rosenbaum, S. Roehrs, O. Mueller, H. Waldmann, J. Med. Chem. 2005, 48, 1179.
[8] H. van de Waterbeemd, E. Gifford, Nat. Rev. Drug Discovery 2003, 2, 192.
[9] C. A. Lipinski, F. Lombardo, B. W. Dominy, P. J. Feeney, Adv. Drug Delivery Rev. 2001, 46, 3.
[10] T. Henkel, R. M. Brunne, H. Muller, F. Reichel, Angew. Chem., Int. Ed. 1999, 38, 643.
[11] M.-L. Lee, G. Schneider, J. Comb. Chem. 2001, 3, 284.
[12] M. Feher, J. M. Schmidt, J. Chem. Inf. Comput. Sci. 2003, 43, 218.
[13] P. Ertl, A. Schuffenhauer, in ‘Natural Products as Drugs’, Eds. F. Petersen, R. Amstutz, Birkhaeuser Verlag, Basel, 2007.
[14] K. Grabowski, G. Schneider, Curr. Chem. Biol. 2007, 1, 115.
[15] ‘CRC Dictionary of Natural Products’, CRC Press, 78318 structural entries, June 1996, http://www.crcpress.com/.
[16] ‘Bioactive Natural Product Database’, Szenzor Management Consulting Company, Budapest, Hungary, Berdy, 29432 entries of natural products with described biological activity, status: July 1996.
[17] ‘ACD: Available Chemicals Directory’, Version 93.2, Molecular Design Ltd. Information Systems Inc., San Leandro, CA, USA, 182822 entries.
[18] L. A. Wessjohann, E. Ruijter, D. Garcia-Rivera, W. Brandt, Mol. Diversity 2005, 9, 171.
[19] ‘CRC Dictionary of Natural Products’, v15.1, CRC Press, 2006, http://www.crcpress.com/.
[20] Molinspiration Cheminformatics, http://www.molinspiration.com.
[21] ‘WDI, Derwent World Drug Index’, http://www.derwent.com/products/lr/wdi/.
[22] ‘MDL Drug Data Report’, http://www.prous.com/product/electron/mddr.html.
[23] T. I. Oprea, J. Gottfries, J. Comb. Chem. 2001, 3, 157.
[24] J. Larsson, J. Gottfries, L. Bohlin, A. Backlund, J. Nat. Prod. 2005, 68, 985.
[25] J. Larsson, J. Gottfries, S. Muresan, A. Backlund, J. Nat. Prod., ACS ASAP.
[26] M. A. Koch, A. Schuffenhauer, M. Scheck, S. Wetzel, M. Casaulta, A. Odermatt, P. Ertl, H. Waldmann, Proc. Natl. Acad. Sci. USA 2005, 102, 17272.
[27] G. W. Bemis, M. A. Murcko, J. Med. Chem. 1996, 39, 2887.
[28] S. Schmitt, D. Kuhn, G. Klebe, J. Mol. Biol. 2002, 323, 387.
[29] A. Schuffenhauer, P. Ertl, S. Roggo, S. Wetzel, M. A. Koch, H. Waldmann, J. Chem. Inf. Model. 2007, 47, 47.
[30] J. Inglese, D. S. Auld, A. Jadhav, R. L. Johnson, A. Simeonov, A. Yasgar, W. Zheng, C. P. Austin, Proc. Natl. Acad. Sci. USA 2006, 103, 11473.
[31] Pyruvate kinase assay in PubChem, http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=361.
[32] A. Noeren-Mueller, I. Reis-Correa, Jr., H. Prinz, C. Rosenbaum, K. Saxena, H. J. Schwalbe, D. Vestweber, G. Cagna, S. Schunk, O. Schwarz, H. Schiewe, H. Waldmann, Proc. Natl. Acad. Sci. USA 2006, 103, 10606.
[33] L. O. Haustedt, C. Mang, K. Siems, H. Schiewe, Curr. Opin. Drug Discovery Dev. 2006, 9, 445.
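As a supplementary illustration of the Rule-of-Five violation rates discussed in Section 2, the short sketch below counts violations per compound and reports the fraction of a set with at least a given number of violations. It assumes the commonly quoted thresholds (molecular weight above 500, logP above 5, more than 5 hydrogen-bond donors, more than 10 acceptors) and precomputed descriptors. The function names and the four toy compounds are hypothetical and are not data from any of the cited studies.

```python
def count_violations(mol_weight, logp, h_donors, h_acceptors):
    """Number of Rule-of-Five criteria a single compound violates (0 to 4)."""
    return sum((
        mol_weight > 500,
        logp > 5,
        h_donors > 5,
        h_acceptors > 10,
    ))


def violation_rate(compounds, min_violations=1):
    """Fraction of compounds with at least `min_violations` violations.

    `compounds` is an iterable of (mol_weight, logp, h_donors, h_acceptors)
    tuples; descriptor calculation itself is assumed to happen elsewhere.
    """
    compounds = list(compounds)
    if not compounds:
        return 0.0
    flagged = sum(1 for c in compounds if count_violations(*c) >= min_violations)
    return flagged / len(compounds)


if __name__ == "__main__":
    # Toy descriptor tuples, invented purely for illustration.
    toy_set = [
        (300.0, 2.0, 1, 4),    # no violation
        (650.0, 6.0, 2, 8),    # weight and logP
        (900.0, 3.0, 7, 14),   # weight, donors, acceptors
        (520.0, 1.0, 2, 5),    # weight only
    ]
    print(violation_rate(toy_set, min_violations=2))  # prints: 0.5
```

The `min_violations` parameter matters when comparing published figures, since a rate counted from one violation upward and a rate counted from two violations upward are not directly comparable.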