<<

Medicinal cheMistry CHIMIA 2017, 71, No. 10 667

doi:10.2533/chimia.2017.667 Chimia 71 (2017) 667–677 © Swiss Chemical Society The Screening Compound Collection: A Key Asset for Discovery

Christoph Boss*, Julien Hazemann, Thierry Kimmerlin, Modest von Korff, Urs Lüthi, Oliver Peter, Thomas Sander, and Romain Siegrist

Abstract: In this case study on an essential instrument of modern , we summarize our successful efforts in the last four years toward enhancing the Actelion screening compound collection. A key organizational step was the establishment of the Compound Library Committee (CLC) in September 2013. This cross-functional team consisting of computational scientists, medicinal chemists and a biologist was endowed with a significant annual budget for regular new compound purchases. Based on an initial library analysis performed in 2013, the CLC developed a New Library Strategy. The established continuous library turn-over mode, and the screening library size of 300’000 compounds were maintained, while the structural library quality was increased. This was achieved by shifting the selection criteria from ‘druglike’ to ‘leadlike’ structures, enriching for non-flat structures, aiming for compound novelty, and increasing the ratio of higher cost ‘Premium Compounds’. Novel chemical space was gained by adding natural compounds, macrocycles, designed and focused libraries to the collec- tion, and through mutual exchanges of proprietary compounds with agrochemical companies. A comparative analysis in 2016 provided evidence for the positive impact of these measures. Screening the improved library has provided several highly promising hits, including a macrocyclic compound, that are currently followed up in different Hit-to-Lead and Lead Optimization programs. It is important to state that the goal of the CLC was not to achieve higher HTS hit rates, but to increase the chances of identified hits to serve as the basis of successful early drug discovery programs. The experience gathered so far legitimates the New Library Strategy. Keywords: Compound Library Committee (CLC) · Screening compound collection · Screening compound se- lection process · Virtual TNT-library

1. Introduction 1.1 Structural Quality quality hits, to start a lead optimization First, the structural quality of the program. They will, and should, disregard Today, pharmaceutical companies rely screening compounds needs to be care- problematic chemical starting points ex- heavily on High-Throughput Screening fully assessed. Since the seminal paper hibiting undesired functionalities or prop- (HTS) campaigns to identify novel com- of Lipinski,[1] the importance to control erties. pounds (so-called hits) to initiate their me- the physicochemical properties of me- The three-dimensional (3D) aspect dicinal chemistry programs. Two key ele- dicinal chemistry as well as screening of a compound heavily impacts its physi- ments determine the success of an HTS: compounds is broadly understood and ac- cochemical properties, pharmaceutical A biologically significant and technically cepted. During the (H2L) opti- effects, and potential side-effects. As robust biological assay, and the structural mization process, lipophilicity, molecular shown by Lovering and colleagues, spatial and physical quality of the screening com- weight, and complexity of molecules tend complexity (as measured by the number pound collection. In this article, we focus to increase.[2] To compensate for this, the of Fsp3 carbons) as well as the presence on the structural aspects of the screening rules to select druglike compounds for the of chiral centers, correlate with success compound collection, and the parameters screening library, were changed to favor as compounds transition from discovery we considered important for the design and more leadlike compounds.[3] In addition, through clinical testing to .[6] assembly of our collection. rather than applying hard cut-offs for pa- rameters like MW, cLogP or PSA, mod- 1.2 Structural Diversity ern approaches employ scoring functions Second, the structural diversity of the or multi-parametric optimization (MPO) collection is essential. The development of procedures. An interesting example of a in the 1990s per- scoring function was developed by Pfizer mitted the rapid synthesis of large numbers in their CNS MPO workflow.[4] of structurally very similar compounds. As A good library needs to be continuous- a consequence, all major pharmaceutical ly maintained. Problematic compounds, companies built very large screening li- such as frequent hitters or reactive chemi- braries with relatively low structural com- cals, have to be identified and should ei- plexity. This enabled ultra-high-through- ther be physically eliminated, or flagged. put screens (uHTS) covering millions of To exclude such structures from the begin- compounds using relatively simple (pref- ning, alerts such as the REOS and PAINS erably homogeneous and miniaturized) *Correspondence: Dr. C. Boss lists of undesired structural elements have biological activity assays. However, this Drug Discovery & been developed, which are now broadly incurred experimental constraints and high Idorsia Pharmaceuticals Ltd [5] Hegenheimermattweg 91, CH-4123 Allschwil applied. Medicinal chemists depend on financial costs. Therefore, even large phar- E-mail: [email protected] high quality lead structures, based on high maceutical companies now tend to screen 668 CHIMIA 2017, 71, No. 10

smaller library subsets initially, and then screening compound collection to the new- the CLC consists of a medicinal chemist, iteratively expand the screen based on the est scientific findings in hit and lead dis- two computational chemists, one represen- initial results. For smaller companies such covery, and propose new concepts toward tative from HTS biology and a hit-to-lead as Actelion, smaller screening compound managing and maintaining screening com- chemist. The Chemistry Group Leader re- collections assembled from high quality pound collections. sponsible for the budget allocation attends compounds covering a large area of the the committee meetings as a permanent chemical space were always the realistic guest. The CLC tightly collaborates with option. 2. The Actelion Screening compound management in order to imple- A generalized screening compound Compound Collection: Strategy ment technology changes and the signifi- collection has to deliver hits on differ- and Tactics cant compound logistics efforts triggered ent targets, such as GPCRs, ion channels, by the CLC’s decisions. The major task of kinases, proteases; it may serve different The Actelion screening compound col- the CLC was and still is to develop and im- therapeutic areas with specific require- lection comprises a dynamic library of plement a screening compound collection ments, such as blood-brain barrier pen- approximately 300’000 compounds with strategy, ensuring that the screening com- etration or bacterial accumulation; and it an average lifetime of five years per ‘cata- pound collection is fully in line with the should be compatible with different assay logue compound’, and ten years per ‘pre- strategic and tactical needs of Actelion’s technologies, such as enzymatic or cell mium compound’ (definitions see below). early drug discovery projects. The first based assays, high content screenings, or Every year, the oldest 60’000 compounds measure that the CLC took was to correct phenotypic assays. Therefore, it is tactical- are removed, and the lost chemical space the shortcomings identified in the analysis ly useful to divide the compound collection is analyzed. The result of this analysis is, of the Actelion screening compound col- into specific subsets. For example, tool together with other factors, the basis for lection. compounds annotated with known bio- the selection of 60’000 new compounds Within a year, the CLC formulated logical activities help to classify other hits to bring the collection to 300’000 com- ideas to be implemented into the new according to their mode of action, or allow pounds again. Hence, 120’000 compounds library strategy to enhance the probability to scrutinize biological pathways.A library are physically moved every year, 60’000 in to discover hits with the potential of high- subset spanning the chemical space of the and 60’000 out (Fig. 1). quality leads. To gain acceptance for these library with approximately 10% of the The collection includes selected inter- changes among medicinal chemists, the compounds enables small proof-of-con- nal project compounds as well as external commitment to full transparency was key. cept screens. A set of small molecules en- compounds. The compounds are divided All CLC proceedings are made available ables alternative hit discovery approaches; into two distinct categories. The first cat- to all Actelion drug discovery scientists. such as fragment-based screens. In sum- egory is called ‘premium compound’ and The CLC also presents its vision, plans, mary, the screening compound collection has a lifetime of ten years, encompassing activities and efforts twice a year to all in- needs to cover a large and diverse chemi- our own project compounds, and rare, nov- terested stakeholders within the research cal space with a rather limited number of el, expensive or custom-made commercial departments. This openness not only led to compounds. compounds. The second category is called a broad understanding and acceptance of ‘catalogue compound’ and has a lifetime the CLC, but also to an increased interest 1.3 Novelty of five years, encompassing the inexpen- in HTS campaigns and their results. Third, the novelty of the compounds is sive or easily commercially available com- The main elements of the new library important to commercial drug discovery, as pounds. strategy were (i) to keep the annually up- it is the basis for a competitive advantage In 2013, the Actelion screening com- dated compound turn-over (rolling mode) at a later stage, and to generate intellec- pound collection of 300’000 compounds of the Actelion screening compound col- tual property. The novelty assessment can was evaluated internally as well as by an lection; (ii) to explore new, not yet or only be easily performed using cheminformat- external company. The following conver- marginally covered chemical space such as ics tools. Several sources can be explored gent conclusions were drawn from these macrocycles, natural-product derived com- to identify inventive structures. Natural two independent analyses: pounds, or focused- and designed libraries; products or -derived com- · The library was structurally highly (iii) to implement compound exchange pounds can offer valuable and original diverse. programs with agrochemical companies starting points. Compound exchanges · The physicochemical property dis- and, potentially, with other pharmaceutical with other pharmaceutical or agrochemi- tribution was well balanced, but companies; (iv) to move from a druglike cal companies is an efficient way to ac- rather in the classical druglike range cess additional diversity and uncovered (Lipinski’s rules of 5) than the desired chemical space. The screening compound leadlike range. collections of pharmaceutical companies · The proportion of flat versus non- are often dissimilar, as demonstrated, for flat compounds was high (see example, by Bayer and AstraZeneca.[7] ‘Flatland’[6]). This is why Sanofi and AstraZeneca re- · The proportion of commercial com- cently realized an exchange of 200’000 pounds versus proprietary compounds compounds.[8] Also, contract research or- was very high. ganizations (CROs) offer screening com- pounds, designed or focused libraries, or As a reaction to this analysis, in scaffolds suitable for extensive decoration. early 2013 it was decided to create the Although a large number of compounds is ‘Compound Library Committee’ (CLC), commercially available, their originality, a group of scientists covering the diverse price, or exclusivity varies greatly. disciplines involved in early drug discov- In the following, we detail our strategy ery, to be responsible for all strategic and and the tactical measures implemented to tactical aspects related to the Actelion Fig. 1. The ‘Rolling Mode’ of the Actelion adapt Actelion’s corporate library and the screening compound collection. Today, Screening Compound Collection. Medicinal cheMistry CHIMIA 2017, 71, No. 10 669

to a leadlike structural bias, and from flat mode’. The means by which this is done 1. MultiParameter Optimization (MPO) to non-flat structures; (v) to apply multi- are explained in the following sections. It scores calculation parameter optimization (MPO) scores for is important to mention that only screening 2. Substructure filtering and removal of compound selection; (vi) to enhance the plates are discarded. Compounds which are unwanted chemical functionalities REOS and PAINS filters for compound still available as powders and/or individual 3. Library comparison and gap filling de-selection; (vii) to assess new technolo- solutions are kept in the stores. Hence the 4. Clustering and dissimilarity selection gies, for example DNA-Encoded Library total number available from Compound (DEL) screening, or increase fragment- Management at Actelion, including du- 2.1.1 Step 1 based lead discovery; (viii) to strengthen plicates and salt forms, is 850’000 com- The very first step of the selection pro- the in-house resources dedicated to ex- pounds representing 600’000 unique struc- cess, performed with Knime, is the calcu- perimental hit follow-up of both structure- tures. Although approximately 300’000 lation of leadlike MPO scores on a pool of based and -based virtual screenings; compounds are not part of the screening candidate compounds. Our leadlike MPO (ix) to invest a significant part of the bud- set at a given time, they can still be ac- scores calculations are adapted from a get assigned to the screening collection to cessed within days for targeted screening similar, published approach of CNS MPO acquire exclusive ‘premium compounds’. and rapid hit expansion. In contrast, hit ex- scores calculations.[4b] As depicted in Fig. At a practical level, it was also decided pansion through external providers is often 3, a set of eight physico-chemical proper- to keep the library formatted on 384 well a time-consuming process. ties (MW, clogP, HBA, HBD, Molecular plates for screening. To maximize the ben- Flexibility, aromatic ring count, Fsp3 and efit from the more expensive ‘premium 2.1 Compound Selection and Library stereo center count) were used to build our compounds’, it was decided to double Assembly own leadlike and non-flat compound MPO their storage lifespan to ten years, as com- The compound selection and library scoring algorithm. The following consid- pared to the five years for ‘catalogue com- design is performed with the Knime erations underpin the choice of these pa- pounds’. At the same time, the amount of Analytics Platform and additional inter- rameters. substance ordered was increased, and the nally developed software. This process As previously mentioned, the size of concentration of the compound stock so- was adapted and improved based on the compounds tends to increase during opti- lutions was raised from 4 mM to 10 mM outcome of the analysis of the Actelion mization cycles, hence compounds with a in 100% DMSO. This allows to satisfy screening compound collection in early MW < 400 were considered as preferred the increasing demand of material for hit 2013 as described above. starting points. follow-up investigations, and to increase The optimized Actelion screening Highly lipophilic compounds have an the screening concentration if needed. compound selection workflow consists of increased risk to interact promiscuously Previous experience had shown that insuf- four main steps: with proteins, leading to target selectivity ficient solubility at 10 mM is no longer a problem with compounds of high struc- tural quality. Fig. 2. Quality Control As a practical consequence, larger au- LC-MS AnalysisofScreening Library 2015 (QC) results of the tomated compound stores were acquired to (43'404MSfrom~300'000compounds) Actelion screening compound collec- accommodate the single tube compound 100% aliquots. 88% tion. According to their integrity and The strategic measures mentioned 80% above align the size of the Actelion purity compounds were flagged as good screening compound collection (300’000 60% compounds) with the available plate stor- (>85% pure), medium (50–85% pure) or bad age capacity, compound management pro- 40% (<50% pure, or wrong cesses, and to the 384-well format used compound). in the High-Throughput Screening (HTS) 20% 7% workflow. An analysis performed by oth- 3% 2% ers suggests that a library of this size can 0% representatively cover the leadlike space.[9] good medium badunknown To efficiently execute the compound selection, library management, and com- pound logistics processes, proprietary soft- ware has been developed.[10] The continuous library turn-over is la- borious and costly. However, in our experi- ence, this approach holds important advan- tages. The library always reflects the new- est developments in medicinal chemistry. Limiting the compound age increases the likelihood that a compound can still be re- ordered, and compounds remain in a good physical condition. LC-MS analysis of the latest screening library revealed that 88% of the compounds are of good quality at the beginning of their useful life (Fig. 2). Fig. 3. Each plot represents one of the eight physicochemical property desirability functions used One of the CLC’s most important re- to generate our leadlike MPO. Multiparameter optimization methods are commonly used to as- sponsibilities is the annual replacement of sess and balance the effects of several variables, weighted based on their importance to the compound sets according to the ‘rolling overall objective. 670 CHIMIA 2017, 71, No. 10 Medicinal cheMistry

issues and off-target . They also of- ten display suboptimal metabolic stability. On the other hand, highly hydrophilic mol- Ͳǡ ܠ൑ࢇ. 1 ecules may be less membrane permeable. šǦƒ , ࢇ൑ܠ൑࢈. Therefore, the optimal clogP range was set „Ǧƒ between –2 and 2. ”ƒ’‡œ‘‹†ሺšǢ ƒǡ „ǡ ǡ †ሻ = ͳǡ ࢈൑ܠ൑ࢉ. †Ǧš The number of aromatic rings in a , ࢉ൑ܠ൑ࢊ. molecule has been shown to influence the †Ǧ chance of success and Ͳǡ ࢊ൑ܠ. should be maintained below four.[11] 0 a b c d x To enable the ‘Escape from Flatland’[6], three-dimensional complexity (conferred Fig. 4. A trapezoidal membership function (MF) is specified by four inflexion points. through Fsp3 carbons) and the presence of chiral centers at scaffold level are taken into account for the scoring. filter compounds. From a technical point of 2.1.4 Step 4 Since it is connected to membrane view, each candidate compound is parsed The last step of the selection workflow permeability,[1] the number of Hydrogen against 670 SMARTS strings before being is the clustering and dissimilarity selection Bond Acceptors (HBA) and the number of allowed to enter the next selection round, using an internally developed tool based Hydrogen Bond Donors (HBD) were also if compliant with all rules. on OptiSim technology (see section 4). considered for the MPO score calculation. This allows the selection of a diverse set Finally, Molecular Flexibility (detailed 2.1.3 Step 3 of candidate compounds which efficiently in section 4) was found to be a predictor The third step involves an internally de- cover the chemical space of interest. Based for good oral . The optimal veloped tool called VSCmd which allows on SkeletonSpheres descriptors, all com- range was set between 0.20 and 0.55.[12] to perform a library comparison between pounds with a dissimilarity coefficient The use of MPO scoring for the selection the pool of candidate compounds and the below 0.2 are discarded. This approach of compounds allows for greater flexibility Actelion screening compound collection. allows the selection of a diverse set of in beyond the use of single pa- The process begins by virtually discard- compounds eligible to enter the Actelion rameters or hard cutoffs. Ultimately, MPO ing the 60’000 oldest compounds from screening compound collection. scoring should allow to identify compounds the screening compound collection as a with higher probability of success. consequence of the ‘rolling mode’. Then, 2.2 Screening Compound Collection: A trapezoidal membership function is SkeletonSpheres (see section 4) descriptors Tactical Measures to Optimize the used for the calculation of each individual are calculated for both sets of compounds, Content desirability score of each physico-chemi- the pool of candidate compounds and the As mentioned, our original analysis cal property as defined in Fig. 4. remaining Actelion screening compound revealed that the screening compound col- Then an overall desirability score D is collection. The molecular descriptors are lection initially contained too many flat calculated as defined below, where S is the used to compute similarity coefficients. molecules, and they were more drug- than p individual score per property, and W is the Each candidate compound is compared leadlike. Increasing the number of chiral p weight associated with each property: against each Actelion screening com- centers, of sp3-carbons, and of saturated pound, in order to find the nearest neigh- carbo- and heterocycles has been proposed bor. Then, to ensure broad chemical space to obtain more globular molecules, thereby D = ∑ S W / ∑ W p p p coverage, all candidate compounds having enhancing the chance to discover com- a similarity coefficient above 0.8 are dis- pounds for clinical success.[14] In a screen- The highest possible D score is 1. carded. Remaining candidate compounds ing compound collection such as ours, Compounds displaying an overall score D are now all dissimilar to the compounds of containing only 300’000 compounds, the below 0.6 are discarded at this stage of the the Actelion screening compound collec- pharmaceutically relevant chemical space selection process. tion and the gap lost by removing 60’000 can only be covered by optimal use of mo- compounds is covered again with new, lecular diversity descriptors and by avoid- 2.1.2 Step 2 similar or identical, compounds. ing redundancy by all means. The second step consists of eliminating unwanted chemical features, for example reactive functionalities which often result in promiscuity of compounds toward many SMAS Coen le targets as well as anti-targets, and therefore enhance the risk of unwanted or toxic ef- [CX3H1](=O)[#6]aldehydes Max: 0 fects. This step is performed based on the FCS(=O)(=O)[F,Cl,Br,O] triflatesMax:0 substructure filtering functionality within [SX3](=O) sulfoxides Max: 1 the Knime workflow. Originally, PAINS[5e] [OH]cccc[OH] para phenols Max: 1 and REOS[13] SMARTS strings were used. With time, our collection of SMARTS C(=O)[O-,OH] carboxylic acids Max: 1 was continuously enriched through the c12ccccc1cccc2Naphthalenes Max: 1 input and feedback of medicinal chemists. N-[OH]Hydroxylamine Max: 1 Currently, our list of SMARTS strings con- tains 670 entries. Each SMARTS string de- [#6][#6](=O)-&!@O-&!@[#6] esters Max: 2 fines how often a given substructure is al- c[OH]aromatic hydroxylsMax:2 lowed to appear in a given compound. Fig. cBr aromatic bromide Max: 2 5 represents a snapshot of some SMARTS strings with their associated rule used to Fig. 5. Examples of SMARTS strings. Medicinal cheMistry CHIMIA 2017, 71, No. 10 671

Therefore, we enriched the Actelion tabase, in order to assess complementarity, belong to the exclusive part of the com- screening compound collection with com- originality and novelty. To ensure an opti- pound collection. At Actelion, a compound pounds from several diverse sources de- mal diversity, we only kept scaffolds with exchange is initiated by the CLC and ap- scribed in the following paragraphs. at least two exit vectors. The selected scaf- proved by the Drug Discovery Chemistry folds were then virtually decorated. The re- management. To ensure a successful out- 2.2.1 Scaffold Selection and Non-ex- sulting virtual compounds were filtered as come, a project manager is assigned. A clusive Screening Compound Design previously described, and a final diversity solid and clear contract is negotiated with In order to enlarge our collection of selection was performed. To maximize the the partner. Once the contract is signed novel, innovative and non-flat compounds, diversity, we limited the number of deriva- and the lists of eligible compounds are we requested two CROs to propose novel tives per scaffold to 200. The rules applied exchanged, our computational chemistry scaffolds. The following requirements were the same as for the non-exclusive team reviews, evaluates and selects the were stated: The central core should not ap- compounds, with a weak and a strong exit compounds of interest. pear in public databases, and it must have vector subsequently enumerated to obtain The most labor intensive step is the two exit vectors, a weak one and a strong maximal diversity. manual weighing and transfer of a defined one. The weak exit vector was used to pre- In a further effort toward obtaining ex- amount of each compound requested by pare five sub-scaffolds from each initial clusive compounds, 15 natural-product like, the exchange partner, from an Actelion vial scaffold. The strong exit vector served to novel, sp3-enriched scaffolds were sourced to a destination vial provided by the part- achieve structural diversity. Thus, for each and a CRO was approached for the produc- ner; and receiving the compounds from the sub-scaffold, 40 final compounds were tion of the final exclusive compounds by partner, dissolving them followed by LC- synthesized, resulting in 200 compounds applying maximum building block diver- MS quality control, and integrating them around one initial scaffold. To maximize sity; this time on both exit vectors. into the screening compound collection. diversity, 200 different building blocks Shipping and handling needs to be ac- were selected for the final decoration of the 2.2.3 Semi-exclusive Screening cording to the partners’ varying require- strong exit vector. This clearly challenged Compounds ments, and compliant with national and the synthetic capacity of the CROs. Each We directed our efforts toward target- international safety and legal regulations. CRO worked on ten scaffolds, resulting in focused compound libraries. These are col- Finally, once the partner discovers hits a total of 2’000 compounds per CRO. lections of compounds which are designed among the provided compounds, resupply Although large numbers of innovative to interact with an individual protein target requests need to be fulfilled according to compounds were obtained in this way, the or, frequently, target class (such as kinases, the terms agreed. CROs did not succeed to prepare all select- voltage-gated ion channels, serine- or cys- In addition to the logistical efforts that ed central cores. Indeed, several scaffolds teine proteases, or GPCRs). The design of were initially underestimated, some addi- selected based on ‘paper chemistry’ turned such libraries generally utilizes structural tional lessons were learned in the course of out to be synthetically not accessible in a information about the target or family of several compound exchange projects: flask. Also, to keep the costs of this ap- interest. In the absence of such structural · As the agrochemical compounds were proach reasonable, the compounds had to information, a chemogenomic model that selected from the partners’ historical be obtained on a non-exclusive basis. incorporates sequence and mutagenesis collections, about 33% of them did In a second approach, off-the-shelf de- data to predict the properties of the bind- not pass Actelion’s stringent quality signed libraries were acquired from sev- ing site is often used. Another approach requirements (see Fig. 6). eral suppliers on a non-exclusive basis. uses the information about known ligands · The exchange program required clear This approach was considered attractive of the target, to generate focused libraries and transparent internal communica- since such libraries often represent the left- through scaffold hopping.[16] tion to counter medicinal chemists’ overs of non-exclusive designed libraries We acquired ion channel and Protein– worries about “giving away their com- ordered by potential competitors. As for Protein Interaction (PPI) focused library pounds” to another company. exclusive and non-exclusive library com- compounds on a semi-exclusive basis · To facilitate the hit validation of ex- pounds, the off-the-shelf designed library (compounds sold to a limited number of change compounds, it is essential to compounds were selected for their novel competitors). consider compound resupply, synthe- and sp3 rich scaffolds. Additionally, it was sis protocol supply, and hit expansion possible to reach a higher chemical diver- 2.2.4 Compound Exchanges to in general, when setting up the terms sity with these compounds, as this cherry Acquire Screening Compounds of the initial exchange contract. picking approach allowed to select only a Agrochemical compounds are poten- · As compounds obtained from, or few analogs around one central core. tially interesting as starting points for phar- provided to, a partner through an ex- maceutical drug discovery, as they are dis- change program are now subject to 2.2.2 Scaffold Selection and Exclusive tinct from medicinal chemistry compounds contractual obligations, it is important Screening Compound Design yet, like medicinal chemistry compounds, to electronically keep track of those We decided to also design and syn- they are designed to be active on a biologi- (compound flagging). thesize proprietary and exclusive librar- cal target.[17] A compound exchange can be Despite these challenges, the several ies. Therefore, suitable scaffolds had to an efficient way to get access to premium compound exchanges organized proved be identified, either in our in-house or in compounds, but it depends on an active very successful to Actelion, as they lead to external collections. Again, the selection commitment from various corporate func- hits with chemical structures usually not started with the identification of scaffolds tions. In our case, this included the com- observed in the pharmaceutical drug dis- exhibiting a high number of sp3-carbons pound library committee (CLC), chemistry covery chemical space. and a low molecular weight. For example, management, the legal department, com- spirocyclic scaffolds represent valuable putational chemistry, compound manage- 2.2.5 Macrocycles as Screening starting points.[15] ment, and logistics functions. Compounds The scaffold candidates were then com- To be eligible for exchange with a Macrocycles belong to a unique chemi- pared to our internal screening compound partner, proprietary compounds need to be cal class that helps to fill the gap between collection as well as to the eMolecules da- available in sufficient quantities and must conventional small molecules and large 672 CHIMIA 2017, 71, No. 10 Medicinal cheMistry

Commercial catalogue compounds Premiumcompounds Compounds from exchange programs 100% 100% 100% 83.5% 83.6% 80% 80% 80%

60% 60% 60% 55.9%

40% 40% 40% 23.6% 20% 20% 20% 9.7% 7.8% 10.4% 10.2% 3.4% 3.5% 3.3% 5.2% 0% 0% 0% good medium bad unknown good medium bad unknown good medium bad unknown

Fig. 6. Comparative QC data showing the lower quality of compounds coming from historical collections in compound exchange programs. biomolecules.[18] They represent spatially Such compounds can be synthesized at af- other agencies. The compounds in this col- pre-organized ring structures with special fordable prices and delivered within two to lection were selected to represent a broad conformational features and behavior, with three weeks. More than 2’000 compounds chemical and pharmacological diversity. molecular weights ranging from 500 to were selected from the CVL based on However, highly peculiar structural classes 2000 Daltons. Macrocycles might offer a scaffold novelty and diversity, and 2’000 were excluded (i.e. contrast agents for di- different tactic to tackling previously non- were actually synthesized for the Actelion agnostic use). druggable targets, and represent a different screening compound collection. By their nature, a useful bioavailabil- intellectual property space. The tremen- ity of these compounds is given, and the dous progress in macrocyclization method- 2.2.8UniversityScreeningCompounds safety profile in humans is known. The ologies (i.e. click-chemistry, ring-closing To further explore the chemical space annotations indicate the main biological metathesis, and palladium-catalyzed chem- and in order to get access to innovative targets and mechanisms of action. Drug istry) in the last years, and the increasing compounds, we tried to acquire com- repositioning or repurposing can be an im- number of commercially offered macro- pounds from academic laboratories. Direct portant part of a drug discovery program cycles render them valid starting points for contacts with individual academic investi- and has led to several blockbuster drugs chemistry programs. Our screening com- gators turned out to be time consuming and (i.e. Viagra and Rogaine).[21] Phenotypic pound collection did not cover any macro- unproductive. Therefore, an aggregator screens, new biomarkers and non-invasive cyclic chemical space in 2012. Therefore, compiling available university compounds imaging techniques have created new op- the CLC decided to expand the collection was again used for the selection and ac- portunities for pursuing novel indications in this direction. Over the past two years, quisition of about 300 compounds from for approved compounds. commercially available macrocycles were university labs. Due to the limited number, Discovering one or several known selected and acquired based on scaffold and it is so far difficult to evaluate the success drugs being active in a phenotypic assay decoration diversity. Importantly, as most of such compounds in HTS campaigns. can help in assay validation, target identifi- macrocycles do not fulfill the RO5, other We also observed that the physical quality cation, or identification of (novel) mecha- selection criteria need to be developed for of these compounds was comparably low. nisms of action. Of course known drug this type of compounds.[19] This approach is presently not pursued any compounds can be immediately used in further. animal models, possibly accelerating drug 2.2.6 Niche Supplier Screening discovery programs. Compounds 2.2.9 Natural Product and Derivative The ‘Bioactive Compounds Library’ Niche supplier compounds are com- Compounds is a unique collection of 2’100 biologi- pounds offered by small compound sup- Natural products and their derivatives cally active chemical compounds for high pliers. These compounds are less vis- exhibit different physicochemical proper- throughput screening (HTS) and high ible and less easily accessible than com- ties compared to standard synthetic com- content screening (HCS). The molecular pounds available from large suppliers. pounds, yet they account for 39% of drugs of these compounds Nevertheless, such compounds remain approved by the authorities. Natural prod- is known, and often also pharmacokinetic interesting as they are likely to be differ- ucts can offer very valuable starting points and safety data from preclinical research ent from compounds found in large col- to explore new chemical space. Moreover, and clinical trials. The library includes in- lections. In order to increase the scaffold by chemical modification, novel com- hibitors, APIs, natural products, and che- diversity of the Actelion screening com- pounds with different biological activity motherapeutic agents. The compounds are pound collection, a database collected by a from the starting natural product can be dis- structurally diverse and cell permeable. commercial aggregator was used to select covered.[20] For these reasons, Actelion has The collection is associated with a rich and acquire more than 1’000 niche sup- acquired 6’456 diverse compounds selected documentation including IC data on the 50 plier compounds. by cherry picking from two providers’ cata- primary target. The library in combination logs, to prepare its unique Natural Product with the approved drugs collection is used 2.2.7 Commercial Virtual Library (CVL) and Derivative Compounds Library. as a tool to validate new drug discovery as- The CVL is a virtual collection of about says and characterize orphan receptors. 7 million novel lead-like compounds con- 2.2.10 Approved Drugs Collection and structed from more than 3’500 innovative Bioactive Compounds Library 2.3 Comparative Analysis of cores (bridged ring-systems, spirocycles, The Approved Drugs Collection as- Libraries L12 and L15 to Show the medium-sized or annelated ring systems, sembles 1’586 active phar- Impact of the CLC Guided Actions and the like) and a huge collection of sub- maceutical ingredients (APIs) that have Based on the outcome of the analysis stituents, proposed by one of our suppliers. been approved by the FDA, EMA and of the Actelion screening compound col- Medicinal cheMistry CHIMIA 2017, 71, No. 10 673

lection described above, the actions pre- Fig. 7. Molecular viously summarized were implemented weight distribution of over the last three years to optimize the L12 (blue) compared screening compound selection process. In to L15 (green). order to visualize the effect of our efforts, we compare the analysis of the part of the screening compound collection purchased in 2012 (L12), with the part of the screen- ing compound collection acquired in 2015 (L15). L12 consists of 58’335 compounds and L15 consists of 65’580 compounds. The following parameters were compared: · Molecular Weight distribution · Fraction of sp3 carbons distribution (Fsp3) · clogP distribution · Premium versus Catalogue compound count Fig. 8. Fraction sp3 of · Ring system count L12 (blue) compared · Self-Nearest Neighbor Analysis to L15 (green). Fig. 7 displays the Molecular Weight (MW) distribution of L12 and L15. A shift toward lower MWs is obvious. This is a consequence of the CLC strategy to increase the fraction of smaller, leadlike compounds in the screening compound collection. Leadlike hits help to decrease the high rate of compound attrition in drug development.[22] The tendency of screening hits to gain in size during lead optimiza- tion – jokingly referred to as ‘molecular obesity’ – consists a development risk that may contribute substantially to the limited productivity of drug discovery programs.[23] Fig. 8 displays the distribution of the fraction of sp3 (Fsp3) carbons per mole- Fig. 9. Calculated cule. This parameter shifts up from L12 to logP distribution of L15, reflecting the aim to increase the frac- L12 (blue) compared tion of non-flat compounds in the screen- to L15 (green). ing collection. More complex molecules, as measured by carbon saturation, have the capacity to access greater chemical space. This increases the potential to identify compounds that better match the surface of target proteins. Fig. 9 displays the clogP value of a com- pound, i.e. the logarithm of its calculated between n-octanol and water, a well-established measure of the compound’s hydrophilicity/lipophilic- ity. Overall, compounds in L15 are slightly more hydrophilic than those in L12. Fig. 10 displays the proportion of pre- mium compounds versus catalogue com- pounds for L12 and L15. As intended, the Fig. 12 shows the Self-Nearest Tanimoto coefficients of all library com- fraction of premium compounds has in- Neighbor Analysis (SNNA) histograms of pounds are depicted as histograms. The creased over the last three years. L12 and L15. In a SNNA, each individual trend from L12 to L15 toward lower Fig. 11 displays the distribution of library compound is compared to all other Tanimoto coefficient values reflects an in- (plain) ‘ring-type’ system counts for each compounds from the same library in order creased diversity of L15. library. It can be seen that the chemical to identify its nearest neighbor. This ap- Based on this comparative analy- space in L15 is significantly wider as com- proach was performed using Schrödinger sis, we conclude that the section of the pared to that in L12. L15 contains 4’812 Canvas and MolPrint2D fingerprints and screening compound collection acquired unique ring systems distributed over its allows the evaluation of the self-diversity in 2015 is leadlike, highly diverse, and 65’580 compounds, whereas L12 contains of a library. A common measure to as- enhanced with natural product-based and only 1’540 unique ring systems distributed sess similarity/diversity is the Tanimoto- novel sp3-enriched scaffolds. It contains over its 58’335 member compounds. coefficient.[24] The Nearest Neighbor medicinal chemistry project compounds, 674 CHIMIA 2017, 71, No. 10 Medicinal cheMistry

cally available with a high probability and index)). Fig. 13 shows that despite using at a reasonable effort, we developed the proprietary cores, the TNT library chemi- concept of the TNT-library. cal space is highly complementary to the The TNT (Tractable aNd Tangible) li- Actelion screening compound collection brary is a virtual library enumerated from chemical space. in-house available building blocks and proprietary cores. The proprietary cores used to generate the compounds ensure 4. The Mathematics Behind the immediate access to in-stock material to Scenes quickly generate a large number of syn- Fig. 10. Premium compounds vs catalogue thetically accessible virtual compounds 4.1 The Algorithms beyond compounds of L12 (bottom) compared to L15 with a favorable IP situation. The building Compound Selection (top). blocks, the cores and the final compounds Four in-house implemented algorithms are carefully selected in order to bring on were used for most of the compound selec- leadlike and chemically meaningful mat- tion process. ter. The synthetic accessibility is ensured A chemical descriptor encodes mol- by restricting the synthetic steps to twelve ecules as vectors of equal length which are well established and robust chemical re- well suited for fast comparisons by a com- actions routinely used in high throughput puter. Also, structural clustering based on medicinal chemistry (i.e. amide couplings, such descriptors generally appears plau- reductive aminations, Suzuki cross-cou- sible to a medicinal chemist. pling reactions). A virtual screening algorithm was de- The system has the flexibility to gener- veloped to perform the similarity calcula- ate several tens of millions of compounds tions.This algorithm has to be very efficient but of which only approximately 5 million to allow billions of descriptor comparisons are virtually generated. This approach can at a useful speed. Furthermore, a sampling be used as an idea generator for library de- algorithm is needed to draw representative Fig. 11. Unique ring system count of L12 (blue) sign to enrich the screening collection as subsamples from a set of molecules. compared to L15 (green). well as to identify novel hits on a specific To visualize the selected molecules target. As the final compounds can be ob- and their physico-chemical properties the macrocyclic compounds, natural products tained in one to three steps from in-stock in-house developed and open-source tool and natural product-derived compounds, material, they can be synthesized by our DataWarrior was used. A multi parameter agrochemical compounds, designed non- chemists within two weeks. optimization function (MPO) was imple- exclusive, semi-exclusive and exclusive The TNT-library is increasingly used at mented to calculate a single score from compounds, target focused screening com- Actelion for 2D and 3D virtual screenings. multiple physico-chemical properties. All pounds and university group compounds. It is also used to rapidly expand hit struc- implementations were done in Java to be Our efforts had the intended effects. We tures identified in HTS campaigns. platform independent and run on differ- continue to optimize the Actelion screen- A 2D molecular fingerprint by a ent operating systems including Windows, ing compound collection by following the Nearest Neighbor Approach (NNA) was MacOS, and Linux. same strategy and consider to implement chosen to compare the physical and the further measures. virtual collections and measure their 4.2 The SkeletonSpheres Descriptor One of these is the TNT-Library de- complementarity.[7] All the compounds This descriptor was developed by scribed in section 3, which we have just of the query collection (the TNT library) Actelion. It is a vector of integers which started to implement. We are now validat- are compared to all the compounds of the represents the occurrence of different ing it in several early projects. reference collection (the Actelion screen- substructures in a molecule. Five circular ing compound collection). For each query layers with increasing bond distance are compound, the nearest neighbor in the ref- located for each atom in the molecule. 3. The TNT-Library: A Virtual Library erence collection is reported (i.e. the struc- Hydrogen atoms are not considered. This of Choice for Virtual Screening ture with the highest Tanimoto similarity results in five fragments starting with the Campaigns Fig. 12. Self Nearest To complement lead discovery through Neighbor Analysis physical HTS, we also support drug dis- (SNNA) histograms covery projects with virtual screening of L12 (blue) and L15 campaigns. For virtual approaches, the (green). same screening compound selection crite- ria apply. While it may be feasible to virtually screen any imaginable structure, in the end the hits still need to be physically obtained to confirm the predicted activity in vitro. Since not all virtual structures (and in fact, not even all published catalog structures) can be synthesized in reality, this is often the limiting step of a virtual campaign. In order to gain access to virtual librar- ies the members of which will be physi- Medicinal cheMistry CHIMIA 2017, 71, No. 10 675

above 0 is ≤b, depending on the number one of the redundant bonds is considered. of unique id-codes and hash collisions. As For instance, in chains of conjugated triple similarity value for the comparison of two bonds the following applies: If at least one descriptor vectors, their overlapping frac- terminal sp2/sp3 atom has no external non- tion o is used. This calculates for two vec- hydrogen neighbor, then no single bond is tors v and v in: considered rotatable. Otherwise that ter- 1 2 minal single bond connecting the smaller substituent is considered the only rotatable bond of the linear atom strand. Then the local environment of any rotatable bond is characterized by its first and second shell of neighbor atoms plus various atom prop- erties like ring membership, aromaticity, Fig. 13. Displays the nearest neighbor similar- The similarity values of the Flexophore and stereo configuration. A canonical rep- ity distribution of the TNT collection taken as and of the SkeletonSpheres descriptor resentation of the characterizing fragment query collection. It is remarkable that both col- were mapped to a comparable value by a is created to serve as a bond type specific lections are fully complementary as both librar- scaling function. This was done to ease the key into a bond torsion statistics table. This ies are entirely dissimilar to each other. use of the descriptors for the scientists in bond angle statistics table was compiled drug discovery. A similarity of 0.85 for a earlier by processing all purely organic, pair of SkeletonSpheres descriptors means high-resolution X-ray structures of the naked central atom, adding one layer at that 85% of the chemical structures are Cambridge Structural Database (CSD) this a time. Every fragment is encoded as a overlapping. way: Any rotatable bond was characterized canonical string (id-code), similar to the as described above and its torsion angle generation of canonical SMILES.[25] The 4.3 Molecular Flexibility Calculation added to a torsion histogram associated to canonical id-code includes the stereo- The conformational flexibility of a this particular rotatable bond type.All bond chemistry of the encoded fragment, which molecule is an important parameter influ- type specific histograms were smoothened is a feature missing in other molecular encing induced fit and the entropy of the to reduce artefacts. Then all distribution descriptors. The string is then assigned to ligand to protein binding. A frequently peak maxima and peak widths at a height one of 1024 fields n in a vector. Therefore, used surrogate for molecular flexibility is of 50% were determined. Since the CSD is the hash value of the id-code is calculated the number of rotatable bonds. However, not an open database and its license pro- and the corresponding value in the vector this simple approach neglects the fact that hibits publishing derived statistics data, our is increased by one. The Hashlittle algo- the amount of conformational change de- public source code contains statistics in- rithm from Jenkins[26] is used as a binning pends on whether a rotating bond is located formation from the Crystallography Open function which takes a text string as in- in the periphery or center of a molecule. Database[28] (COD) instead. In order to cal- culate the bond rotatability r the respective put and returns an integer value between Furthermore, bond rotatability is not a b 0 (inclusive) and 1024 (exclusive). In binary property. While some rotatable bond key is used to get the associated tor- preliminary experiments this hash func- bonds mainly populate one rotation state, sion angle histogram. In rare cases without tion showed a good uniform distribution others are rather flexible with low rotation sufficient bond precedents in CSD or COD, of the generated hash values. To consider energy barriers and multiple energetically a simple torsion histogram is predicted. An the molecular scaffold without the influ- equivalent minima. We have developed a algorithm evaluates the histogram and as- ence of the hetero atoms, the whole cal- computable molecular flexibility measure signs a value close to 1.0 if the histogram culation is repeated while replacing the that takes these effects into account. The contains three equally distributed wide hetero atoms with carbon. The resulting algorithm is built into the DataWarrior ap- peaks of similar heights, while histograms hash values are used to increment the cor- plication[10] and, hence, its source code is with one narrow single peak are considered responding fields in the vector. By adding freely available as part of the DataWarrior rather inflexible with a value close to 0.0 (r =0 for bonds that are considered non- this skeleton information to the descriptor source code. The molecular flexibility is b vector the similarity calculation between represented by a value ranging from 0.0 to rotatable when applying above conditions). Then a weighting factor w is assigned to two descriptor vectors becomes a bit in- 1.0. Individual bond rotatability values are b sensitive to the exact position of the het- assessed from torsion statistics of similar every rotatable and non-rotatable bond as follows: For ring bonds w =0.33, since ring ero atoms in two molecules. This directs bonds derived from the CSD/COD data- b the similarity value toward the percep- bases (torsion maxima, frequencies, and bonds cannot be changed without typically tion of similarity by medicinal chemists. 50% intervals). The contribution weight of affecting two other ring bonds. For other bonds w =sqrt(2*ssAC/mAC) with ssAC For them the exact position of a hetero any individual bond rotatability value to an b atom is not as discriminating as it would overall molecular flexibility is then deter- being the number of non-hydrogen atoms be for the spheres descriptor without the mined from the topological bond location, on the smaller side of the bond and mAC skeleton coding part. The additional con- i.e. central bonds are weighted higher than being the number of non-hydrogen atoms sideration of the scaffold information and those in the periphery. In more detail, the in the molecule. From the bond rotatabili- ties r and bond weights w a raw molecule the use of a histogram instead of a binary following steps are done: First, all relevant b b flexibility f is calculated as: vector distinguishes the SkeletonSpheres rotatable bonds are determined. These are raw descriptor from circular fingerprints.[27] non-aromatic single bonds, which are not The similarity calculation between two in a ring with less than six members, where SkeletonSpheres vectors is straightfor- both atoms are sp2 or sp3 hybridized and  ∙ ward: A calculation of e=5 spheres is carry at least one more non-hydrogen 𝑓𝑓  done for all atoms g in the molecule serv- neighbor, and where a torsion change   ing once as a center atom, which results modifies the relative location of at least in b=e g 2 number of id-codes. The num- one non-hydrogen atom. If there are multi- with n = number of all hydrogen bonds in ber of fields in the descriptor with a value ple rotationally redundant bonds, then only the molecule. 676 CHIMIA 2017, 71, No. 10 Medicinal cheMistry

For typical molecules the raw flex- clustering or multidimensional scaling can 2015 (L15), was used to judge the success ibility values tend to be well below 0.5, be used for sampling. However, an impor- of the new library strategy. because often many individual bond rotat- tant restriction for our choice of the sam- We indeed found several selective, cel- abilities are 0.0 or small values. To com- pling algorithm, was a time complexity lular active hits with potencies in the low pensate for this effect and to use the de- less than quadratic. We decided to imple- nM range in different screening projects. sired range from 0.0 to 1.0 more evenly, ment the OptiSim algorithm, which was In one case, a hit series with twelve de- we apply a scaling function to calculate the developed by R. Clark.[29] This sampling rivatives from a designed library, in which final molecular flexibility f as: algorithm has a moderate time complex- the most potent hit displayed a 16nM activ- m ity and works with any similarity metric. ity in two different, orthogonal assays of an The OptiSim algorithm needs two param- oncology target, was identified. Today, this   eters which are fairly robust. Additionally, scaffold is explored and optimized through 𝑓𝑓 − −𝑓𝑓  with c=0.7   it was possible to parallelize the OptiSim a parallel chemistry approach within the partially. The parallelization enabled the Hit-to-Lead (H2L) team. sampling of large descriptor sets with up In another cellular screening project, 4.4 The Virtual Screening Algorithm to 100 k SkeletonSpheres descriptors. the most potent and selective hit was ob- Virtual screening with SkeletonSpheres tained from the macrocycle collection. descriptors means the comparison of inte- 4.6 Compound Visualization with Remarkably, this macrocycle showed po- ger vectors. Integer calculations are fast DataWarrior tent activity in a 3D cell culture model con- on modern CPUs. However, a comparison At the end of the compound selection structed from different primary cell types. of a supplier library with one million mol- process with virtual screening and sam- Even though Actelion, to that date, had ecules and an in-house library with 400 k pling it is desirable to efficiently visualize only limited experience with the synthesis molecules needs 400 billion vector similar- the selected compounds, to obtain a quick of macrocycles, the H2L team immediate- ity calculations. A single SkeletonSpheres overview of the success of the process. ly succeeded to synthesize a considerable descriptor is defined by 1024 integer num- The in-house developed DataWarrior is a number of analogs, resulting in more po- bers. This increases the number of neces- visualization tool for all different types of tent and efficacious derivatives. sary integer operations to more than 400 data and has been described recently.[10] After several screening campaigns with trillion. Every integer operation uses a min Initially, the DataWarrior was developed L15, it can be stated that the different sub- and a max function as it was given in the to visualize data related to chemical struc- libraries of L15 have different success rates equation above. Consequently, a number tures. Because of its condensed storage of that are target-dependent. As Actelion’s of 800 trillion comparisons, smaller and molecular information, more than one mil- therapeutic targets belong to very differ- bigger are needed together with 800 tril- lion compound structures can be visualized ent protein classes, it is therefore essential lion summations. Summing up, a number on a desktop computer. Physico-chemical to include chemical space as diverse as of 1.6 quadrillion integer operations which properties of the molecules can be calcu- possible in our library collection. For the are needed to compare the two compound lated and their distribution visualized. If the screening campaigns performed to date, libraries. Fortunately, the vector similar- distribution has the desired form the com- the highest hit rates are observed among ity calculations are independent from each pounds are ready for ordering. DataWarrior the in-house medicinal chemistry mol- other. So they can be easily parallelized. is a free and open-source tool and can be ecules, as well as the agrochemical com- Our virtual screening algorithm detects the obtained at openmolecules.org. pounds (validated hit rates around 0.15%); number of processor cores on the computer followed by validated hit rates around and creates as many similarity calculation 4.7 Knime Analytics Platform 0.07% for focused and designed libraries, threads as processor cores are available. The Knime Analytics Platform is an macrocycles and natural-product derived The descriptors for the two libraries L and 1 integral part of our compound selection compounds (Fig. 14). The lowest hit rate L are read bulk-wise from the hard-drive 2 and library design processes. It is a free, is observed among the commercial catalog to prevent blockage of a large part of the user friendly graphical workbench for data compounds and the Commercial Virtual RAM. The SkeletonSpheres descriptors analytics including data management, data Library (CVL) (around 0.03%). from the bulks L and L are compared 1,1 2,1 transformation, investigation, visualiza- and only results exceeding a pre-defined tion and reporting. KNIME consists of a threshold are written to the output. A serv- series of pieces of program codes called 6. Conclusions er with four CPU sockets was used for the nodes that can be connected in such a way following performance test. Every socket that the input of one node is the output of The size of theActelion screening com- was equipped with an Intel Xeon proces- the previous one. Each node has a dialog in pound collection with its 300’000 com- sor with ten cores. A Xeon processor core which the user can configure the operation pounds is well aligned with a mid-sized is capable of hyper-threading, which re- of the node. Mainly, generic Knime nodes, company’s compound management capa- sults in a total number of 80 cores. The internally developed Actelion nodes as bilities. It has previously been shown that processor clock cycles were specified with well as RDKit nodes (http://www.rdkit. the commercially available leadlike space 2.4 GHz. After 16.5 h the virtual screen- org/) contribute to our Knime workflows. can be covered with a library of this size.[9] ing succeeded for the 400 billion vector The strategy of the Actelion Screening comparisons. This equals 6.7 million com- Compound Collection foresees an annual pound similarity calculations per second. 5. The Reality Check: Biological compound turn-over, i.e. every year the Interrogation of the New Library oldest 20% are removed from the collec- 4.5 Sampling with the OptiSim Strategy tion and replaced with new compounds. Algorithm The advantages of this rolling mode, as Subset selection is daily business in Today, the Actelion screening com- compared to a cumulative or static collec- compound acquisition. Therefore, a re- pound collection provides a library of tion, are a low ratio of chemically decom- liable and sufficiently fast algorithm is chemically diverse small molecules with posed or precipitated compounds, a higher needed to perform this task. Almost any leadlike properties. Its latest addition of availability for re-ordering, and a screening algorithm that is used for classification, approx. 66’000 compounds, acquired in collection that reflects new developments Medicinal cheMistry CHIMIA 2017, 71, No. 10 677

Fig. 14. Validated hit [1] C. A. Lipinski, F. Lombardo, B. W. Dominy, P. J. validated hitrate [%] rates for the different Feeney, Adv. Drug. Deliv. Rev. 2001, 46, 3. 0.20% sub-libraries of the [2] M. M. Hann, Med. Chem. Commun. 2011, 2, 349. 2015 annual library [3] M. M. Hann, T. I. Oprea, Curr. Opin. Chem. 0.15% (L15) on a panel of Biol. 2004, 8, 255. 0.15% 0.14% different therapeutic [4] a) T. T. Wager, R. Y. Chandrasekaran, X. Hou, targets. M. D. Troutman, P. R. Verhoest, A. Villalobos, Y. Will, ACS Chem. Neurosci. 2010, 1, 420; b) T. 0.10% T. Wager, X. Hou, P. R. Verhoest, A. Villalobos, 0.08% 0.08% 0.07% ACS Chem. Neurosci. 2010, 1, 435. 0.06% [5] a) J. Baell, M. A. Walters, Nature 2014, 513, 0.05% 481; b) J. B. Baell, ACS Med. Chem. Lett. 2015, 0.03% 6, 229; c) C. Aldrich, C. Bertozzi, G. I. Georg, 0.01% L. Kiessling, C. Lindsley, D. Liotta, K. M. Merz, 0.00% Jr., A. Schepartz, S. Wang, ACS Chem. Biol. 2017, 12, 575; d) S. Jasial, Y. Hu, J. Bajorath, J. Med. Chem. 2017, 60, 3879; e) J. B. Baell, G. A. Holloway, J. Med. Chem. 2010, 53, 2719. [6] F. Lovering, J. Bikker, C. Humblet, J. Med. Chem. 2009, 52, 6752. [7] T. Kogej, N. Blomberg, P. J. Greasley, S. Mundt, M. J. Vainio, J. Schamberger, G. Schmidt, J. Huser, Drug Discov. Today 2013, 18, 1014. [8] R. Mullin, Chem. Eng. News 2015, 93, 4. and trends in chemistry. Re-screening the Regular LCMS measurements provide [9] J. B. Baell, J. Chem. Inf. Model 2013, 53, 39. newly added sets can regularly provide evidence for the high physical integrity of [10] T. Sander, J. Freyss, M. von Korff, C. Rufener, J. novel chemical starting points for follow- our screening compound collection with Chem. Inf. Model 2015, 55, 460. up projects within ongoing drug develop- >80%ofourcompoundsshowing>85%pu- [11] T. J. Ritchie, S. J. Macdonald, Drug Discov. Today 2009, 14, 1011. ment programs. rity (see Fig. 2). The screenings performed [12] D. F. Veber, S. R. Johnson, H. Y. Cheng, B. A library analysis in 2012 has shown so far provided several highly promising R. Smith, K. W. Ward, K. D. Kopple, J. Med. that the screening compound collection is, hits that are currently followed-up in dif- Chem. 2002, 45, 2615. though chemically diverse, more druglike ferent Hit-to-Lead and Lead Optimization [13] a) W. P. Walters, M. T. Stahl, M. A. Murcko, than leadlike, contained a high proportion programs. It is important to state that the Drug Discov. Today 2002, 7, 903; b) W. P. Walters, M. Namchuck, Nat. Rev. Drug Discov. of flat compounds and was characterized goal of the CLC was not to achieve higher 2003, 2, 259. by a high percentage of commercial versus HTS hit rates, but to increase the chances [14] F. Lovering, Med. Chem. Commun. 2013, 4, proprietary compounds. As a consequence, of identified hits to serve as the basis of 515. the compound library committee (CLC) successful early drug discovery programs. [15] Y. Zheng, C. M. Tice, S. B. Singh, Bioorg. Med. Chem. Lett. 2014, 24, 3673. was created, composed of 2 computational We conclude that the screening results [16] a) Y. C. Martin, J. L. Kofron, L. M. Traphagen, scientists, 1 medicinal chemist, 1 Hit-to- obtained so far vindicate the current strat- J. Med. Chem. 2002, 45, 4350; b) C. J. Harris, Lead chemist and 1 HTS biologist. This egy of the annual compound turn-over ap- R. D. Hill, D. W. Sheppard, M. J. Slater, P. F. W. committee formulated the New Library proach, the exploration of novel chemical Sotouten, Comb. Chem. High Through. Screen. Strategy with the major aim to enhance space, and the change of properties from 2011, 14, 521. [17] J. Delaney, E. Clarke, D. Hughes, M. Rice, Drug the structural library quality. From 2013 druglike to leadlike with a higher percent- Discov. Today 2006, 11, 839. to 2016 the CLC implemented novel com- age of proprietary compounds. [18] L. You, R. An, K. Liang, B. Cui, X. Wang, Curr. putational tools and methods to rate the Pharm. Des. 2016, 22, 4086. value of a compound (MPO score, REOS Acknowledgements [19] a) E. A. Villar, D. Beglov, S. Chennamadhavuni, and PAINS filters). It opened novel sourc- The authors thank Thomas Weller and J. A. Porco, Jr., D. Kozakov, S. Vajda, A. Markus Riederer for their trust and support of Whitty, Nat. Chem. Biol. 2014, 10, 723; b) F. es for compound acquisition to enter new Giordanetto, J. Kihlberg, J. Med. Chem. 2014, chemical space. In particular, exchanges of the Compound Library Committee. We also 57, 278; c) B. C. Doak, B. Over, F. Giordanetto, proprietary compounds with agrochemical thank the Compound Management team with J. Kihlberg, Chem. Biol. 2014, 21, 1115. Joao Silva, Sébastien Guigue, Dora Vieira, companies were organized. [20] a) D. J. Newman, G. M. Cragg, J. Nat. Prod. Danielle Steiner, Evangelin Baskar and Cyril 2012, 75, 311; b) B. L. DeCorte, J. Med. Chem. The commitment of the CLC to be Bruyère for turning our vision into reality, the 2016, 59, 9295. transparent in its processes, decisions and HTS team with Geoffroy Bourquin, Alexandre [21] J. Aube, ACS Med. Chem. Lett. 2012, 3, 442. actions resulted in broad acceptance among Peter, Serge Brand, Raphael Lieberherr, [22] M. J. Waring, J. Arrowsmith, A. R. Leach, the stakeholders in the Drug Discovery Laksmei Siek and Solange Meyer for screen- P. D. Leeson, S. Mandrell, R. M. Owen, G. ing the newly acquired compound sets, the Pairaudeau, W. D. Pennie, S. D. Pickett, J. Chemistry and Biology departments. Wang, O. Wallace, A. Weir, Nat. Rev. Drug An analysis of the compounds added LC-MS team with Francois Le Goff and Marco Discov. 2015, 14, 475. in 2015 demonstrated that they are more Caldarone for all mass spectrometry measure- [23] M. M. Hann, G. M. Keseru, Nat. Rev. Drug leadlike than druglike. The L15 set cov- ments, and the Lead Discovery team with Discov. 2012, 11, 355. ers previously unexplored chemical space Hamed Aissaoui, Sylvia Richard, Shuguang [24] D. Bajusz, A. Racz, K. Heberger, J. Cheminform. Yuan and Geoffroy Bourquin for developing 2015, 7, 20. through macrocycles, natural-product screening hits into leads. And we thank Joel [25] a) Y. C. Martin, E. B. Danaher, C. S. May, D. derived compounds, agrochemical com- Freyss and Tobias Fink for developing software Weininger, J. Comput. Aided Mol. Des. 1988, 2, pounds, designed and focused libraries tools and Naomi Tidten for stimulating discus- 15; b) J. H. Van Drie, D. Weininger,Y. C. Martin, which are now included as premium com- sions. J. Comput. Aided Mol. Des. 1989, 3, 225. pounds in the screening compound collec- [26] http://burtleburtle.net/bob/c/lookup3.c. Received: May 31, 2017 [27] N. Wale, I. A. Watson, G. Karypis, J. Chem. Inf. tion. The increased number of Fsp3 cen- Model 2008, 48, 730. ters in compound structures proves that we [28] S. Grazulis, D. Chateigner, R. T. Downs, A. F. “escaped flatland”. Importantly, the L15 Yokochi, M. Quiros, L. Lutterotti, E. Manakova, set of compounds contains a much higher J. Butkus, P. Moeck, A. Le Bail, J. Appl. Crystallogr. 2009, 42, 726. percentage of proprietary compounds than [29] R. D. Clark, J. Chem. Inf. Comput. Sci. 1997, 37, the older L12. 1181.