Collaborative Drug Discovery: A Platform For Transforming Neglected Disease R&D and Beyond

Sean Ekins

Collaborative Drug Discovery, Burlingame, CA. Collaborations in Chemistry, Jenkintown, PA. Department of , University of Medicine & Dentistry of New Jersey-Robert Wood Johnson Medical School, Piscataway, NJ. School of Pharmacy, Department of Pharmaceutical Sciences, University of Maryland, Baltimore, MD.

© 2009 Collaborative Drug Discovery, Inc. Archive, Mine, Collaborate

In the long history of humankind (and animal kind, too) those who have learned to collaborate and improvise most effectively have prevailed.

Charles Darwin

What does "Collaboration" mean to you?

Michael Pollastri • collaboration, to me, means that folks from disparate disciplines or skills work together towards the same end-goal. … A collaboration means free and open data sharing, transparent goals and intentions, and a relationship that allows open (frank) and constructive discussion.

Markus Sitzmann • The internet is the perfect place to share (certain) data and many of the new technologies and format available at the Web (REST, SOAP etc.) are perfect to use data collaboratively.

Jun Y. • .. some people would feel comfortable to share their ideas after some literature search or primary research. If so, is it a good practice for collaboration?

Why collaboration is important

How do you store / share data

• Lab notebook - paper (Old)
• Word / Excel / PowerPoint / other
• Lab notebook - electronic
• SharePoint etc.
• Web-based database
• Wiki / Open lab notebook (Young)

Typical Lab: The Data Explosion Problem & Collaborations

DDT Feb 2009

CDD Background

• A private company, founded in 2004 (spun out of Eli Lilly)
• $1.89M B&MGF project
• >6 years of CDD SaaS in the cloud
• Thousands of scientists engaged

• SAB:

– Christopher Lipinski, PhD (ex )

– James McKerrow, MD PhD

– David Roos, PhD

– Adam Renslo, PhD

– Wes Van Voorhis, MD PhD

– Jim Wikel (ex Eli Lilly)

CDD Platform

• CDD Vault: secure web-based place for private data, private by default
• CDD Collaborate: selectively share subsets of data
• CDD Public: public data sets, over 3 million compounds, with molecular properties, similarity and substructure searching, data plotting, etc.
• Will host datasets from companies, foundations, etc.
• Vendor libraries (Asinex, TimTec, ChemBridge)
• Unique to CDD: simultaneously query your private data, collaborators' data, and public data through an easy GUI

CDD3.0: Single Click to Key Functionality

Collaborate Mode 1 with single login to multiple groups

[Diagram: Consolidated CDD DB; Collaborators 1, 2, 3 and 4]

Collaborate Mode 2 – p2p: Individual labs securely sharing data subsets

[Diagram: CDD DB groups 1, 2, 3 and 4]

CDD TB Project

• Funded by Bill & Melinda Gates Foundation for 2 years Nov 2008- Oct 2010

• Provide CDD software to pilot groups and train them to use it to store data

• Provide custom cheminformatics support to pilot groups

• Accelerate the discovery of new therapies and advance clinical candidates through the pipeline

• Capture TB Literature and make accessible

• CDD TB is freely available to any groups doing TB research (minimal support from CDD)

• Integrate academic, non-profit, and corporate laboratories distributed across the globe

Building a disease community for TB

• Tuberculosis kills 1.6-1.7m/yr (~1 every 20 seconds)
• 1/3rd of the world's population is infected

• Multi-drug resistance in 4.3% of cases
• Extensively drug-resistant TB increasing in incidence
• No new drugs in over 40 yrs
• Drug-drug interactions and co-morbidity with HIV
• Collaboration between groups is rare
• These groups may work on existing or new targets
• Use of computational methods with TB is rare
• Literature TB data (SAR) is not well collated

The Long Tail and CDD

Collated, for each user, the number of data uploads (de-identified).

[Figure: Rank-frequency plot of contributions to CDD. Solid line: power law with alpha = 2.2; dashed line: alpha = 2.7. (20% contribute 80%)]

The plot suggests a power law with a considerable downward tail, which is a signature of "saturation" of the audience, i.e. in a fixed universe of users, a majority of possible people are becoming active contributors.

Robin Spencer in Ekins et al., book chapter, 2010

Molecules with activity against

15 public datasets for TB

>300,000 cpds

Patents, Papers Annotated by CDD

Open to browse by anyone: http://www.collaborativedrug.com/register

Searching for molecular mimics

• Biomolecules essential for survival and growth are termed essential molecules.

• In vivo mutagenesis of the M. tuberculosis genome generated 5,126 mutants, which were archived and genotyped.

• Statistical analysis was used to determine putative essential genes.

• Essential molecules were classified as the products of enzymes that are encoded by genes in this list.

• Similarity searches with essential molecules were run in CDD using ChemAxon tools.

• E.g. bleomycin has 57% molecular similarity to UDP-N-acetylmuramoyl-L-alanyl-D-glutamyl-meso-2,6-diaminopimelyl-D-alanyl-D-alanine.
• MIC of 0.1 µg/ml.
• Guinea pigs treated with bleomycin had smaller lesions, fewer acid-fast bacilli and a prolonged survival time.
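The similarity search step can be illustrated with a Tanimoto coefficient computed over fingerprint bits. This is a minimal sketch under stated assumptions: the slide does not name the similarity metric, so Tanimoto is assumed here, and the bit sets are made up for illustration rather than produced by ChemAxon tools.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints (indices of bits set to 1), for illustration only.
essential_molecule = {1, 4, 7, 9, 12, 15, 20}
candidate_drug = {1, 4, 7, 9, 13, 21, 22}

similarity = tanimoto(essential_molecule, candidate_drug)
print(f"Similarity: {similarity:.0%}")
```

A real search would rank every compound in the database by this score against each essential molecule and keep those above a chosen threshold.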

Freundlich et al., Pub # 258, Tues Aug 24, 5.30-7.30pm, Hall C

Biomolecules essential for metabolism and survival of Mtb and their structural analogs

Freundlich et al., Pub # 258, Tues Aug 24, 5.30-7.30pm, Hall C

Simple descriptor analysis

Dataset | MWT | logP | HBD | HBA | RO5 | Atom count | PSA | RBN
MLSMR, active, ≥ 90% inhibition at 10 µM (N = 4096) | 357.10 (84.70) | 3.58 (1.39) | 1.16 (0.93) | 4.89 (1.94) | 0.20 (0.48) | 42.99 (12.70) | 83.46 (34.31) | 4.85 (2.43)
MLSMR, inactive, < 90% inhibition at 10 µM (N = 216367) | 350.15 (77.98)** | 2.82 (1.44)** | 1.14 (0.88) | 4.86 (1.77) | 0.09 (0.31)** | 43.38 (10.73) | 85.06 (32.08)* | 4.91 (2.35)
TAACF-NIAID CB2, active, ≥ 90% inhibition at 10 µM (N = 1702) | 349.58 (63.82) | 4.04 (1.02) | 0.98 (0.84) | 4.18 (1.66) | 0.19 (0.40) | 41.88 (9.44) | 70.28 (29.55) | 4.76 (1.99)
TAACF-NIAID CB2, inactive, < 90% inhibition at 10 µM (N = 100,931) | 352.59 (70.87) | 3.38 (1.36)** | 1.11 (0.82)** | 4.24 (1.58) | 0.12 (0.34)** | 42.43 (8.94)* | 77.75 (30.17)** | 4.72 (1.99)

Bayesian Classification Screen

Laplacian-corrected Bayesian classifier models were generated using FCFP-6 and simple descriptors. 2 models: 220,000 and >2,000 compounds; active compounds with MIC < 5 µM.

[Figure: panels labeled Good and Bad]
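The idea behind a Laplacian-corrected Bayesian classifier can be sketched in a few lines. This is an illustrative sketch, not the implementation used for the models above: it assumes the commonly described per-feature weight log((A + 1) / (N × P + 1)), where N is the number of training samples containing the feature, A is how many of those are active, and P is the overall fraction of actives. The single-letter feature labels are hypothetical stand-ins for FCFP-6 fingerprint bits.

```python
import math
from collections import Counter


def train(samples):
    """samples: list of (feature_set, is_active). Returns a weight per feature."""
    n_total = len(samples)
    n_active = sum(1 for _, active in samples if active)
    base_rate = n_active / n_total  # P(active) over the whole training set

    seen = Counter()         # how many samples contain each feature
    seen_active = Counter()  # how many of those samples are active
    for features, active in samples:
        for f in features:
            seen[f] += 1
            if active:
                seen_active[f] += 1

    # Laplacian correction: the +1 terms pull the estimate toward the base
    # rate for features seen only a few times; the log makes weights additive.
    return {
        f: math.log((seen_active[f] + 1) / (seen[f] * base_rate + 1))
        for f in seen
    }


def score(weights, features):
    """Sum the weights of known features; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


# Hypothetical training data, for illustration only.
training = [
    ({"a", "b"}, True),
    ({"a"}, True),
    ({"b", "c"}, False),
    ({"c"}, False),
]
weights = train(training)
print(score(weights, {"a"}) > score(weights, {"c"}))
```

Features that occur mostly in actives get positive weights ("good"), features that occur mostly in inactives get negative weights ("bad"), and a compound's score is the sum over its features.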

Ekins et al., Mol BioSyst, 6: 840-851, 2010

Bayesian Classification Dose response

[Figure: panels labeled Good and Bad]

Ekins et al., Mol BioSyst, 6: 840-851, 2010

Bayesian Classification

Leave out 50% x 100

Dataset (number of molecules) | External ROC Score | Internal ROC Score | Concordance | Specificity | Sensitivity
MLSMR All single point screen (N = 220463) | 0.86 ± 0 | 0.86 ± 0 | 78.56 ± 1.86 | 78.59 ± 1.94 | 77.13 ± 2.26
MLSMR dose response set (N = 2273) | 0.73 ± 0.01 | 0.75 ± 0.01 | 66.85 ± 4.06 | 67.21 ± 7.05 | 65.47 ± 7.96
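For readers unfamiliar with the column names, the sketch below shows how concordance, specificity and sensitivity follow from a confusion matrix. It assumes the standard definitions (concordance as overall percent agreement); the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Return percent concordance, sensitivity and specificity."""
    total = tp + fp + tn + fn
    return {
        # Fraction of all compounds classified correctly.
        "concordance": 100.0 * (tp + tn) / total,
        # Fraction of true actives that the model calls active.
        "sensitivity": 100.0 * tp / (tp + fn),
        # Fraction of true inactives that the model calls inactive.
        "specificity": 100.0 * tn / (tn + fp),
    }


# Hypothetical counts for one cross-validation split.
metrics = classification_metrics(tp=77, fp=21, tn=79, fn=23)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a leave-out 50% x 100 scheme these values would be computed on each of the 100 held-out halves and then summarized as mean ± standard deviation.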

Ekins et al., Mol BioSyst, 6: 840-851, 2010

Filtering a further 100K compound library

Number of compounds screened | Random hit rate (%) | Single point screening (200k) Bayesian model (%) | Dose response Bayesian model (%)
0 | 0 | 0 | 0
100 | 1.66 (0.10) | 23 (1.35) | 24 (1.41)
200 | 3.32 (0.19) | 48 (2.82) | 42 (2.47)
300 | 4.98 (0.29) | 64 (3.76) | 54 (3.17)
400 | 6.63 (0.39) | 77 (4.52) | 58 (3.41)
500 | 8.29 (0.49) | 92 (5.41) | 70 (4.11)
600 | 9.95 (0.58) | 107 (6.29) | 82 (4.82)

>10 fold enrichment with TB Bayesian model
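The enrichment claim can be checked directly from the table above. The sketch below assumes the first figure in each cell is a hit count (expected hits for random selection, observed hits for each model) and defines fold enrichment as model hits divided by random hits; that reading of the table is an assumption, not something the slide states.

```python
# (compounds screened, random hits, single-point model hits, dose-response model hits)
rows = [
    (100, 1.66, 23, 24),
    (200, 3.32, 48, 42),
    (300, 4.98, 64, 54),
    (400, 6.63, 77, 58),
    (500, 8.29, 92, 70),
    (600, 9.95, 107, 82),
]


def fold_enrichment(model_hits: float, random_hits: float) -> float:
    """How many times more hits the model finds than random selection."""
    return model_hits / random_hits


for screened, random_hits, single_point, dose_response in rows:
    sp = fold_enrichment(single_point, random_hits)
    dr = fold_enrichment(dose_response, random_hits)
    print(f"{screened:>4} screened: single point {sp:.1f}x, dose response {dr:.1f}x")
```

Under this reading, the single point model stays above 10-fold at every depth shown, while the dose response model starts above 10-fold and falls below it as more compounds are screened.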

Ekins et al., Mol BioSyst, In Press

GSK data – hits

Gamo et al., Nature, 2010, 465, 305-310

Press

Register for GlaxoSmithKline on CDD Public

• What are the challenges and opportunities for the field?
• How are your priorities changing?
• What are the emerging bottlenecks?
• What if? Any technology, any collaborators, if you had a magic wand…

http://www.collaborativedrug.com/register

GSK vs St Jude vs Novartis antimalarial datasets.

Dataset | MW | logP | HBD | HBA | Lipinski rule of 5 alerts | PSA (Å2) | RBN
GSK data (N = 13,471) | 478.2 ± 114.3 | 4.5 ± 1.6 | 1.8 ± 1.0 | 5.6 ± 2.0 | 0.8 ± 0.8 | 76.8 ± 30.0 | 7.2 ± 3.4
St Jude (N = 1524) | 385.3 ± 71.2 | 3.8 ± 1.6 | 1.1 ± 0.8 | 4.9 ± 1.8 | 0.2 ± 0.4 | 72.2 ± 29.3 | 5.2 ± 2.3
Novartis (N = 5695) | 398.2 ± 105.3 | 3.7 ± 2.0 | 1.2 ± 1.1 | 4.7 ± 2.1 | 0.4 ± 0.7 | 74.7 ± 37.9 | 5.6 ± 3.0
Johns Hopkins All FDA drugs (N = 2615) | 349.1 ± 355.8 | 1.2 ± 3.4 | 2.4 ± 4.6 | 5.1 ± 5.5 | 0.3 ± 0.8 | 96.0 ± 139.8 | 5.4 ± 9.6
Johns Hopkins Subset > 50% malaria inhibition at 96h (N = 165) | 458.0 ± 298.6 | 2.2 ± 2.7 | 2.1 ± 3.4 | 5.4 ± 4.7 | 0.6 ± 0.9 | 90.6 ± 104.4 | 7.1 ± 7.7
Antimalarial drugs (N = 14) | 341.6 ± 67.0 | 3.8 ± 1.6 | 1.8 ± 1.0 | 5.3 ± 1.5 | 0.2 ± 0.6 | 53.4 ± 21.2 | 5.8 ± 3.0

The screening hits in total are not 'lead-like' (MW < 350, logP < 3); they are closest to 'natural product lead-like'. Although GSK suggests that the compounds are "drug-like", the evidence for this is weak.
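The "Lipinski rule of 5 alerts" column and the lead-like criterion can be made concrete with a short sketch. It uses the standard Lipinski thresholds (MW > 500, logP > 5, HBD > 5, HBA > 10) and the lead-like cut-offs quoted above (MW < 350, logP < 3). The `Properties` class and the two example compounds are hypothetical; real values would come from a descriptor calculator.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    mw: float    # molecular weight
    logp: float  # calculated logP
    hbd: int     # hydrogen bond donors
    hba: int     # hydrogen bond acceptors


def rule_of_5_alerts(p: Properties) -> int:
    """Count how many of the four Lipinski thresholds are exceeded."""
    return sum([p.mw > 500, p.logp > 5, p.hbd > 5, p.hba > 10])


def is_lead_like(p: Properties) -> bool:
    """Lead-like as defined on the slide: MW < 350 and logP < 3."""
    return p.mw < 350 and p.logp < 3


# Hypothetical compounds, for illustration only.
large_lipophilic = Properties(mw=478.2, logp=5.4, hbd=2, hba=6)
small_polar = Properties(mw=320.0, logp=2.5, hbd=1, hba=4)

for name, props in [("large_lipophilic", large_lipophilic), ("small_polar", small_polar)]:
    print(name, rule_of_5_alerts(props), is_lead_like(props))
```

Averaging the alert count over every compound in a dataset gives a figure comparable to the alerts column in the table.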

Ekins and Williams, Drug Disc Today, In Press; Ekins and Williams, submitted (2010)

Mtb Compound libraries and filter failures

Filtering using SMARTS filters to remove thiol reactives, false positives, etc., at the University of New Mexico (http://pasilla.health.unm.edu/tomcat/biocomp/smartsfilter)

Ekins et al., Mol BioSystems, In press

Antimalarial Compound libraries and filter failures

[Figure: % failure]

Ekins and Williams, submitted (2010)

The future: Alerts in CDD

Summary

Active compounds vs Mtb and P. falciparum have higher mean molecular weights and logP values

A high proportion of compounds fail the Abbott filters for reactivity when compared to drugs and antimalarials

Understanding the chemical properties and characteristics of compounds = better compounds for lead optimization.

St Jude and Novartis datasets should be screened vs Mtb as their property space is close to TB actives

GSK compounds may not be an ideal starting point for lead optimization for malaria

Target-based screening

[Workflow diagram. Boxes: TB screening molecule database (efficacy vs. target, whole-cell, infected organism); hit to lead (docking/virtual screening, pharmacophore, QSAR, ADME filters, systems biology and/or structure-based methods, pathways analysis, databases); follow-up virtual filtering/screening]

For chemical probe selection, find new compounds that inhibit a target using tightly integrated computational methods, then optimize and feed data back to databases and pathways.

When a target is identified, one could pursue the target-based screening workflow.

Phenotypic screening

[Workflow diagram. Boxes: commercial/combinatorial/corporate library HTS following reactivity rules and property filtering; TB screening molecule database (efficacy vs. target, whole-cell, infected organism); hit to lead (pharmacophore, docking model, QSAR, ADME filters, target fishing); follow-up virtual screening]

Use phenotypic data with integrated computational methods to suggest potential target/s and optimize ADME properties in parallel, then verify in vitro.

Ekins et al., submitted 2010

The future: crowdsourced drug discovery

• Growth in open and public resources (software + data + models)
• Precompetitive data and tools, e.g. Pistoia Alliance, Innovative Medicines Initiative
• Academic-industry collaborations, NM4TB, etc.
• Open innovation, need to capture non-published data
• Competitive collaboration
• Risks:
– Quality of data curation, new visualization and modeling tools
– Gap between growth in databases and new tools
– Compound accessibility / availability will be a bottleneck for testing in silico hypotheses
– Engage pharmas (Gupta et al., Pub # 405, Wed 7-9pm, Ballroom)

Williams et al., Drug Discovery World, Winter 2009.
Ekins S and Williams AJ, Pharm Res, 27: 393-395, 2010.
Ekins S and Williams AJ, Lab On A Chip, 10: 13-22, 2010.
Bingham A and Ekins S, Drug Disc Today, 14: 1079-1081, 2009.

Possessed by a single person, [the process] would remain stationary for a long time, and perhaps would die away; but being made public, it will thrive and improve through the efforts of all.

Joseph Louis Gay-Lussac 1839

Acknowledgments

Antony Williams (RSC)
Joel Freundlich (Texas A&M)
Gyanu Lamichhane, Bill Bishai (Johns Hopkins)
Jeremy Yang (UNM)
Nicko Goncharoff (SureChem)
Chris Lipinski
Takushi Kaneko (TB Alliance)
Bob Reynolds (SRI)
Carolyn Talcott and Malabika Sarker (SRI International)
Robin Spencer (http://scaledinnovation.com/)
GSK
Accelrys

Bill and Melinda Gates Foundation

PAPERS

Gupta RR, Gifford EM, Liston T, Waller CL, Hohman M, Bunin BA and Ekins S, Using open source computational tools for predicting human metabolic stability and additional ADME/Tox properties, Drug Metab Dispos, In Press 2010.

Ekins S and Williams AJ, When Pharmaceutical Companies Publish Large Datasets: An Abundance of riches or fool’s gold, Drug Disc Today, In Press 2010.

Ekins S, Gupta R, Gifford E, Bunin BA, Waller CL, Chemical Space: missing pieces in cheminformatics, Pharm Res, In Press 2010.

Ekins S. and Williams AJ, Reaching out to collaborators: crowdsourcing for pharmaceutical research, Pharm Res, 27: 393-395, 2010.

Ekins S and Williams AJ, Precompetitive Preclinical ADME/Tox Data: Set It Free on The Web to Facilitate Computational Model Building to Assist Drug Development. Lab On A Chip, 10: 13-22, 2010.

Ekins S, Bradford J, Dole K, Spektor A, Gregory K, Blondeau D, Hohman M and Bunin BA, A Collaborative Database and Computational Models for Tuberculosis Drug Discovery, Mol BioSyst, 6: 840-851, 2010.

Williams AJ, Tkachenko V, Lipinski C, Tropsha A and Ekins S, Free online resources enabling crowdsourced drug discovery, Drug Discovery World, Winter 2009.

Louise-May S, Bunin B and Ekins S, Towards integrated web-based tools in drug discovery, Touch Briefings - Drug Discovery, 6: 17-21, 2009.

Hohman M, Gregory K, Chibale K, Smith PJ, Ekins S and Bunin B, Novel web-based tools combining chemistry informatics, biology and social networks for drug discovery, Drug Disc Today, 14: 261-270, 2009.

CDD is Secure & Simple

• Web-based database (log in securely to your account from any computer using any common browser: Firefox, IE, Safari)
• Hosted on remote server (lower cost): dual-Xeon, 4GB RAM server with a RAID-5 SCSI hard drive array with one online spare
• Highly secure: all traffic encrypted, server in a secure, professionally hosted environment
• Automatically backed up nightly
• MySQL database
• Uses JChemBase software with Rails via a Ruby-Java bridge (structure searching and inserting/modifying structures)
• Marvin applet for structure editing
• Export all data to Excel with SMILES, SDF, SAR, and PNG images
