

Cheminformatics

Free online resources enabling crowd-sourced drug discovery

The availability of freely accessible online resources to enable and support drug discovery has blossomed in recent years. The PubChem platform is now accompanied by a myriad of other online databases, including ChEBI, DrugBank, the Human Metabolome Database and ChemSpider. Access to this array of software tools and diverse data in the public domain provides capabilities that were previously available only within organisations (eg, big Pharma) that could afford significant investment in them. This paper provides an overview of the internet resources available to drug discovery scientists and discusses the advantages of such accessibility, but also the potential risks that reside within the data. It also examines what the present resources continue to lack and sets out a vision for future approaches to providing internet-based resources for drug discovery.

By Dr Antony J. Williams, Valery Tkachenko, Dr Chris Lipinski, Professor Alexander Tropsha and Dr Sean Ekins

Drug Discovery World, Winter 2009/10

The past five years have seen a mini revolution in the availability of resources to support drug discovery and, in particular, of databases searchable by molecular structure (Figure 1). Information on the internet continues to become more widely accessible, and at an increasing rate. There are many freely available chemical compound databases on the web1,2. These databases generally contain chemical identifiers in the form of chemical names (systematic and trade) and registry numbers. Because the files in these databases are assembled heterogeneously, with different deposition processes and different procedures for handling chemical structures, the resulting data are plagued with inconsistencies and quality issues.

There are many databases from which the drug discovery community can derive value. Each generally has a specific focus based on the domain expertise of the hosting organisation; examples include databases of curated literature data, chemical vendor catalogues, patents, analytical data and biological data. There are too many to include in this single article, so only a small number will be discussed. For a broader view, the authors recommend a recent article3 that assesses the expanding public and commercial databases containing bioactive compounds and concludes that the commercial efforts are ahead of the public ones.

Figure 1 A graphical interpretation of the history of chem/bioinformatics software, model and database development, and of increasing drug development costs, versus registered compounds in the CAS Registry and the ChemSpider database (registered compounds, in millions, from 0 to 60, plotted against year, from 1960 to 2010).

The availability of molecule databases such as PubChem (http://pubchem.ncbi.nlm.nih.gov/) has dramatically changed the landscape of publicly available cheminformatics resources, yet PubChem covers only a fraction of the chemical universe, mostly the part of interest to chemical genomics and pharmaceutical research. PubChem was
launched by the NIH in 2004 to support the 'New Pathways to Discovery' component of the Roadmap for Medical Research4. It organises information about the biological activities of chemical compounds into a comprehensive database and is the informatics backbone for the Molecular Libraries and Imaging Initiative, which is part of the NIH Roadmap. PubChem is also intended to empower the scientific community to use small-molecule chemical compounds in their research as molecular probes to investigate important biological processes or gene functions. The PubChem compound repository presently contains more than 25 million unique structures, with biological property information provided for many of them.

For now, PubChem remains focused on its initial intent of supporting the Molecular Libraries Initiative, and it serves as an extremely valuable and authoritative resource for cheminformatics and chemical genomics. However, the system has a number of constraints, especially in its role as a repository of data and information without a dedicated effort to curate those data. In the absence of curation, any errors in the data propagate to the many online databases that depend on PubChem and, ultimately, degrade the quality of the computational models built on those data.

References
1 Williams, AJ (2008). A perspective of publicly accessible/open-access chemistry databases. Drug Discov Today 13 (11-12), 495-501.
2 Williams, AJ (2008). Internet-based tools for communication and collaboration in chemistry. Drug Discov Today 13 (11-12), 502-506.
3 Southan, C et al (2009). Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds. J Cheminformatics 1, 10.
4 Office of Portfolio Analysis and Strategic Initiatives, National Institutes of Health (2008). The NIH Roadmap Initiative.
5 Wishart, DS et al (2007). HMDB: the Human Metabolome Database. Nucleic Acids Res 35 (Database issue), D521-526.
6 Wishart, DS et al (2009). HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res 37 (Database issue), D603-610.

The Chemical Entities of Biological Interest (ChEBI) database (http://www.ebi.ac.uk/chebi/) is a highly curated database of molecular entities focused on small chemical compounds. The entities are either natural products or synthetic products used to intervene in the processes of living organisms. ChEBI includes an ontological classification (Figure 2) in which the relationships between molecular entities, or classes of entities, and their 'parents' and/or 'children' are specified. While the database presently offers access to close to 19,000 entities, this is expected to expand to more than 440,000 by the end of October (http://www.ebi.ac.uk/chebi/newsForward.do#ChEMBL%20data%20integration). The database is available for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/chebi/).

The Human Metabolome Database5,6 (HMDB; http://www.hmdb.ca) is a comprehensive, curated collection of human metabolite and human metabolism data. It contains records for more than 6,800 endogenous metabolites. In addition to its comprehensive literature-derived data, the HMDB contains an extensive collection of experimental metabolite concentration data compiled from hundreds of mass spectrometry (MS) and nuclear magnetic resonance (NMR) metabolomic analyses performed on urine, blood
and cerebrospinal fluid samples. This is further supplemented with thousands of NMR and MS spectra collected on purified reference metabolites. Each metabolite entry in the HMDB contains data fields including a comprehensive compound description, names and synonyms, structural information, physicochemical data, reference NMR and MS spectra, biofluid concentrations, disease associations, pathway information, enzyme data, gene sequence data, and SNP and mutation data, as well as extensive links to images, references and other public databases. Recent improvements have included spectral and substructure searching.

Figure 2 The ChEBI database offers a detailed ontology, including subdivision into (1) Molecular Structure, in which molecular entities or parts thereof are classified according to composition and structure, and (2) Role, which classifies entities either on the basis of their role within a biological context (eg antibiotic, antiviral agent, coenzyme, hormone) or on the basis of their intended use by humans (eg pesticide, antirheumatic drug, fuel). The structure shown is for chloroquine, identified as an antimalarial quinoline alkaloid in the ChEBI ontology.

DrugBank (http://www.drugbank.ca/) is a manually curated resource7 assembled from a series of other public domain databases (KEGG, PubChem, ChEBI, PDB, Swiss-Prot and GenBank) and enhanced with additional data generated within the laboratories of the hosts. The database aggregates both bioinformatics and cheminformatics data and combines detailed drug data with comprehensive drug target (ie protein) information. It contains FDA-approved small-molecule and biotech drugs as well as experimental drugs, representing nearly 5,000 molecules8. The database supports extensive text, sequence, chemical structure and relational query searches across nearly 100 data fields. DrugBank data have been used to show that the network of drug to drug-target relationships is scale-free and that several classes of proteins are selectively enriched as drug targets for FDA-approved drugs9.

ZINC (http://zinc.docking.org/index.shtml) is a free, searchable database of commercially available compounds for virtual screening10,11. The library contains more than 20 million molecules, each with a 3D structure, gathered from the compound catalogues of vendors. All molecules in the database are assigned biologically relevant protonation states and annotated with molecular properties.

ChemSpider (http://www.chemspider.com/)1,2 is a community resource for chemists provided by the Royal Society of Chemistry (Figure 3). It offers a number of facilities that distinguish the service from many of the other databases listed in this article. At the time of writing it contains more than 23 million unique chemical entities aggregated from more than 200 diverse data sources, including government databases, chemical vendors, commercial database vendors, publishers, all of the databases listed above and individual chemists. ChemSpider has also integrated the SureChem patent database collection of structures (http://www.surechem.org/) to enable structure-based linking to patents between the two data collections.

ChemSpider can be queried using structure/substructure searching and alphanumeric text searching of both intrinsic and predicted molecular properties. Capabilities that are unique relative to other public chemistry databases include real-time curation of the data, association of analytical data with chemical structures, real-time deposition of single or batch chemical structures (including with activity data) and transaction-based predictions of physicochemical data. A series of web services is provided to allow integration with the system for searching and linking with other online databases from other groups (academia or industry).

Figure 3 ChemSpider provides links to Wikipedia articles, links out to the original data sources and commercial suppliers, and links out to patents and to articles on PubMed. Flexible search capabilities are available, together with visualisation tools such as a real-time 3D optimisation engine and display module.

The integration can be with free or commercial resources. For example, Collaborative Drug Discovery, Inc (http://www.collaborativedrug.com) recently provided links to ChemSpider for molecules in its CDD database12, thereby providing an integration path between a commercial resource and a public domain database. CDD is a highly secure, commercial collaborative drug discovery informatics platform and a new type of collaborative system that handles a broad array of
data types that can be archived and then selectively shared among colleagues, or openly shared in standardised formats, at each research group's discretion. A focus of CDD is facilitating the growth of global collaborative research networks for neglected diseases such as malaria, African sleeping sickness, Chagas disease and tuberculosis. There are currently 50 datasets available to the public upon registration, and these can readily be searched by substructure or similarity.

References (continued)
7 Wishart, DS et al (2006). DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34 (Database issue), D668-672.
8 Wishart, DS et al (2008). DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res 36 (Database issue), D901-906.
9 Ma'ayan, A et al (2007). Network analysis of FDA approved drugs and their targets. Mt Sinai J Med 74 (1), 27-32.
10 Irwin, JJ and Shoichet, BK (2005). ZINC – a free database of commercially available compounds for virtual screening. J Chem Inf Model 45 (1), 177-182.
11 Irwin, JJ et al (2005). Virtual screening against metalloenzymes for inhibitors and substrates. 44 (37), 12316-12328.
12 Hohman, M et al (2009). Novel web-based tools combining chemistry informatics, and social networks for drug discovery. Drug Discov Today 14, 261-270.
13 Tetko, IV et al (2008). Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection. J Chem Inf Model 48 (9), 1733-1746.

The importance of chemical data curation in QSAR modelling

Molecular modellers and cheminformaticians alike typically analyse data generated by other researchers, in general experimental data. Consequently, when it comes to the quality of these data, modellers are always at the mercy of the providers. Practically any cheminformatics modelling study entails the calculation of chemical descriptors that are expected to reflect the intricate details of the underlying chemical structures accurately. Any error in a structure therefore translates either into an inability to calculate descriptors for that erroneous record or into erroneous descriptors. As a result, models built with such data are either restricted to only a fraction of the formally available data or, worse, are simply inaccurate. As the data, the models built from them and the body of scholarly publications in cheminformatics all continue to grow, it becomes increasingly important to address the issue of data quality, which inherently affects the quality of models.

How significant is the problem of accurate structure representation as it concerns the adequacy and accuracy of cheminformatics models? A few recent reports indicate that this problem should be given serious attention. For instance, benchmarking studies by a large group of collaborators from six laboratories13,14 have clearly demonstrated that the type of chemical descriptors has a much greater influence on the predictive performance of QSAR models than the nature of the model optimisation techniques. Furthermore, in another recent seminal publication15, the authors clearly pointed out the importance of chemical data curation in the context of QSAR modelling (eg incorrect structures generated from either correct or incorrect SMILES). Their main conclusion was that small structural errors within a dataset could lead to significant losses in the predictive ability of QSAR models. They further demonstrated that manual curation of the structural data leads to a substantial increase in model predictivity15.

In their report highlighting the importance of gathering accurate information to build the WOMBAT and WOMBAT-PK databases, Oprea et al16 discussed the error rate in medicinal chemistry publications. They found an average of approximately two errors per publication in the almost 6,800 papers indexed in the WOMBAT database. With a median of 25 compounds per series in a publication, this implied an overall error rate of 8% (two errors in 25 compounds), with errors including17 incorrectly drawn or written structures, unspecified positions of attachment of substituents, structures with an incorrect backbone, incorrect generic or chemical names, and duplicates.

The basic steps needed to curate a dataset of compounds have been either considered trivial or ignored by experts in the field. For instance, several years ago a group of experts in QSAR modelling developed what are now known as the OECD QSAR modelling and validation principles18,19 that
researchers should follow to achieve regulatory acceptance of QSAR models. The need to curate the primary data from which the models are derived was not mentioned. The Journal of Chemical Information and Modeling published a special editorial highlighting the requirements that authors considering publishing QSAR papers in the journal should follow20, and recent publications addressing common mistakes and criticising faulty practices in the QSAR modelling field21-23 have appeared, yet none of these sources has explicitly described and discussed the importance of chemical record curation for developing robust QSAR models.

There is an obvious trend within the community of QSAR modellers to develop and follow standardised guidelines for developing statistically robust and externally predictive QSAR models24. The importance of developing best practices for data preparation prior to initiating the modelling process is obvious. There is therefore a pressing need to amend the five OECD principles by adding a sixth rule that would require careful data preparation prior to model development. Standard chemical record curation protocols need to be developed and systematically employed in the pre-processing of any chemical dataset, and these could be automated using existing software packages (many of which are free for academic investigators). The essential procedures include the removal of inorganic compounds, counterions and mixtures (because, for the most part, current chemical descriptors do not account for such molecular records), ring aromatisation, normalisation of specific chemotypes, curation of tautomeric forms and the deletion of duplicates.

Data analytical studies are impossible without trusting the original data sources. It is important, whenever possible, to verify the accuracy of the primary data before developing any model. We believe that this approach could be summarised by the famous proverb 'Trust, but verify', which was frequently used by the late President Ronald Reagan during the Cold War era and which has been attributed to Felix Dzerzhinsky, founder of the Cheka (the forerunner of the KGB), almost 100 years ago (http://en.wikipedia.org/wiki/Trust_but_Verify). Our hope is that other experts will also contribute their expertise and best practices to this effort.

References (continued)
14 Zhu, H et al (2008). Combinatorial QSAR modeling of chemical toxicants tested against Tetrahymena pyriformis. J Chem Inf Model 48 (4), 766-784.
15 Young, D et al (2008). Are the chemical structures in your QSAR correct? QSAR Comb Sci 27, 1337-1345.
16 Oprea, TI et al (2007). WOMBAT and WOMBAT-PK: Bioactivity Databases for Lead and Drug Discovery. In: Schreiber, SL, Kapoor, TM and Wess, G (Eds), Chemical Biology: From Small Molecules to Systems Biology and Drug Design. Wiley-VCH, New York, pp. 760-786.
17 Oprea, TI et al (2003). On the propagation of errors in the QSAR literature. In: Ford, M, Livingstone, D, Dearden, J and Van de Waterbeemd, H (Eds), EuroQSAR 2002 – Designing drugs and crop protectants: Processes, problems and solutions. Blackwell Publishing, New York, pp. 314-315.
18 Dearden, JC et al (2009). How not to develop a quantitative structure-activity or structure-property relationship (QSAR/QSPR). SAR QSAR Environ Res 20 (3-4), 241-266.
19 QSAR Expert Group (2004). The report from the expert group on (Quantitative) Structure-Activity Relationships [(Q)SARs] on the principles for the validation of (Q)SARs. OECD Series on Testing and Assessment No. 49, ENV/JM/MONO(2004)24. Organization for Economic Cooperation and Development, Paris, France. 206 pp.
20 Jorgensen, WL (2006). QSAR/QSPR and proprietary data. J Chem Inf Model 46, 937.
21 Maggiora, GM (2006). On outliers and activity cliffs – why QSAR often disappoints. J Chem Inf Model 46 (4), 1535.

Improving the quality of putative hits and leads

Hits or leads in rare, orphan and neglected diseases (or, for that matter, for many pharmaceutically relevant targets) can arise from phenotypic or mechanistic screening against commercially available screening libraries. These screening efforts often take place in an academic setting. Because of the disconnect between academic biology and expert medicinal chemistry, it is essential to carry out a medicinal chemistry annotation of putative hits or leads before significant drug discovery effort is expended. The early stages of the annotation process can be done using known filters and guidelines for acceptable chemical functionality. A more detailed analysis, asking questions about the chemistry of the hit or lead and what is known biologically and chemically about its substructures and similar compounds, currently requires a medicinal chemistry expert and takes on average about 20 minutes per compound. The in-depth data available through CAS SciFinder were used in the annotation of 64 putative tools and probes from the NIH Roadmap MLSCN effort25. Progress towards public sector tools for chemistry annotation might make this process more affordable and accessible in the future.

Many companies have instituted filters (usually SMARTS queries) to remove undesirable molecules, false positives and frequent hitters from their HTS screening libraries or to filter vendor compounds. Early examples include REOS from Vertex26, the basic, hard and soft filters from GSK27 and the functional group compound filters from BMS28. These are in addition to the many proprietary filters in use at companies. A particular issue is chemical reactivity towards protein thiol groups. A group from Abbott reported a sensitive assay to detect reactive molecules by NMR (ALARM NMR)29,30. A follow-up study used 8,800 compounds with data from this assay to create a Bayesian classifier model with extended connectivity fingerprints (ECFP_6) that predicted reactivity with good classification accuracy31. This study also identified 175 substructures likely to be of interest as potential causes of reactivity. Currently there is no freely accessible automated method for filtering compounds or alerting users to reactivity issues. Taking this further, how could we encode the knowledge of many medicinal chemists with drug discovery expertise into a piece of software or a database that would identify chemical 'trash' or undesirable molecules for biologists? There is certainly some scope here to influence the quality of the hits and leads that are published, and to annotate such molecules in public databases.

Discussion

Freely available databases and tools supporting drug discovery, and chemistry in particular, are
becoming increasingly available. In parallel, we are seeing more discussion about the need for more pre-competitive32-35, competitive36 and collaborative approaches12,32 in drug discovery and in the pharmaceutical industry in general, covering informatics, ADME/Tox and clinical areas. This raises the question: 'What could we achieve just by making more software and data resources available on the web?' There is currently little in the way of freely available resources for computational ADME/Tox (apart from efforts like the ToxCast project37,38 at the EPA, in which several hundred compounds have been screened in more than 600 biological assays and the results made public, representing a resource for future models), so when will this change? Perhaps when companies that currently hold on to their data closely place more of it in the public domain. If more computational tools and biological data were freely available, this would facilitate crowd-sourced drug discovery and basically level the playing field between small (or one-person) virtual companies and other pharma and biotech companies, without requiring expensive tools and databases (eg CAS SciFinder). Anyone with access to a computer anywhere in the world could then contribute to drug discovery, whether or not they belong to a company or research institute. Young gamers are already contributing to the optimised folding of proteins, as evidenced by the success in the Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction, or CASP (http://www.wired.com/medtech/genetics/magazine/17-05/ff_protein). Such efforts represent truly distributed discovery and could contribute to fully integrated pharmaceutical networks. When this occurs, there will be a greater need to work with highly dispersed individual researchers, store their data and possibly take molecules to the next step, eg enabling preclinical testing, animal studies etc. This will in turn require companies such as AssayDepot (http://www.assaydepot.com/) and CDD to help generate and store the data needed to progress molecules to clinical studies, and to find larger companies or organisations to take these molecules further.

We are seeing a shift from requiring powerful computers within insular organisations to do drug discovery towards using resources on the web, and this opens up the possibility of using cheap portable and mobile devices to search databases and generate predictions from computational models. Of course, the quality of the output will be highly dependent on the initial data quality.

Surprisingly, investigations into how primary data quality influences the quality of published cheminformatics models have been almost absent from the published literature. It appears that cheminformaticians and molecular modellers tend to take published chemical and biological data at face value and launch calculations without carefully examining the accuracy of the data records. However, there should be much less disagreement concerning the exact chemical structures of compounds in the databases, except for arguably difficult issues such as tautomers. Thus, the accuracy of chemical structure representation could be addressed directly in most cases.

Both common sense and the recent QSAR investigations described above indicate that chemical record curation should be viewed as a separate and perhaps critical component of cheminformatics research. By comparison, the community of protein x-ray crystallographers has long recognised the importance of structural data curation; indeed, the Protein Data Bank (PDB) team includes a large group of structure curators whose only job is to process and validate primary data submitted to the PDB by experimental crystallographers39. Furthermore, the NIH recently awarded a significant Center grant to a group of scientists from the University of Michigan (http://www.genomeweb.com/informatics/nigms-allots-5m-new-database-house-protein-ligand-data-pharma-contribute) to curate primary data on protein-ligand complexes deposited to the PDB. Conversely, the largest publicly funded cheminformatics project, PubChem, is considered a data repository, and no special effort is dedicated to curating the structural information deposited in it by its various contributors. Chemical data curation has been addressed whenever possible by the privately funded, but publicly available, ChemSpider project, as well as by several other projects reviewed above. It is critical that scientists who exploit and build models of datasets derived from current databases or extracted from publications dedicate their own effort to the task of data curation.

References (continued)
22 Zvinavashe, E et al (2008). Promises and pitfalls of quantitative structure-activity relationship approaches for predicting metabolism and toxicity. Chem Res Toxicol 21 (12), 2229-2236.
23 Johnson, SR (2008). The trouble with QSAR (or how I learned to stop worrying and embrace fallacy). J Chem Inf Model 48 (1), 25-26.
24 Tropsha, A and Golbraikh, A (2007). Predictive QSAR modeling workflow, model applicability domains, and virtual screening. Curr Pharm Des 13 (34), 3494-3504.
25 Oprea, TI et al (2009). A crowdsourcing evaluation of the NIH chemical probes. Nat Chem Biol 5 (7), 441-447.
26 Walters, WP and Murcko, MA (2002). Prediction of 'drug-likeness'. Adv Drug Deliv Rev 54, 255-271.
27 Hann, M et al (1999). Strategic pooling of compounds for high-throughput screening. J Chem Inf Comput Sci 39 (5), 897-902.
28 Pearce, BC et al (2006). An empirical process for the design of high-throughput screening deck filters. J Chem Inf Model 46 (3), 1060-1068.
29 Huth, JR et al (2005). ALARM NMR: a rapid and robust experimental method to detect reactive false positives in biochemical screens. J Am Chem Soc 127 (1), 217-224.
30 Huth, JR et al (2007). Toxicological evaluation of thiol-reactive compounds identified using a la assay to detect reactive molecules by nuclear magnetic resonance. Chem Res Toxicol 20 (12), 1752-1759.
31 Metz, JT et al (2007). Enhancement of chemical rules for predicting compound reactivity towards protein thiol groups. J Comput Aided Mol Des 21 (1-3), 139-144.
32 Louise-May, S et al (2009). Towards integrated web-based tools in drug discovery. Touch Briefings – Drug Discovery, in press.

Of course, the hope of using cheminformatics and databases in drug discovery is to increase the efficiency and quality of the molecules that progress to later stages. Just identifying reactive molecules and false positives could be of great utility to the many groups that are not aware of this problem, helping them to avoid dead ends. If we really are to empower the user and do crowd-sourced drug discovery, we will create issues with IP and the ownership of the collaborative discovery. This consideration could be one of the reasons why this approach has not been followed before. Additionally, if we are to identify gaps in the free tools available for the crowd-sourced
drug discovery vision, one such gap is perhaps having the molecules in a database but not having free physical access to them for testing. So the next big step will be working out how to make the physical molecules more widely available to all, whether by making them on demand or through a centralised storage facility funded by the NIH or others, a topic that is outside the scope of this article but worth considering.

Conflicts of interest statement
Sean Ekins consults for Collaborative Drug Discovery Inc and is on the advisory board for AssayDepot. Antony J. Williams and Valery Tkachenko are employed by the Royal Society of Chemistry, which owns ChemSpider and associated technologies. Alexander Tropsha and Chris Lipinski have no conflicts of interest.

References (continued)
33 Ekins, S and Williams, AJ (2009). Precompetitive Preclinical ADME/Tox Data: Set It Free on The Web to Facilitate Computational Model Building to Assist Drug Development. Lab on a Chip, in press.
34 Hunter, AJ (2008). The Innovative Medicines Initiative: a pre-competitive initiative to enhance the biomedical science base of Europe to expedite the development of new medicines for patients. Drug Discov Today 13 (9-10), 371-373.
35 Barnes, MR et al (2009). Lowering industry firewalls: pre-competitive informatics initiatives in drug discovery. Nat Rev Drug Discov 8 (9), 701-708.
36 Bingham, A and Ekins, S (2009). Competitive Collaboration in the Pharmaceutical and Biotechnology Industry. Drug Discov Today, submitted.
37 Judson, R et al (2009). The toxicity data landscape for environmental chemicals. Environ Health Perspect 117 (5), 685-695.
38 Dix, DJ et al (2007). The ToxCast program for prioritizing toxicity testing of environmental chemicals. Toxicol Sci 95 (1), 5-12.
39 Dutta, S et al (2008). Data deposition and annotation at the worldwide protein data bank. Methods Mol Biol 426, 81-101.

Dr Antony Williams is Vice-President, Strategic Development, for ChemSpider at the Royal Society of Chemistry. He has authored more than 100 peer-reviewed papers and book chapters on NMR, predictive ADME methods, internet-based tools, crowd-sourcing and database curation. He is an active blogger and participant in the internet chemistry network.

Valery Tkachenko is Chief Technical Officer for ChemSpider at the Royal Society of Chemistry. He was intimately involved with the development of the PubChem platform during his time with the NIH and has been involved with the development of enterprise-level web-based software applications for the life sciences for well over a decade.

Dr Christopher Lipinski is a Scientific Advisor to Melior Discovery. An ACS, AAPS and SBS member, he is the author of the 'rule of five', a member of the ACS 'Medicinal Chemistry Hall of Fame' and winner of multiple awards. An adjunct professor at UMass Amherst, he has 235 publications and invited presentations and 17 issued US patents.

Professor Alexander Tropsha is K.H. Lee Distinguished Professor and Chair of the Division of Medicinal Chemistry and Natural Products in the Eshelman School of Pharmacy, UNC-Chapel Hill. His research interests are in the areas of Computer-Assisted Drug Design, Computational Toxicology, Cheminformatics and Structural Bioinformatics.

Dr Sean Ekins is a Computational Chemist and has authored more than 130 peer-reviewed papers and book chapters, as well as edited three books on computational applications in pharmaceutical R&D and computational . His areas of interest are in vitro and computational ADME/Tox, systems biology, cheminformatics and computer-aided drug discovery.