Cheminformatics

Drug Discovery World Winter 2009/10

Free online resources enabling crowd-sourced drug discovery

The availability of freely accessible online resources to enable and support drug discovery has blossomed in recent years. The PubChem platform is now accompanied by a myriad of other online databases including ChEBI, DrugBank, the Human Metabolome Database and ChemSpider. Access to the array of software tools and diverse data in the public domain provides capabilities previously only available within the confines of organisations (eg, big Pharma) that could afford significant investments in cheminformatics. This paper provides an overview of the internet resources available to drug discovery scientists and discusses the advantages of such accessibility but also the potential risks that reside within the data. It also examines what the present resources continue to lack and sets a vision for future approaches to providing internet-based resources for drug discovery.

By Dr Antony J. Williams, Valery Tkachenko, Dr Chris Lipinski, Professor Alexander Tropsha and Dr Sean Ekins

The past five years have seen a mini revolution in the availability of resources to support drug discovery and, in particular, databases searchable by molecular structure (Figure 1). Chemistry information on the internet continues to become more widely accessible and at an increasing rate. There are many freely available chemical compound databases on the web¹,². These databases generally contain the chemical identifiers in the form of chemical names (systematic and trade) and registry numbers. Since the files in the databases are assembled in a heterogeneous manner, using variations in deposition processes and procedures to handle chemical structures, the resulting data are plagued with inconsistencies and quality issues. There are many databases available from which the drug discovery community can derive value. These databases generally have a specific focus based on the domain expertise of the hosting organisation; examples include databases of curated literature data, chemical vendor catalogues, patents, analytical data, biological data, etc. There are too many to include in this single article so only a small number will be discussed. For a wider view, the authors recommend a recent article that assesses the expanding public and commercial databases containing bioactive compounds³ and conclude that the commercial efforts are ahead of the public ones.

The availability of molecule databases such as PubChem (http://pubchem.ncbi.nlm.nih.gov/) has dramatically changed the landscape of publicly available cheminformatics resources, yet PubChem covers only a fraction of the chemical universe, mostly of interest to chemical genomics and pharmaceutical research. PubChem was launched by the NIH in 2004 to support the 'New Pathways to Discovery' component of the Roadmap for Medical Research⁴. PubChem archives and organises information about the biological activities of chemical compounds into a comprehensive database and is the informatics backbone for the Molecular Libraries and Imaging Initiative, which is part of the NIH Roadmap. PubChem is also intended to empower the scientific community to use small molecule chemical compounds in their research as molecular probes to investigate important biological processes or gene functions. The PubChem compound repository presently contains more than 25 million unique structures with biological property information provided for many of the compounds. For now, PubChem remains focused on its initial intent to support the Molecular Libraries Initiative and serves as an extremely valuable and authoritative resource for cheminformatics and chemical genomics. However, there are a number of constraints around the system, especially in its place as a repository of data and information without a special effort toward curating these data. Naturally, in the absence of data curation, any errors in the data are transferred across the many online databases that depend on PubChem and, ultimately, the errors influence the quality of computational models based on these data.

The Chemical Entities of Biological Interest, or ChEBI, database (http://www.ebi.ac.uk/chebi/) is a highly curated database of molecular entities focused on small chemical compounds. The entities are either natural products or synthetic products used to intervene in the processes of living organisms. ChEBI includes an ontological classification (Figure 2), whereby the relationships between molecular entities or classes of entities and their 'parents' and/or 'children' are specified. While the database presently offers access to close to 19,000 entities, this is expected to expand to more than 440,000 by the end of October (http://www.ebi.ac.uk/chebi/newsForward.do#ChEMBL%20data%20integration). The database is available for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/chebi/).

The Human Metabolome Database⁵,⁶ (HMDB) (http://www.hmdb.ca) is a comprehensive curated collection of human metabolite and human metabolism data. It contains records for more than 6,800 endogenous metabolites. In addition to its comprehensive literature-derived data, the HMDB also contains an extensive collection of experimental metabolite concentration data compiled from hundreds of mass spectrometry (MS) and nuclear magnetic resonance (NMR) metabolomic analyses performed on urine, blood and cerebrospinal fluid samples. This is further supplemented with thousands of NMR and MS spectra collected on purified, reference metabolites. Each metabolite entry in the HMDB contains data fields including a comprehensive compound description, names and synonyms, structural information, physicochemical data, reference NMR and MS spectra, biofluid concentrations, disease associations, pathway information, enzyme data, gene sequence data, SNP and mutation data as well as extensive links to images, references and other public databases. Recent improvements have included spectra and substructure searching.

DrugBank (http://www.drugbank.ca/) is a manually curated resource⁷ assembled from a series of other public domain databases (KEGG, PubChem, ChEBI, PDB, Swiss-Prot and GenBank) and enhanced with additional data generated within the laboratories of the hosts. The database aggregates both bioinformatics and cheminformatics data and combines detailed drug data with comprehensive drug target (ie protein) information. The database contains FDA approved small mole- [...]

[...] listed above and from individual chemists. ChemSpider has also integrated the SureChem patent database collection (http://www.surechem.org/) of structures to facilitate structure-based linking to patents between the two data collections. ChemSpider can be queried using structure/substructure searching and alphanumeric text searching of both intrinsic and predicted molecular properties. Unique capabilities relative to other public chemistry databases include real-time curation of the data, association of analytical data with chemical structures, real-time deposition of single or batch chemical structures (including with activity data) and transaction-based predictions of physicochemical data. A series of web services is provided to allow integration with the system for the purpose of searching and linking with other online databases from other groups (academia or industry). The integration can be with free or commercial resources. For example, Collaborative Drug Discovery, Inc (http://www.collaborativedrug.com) recently provided links to ChemSpider for molecules in its CDD database¹², thereby providing an integration path between a commercial resource and a public domain database. CDD is a [...]

References
1 Williams, AJ (2008). A perspective of publicly accessible/open-access chemistry databases. Drug Discov Today 13 (11-12), 495-501.
2 Williams, AJ (2008). Internet-based tools for communication and collaboration in chemistry. Drug Discov Today 13 (11-12), 502-506.
3 Southan, C et al (2009). Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds. J Cheminformatics 1, 10.
4 Office of Portfolio Analysis and Strategic Initiatives, N.I.o.H (2008). The NIH Roadmap Initiative.
5 Wishart, DS et al (2007). HMDB: the Human Metabolome Database. Nucleic Acids Res 35 (Database issue), D521-526.
6 Wishart, DS et al (2009). HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res 37 (Database issue), D603-610.

Figure 1
A graphical interpretation of the history of chem/bioinformatics software, model and database development, and increasing drug development costs versus registered compounds in the CAS Registry and the ChemSpider database. [Plot axes: Registered compounds (millions), 0 to 60, against Year, 1960 to 2010.]

Figure 2
The ChEBI database offers a detailed ontology including subdivision into (1) Molecular Structure, in which molecular entities or parts thereof are classified according to composition and structure; (2) Role, which classifies entities either on the basis of their role within a biological context, eg antibiotic, antiviral agent, coenzyme, hormone, or on the basis of their intended use by humans, eg pesticide, antirheumatic [...]
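The 'parent' and 'child' relationships that the ChEBI ontology specifies can be pictured as a directed graph. The Python sketch below is illustrative only: the `Ontology` class, its method names and the grouping labels `biological role` and `intended use` are assumptions made for this example, and only the role names are taken from the Figure 2 caption. It does not reproduce ChEBI's actual data model, identifiers or hierarchy.

```python
from collections import defaultdict


class Ontology:
    """Toy directed graph of 'is a' links between entities and classes."""

    def __init__(self):
        self.parents = defaultdict(set)   # child -> direct parents
        self.children = defaultdict(set)  # parent -> direct children

    def add_is_a(self, child, parent):
        """Record that `child` is a kind of `parent`."""
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def _walk(self, term, links):
        # Depth-first traversal that collects every term reachable via `links`.
        seen = set()
        stack = [term]
        while stack:
            for nxt in links.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def ancestors(self, term):
        """All classes above `term` (its parents, their parents, and so on)."""
        return self._walk(term, self.parents)

    def descendants(self, term):
        """All terms below `term` (its children, their children, and so on)."""
        return self._walk(term, self.children)


def build_example():
    """Build a small hierarchy; the grouping shown here is hypothetical."""
    onto = Ontology()
    for role in ("antibiotic", "antiviral agent", "coenzyme", "hormone"):
        onto.add_is_a(role, "biological role")
    for use in ("pesticide", "antirheumatic"):
        onto.add_is_a(use, "intended use")
    onto.add_is_a("biological role", "role")
    onto.add_is_a("intended use", "role")
    return onto


if __name__ == "__main__":
    onto = build_example()
    print(sorted(onto.ancestors("hormone")))
    print(sorted(onto.descendants("intended use")))
```

Storing both directions of each link keeps 'parent' and 'child' lookups equally cheap, which is the kind of navigation an ontological classification is meant to support.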