The Bioassay Network and Its Implications to Future Therapeutic Discovery
Total Page:16
File Type:pdf, Size:1020Kb
Zhang et al. BMC Bioinformatics 2011, 12(Suppl 5):S1 http://www.biomedcentral.com/1471-2105/12/S5/S1 PROCEEDINGS Open Access The BioAssay network and its implications to future therapeutic discovery Jintao Zhang1, Gerald H Lushington2, Jun Huan3* From IEEE International Conference on Bioinformatics and Biomedicine 2010 Hong Kong, PR China. 18-21 December 2010 Abstract Background: Despite intense investment growth and technology development, there is an observed bottleneck in drug discovery and development over the past decade. NIH started the Molecular Libraries Initiative (MLI) in 2003 to enlarge the pool for potential drug targets, especially from the “undruggable” part of human genome, and potential drug candidates from much broader types of drug-like small molecules. All results are being made publicly available in a web portal called PubChem. Results: In this paper we construct a network from bioassay data in PubChem, apply network biology concepts to characterize this bioassay network, integrate information from multiple biological databases (e.g. DrugBank, OMIM, and UniHI), and systematically analyze the potential of bioassay targets being new drug targets in the context of complex biological networks. We propose a model to quantitatively prioritize this druggability of bioassay targets, and literature evidence was found to confirm our prioritization of bioassay targets at a roughly 70% accuracy. Conclusions: Our analysis provide some measures of the value of the MLI data as a resource for both basic chemical biology research and future therapeutic discovery. Background Under these circumstances, NIH launched the Mole- Intense growth in drug development investment in the cular Libraries Initiative (MLI) in 2003 for identifying past decade has not yet produced significant progress chemical probes to enhance the chemical biology under- on the discovery of novel drugs and the validation of standing of therapeutically interesting genes and path- new drug targets: the average number of novel drugs ways [7], and for the purpose of expanding availability, entering the global market each year has remained flexibility, and utility of small-molecule bioassay screen- roughly constant (approximately 26), along with only ing data. The MLI especially focuses on genes in the about 6-7 new drug targets introduced annually [1-3]. “undruggable” part of human genome that has not been Moreover, the success rate of translating new drug can- well investigated in private sectors for identifying their didates into US Food and Drug Administration (FDA)- functions and potential therapeutics [7]. The MLI has approved drugs significantly decreased [4], mainly due also been synthesizing and screening much broader to lack of efficacy and discovered adverse drug reactions, types of compounds for increasing the diversity of each of which accounts for 30% of late-stage drug fail- selecting potential drug candidates in chemical space (i. ures in clinical development [5]. The increasing rate of e., the set of all possible small organic molecules). All drug attrition challenges the traditional drug design MLI results are being made freely available to research- paradigm and makes the current “one gene, one drug, ers in both public and private sectors via a web portal and one disease” assumption [6] questionable. called PubChem (http://pubchem.ncbi.nlm.nih.gov) [8]. PubChem provides valuable chemical genomics informa- tion in studying genes, pathways, cells and diseases, * Correspondence: [email protected] 3Department of Electrical Engineering & Computer Science, University of however, these data are noisy, high dimensional, with Kansas, Lawrence, KS 66045, USA large volume, and contain outliers and errors. For Full list of author information is available at the end of the article © 2011 Zhang et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Zhang et al. BMC Bioinformatics 2011, 12(Suppl 5):S1 Page 2 of 8 http://www.biomedcentral.com/1471-2105/12/S5/S1 instance, the activity score, which measure the biological data from the Online Mendelian Inheritance in Man activity level of screened compounds, are normalized in (OMIM) database [14], Goh et al.[15] built a bipartite many different approaches without a consensus. Hence human disease network, and then generated two biologi- PubChem data deserve a careful investigation for the cally relevant network projections: human disease net- related research communities. work and disease gene network. This network-based There are numerous criteria that chemical biologists approach revealed that genes associated similar disor- might place on an initiative that aims to foster new ders are more likely to have interactions between their paradigms in therapeutic discovery. Some obvious tar- products and higher expression profiling similarity get-related benchmarks should include whether the between their transcripts, indicating the existence of dis- assay priorities focus on the systems that stand a good ease-specific functional modules. In this study we use chance of being potential druggable targets in their network-based analysis approaches to integrate Pub- own right (i.e., share favorable attributes with known Chem data and existing research results on drug-target drug targets, and augment current biomedical capabil- network [2], human diseasome [15], human protein ities). Our assessment of MLI bioassay targets thus interactome [9] for a better understanding of the corre- focuses on a variety of their attributes relative to lations and interrelationships between disease genes, known drug targets. Given our interest in analyzing genetic disorders and drugs by (1) constructing a bioas- specific relationships with phenotypically interesting say network for data in PubChem, visualizing complex pathways, we choose to focus our target analysis data in a network view and characterizing the network strictly on target-based assays, whose target proteins using a variety of statistical tools, (2) mapping bioassay are referred to herein as bioassay targets. As our basis targets into the human PPI networks, and investigating for contrasting targets, we use target protein interac- the interrelationships between bioassay targets, and drug tion profiles in the human PPI network, to evaluate targets, disease genes, and essential genes, and (3) quali- whether the bioassay target selection was progressing tatively analyzing the potential of bioassay targets to be effectively toward augmenting existing chemical geno- potential therapeutic targets, quantitatively modeling mics knowledge. this potential, and confirming our results using literature Biological network analysis approaches have gained survey. Our analyses should provide some measures of popularity for organizing complex biological systems so the value of the MLI data as a resource for both basic that data retrieval, analysis, and visualization can be chemical biology research and future therapeutic highly efficient. It can also reveal important biological discovery. patterns and functions that are deeply hidden in mass data repository. For instance, Stelzl et al.[9]usedthe Results and discussion Y2H system to generate and analyze a human PPI net- We download all bioassay screening data from the Pub- work, and calculated many interesting and critical pat- Chem BioAssay Database. As of January 2009, 1306 ternsandcharacteristics.Thiswasviewedasan bioassays (1,126 with at least one active compound) and important step toward the complete human protein-pro- more than 30 million compounds have been deposited tein interactome. Chaurasia et al.[10,11] built a compre- into PubChem by a variety of screening centers, and the hensive web platform - the Unified Human Interactome size of PubChem data keeps increasing continuously. database (UniHI, http://www.unihi.org), for querying For each bioassay, tens to hundreds of thousand of com- and accessing human protein-protein interaction (PPI) pounds are tested either against specific target proteins data. The latest update of UniHI includes over 250,000 in vitro or within a cell for investigating disease-related interactions between 22,300 unique proteins collected mechanisms. There are totally 151,930 compounds that from 14 major PPI sources [11]. Using association data are active in at least one bioassay, and 555,859 bioassay- of approved drugs and drug targets obtained from the active compound pairs across all the bioassays. On aver- DrugBank database [12,13], Yildirim et al.[2] built a age each active compound is active in 3.7 bioassays, and bipartite network composed FDA-approved drugs and each bioassay has 493.7 active compounds. Therefore a their drug targets, which was an important step toward very sparse bipartite network of bioassays and com- the complete characterization of the global relationship pounds can be observed. In addition, 680 bioassays are between protein targets of all drugs and all disease-gene found associated with at least one target protein and are products in the human protein interactome. Quantita- considered target-based, and the rest are hence assumed tive topological analyses of this drug-target network cell-based bioassays. Moreover, we have found 289 dis- revealed that the targets of current drugs are highly tinct protein GI (gene identifier) numbers