Big Data: the Next Frontier for Innovation in Therapeutics and Healthcare

Perspective Big data: the next frontier for innovation in therapeutics and healthcare Expert Rev. Clin. Pharmacol. 7(3), 293–298 (2014) Naiem T Issa1, Advancements in genomics and personalized medicine not only effect healthcare delivery from Stephen W Byers1,2 patient and provider standpoints, but also reshape biomedical discovery. We are in the era of ‘ ’ ’ and Sivanesan the -omics , wherein an individual s genome, transcriptome, proteome and metabolome can be scrutinized to the finest resolution to paint a personalized biochemical fingerprint that enables Dakshanamurthy*1,2 tailored treatments, prognoses, risk factors, etc. Digitization of this information parlays into ‘big 1 Department of Oncology, Lombardi data’ informatics-driven evidence-based medical practice. While individualized patient Cancer Center, Georgetown University Medical Center, Washington, DC USA management is a key beneficiary of next-generation medical informatics, this data also harbors 2Department of Biochemistry and a wealth of novel therapeutic discoveries waiting to be uncovered. ‘Big data’ informatics allows Molecular Biology, Georgetown for networks-driven systems pharmacodynamics whereby drug information can be coupled to University Medical Center, Washington, cellular- and organ-level physiology for determining whole-body outcomes. Patient ‘-omics’ data DC USA *Author for correspondence: can be integrated for ontology-based data-mining for the discovery of new biological Tel.: +1 202 687 2347 associations and drug targets. Here we highlight the potential of ‘big data’ informatics for [email protected] clinical pharmacology. KEYWORDS: big data • clinical pharmacology • personalized medicine • systems medicine • therapeutics For personal use only. The digital revolution in healthcare is now. associated with increased risk of adverse car- The amount of data is exploding from basic diovascular events [5]. ‘Systems medicine’ has science to clinically based genomics and per- brought with it a slew of technologies and sonalized medicine, and continues to evolve in novel perspectives to analyze disease as an healthcare at both the population and the indi- interconnected synergistic system of working vidual levels. Clinical phenotypes are being parts. Clinical pharmacology is transitioning to described more quantitatively and biochemi- reflect this interconnectedness, where the tradi- cally using genomics, transcriptomics, proteo- tional unidirectional movement of ‘bench-to- mics and metabolomics [1–3]. Collecting and bedside’ is now a bidirectional process that analyzing such large data sets, coined ‘big depends on both end points to achieve novel data’, will become key to new healthcare inno- therapeutics and better understanding of effects vations. In anticipation, the USA and Euro- caused by current pharmaceuticals. The follow- pean Commission announced a national ‘Big ing sections provide insight into the available Data Initiative’ for policy preparedness [4]. resources and techniques being applied in Advancements in big data analytics not only this transformation. affect healthcare delivery from patient and pro- Expert Review of Clinical Pharmacology Downloaded from informahealthcare.com by Karolinska Institutet University Library on 03/16/15 vider standpoints but also hold promise to Big data in clinical discovery reshape biomedical discovery. For example, New-age drug discovery approaches encompass decoding a single human genome originally a wide range of big data analytics, from high- took a decade to process, but with the advent throughput cellular and protein-binding assays of ‘Big Science’, modern DNA sequencing and to chemoinformatics-driven databases. Many informatics approaches can achieve this within of these databases are undergoing extensive a week. With respect to clinical pharmacology, improvements to become central hubs for the a big data initiative by Medco recently helped integration of biological and physicochemical uncover that the simultaneous use of clopidog- information (TABLE 1). In addition, new patterns rel (Plavix) and proton pump inhibitors is and associations are being discovered using informahealthcare.com 10.1586/17512433.2014.905201 Ó 2014 Informa UK Ltd ISSN 1751-2433 293 Perspective Issa, Byers & Dakshanamurthy Table 1. Table of open-access resources available to the public for bioinformatics and chemoinformatics in drug discovery and toxicology. Category Database Description Drug BindingDB Binding affinities of small, drug-like molecules to protein targets ChEBI Small chemical compounds containing structural, nomenclature and ontology ChemBank information ChEMBL Biomedical measurements derived from cell lines treated with small molecules DrugBank Manually curated bioactive molecules with drug-like properties maintained by the PharmGkb EBI SuperTarget FDA-approved and experimental drugs with drug target, bio- and Therapeutic chemoinformatic data Target Database Pharmacogenomic-focused genetic, molecular, cellular and clinical data for drugs ZINC ~7300 drug–target associations with ~5000 manually annotated Known therapeutic protein targets with pathway information and corresponding drugs ~21 million compounds that are commercially available and prepared for virtual screening Disease National Organization NORD contains information on rare human diseases of Rare Diseases Catalog of human genes and genetic disorders maintained by the Johns Hopkins Online Mendelian University Inheritance of Man Protein–protein/–gene/ BioGRID ~730,000 raw protein and genetic interactions from major model organisms –other interactions DGIdb Drug gene interaction database curated from multiple well-established databases ExPASy STRING Known and predicted protein–protein interactions from experimental repositories MatrixDB and computational methods MINT Interactions between extracellular proteins (i.e., collagen and laminins) and Database of polysaccharides Interacting Proteins Molecular interaction database focusing on experimentally validated protein– protein interactions Manual and computational curation of experimentally determined protein–protein interactions For personal use only. Genomics Gene Expression Public functional array- and sequence-based genomics data repository Omnibus Cancer microarray database that can be subdivided by treatment, patient survival Oncomine and other demographics The Cancer Large-scale genome sequencing platform for multiple cancers led by the NCI and Genome Atlas the NHGRI UCSC Cancer Interactive annotated cancer genome-browser website hosted by the University of Genome Browser California, Santa Cruz G-DOC Broad collection of bioinformatics and systems biology tools for analysis and visualization of four major ‘omics’ types: DNA, mRNA, microRNA and metabolites Proteomics dbDEPC Database of differentially expressed proteins in human cancer GeMDBJ Proteomics Clinical and cell line protein LC-MS/MS and 2D-difference gel electrophoresis for Plasma Protein Database expression levels PRIDE Initiative of the Human Proteome Organization to characterize human plasma The Human Protein Atlas and serum proteome UniProt Centralized standards-compliant mass spectrometry proteomics and post- translational modifications Expert Review of Clinical Pharmacology Downloaded from informahealthcare.com by Karolinska Institutet University Library on 03/16/15 Immunohistochemistry-based protein expression profiles of various human tissues, cancers and cell lines Comprehensive protein sequence and annotation data Metabolomics BiGG Genomic-based reconstruction of human metabolism for systems biology HMDB simulation and flux modeling HumanCyc Human small molecule metabolites with associated chemical, clinical and SMPDB molecular biology information Human metabolic pathway/genome bioinformatics database constituting over 28,000 genes Small molecule pathway database with >400 unique human pathways not found in other databases 294 Expert Rev. Clin. Pharmacol. 7(3), (2014) Big data Perspective Table 1. Table of open-access resources available to the public for bioinformatics and chemoinformatics in drug discovery and toxicology (cont.). Category Database Description Toxicology Chemical Effects in Developed by the National Institute of Environmental Health Sciences to house Biological Systems toxicology studies Comparative Curation of chemical–gene, chemical–disease, and gene–disease associations and Toxicogenomics Database constructs networks EPA ACToR Aggregated Computational Toxicology Resource to query multiple EPA chemical FDA Adverse Event toxicity databases Reporting System Adverse event and medication error reports submitted to the FDA for post- OpenTox market safety surveillance SIDER Interoperable framework for predictive toxicology and community platform for T3DB creation of applications TOXNET Adverse drug reactions on marketed medicines extracted from public documents and package inserts ~37,000 pollutant–, pesticide– and food toxin–target associations with 50 related bioinformatics data fields Integrated database system of hazardous chemicals, toxic releases and environmental health by the NLM Structural biology/ BRENDA Comprehensive repository for molecular and biochemical information of enzymes ontology/enzymology/ Gene Ontology classified by the IUBMB pathways Ingenuity Gene and gene product annotations across multiple species with supported tools KEGG for analytics RCSB PDB Web-based applications for analyzing genomic and pathway data Reactome Five databases that connect molecular interaction networks to recapitulate

Big Data: the Next Frontier for Innovation in Therapeutics and Healthcare

Zebrafish Disease Models to Study the Pathogenesis of Inherited Manganese Transporter Defects and Provide A

Glycomics Goes Visual and Interactive

The ELIXIR Core Data Resources: Fundamental Infrastructure for The

Pathogenicity and Selective Constraint on Variation Near Splice Sites

Viroinformatics Investigation of B-Cell Epitope Conserved Region in SARS

Bioinformatics Exercises: Bovine Lactate Dehydrogenase (LDH)

Biocuration Experts on the Impact of Duplication and Other Data Quality Issues in Biological Databases

Plant Protein Annotation in the Uniprot Knowledgebase

Europe PMC Funders Group Author Manuscript Nature

Sedum Alfredii Sanramp6 Metal Transporter Contributes to Cadmium Accumulation in Transgenic Arabidopsis Thaliana

C2orf62 and TTC17 Are Involved in Actin Organization and Ciliogenesis in Zebrafish and Human

How to Get the Best from Fission Yeast Genome Data