Open Targets Platform: New Developments and Updates Two Years On

D1056–D1065 Nucleic Acids Research, 2019, Vol. 47, Database issue Published online 20 November 2018 doi: 10.1093/nar/gky1133 Open Targets Platform: new developments and updates two years on Denise Carvalho-Silva1,2,*, Andrea Pierleoni1,2, Miguel Pignatelli1,2, ChuangKee Ong1,2, Luca Fumis1,2, Nikiforos Karamanis1,2, Miguel Carmona1,2, Adam Faulconbridge1,2, Andrew Hercules1,2, Elaine McAuley1,2, Alfredo Miranda1,2, Gareth Peat1,2, Michaela Spitzer1,2, Jeffrey Barrett2,3, David G. Hulcoop2,4, Eliseo Papa2,5, Gautier Koscielny2,4 and Ian Dunham1,2,* 1European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK, 2Open Targets, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK, 3Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK, 4GSK, Medicines Research Center, Gunnels Wood Road, Stevenage, SG1 2NY, UK and 5Biogen, Cambridge, MA 02142, USA Received September 14, 2018; Revised October 22, 2018; Editorial Decision October 23, 2018; Accepted October 26, 2018 ABSTRACT user support, social media and bioinformatics forum engagement. The Open Targets Platform integrates evidence from genetics, genomics, transcriptomics, drugs, animal models and scientific literature to score INTRODUCTION and rank target-disease associations for drug Drug discovery is a long and costly endeavour characterized target identification. The associations are dis- by high failure rates. Failure often occurs at the later stages played in an intuitive user interface (https://www. of the drug discovery pipeline and the reasons for the low targetvalidation.org), and are available through success are largely twofold: lack of safety and/or lack of ef- a REST-API (https://api.opentargets.io/v3/platform/ ficacy. This reflects insufficient understanding of the roleof docs/swagger-ui) and a bulk download (https://www. the chosen target in disease, and the consequences of mod- targetvalidation.org/downloads/data). In addition to ulating it with a drug. Over the last several years, there has target-disease associations, we also aggregate and been an increase in the number of biological and chemical display data at the target and disease levels to aid databases available for better understanding of drug targets target prioritisation. Since our first publication two (1). These databases can be used to assist with target identification, one of the most important stages in drug discovery years ago, we have made eight releases, added new (2). data sources for target-disease associations, started The Open Targets Platform (https://www. including causal genetic variants from non genome- targetvalidation.org) is a freely available resource for wide targeted arrays, added new target and disease the integration of genetics, omics and chemical data to annotations, launched new visualisations and im- aid systematic drug target identification and prioritisa- proved existing ones and released a new web tool tion. The Open Targets Platform capitalises on publicly for batch search of up to 200 targets. We have a available databases to create a virtuous cycle where we new URL for the Open Targets Platform REST-API, add value to the original data by computing, scoring and new REST endpoints and also removed the need for ranking integrated target-disease associations (3), linking authorisation for API fair use. Here, we present the these associations back to the underlying evidence and its latest developments of the Open Targets Platform, provenance. We have expanded the Platform to include data from expanding the evidence and target-disease associ- more projects and initiatives in translational research and ations with new and improved data sources, refin- medicinal chemistry, such as Genomics England (4), the ing data quality, enhancing website usability, and in- Structural Genomics Consortium (https://www.thesgc.org) creasing our user base with our training workshops, and the Institute of Cancer Research (https://www.icr.ac. *To whom correspondence should be addressed. Tel: +44 1223 492 697; Fax: +44 1223 494 468; Email: [email protected] Correspondence may also be addressed to Ian Dunham. Tel: +44 1223 492 636; Fax: +44 1223 494 468; Email: [email protected] C The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Nucleic Acids Research, 2019, Vol. 47, Database issue D1057 uk), and continue to adhere and contribute to international New data sources for target-disease associations naming standards and ontologies through our ongoing col- We have incorporated four new data sources to enhance our laboration with the Experimental Factor Ontology (EFO) evidence: Genomics England PanelApp and the PheWAS (5) and the Evidence & Conclusion Ontology (ECO) (6). catalogue (within the data type Genetic associations) and Although the main port of access for all our data is a SLAPenrich and PROGENy (for the data type Affected graphical user interface (GUI) designed for bench scien- pathways). tists working in early drug discovery (7), we have observed an uptake in the use of our REST-API and data downloads. Moreover, due to the availability of the Open Tar- Genetic associations gets Platform snapshots, our database can now be re-created Since our previous publication, we have added two new data by other parties. Here, we describe the developments since sources as evidence for Genetic associations between tar- our first publication, focusing on the new data sources for gets and Mendelian and more common diseases: Genomics target-disease associations, new target and disease annota- England PanelAPP and the PheWAS catalog, respectively. tions for target prioritisation, and new intuitive visualisa- With these new data sources, we have been able to identify tions designed with ongoing focus on usability. new associations (e.g. between SERPING1 and Immunod- eficiency due to an early component of complement defi- ciency based on evidence from the Genomics England Pan- elAPP, or between MC1R in hyperlipidemia based on evi- NEW DEVELOPMENTS AND PROGRESS dence from the PheWAS catalog) or added further support More data with continuing emphasis on user experience to previously identified associations (e.g. between KCNE3 in Brugada syndrome based on evidence from the Genomics A key factor for drug target identification and prioritisa- England PanelAPP, or NOD2 in Crohn’s disease based on tion is the causal association of the target with a disease. We evidence from the PheWAS catalog). compute an association score based on genetics, genomics, We have included the Genomics England PanelApp transcriptomics, drug information, animal models and sci- Green genes (version 1+ panels) (4) along with their entific literature evidence. Our scoring framework has been (mainly) rare, Mendelian diseases or phenotypes, providing described in our previous publication (3). Briefly, the com- these can be mapped to an ontology, such as EFO (5), Or- putation is carried out at four different levels to give rise phanet (http://www.orpha.net) or Human Phenotype On- to evidence scores, data source scores, data type scores, and tology (HP) (8). We use the PanelApp WebServices (https: an overall association score. To compute the evidence score, //panelapp.genomicsengland.co.uk/#!Webservices) to ob- we take into account specific factors that affect the strength tain the associations and Ontoma (https://pypi.org/project/ of the evidence used for the target-disease associations (See ontoma/) for the automatic mapping of diseases and phe- table 2 in (3). In order to obtain a score for data sources notypes. and data types, we use the harmonic sum to aggregate in- The Genomics England Green genes are curated and dividual evidence and data source scores, respectively. Our crowdsourced by experts; hence the target-disease associa- overall association score is the result of the aggregation of tions that are supported by this evidence in our Platform all data sources using the harmonic sum (Supplementary have the highest score of 1. Figure S1). For common and complex diseases, we have added ge- Since our first publication, we have continued to explore netic evidence from PheWAS (9), scored following our new datasets to be included as evidence for new target- methodology for GWAS evidence (3) but scaled according disease associations or refinement of existing ones. Our cri- to the maximum number of cases (8800) and the P-value teria to consider new data sources are: (i) relevance (can the range (0.05 and 1e-25) of the PheWAS data. data be used to associate targets with diseases? Does it sug- Details on the scoring of these new data sources for Ge- gest a causal link between a target and a disease? Does it netic associations are described in our help documentation enable prioritisation decision by target properties?); (ii) ease (https://docs.targetvalidation.org/getting-started/scoring). of integration (does the data use an ontology? Are the targets provided as either UniProt ID or Ensembl gene IDs? Affected pathways How much term mapping will be required? Is there a score or threshold that can be used to rank the data points?); (iii) Besides the new data sources for Genetic associations, we accessibility (is the data publicly available, free and easy to have also included two new data sources for Affected path- access through an API or downloads?) and (iv) sustainabil- ways, more specifically in cancer. The new data sources, ity (is the data source likely to be maintained over the long SLAPenrich (10)andPROGENy(11), have mostly high- term? Is the data frequently updated?). Once we select new lighted new associations, such as EGFR in squamous cell data sources, they are combined into broader data types: lung carcinoma based on PROGENy and PTEN in prostate Genetic associations, Somatic mutations, Drugs, Affected adenocarcinoma based on SLAPenrich.

Open Targets Platform: New Developments and Updates Two Years On

The Staurotypus Turtles and Aves Share the Same Origin of Sex Chromosomes but Evolved Different Types of Heterogametic Sex Determination

Role of Transfer RNA Modification and Aminoacylation in the Etiology of Congenital Intellectual Disability

A Master Autoantigen-Ome Links Alternative Splicing, Female Predilection, and COVID-19 to Autoimmune Diseases

PROTEOME of the HUMAN CHROMOSOME 18: GENE-CENTRIC IDENTIFICATION of TRANSCRIPTS, PROTEINS and PEPTIDES Addendum to the Roadmap

Network Pharmacological Mechanism Research of Herba Drynariae Rhizoma-Epimedii Folium in Treating Osteoarthritis

Mouse Nars Knockout Project (CRISPR/Cas9)

Supplementary Table 1: Genes Located on Chromosome 18P11-18Q23, an Area Significantly Linked to TMPRSS2-ERG Fusion

Novel Genes Associated with Colorectal Cancer Are Revealed by High Resolution Cytogenetic Analysis in a Patient Specific Manner

Multiscale Genomic Analysis of The

Dissection of a QTL Hotspot on Mouse Distal Chromosome 1 That

Coexpression Networks Based on Natural Variation in Human Gene Expression at Baseline and Under Stress

Drosophila Models of Pathogenic Copy-Number Variant Genes Show Global and Non-Neuronal Defects During Development