Published online 4 November 2019 Nucleic Acids Research, 2020, Vol. 48, Database issue D845–D855 doi: 10.1093/nar/gkz1021 The DisGeNET knowledge platform for disease genomics: 2019 update Janet Pinero˜ , Juan Manuel Ram´ırez-Anguita, Josep Sauch-Pitarch,¨ Francesco Ronzano, Emilio Centeno, Ferran Sanz and Laura I. Furlong *
Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Medical Research Institute (IMIM), Department of Experimental and Health Sciences, Pompeu Fabra University (UPF), Barcelona, Spain Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D845/5611674 by guest on 20 April 2020 Received September 14, 2019; Revised October 14, 2019; Editorial Decision October 15, 2019; Accepted October 18, 2019
ABSTRACT sults of genomic analysis and the identification of variant of clinical relevance remain a significant challenge (1). Vari- One of the most pressing challenges in genomic ant assessment still involves manual exploration of multi- medicine is to understand the role played by ge- ple sources of data, which requires a significant amount of netic variation in health and disease. Thanks to the time and experts in the domain. In this context, new bioin- exploration of genomic variants at large scale, hun- formatic tools and resources that enable the automation of dreds of thousands of disease-associated loci have every possible step in this process are crucial. In this regard, been uncovered. However, the identification of vari- resources such as ClinVar (2), ClinGen (3), the Genomics ants of clinical relevance is a significant challenge England PanelApp (https://panelapp.genomicsengland.co. that requires comprehensive interrogation of previ- uk/), Orphanet (4) and OMIM (5), among others, have ous knowledge and linkage to new experimental re- demonstrated their utility to support variant interpretation. sults. To assist in this complex task, we created Dis- Here, we present a new release of DisGeNET, a knowledge management platform that houses one of the most exhaus- GeNET (http://www.disgenet.org/), a knowledge man- tive and publicly available catalogues of genes and genomic agement platform integrating and standardizing data variants associated with human diseases. Originally imple- about disease associated genes and variants from mented in 2010 as a Cytoscape plugin (6), during the last multiple sources, including the scientific literature. years DisGeNET has evolved into different formats and DisGeNET covers the full spectrum of human dis- tools (7–10), and it now undergoes its sixth release as a eases as well as normal and abnormal traits. The knowledge management platform aimed at supporting dif- current release covers more than 24 000 diseases ferent application scenarios and users. and traits, 17 000 genes and 117 000 genomic vari- ants. The latest developments of DisGeNET include The DisGeNET database contents new sources of data, novel data attributes and pri- The core concepts in the DisGeNET database structure oritization metrics, a redesigned web interface and (Figure 1A) are the Gene–Disease Association (GDA) and recently launched APIs. Thanks to the data standard- the Variant–Disease Association (VDA), that are collated ization, the combination of expert curated informa- from different data sources (Figure 2). The integration of tion with data automatically mined from the scien- these diverse sources of data is enabled by proper standard- tific literature, and a suite of tools for accessing its ization of genes, variants, diseases (diseases, symptoms and publicly available data, DisGeNET is an interopera- traits) and associations using community-driven ontologies ble resource supporting a variety of applications in and controlled vocabularies, as well as ontologies developed genomic medicine and drug R&D. ad hoc (e.g. the DisGeNET association type ontology). Of note, the provenance of the information is provided in sev- eral ways: (a) as the field ‘original database’ that indicates INTRODUCTION where the data was taken from (e.g. ClinVar or UniProt Modern genome sequencing technologies are fostering the (11)), (b) with the number of articles that support the as- integration of genomics into clinical practice. The explo- sociation and the NCBI PMIDs of these publications and ration of human variation at large scale by genome sequenc- (c) with a text excerpt from the article that expresses the ev- ing or SNP array genotyping are enabling the identification idence for the association. GDAs and VDAs are further an- of disease-associated variants for a wide range of diseases notated with in-house and external attributes easing data and conditions. Nevertheless, the interpretation of the re- analysis, exploration and prioritization. For the attributes
*To whom correspondence should be addressed. Tel: +34 93 316 0521; Fax: +34 93 316 0550; Email: [email protected]