Published online 10 November 2017
Nucleic Acids Research, 2018, Vol. 46, Database issue D661–D667
doi: 10.1093/nar/gkx1064

WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research

Denise N. Slenter1, Martina Kutmon1,2, Kristina Hanspers3, Anders Riutta3, Jacob Windsor1, Nuno Nunes1, Jonathan Mélius1, Elisa Cirillo1, Susan L. Coort1, Daniela Digles4, Friederike Ehrhart1, Pieter Giesbertz5, Marianthi Kalafati1,2, Marvin Martens1, Ryan Miller1, Kozo Nishida6, Linda Rieswijk7, Andra Waagmeester1,8, Lars M.T. Eijssen1,9, Chris T. Evelo1,2, Alexander R. Pico3 and Egon L. Willighagen1,*

1Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, 6229 ER Maastricht, The Netherlands, 2Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, 6229 ER Maastricht, The Netherlands, 3Gladstone Institutes, San Francisco, California, CA 94158, USA, 4University of Vienna, Department of Pharmaceutical Chemistry, 1090 Vienna, Austria, 5Chair of Nutritional Physiology, Technische Universität München, 85350 Freising, Germany, 6Laboratory for Biochemical Simulation, RIKEN Quantitative Biology Center, Suita, Osaka 565-0874, Japan, 7Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, CA 94720, USA, 8Micelio, Antwerp, Belgium and 9School for Mental Health and Neuroscience, Department of Psychiatry and Neuropsychology, Maastricht University Medical Centre, 6229 ER Maastricht, The Netherlands

Received September 14, 2017; Revised October 09, 2017; Editorial Decision October 10, 2017; Accepted October 25, 2017

*To whom correspondence should be addressed. Tel: +31 43 3881231; Fax: +31 43 3881999; Email: [email protected]

© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

WikiPathways (wikipathways.org) captures the collective knowledge represented in biological pathways. By providing a database in a curated, machine readable way, omics data analysis and visualization is enabled. WikiPathways and other pathway databases are used to analyze experimental data by research groups in many fields. Due to the open and collaborative nature of the WikiPathways platform, our content keeps growing and is getting more accurate, making WikiPathways a reliable and rich pathway database. Previously, however, the focus was primarily on genes and proteins, leaving many metabolites with only limited annotation. Recent curation efforts focused on improving the annotation of metabolism and metabolic pathways by associating unmapped metabolites with database identifiers and providing more detailed interaction knowledge. Here, we report the outcomes of the continued growth and curation efforts, such as a doubling of the number of annotated metabolite nodes in WikiPathways. Furthermore, we introduce an OpenAPI documentation of our web services and the FAIR (Findable, Accessible, Interoperable and Reusable) annotation of resources to increase the interoperability of the knowledge encoded in these pathways and experimental omics data. New search options, monthly downloads, more links to metabolite databases, and new portals make pathway knowledge more effortlessly accessible to individual researchers and research communities.

INTRODUCTION

The WikiPathways initiative (wikipathways.org) started in 2008, creating a platform where pathway curation could be performed by means of crowd sourcing (1). The platform's knowledge-base supports many life sciences communities, ranging from plant biology (2) to drug discovery (3). The use of pathway knowledge in the life sciences is wide-spread, supported by many databases (4–6) as well as integrative resources (7,8). WikiPathways has been used to analyze and integrate experimental transcriptomics, proteomics, and metabolomics data (9–12). Our previous update reported over 2300 pathways for 25 different species (13), and over the last two years this number has increased by 14% to 2614 pathways, as of September 2017. Through pathway curation and the addition of new pathways, the share of unique human protein-coding genes present in at least one of our pathways has increased from 30% (7600) to ∼50% (11 532); see Figure 1.

Figure 1. Overview of coverage of various gene spaces. WikiPathways (WP) currently covers 11 532 unique human genes. Venn diagram A shows that 50% of the protein coding genes (Ensembl: 22 376 genes) are found in WikiPathways. B shows the 66% coverage of all disease genes (OMIM: 15 262 genes), which also illustrates that the vast majority of genes in WikiPathways are associated with a disease. The C diagram shows that WikiPathways covers 71% of all genes known to be involved in human metabolism (GO metabolic process: 11 296 genes).

The WikiPathways database is improved by continuous data curation and updates through an expanding community: 634 individual contributors to date and 2850 edits on 1060 pathways between August 2016 and August 2017. The availability of all data under a liberal license (wikipathways.org/license) ensures that all contributors benefit from the collective effort of the full WikiPathways community. Similar to the pathway data, the code base is open source and associated formats and ontologies conform to open standards, allowing other developers to participate and join the team (github.com/wikipathways). Pathways are encoded in GPML format and created with PathVisio (14).
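Because GPML is an XML-based format, a pathway file can be inspected with a generic XML parser. The following is a minimal sketch rather than part of the WikiPathways tooling: the namespace and the element and attribute names (`DataNode`, `Xref`, `Type`, `Database`, `ID`) are assumptions based on the GPML 2013a schema, the embedded pathway is a hand-written toy fragment with placeholder identifiers, and the function name `summarize_datanodes` is hypothetical. It counts, per node type, how many data nodes carry a database identifier.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the GPML 2013a schema; other GPML versions use other URIs.
GPML_NS = "http://pathvisio.org/GPML/2013a"

# Hand-written toy fragment for illustration only. It is NOT a real WikiPathways
# pathway, and the Xref IDs are placeholders rather than real database identifiers.
SAMPLE_GPML = f"""
<Pathway xmlns="{GPML_NS}" Name="Toy pathway" Organism="Homo sapiens">
  <DataNode TextLabel="Gene A" GraphId="n1" Type="GeneProduct">
    <Xref Database="Ensembl" ID="placeholder-gene-id"/>
  </DataNode>
  <DataNode TextLabel="Metabolite X" GraphId="n2" Type="Metabolite">
    <Xref Database="ChEBI" ID="placeholder-metabolite-id"/>
  </DataNode>
  <DataNode TextLabel="Metabolite Y" GraphId="n3" Type="Metabolite">
    <Xref Database="" ID=""/>
  </DataNode>
</Pathway>
"""


def summarize_datanodes(gpml_text):
    """Count annotated and unannotated DataNodes per node type.

    A node counts as annotated when its Xref child has both a non-empty
    Database and a non-empty ID attribute.
    """
    root = ET.fromstring(gpml_text)
    ns = {"gpml": GPML_NS}
    summary = {}
    for node in root.findall("gpml:DataNode", ns):
        node_type = node.get("Type", "Unknown")
        xref = node.find("gpml:Xref", ns)
        annotated = (
            xref is not None
            and bool(xref.get("Database"))
            and bool(xref.get("ID"))
        )
        counts = summary.setdefault(node_type, {"annotated": 0, "unannotated": 0})
        counts["annotated" if annotated else "unannotated"] += 1
    return summary


if __name__ == "__main__":
    for node_type, counts in summarize_datanodes(SAMPLE_GPML).items():
        print(node_type, counts)
```

Grouping by node type keeps metabolite annotation separate from gene and protein annotation, which is the distinction this update focuses on.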
Genes, proteins and metabolites are linked to other databases with the BridgeDb web service (15). All components are freely available, developed in open collaborations and distributed as open source or open access products.

This paper describes our efforts to increase the knowledge covered by biological pathways, with a focus on metabolites, and the usability of WikiPathways in general. These efforts include a more accurate annotation of chemical identities of metabolites, their enzymatic conversions, and protein interactions. This work was triggered by the coming of age of the metabolomics field (16), resulting in a growing amount of experimental data (17). The following sections describe our work of the past two years, organized as updates for biologists and chemists, updates for contributors and curators, and updates for bioinformaticians and data scientists.

UPDATES FOR BIOLOGISTS AND CHEMISTS

To quantify the results of our curation efforts regarding the growth in metabolite data, the numbers of annotated human metabolites over the past years are shown in Figure 2. Over the last five years, the total number of annotated unique metabolite nodes has followed a positive trend, growing by 158% from 1213 in 2013 to 3133 in 2017. The total number of unique metabolites was estimated by counting unique identifiers: each metabolite annotation was normalized to a Wikidata identifier (18) where possible, otherwise to a ChEBI identifier (19), and otherwise to an HMDB identifier (20), in that order. If none of these three databases provided an identifier for the listed compound, we refer to the metabolite identifier as unmapped. Importantly, the percentage of human metabolites without annotation for the full set of pathways decreased from 5% to 1% over the last five years.

Figure 2. Metabolite coverage growth in WikiPathways. Taking KEGG as the standard, we plot the growth of WikiPathways (WP) coverage over the past five years. WikiPathways and KEGG unique human metabolite counts were calculated by extracting identifiers from archived WikiPathways releases and unifying to, ideally, Wikidata ID, otherwise ChEBI or HMDB. About two-thirds of all identifiers (blue line) could be mapped to a Wikidata identifier (red line). The number of metabolite IDs that could not be mapped to these three databases has decreased over the last five years (unmapped, green line).

Last year's growth can partly be explained by the addition of large metabolic pathways as new content, such as the Biochemical Pathways map (Giesbertz, P., Willighagen, E. and Slenter, D. (2017) Biochemical Pathways Part I (Homo sapiens). wikipathways.org/instance/WP3604 r94542). Reactome pathways have been updated to version 61 using an extended version of the previously reported Reactome2GPML converter (21). This converter adds comments to pathway descriptions stating the original Reactome pathway ID, version, and author list.

Additionally, a considerable contribution to the growth comprises improved annotation and curation efforts. We improved the biological annotation of several small carbohydrates for multiple Homo sapiens pathways, such as the Glycolysis and Gluconeogenesis pathway (Username:Kdahlquist, Coort, S., Fidelman, N., van Iersel, M., Hanspers, K., Kelder, T., Bouwman, J., Pico, A., Willighagen, E., Kutmon, M., et al. (2017) Glycolysis and Gluconeogenesis (H. sapiens). wikipathways.org/instance/

any published pathway figure to be digitized and made machine readable. We will prioritize requested pathways and assign pathway modeling tasks to people in our curation team. Given that thousands of pathway images are available,