A Web Tool and Library for Chemical Enrichment Analysis Based on The

Moreno et al. BMC Bioinformatics (2015) 16:56 DOI 10.1186/s12859-015-0486-3 SOFTWARE Open Access BiNChE: A web tool and library for chemical enrichment analysis based on the ChEBI ontology Pablo Moreno1†,StephanBeisken1†, Bhavana Harsha1, Venkatesh Muthukrishnan1,IlincaTudose1, Adriano Dekker1,StefanieDornfeldt2, Franziska Taruttis2,3, Ivo Grosse4,5, Janna Hastings1*, Steffen Neumann2 and Christoph Steinbeck1 Abstract Background: Ontology-based enrichment analysis aids in the interpretation and understanding of large-scale biological data. Ontologies are hierarchies of biologically relevant groupings. Using ontology annotations, which link ontology classes to biological entities, enrichment analysis methods assess whether there is a significant over or under representation of entities for ontology classes. While many tools exist that run enrichment analysis for protein sets annotated with the Gene Ontology, there are only a few that can be used for small molecules enrichment analysis. Results: We describe BiNChE, an enrichment analysis tool for small molecules based on the ChEBI Ontology. BiNChE displays an interactive graph that can be exported as a high-resolution image or in network formats. The tool provides plain, weighted and fragment analysis based on either the ChEBI Role Ontology or the ChEBI Structural Ontology. Conclusions: BiNChE aids in the exploration of large sets of small molecules produced within Metabolomics or other Systems Biology research contexts. The open-source tool provides easy and highly interactive web access to enrichment analysis with the ChEBI ontology tool and is additionally available as a standalone library. Keywords: Ontology, Enrichment, Small molecules Background which processes they participate in, and which functions High-throughput “omics” research produces large sets they bear. Enrichment analysis methods assess whether of data for genes, proteins, and metabolites. Ontology- there is a significant over or under representation of enti- based enrichment analysis helps to investigate and filter ties for ontology classes, when compared to a background these data sets by assigning meanings to statistical pat- population representing the null hypothesis that the data terns found. Ontologies are hierarchies of biologically rel- are not enriched for that class. No classes would be found evant groupings such as processes, functions, or activities. to be enriched if, for example, the measured biological Ontology annotations link ontology classes to the biolog- entities’ annotations were found to be evenly distributed ical entities that are known to belong to those groupings between all available ontology classes, or proportionately or participate in those processes, functions or activities. according to the known biological distribution for the For example, the Gene Ontology (GO [1]) provides hierar- measured sample (as represented in the proportionality of chies of cellular compartments, biological processes, and the background annotation set). molecular functions. GO annotations link genes and pro- Many enrichment analysis tools have been designed teins to those GO classes where the genes are expressed, for use with genes or proteins and the GO, including GOMiner, FatiGO, BiNGO and GSEA. For a review, *Correspondence: [email protected] please see Huang et al. [2]. These tools take as input †Equal Contributors 1Cheminformatics and Metabolism, European Molecular Biology Laboratory - lists of gene or protein identifiers and test for enrichment European Bioinformatics Institute, Cambridge, UK of GO classes against the background of the full set of Full list of author information is available at the end of the article GO annotations. With the advent of small molecule high © 2015 Moreno et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Moreno et al. BMC Bioinformatics (2015) 16:56 Page 2 of 7 throughput data arising from metabolomics – the study rather than being available as a Web utility. The National of the chemical processes involving metabolites by mea- Center for Biomedical Ontology (NCBO) has also devel- suring the presence and quantity of different chemicals oped an ontology-neutral enrichment analysis tool [7], but in biological samples – similar tools are now needed for that is available exclusively by web service, i.e. there is small molecules. no associated graphical user interface, and the output it ChEBI (Chemical Entities of Biological Interest) is a provides is either a tag cloud image (sized for relative fre- manually curated database and ontology that organizes quency) or XML, lacking detail from the source ontology small molecule knowledge [3]. ChEBI offers both an and the statistical significance of the enrichment. ontology of chemical structural classes, such as car- Performing an enrichment analysis with ChEBI has boxylic acid and steroid, and a role ontology in which additional subtleties when compared to the GO, further biological and chemical activity classes are organized, hindering the applicability of generic tools to the chemical such as toxin and solvent. ChEBI also contains fully case. Firstly, ChEBI includes actual small molecules at the specifed small molecules at the leaf level of the struc- leaf level, while the GO does not include genes or proteins. tural hierarchy – i.e. those that might be measured in Thus, ChEBI contains both complex class definitions and metabolomics experiments – such as proline and caf- content that is analogous to GO annotations. Secondly, feine. Molecules are linked to structural classes with the while the GO is available in separate files for its three is_a relation and to functional classes with the has_role main branches, and in slimmed versions that facilitate relation. For example, cefpodoxime (CHEBI:3504) is_a enrichment analysis, the ChEBI ontology is only available carboxylic acid (CHEBI:33575) and has_role antibiotic as a single ontology file that includes the roles and the (CHEBI:22582). ChEBI’s chemical ontology straightfor- structural classification. ChEBI also includes parts that are wardly allows enrichment analysis techniques to be trans- irrelevant for small molecule enrichment analyses, such ferred to the case of small molecules. Here, we present as sub-atomic particles. All of these issues are solved by BiNChE, an enrichment analysis tool for the ChEBI BiNChE. In contrast to BiNChE, the use of generic enrich- ontology. It is available online at http://www.ebi.ac.uk/ ment analysis tools would require manual pre-processing chebi/tools/binche/ and as open source software libraries of the ChEBI ontology to remove irrelevant parts of the (https://github.com/pcm32/BiNChE and https://github. ontology such as small molecules and sub-atomic parti- com/pcm32/BiNCheWeb). cles. Subsequently one would also have to produce a file BiNChE is not the first chemical enrichment analysis which annotates each queried ChEBI entity with its ChEBI service. MSEA [4] was the first small molecule enrich- ontology ancestor terms in order to resemble the use case ment analysis service, but it only used collections of expected by these tools where proteins are given that have small molecules that were considered relevant, rather than annotated Gene Ontology terms. chemical ontologies with their richer structure. MBRole [5] was the first small molecule enrichment analysis ser- Implementation vice to harness ChEBI’s ontology for enrichment. While BiNChE is implemented in Java version 1.6, starting from MBRole now provides a number of different chemical the open source code of BiNGO. The application is knowledge organization schemes for enrichment analy- divided into two projects (with no duplication of code sis, including ChEBI, it displays the results as a plain between them): a core engine, which can be used as an table, losing the richness of the structure in the underly- standalone desktop application or as a Java library for ing ontologies. BiNChE, by contrast, returns graph results integration with other projects; and a web application, and offers a rich visualisation interface. Neither does which displays the enrichment analysis results through an MBRole include weighted analysis, that is, the ability to interactive visualization of the ChEBI ontology. The graph influence the enrichment analysis through the inclusion based visualization relies on CytoscapeWeb [8] for render- of intensity values, fold-changes or p-valuesasweightsfor ing and provides export functions to generate publication the molecules – an option offered by BiNChE. quality images. Results can be also retrieved in tabular and Chemical enrichment analysis is also possible with non- network data formats. dedicated tools. BiNGO [6] is a plugin for Cytoscape that The web-based display of the results (Figure 1) enables is widely used for protein enrichment analysis and gen- the user to modify the resulting enrichment graph erates interactive graph-based outputs which mimic the through manual re-arrangements and built-in layout structure of the input ontology. BiNGO can be used with algorithms. Nodes representing ChEBI entities display ontologies

A Web Tool and Library for Chemical Enrichment Analysis Based on The

The Structure and Antioxidant Properties

S41467-020-16549-2.Pdf

DATABASE Report April 2019 [email protected] TABLE of CONTENTS TABLE of CONTENTS

Using Natural Language Processing Methods to Support Curation of a Chemical Ontology., Doctor of Philosophy, May 2014 DECLARATION

The Experimental Factor Ontology Utilities in Knowledge Discovery and Sharing

SOT 2017 P225 AOP Poster Luettich Et Al Final

Expanding Opportunities for Mining Bioactive Chemistry from Patents, Drug Discov Today: Technol (2015), 10.1016/J.Ddtec.2014.12.001

The Role of Uniprot's Protein Sequence Databases in Biomedical Research

Carbon Dioxide - Wikipedia

Evaluation and Cross-Comparison of Lexical Entities of Biological Interest (Lexebi)

Chemistry Requirements

Extension of Roles in the Chebi Ontology