Challenges for Ontology Repositories and Applications to Biomedicine & Agronomy

Position paper – Keynote SIMBig 2017 – September 2017, Lima, Peru Challenges for ontology repositories and applications to biomedicine & agronomy Clement Jonquet Laboratory of Informatics, Robotics, and Microelectronics of Montpellier (LIRMM), University of Montpellier & CNRS, France & Center for BioMedical Informatics Research (BMIR), Stanford University, USA [email protected] (ORCID: 0000-0002-2404-1582) Abstract 1 Introduction The explosion of the number of ontologies The Semantic Web produces many vocabularies and vocabularies available in the Semantic and ontologies to represent and annotate any kind Web makes ontology libraries and reposi- of data. However, those ontologies are spread out, tories mandatory to find and use them. in different formats, of different size, with differ- Their functionalities span from simple on- ent structures and from overlapping domains. The tology listing with more or less of metada- scientific community has always been interested ta description to portals with advanced ontology-based services: browse, search, vis- in designing common platforms to list and some- ualization, metrics, annotation, etc. Ontol- time host and serve ontologies, align them, and ogy libraries and repositories are usually enable their (re)use (Ding and Fensel, 2001; developed to address certain needs and Hartmann et al., 2009; D’Aquin and Noy, 2012; , communities. BioPortal, the ontology re- 1995). These platforms range from simple ontol- pository built by the US National Center ogy listings or libraries with structured metadata, for Biomedical Ontologies BioPortal relies to advanced repositories (or portals) which fea- on a domain independent technology al- ture a variety of services for multiple types of ready reused in several projects from bio- semantic resources (ontologies, vocabularies, medicine to agronomy and earth sciences. terminologies, taxonomies, thesaurus) such as In this position paper, we describe six high level challenges for ontology repositories: browse/search, visualization, metrics, recommen- metadata & selection, multilingualism, dation, or annotation. In this paper, we will focus alignment, new generic ontology-based on ontology repositories, they allow to address services, annotations & linked data, and important questions: interoperability & scalability. Then, we If you have built an ontology, how do you let present some propositions to address these the world know and share it? challenges and point to our previously How do you connect your ontology to the rest published work and results obtained within of the semantic world? applications –reusing NCBO technology– If you need an ontology, where do you go to to biomedicine and agronomy in the con- get it? text of the NCBO, SIFR and AgroPortal How do you know whether an ontology is any projects. good? Keywords If you have data to index, how do you find the most appropriate ontology for your data? Ontologies, ontology libraries & reposito- If you look for data, how may the semantics ries, ontology metadata, ontology-based of ontologies help you locate them? services, ontology selection, semantic an- More generally, ontology repositories help “ontol- notation, BioPortal. ogy users” to deal with ontologies without managing them or engaging in the complex and long process of developing them. 25 Position paper – Keynote SIMBig 2017 – September 2017, Lima, Peru However, with big number of ontologies new 6. Scalability & interoperability. The com- problems have raised such as describing, select- munity of ontology developers and users is ing, evaluating, trusting, and interconnecting growing both horizontally (i.e., new do- them. From our experience working first on the mains) and vertically (i.e., new adopters in- US National Center for Biomedical Ontologies side a domain). Ontology repositories shall (NBCO) BioPortal, the most widely adopted bio- therefore scale to high number of ontologies, medical ontology repository and later on the SIFR while facilitating their alignments, and when BioPortal, a specific sub-portal to address the multiple repositories are created, they must French biomedical community and AgroPortal, an be interoperable. In the following, we will detail these challenges ontology repository for agronomy, we review and and briefly describe/point to results obtained in discuss six challenges in designing such plat- the context of our multiple ontology repository forms: projects. In some sense, this article is an index of 1. Metadata & selection. Ultimately, ontology repositories are made to share and reuse on- 10-years of published research in the domain of tologies. But which ontology should I reuse? ontology repositories. We do not report hereafter With too many different and overlapping on- all related work for each challenge neither we tologies, properly describing them with claim to have addressed them all. However, we metadata and facilitate their identification believe our results illustrate potential solutions to and selection becomes and important issue. move forward in that domain of research. We believe, as any other data, ontologies must be FAIR. 2 Background 2. Multilingualism. We live in a multilingual world, so are the concepts and entities from 2.1 Ontology libraries & repositories this world. The Semantic Web offers now With the growing number of ontologies devel- tools and standards to develop multilingual oped, ontology libraries and repositories have al- and lexically rich ontologies. Repositories ways been of interest in the Semantic Web com- must be able to deal with multiple languages munity. Ding and Fensel (2001) introduced the also. notion of ontology library and presented a review 3. Ontology alignment. No conceptualization of libraries at that time: is an island. It is now commonly agreed data “A library system that offers various func- interoperability cannot be achieved by means tions for managing, adapting and standardiz- of a single common ontology for a domain, and interlinking ontologies is the way for- ing groups of ontologies. It should fulfill the ward. But the more ontologies are being pro- needs for re-use of ontologies. In this sense, duced, the more the need for ontology an ontology library system should be easily alignment becomes important. accessible and offer efficient support for re- 4. Ontology-based services. On reason to using existing relevant ontologies and stand- adopt Semantic Web standards and use on- ardizing them based on upper-level ontolo- tology repositories is to benefit from multiple gies and ontology representation languages.” services for –and based on– ontologies. No The terms “collection”, “listing” or “registries” one likes to reimplement something already are also used to describe ontology libraries. All existing and that can be generalized to an- correspond to systems that help reuse or find on- other ontology just by dropping it in a reposi- tologies by simply listing them (e.g., DAML or tory. The portfolio of services for ontologies DERI listings) or by offering structured metadata available in repositories should then grows. to describe them (e.g., FAIRSharing, BARTOC). 5. Annotations and linked data. Ontologies But those systems do not support any services be- and vocabularies are the backbone of seman- yond description, especially based on the content tically rich data (Linked Open Data, of the ontologies. knowledge bases, etc.) as they are used to Hartmann et al., (2009) introduced the concept semantically annotate and interlink datasets. of ontology repository, with advanced features It is also important to facilitate semantic in- such as search, browsing, metadata management, dexing, search and data access directly from visualization, personalization, and mappings and the repositories. an application programming interface to query their content/services: 26 Position paper – Keynote SIMBig 2017 – September 2017, Lima, Peru “A structured collection of ontologies (…) applications that pull their data from the foundry, by using an Ontology Metadata Vocabulary. such as the NCBO BioPortal (Noy et al., 2009), References and relations between ontologies OntoBee (Ong et al., 2016), the EBI Ontology and their modules build the semantic model of Lookup Service (Côté et al., 2006) and more re- an ontology repository. Access to resources is cently AberOWL (Hoehndorf et al., 2015). In ad- realized through semantically-enabled interfac- dition, there exist other ontology libraries and re- es applicable for humans and machines. There- pository efforts unrelated to biomedicine, such as fore, a repository provides a formal query lan- the Linked Open Vocabularies (Vandenbussche et guage.” al., 2014), OntoHub (Till et al., 2014), and the By the end of the 2000’s, the topic was of high in- Marine Metadata Initiative’s Ontology Registry terest as illustrated by the 2010 ORES and Repository (Rueda et al., 2009). More recent- workshop (d’Aquin et al., 2010) or the 2008 On- ly, the SIFR BioPortal (Jonquet et al., 2016a) pro- tologySummit.1 The Open Ontology Repository totype was created at University of Montpellier to Initiative (Baclawski and Schneider, 2009) was a build a French Annotator and experiment multi- collaborative effort to develop a federated infra- lingual issues in BioPortal (Jonquet et al., 2015). structure of ontology repositories. At that time, the The same university is also developing AgroPor- effort already reused the NCBO tal, an ontology repository for agronomy and technology (Whetzel and Team, 2013) that was neighboring domains such as food,

Challenges for Ontology Repositories and Applications to Biomedicine & Agronomy

Genome Informatics 4–8 September 2002, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

Towards a Knowledge Graph for Science

Biocuration 2016 - Posters

Gearing up to Handle the Mosaic Nature of Life in the Quest for Orthologs. Kristoffer Forslund

The Evaluation of Ontologies: Editorial Review Vs

Interactive Knowledge Capture in the New Millennium: How the Semantic Web Changed Everything

BEHST: Genomic Set Enrichment Analysis Enhanced Through Integration of Chromatin Long-Range Interactions

Contextual Analysis of Large-Scale Biomedical Associations for the Elucidation and Prioritization of Genes and Their Roles in Complex Disease Jeremy J

Enabling Semantic Queries Across Federated Bioinformatics Databases

A Context Sensitive Model for Querying Linked Scientific Data

Genome Analysis

Democratizing Genome Annotation