Semantic Framework for Neurodegeneration Research Documented”
Total Page:16
File Type:pdf, Size:1020Kb
VirtualBrainCloud Personalized Recommendations for Neurodegenerative Disease www.VirtualBrainCloud-2020.eu Public Deliverable Report M18 deliverable D3.13 (formerly D3.5.1): “Complete, updated semantic framework for neurodegeneration research documented”. Date May 2020 Stephan Gebel, Sumit Madan, Stephan Springstubbe (Fraunhofer) Authors Petra Ritter (CHARITE) © VirtualBrainCloud consortium Dissemination level: public Website www.VirtualBrainCloud-2020.eu This project has received funding from the European Union´s Horizon 2020 research and innovation programme under grant agreement No 826421 Table of content 1. Preface ............................................................................................................................................. 3 2. Introduction ..................................................................................................................................... 4 3. Referential Ontology Hub for Applications within Neurosciences (ROHAN) .................................. 5 Description of ROHAN service ................................................................................................. 5 Clinical Trial Ontology .............................................................................................................. 7 Excerpt of ontology curation guideline ................................................................................... 9 Harmonization of Alzheimer’s disease clinical trial variables with extCTO ........................... 10 4. SCAIView-Neuro, a Semantic Search Engine ................................................................................. 10 5. NeuroImaging Feature Terminology (NIFT) ................................................................................... 14 6. NeuroImaging Data Model (NIDM) ............................................................................................... 15 7. Brain Imaging Data Structure (BIDS) ............................................................................................. 17 8. Minimal Information Neuroscience Data Set (MINDS) ................................................................. 18 9. Conclusions .................................................................................................................................... 19 10. Glossary ..................................................................................................................................... 20 2 of 21 © VirtualBrainCloud | public report 1. Preface A key objective of Task 3.4 “Workflows for clinical data curation and processing” and Task 3.5 “A metadata framework for unified metadata annotations and Data Catalogues” within the wider range of the Work Package 3 “Data Processing, Standardization and Data-FAIRification” is to deliver a semantic framework for neurodegenerative diseases, that serves as a central resource for controlled vocabularies and shared ontologies to access and use them within different TVB-Cloud work packages. Within TVB-Cloud two systems will be implemented and made available for the consortium members: • The Ontology Lookup Service (OLS) from Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), and • the CTS2 system from University of Genova (UNIGE). Figure 1: Schema of the Sematic Framework The two deliverables share the same premise, but one was mainly developed by Fraunhofer SCAI and the other by the University of Genoa. With these two deliverables two demonstrators of the websites are made available at the following addresses: • OLS, the ontology lookup service software is being used to host the Referential Ontology Hub for Applications within Neurosciences (ROHAN), which is accessible at https://rohan.scai.fraunhofer.de/ols/index. • CTS2 compliant interface: The read and query part are open to everyone, while the access to the maintenance part is limited by a login. This access can be given to any member that wants to make a trial. The address is: http://www.medinfo.dibris.unige.it/VBC_CTS2 (for more information see delivery D3.14: “Integrated, open source terminology versioning and management tool CTS2 compliant application shared with partners”). 3 of 21 © VirtualBrainCloud | public report 2. Introduction With the TheVirtualBrain Cloud(TVB-Cloud) project, we are building a reference infrastructure for sharing and processing Health and Biomedical research data, specifically data in the domain of neurodegenerative disease (NDD). To enable users to use the data stored in TVB-Cloud platform, the data need to be annotated – that is they receive labels with a certain meaning. The labels are the metadata. Metadata is the data that provides information about the data contained in the infrastructure. Metadata contain the information that can be searched and queried by users to find out what data are available, how they can be accessed and what data formats can be expected. A metadata standard provides a common understanding of the meaning or semantics of the data, to ensure correct and proper use and interpretation of the data. Metadata is "data about data." Many distinct types of metadata exist and there are specialized and well-accepted models to specify types of metadata. E.g. one differentiates between descriptive, structural, and administrative metadata – but also reference metadata and statistical metadata. Standardized concepts to assemble metadata – so called metadata schemes comprise: metadata standards and metadata models. Metadata standard refers to a common understanding of the meaning or semantics of the data. The metadata model defines the relations between classes of meta data. Terminology refers to a group of specialized words or meanings relating to a particular domain, e.g. NDD. Ontologies define formal naming and definition of classes and the relations between the classes for a given domain. For the type of data that is collected in TVB-Cloud, i.e. domain specific health and biomedical research data those standards for metadata and metadata models do not exist to the full extent that would be required. Tools such as controlled vocabularies, taxonomies, thesauri, data dictionaries, and metadata registries can be used to apply standardization to the metadata. So far, such controlled vocabularies in the domain of brain imaging and neurodegenerative disease are incomplete and therefore, as part of TVB-Cloud we are advancing existing terminologies and ontologies. The major effort of this WP is related to the updating of the ontologies that will be used in semantic framework within TVB-Cloud. These ontologies comprise the majority of relevant entities and their relationships that are required to describe and harmonize data and knowledge about neurodegenerative diseases. The newly developed or revised ontologies are included in an ontology store that provides end-users the browsing and searching ability. Concepts and terms from the integrated ontologies will be used for metadata annotation and mapping of variables thereby enabling interoperability of data from clinical trials (see Task 4.1 “Characterization of clinical data sources for modelling”). In addition, a dedicated literature mining environment for brain research is part of the semantic framework. 4 of 21 © VirtualBrainCloud | public report 3. Referential Ontology Hub for Applications within Neurosciences (ROHAN) Description of ROHAN service The ROHAN service is based on the software ‘Ontology Lookup Service’1 (OLS) that is developed by European Bioinformatics Institute (EBI) for exploring ontologies. This hub is a repository for biomedical resources that aims to provide a single point of access to the latest ontology and terminology versions for translational neurodegeneration disease research. ROHAN is developed in the context of Medical Informatics Platform of Human Brain Project and is being extended in TVB-Cloud (Figure 2). The service provides a web-based user interface for exploring and visualizing the ontologies (Figure 3), and additionally, a flexible RESTful API to access the resources programmatically. It also provides an utility that allows to regularly update ontologies with ease. Furthermore, the service includes a search index for terms and synonyms with autocomplete functionality (Figure 4). To manage ontologies, the service also uses a flexible configuration system. The ROHAN service is available at http://rohan.scai.fraunhofer.de/ols. Figure 2: Referential Ontology Hub for Applications within Neurosciences (ROHAN) 1 Source code is available at https://github.com/EBISPOT/OLS 5 of 21 © VirtualBrainCloud | public report Figure 3 Screenshot of a concept page from OLS Clinical Trials ontology Figure 4: Screenshot of the autocomplete functionality of ROHAN By May 20, 2020 the repository contains 14 terminologies and ontologies. Figure 5 shows a screenshot of the resources page of ROHAN service that lists the ontologies. These ontologies comprise the majority of relevant entities and their relationships that are required to describe and harmonize data and knowledge about neurodegenerative diseases. Updating and revision of the ontologies is an ongoing process within WP3 and WP5. In the next section, we describe the updating of the Clinical Trial Ontology. Other ontologies will be updated accordingly, if necessary. 6 of 21 © VirtualBrainCloud | public report Figure 5: Screenshot of terminologies and ontologies integrated into the ontology hub2. Clinical Trial Ontology To model the field of clinical trials, the Clinical Trial Ontology for Neurodegeneration Diseases