Semantic Data Integration and Search
Semantic Data Integration
Prof. Dr. Taysir H. Soliman and Marwa H. Abdel Reheim
Information Systems Department, Faculty of Computers and Information
Assiut University, Egypt
BioDialog Team
BioDialog Summer School, Hurghada 2017

Finding Relevant Data … A Big Problem
Text Data, Image Data

In 2007 Jim Gray preached about the effects of the Data Deluge in the sciences (Hey, Tansley, and Tolle 2009). Whereas experimental and theoretical paradigms originally led science, some natural phenomena were not easily addressed by analytical models. In this scenario, computational simulation arose as a new paradigm enabling scientists to deal with these complex phenomena. Simulations produced increasing amounts of data, as did advanced exploration instruments (large-scale telescopes, particle colliders, etc.). Scientists were then no longer interacting directly with the phenomena, but used powerful computational configurations to analyze the data gathered from simulations or captured by instruments. Sky maps built from the Sloan Digital Sky Survey observations and the evidence found for the Higgs boson are just two success stories of yet another paradigm, which Gray called the fourth paradigm: eScience.

A Lot of Heterogeneous Data Everywhere?
Excel Sheets, Public Databases, Signal Data

Example 1
• Find data relevant to a change of temperature affecting a kind of agriculture

Example 2
• Find data relevant to a publication in Excel sheets
Air Temperature?

Example 3
• Find relevant data in related web sites, e.g.
GBIF: “Ammannia auriculata” …

Data from GBIF (.csv data)

phylum        class            order           family           genus       species
Tracheophyta  Magnoliopsida    Myrtales        Lythraceae       Ammannia    Ammannia auriculata
Tracheophyta  Magnoliopsida    Myrtales        Lythraceae       Ammannia    Ammannia auriculata
Tracheophyta  Magnoliopsida    Ericales        Primulaceae      Anagallis   Anagallis arvensis
Tracheophyta  Magnoliopsida    Boraginales     Boraginaceae     Arnebia     Arnebia hispidissima
Tracheophyta  Magnoliopsida    Fabales         Fabaceae         Astragalus  Astragalus sieberi
Tracheophyta  Magnoliopsida    Fabales         Fabaceae         Astragalus  Astragalus sieberi
Ascomycota    Lecanoromycetes  Teloschistales  Teloschistaceae  Caloplaca   Caloplaca erythrina
Tracheophyta  Magnoliopsida    Cucurbitales    Cucurbitaceae    Cucurbita   Cucurbita maxima
Tracheophyta  Magnoliopsida    Asterales       Asteraceae       Centaurea   Centaurea scoparia

Two oleananes from Ammannia auriculata Willd.
Gohar AA, Maatooq GT, Mrawan EM, Zaki AA, Takaya Y.

Abstract
Two new compounds: 3-β,15-α,23,28-tetrahydroxyolean-12-en-3-O-arabinopyranoside and 3-β,23,28-trihydroxyolean-12-en-3-O-β-D-glucopyranoside were isolated from the aerial parts of Ammannia auriculata along with the known compounds kaempferol, β-sitosterol-3-O-β-D-glucoside, 2-α,3-β,23-trihydroxyolean-12-en-28-oic acid-28-O-β-D-glucopyranoside, quercetin, kaempferol-3-O-α-L-arabinofuranoside, kaempferol-3-O-β-D-xylopyranoside and ellagic acid. Structures of these compounds were elucidated on the basis of their spectroscopic data (NMR, UV, MS and IR spectra). The antioxidant activities of the total extract, the fractions CH2Cl2, EtOAc and the remaining aqueous together with the compounds 1, 6 and 9 were comparable with that of the standard antioxidant, ascorbic acid.
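
As a rough sketch of what connecting these two sources involves, the snippet below loads a few of the GBIF rows shown above as CSV text and checks which species names occur in the title of the publication. The column names follow the slide; the helper names (load_rows, species_mentioned) and the plain case-insensitive substring match are assumptions made for illustration, not part of the tutorial.

```python
import csv
import io

# A few of the GBIF rows from the slide, written out as CSV text.
GBIF_CSV = """phylum,class,order,family,genus,species
Tracheophyta,Magnoliopsida,Myrtales,Lythraceae,Ammannia,Ammannia auriculata
Tracheophyta,Magnoliopsida,Myrtales,Lythraceae,Ammannia,Ammannia auriculata
Tracheophyta,Magnoliopsida,Ericales,Primulaceae,Anagallis,Anagallis arvensis
Tracheophyta,Magnoliopsida,Fabales,Fabaceae,Astragalus,Astragalus sieberi
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def species_mentioned(rows, text):
    """Return the distinct species names from rows that occur in text (hypothetical helper)."""
    found = []
    lowered = text.lower()
    for row in rows:
        name = row["species"]
        # Case-insensitive substring match; duplicates in the CSV are reported once.
        if name.lower() in lowered and name not in found:
            found.append(name)
    return found


rows = load_rows(GBIF_CSV)
title = "Two oleananes from Ammannia auriculata Willd."
print(species_mentioned(rows, title))  # ['Ammannia auriculata']
```

Exact string matching of this kind breaks as soon as a name is spelled or abbreviated differently in one source, which is one motivation for the shared vocabularies discussed in the rest of the deck.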
Outline
• Semantic Data Integration
• Semantic Web
• Ontologies
• Semantic Data Annotation
• Hands-On Tutorial

Ammannia auriculata
The question is: do we need more integration? What data do we need to integrate, and how?
More Examples in the tutorial

Semantic Annotation Example

Time for Hands-On Tutorial

Semantic Data Integration
Marwa Hussein (Hands-on Tutorial)

Outline
• Introduction
• NCBO BioPortal
• Protégé
• Semantic Annotations
• RightField

Introduction
• “An ontology is an explicit specification of some topic”.
• A formal vocabulary of terms and the relationships among them, for representing and communicating knowledge about some topic.
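
To make this definition concrete, here is a minimal sketch of a tiny ontology stored as (subject, relation, object) triples, using the animal and plant terms that appear in the next slides. The endpoints of the is_a and eats links are an assumption, since the diagram is not recoverable from the extracted text, and the helper functions are hypothetical.

```python
# Each statement is a (subject, relation, object) triple.
# Assumption: which term each link connects is guessed for illustration.
TRIPLES = [
    ("Carnivore", "is_a", "Animal"),
    ("Herbivore", "is_a", "Animal"),
    ("Carnivore", "eats", "Animal"),
    ("Herbivore", "eats", "Plant"),
    ("Lion", "is_a", "Carnivore"),      # individual
    ("Antelope", "is_a", "Herbivore"),  # individual
]


def ancestors(term, triples):
    """Follow is_a links transitively and return every more general term."""
    result = []
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in triples:
            if subject == current and relation == "is_a" and obj not in result:
                result.append(obj)
                frontier.append(obj)
    return result


def eats(term, triples):
    """Return what a term eats, including what it inherits via is_a."""
    kinds = [term] + ancestors(term, triples)
    return sorted({o for s, r, o in triples if r == "eats" and s in kinds})


print(ancestors("Lion", TRIPLES))  # ['Carnivore', 'Animal']
print(eats("Antelope", TRIPLES))   # ['Plant']
```

The point of the explicit relationships is visible in the second call: nothing states directly what an antelope eats, but the answer follows from the class it belongs to.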
Introduction
• Classes
Diagram: Animal, Plant, Carnivore, Herbivore

Introduction
• Classes + Object Properties
Diagram: Animal, Plant, Carnivore, Herbivore, connected by links labeled is_a and eats

Introduction
• Classes + Object Properties + Individuals
Diagram: as above, with the individuals Lion and Antelope each attached by an is_a link

Introduction
• An ontology together with some individuals is considered a knowledge base.

Outline
• Introduction
• NCBO BioPortal
• Protégé
• Semantic Annotations
• RightField

NCBO BioPortal
• An open repository of biomedical ontologies.
• Ontologies are in different representation formats (e.g. OWL, OBO, UMLS).
• Provides a wide range of tools:
  – Via the BioPortal web site, or the BioPortal web API.
• BioPortal also includes community features for adding notes, reviews, and even mappings to specific ontologies.

BioPortal: Tools
• BioPortal contains some tools to:
  – Browse ontologies
  – Search terms
  – Browse mappings
  – Recommend ontologies

BioPortal: Ontology Browser
Ontology name

BioPortal: Term Searcher
Ontologies containing the concept “melanoma”

BioPortal: Mappings Browser
Ontologies and their number of mappings with the ENVO ontology

BioPortal: Ontology Recommender
Selected ontologies and their scores for each selection criterion

The Environment Ontology (ENVO)
• ENVO is composed of classes (terms) referring to key environment types that may be used to facilitate the retrieval and integration of a broad range of biological data.

ENVO Ontology
https://bioportal.bioontology.org/ontologies/ENVO

Outline
• Introduction
• NCBO BioPortal
• Protégé
• Semantic Annotations
• RightField

Protégé
• http://protege.stanford.edu/
• A free, open-source platform that provides its user community with a suite of tools to construct domain models and knowledge-based applications with ontologies.
Protégé: Biodiversity Ontology (BOF)
Class hierarchy; description of each class

Protégé: Biodiversity Ontology (BOF)
Object Properties hierarchy; description of each property

Protégé: Biodiversity Ontology (BOF)
Data Properties hierarchy; description of each property

Protégé: Biodiversity Ontology (BOF)
OntoGraf: to visualize the ontology

Outline
• Introduction
• NCBO BioPortal
• Protégé
• Semantic Annotations
• RightField

Semantic Annotations
• To annotate is to attach data to some other piece of data.

Semantic Annotation: An Example
“Aristotle, the author of Politics, established the Lyceum”

Semantic Annotation: An Example
• “Aristotle, the author of Politics, established the Lyceum”
• To semantically annotate this sentence:
  1. Analyze the text and identify the concepts:
     • Aristotle as a Person
     • Politics as a written work of political philosophy
  2. Classify and interlink the identified concepts in a semantic graph database.
     • e.g., Aristotle can be linked to his date of birth, his students, his works.
     • Politics can be linked to its subject, its date of creation, etc.

Semantic Annotation: An Example
• Algorithms will be able to automatically: