OntoConf: Ontology Visualization and Instance Matches Confirmation

by Karol Marian Szabo

in partial fulfillment of the requirements for the degree of Master of Science in Computer Science at Delft University of Technology, to be defended publicly on April 20, 2017

Student number: 4255771
Thesis committee:
Chair: Prof. dr. ir. G.J.P.M. Houben, TU Delft
University supervisor: Dr. ir. A.J.H. Hidders, TU Delft
Company supervisor: Drs. Gert-Jan van Lochem, HintTech B.V., Delft
Committee member: Dr. A.E. Zaidman, TU Delft

Web Information Systems (WIS), Faculty EEMCS, Delft University of Technology, Delft, the Netherlands (wis.ewi.tudelft.nl)
HintTech B.V., Delftechpark 37i, Delft, the Netherlands (www.hinttech.com)

Acknowledgements

[Acknowledgements graphic. Names: Gert-Jan van Lochem, Jan Hidders, Family, Dana Stuparu, Ties Rijcken, Friends. Labels: academic guidance, encouragements, coffee, drinks, $$$]

Abstract

Joining multiple ontologies with the purpose of connecting existing knowledge can be done with the help of ontology matching tools. Part of this process involves establishing links between instances of classes from the paired ontologies. This step, referred to as instance matching, is performed by algorithms that generate the links automatically. Since the suggested links might not be correct, people who are familiar with the involved ontologies can improve the outcome by assessing the generated matches. The aim of this thesis is to investigate how to design a system which facilitates ontology understanding and involves users in the instance matching evaluation process. The tool is meant to be used by domain experts with limited knowledge of Semantic Web technologies. To achieve this, a graph visualization of the ontology was developed and then evaluated during a two-phase iterative process. Later on, an instance match confirmation module was created and coupled with the visualization system.
The user evaluation showed that domain experts are able to perform ontology exploration tasks similar to those supported by existing ontology visualization platforms, and it highlights how people behave when they are exposed to large amounts of data. The evaluation results also support the idea that users can improve the quality of instance matching by successfully using the match confirmation module.

Contents

1. Introduction
    1.1. Research questions
    1.2. Scope
    1.3. Outline
2. Basics of ontologies and Semantic Web
3. Related work
4. System design and implementation
    4.1. Elements which form the ontology visualization
        4.1.1. OWL classes or their instances
        4.1.2. Connecting concepts using SubclassOf relation
    4.2. Visualization layout
    4.3. Technical choices
    4.4. Implementation
        4.4.1. First phase
            4.4.1.1. Overview
            4.4.1.2. Zoom and pan
            4.4.1.3. Search function
            4.4.1.4. Filtering
            4.4.1.5. Details on demand
            4.4.1.6. Relate
        4.4.2. Second phase
            4.4.2.1. List of individuals module
            4.4.2.2. Detailed information module
            4.4.2.3. Vertical navigation toolbar
            4.4.2.4. Search module
            4.4.2.5. Ontology smells detection module
                4.4.2.5.1. Misclassified instances
                4.4.2.5.2. Incorrect similarity links between individuals
5. Evaluation
    5.1. Targeted users
    5.2. Data
    5.3. Methodology
    5.4. Evaluation of the ontology visualization tool
    5.5. Evaluation of the ontology smells module
6. Results
    6.1. First phase
    6.2. Second phase
        6.2.1. Introductory section
        6.2.2. Ontology visualization
        6.2.3. Ontology smells module
7. Conclusions and future work
8. References
9. Appendix

1. Introduction

Semantic Web technologies have started playing an important role in the technology stacks that underpin the information systems of healthcare, IT, legal and publishing companies. Among them is Newz, which was established in 2012 by twelve leading news publishers from the Netherlands. They aim to create a shared repository for storing the high-quality content produced by their professionals while writing newspapers, magazines, books, websites and blogs. Using this platform, the publishers are able to gather the content and sell it to other parties, generally companies. Newz does more than simply store content in a database: it enriches it semantically through an automated process which, based on existing ontologies (DBpedia, GeoNames, Freebase and the Newz ontology itself), creates RDF triples that are assigned to each piece of content. The semantic enrichment is an ongoing process, so the ontology keeps growing. Currently, there are more than 150 million triples in the knowledge base, so having an overview of the current state of the ontology is challenging, especially for non-specialists.
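To make the notion of RDF triples assigned to a piece of content more concrete, the following minimal sketch models a triple as a subject, predicate and object, and lists what is stated about one article. All identifiers (the `ex:` names, the article number, the `describe` helper) are hypothetical and chosen for illustration only; they are not taken from the Newz knowledge base, and a real system would use an RDF store rather than a Python list.

```python
from collections import namedtuple

# A triple is one statement: (subject, predicate, object).
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical identifiers, for illustration only (not real Newz data).
ARTICLE = "ex:article/42"

triples = [
    Triple(ARTICLE, "rdf:type", "ex:NewsArticle"),
    Triple(ARTICLE, "ex:mentions", "dbpedia:Delft"),
    Triple("dbpedia:Delft", "rdf:type", "ex:City"),
]


def describe(subject, store):
    """Return every (predicate, object) pair stated about `subject`."""
    return [(t.predicate, t.object) for t in store if t.subject == subject]


print(describe(ARTICLE, triples))
```

Each enrichment step would add further statements of this shape, which is why the number of triples grows together with the amount of stored content.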
It is worth noting that using multiple ontologies together is not specific to Newz; it is desired behaviour in the move towards linked data. To support this, researchers have investigated options for matching ontologies and developed tools for performing this task. In general, the ontology matching process takes place in two stages: schema mapping and individual matching. The first stage happens at schema level, when ontology experts together with domain experts define the correspondences between classes from each ontology. After this step, it is desirable to run entity matching algorithms to establish correspondences between the individuals (instances of classes) which refer to the same real-life concepts. Fortunately, these procedures are supported by automatic or semi-automatic tools which ease the work. However, these tools are not completely accurate, so errors have to be corrected manually. A common feature of individual matching algorithms is that they provide, for each identified match, a confidence coefficient (usually between 0 and 1, or 1-100%) which can be a good indicator of whether human intervention is needed. Because these kinds of errors are related to the knowledge base, they must be evaluated and corrected by domain experts and not by the ontology engineers who design and perform the matching process. Domain experts can also play an important role in improving the quality of ontologies by correcting other types of errors, such as duplicates, missing entities and wrong entity types. These kinds of errors show up when semantic enrichment is done automatically, and they are not specific to news processing.

Since these issues often occur in systems which use ontologies, many graphical tools have been developed to facilitate ontology management.
Some of them are focused only on ontology experts, but there are also tools designed for domain experts with basic or no knowledge of Semantic Web technologies, who will be referred to as “non-expert” users from now on. Another way of browsing ontologies is to use the SPARQL query language to extract the desired data. This approach is more suitable for ontology experts, while graphical interfaces can be easier for non-experts to use. A common trait of most ontology visualization tools for non-experts is that they try to be exhaustive in displaying information about the structure of the ontology, even though their target users are interested in the knowledge base itself rather than in structure-related information. Considering this, we believe that the users’ needs could be better fulfilled by building a visualization which has the knowledge base as its starting point, with less emphasis on the way it is structured.

1.1. Research questions

Research question 1: How to design a visualization tool for ontologies that effectively supports non-expert users in quickly understanding the main areas covered by the ontology and that supports knowledge base exploration?

The ontology visualization tool should show a graphical representation of the ontology at different levels of detail, characterized by the amount of information presented to the users. At a higher level, the tool should display the most representative entities of the ontology and the links between them, whereas at a more detailed level, granular information should be shown. An interesting challenge in this context is to determine what is representative for the ontology and to obtain this data in a short amount
