Ontology Evaluation
Dissertation submitted to the Faculty of Economics (Fakultät für Wirtschaftswissenschaften) of the Karlsruhe Institute of Technology (KIT) for the academic degree of Doctor of Economics (Dr. rer. pol.) by Dipl.-Inform. Zdenko Vrandečić.

Ontology Evaluation
Denny Vrandečić

Date of oral examination: 8 June 2010
Referee: Prof. Dr. Rudi Studer
First co-referee: Prof. James A. Hendler, PhD.
Second co-referee: Prof. Dr. Christof Weinhardt
Chair of the examination committee: Prof. Dr. Ute Werner

This document was created on June 10, 2010

To those who allowed me to dream, who made it possible for me to set out, who believed that I would arrive: to you.

Acknowledgements

Why does it take years to write a Ph.D. thesis? Because of all the people who were at my side.

Tonči Vrandečić, my father, who, through the stories told about him, made me dream that anything can be achieved.
Perica Vrandečić, my mother, who showed me the truth of this dream.
Rudi Studer, my Doktorvater, who is simply the best supervisor one can hope for.
Martina Nöth, for carefully watching that I actually write.
Elena Simperl, for giving the most valuable advice and for kicking ass.
Anupriya Ankolekar, for reminding me that it is just a thesis.
Uta Lösch, for significant help.
Miriam Fernandez, for translating obscure Spanish texts.
Markus Krötzsch, for an intractable amount of reasons.
York Sure, for being a mentor who never restrained, but always helped.
Sebastian Rudolph and Aldo Gangemi, for conceptualizing my conceptualization of conceptualizations.
Stephan Grimm, for his autoepistemic understanding.
Yaron Koren, for his support for SMW while Markus and I spent more time on our theses.
Christoph Tempich, Peter Haase, Jim Hendler, Pascal Hitzler, Frank van Harmelen, Enrico Motta, Sofia Pinto, Marta Sabou, and Gus Schreiber, for their advice.
All the people in the Rudiverse, past and present, for making me feel more like being with friends than being at work.
I especially want to thank those who became friends over the years and will remain so after our time in Semantic Karlsruhe ends.
All my co-authors: a thousand thanks!
All the people I worked with in the research projects, and the co-organizers of events, who helped sharpen the ideas presented here.
The European Union for the projects SEKT and ACTIVE, and for creating a new Europe. Vulcan Inc. for Project Halo. They funded my time in Karlsruhe.
All those giants whose shoulders I am balancing on, for carrying my weight, which surely is not the easiest of all tasks.
All the anonymous reviewers, whose sharp comments on our papers led to improvements of the presented work.
My sister and my other good friends, who inspired me, either with their work, their ideas, their discussions, their presence, or most importantly with their friendship and love, giving me encouragement and reality checks whenever I needed them most.

Because of them, it took years to write this thesis. Without them, it would have taken decades, if not forever.

Abstract

Ontologies are a pillar of the emerging Semantic Web. They capture background knowledge by providing relevant terms and the formal relations between them, so that they can be used in a machine-processable way, and thus enable automatic aggregation and the proactive use and serendipitous reuse of distributed data sources. Ontologies on the Semantic Web will come from a vast variety of different sources, spanning institutions and persons aiming for different goals and quality criteria.

Ontology evaluation is the task of measuring the quality of an ontology. It enables us to answer the following main question: How to assess the quality of an ontology for the Web?

Ontology evaluation is essential for a wide adoption of ontologies, both in the Semantic Web and in other semantically enabled technologies.
We regard the following three scenarios as relevant for ontology evaluation:

• Mistakes and omissions in ontologies can prevent applications from achieving the full potential of exchanged data. Good ontologies lead directly to a higher degree of reuse of data and better cooperation across the boundaries of applications and domains.
• People constructing an ontology need a way to evaluate their results and possibly to guide the construction process and any refinement steps. This will make ontology engineers feel more confident about their results, and thus encourage them to share their results with the community and reuse the work of others for their own purposes.
• Local changes in collaborative ontology engineering may affect the work of others. Ontology evaluation technologies make it possible to check automatically whether constraints and requirements are fulfilled, in order to reveal plausibility problems automatically, and thus to decrease the maintenance costs of such ontologies dramatically.

In this thesis a theoretical framework and several methods breathing life into the framework are presented. The application to the above scenarios is explored, and the theoretical foundations are thoroughly grounded in the practical usage of the emerging Semantic Web. We implemented and evaluated a number of the methods. The results of these evaluations are presented, indicating the usefulness of the overall framework.
Short Table of Contents

Acknowledgements
Abstract

I Foundations
1 Introduction
2 Terminology and Preliminaries
3 Framework

II Aspects
4 Vocabulary
5 Syntax
6 Structure
7 Semantics
8 Representation
9 Context

III Application
10 Collaborative ontology evaluation in Semantic MediaWiki
11 Related work
12 Conclusions

IV Appendix
List of Methods
List of Tables
List of Figures
Bibliography
Full Table of Contents

Part I Foundations

Chapter 1 Introduction

What I mean (and everybody else means) by the word 'quality' cannot be broken down into subjects and predicates. This is not because Quality is so mysterious but because Quality is so simple, immediate and direct.
(Robert M. Pirsig, b. 1928, Zen and the Art of Motorcycle Maintenance (Pirsig, 1984))

The Semantic Web (Berners-Lee et al., 2001), also known as the Web of Data, is an extension of the hypertext Web. It enables the exchange and integration of data over the Web, in order to achieve the cooperation of humans and machines on a novel, world-wide scale.

Ontologies are used in order to specify the knowledge that is exchanged and shared between the different systems, and within the systems by the various components. Ontologies define the formal semantics of the terms used for describing data, and the relations between these terms. They provide an "explicit specification of a conceptualization" (Gruber, 1995). Ontologies ensure that the meaning of the data that is exchanged between and within systems is consistent and shared, both by computers (expressed by formal models) and humans (as given by their conceptualization). Ontologies enable all participants 'to speak a common language'.

Ontologies, like all engineering artifacts, need a thorough evaluation.
But the evaluation of ontologies poses a number of unique challenges: due to the declarative nature of ontologies, developers cannot just compile and run them like most other software artifacts. They are data that has to be shared between different components and used for potentially different tasks. Within the context of the Semantic Web, ontologies may often be used in ways not expected by the original creators of the ontology. Ontologies rather enable a serendipitous reuse and integration of heterogeneous data sources. Such goals are difficult to test in advance.

This thesis discusses the evaluation of Web ontologies, i.e. ontologies specified in one of the standard Web ontology languages (RDF(S) (Klyne and Carroll, 2004) and the different flavors of OWL (Smith et al., 2004; Grau et al., 2008)) and published on the Web, so that they can be used and extended in ways not expected by the creators of the ontology, outside of a central control mechanism. Some of the results of this thesis will also apply to other ontology languages, and to ontologies within a closed environment. In turn, many problems discussed in earlier work on ontology evaluation do not apply in the context of Web ontologies: since the properties of the ontology language with regard to monotonicity, expressivity, and other features are known, they no longer need to be evaluated for each ontology.

This thesis will focus on domain- and task-independent automatic evaluations. That does not mean that the ontology has to be domain-independent or generic, but rather that the evaluation method itself is. We will discuss other types of evaluations in Chapter 11.

This chapter contains introductory material by providing the motivation for this thesis (Section 1.1), giving a short preview of the contributions (Section 1.2), and offering a readers' guide for the rest of the thesis (Section 1.3). It closes with an overview of related previous work by the author (Section 1.4).
1.1 Motivation

Ontologies play a central role in the emerging Semantic Web. They capture background knowledge by providing relevant concepts and the relations between them. Their role is to provide formal semantics to terms, so that they can be used in a machine-processable way. Ontologies allow us to share and formalize conceptualizations, and thus to enable humans and machines to readily understand the meaning of data that is being exchanged. This enables the automatic aggregation and the proactive use and serendipitous reuse of distributed data sources, thus creating an environment where agents and applications can cooperate for the benefit of the user on a hitherto unexperienced level (Berners-Lee et al., 2001).

This section provides three preliminary arguments for the importance of ontology evaluation.