Dissertation submitted to the Faculty of Economics (Fakultät für Wirtschaftswissenschaften) of the Karlsruhe Institute of Technology (KIT) for the academic degree of Doctor of Economics (Dr. rer. pol.) by Dipl.-Inform. Zdenko Vrandečić.

Ontology Evaluation

Denny Vrandečić

Date of the oral examination: 8 June 2010
Referee: Prof. Dr. Rudi Studer
First co-referee: Prof. James A. Hendler, PhD.
Second co-referee: Prof. Dr. Christof Weinhardt
Chair of the examination committee: Prof. Dr. Ute Werner

This document was created on June 10, 2010

To those who allowed me to dream, who made it possible for me to set out, who believed that I would arrive: to you.

Acknowledgements

Why does it take years to write a Ph.D.-Thesis?

Because of all the people who were at my side.

My father, Tonči Vrandečić, who, through the stories told about him, made me dream that anything can be achieved.

My mother, Perica Vrandečić, who showed me the truth of this dream.

Rudi Studer, my Doktorvater, who is simply the best supervisor one can hope for.

Martina Nöth, for carefully watching that I actually write.

Elena Simperl, for giving the most valuable advises and for kicking ass.

Anupriya Ankolekar, for reminding me that it is just a [...].

Uta Lösch, for significant help.

Miriam Fernandez, for translating obscure Spanish texts.

York Sure, for being a mentor who never restrained, but always helped.

Markus Krötzsch and Sebastian Rudolph, for an intractable amount of reasons.

Aldo Gangemi, for conceptualizing my conceptualization of conceptualizations.

Stephan Grimm, for his autoepistemic understanding.

Yaron Koren, for his support for SMW while Markus and I spent more time on our theses.

Christoph Tempich, Peter Haase, Jim Hendler, Pascal Hitzler, Frank van Harmelen, Enrico Motta, Sofia Pinto, Marta Sabou, and Gus Schreiber, for their advise.

All the people in the Rudiverse, past and present, for making me feel more like being with friends than being at work. Especially I want to thank those who became friends over the years, and will remain so after our time in Semantic Karlsruhe ends.

All my co-authors – a thousand thanks!

All the people I worked with in the projects, and the co-organizers of events that helped sharpening the ideas presented here.

The European Union, for the research projects SEKT and ACTIVE, and for creating a new Europe.

Vulcan Inc., for Project Halo. They funded my time in Karlsruhe.

All those giants on which shoulders I am balancing, for carrying my weight, which surely is not the easiest of all tasks.

All the anonymous reviewers, who made comments to our papers that lead to improvements of the presented work.

My sister and my other good friends, who inspired me, either with their work, their ideas, their sharp discussions, their presence, or most importantly with their friendship and love, giving me encouragement and reality checks whenever I needed them most.

Because of them, it took years to write this thesis. Without them, it would have taken decades – if not forever.


Abstract

Ontologies are a pillar of the emerging Semantic Web. They capture background knowledge by providing relevant terms and the formal relations between them, so that they can be used in a machine-processable way, and thus enable automatic aggregation and the proactive use and serendipitous reuse of distributed data sources. Ontologies on the Semantic Web will come from a vast variety of different sources, spanning institutions and persons aiming for different goals and quality criteria.

Ontology evaluation is the task of measuring the quality of an ontology. It enables us to answer the following main question:

How to assess the quality of an ontology for the Web?

Ontology evaluation is essential for a wide adoption of ontologies, both in the Semantic Web and in other semantically enabled technologies. We regard the following three scenarios as relevant for ontology evaluation:

• Mistakes and omissions in ontologies can lead to the inability of applications to achieve the full potential of exchanged data. Good ontologies lead directly to a higher degree of reuse of data and a better cooperation over the boundaries of applications and domains.

• People constructing an ontology need a way to evaluate their results and possibly to guide the construction process and any refinement steps. This will make the ontology engineers feel more confident about their results, and thus encourage them to share their results with the community and reuse the work of others for their own purposes.

• Local changes in collaborative ontology engineering may affect the work of others. Ontology evaluation technologies allow us to automatically check if constraints and requirements are fulfilled, in order to automatically reveal plausibility problems, and thus to decrease maintenance costs of such ontologies dramatically.

In this thesis a theoretical framework and several methods breathing life into the framework are presented. The application to the above scenarios is explored, and the theoretical foundations are thoroughly grounded in the practical usage of the emerging Semantic Web. We implemented and evaluated a number of the methods. The results of these evaluations are presented, indicating the usefulness of the overall framework.


Short Table of Contents

Acknowledgements
Abstract

I Foundations
1 Introduction
2 Terminology and Preliminaries
3 Framework

II Aspects
4 Vocabulary
5 Syntax
6 Structure
7 Semantics
8 Representation
9 Context

III Application
10 Collaborative ontology evaluation in Semantic MediaWiki
11 Related work
12 Conclusions

IV Appendix
List of Methods
List of Tables
List of Figures
Bibliography
Full Table of Contents


Part I

Foundations

1 Introduction

2 Terminology and Preliminaries

3 Framework

Chapter 1

Introduction

What I mean (and everybody else means) by the word 'quality' cannot be broken down into subjects and predicates. This is not because Quality is so mysterious but because Quality is so simple, immediate and direct.
(Robert M. Pirsig, b. 1928, Zen and the Art of Motorcycle Maintenance (Pirsig, 1984))

The Semantic Web (Berners-Lee et al., 2001), also known as the Web of Data, is an extension of the hypertext Web. It enables the exchange and integration of data over the Web, in order to achieve the cooperation of humans and machines on a novel, world-wide scale.

Ontologies are used in order to specify the knowledge that is exchanged and shared between the different systems, and within the systems by the various components. Ontologies define the formal semantics of the terms used for describing data, and the relations between these terms. They provide an "explicit specification of a conceptualization" (Gruber, 1995). Ontologies ensure that the meaning of the data that is exchanged between and within systems is consistent and shared – both by computers (expressed by formal models) and humans (as given by their conceptualization). Ontologies enable all participants 'to speak a common language'.

Ontologies, like all engineering artifacts, need a thorough evaluation. But the evaluation of ontologies poses a number of unique challenges: due to the declarative

nature of ontologies, developers cannot just compile and run them like most other software artifacts. They are data that has to be shared between different components and used for potentially different tasks. Within the context of the Semantic Web, ontologies may often be used in ways not expected by the original creators of the ontology. Ontologies rather enable a serendipitous reuse and integration of heterogeneous data sources. Such goals are difficult to test in advance.

This thesis discusses the evaluation of Web ontologies, i.e. ontologies specified in one of the standard Web ontology languages (RDF(S) (Klyne and Carroll, 2004) and the different flavors of OWL (Smith et al., 2004; Grau et al., 2008)) and published on the Web, so that they can be used and extended in ways not expected by the creators of the ontology, outside of a central control mechanism. Some of the results of this thesis will also apply to other ontology languages, and also to ontologies within a closed environment. In turn, many problems discussed in earlier work on ontology evaluation do not apply in the context of Web ontologies: since the properties of the ontology language with regard to monotonicity, expressivity, and other features are known, they need not be evaluated for each ontology anymore.

This thesis will focus on domain- and task-independent automatic evaluations. That does not mean that the ontology has to be domain-independent or generic, but rather that the evaluation method itself is. We will discuss other types of evaluations in Chapter 11.

This chapter contains introductory material by providing the motivation for this thesis (Section 1.1), giving a short preview of the contributions (Section 1.2), and offering a readers' guide for the rest of the thesis (Section 1.3). It closes with an overview of related previous work by the author (Section 1.4).

1.1 Motivation

Ontologies play a central role in the emerging Semantic Web. They capture background knowledge by providing relevant concepts and the relations between them. Their role is to provide formal semantics to terms, so that they can be used in a machine-processable way. Ontologies allow us to share and formalize conceptualizations, and thus to enable humans and machines to readily understand the meaning of data that is being exchanged. This enables the automatic aggregation and the proactive use and serendipitous reuse of distributed data sources, thus creating an environment where agents and applications can cooperate for the benefit of the user on a hitherto unexperienced level (Berners-Lee et al., 2001).

This section provides three preliminary arguments for the importance of ontology evaluation. The arguments are about (i) the advantages of better ontologies (Section 1.1.1), (ii) increasing the availability and thus reusability of ontologies (Section 1.1.2), and (iii) the lower maintenance costs of collaboratively created knowledge bases (Section 1.1.3).

1.1.1 Advantages of better ontologies

Ontologies are engineering artifacts, and as such they need to be evaluated like all other engineering artifacts. The central role of ontologies in the Semantic Web makes ontology evaluation an important and worthwhile task: mistakes and omissions in ontologies can lead to applications not realizing the full potential of exchanging data. Good ontologies lead directly to a higher degree of reuse and a better cooperation over the boundaries of applications and domains.

To just name a few examples of disadvantages of low quality ontologies: readability of an ontology may suffer if the vocabulary or syntax contains errors. Reasoners may be unable to infer answers in case of inconsistent semantics. Underspecified ontologies hinder automatic mapping approaches. On the other hand, high quality ontologies can be easier reused (since their semantics can be mapped with a higher confidence to a known and grounded ontology in an existing system), can be more readily featured within the application (since good user interface elements can be automatically constructed in order to work with the ontology), and make it easier to discover, and also to actively omit, data errors (since the ontology constrains possible data interpretations).

1.1.2 Increasing ontology availability

Ontologies on the Semantic Web are coming from a vast variety of different sources, spanning institutions and persons aiming for different quality criteria. Ontology evaluation techniques help to consolidate these quality criteria and make them explicit. This encourages the publication of ontologies and thus increases the number of available ontologies.

This argument has two sides: on the one hand, more ontologies will be published because the ontology engineers are more confident with releasing their work. On the other hand, ontologies can be automatically assessed about their quality and thus can be reused easier, since the confidence of the reusers in the quality of these ontologies increases. Even though this does not raise the actual number of ontologies accessible by a user, it does raise the number of ontologies that will be reused. This fosters cooperation and reuse on the Semantic Web and directly increases its impact and lowers its costs.

For anecdotal support we refer the reader to the SEKT case studies (Sure et al., 2007): in one of the case studies, legal experts were trained to engineer an ontology of their knowledge domain. They consistently felt that they were doing it wrong, and thus felt uncomfortable: since it all "looks a lot like common sense", making mistakes made them feel ignorant. Ontology evaluation tools are supposed to discover some of these mistakes, and thus to raise the confidence of the engineers in their own work – and thus to publish them more readily.


1.1.3 Lower maintenance costs

Decentralized collaborative ontology creation requires a high independence of tasks. If local changes lead to wide-reaching effects, then the users should be able to understand these effects. Otherwise they will invariably affect the work of many others, which will need a complex system of measures and counter-measures.

The prime example is the world's currently biggest knowledge base, Wikipedia. Changes within the knowledge contained by Wikipedia need to be constantly tracked and checked. Currently, thousands of users constantly monitor and approve these changes. This is inefficient, since many of these changes could be tracked automatically using ontology evaluation technologies. They allow us to automatically check if certain constraints and requirements are fulfilled. This allows maintainers to quickly and automatically reveal plausibility problems, and thus to decrease maintenance and development costs of such ontologies dramatically.
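As a rough illustration of such an automatic check, the following Python sketch flags subjects that carry more than one value for a property that is expected to be single-valued. The triples, the property name `capital`, and the helper `find_multiple_values` are assumptions made for this example only; they are not taken from Wikipedia or from the methods of this thesis.

```python
from collections import defaultdict


def find_multiple_values(triples, prop):
    """Map each subject with more than one distinct value for prop to its values."""
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == prop:
            values[subject].add(obj)
    # Keep only the subjects that violate the single-value expectation.
    return {s: sorted(v) for s, v in values.items() if len(v) > 1}


# Hypothetical facts, e.g. as they might be entered in a collaboratively edited wiki.
facts = [
    ("Germany", "capital", "Berlin"),
    ("France", "capital", "Paris"),
    ("Germany", "capital", "Bonn"),  # a conflicting edit
    ("Germany", "population", "82000000"),
]

problems = find_multiple_values(facts, "capital")
print(problems)
```

A maintainer would then only need to look at the reported subjects instead of reviewing every single change.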

1.2 Contribution

The contribution of this thesis is threefold: we (i) introduce a framework for ontology evaluation, (ii) organize existing work in ontology evaluation within this framework and fill missing spots, and (iii) implement the theoretical results in practical systems to make the results of this thesis accessible to the user.

1.2.1 A framework for ontology evaluation

Terminology and content in ontology evaluation research have been fairly diverse. Chapter 3 contains a survey of literature, and consolidates the terminology used. It identifies and defines a concise set of eight ontology quality criteria and ontology aspects that can be evaluated (Section 3.6):

• Accuracy

• Adaptability

• Clarity

• Completeness

• Computational efficiency

• Conciseness

• Consistency

• Organizational fitness

We identify and describe in detail the following aspects of ontologies (Part II):

• Vocabulary

• Syntax

• Structure

• Semantics

• Representation

• Context

1.2.2 Methods for ontology evaluation

We surveyed the existing literature on ontology evaluation for evaluation methods and organized them according to the introduced framework. For each method we describe it within the context of its respective aspect.

Furthermore, with the help of the framework we identified blind spots and missing methods. We introduce a number of novel evaluation methods to address these gaps. Among them are:

• Schema validation (Section 5.3)

• Pattern discovery using SPARQL (Section 6.2)

• Normalization (Section 7.1)

• Metric stability (Section 7.2)

• Representational misfit (Section 8.1)

• Unit testing (Section 9.1)

1.2.3 Implementation

In order to allow the theoretical results of this thesis to be used, we implemented a number of introduced methods within Semantic MediaWiki, an extension for the popular MediaWiki wiki engine. Semantic MediaWiki has become the most popular semantic wiki engine by far, used by several hundreds of installations worldwide and nurturing an active open source developer community. Semantic MediaWiki was extended to allow for the collaborative evaluation of the knowledge created within the wiki, and thus lower the costs for maintaining such a knowledge base dramatically (Chapter 10).


1.3 Readers’ guide

This thesis presents a conceptual framework and its implementations for defining and assessing the quality of an ontology for the Web. In this section we will offer an overview of the whole thesis so that readers may quickly navigate to pieces of particular interest to them. The whole thesis is written in such a way that it enables the understanding of the topic in one pass.

In the examples in this thesis, whenever a QName is used (see Section 5.2), we assume the standard namespace declarations given in Table 1.1. Often we omit namespaces for brevity and legibility.

Prefix      Namespace
rdf         http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs        http://www.w3.org/2000/01/rdf-schema#
xsd or xs   http://www.w3.org/2001/XMLSchema#
owl         http://www.w3.org/2002/07/owl#
skos        http://www.w3.org/2008/05/skos#
foaf        http://xmlns.com/foaf/0.1/
dc          http://purl.org/dc/elements/1.1/
swrc        http://swrc.ontoware.org/ontology#
dbpedia     http://dbpedia.org/resource/
dolce       http://www.loa-cnr.it/ontologies/DOLCE-Lite.owl#
proton      http://proton.semanticweb.org/2005/04/protont#
sw          http://semanticweb.org/id/
swp         http://semanticweb.org/id/Property-3A
swc         http://semanticweb.org/id/Category-3A
swivt       http://semantic-mediawiki.org/swivt/1.0#
aifb        http://www.aifb.kit.edu/id/

Table 1.1: Namespace declaration in this thesis
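To make the use of these declarations concrete, the following Python sketch expands a QName into a full URI by looking up its prefix. It copies only a subset of the prefixes from Table 1.1, and the helper name `expand_qname` is a hypothetical choice for this illustration, not part of any tool described in this thesis.

```python
# Subset of the prefix declarations from Table 1.1.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand_qname(qname, namespaces=NAMESPACES):
    """Expand 'prefix:local' into a full URI using the given prefix table."""
    prefix, separator, local = qname.partition(":")
    if not separator:
        raise ValueError(f"not a QName (no prefix separator): {qname!r}")
    if prefix not in namespaces:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    # The full URI is the namespace followed by the local name.
    return namespaces[prefix] + local


print(expand_qname("rdfs:subClassOf"))
print(expand_qname("foaf:knows"))
```

The expansion is plain string concatenation, which is why the namespaces in the table end in `#` or `/`.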

The appendix contains further navigational help: a bibliography of referenced works, lists of methods, tables and figures, an index of all relevant terms, and a full table of contents. The content of this thesis is divided in three parts.

1.3.1 Foundations

The foundations describe necessary preliminaries for understanding the rest of the thesis and offer a framework for the whole field of ontology evaluation that is later filled with more details in the following chapters. This first chapter introduces the

topic of the thesis and gives a rough overview of the contribution and how the whole thesis is structured.

Chapter 2 gives the terminology and preliminaries needed to understand the rest of the work. Whereas most of the terms may be familiar to a user knowledgeable about Semantic Web technology, this chapter offers a concise definition of these terms within this thesis. Readers may skip this chapter first, and decide to use it for reference as needed. The index in the appendix can help with finding relevant sections.

Chapter 3 introduces the theoretical framework for ontology evaluation that is the main theoretical contribution of this thesis. It describes criteria for ontology evaluation, aspects that can be evaluated, and how the criteria, aspects, and measures can be used in order to achieve defined goals by the user.

1.3.2 Aspects

Part II of this thesis describes the six different aspects of ontology evaluation as defined by the evaluation framework. Each chapter in this part is dedicated to one of the aspects.

Chapter 4 deals with the vocabulary of an ontology. The vocabulary of an ontology is the set of all names in that ontology, be it URI references or literals, i.e. a value with a datatype or a language identifier. This aspect deals with the different choices with regard to the used URIs or literals.

Chapter 5 is about the ontology syntax. Web ontologies can be described in a number of different surface syntaxes such as RDF/XML, N-Triples, OWL Abstract Syntax, the Manchester Syntax, or many others. Often the syntactic description within a certain syntax can differ widely. This aspect is about the different serializations in the various syntaxes.

Chapter 6 evaluates the structure of an ontology. A Web ontology describes an RDF graph. The structure of an ontology is this graph. The structure can vary highly even when describing the same meaning.

Chapter 7 examines how the semantics of an ontology can be evaluated. A consistent ontology describes a non-empty, usually infinite set of possible models. The semantics of an ontology are the common characteristics of all these models.

Chapter 8 takes a look at the aspect of representation. This aspect captures the relation between the structure and the semantics. Representational aspects regard how the explicit structure captures and defines the intended meaning, being the formal semantics of a description logics theory or some other specification of meaning.

Chapter 9 finally regards how the context of an ontology can be used for evaluation. This aspect is about the features of the ontology when compared with other artifacts in its environment, which may be, e.g., an application using the ontology, a data source about the domain, or formalized requirements towards the ontology in form of competency questions.


1.3.3 Application

The last part describes practical implications and implementations of the work given in the previous part, compares it to other work in the area, and offers conclusions on the results.

Chapter 10 describes Semantic MediaWiki, an extension to the MediaWiki wiki engine, that allows for the massive collaborative creation and maintenance of ontologies. It discusses how collaborative approaches towards ontology evaluation are implemented within Semantic MediaWiki.

Chapter 11 surveys related approaches and how they relate to the presented framework. Whereas most of the related approaches have been already included in the description of their respective aspect, some of them do not fit into the overall framework presented in this thesis. In this chapter we analyze the reasons for that.

Chapter 12 finally summarizes the results and compares them to the motivation presented in this chapter. We collect and comment on research questions that remain open, and outline the expected future work and impact of the research topic.

1.4 Relation to previous publications

Most of the content in this thesis has been published previously. Here we present the relation to other publications of the author, in order to show which parts of the content have already been peer-reviewed. Most of the publications add further details to the given topic that have not been repeated in this thesis, either because of space constraints or because that part of the work has been performed by one of the co-authors. On the other hand, all the content in this thesis has been updated to adhere to a common framework and to OWL2, which has been released only recently.

The outline of this thesis was previously published in a much shorter form in (Vrandečić, 2009b). This includes the framework in Chapter 3, especially the criteria selection in Section 3.6 and the definition of aspects in Section 3.8, which provides a structure for the whole thesis. The meta-model presented in Section 3.2 has been previously published in (Vrandečić et al., 2006c).

A number of publications present experiences and thoughts which have informed the whole thesis. In (Cregan et al., 2005) and (Mochol et al., 2008) we present experiences from ontology engineering, issues with then current tools, and problems with the OWL semantics (which in turn have informed the methods presented in Section 7.2 on stable metrics, Section 7.3 on completeness, and Section 9.2 on using ontologies with higher expressivity for consistency checking and query answering). Also, experiences with distributed ontology engineering, especially using the DILIGENT methodology, have been gained in the EU project SEKT and published in (Vrandečić et al., 2005), (Vrandečić et al., 2006b), (Sure et al., 2007), (Casanovas et al., 2007), and (Casanovas et al., 2005). These experiences have guided us in the development of Semantic

MediaWiki (Chapter 10), unit testing for ontologies (Section 9.1), and the overall need to develop automatic evaluation methods that also care about the seemingly simpler aspects such as vocabulary (Chapter 4) or syntax (Chapter 5).

We have also organized two workshops on the topic of ontology evaluation (EON2006 at WWW2006 in Banff, Canada, and EON2007 at ISWC2007 in Seoul, South Korea) that have heavily contributed to an understanding of the topic of the thesis. The proceedings are available in (Vrandečić et al., 2006a) and (Vrandečić et al., 2007a), respectively.

An earlier version of Chapter 4 was published in (Vrandečić, 2009a).

The idea of RDF syntax normalization in order to enable XML validation in Section 5.3 has been previously published in (Vrandečić et al., 2009).

The criticism of existing metrics presented in Section 6.1 was first raised in (Vrandečić and Sure, 2007).

The idea of explicating and detecting patterns with macros described in Section 6.2 was introduced in (Vrandečić, 2005) and extended in (Vrandečić and Gangemi, 2006). In this thesis we have further expanded the idea considerably and added some evaluation based on a significant ontology corpus.

Section 6.3 on AEON has been previously presented in (Völker et al., 2005) and (Völker et al., 2008). There the machine learning aspects of AEON and its evaluation are also described.

The notions of normalization as presented in Section 7.1 and of stable metrics presented in Section 7.2 have both been previously published in (Vrandečić and Sure, 2007). The description of normalization in (Vrandečić and Sure, 2007) contained some formal errors though. They have been corrected in (Vrandečić et al., 2007b). Section 7.1 presents the corrected version.

Ontological metrics as described in Section 8.1 have been first published in (Vrandečić and Sure, 2007).

Unit testing as presented in Section 9.1 was introduced in (Vrandečić and Gangemi, 2006).

The idea of extending ontologies with stronger axiomatizations as presented in Section 9.2 was exemplified in (Lösch et al., 2009). The idea of extending ontologies with logic program rules builds on the line of work of (Grosof et al., 2003) and especially (Motik, 2006), and was implemented as tools in (Motik et al., 2005) and evaluated in (Hitzler and Vrandečić, 2005) and (Krötzsch et al., 2006b).

Semantic MediaWiki (SMW) has been first presented in (Krötzsch et al., 2005) and then in (Völkel et al., 2006). The further development of SMW has been documented in (Krötzsch et al., 2007c). Chapter 10 provides an updated introduction to the whole system.

The ideas in Section 10.4 have been first published in (Vrandečić, 2009c). Introducing reasoning expressivity to SMW in order to evaluate the content was published in (Vrandečić and Krötzsch, 2006) and further expanded and implemented in (Krötzsch et al., 2007b), especially for efficiency.

Chapter 2

Terminology and Preliminaries

All good counsel begins in the same way; a man should know what he is advising about, or his counsel will all come to nought.
(Socrates, 469 BC–399 BC, Phaedrus (Plato, 370 BC))

This chapter defines the terminology used in this thesis. Unlike a glossary, the terms will be given in a logical, or pedagogical, order, meant to be read in order of presentation. Some of the following terms have a wider meaning outside of this thesis, and sometimes that wider meaning is mentioned. But in order to enable a succinct discussion we will widely disregard these additional meanings. Within this chapter, words written in bold are the words defined in that paragraph. Words written in small capitals are references to the given definitions.

2.1 Ontologies

An ontology is a (possibly named) set of axioms. Axioms are stated in an ontology language. If all axioms of an ontology are stated in the same ontology language, then the ontology as a whole is in that ontology language.

An ontology language defines which language constructs (i.e. which types of axioms) can be used in an ontology in that language. The ontology language also defines the formal semantics of that language. A big number of ontology languages have been suggested in the last twenty years, such as Ontolingua (Farquhar et al., 1996), F-logic (Kifer et al., 1995), or plain first order logic.


Web ontologies are ontologies that are written in one of the standardized Semantic Web ontology languages. Within this thesis we regard only Web ontologies, i.e. other ontologies using other ontology languages do not necessarily have the same properties and thus may not be evaluable with the methods presented here.

At the time of writing of this thesis, the Semantic Web ontology languages are RDF, RDFS (jointly called RDF(S)), and OWL. OWL is available in a number of profiles with specific properties (Grau et al., 2008). All these languages are standardized by the World Wide Web Consortium (W3C), a public standards body overseeing standards relevant to the development of the Web.

According to the Semantic Web ontology languages, ontologies do not include only terminological knowledge – definitions of the terms used to describe data, and the formal relations between these terms – but may also include the knowledge bases themselves, i.e. terms describing individuals and ground facts asserting the state of affairs between these individuals. Even though such knowledge bases are often not regarded as being ontologies (see (Obrst et al., 2007) for an example), for the remainder of this thesis we follow the OWL standard and regard ontologies as artifacts encompassing both the terminological as well as the assertional knowledge.

An ontology document is a particular serialization of an ontology. As such, it is an information resource, usually a file, and thus an artifact that can be processed by a machine. Web ontologies may be serialized in one of the many W3C standards for ontology serialization, i.e. RDF/XML (Beckett, 2004), OWL Abstract Syntax (Patel-Schneider et al., 2004), OWL XML presentation syntax (Hori et al., 2003), N3 (Berners-Lee, 2006), or OWL Functional Syntax (Motik et al., 2009b).
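To illustrate that an ontology document is only one particular serialization, the following Python sketch reads two differently written documents and checks that they yield the same set of triples. It assumes a deliberately restricted, N-Triples-like syntax with URI references only (no literals, no blank nodes); the function name `parse_simple_triples` and the example.org URIs are hypothetical and chosen for this illustration.

```python
def parse_simple_triples(text):
    """Parse lines of the form '<s> <p> <o> .' into a set of triples.

    Only URI references are supported; lines starting with '#' are comments.
    """
    triples = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not line.endswith("."):
            raise ValueError(f"missing terminating '.': {raw!r}")
        parts = line[:-1].split()
        if len(parts) != 3:
            raise ValueError(f"expected three terms: {raw!r}")
        # Drop the angle brackets around each URI reference.
        triples.add(tuple(term.strip("<>") for term in parts))
    return frozenset(triples)


doc_a = """
<http://example.org/Germany> <http://example.org/capital> <http://example.org/Berlin> .
<http://example.org/Berlin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/City> .
"""

doc_b = """
# same statements, different order and spacing
<http://example.org/Berlin>   <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>   <http://example.org/City>.
<http://example.org/Germany> <http://example.org/capital> <http://example.org/Berlin> .
"""

same = parse_simple_triples(doc_a) == parse_simple_triples(doc_b)
print(same)
```

Comparing sets rather than text is the design choice that makes statement order, spacing, and comments irrelevant; a real comparison of RDF graphs would additionally have to deal with literals and blank nodes.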
There are several further serializations, which can be translated from and to the other set of serializations, for example the KAON2 ontology serialization, the Manchester Syntax (Horridge et al., 2006), or a memory only JAVA internal presentation. In Chapter5 we will discuss the aspect of serializations and syntax more deeply. An infinite number of different ontology documents can describe the same ontology. Web ontologies are often represented with an RDF graph (Klyne and Carroll, 2004), and in turn the RDF graph is serialized in an RDF document. This is the case when using the RDF/XML or the N3 serializations. The RDF graph in turn has to be interpreted to arrive at the actual ontology. This additional step was introduced by the W3C standards in order to achieve a form of interoperability between the different layers in the so called Semantic Web Layer Cake (first introduced in (Berners-Lee et al., 2001) and updated several times since then, see e.g. (Berners-Lee et al., 2006b)). Every ontology can be represented by an RDF graph, but not every ontology is necessarily represented by an RDF graph. Whereas the interpretation of an RDF graph as an ontology always yields the same ontology, there is no canonical representation of an ontology as an RDF graph, i.e. there can be many RDF graphs representing the same ontology. This stems from the fact that the normative translation of ontologies to RDF graphs is not injective, as described in Section 4.1 of the OWL Language Semantics

and Abstract Syntax document (Patel-Schneider et al., 2004).

Web ontologies are also often serialized as XML files (Bray et al., 2008), or else given by an XML infoset (Cowan and Tobin, 2004). An XML file is a particular serialization of an XML infoset, but XML infosets could also be serialized in other ways (as, for example, binary XML (ISO 24824, 2007)). XML infosets in turn may either express the ontology directly (e.g. the OWL/XML presentation syntax (Hori et al., 2003)) or express the RDF graph that in turn expresses the ontology. This shows that ontologies can be expressed in a big number of ways. They can be expressed via RDF or XML or both or neither.

In Section 2.2 we describe the available types of axioms, followed by the types of entities in Section 2.3.

2.2 Axioms

An axiom is the smallest unit of knowledge within an ontology. It can be either a terminological axiom, a fact, or an annotation. Terminological axioms are either class axioms or property axioms. An axiom defines formal relations between ontology entities or their names.

2.2.1 Facts

A fact or individual axiom is either an instantiation, a relation, an attribute, or an individual (in)equality. A class instantiation has the form (1)

ClassAssertion(C a)

with C being a class expression and a being an individual name. This is the same as stating that a has the type C. Semantically that means that the individual that has the name a is in the extension of the set described by C.

A (positive) relation or object property instance has the form

PropertyAssertion(R a b)

with a and b being individual names, and R being an object property expression. Informally it means that the property R pointing from a to b holds, e.g. saying that Germany has the capital Berlin. In the example, Germany and Berlin are individual names, and capital is the name of the property that holds between them. The actual instantiation of this property is thus called the relation. Semantically it means that the tuple (a, b) is in the extension of the set R.

OWL2 introduces negative relations, i.e. the direct possibility to state that a certain relation is not the case. This uses the following form:

(1) For more on the syntax and semantics of the axioms throughout this thesis, see Section 2.4.

2 Chapter 2 Terminology and Preliminaries

NegativePropertyAssertion(R a b)

This means that the tuple (a, b) is not in the extension of the set R. Semantically, this could already be stated indirectly in OWL DL by using the following statement:

SubClassOf(OneOf(a) AllValuesFrom(R ComplementOf(OneOf(b))))

It is easy to see that the new syntax is far easier to understand.

An attribute or datatype property instance uses almost the same form as a relation:

PropertyAssertion(R a v)

with a being an individual name, R being a datatype property expression and v being a literal. A negative attribute uses the following form respectively:

NegativePropertyAssertion(R a v)

Individual equality is an axiom stating that two (or more) names refer to the same individual, i.e. that the names are synonyms. An individual inequality on the other hand makes explicit that the names do not refer to the same individual. Ontology languages with the unique name assumption assume that two different names always (or by default) refer to two different individuals. In OWL, this is not the case: OWL does not hold to the unique name assumption, and thus OWL does not make any assumptions about equality or inequality of two individuals referred to by different names. The syntax for these axioms is as follows (for i ≥ 2):

SameIndividual(a1 a2 ... ai)
DifferentIndividuals(a1 a2 ... ai)

2.2.2 Class axioms

A terminological axiom is either a class axiom or a property axiom. A class axiom can either be a subsumption, class equivalence, disjoint, or disjoint union. A subsumption has the following form:

SubClassOf(C D)

with C (the subclass) and D (the superclass) being class expressions. This axiom states that every individual in the extension of C also has to be in the extension of D. This means that individuals in C are described as being individuals in D. For example,

SubClassOf(Square Polygon)

describes squares as polygons. Subsumptions may be simple subsumptions, complex subsumptions, or descriptions. In a simple subsumption both the subclass and the superclass are class names instead of more complex class expressions. Simple subsumptions form the backbone of class hierarchies.

In a complex subsumption both the subclass and the superclass are complex class expressions. A complex subsumption thus sets an intricate restriction on the possible models of the ontology. Such restrictions may be rather hard to understand by users of the ontology.

In a description, either the subclass is a class name and the superclass a complex class expression, or the other way around. This describes the named class, i.e. the complex class expression offers a condition of the named class. If the named class is the subclass, then the complex class expression offers a necessary condition of the named class, i.e. each individual in the named class must also fit to the complex class expression. If the named class is the superclass, then the complex class expression offers a sufficient condition of the named class, i.e. each individual fitting to the complex class expression will then also be an individual of the named class. Descriptions are among the most interesting axioms in an ontology, and they are the namesake of description logics.

A class equivalence is stated as follows (for i ≥ 2):

EquivalentClasses(C1 C2 ... Ci)

with Cn, 1 ≤ n ≤ i, being class expressions. Class equivalences are simple class equivalences, complex class equivalences, or definitions.

In a simple class equivalence both classes are class names. This is similar to a synonym, since it states that two names mean the same class, i.e. that they have the same extension.

If any of the two classes in a class equivalence is a class name and the other a complex class expression, then the axiom is a definition of the class name. A definition is the strongest statement about a class name, offering both a sufficient and necessary condition by the means of the complex class expression. Thus, a definition offers the complete meaning of a name by building on the meaning of the names used in the defining class expression. As an example, a mother can be completely described by the following axiom:

EquivalentClasses(Mother IntersectionOf(Woman SomeValuesFrom(child Thing)))

defining a Mother as being a Woman with a child.

In a complex class equivalence both class expressions of the class equivalence are complex class expressions, and thus the axiom defines an intricate condition on the possible models. Just like complex subsumptions such axioms and their implications may be hard to understand.

A disjoint is an axiom of the form (for i ≥ 2):

DisjointClasses(C1 C2 ... Ci)

with Cn, 1 ≤ n ≤ i, being class expressions. The axiom states that two classes have no common individuals. This type of axiom is syntactic sugar for the following axiom:


SubClassOf(C1 ComplementOf(C2 ... Ci))

for all Cn, 2 ≤ n ≤ i.

A disjoint union has the form (for i ≥ 2):

DisjointUnion(C D1 D2 ... Di)

stating that the class C is a union of all classes Dn, 1 ≤ n ≤ i, and at the same time the classes Dn, 1 ≤ n ≤ i, are all mutually disjoint. A disjoint union is also called a complete partition or a covering axiom.
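
The two conditions of a disjoint union can be illustrated on finite, fully known extensions. The following Python sketch is only an illustration under that closed-world assumption (OWL itself makes no such assumption); the function name and the sample data are hypothetical and not part of the thesis.

```python
from itertools import combinations


def is_disjoint_union(c, parts):
    """Check the set semantics of DisjointUnion(C D1 ... Di) on finite sets.

    `c` is the extension of C, `parts` the extensions of D1 ... Di.
    """
    parts = [set(p) for p in parts]
    # Condition 1: C is the union of all Dn.
    covers = set().union(*parts) == set(c)
    # Condition 2: the Dn are pairwise (mutually) disjoint.
    disjoint = all(a.isdisjoint(b) for a, b in combinations(parts, 2))
    return covers and disjoint


# Hypothetical extensions: every person is either a minor or an adult.
person = {"ann", "bob", "cid"}
print(is_disjoint_union(person, [{"ann"}, {"bob", "cid"}]))
```

A reasoner would use the axiom to derive new knowledge rather than to validate data, so the check above only mirrors the set-theoretic reading of the axiom.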

2.2.3 Property axioms

A property axiom describes formal semantics of properties. Unlike classes, properties can hardly be described or even defined with a single axiom. In other words, whereas an OWL ontology allows individuals to be classified based on their descriptions, the same is not true for property instances. Property axioms can be used to define the formal semantics of properties, but this is hardly ever expressive enough to define a property. This is because the only property expressions are inverse property and property chains.

The available property axioms define either (i) the relation between two properties, (ii) their domain or range, (iii) their type, or (iv) keys for individuals. The formal semantics of all these axioms are given in Table 2.1 on page 34.

Relations between properties are subproperties (e.g. mother as a subproperty of parent), equivalent properties (e.g. mother as an equivalent property to mom), disjoint properties (e.g. mother and father have to be disjoint sets), and inverse properties (e.g. child is inverse to parent).

A domain definition defines the class an individual belongs to, if that property points from it. A range definition defines the class an individual belongs to, if that property points to it. E.g. the property wife would connect a groom with his bride, and thus have the domain Man and the range Woman (based on a conservative conceptualization not accounting for same-sex marriages). Note that domains and ranges are not constraints, i.e. the system will usually not detect inconsistencies if the related individuals do not belong to the specified class. Programmers often expect such a check, since domains and ranges resemble the usage of signatures in procedure definitions. To detect such inconsistencies, the ontology requires either sufficient disjoints between the classes (see Section 9.2.1) or we need to add the ability to formulate domains and ranges as constraints (see Section 9.2.4).
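
The difference between inferring types from domains and ranges and checking them as constraints can be made concrete with a small sketch. The Python code below is a simplified illustration, not a reasoner: it assumes one named class per domain and range, and all function and variable names are hypothetical.

```python
def infer_types(assertions, domains, ranges, asserted_types):
    """Apply PropertyDomain / PropertyRange as inference rules.

    assertions: iterable of (property, subject, object) triples.
    domains, ranges: dicts mapping a property name to a class name.
    asserted_types: dict mapping an individual to its asserted classes.
    """
    types = {ind: set(classes) for ind, classes in asserted_types.items()}
    for prop, subject, obj in assertions:
        if prop in domains:
            # Domain: the subject is inferred to belong to the domain class.
            types.setdefault(subject, set()).add(domains[prop])
        if prop in ranges:
            # Range: the object is inferred to belong to the range class.
            types.setdefault(obj, set()).add(ranges[prop])
    return types


def clashes(types, disjoint_pairs):
    """Individuals typed with two classes that are declared disjoint."""
    return sorted(
        ind for ind, classes in types.items()
        if any(a in classes and b in classes for a, b in disjoint_pairs)
    )


# Subject and object are swapped by mistake in this hypothetical fact.
facts = [("wife", "alice", "bob")]
types = infer_types(facts, {"wife": "Man"}, {"wife": "Woman"},
                    {"alice": {"Woman"}, "bob": {"Man"}})
print(clashes(types, []))                  # without disjoints: no clash found
print(clashes(types, [("Man", "Woman")]))  # with DisjointClasses(Man Woman)
```

Without the disjointness axiom the erroneous fact merely makes both individuals members of both classes; only the added disjoint turns the modelling error into a detectable inconsistency.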
A number of axioms can be used to declare specific formal types of properties. These are functional, inverse functional, reflexive, irreflexive, symmetric, asymmetric, and transitive property declarations. The formal semantics of all these properties are given in Table 2.1.

Finally, property axioms can define keys over one or more properties. Keys, just as inverse functional properties, are important to infer the identity of individuals,

and thus play a crucial role in merging data from heterogeneous sources.

2.2.4 Annotations

An annotation connects an element by an annotation property with an annotation value. Elements can be either entities, ontologies, or axioms. An annotation has no impact on the DL semantics, but adds further information about the elements.

The most widely deployed annotation is rdfs:label. It connects an element with a human-readable label. We will investigate this in Section 4.2.3 in detail.

Annotations can express further metadata about the data itself, e.g. who stated a specific axiom, when was a class introduced, which properties are deprecated, etc. Many of these annotations can be used for evaluations, and throughout this thesis we will frequently see examples of such usage.

Before the introduction of the punning mechanism in OWL2 (see Section 4.1.5), annotations were also required to make statements about classes or properties. For example, the AEON approach for analyzing class hierarchies was only possible by expressing the meta-properties with annotations (see Section 6.3). Since punning was introduced, it is allowed to make statements about classes directly, which is much more powerful. We have used this new feature in Section 6.3.

Ontology annotations add metadata about the whole ontology, e.g. stating the author of the ontology, the version of the ontology, pointing to previous versions, stating compatibility with previous versions, etc. The OWL standard already defines a number of these ontology annotations, but allows for more to be added.

2.3 Entities

An ontology entity may be an individual, a class, a property, or an ontology. Since OWL2, the names referencing these entities do not have to be disjoint anymore, i.e. one and the same name may point to both an individual and a class. Consider the following example where Father is the name for a property (connecting a person to its father), a class (of all fathers), and an individual (an instance of the class role, which in turn can be used to query for all roles a person has, or can have).

ClassAssertion(Familyrole Father)
PropertyDomain(Father Person)
PropertyRange(Father Man)
SubPropertyOf(Father Parent)
InverseProperties(Parent Child)
EquivalentClasses(Father SomeValuesFrom(Child Person))
EquivalentClasses(Father HasValue(Role Father))


2.3.1 Individuals

Individuals can be given by their name or as an anonymous individual. An individual can be any entity with an identity (otherwise it would not be possible to identify that entity with an identifier).

An anonymous individual does not have a URI but provides only a local name instead. This means that the individual cannot be identified directly from outside of the given ontology, but only through indirect means like inverse functional properties, keys, or nominals. We will discuss anonymous individuals in Section 4.3 in more detail.

2.3.2 Classes

A class is a set of individuals. A class is given by a class expression. A class expression may either be a class name or a complex class description. A class name is simply the name, i.e. a URI, of a class. Class names do not carry any further formal information about the class.

A complex class expression defines a class with the help of other entities of the ontology. In order to create these expressions, a number of constructs can be used. The formal semantics and exact syntax of all these constructs are given in Table 2.2. The constructs are set operations or restrictions.

The available set operations are intersections, unions, complements, and nominals. A nominal defines the extension of a class by listing all instances explicitly.

The available restrictions are the existential restriction on a property (i.e. a class of all instances where the property exists), the universal restriction (on a property P and a class C, constructing a class where the instances have all their P property values be instances of C), the unqualified number restriction, the qualified number restriction, and the self-restriction (on a property P, stating that an instance has to be connected to itself via P).

As we can see, classes can be expressed with a rich variety of constructs, whereas the same does not hold for individuals and properties.

2.3.3 Properties

Properties are given by a property expression. Most often, a property expression is just a property name. The only complex property expressions are inverse properties and property chains.

An inverse property is the property expression that is used when the subject and the object exchange their place in a property instantiation. For example, child is the inverse property of parent. Instead of giving the inverse property the property name parent, we could have used the property expression InverseOf(child).

30 expressions o eilzn xosadtu noois ebleeta hsi h otunderstand- most the is this that believe We ontologies. (Motik thus Syntax and Functional axioms OWL2 serializing the for using are we thesis this Throughout Semantics 2.4 there line first the rdf:about=""> in quotes. name 2010-01-01 ontology rprychain property ae ontology named aavalue data nae ontology unnamed uncle sgvnb a by given is onc w niiul ihec other. each with individuals two connect . nacan ..teproperty the e.g. chain, a in parent noainaxioms annotation a eeither be can a eete a either be can location sntrpeetdb R u ahrb a by rather but URI a by represented not is aavalue data aayemap datatype , sthe is sister sa nooyta xlctysae t aeisd h ontol- the inside name its states explicitly that ontology an is nooyentities ontology san is saalbe h oainmyb sdisedo h name the of instead used be may location the available, is rpryexpression property and , . betproperties object named parent ontology husband o ,tetpdliteral typed the example, For . eg osaeteatoigisiuino version or institution authoring the state to (e.g. uncle and ran or uncle .Sneteeaen ola prtr on operators boolean no are there Since ). (since hthsno has that brother n a have can and nae ontology unnamed a edsrbda superproperty a as described be can oa name local uncle htcnet several connects that or hsontology this yuigtefloigaxiom: following the using by aaproperties Data aaproperties data name a lob h hiigof chaining the be also may axioms literal ie xlctywithin explicitly given literal hc srepresented is which o xml,in example, For . hsontology this odsrb them, describe to "4"^^xsd:int . h syntactic the , . Semantics 2.4 n the and Ontologies tal. et onc an connect property . 2009b) , Object data was 31 is

able OWL2 syntax currently available. OWL2 Functional Syntax is easy to read and nevertheless concise, and unlike DL syntax it reflects not only the semantics of the axioms but often also their intention. To give an example: a domain declaration in Functional Syntax is written as

PropertyDomain(mother Female)

whereas in DL syntax the same statement would be

∃mother.⊤ ⊑ Female

Although the DL syntax is more concise, the intention of a domain declaration is easier to see from the Functional Syntax. RDF-based syntaxes such as N3 on the other hand become very unwieldy and need to deal with many artifacts introduced by the fact that complex axioms need to be broken down into several triples (see the example below).

Table 2.1 describes all OWL axiom types, their direct set semantics, and their translation to RDF. The table is abbreviated: for all axiom types marked with *, it contains only the version with two (resp. three in the case of the DisjointUnion axiom type) parameters, even though the parameter list can be arbitrarily long. This often complicates the RDF graph enormously. To give one example: the DisjointProperties axiom type is given in Table 2.1 with two possible parameters, R and S. This can be expressed in RDF with a single triple:

R owl:propertyDisjointWith S .

But the axiom type can use an arbitrary number of parameters, e.g.

DisjointProperties(R S T)

stating that all the given properties are mutually disjoint, i.e.

(R ⊓ S) ⊔ (R ⊓ T) ⊔ (S ⊓ T) ≡ ⊥

Translating this axiom to RDF yields a much more complicated graph than the single triple above:

_:x1 rdf:type owl:AllDisjointProperties .
_:x1 owl:members _:x2 .
_:x2 rdf:first R .
_:x2 rdf:rest _:x3 .

_:x3 rdf:first S .
_:x3 rdf:rest _:x4 .
_:x4 rdf:first T .
_:x4 rdf:rest rdf:nil .

Table 2.2 is abbreviated in the same way. The IntersectionOf, UnionOf, OneOf, and PropertyChain expression types can all accommodate more than the given number of parameters. The table is also abbreviated as it considers only object properties. Datatype properties are built in an analogous way, but further allow for facets. Facets allow to constrain the range of data values, i.e. one may restrict the age for adults to be bigger than or equal to eighteen years.

Table 2.1 and Table 2.2 are compiled from (Motik et al., 2009b; Motik et al., 2009a; Patel-Schneider and Motik, 2009). For further detail, a primer, and normative declarations of all entities, the reader should consult the standards.
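
The growth of the RDF graph for longer parameter lists can be reproduced mechanically. The following Python sketch generates the triples for a DisjointProperties axiom in the pattern shown in the example above; the function name and the blank node numbering scheme are illustrative assumptions, and the output is a plain list of string triples rather than a real RDF graph.

```python
def disjoint_properties_triples(props):
    """Return (subject, predicate, object) triples for DisjointProperties.

    Two parameters need a single triple; more parameters need an
    owl:AllDisjointProperties node plus an RDF list with two triples
    (rdf:first, rdf:rest) per list member.
    """
    if len(props) < 2:
        raise ValueError("DisjointProperties needs at least two properties")
    if len(props) == 2:
        return [(props[0], "owl:propertyDisjointWith", props[1])]
    triples = [
        ("_:x1", "rdf:type", "owl:AllDisjointProperties"),
        ("_:x1", "owl:members", "_:x2"),
    ]
    for i, prop in enumerate(props):
        node = f"_:x{i + 2}"
        # The last list cell points to rdf:nil, all others to the next cell.
        rest = f"_:x{i + 3}" if i < len(props) - 1 else "rdf:nil"
        triples.append((node, "rdf:first", prop))
        triples.append((node, "rdf:rest", rest))
    return triples


for s, p, o in disjoint_properties_triples(["R", "S", "T"]):
    print(s, p, o, ".")
```

For n > 2 properties the sketch produces 2 + 2n triples, which illustrates why the abbreviated two-parameter rows in the tables understate the size of the resulting graphs.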


Functional syntax | Set semantics | RDF graph (N3)
ClassAssertion(C a) | a ∈ C | a rdf:type C.
PropertyAssertion(R a b) | (a, b) ∈ R | a R b.
NegativePropertyAssertion(R a b) | (a, b) ∉ R | _:x rdf:type owl:NegativePropertyAssertion. _:x owl:sourceIndividual a. _:x owl:assertionProperty R. _:x owl:targetIndividual b.
SameIndividual(a b)* | a = b | a owl:sameAs b.
DifferentIndividuals(a b)* | a ≠ b | a owl:differentFrom b.
SubClassOf(C D) | C ⊆ D | C rdfs:subClassOf D.
EquivalentClasses(C D)* | C ≡ D | C owl:equivalentClass D.
DisjointClasses(C D)* | (C ∩ D) ≡ ⊥ | C owl:disjointWith D.
DisjointUnion(C D E)* | C ≡ (D ∪ E), (D ∩ E) ≡ ⊥ | C owl:disjointUnionOf _:x. _:x rdf:first D. _:x rdf:rest _:y. _:y rdf:first E. _:y rdf:rest rdf:nil.
SubPropertyOf(R S) | R ⊆ S | R rdfs:subPropertyOf S.
EquivalentProperties(R S)* | R ≡ S | R owl:equivalentProperty S.
DisjointProperties(R S)* | (R ∩ S) = ⊥ | R owl:propertyDisjointWith S.
InverseProperties(R S) | (a, b) ∈ R ↔ (b, a) ∈ S | R owl:inverseOf S.
PropertyDomain(R C) | (a, b) ∈ R → a ∈ C | R rdfs:domain C.
PropertyRange(R C) | (a, b) ∈ R → b ∈ C | R rdfs:range C.
FunctionalProperty(R) | (a, b) ∈ R ∧ (a, c) ∈ R → b = c | R rdf:type owl:FunctionalProperty.
InverseFunctionalProperty(R) | (a, c) ∈ R ∧ (b, c) ∈ R → a = b | R rdf:type owl:InverseFunctionalProperty.
ReflexiveProperty(R) | a ∈ ⊤ → (a, a) ∈ R | R rdf:type owl:ReflexiveProperty.
IrreflexiveProperty(R) | a ∈ ⊤ → (a, a) ∉ R | R rdf:type owl:IrreflexiveProperty.
SymmetricProperty(R) | (a, b) ∈ R ↔ (b, a) ∈ R | R rdf:type owl:SymmetricProperty.
AsymmetricProperty(R) | (a, b) ∈ R → (b, a) ∉ R | R rdf:type owl:AsymmetricProperty.
TransitiveProperty(R) | (a, b) ∈ R ∧ (b, c) ∈ R → (a, c) ∈ R | R rdf:type owl:TransitiveProperty.
HasKey(C R S)* | (a, c) ∈ R ∧ (b, c) ∈ R ∧ (a, d) ∈ S ∧ (b, d) ∈ S → a = b | C owl:hasKey _:x. _:x rdf:first R. _:x rdf:rest _:y. _:y rdf:first S. _:y rdf:rest rdf:nil.

Table 2.1: Semantics of OWL axioms. Axiom types noted with * may hold more than the given parameters.
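
Some of the set semantics in Table 2.1 can be read as simple conditions on a finite relation. The Python sketch below checks three of them on a relation given as a set of pairs. It is an illustration only: it assumes that different names denote different individuals, which OWL does not (a reasoner would instead infer b = c from a functional property), and the function names are hypothetical.

```python
def is_functional(rel):
    """(a, b) in R and (a, c) in R implies b = c."""
    seen = {}
    for a, b in rel:
        # setdefault returns the first value stored for a; a different
        # second value violates functionality.
        if seen.setdefault(a, b) != b:
            return False
    return True


def is_symmetric(rel):
    """(a, b) in R if and only if (b, a) in R."""
    return all((b, a) in rel for a, b in rel)


def is_transitive(rel):
    """(a, b) in R and (b, c) in R implies (a, c) in R."""
    return all((a, d) in rel for a, b in rel for c, d in rel if b == c)


ancestor = {("ann", "bob"), ("bob", "cid"), ("ann", "cid")}
print(is_transitive(ancestor), is_symmetric(ancestor), is_functional(ancestor))
```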

Functional syntax | Set semantics | RDF graph (N3)
IntersectionOf(C D)* | C ∩ D | _:x owl:intersectionOf _:y. _:y rdf:first C. _:y rdf:rest _:z. _:z rdf:first D. _:z rdf:rest rdf:nil.
UnionOf(C D)* | C ∪ D | _:x owl:unionOf _:y. _:y rdf:first C. _:y rdf:rest _:z. _:z rdf:first D. _:z rdf:rest rdf:nil.
ComplementOf(C) | ¬C | _:x owl:complementOf C.
OneOf(a)* | {a} | _:x owl:oneOf _:y. _:y rdf:first a. _:y rdf:rest rdf:nil.
SomeValuesFrom(R C) | {x | ∃y: (x, y) ∈ R ∧ y ∈ C} | _:x rdf:type owl:Restriction. _:x owl:onProperty R. _:x owl:someValuesFrom C.
AllValuesFrom(R C) | {x | ∀y: (x, y) ∈ R → y ∈ C} | _:x rdf:type owl:Restriction. _:x owl:onProperty R. _:x owl:allValuesFrom C.
HasValue(R a) | {x | (x, a) ∈ R} | _:x rdf:type owl:Restriction. _:x owl:onProperty R. _:x owl:hasValue a.
HasSelf(R) | {x | (x, x) ∈ R} | _:x rdf:type owl:Restriction. _:x owl:onProperty R. _:x owl:hasSelf true.
MinCardinality(n R) | {x | #{y | (x, y) ∈ R} ≥ n} | _:x rdf:type owl:Restriction. _:x owl:onProperty R. _:x owl:minCardinality n.
MaxCardinality(n R) | {x | #{y | (x, y) ∈ R} ≤ n} | _:x rdf:type owl:Restriction. _:x owl:onProperty R. _:x owl:maxCardinality n.
ExactCardinality(n R) | {x | #{y | (x, y) ∈ R} = n} | _:x rdf:type owl:Restriction. _:x owl:onProperty R. _:x owl:cardinality n.
MinCardinality(n R C) | {x | #{y | (x, y) ∈ R ∧ y ∈ C} ≥ n} | _:x rdf:type owl:Restriction. _:x owl:onProperty R. _:x owl:onClass C. _:x owl:minQualifiedCardinality n.
MaxCardinality(n R C) | {x | #{y | (x, y) ∈ R ∧ y ∈ C} ≤ n} | _:x rdf:type owl:Restriction. _:x owl:onProperty R. _:x owl:onClass C. _:x owl:maxQualifiedCardinality n.
ExactCardinality(n R C) | {x | #{y | (x, y) ∈ R ∧ y ∈ C} = n} | _:x rdf:type owl:Restriction. _:x owl:onProperty R. _:x owl:onClass C. _:x owl:qualifiedCardinality n.
PropertyChain(R S)* | {(a, b) | ∃x: (a, x) ∈ R ∧ (x, b) ∈ S} | _:x owl:propertyChain _:y. _:y rdf:first R. _:y rdf:rest _:z. _:z rdf:first S. _:z rdf:rest rdf:nil.

Table 2.2: Semantics of OWL expressions using object properties (datatype properties are analogous). Expression types with * may hold more parameters.

Chapter 3 Framework

A definition is the starting point of a dispute, not the settlement.
(Neil Postman, 1931–2003, Language Education in a Knowledge Context (Postman, 1980))

We introduce a framework for ontology evaluation. The rest of this thesis will be built on the framework in this chapter. First, we give an informal overview of the whole framework in Section 3.1, introducing the relevant terms and their connections. Section 3.2 describes an ontology of ontology evaluation and related concepts, formally specifying the framework presented earlier. Based on that specification, we then define different types of ontologies in Section 3.3. We then outline the limits of this work in Section 3.4. The concepts of the framework are finally discussed in more detail in the following sections: conceptualizations (Section 3.5), quality criteria (Section 3.6), evaluation methods (Section 3.7), and ontology aspects (Section 3.8).

3.1 Overview

The following framework is inspired by the semiotic meta-ontology O2 and the ontology of ontology evaluation and selection oQual (Gangemi et al., 2006b). We disagree on a few premises, which will be named explicitly in Section 11.1. Nevertheless, we try to remain as close to O2 and oQual as reasonable.

An ontology (i) specifies a conceptualization, (ii) consists of a set of axioms, (iii) is expressed by an ontology document, and (iv) constrains the construction of models satisfying the ontology. In Section 3.5 we discuss the conceptualizations and their

Figure 3.1: Framework for ontology evaluation. The slashed arrow represents the expresses relation.

relation to ontologies in detail. The structural definition of an ontology as a set of axioms was given in Section 2.2. The serialization or expression of an ontology as an ontology document was described in Section 2.1. Constraining the models is done by the semantics of an ontology as given in Section 2.4.

Ontologies are not artifacts in a narrow sense, but are expressed by ontology documents which in turn are artifacts. Whenever one speaks about ontologies as artifacts, one means ontology documents. Evaluation methods are descriptions of procedures that assess a specific quality of an ontology. Since ontologies are not artifacts, methods cannot assess an ontology directly; they always directly evaluate ontology documents. An evaluation method can assess an ontology only indirectly (i.e. by assessing the ontology document that expresses the ontology). Figure 3.1 shows that in a bit more detail by describing the different levels an ontology document may express: either an XML Infoset, an RDF graph, or the ontology directly. These nuances were already discussed in Section 2.1.

Figure 3.1 summarizes the relations between models and conceptualizations, ontologies, ontology documents, and evaluation methods.

An ontology evaluation may be expressed by an ontology, which has the advantage that the result can be reused with the very same tools that we use with the ontologies anyway. It enables us to integrate the results of several different methods and thus to build complex evaluations out of a number of simple evaluations. To give an example: the result of a method such as calculating the ratio between the normalized and unnormalized depth (described in Section 8.2) may be represented as a simple fact in an ontology profile:

PropertyAssertion(normalDepthRatio Ontology1 "2.25"^^xsd:decimal)

Another method may calculate the ratio between instantiated and uninstantiated classes in an ontology, and result in a second fact:

PropertyAssertion(instancedClassRatio Ontology1 "1.0"^^xsd:decimal)

Since the results are both expressed in OWL, we can easily combine the two facts and then create a new class of ontologies based on this metadata about ontologies:

EquivalentClasses(FluffyOntology IntersectionOf(HasValue(normalDepthRatio "1.0"^^xsd:decimal) HasValue(instancedClassRatio "1.0"^^xsd:decimal)))

(Hartmann et al., 2005) introduced the Ontology Metadata Vocabulary (OMV) to describe such kind of ontology metadata, such as results of measurements, its design policy, or how it is being used by others.

3.2 Meta-ontology

The framework presented in Section 3.1 and the terminology presented in Chapter 2 is specified as a meta-ontology (Vrandečić et al., 2006c). We call it a meta-ontology since it is an ontology about ontologies. In Section 3.2.1 we show one example of a reified axiom in order to understand how the meta-ontology works. Section 3.2.2 discusses reifying entities. In Section 3.2.3 we will show the advantages of a meta-ontology within this thesis. In Section 3.3 we will then demonstrate how ontologies can be classified with the help of the meta-ontology.

3.2.1 Reifying ontologies

A major part of the meta-ontology provides the vocabulary to reify an ontology. This means that we create new entities to represent the axioms and entities in an ontology. Here we will demonstrate how the reification works based on a single axiom type, the subsumption.

As shown in Section 2.2.2, the syntax for subsumptions is the following:

SubClassOf(o:Cat o:Pet)

The reification first creates a new individual to represent the axiom (m:Axiom1 in the first line), then reifies the mentioned class names (second and third lines, creating m:Cat and m:Pet), connects the reified terms to the terms in the original axiom (see Section 3.2.2 for more details on reifying terms), and connects them to the individual representing the axiom. The last line adds a property directly relating the reified sub- and superclass, which helps with formulating some analyses (e.g. as given by the AEON approach in Section 6.3).


Figure 3.2: A subsumption axiom (on the left) and its reification. Dotted lines represent instantiation, slashed lines annotations, circles individuals, and squares classes.

ClassAssertion(meta:Subsumption m:Axiom1)
ClassAssertion(meta:Simple_class_name m:Cat)
ClassAssertion(meta:Simple_class_name m:Pet)
EntityAnnotation(o:Cat Annotation(meta:reifies m:Cat))
EntityAnnotation(o:Pet Annotation(meta:reifies m:Pet))
PropertyAssertion(meta:subclass m:Axiom1 m:Cat)
PropertyAssertion(meta:superclass m:Axiom1 m:Pet)
PropertyAssertion(meta:subClassOf m:Cat m:Pet)

The namespaces used in this example are meta for the meta-ontology itself, m for the reifying ontology, and o for the ontology that is being reified. Figure 3.2 illustrates the given example. Every axiom is also explicitly connected to the ontology it is part of, and also the number of axioms is defined. This allows us to close the ontology and thus to classify it (see the classification example in Section 3.3.4). Furthermore, the ontology is also defined to be about all the terms used in the ontology.

ClassAssertion(meta:Ontology m:Ontology1)
PropertyAssertion(meta:axiom m:Ontology1 m:Axiom1)
ClassAssertion(ExactCardinality(1 meta:axiom) m:Ontology1)
PropertyAssertion(meta:about m:Ontology1 m:Cat)
PropertyAssertion(meta:about m:Ontology1 m:Pet)
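
The reification of a single subsumption follows a fixed pattern and can therefore be generated automatically. The Python sketch below emits the eight axiom-level statements of the example for SubClassOf(o:Cat o:Pet) as strings in Functional Syntax. The function name and its parameters are hypothetical, and the sketch handles only subsumptions between two class names; it is not the implementation used in the thesis.

```python
def reify_subsumption(axiom, sub, sup, source="o", reified="m"):
    """Return the reification of SubClassOf(source:sub source:sup)."""
    ax = f"{reified}:{axiom}"
    r_sub, r_sup = f"{reified}:{sub}", f"{reified}:{sup}"
    return [
        # An individual representing the axiom itself.
        f"ClassAssertion(meta:Subsumption {ax})",
        # Individuals representing the two class names.
        f"ClassAssertion(meta:Simple_class_name {r_sub})",
        f"ClassAssertion(meta:Simple_class_name {r_sup})",
        # Links between the original terms and their reifications.
        f"EntityAnnotation({source}:{sub} Annotation(meta:reifies {r_sub}))",
        f"EntityAnnotation({source}:{sup} Annotation(meta:reifies {r_sup}))",
        # Links between the axiom and the reified terms.
        f"PropertyAssertion(meta:subclass {ax} {r_sub})",
        f"PropertyAssertion(meta:superclass {ax} {r_sup})",
        # Shortcut property directly relating sub- and superclass.
        f"PropertyAssertion(meta:subClassOf {r_sub} {r_sup})",
    ]


for statement in reify_subsumption("Axiom1", "Cat", "Pet"):
    print(statement)
```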

The meta-ontology includes further axioms about the terms regarding subsumptions.

FunctionalProperty(meta:subClass)
PropertyDomain(meta:subClass meta:Subsumption)

PropertyRange(meta:subClass meta:Class)
FunctionalProperty(meta:superClass)
PropertyDomain(meta:subClass meta:Subsumption)
PropertyRange(meta:subClass meta:Class)
SubPropertyOf(PropertyChain(InverseOf(meta:subClass) meta:superClass) meta:subClassOf)

These axioms determine further formal relations between the terms of the meta-ontology. These decrease the risk of adding erroneous axioms without becoming inconsistent (i.e. the risk of adding errors that are not detected). The property meta:subClassOf only regards explicitly stated subsumption. In order to express inferred subsumption, a property meta:inferredSubClassOf exists.

3.2.2 Reifying URIs

The reified ontology does not make statements about the classes and individuals of the domain, but about the names used in the ontology. So, for example, even if the ontology claims that two terms are synonymous, the reified individuals representing these individuals are not the same, since they are different names.

SameIndividual(o:Denny o:Zdenko)
DifferentIndividuals(m:Denny m:Zdenko)

This means, o:Denny represents the author of this thesis, whereas m:Denny represents the name of the author of this thesis (in this case the URI o:Denny).

Even though it would be easy to create new URIs for every reified axiom and entity by simply creating new random URIs, it makes sense to generate the reified URIs for the entities based on their actual URIs. We use an invertible function, simply concatenating a URI-encoded (i.e. escaped) URI to a fixed namespace. This allows us to set up a Web service at the fixed namespace that returns information about the given reified URI.

3.2.3 Advantages of a meta-ontology

Throughout this thesis, we will use reified ontologies in order to express the methods succinctly and unambiguously. The following gives an overview of some of the uses the reified ontology can be put to.

• Section 3.3 shows how the meta-ontology can be used to classify the ontology itself and thus to make information about the ontology explicit.

• SPARQL queries can be used to discover the application of ontology patterns or anti-patterns (see Section 6.2 for examples).


• There exists no standard OWL query language, and SPARQL cannot be used to query for the existence of OWL axioms (see Section 6.2 for details). Based on a reified meta-ontology it is possible to use SPARQL for queries against the axiom structure of the ontology (instead of merely the RDF graph structure).

• Meta-properties and constraints on meta-properties can be directly expressed and reasoned over. The AEON approach uses the OntoClean methodology (Guarino and Welty, 2002) in order to first state the meta-properties (e.g. if a class is rigid or not) and then to check the constraints automatically (see Section 6.3 for details).

• Additional axioms can be added to check if the ontology satisfies specific constraints. For example, it is easy to check if a subsumption axiom is ever used to express the subsumption of a class by itself by adding that meta:subClassOf is irreflexive and checking the resulting ontology for consistency.
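Two of the mechanisms in this section lend themselves to a short sketch: the invertible URI mapping of Section 3.2.2 and the self-subsumption check from the last item above. The following Python is an illustration only. The namespace `http://example.org/reified/`, the function names, and the sample data are assumptions and do not come from the thesis, and the check is a plain scan over stated pairs, not the OWL consistency check that a reasoner would perform.

```python
from urllib.parse import quote, unquote

# Hypothetical fixed namespace for reified terms (not named in the text).
REIFIED_NS = "http://example.org/reified/"


def reify_uri(uri: str) -> str:
    """Escape the whole URI and append it to the fixed namespace."""
    return REIFIED_NS + quote(uri, safe="")


def original_uri(reified: str) -> str:
    """Invert reify_uri: strip the namespace, then unescape the rest."""
    if not reified.startswith(REIFIED_NS):
        raise ValueError("not a reified URI in the expected namespace")
    return unquote(reified[len(REIFIED_NS):])


def self_subsumptions(sub_class_of):
    """Return reified classes stated to be subclasses of themselves."""
    return sorted({sub for sub, sup in sub_class_of if sub == sup})


# Hypothetical example data: (subclass, superclass) pairs of entity URIs.
cat, pet = "http://example.org/zoo#Cat", "http://example.org/zoo#Pet"
stated = [(cat, pet), (pet, pet)]
reified = [(reify_uri(sub), reify_uri(sup)) for sub, sup in stated]

print(reify_uri(cat))
print([original_uri(u) for u in self_subsumptions(reified)])
```

Because the escaping step is reversible, the original entity URI can always be recovered from the reified one, which is what makes a lookup service at the fixed namespace feasible.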

3.3 Types of ontologies

3.3.1 Terminological ontology

A terminological ontology is an ontology that consists only of terminological axioms and annotations. This is introduced in order to account for the often found understanding that an ontology indeed is a set of only terminological axioms. This disagrees with the W3C definition of the term ontology, where an ontology may also include facts or even be constituted only of facts. In DL terms, a terminological ontology consists only of a TBox, i.e. terminological knowledge. In terms of the meta-ontology, we define a terminological ontology as follows:

EquivalentClasses(Terminological_ontology IntersectionOf(Ontology AllValuesFrom(axiom UnionOf(Terminological_axiom Annotation))))

This distinction is a fairly common one, even though all facts can be expressed as terminological axioms (using nominals), thus making this distinction irrelevant. We prove this by offering terminological axioms that could replace each type of fact:

Proof. A fact is either an instantiation, a relation, an attribute, or an individual (in)equality.

An instantiation is expressed as

ClassAssertion(C a)

Written as a terminological axiom, the same meaning is conveyed like this:

SubClassOf(OneOf(a) C)

Relations and attributes can be either positive or negative. Positive relations and positive attributes are expressed as:

PropertyAssertion(R a b)

As a terminological axiom it can be written as

SubClassOf(OneOf(a) HasValue(R b))

Negative relations and negative attributes are expressed as

NegativePropertyAssertion(R a b)

This can be restated the following way:

SubClassOf(OneOf(a) ComplementOf(HasValue(R b)))

An individual equality is expressed as:

SameIndividual(a b)

As a terminological axiom this can be written as:

EquivalentClasses(OneOf(a) OneOf(b))

Finally, an individual inequality is expressed as:

DifferentIndividuals(a b)

This can be transformed to:

DisjointClasses(OneOf(a) OneOf(b))

3.3.2 Knowledge base

A knowledge base is an ontology that consists only of facts and annotations. In DL terms, a knowledge base only has an ABox, i.e. contains only assertional knowledge.

EquivalentClasses(Knowledge_base
  IntersectionOf(Ontology
    AllValuesFrom(axiom UnionOf(Fact Annotation))))

A knowledge base instantiates an ontology if the facts within the knowledge base use property names and class names introduced in that ontology. Often, knowledge bases instantiate several ontologies. An exclusive ontology instantiation uses only property names and class names from a single ontology. Often such instantiating ontologies are named by the ontology, e.g. a FOAF file is an often exclusive ontology instantiation of the FOAF ontology (Brickley and Miller, 2005).

EquivalentClasses(FOAF_file
  IntersectionOf(Ontology
    AllValuesFrom(axiom UnionOf(
      Annotation
      IntersectionOf(Relation HasValue(property FOAF_property))
      IntersectionOf(Attribute HasValue(property FOAF_property))
      IntersectionOf(Instantiation HasValue(class FOAF_class))))))
EquivalentClasses(FOAF_property
  IntersectionOf(Property HasValue(InverseOf(about) FOAF)))
EquivalentClasses(FOAF_class
  IntersectionOf(Class HasValue(InverseOf(about) FOAF)))

Note that ontologies are not partitioned into terminological ontologies and knowledge bases. Many ontologies on the Web will contain both terminological axioms and facts, and thus not belong to one or the other. Ontologies that include both facts and terminological axioms are called populated ontologies. We can further classify them into populated proper ontologies or populated taxonomies, based on the types of included terminological axioms.
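The three notions introduced so far (terminological ontology, knowledge base, populated ontology) depend only on which kinds of axioms occur. As a rough sketch, assuming each axiom has already been tagged with one of the hypothetical labels `"terminological"`, `"fact"`, or `"annotation"` (the tagging itself and the function name are not part of the thesis), the definitions can be read as follows:

```python
def ontology_kinds(axiom_kinds):
    """Return the set of type names that apply to an ontology.

    axiom_kinds: iterable of "terminological", "fact", or "annotation",
    one entry per axiom (hypothetical labels used for illustration).
    """
    kinds = set(axiom_kinds)
    unknown = kinds - {"terminological", "fact", "annotation"}
    if unknown:
        raise ValueError(f"unknown axiom kinds: {sorted(unknown)}")

    result = set()
    if "fact" not in kinds:
        # Only terminological axioms and annotations occur.
        result.add("terminological ontology")
    if "terminological" not in kinds:
        # Only facts and annotations occur.
        result.add("knowledge base")
    if {"fact", "terminological"} <= kinds:
        # Both facts and terminological axioms occur.
        result.add("populated ontology")
    return result


print(ontology_kinds(["terminological", "annotation"]))
print(ontology_kinds(["fact", "terminological"]))
```

The function returns a set because the definitions overlap: an ontology that contains only annotations satisfies both the terminological ontology and the knowledge base definition.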

3.3.3 Semantic spectrum

The semantic spectrum defines one dimension of ontologies, ranging from the most simple and least expressive to the most complex and most precise ones. It was first presented at an invited panel at the AAAI 1999 in Austin, Texas, then published in (Smith and Welty, 2001) and refined in (McGuinness, 2003; Uschold and Gruninger, 2004). We aggregate the types of ontologies they report on as the following five, ordered by increasing complexity (see Figure 3.3 for a common visualization):

• catalogs / sets of IDs

• glossaries / sets of term definitions

• thesauri / sets of informal is-a relations

• formal taxonomies / sets of formal is-a relations

• proper ontologies / sets of general logical constraints

We name the most expressive type of ontologies proper ontologies rather than formal ontologies as in (Uschold and Gruninger, 2004), since we regard an ontology as being formal by definition (Gruber, 1995).

In order to appropriately evaluate an ontology, we have to first determine its type. The type of an ontology determines which evaluation methods can be useful for the ontology, and which make no sense. There are two ways to determine the type of the ontology: prescriptive and descriptive. Prescriptive determination is given by the ontology authors, i.e. they define the task that the ontology should fulfill and, based on that, the type of ontology that is required. Descriptive determination is given by examining the ontology and saying what type of ontology is actually given.

Based on the meta-ontology described in Section 3.2 we can actually classify an ontology automatically, since we can define the above types in the meta-ontology.

Figure 3.3: The semantic spectrum for ontologies. (The figure orders Catalog, Glossary, Thesaurus, Taxonomy, and Proper Ontology along a complexity axis, divided into "Without automated reasoning" and "With automated reasoning".)

A catalog is an ontology that consists only of label annotations. This means that a catalog is just a set of URIs with human readable labels.

EquivalentClasses(Catalog
  IntersectionOf(Ontology
    AllValuesFrom(axiom Label_annotation)))

In the simplified case, Label annotation is defined as an annotation instantiating the rdfs:label annotation property (Label is the reification of rdfs:label).

EquivalentClasses(Label_annotation
  IntersectionOf(Annotation
    HasValue(annotation_property Label)))

If we want to include not only instantiations of the rdfs:label property but also of its subproperties, e.g. skos:prefLabel or skos:altLabel, we need to redefine Label annotation to also include its subproperties.

EquivalentClasses(Label_annotation
  IntersectionOf(Annotation
    AllValuesFrom(annotation_property Label_property)))
EquivalentClasses(Label_property HasValue(inferredSuperProperty Label))

Due to OWL's open world semantics this definition is much harder to reason with, since the reification of the ontology we want to classify needs to include sufficient axioms to make other models impossible. We will discuss possible solutions of that in Section 3.3.4.

A glossary is an ontology that only has annotations. This way, only human readable, informal definitions of the terms can be given.

EquivalentClasses(Glossary IntersectionOf(Ontology AllValuesFrom(axiom Annotation)))

A thesaurus is an ontology that, besides annotations, also allows instantiations of classes and properties from the SKOS ontology (Miles and Bechhofer, 2009). SKOS (Simple Knowledge Organization System) is an ontology that allows the definition of thesauri with a number of predefined relations between terms, such as skos:narrower or skos:broader.

EquivalentClasses(Thesaurus
  IntersectionOf(Ontology
    AllValuesFrom(axiom UnionOf(
      Annotation
      IntersectionOf(Relation HasValue(property SKOS_property))
      IntersectionOf(Attribute HasValue(property SKOS_property))
      IntersectionOf(Instantiation HasValue(class SKOS_class))))))
EquivalentClasses(SKOS_property
  IntersectionOf(Property HasValue(InverseOf(about) SKOS)))
EquivalentClasses(SKOS_class
  IntersectionOf(Class HasValue(InverseOf(about) SKOS)))

Catalogs and glossaries do not provide the means to allow any inferences (they are simply not expressive enough). Thesauri allow a very limited number of inferences, due to domain and range axioms and inverse and transitive properties.

A taxonomy or class hierarchy is an ontology that consists only of simple subsumptions, facts, and annotations.

EquivalentClasses(Taxonomy
  IntersectionOf(Ontology
    AllValuesFrom(axiom UnionOf(Simple_subsumption
                                Fact
                                Annotation))))

A proper ontology finally allows for all possible axioms, as defined in Section 2.2.

In the semantic spectrum, each type of ontologies subsumes the simpler types, i.e.

Catalog ⊂ Glossary ⊂ Thesaurus ⊂ Taxonomy ⊂ Ontology

This means that every glossary is also a taxonomy (though obviously a degenerated taxonomy, since the depth of the class hierarchy is 0), etc.

3.3.4 Classification example

This section gives an illustrative example of how an ontology may be classified. Afterwards we discuss problems of such classification and possible solutions.

Given ontology o:

ClassAssertion(o:Human o:Socrates)
SubClassOf(o:Human o:Mortal)

The reified ontology m is the following:

ClassAssertion(Class m:Human)
ClassAssertion(Class m:Mortal)
ClassAssertion(Individual m:Socrates)
ClassAssertion(Instantiation m:a1)
PropertyAssertion(class m:a1 m:Human)
PropertyAssertion(instance m:a1 m:Socrates)
PropertyAssertion(type m:Socrates m:Human)
ClassAssertion(Subsumption m:a2)
PropertyAssertion(subclass m:a2 m:Human)
PropertyAssertion(superclass m:a2 m:Mortal)
PropertyAssertion(subClassOf m:Human m:Mortal)
ClassAssertion(Ontology m:o1)
ClassAssertion(ExactCardinality(2 meta:axiom) m:o1)
PropertyAssertion(about m:o1 m:Human)
PropertyAssertion(about m:o1 m:Mortal)
PropertyAssertion(about m:o1 m:Socrates)
DifferentIndividuals(m:Human m:Mortal m:Socrates m:o1)

Using this with the meta-ontology, a reasoner can infer that m:o1 is a Taxonomy: it is stated that there are exactly two axioms in the ontology (i.e. the ontology is closed), so it is allowed to make inferences from the universal quantifier in the definition of Taxonomy.

Many reasoners cannot deal well with cardinality constraints. KAON2 (Motik, 2006) requires a long time to classify an ontology starting with more than four axioms, whereas Pellet (Sirin et al., 2007) and FaCT++ (Tsarkov and Horrocks, 2006) start to break with ontologies having more than a few dozen axioms. Since ontologies may easily contain much bigger numbers of axioms, it may be preferable to write dedicated programs to check if an ontology is a taxonomy or not. These programs may assume certain structural conditions which the ontology reification has to adhere to. In this example, the dedicated classifier may assume that the ontology is always complete when reified and thus partially ignore the open world assumption.
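A dedicated classifier of the kind suggested above could look roughly like the following Python sketch. It assumes that the reified ontology is complete (closed) and that every axiom has already been tagged with one of a few hypothetical kind labels; the labels, the `SPECTRUM` table, and the function name are illustrative assumptions and are not taken from the thesis.

```python
# Axiom kinds permitted by each type of the semantic spectrum, from the
# simplest type to the most complex one. The labels are hypothetical tags.
SPECTRUM = [
    ("Catalog", {"label_annotation"}),
    ("Glossary", {"label_annotation", "annotation"}),
    ("Thesaurus", {"label_annotation", "annotation", "skos_fact"}),
    ("Taxonomy", {"label_annotation", "annotation", "skos_fact",
                  "fact", "simple_subsumption"}),
]


def most_specific_type(axiom_kinds):
    """Return the simplest spectrum type whose definition admits all axioms.

    Assumes the list of axioms is complete, i.e. the open world
    assumption is deliberately ignored.
    """
    kinds = set(axiom_kinds)
    for name, allowed in SPECTRUM:
        if kinds <= allowed:
            return name
    return "Proper ontology"


# The Socrates example: one instantiation (a fact) and one simple subsumption.
socrates = ["fact", "simple_subsumption"]
print(most_specific_type(socrates))
```

Since each type subsumes the simpler ones, returning the first matching entry yields the most specific type; an ontology classified as a Glossary here is also a Thesaurus and a Taxonomy.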

3.4 Limits

Ontology evaluations are conducted on several different levels:

1. Ontologies can be evaluated by themselves.

2. Ontologies can be evaluated with some context.

3. Ontologies can be evaluated within an application. This is called application based ontology evaluation (Brank et al., 2005).

4. Ontologies can be evaluated in the context of an application and a task. This is called task based ontology evaluation (Porzel and Malaka, 2004).

In this thesis we will restrict ourselves to the first two possibilities. Note that each of the above levels gains from evaluating the previous levels, i.e. every ontology evaluated within an application should have been evaluated by itself and with some context before that. Many types of errors are much more easily discovered on the first and second level than in the much more complex environment of an application or a task. The majority of this thesis deals with the first level (Chapters 4–8), whereas the second level is dealt with in Chapter 9.

Ontology-based applications most often have certain requirements regarding the applied ontologies. For example, they may require that the data within the ontology is complete (e.g. a semantic birthday reminder application may require that all persons have at least their name and birthday stated), or they may have certain structural or semantic constraints (e.g. the class hierarchy must not be deeper than five levels). Such conditions can often be stated in a way that allows the evaluation methods within this thesis to be used to ensure that the application's requirements are satisfied.
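As an illustration of such an application requirement, the following Python sketch checks the structural constraint mentioned above, a class hierarchy that is at most five levels deep. It is an assumption-laden example: depth is taken to be the number of edges in the longest chain of stated subsumptions, the limit of five and all names are hypothetical, and inferred subsumptions are not considered.

```python
def hierarchy_depth(sub_class_of):
    """Longest chain of stated subsumptions, counted in edges.

    sub_class_of: iterable of (subclass, superclass) pairs.
    Raises ValueError if the stated hierarchy contains a cycle.
    """
    parents = {}
    for sub, sup in sub_class_of:
        parents.setdefault(sub, set()).add(sup)

    cache = {}

    def depth(cls, path):
        if cls in path:
            raise ValueError(f"circular class hierarchy at {cls}")
        if cls not in cache:
            cache[cls] = max(
                (1 + depth(p, path | {cls}) for p in parents.get(cls, ())),
                default=0,
            )
        return cache[cls]

    return max((depth(c, frozenset()) for c in parents), default=0)


def satisfies_depth_limit(sub_class_of, limit=5):
    """True if the stated hierarchy is not deeper than the given limit."""
    return hierarchy_depth(sub_class_of) <= limit


# Hypothetical stated subsumptions.
pairs = [("Cat", "Mammal"), ("Mammal", "Animal")]
print(hierarchy_depth(pairs), satisfies_depth_limit(pairs))
```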

Asunción Gómez-Pérez separates ontology evaluation into two tasks: ontology verification and ontology validation (Gómez-Pérez, 2004).

Ontology verification is the task of evaluating if the ontology has been built correctly. Verification checks the encoding of the specification. Errors such as circular class hierarchies, redundant axioms, inconsistent naming schemes etc. are detected by ontology verification. Verification confirms that the ontology has been built according to certain specified ontology quality criteria.

Ontology validation is the task of evaluating if the correct ontology has been built. Validation refers to whether the meaning of the definitions matches with the conceptualization the ontology is meant to specify. The goal is to show that the world model is compliant with the formal models.

Since this thesis mainly concentrates on methods that can be highly automated, we will limit ourselves to ontology verification. Chapter 11 will discuss further methods that lean more towards ontology validation. Often such ontology validation methods assume a simpler understanding of ontologies, i.e. they assume that an ontology is more a formal description of a domain than the formal specification of a shared conceptualization. Section 3.5 will discuss in detail the ramifications of this difference.

Like applications, ontologies are used in many different domains. Whereas it is easily possible, and often sensible, to create domain-specific ontology evaluation methods, in this thesis we will restrict ourselves to domain-independent evaluations. This means that even though the examples presented throughout the thesis will use some domain, the methods will always be usable on any other domain as well.

To summarize, this thesis will concentrate on the domain-, task-, and application-independent verification of Web ontologies. The framework presented in this chapter, and thus the methods described throughout the following chapters, have to be regarded in light of this limitation. Also, the methods presented in Part II are not a complete list of possible evaluation methods. They should rather be regarded as a list of exemplary methods in order to illustrate the usage of the framework presented here. Further methods can easily be created and combined within the framework in order to meet the needs of the evaluator.

3.5 Conceptualizations

This section describes and formalizes how agents can achieve a shared conceptualization. The formalization is reminiscent of epistemic logics, but differs from them in two main points: first, epistemic logics are about propositions, whereas here we speak mainly about conceptualizations. Second, epistemic logics assume that propositions themselves are shareable, whereas here we assume conceptualizations to be private to an agent. In this section we will resolve the discrepancy between shared and private conceptualizations.

A rational agent has thoughts and perceptions. In order to express, order, and sort its thoughts and perceptions the agent creates, modifies, and uses conceptualizations. A conceptualization is the agent's mental model of the domain, its understanding of the domain. We define $C_X(d)$ to be the conceptualization that agent $X$ has of domain $d$, $C_X$ being a function $C_X : D \to \mathcal{C}_X$ with $D$ being the set of all domains and $\mathcal{C}_X$ being the set of all conceptualizations of agent $X$.

A shared conceptualization represents the commonalities of two (or more) conceptualizations of different agents. We define the operator $\cap$ for creating a shared conceptualization, but note that this should not be understood as the set-theoretic intersection operator, since conceptualizations probably are not sets. It is rather an operation done internally by the agent on two or more of its internal conceptualizations to derive the commonalities.

Extracting a shared conceptualization can only be done with conceptualizations of the same agent, i.e. $C_X(d) \cap C_Y(d)$ is undefined. This is because conceptualizations are always private (i.e. conceptualizations have no reality external to their agents and thus two agents cannot have the same conceptualization). So in order to intersect the conceptualizations of two different agents, an agent needs to conceptualize the other agent's conceptualization (as stated above, anything can be conceptualized, in particular another conceptualization).

By interacting and communicating with the other agents, each agent builds conceptualizations of the other agents. These conceptualized agents again are conceptualized with their own private conceptualizations. $C_X(C_Y(d))$ is the conceptualization $X$ has of the conceptualization $Y$ has of a domain $d$ (note that $C_X(C_Y(d))$ tells us more about $C_X(Y)$ and thus about $X$ than about $C_Y(d)$ or about $Y$). For simplicity, we assume that each agent's conceptualization of its own conceptualization is perfect, i.e. $C_X(C_X(d)) = C_X(d)$ (this is similar to the KK axiom in epistemic logics (Fagin et al., 2003)). Figure 3.4 illustrates two agents and their conceptualization of a domain (in the example, a tree) and their conceptualizations of each other and their respective conceptualizations. Note that the agents' conceptualizations do not overlap.

Figure 3.4: Two agents X and Y and their conceptualizations of domain d (the tree), each other, and their respective conceptualizations.

An agent can combine its conceptualizations to arrive at a shared conceptualization, i.e. $C_X(d) \cap C_X(C_Y(d))$ results in what $X$ considers to be the common understanding of $d$ between itself and $Y$. Regarding a whole group of $n$ agents we define the (private conceptualization of the) shared conceptualization $SC_X$ as follows (let $C_X$ be one of $C_{Y_i}$):

$$SC_X(d) = \bigcap_{i=1}^{n} C_X(C_{Y_i}(d))$$

Furthermore, $X$ assumes that $\forall i : C_X(SC_{Y_i}(d)) = SC_X(d)$, i.e. $X$ assumes that everybody in the group has the same shared conceptualization. This is true for all members of the group, i.e. $\forall i, j : C_{X_i}(SC_{Y_j}(d)) = SC_{X_i}(d)$. In Figure 3.5 this is visualized by $C_X(X)$ and $C_X(Y)$ having the same shared conceptualization $SC_X(d)$.

An ontology $O$ is the specification (defined as the function $S$) of a conceptualization $C$, i.e. it is the result of the externalization of a conceptualization, $O = S(C)$. For our discussion it is not important how $S$ was performed (e.g. if it was created collaboratively, or with the help of (semi-)automatic agents, or in any other way). The resulting ontology is a set of axioms that constrain the interpretations of the ontology. This has two aspects, depending on the interpreting agent: a formal agent (e.g. an application using a reasoning engine to apply the formal semantics) will have the possible logical models constrained, and based on these models it will be able to answer queries; a rational agent (e.g. a human understanding the ontology) is constrained in the possible mappings of terms of the ontology to elements of its own internal conceptualization (possibly changing or creating the conceptualization during the mapping). Figure 3.5 illustrates an agent $Z$ who internalizes ontology $O$ and thus builds its own conceptualization $C_Z$ of domain $d$ from its understanding of the ontology (i.e. $C_Z(d)$ is not built from the perceptions of $d$ by $Z$ but from $O$).

To give an example: if the property married is stated to be a functional property, a formal agent will only allow models to interpret the ontology where the set $R$ denoted by married fulfills the condition $(x, y) \in R \land (x, z) \in R \rightarrow y = z$. A rational agent in turn will map the term married to a monogamous concept of marriage, and will not use the term to refer to polygamous marriages.
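The condition for a functional property can be made concrete with a small Python sketch over an explicitly given relation. This is only an illustration under an added assumption: distinct names are taken to denote distinct individuals, so two different objects for the same subject count as a violation. An OWL reasoner, which makes no such assumption, would instead infer that the two objects are equal. The function name and the sample pairs are hypothetical.

```python
def functional_violations(relation):
    """Return the subjects that are related to more than one object.

    relation: iterable of (x, y) pairs, the set R denoted by the property.
    An empty result means (x, y) in R and (x, z) in R imply y == z.
    """
    objects = {}
    for x, y in relation:
        objects.setdefault(x, set()).add(y)
    return {x: ys for x, ys in objects.items() if len(ys) > 1}


# Hypothetical interpretations of the property "married".
monogamous = {("anna", "ben"), ("ben", "anna")}
polygamous = {("anna", "ben"), ("anna", "carl")}

print(functional_violations(monogamous))
print(functional_violations(polygamous))
```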


Figure 3.5: Three agents and an ontology. Y's conceptualization is omitted for space reasons. Z internalizes ontology O, thus connecting it to or creating its own conceptualization CZ of domain d, in this case, the tree.

In this case, the rational agent already had a concept that just needed to be mapped to a term from the ontology. In other cases the agent may need to first create the concept before being able to map to it (for example, let a wide rectangle be defined as a rectangle with the longer side being at least three times the size of the short side; in this case readers create a new concept in their mind, and then map the term wide rectangle to it).

So after the creation of the ontology $O$, each member $Y$ of the group can create a conceptualization of $O$, i.e. internalize it again. So a group member $Y_i$ gets the internal conceptualization $C_{Y_i}(O)$, and then compares it to its own understanding of the shared specification $SC_{Y_i}(d)$. Ideally, all members of the group will agree on the ontology, i.e.

$$\forall i : C_{Y_i}(O) = SC_{Y_i}(d)$$

Note that creating an ontology is not the simple, straightforward process that is presented in this section. Most of the conceptualizations will be in constant flux during the process. The communication in the group during the creation of the specification may change each member's own conceptualization of the domain, each member's conceptualization of each other member's conceptualization of the domain, each member's shared conceptualization of the domain, and in some cases even the domain itself.

3.6 Criteria

Ontology evaluation can regard a number of different criteria. In this section we will list criteria from literature, aggregate them to form a coherent and succinct set, and discuss their applicability and relevance for Web ontologies. The goal of an evaluation is not to perform equally well for all of these criteria; some of the criteria are even contradicting, such as completeness and conciseness. It is therefore the first task of the evaluator to choose the criteria relevant for the given evaluation and then the proper evaluation methods to assess how well the ontology meets these criteria.

We selected five important papers from literature, where each defined their own set of ontology quality criteria or principles for good ontologies (Gómez-Pérez, 2004; Gruber, 1995; Grüninger and Fox, 1995; Gangemi et al., 2005; Obrst et al., 2007). These quality criteria need to be regarded as desiderata, goals to guide the creation and evaluation of the ontology. None of them can be directly measured, and most of them cannot be perfectly achieved.

Asunción Gómez-Pérez lists the following criteria (Gómez-Pérez, 2004):

• Consistency: capturing both the logical consistency (i.e. no contradictions can be inferred) and the consistency between the formal and the informal descriptions (i.e. the comments and the formal descriptions match)

• Completeness: All the knowledge that is expected to be in the ontology is either explicitly stated or can be inferred from the ontology.

• Conciseness: if the ontology is free of any unnecessary, useless, or redundant axioms.

• Expandability: refers to the required effort to add new definitions without altering the already stated semantics.

• Sensitiveness: relates to how small changes in an axiom alter the semantics of the ontology.

Thomas Gruber defines the following criteria (Gruber, 1995):

• Clarity: An ontology should effectively communicate the intended meaning of defined terms. Definitions should be objective. When a definition can be stated in logical axioms, it should be. Where possible, a definition is preferred over a description. All entities should be documented with natural language.

• Coherence: Inferred statements should be correct. At the least, the defining axioms should be logically consistent. Also, the natural language documentation should be coherent with the formal statements.

• Extendibility: An ontology should offer a conceptual foundation for a range of anticipated tasks, and the representation should be crafted so that one can extend and specialize the ontology monotonically. New terms can be introduced without the need to revise existing axioms.

• Minimal encoding bias: An encoding bias results when representation choices are made purely for the convenience of notation or implementation. Encoding bias should be minimized, because knowledge-sharing agents may be implemented with different libraries and representation styles.

• Minimal ontological commitment: The ontology should specify the weakest theory (i.e. allowing the most models) and defining only those terms that are essential to the communication of knowledge consistent with that theory.

Grüninger and Fox define a single criterion, competency (or, in extension, completeness if all the required competencies are fulfilled). In order to measure competency they introduce informal and formal competency questions (Grüninger and Fox, 1995).

Obrst et al. name the following criteria (Obrst et al., 2007):

• coverage of a particular domain, and the richness, complexity, and granularity of that coverage

• intelligibility to human users and curators

• validity and soundness

• evaluation against the specific use cases, scenarios, requirements, applications, and data sources the ontology was developed to address

• consistency

• completeness

• the sort of inferences for which they can be used

• adaptability and reusability for wider purposes

• mappability to upper level or other ontologies

Gangemi et al. define the following criteria (Gangemi et al., 2005):

• Cognitive ergonomics: this principle prospects an ontology that can be easily understood, manipulated, and exploited.

• Transparency (explicitness of organizing principles): this principle prospects an ontology that can be analyzed in detail, with a rich formalization of conceptual choices and motivations.

• Computational integrity and efficiency: this principle prospects an ontology that can be successfully/easily processed by a reasoner (inference engine, classifier, etc.).

• Meta-level integrity: this principle prospects an ontology that respects certain ordering criteria that are assumed as quality indicators.

• Flexibility (context-boundedness): this principle prospects an ontology that can be easily adapted to multiple views.

• Compliance to expertise: this principle prospects an ontology that is compliant to one or more users.

• Compliance to procedures for extension, integration, adaptation, etc.: this principle prospects an ontology that can be easily understood and manipulated for reuse and adaptation.

• Generic accessibility (computational as well as commercial): this principle prospects an ontology that can be easily accessed for effective application.

• Organizational fitness: this principle prospects an ontology that can be easily deployed within an organization, and that has a good coverage for that context.

We have analyzed the given criteria and summarized them into a concise set. Eight criteria result from this literature survey: accuracy, adaptability, clarity, completeness, computational efficiency, conciseness, consistency, and organizational fitness. All criteria given in the literature are subsumed by this set. In the following, we define the criteria and how they map to the given criteria described above.

We have ignored evaluation criteria that deal with the underlying language used for describing the ontology instead of evaluating the ontology itself. Before OWL became widespread, a plethora of knowledge representation languages were actively used. For some ontologies, specific ontology languages were developed in order to specify that one ontology. Today, OWL is used for the vast majority of ontologies. Therefore we disregard criteria that are based on the ontology language, such as expressivity, decidability, complexity, etc.

One example is the criterion expandability from (Gómez-Pérez, 2004). It is defined as the required effort to add new definitions without altering the already stated semantics. Since OWL is a monotonic language it is not possible to retract any inferences that have already been made. Thus the monotonicity of OWL guarantees a certain kind of expandability for all ontologies in OWL.

A complete list of methods given in this thesis is available in the appendix.

3.6.1 Accuracy

Accuracy is a criterion that states whether the axioms of the ontology comply with the knowledge of the stakeholders about the domain. A higher accuracy comes from correct definitions and descriptions of classes, properties, and individuals. Correctness in this case may mean compliance with defined “gold standards”, be it other data sources, conceptualizations, or even reality ((Ceusters and Smith, 2006) introduces an approach to use reality as a benchmark, i.e. to check whether the terms of the ontology capture the intended portions of reality). The axioms should constrain the possible interpretations of an ontology so that the resulting models are compatible with the conceptualizations of the users.

For example, all inferences of an ontology should be true. When stating that the foaf:knows property is a superproperty of a married property, this axiom is only accurate if indeed all married couples know their respective spouses. If we find counterexamples (for example, arranged prenatal marriages), then the ontology is inaccurate.

The following methods in this thesis can be used to measure this criterion: Method 3: Hash vs slash (Page 71), Method 13: Querying for anti-patterns (Page 114), Method 14: Analysis and Examples (Page 125), Method 18: Class / relation ratio (Page 146), Method 19: Formalized competency questions (Page 154), Method 20: Formalized competency questions (Page 155), Method 21: Affirming derived knowledge (Page 157), Method 22: Expressive consistency checks (Page 160), and Method 23: Consistency checking with rules (Page 161).
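The foaf:knows example can be illustrated with a small check. The following is only a minimal sketch under stated assumptions: properties are modelled as plain sets of (subject, object) pairs, and the function name `subproperty_counterexamples` as well as the sample data are hypothetical and not part of the thesis. Every pair asserted for the subproperty yields an inferred statement for the superproperty; the sketch lists those inferences that a reference data set (the “gold standard”) does not confirm.

```python
def subproperty_counterexamples(sub_pairs, gold_super_pairs):
    """Return the pairs asserted for the subproperty whose inferred
    superproperty statement is not confirmed by the gold standard."""
    return sorted(set(sub_pairs) - set(gold_super_pairs))


# Hypothetical data: who is married to whom, and who verifiably knows whom.
married = {("alice", "bob"), ("carol", "dave")}
gold_knows = {("alice", "bob"), ("alice", "carol")}

# Each remaining pair is a counterexample to "married implies knows".
counterexamples = subproperty_counterexamples(married, gold_knows)
print(counterexamples)
```

An empty result does not prove accuracy; it only means that the available reference data contains no counterexample.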

3.6.2 Adaptability

Adaptability measures how far the ontology anticipates its uses. An ontology should offer the conceptual foundation for a range of anticipated tasks (ideally, on the Web, it should also offer the foundation for tasks not anticipated before). It should be possible to extend and specialize the ontology monotonically, i.e. without the need to remove axioms (note that in OWL, semantic monotonicity is given by syntactic monotonicity, i.e. in order to retract inferences, explicitly stated axioms need to be retracted). An ontology should react predictably and intuitively to small changes in the axioms. It should allow for methodologies for extension, integration, and adaptation, i.e. include required meta-data. New tools and unexpected situations should be able to use the ontology.

For example, many terms of the FOAF ontology (Brickley and Miller, 2005) are often used to describe contact details of persons. FOAF was originally designed to describe social networks, but its vocabulary also allows formalizing address books of all kinds.

The following methods in this thesis can be used to measure this criterion: Method 6: URI declarations and punning (Page 75), Method 10: Blank nodes (Page 82), Method 13: Querying for anti-patterns (Page 114), Method 15: Stability (Page 140), Method 17: Maximum depth of the taxonomy (Page 145), Method 19: Formalized competency questions (Page 154), Method 21: Affirming derived knowledge (Page 157), Method 22: Expressive consistency checks (Page 160), and Method 23: Consistency checking with rules (Page 161).

3.6.3 Clarity

Clarity measures how effectively the ontology communicates the intended meaning of the defined terms. Definitions should be objective and independent of the context. Names of elements should be understandable and unambiguous. An ontology should use definitions instead of descriptions for classes. Entities should be documented sufficiently and be fully labeled in all necessary languages. Complex axioms should be documented. Representation choices should not be made for the convenience of notation or implementation, i.e. the encoding bias should be minimized.

For example, an ontology may choose to use URIs such as ex:a734 or ex:735 to identify its elements (and may even omit the labels). In this case, users of the ontology need to regard the whole context of the elements in order to find a suitable mapping to their own conceptualizations. Instead, the URIs could already include hints to what they mean, such as ex:Jaguar or ex:Lion.

The following methods in this thesis can be used to measure this criterion: Method 1: Linked data (Page 67), Method 2: Linked data (Page 69), Method 3: Hash vs slash (Page 71), Method 4: Opaqueness of URIs (Page 73), Method 6: URI declarations and punning (Page 75), Method 7: Typed literals and datatypes (Page 78), Method 8: Language tags (Page 79), Method 9: Labels and comments (Page 81), Method 14: Analysis and Examples (Page 125), and Method 18: Class / relation ratio (Page 146).

3.6.4 Completeness

Completeness measures if the domain of interest is appropriately covered. All questions the ontology should be able to answer can be answered. There are different aspects of completeness: completeness with regards to the language (is everything stated that could be stated using the given language?), completeness with regards to the domain (are all individuals present, are all relevant concepts captured?), completeness with regards to the application's requirements (is all data that is needed present?), etc. Completeness also covers the granularity and richness of the ontology.

For example, an ontology to describe the nationalities of all members of a group should provide the list of all relevant countries. Such closed sets in particular (like countries, states in countries, members of a group) can often be provided as an external ontology by an authority to link to, and thus promise completeness.

The following methods in this thesis can be used to measure this criterion: Method 3: Hash vs slash (Page 71), Method 6: URI declarations and punning (Page 75), Method 7: Typed literals and datatypes (Page 78), Method 9: Labels and comments (Page 81), Method 10: Blank nodes (Page 82), Method 11: XML validation (Page 86), Method 12: Structural metrics in practice (Page 101), Method 15: Stability (Page 140), Method 16: Language completeness (Page 141), Method 17: Maximum depth of the taxonomy (Page 145), and Method 19: Formalized competency questions (Page 154).
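Completeness with regard to a closed set, such as the list of countries, can be checked mechanically once an authoritative list is available. The sketch below is illustrative only: the function name `missing_from_closed_set` and both country sets are made-up assumptions, and a real check would read the authoritative list from the external ontology it links to.

```python
def missing_from_closed_set(authoritative, provided):
    """Return the members of the authoritative closed set that the
    ontology does not provide, sorted for stable reporting."""
    return sorted(set(authoritative) - set(provided))


# Hypothetical data: an authority's list and the countries found in the ontology.
authoritative_countries = {"Croatia", "Germany", "Italy"}
ontology_countries = {"Germany", "Italy"}

missing = missing_from_closed_set(authoritative_countries, ontology_countries)
print(missing)
```

An empty list indicates completeness only relative to the chosen authority, not with regard to the language or the requirements of an application.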

3.6.5 Computational efficiency

Computational efficiency measures the ability of the used tools to work with the ontology, in particular the speed that reasoners need to fulfill the required tasks, be it query answering, classification, or consistency checking. Some types of axioms may cause problems for certain reasoners. The size of the ontology also affects the efficiency of working with the ontology.

For example, using certain types of axioms will increase the reasoning complexity. But more important than theoretical complexity is the actual efficiency of the implementation used in a certain context. For example, it is known that number restrictions may severely hamper the efficiency of the KAON2 reasoner (Motik, 2006), and should thus be avoided when that system is used.

The following methods in this thesis can be used to measure this criterion: Method 6: URI declarations and punning (Page 75), Method 7: Typed literals and datatypes (Page 78), Method 10: Blank nodes (Page 82), Method 12: Structural metrics in practice (Page 101), and Method 16: Language completeness (Page 141).

3.6.6 Conciseness

Conciseness is the criterion that states if the ontology includes irrelevant elements with regards to the domain to be covered (e.g. an ontology about books including axioms about African lions) or redundant representations of the semantics. An ontology should impose a minimal ontological commitment, i.e. specify the weakest theory possible. Only essential terms should be defined. The ontology’s underlying assumptions about the wider domain (especially about reality) should be as weak as possible in order to allow the reuse within and communication between stakeholders that commit to different theories.

For example, an ontology about human resource department organization may take a naïve view on what a human actually is. It is not required to state if a human has a soul or not, if humans are the result of evolution or created directly by God, or when a human life starts or ends. The ontology would remain silent on all these issues, and thus allows both creationists and evolutionists to use it in order to make statements about which department has hired whom, and later exchange and reuse that data.

The following methods in this thesis can be used to measure this criterion: Method 5: URI reuse (Page 74), Method 10: Blank nodes (Page 82), Method 15: Stability (Page 140), Method 17: Maximum depth of the taxonomy (Page 145), Method 18: Class / relation ratio (Page 146), and Method 20: Formalized competency questions (Page 155).

3.6.7 Consistency

Consistency describes that the ontology does not include or allow for any contradictions. Whereas accuracy states the compliance of the ontology with an external source, consistency states that the ontology itself can be interpreted at all. Logical consistency is just one part of it; the formal and informal descriptions in the ontology should also be consistent, i.e. the documentation and comments should be aligned with the axioms. Further ordering principles can be defined that the ontology has to be consistent with, such as the OntoClean constraints on the taxonomy (Guarino and Welty, 2002).

An example for a non-logical inconsistency is the description of the element ex:Jaguar being “The Jaguar is a feral cat living in the jungle.”, but having a logical axiom ClassAssertion(ex:Car_manufacturer ex:Jaguar). Such discrepancies are often the result of distributed ontology engineering or of badly implemented change management procedures in ontology maintenance.

Note that within this thesis we will deal with logical consistency and coherence only superficially. There is an active research community in the area of ontology debugging that covers discovering, explaining, and repairing errors that lead to inconsistency and incoherence, see for example (Parsia et al., 2005; Lam, 2007; Haase and Qi, 2007).

The following methods in this thesis can be used to measure this criterion: Method 3: Hash vs slash (Page 71), Method 4: Opaqueness of URIs (Page 73), Method 5: URI reuse (Page 74), Method 9: Labels and comments (Page 81), Method 12: Structural metrics in practice (Page 101), Method 13: Querying for anti-patterns (Page 114), Method 14: Analysis and Examples (Page 125), Method 16: Language completeness (Page 141), Method 21: Affirming derived knowledge (Page 157), Method 22: Expressive consistency checks (Page 160), and Method 23: Consistency checking with rules (Page 161).

3.6.8 Organizational fitness

Organizational fitness aggregates several criteria that decide how easily an ontology can be deployed within an organization. Tools, libraries, data sources, and other ontologies that are used constrain the ontology, and the ontology should fulfill these constraints. Ontologies are often specified using an ontology engineering methodology or by using specific data sets. The ontology metadata could describe the applied methodologies, tools, and data sources, and the organization. Such metadata can be used by the organization to decide if an ontology should be applied or not.

For example, an organization may decide that all ontologies used have to align to the DOLCE upper level ontology (Gangemi et al., 2002). This will help the organization to align the ontologies and thus reduce costs when integrating data from different sources.

The following methods in this thesis can be used to measure this criterion: Method 1: Linked data (Page 67), Method 2: Linked data (Page 69), Method 3: Hash vs slash (Page 71), Method 4: Opaqueness of URIs (Page 73), Method 5: URI reuse (Page 74), Method 8: Language tags (Page 79), Method 9: Labels and comments (Page 81), Method 11: XML validation (Page 86), and Method 19: Formalized competency questions (Page 154).

3.7 Methods

Evaluation methods either describe procedures or specify exactly the results of such procedures in order to gain information about an ontology, i.e. an ontology description. An evaluation method assesses specific features or qualities of an ontology or makes them explicit. The procedures and result specifications given in Part II are not meant to be implemented literally. Often such a literal or naïve implementation would lead to an unacceptably slow runtime, especially with mid-sized or big ontologies. Many of the methods in this thesis remain for now without an efficient implementation.

The relationship between criteria and methods is complex: criteria provide justifications for the methods, whereas the result of a method will provide an indicator for how well one or more criteria are met. Most methods provide indicators for more than one criterion, therefore criteria are a bad choice to structure evaluation methods.

A number of the methods define measures and metrics and also offer some upper or lower bounds for these metrics. Note that these bounds are not meant to be strict, stating that any ontology not within the bounds is bad. There are often perfectly valid reasons for not meeting those limits. These numbers should also not lead to the implementation of automatic fixes that change an ontology so that it abides by the given limits but nevertheless decrease the ontology’s overall quality. The numbers given in the method descriptions are chosen based on evaluating a few thousand ontologies in order to discover viable margins for these values. In case a certain measure or metric goes well beyond the proposed value but the ontology author or evaluator has good reasons, they should feel free to ignore that metric or measure or, better, explain in a rationale why it is not applicable in the given case.

The Semantic Web is still in its infancy. It is to be expected that as we gather more experience with engineering and sharing ontologies, we will find better values for many of these bounds. It is also expected that further methods will be introduced and existing ones may become deprecated. The framework described here allows accommodating such changes. The set of methods is flexible and should be chosen based on the needs of the given evaluation.

Part II of this thesis describes evaluation methods. In order to give some structure for the description of the methods, the following section introduces different aspects of an ontology. These aspects provide the structure for Part II.

3.8 Aspects

An ontology is a complex, multi-layered information resource. In this section we will identify different aspects that are amenable to the automatic, domain- and task-independent verification of an ontology. Based on the evaluations of the different aspects, evaluators can then integrate the different evaluation results in order to achieve an aggregated, qualitative ontology evaluation. For each aspect we show evaluation methods within the following chapters.

Each aspect of an ontology that can be evaluated must represent a degree of freedom (if there is no degree of freedom, there can be no evaluation since it is the only choice). So each aspect describes some choices that have been made during the design of the ontology. Some tools do not offer a degree of choice on certain aspects. In such cases evaluation methods for this aspect do not lead to useful insights. In turn, these tools should be evaluated in order to result in the best possible choice for those fixed aspects.

• Vocabulary. The vocabulary of an ontology is the set of all names in that ontology. Names can be URI references or literals, i.e. a value with a datatype or a language identifier. This aspect deals with the different choices with regards to the used URIs or literals (Chapter 4).

• Syntax. Web ontologies can be described in a number of different surface syntaxes. Often the syntactic description within a certain syntax can differ widely (Chapter 5).

• Structure. A Web ontology can be described by an RDF graph. The structure of an ontology is this graph. The structure can vary highly even when describing semantically the same ontology. The explicitly given graph is evaluated by this aspect (Chapter 6).

• Semantics. A consistent ontology is interpreted by a non-empty, usually infinite set of possible models. The semantics of an ontology are the common characteristics of all these models. This aspect is about the formal meaning of the ontology (Chapter 7).

• Representation. This aspect captures the relation between the structure and the semantics. Representational aspects are usually evaluated by comparing metrics calculated on the RDF graph with features of the possible models as specified by the ontology (Chapter 8).

• Context. This aspect is about the features of the ontology when compared with other artifacts in its environment, which may be, e.g. a data source that the ontology describes, a different representation of the data within the ontology, or formalized requirements for the ontology in form of competency questions or additional semantic constraints (Chapter 9).

Note that in this thesis we assume that logical consistency or coherence of the ontology is given, i.e. that any inconsistencies or incoherences have been previously resolved using other methods. There is a wide field of work discussing these logical properties, and also well-developed and active research in debugging inconsistency and incoherence, e.g. (Parsia et al., 2005; Lam, 2007; Haase and Qi, 2007). Ontologies are inconsistent if they do not allow any model to fulfill the axioms of the ontology. Incoherent ontologies have classes with a necessarily empty intension (Haase and Qi, 2007). Regarding the evaluation aspects, note that the vocabulary, syntax, and structure of the ontology can be evaluated even when dealing with an inconsistent ontology. This also holds true for some parts of the context. But semantic aspects – and thus also representational and some contextual aspects – can not be evaluated if the ontology does not have any formal models.

Part II

Aspects

4 Vocabulary

5 Syntax

6 Structure

7 Semantics

8 Representation

9 Context

Chapter 4

Vocabulary

O, be some other name! What’s in a name? that which we call a rose By any other name would smell as sweet

(William Shakespeare, 1564–1616, Romeo and Juliet (Shakespeare, 1597))

Evaluating the vocabulary aspect of an ontology means to evaluate the names used in the ontology. In this chapter we discuss methods for evaluating the vocabulary of an ontology, and provide a comparison for some of the values based on a large corpus of Web ontologies.

The vocabulary of an ontology is the set of all names used in it. Names can be either URIs or literals. The set of all URIs of an ontology is called the signature of the ontology (and is thus the subset of the vocabulary without the literals). URIs are discussed in Section 4.1. Literals are names that are mapped to a concrete data value, i.e. instead of using a URI to identify an external entity, literals can be directly interpreted. Literals are presented in Section 4.2. Finally, we will also discuss blank nodes, i.e. unnamed entities within ontologies (Section 4.3).

4.1 URI references

Most names in ontologies are URI references (Uniform Resource Identifier, (Berners-Lee et al., 2005)). URIs are more generic forms of URLs (Uniform Resource Locator, (Berners-Lee et al., 1994)). Unlike URLs, URI references are not limited to identifying entities that have network locations, or use other access mechanisms available to the computer. They can be used to identify anything, from a person to an abstract idea to a simple information resource on the Web (Jacobs and Walsh, 2004).

A URI reference should identify one specific resource, i.e. the same URI reference should not be used to identify several distinct resources. A URI reference may be used to identify a collection of resources, and this is not a contradiction to the previous sentence: in this case the identified resource is the collection of resources, and thus a resource of its own. Classes and properties in OWL ontologies are also resources, and thus are identified by a URI reference.

A particular type of resources are information resources. Information resources are resources that consist of information, i.e. the digital representation of the resource captures the resource completely. This means that an information resource can be copied without loss, and it can be downloaded via the Internet. Therefore information resources can be located and retrieved with the use of a URL. An example of an information resource is the text of Shakespeare’s “Romeo & Juliet” (from which this chapter’s introductory quote is taken), which may be referenced, resolved, and downloaded via its URL

http://www.gutenberg.org/dirs/etext97/1ws1610.txt

Non-information resources can not be downloaded via the Internet. There may be metadata about non-information resources available, describing the resource. An example is the book “Romeo & Juliet”: the book itself can not be downloaded via the Internet (in contrast to its content). There may be metadata stating e.g. the weight or the price of the book. In order to state such metadata we need to be able to identify the book, e.g. using its ISBN number (ISO 2108, 2005).
In this case we can not use a URL: since the resource is not an information resource, it can not be located on the Web, and thus can not be accessed with a URL. Nevertheless it may (and should) have a URI in order to identify the resource. Non-information resources and information resources are disjoint classes (i.e. no information resource can at the same time be a non-information resource and vice versa). A further formalization of information resources can be found e.g. in the “Functional Requirements for Bibliographic Records” ontology FRBR (Tillett, 2005), widely used in the bibliographic domain, or in the DOLCE-based “Ontology of Information Objects” (Guarino, 2006).

4.1.1 Linked data

URI references are strings that start with a protocol. If the protocol is known and implemented by the ontology-based application, then the application may resolve the URI, i.e. use the URI according to the protocol in order to find more information about the identified resource. In case the URI is a URL, the identified information resource can be accessed and downloaded by using the protocol.
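Inspecting the protocol of each URI reference is straightforward to automate. The following is a minimal sketch, not the implementation used for the thesis: the function name `check_protocols`, the default set of allowed protocols, and the sample URIs are assumptions chosen for illustration. It counts the schemes found and collects every URI whose scheme is not in the allowed set, which also surfaces mistyped schemes and QNames that were misread as URIs.

```python
from collections import Counter
from urllib.parse import urlsplit


def check_protocols(uris, allowed=("http",)):
    """Count the schemes of the given URIs and collect the URIs whose
    scheme is not in the set of allowed protocols."""
    counts = Counter()
    rejected = []
    for uri in uris:
        scheme = urlsplit(uri).scheme.lower()
        counts[scheme] += 1
        if scheme not in allowed:
            rejected.append(uri)
    return counts, rejected


# Hypothetical sample of names taken from an ontology.
uris = [
    "http://xmlns.com/foaf/0.1/knows",
    "mailto:someone@example.org",
    "hhpt://example.org/ontology#joe",  # mistyped scheme
    "xs:string",                        # QName misread as a URI
]
counts, rejected = check_protocols(uris)
print(counts)
print(rejected)
```

The evaluator would then inspect the rejected URIs and decide for each one whether the protocol is justified or an error.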

URI references consist of a URI with an optional fragment identifier. Most URI references in Web ontologies are indeed using a protocol that can be resolved by the machine in order to fetch further information about the given URI reference. Most commonly this is achieved by using the HyperText Transport Protocol HTTP (Fielding et al., 1999). We have examined the Watson corpus (see Section 11.3) to figure out the usage of protocols on the Web. The Watson corpus contains 108,085,656 URIs. Only 491,710 (0.45%) of them are URIs not using the HTTP protocol.

Other prominent protocols besides HTTP are file (37,023 times; for local files), mailto (22,971 times; for email addresses), mid (13,448 times; for emails), irc (3,260 times; for internet relay chat channels and user ids), ftp (1,716 times; for the file transfer protocol), tel (703 times; for telephone numbers), or https (186 times; for secure HTTP connections). Sometimes these protocols are just mistyped (such as hhpt (272 times)).

Sometimes, QNames (see Section 5.2) can mistakenly be interpreted as a URI, especially in XML serialization (Jacobs and Walsh, 2004). The namespace prefix will then be interpreted as the protocol scheme. We discovered that this makes up a good deal of the non-HTTP protocols: the two most prominent non-HTTP schemes were UnitOfAssessment (46,691 times, or 9.5% of all non-HTTP URIs) and Institute (42,331 times, or 8.6%). Both of them are meant to represent namespace prefixes in their ontologies, not URI schemes. These are errors in the ontology that can be easily discovered using Method 1. Further examples for namespace prefixes misinterpreted as protocol schemes include xs (19,927 times; used for XML schema datatypes), rdf (7 times), rdfs (8 times), and owl (84 times).

Method 1 (Check used protocols)

All URIs in the ontology are checked to be well-formed URIs. The evaluator has to choose a set of allowed protocols for the evaluation task. The usage of any protocol other than HTTP should be explained. All URIs in the ontologies have to use one of the allowed protocols.

Resolving an HTTP URI reference will return an HTTP response code and usually some content related to the URI reference. Certain HTTP responses imply facts about the used URI reference. These facts are given in Table 4.1, e.g. a 303 response on a URI without a fragment identifier implies the equality of the requested URI and the URI returned in the location field of the response. If the response code is a 200, it has even stronger implications: then the URI is actually a name of the served resource, i.e. the URI is an information resource that can be (and is) accessed over the Web (Sauermann and Cyganiak, 2008). Section 4.1.2 gives more details on fragment identifiers and how they can be used for evaluation.

Table 4.1: An overview of what different response codes imply for the resolved HTTP URI reference U. I is the information resource that is returned, if any. L is the URI given in the location field of the response. The table covers the most important responses only, the others do not imply any further facts.

Response code | U has a fragment identifier | U has no fragment identifier
200 OK | I should describe U | U is the name of I. I is an information resource.
301 Moved Permanently | L should describe U | L and U are names of I. I is an information resource.
303 See Other | L should describe U | L should describe U
Any other | Nothing implied for I | Nothing implied for I

Both OWL classes and OWL properties are not information resources. With Table 4.1 this allows us to infer that in OWL DL both class and property names (without fragment identifiers) should not return a 200 or a 301 when resolved via the HTTP protocol. In OWL 2 this is not true anymore, since punning allows URIs to be individuals, properties, and classes at the same time (Motik, 2007). But as we will discuss later, punning should be avoided (see Section 4.1.5).

The table also lists the cases when the served resource should describe the name. We can easily check if this is the case, if the description is sufficiently useful, and if it is consistent with the knowledge we already have about the resource.

We have tested all HTTP URIs from the Watson EA corpus (see Section 11.3). For the URIs with slash namespaces, we got 14,856 responses. Figure 4.1 shows the distribution of the response codes. It can be easily seen that for the vast majority (about 75%) of URIs we get 200 OK, which means that the request returns a document. We conclude that the current Semantic Web is indeed a Web of metadata describing the connections between resources, i.e. documents, on the Web. 85% of the tested URIs return a 200 or 3XX response.¹

We got 509 responses on the URIs with hash namespaces, of which significantly more (about 85%) responded with a 200 OK. This shows that in general hash URIs are better suited for terminological entities, and there should be good reasons for using a slash namespace.
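The implications listed in Table 4.1 can be encoded as a small lookup. This is a hedged sketch rather than a definitive implementation: the function names `implied_facts` and `suspicious_for_class_or_property` and the example URIs are assumptions introduced here, and the status code is passed in as a plain integer instead of being fetched over the network.

```python
from urllib.parse import urldefrag


def implied_facts(uri, status):
    """Facts implied for the URI reference U by an HTTP status code,
    following Table 4.1 (I = returned resource, L = Location field)."""
    has_fragment = bool(urldefrag(uri).fragment)
    if status == 200:
        if has_fragment:
            return ["I should describe U"]
        return ["U is the name of I", "I is an information resource"]
    if status == 301:
        if has_fragment:
            return ["L should describe U"]
        return ["L and U are names of I", "I is an information resource"]
    if status == 303:
        return ["L should describe U"]
    return []  # any other response implies nothing further


def suspicious_for_class_or_property(uri, status):
    """OWL DL classes and properties are not information resources, so a
    name without a fragment identifier answering 200 or 301 is suspicious."""
    return not urldefrag(uri).fragment and status in (200, 301)


print(implied_facts("http://example.org/ontology#joe", 200))
print(suspicious_for_class_or_property("http://example.org/ontology/Person", 200))
```

The second function only flags candidates for inspection; under OWL 2 punning such a response is not necessarily an error.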

Significance of the difference between hash and slash URIs. For significance testing we apply the two-proportion z-test. Let the null hypothesis be $P_s = P_h$, i.e. the probability for a slash URI to return a 200 OK being the same as the probability for a hash URI. According to our tests on the Watson EA corpus, we set $p_s = e_s / n_s \approx 0.7513$ and $p_h = e_h / n_h \approx 0.8585$ (with sample sizes $n_s = 14{,}856$ resp. $n_h = 509$ and $e_s = 11{,}161$ resp. $e_h = 437$ positive samples, i.e. URIs returning 200 OK codes). We calculate the pooled sample proportion $\hat{p} = (p_s n_s + p_h n_h) / (n_s + n_h) \approx 0.7548$. The standard error results in $SE = \sqrt{\hat{p} (1 - \hat{p}) (1/n_s + 1/n_h)} \approx 0.0194$. The test statistic is a z-score defined as $z = (p_s - p_h) / SE \approx -5.5316$. From that it follows that the probability that the null hypothesis is true is $p < 0.0001$, which means the result is statistically highly significant.

Figure 4.1: Distribution of the HTTP response codes on the HTTP URIs from the Watson EA corpus. The left hand side shows the slash URIs, the right hand side the hash URIs.

Names from the same slash namespace should always return the same response code. Differing response codes indicate some problems with the used terms. For example, the social website LiveJournal² exports FOAF profiles about its members, including their social contacts, interests, etc. Some of the terms return a 303 See Other (e.g. foaf:nick, foaf:knows or foaf:Person), whereas others return a 404 Not Found (e.g. foaf:tagLine, foaf:member_name, or foaf:image). Investigating this difference uncovers that the first set are all terms that are indeed defined in the FOAF ontology, whereas the second set of terms does not belong to the FOAF ontology.

Method 2 (Check response codes)

For all HTTP URIs, make a HEAD call (or GET call) on them. The response code should be 200 OK or 303 See Other. Names with the same slash namespace should return the same response codes, otherwise this indicates an error.

Note that this method can only be applied after the ontology has been published and is thus available on the Web.

¹ A 3XX response means an HTTP response in the range 300-307, a group of responses meaning redirection (Fielding et al., 1999).

² http://www.livejournal.com
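The response code check can be sketched as follows. The sketch deliberately performs no network access: it assumes that the status codes of the HEAD (or GET) calls have already been collected into a dictionary, and the names `check_response_codes`, `slash_namespace`, and `ACCEPTED` as well as the observed codes are hypothetical values chosen for illustration.

```python
from collections import defaultdict

ACCEPTED = {200, 303}  # 200 OK and 303 See Other


def slash_namespace(uri):
    """Namespace of a slash URI: everything up to and including the last '/'."""
    return uri[: uri.rfind("/") + 1]


def check_response_codes(codes_by_uri):
    """Report URIs with an unexpected response code and slash namespaces
    whose names do not all return the same response code."""
    unexpected = {u: c for u, c in codes_by_uri.items() if c not in ACCEPTED}
    by_namespace = defaultdict(set)
    for uri, code in codes_by_uri.items():
        if "#" not in uri:  # only slash URIs are grouped by namespace
            by_namespace[slash_namespace(uri)].add(code)
    mixed = {ns: sorted(codes) for ns, codes in by_namespace.items() if len(codes) > 1}
    return unexpected, mixed


# Hypothetical observations for three names of one slash namespace.
observed = {
    "http://xmlns.com/foaf/0.1/nick": 303,
    "http://xmlns.com/foaf/0.1/knows": 303,
    "http://xmlns.com/foaf/0.1/tagLine": 404,
}
unexpected, mixed = check_response_codes(observed)
print(unexpected)
print(mixed)
```

A namespace that appears in `mixed` is a hint that some of its names are not defined by the ontology that owns the namespace.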

In summary, the usage of resolvable URI references allows us to use the Web to look up more information on a given URI. This can help to discover available mappings, or to explore and add new information to a knowledge based system. This is the major advantage of using the Semantic Web instead of simply an ontology based application.
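The significance test reported in this section for hash versus slash URIs can be reproduced with a few lines of arithmetic. The sketch below is a plain re-computation under the standard two-proportion z-test; the function name `two_proportion_z_test` and its parameter names are assumptions, while the four counts are the ones reported for the Watson EA corpus (14,856 and 509 responses, 11,161 and 437 of them 200 OK).

```python
from math import erfc, sqrt


def two_proportion_z_test(e_s, n_s, e_h, n_h):
    """Two-proportion z-test with pooled standard error.
    e_* are the positive samples, n_* the sample sizes."""
    p_s = e_s / n_s
    p_h = e_h / n_h
    pooled = (e_s + e_h) / (n_s + n_h)
    se = sqrt(pooled * (1 - pooled) * (1 / n_s + 1 / n_h))
    z = (p_s - p_h) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value under the normal approximation
    return p_s, p_h, pooled, se, z, p_value


p_s, p_h, pooled, se, z, p_value = two_proportion_z_test(11161, 14856, 437, 509)
print(round(p_s, 4), round(p_h, 4), round(pooled, 4), round(se, 4), round(z, 4))
print(p_value < 0.0001)
```

Note that the pooled proportion computed from the raw counts equals the weighted form $(p_s n_s + p_h n_h) / (n_s + n_h)$, since $p_s n_s = e_s$ and $p_h n_h = e_h$.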

4.1.2 Hash vs slash

There was a long-running debate in the Semantic Web community on the usage of fragment identifiers in URI references. The basic question is on the difference between using http://example.org/ontology#joe and http://example.org/ontology/joe in order to refer to a non-information resource. The former type of URI is called a hash URI (since the local part is separated by the hash character # from the namespace), the latter type a slash URI (since the local part is separated by the slash character / from the namespace). The discussion was resolved by the W3C Technical Advisory Group (Lewis, 2007; Jacobs and Walsh, 2004; Berrueta and Phipps, 2008).

When resolving a hash URI, only the namespace is resolved. All hash URIs with the same namespace thus resolve to the same resource. This has the advantage that the ontology can be downloaded in one pass, but it also has the disadvantage that the file can become very big. Therefore, terminological ontologies and ontologies with a closed, rarely changing, and rather small set of individuals (e.g. a list of all countries) would use hash URIs, whereas open domains with often changing individuals often use slash URIs (see for example in Semantic MediaWiki, Section 10.2.3).

We analyzed the Watson corpus to see if one of the two prevails on the Web. We found 107,533,230 HTTP URIs that parse. 50,366,325 were hash URIs, 57,166,905 were slash URIs. Discounting repetitions, there were 5,815,504 different URIs in all, 2,247,706 of them hash URIs, 3,567,789 slash URIs.

Regarding their distribution over namespaces, there are much bigger differences: we find that there are only 46,304 hash namespaces compared to 2,320,855 slash namespaces. The hash namespaces are, on average, much bigger than the slash namespaces. 2,197,267 slash namespaces (94.67%) contain only a single name, whereas only 16,990 hash namespaces (36.69%) contain only one name. On the other extreme, only 253 slash namespaces (0.01%) contain more than 100 names, in contrast to 2,361 hash namespaces (5.10%) that have more than 100 names. Still, as we can see in Table 4.2, the namespaces with the most names are slash namespaces, being several times as big as the biggest hash namespace. What does that mean?

Slash namespaces are used in two very different ways in ontologies. First, they are used to identify resources on the Web and to provide metadata about them. This explains the huge number of slash namespaces with only a few names: only 3,142 slash namespaces (0.14%) have more than 10 names (compared to 4,575 hash namespaces, 9.88%). In this case, when providing metadata, the names are often distributed all over the Web, and thus introduce many different namespaces. The other usage of slash namespaces is for collectively built ontologies: since hash namespaces reside in one single file, they cannot deal well with very dynamic vocabularies that add and remove names all the time and change their definitions.

Looking at the five biggest slash namespaces in Table 4.2, we see that four of the five belong to social networks (aktors.org is the Website of a UK-based Semantic Web research project). Looking at the five biggest hash namespaces, we also see huge ontologies, but they represent much more stable domains, providing a vocabulary for education assessment (RAE), life sciences ontologies (FMA, Gene ontology, NCI thesaurus), and an OWL translation of WordNet (OWN). None of them are interactively built on the Web by an open community, but rather curated centrally by editors.

Table 4.2: The five hash and slash namespaces with the biggest number of names.

Hash namespace                                           # names
http://purl.org/obo/owl/FMA#                              75,140
http://www.loa-cnr.it/ontologies/OWN/OWN.owl#             65,975
http://www.hero.ac.uk/rae/#                               64,799
http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#       58,077
http://www.geneontology.org/owl/#                         29,370

Slash namespace                                          # names
http://www.livejournal.com/                              454,376
http://www.ecademy.com/                                  104,987
http://www.deadjournal.com/                               79,093
http://www.aktors.org/scripts/                            41,312
http://www.hi5.com/profile/                               32,652

Method 3 (Look up names) For every name that has a hash namespace make a GET call against the namespace. For every name that has a slash namespace make a GET call against the name. Resolve redirects, if any. The content type should be set correctly. If the returned resource is an ontology, check if the ontology describes the name. If so, the name is a linked data conformant name. If not, the name may be wrong.
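The distinction between hash and slash URIs, the namespace statistics of this section, and the first step of looking up a name can be sketched as follows. This is a minimal sketch under simplifying assumptions: every URI containing a # is treated as a hash URI, all others are split at the last slash, and the function names are made up for illustration.

```python
from collections import Counter


def split_uri(uri):
    """Split a URI into (namespace, local name, kind)."""
    if "#" in uri:
        namespace, local = uri.split("#", 1)
        return namespace + "#", local, "hash"
    namespace, _, local = uri.rpartition("/")
    return namespace + "/", local, "slash"


def namespace_sizes(uris):
    """Count distinct names per namespace, separately for hash and slash."""
    sizes = {"hash": Counter(), "slash": Counter()}
    for uri in set(uris):  # discount repetitions
        namespace, _, kind = split_uri(uri)
        sizes[kind][namespace] += 1
    return sizes


def lookup_target(uri):
    """URL to request when looking up a name: the namespace document for a
    hash URI, the name itself for a slash URI."""
    namespace, _, kind = split_uri(uri)
    return namespace[:-1] if kind == "hash" else uri


uris = [
    "http://example.org/ontology#joe",
    "http://example.org/ontology#joe",
    "http://example.org/ontology#ann",
    "http://example.org/ontology/joe",
]
sizes = namespace_sizes(uris)
print(sizes["hash"], sizes["slash"])
print(lookup_target(uris[0]), lookup_target(uris[3]))
```

The actual GET call, redirect handling, and the test whether the returned ontology describes the name are left out, since they depend on the HTTP client and RDF parser at hand.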


4.1.3 Opaqueness of URIs

URIs should be named in an intuitive way. Even though the URI standard (Berners-Lee et al., 2005) states that URIs should be treated as opaque and no formal meaning should be interpreted into them besides their usage with their appropriate protocols, it is obvious that a URI such as http://www.aifb.kit.edu/id/Rudi_Studer will invoke a certain denotation in the human reader: the user will assume that this is the URI for Rudi Studer at the AIFB, and it would be quite surprising if it were the URI for the movie Casablanca. On the other hand, a URI such as http://www.aifb.kit.edu/id/p67 does not have an intuitive denotation, and so such URIs become hard to debug, write manually, and remember. Their advantage is that they do not depend on a specific natural language. An unfortunate number of tools still display the URI when providing a user interface to the ontology. Therefore readable URIs that actually identify the entities to which their names allude (for the human reader) should be strongly preferred. Also, since URIs unlike labels should not change often (Sauermann and Cyganiak, 2008), it is important to catch typos when writing URIs.

URIs should follow a naming convention. When using natural language based names, the local name of classes may use plural (Cities) or singular forms (City), the local name of properties may use verbs (marries), verbal phrases (married_to), nouns (spouse), or nominal phrases (spouse_of or has_spouse). All of these naming conventions have certain advantages and disadvantages: phrases make the direction of the property explicit and thus reduce possible sources of confusion (given the triple Aristotle Teacher Plato, is it immediately clear who the teacher is, and who the student?). But using simple nouns can help with form based interfaces such as Tabulator (Berners-Lee et al., 2006a), a Semantic Web browser. Tabulator also constructs a name for the inverse property by appending " of" to the word, e.g. the inverse of Teacher would be Teacher of.

Capitalization and the writing of multi-word names should also be consistent. The ontology authors should decide which names to capitalize (often, classes and individuals are capitalized, whereas properties are not). Multi-word names (i.e. names that consist of several words, like Wide square) need to escape the whitespace between the words (since whitespace is not allowed in URIs). Often this is done by camel casing (i.e. the space is removed and every word after a removed space starts with a capital letter, like WideSquare), by replacing the space with a special character (often an underscore like in Wide_square, but also dashes or full stops), or by simple concatenation (Widesquare).

Many of these choices are just conventions. The used naming conventions should be noted down explicitly. Metadata about the ontology should state the naming convention for a given vocabulary. Many of the above conventions can then be tested automatically. It is more important to consistently apply the chosen convention than

to choose the best convention (especially since the latter is often unclear).

In addition, URIs on the Semantic Web should also follow the same rules that URIs on the hypertext Web should follow. These are (Berners-Lee, 1998):

• don't change (i.e. don't change what the resource refers to, nor change the URI of a resource without redirecting the old URI)

• be human guessable (i.e. prefer http://example.org/movie/The_Matrix over http://example.org/movie/tt0133093)

• be reasonably short (which also means not to use deep hierarchical nesting, but rather a flat structure)

• don't show query parameters (i.e. don't use URIs such as http://example.org/interest?q=Pop+Music but http://example.org/interest/pop_music instead)

• don't expose technology (e.g. don't use file extensions like in http://example.org/foaf.rdf, just use http://example.org/foaf instead)

• don't include metadata (e.g. don't add the author or access restrictions into the URI, as in http://example.org/style/timbl/private/uri, use e.g. http://example.org/style/uri. Authorship and access level may change)

Method 4 (Check naming conventions) A proper naming can be checked by comparing the local part of the URI with the label given to the entity or by using lexical resources like WordNet (Fellbaum, 1998). Formalize naming conventions (like multi-word names and capitalization) and test if the convention is applied throughout all names of a namespace. Check if the URI fulfills the general guidelines for good URIs, i.e. check length, inclusion of query parameters, file extensions, depth of directory hierarchy, etc. Note that only local names from the same namespace, not all local names in the ontology, need to consistently use the same naming convention, i.e. names reused from other ontologies may use different naming conventions.

4.1.4 URI reuse

In order to ease sharing, exchange, and aggregation of information on the Semantic Web, the reuse of commonly used URIs proves to be helpful (instead of introducing new names). At the time of writing many domains still do not provide an exhaustive

lexicon of URIs yet, but projects such as Freebase[3] or DBpedia (Auer and Lehmann, 2007) already offer a huge amount of identifiers for huge domains. Some domains, like life sciences, music, literature, or geography already have very exhaustive knowledge bases. These knowledge bases can easily be reused.

Analyzing the Watson EA corpus, we find that 75% of the ontologies use 10 or more namespaces. In 95.2% of the ontologies the average number of names per namespace is lower than 10, in 76.5% it is lower than 3, and in 46.4% lower than 2. This means that most ontologies use many namespaces, but only few names from each namespace. Considering knowledge bases this makes perfect sense: in their FOAF files persons may add information about their location by referencing the Geonames ontology, and about their favourite musician by referencing the MusicBrainz ontology. Terminological ontologies often reference the parts of external ontologies relevant to them in order to align and map to their names.

Method 5 (Metrics of ontology reuse) We define the following measures and metrics:

• Number of namespaces used in the ontology, $N_{NS}$

• Number of unique URIs used in the ontology, $N_{UN}$

• Number of URI name references used in the ontology, $N_N$ (i.e. every mention of a URI counts)

• Ratio of unique names to name references, $R_{NU} = \frac{N_{UN}}{N_N}$

• Ratio of unique URIs to namespaces, $R_{UNS} = \frac{N_{UN}}{N_{NS}}$

Check the following constraints. The percentages show the proportion of ontologies within the Watson EA corpus that fulfill the constraint, and thus indicate how likely it is that an ontology not fulfilling the constraint is an outlier.

• $R_{NU} < 0.5$ (79.6%)

• $R_{UNS} < 5$ (90.3%)

• $N_{NS} \geq 10$ (75.0%)
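The reuse metrics can be computed directly from the list of URI references in an ontology. The sketch below is a hedged illustration: it takes $R_{NU}$ as $N_{UN}/N_N$ and $R_{UNS}$ as $N_{UN}/N_{NS}$, following the formulas above, determines namespaces by splitting at the # or the last slash (an assumption), and expects a non-empty list of references.

```python
def namespace_of(uri):
    """Namespace of a URI: up to and including '#', else up to the last '/'."""
    if "#" in uri:
        return uri.split("#", 1)[0] + "#"
    return uri[: uri.rfind("/") + 1]


def reuse_metrics(uri_references):
    """uri_references holds one entry per mention of a URI in the ontology."""
    unique = set(uri_references)
    n_n = len(uri_references)
    n_un = len(unique)
    n_ns = len({namespace_of(u) for u in unique})
    return {
        "N_NS": n_ns,
        "N_UN": n_un,
        "N_N": n_n,
        "R_NU": n_un / n_n,
        "R_UNS": n_un / n_ns,
    }


def violated_constraints(m):
    """Names of the three constraints that the metrics do not fulfill."""
    checks = {
        "R_NU < 0.5": m["R_NU"] < 0.5,
        "R_UNS < 5": m["R_UNS"] < 5,
        "N_NS >= 10": m["N_NS"] >= 10,
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical toy ontology with four URI mentions.
refs = [
    "http://a.example/x#p",
    "http://a.example/x#p",
    "http://a.example/x#q",
    "http://b.example/y/z",
]
metrics = reuse_metrics(refs)
print(metrics, violated_constraints(metrics))
```

A toy ontology like this one violates two of the constraints simply because it is tiny; the thresholds are meant for ontologies of realistic size.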

[3] http://www.freebase.com

4.1.5 URI declarations and punning

Web ontologies do not require names to be declared. This leads to the problem that it is impossible for a reasoner to discern if e.g. ex:Adress is a new entity or merely a typo of ex:Address. This can be circumvented by requiring to declare names, so that tools can check if all used names are properly declared. This further brings the additional benefit of a more efficient parsing of ontologies (Motik and Horrocks, 2006).

The declarations are axioms stating not only that a name exists but also its type, i.e. if it is declared as a class, an individual, a datatype, object or annotation property. This feature was introduced in OWL 2, and thus does not yet appear in ontologies outside of test cases.

In OWL 1 DL the sets of class names, individual names, and property names were disjoint. This restriction was removed in OWL 2, and now it is allowed to use the same name to represent either the individual, the property or the class. Based on the position in the axiom it is always clear which type of entity the name refers to. There are good reasons to allow punning: for example, the entity lion may represent the type of an individual (Simba is of the type lion), or it may be of the type lion species, depending on the context (Berrueta and Phipps, 2005; Motik, 2007). There is no necessity to introduce different names for the two, or to render the merger of two ontologies inconsistent where lion is used as a class in the one ontology, and as an individual in the other.

Nevertheless, punning may cause confusion with the user, especially since most tools are not yet well equipped to deal with punning. Punning should only be carefully applied.

Method 6 (Check name declarations) Check for every URI if there exists a declaration of the URI. If so, check if the declared type is consistent with the usage. This way it is possible to detect erroneously introduced punning.

4.2 Literals

Ontologies often contain literals that represent data values. These data values can be very varied, e.g. numbers (such as 4.75), points in time (such as 2:14 pm CEST on 6th March 2009), or strings (e.g. Jaguar). The importance of literals on the Semantic Web can be shown by their sheer number of occurrences: the Watson corpus consists of 59,749,786 triples, and 26,750,027 of them (44.8%) include a literal.

Anything that can be represented by a literal could also be represented by a URI. For example, we could introduce the URI ex:Number4dot75 to be the URI to represent


the number 4.75. Using OWL Full, we could state that the literal and the URI are the same individual. Often it is more convenient to use a literal instead of a URI, especially since the meaning of literals is already agreed on.

Literals have the disadvantage that in triples they are only allowed to be objects. This means that we cannot make statements about literals directly, e.g. say that 4 is an instance of the class ex:EvenNumber. This can be circumvented by using URIs as proxies for data values. Dropping this limitation, and thus allowing literals to also be subjects in triples, is currently under discussion.

Literals can be typed (see Section 4.2.1) or plain. A plain literal may be tagged with a language tag (Section 4.2.2). The standard does not allow literals to be both typed and language tagged. A language tag on a typed literal would often make little sense: the integer 4 is the same number regardless of the language. Since it does make sense for text, the specification for the new data type rdf:text allows for language tagged typed text literals (Bao et al., 2009b).

The RDF standard states that plain literals denote themselves (Hayes, 2004), i.e. the plain literal Jaguar denotes the ordered list of characters Jaguar. Most of the literals on the Web are plain literals: only 1,123,704 (4.2%) of them are typed.

4.2.1 Typed literals and datatypes

A typed literal is a pair of a lexical representation and a data type. The data type is given by a URI that defines the interpretation of the lexical representation. Most ontologies use data types defined by the XML Schema Definition (Fallside and Walmsley, 2004). The OWL standard requires all tools to support xsd:string and xsd:integer (Bechhofer et al., 2004) and names further recommended data types. OWL 2 extends the number of required data types considerably, adding further data types for numbers, text, boolean values, binary data, URIs, time instants, and XML literals. OWL 2 also adds the possibility to constrain data type literals by facets (Motik et al., 2009b).

Figure 4.2 shows the most often used data types. They all belong to the set of data types recommended by the specification. The only two non-recommended data types that are used more often than a hundred times are from a deprecated XML schema version[4] and from the W3C's calendar working group.[5]

A fairly common error in data types is to give a prefixed name where the full data type URI is required (xs:string appears 19,927 times, xsd:string 518 times). Other common errors include misspelling of the data type URIs (e.g. forgetting the hash, or miscapitalizing the local name) or using deprecated versions of the XML schema.

The Semantic Web standards allow the definition of new custom data types, but this option is very rarely used. It nevertheless makes it hard to automatically discover if a data type URI is just an unknown data type, or if it is indeed a typo. Data type URIs

[4] http://www.w3.org/2000/10/XMLSchema#nonNegativeInteger, used 157 times
[5] http://www.w3.org/2002/12/cal/icaltzd#dateTime, used 128 times

should be resolvable just as all other URIs and thus allow a tool to make an automatic check.

[Figure 4.2: The fifteen most often used data types in the Watson corpus. Note the logarithmic scale. Values shown: anyURI 856,928; string 164,173; unsignedLong 21,452; dateTime 19,577; integer 18,831; decimal 14,962; int 9,906; unsignedShort 5,215; rdf:XMLLiteral 4,292; nonNegativeInteger 2,866; date 1,786; float 1,147; double 489; boolean 433; long 371.]

When evaluating ontologies, the evaluator should decide on a set of allowed data types. This set depends on the use case and the tools being used. All tools should be able to deal with the data types in this set. This closed set will help to discover syntactic errors in the data type declarations of the literals in the ontology. That does not mean that all tools need to understand all the used data types. A tool for converting an RDF/XML serialization into OWL Abstract Syntax does not need to understand the data types; as far as the task of the tool itself is concerned, it may be perfectly capable of dealing with an unknown data type. A reasoner that needs to check the equality of different values, on the other hand, needs to understand the used data types. It should be checked if all used data types in the ontology are understood by the tools dealing with these data types.

The second check that is relevant for typed literals is to check if the literals are syntactically correct for the given data type. A typed literal of the form "Four" and data type http://www.w3.org/2001/XMLSchema#integer is an error and needs to be corrected. For this it is important for the evaluation tool to be able to check the syntactic correctness of all used data types. This should be considered when choosing the set of allowed data types. Otherwise it will be hard to discover simple syntactic errors within literals.


Method 7 (Check literals and data types) A set of allowed data types should be created. All data types beyond those recommended by the OWL specifications should be avoided. Creating a custom data type requires a very strong reason. xsd:integer and xsd:string should be the preferred data types (since they have to be implemented by all OWL conformant tools). Check if the ontology uses only data types from the set of allowed data types. All typed literals must be syntactically valid with regard to their data type. The evaluation tool needs to be able to check the syntactic correctness of all allowed data types.
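A minimal sketch of such a check is given below. It assumes a deliberately small set of allowed data types, each paired with a regular expression for its lexical form as defined by XML Schema; the choice of types and the function name are illustrative, and whitespace normalization is not handled.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Lexical patterns for a small, assumed set of allowed data types.
ALLOWED = {
    XSD + "string": re.compile(r".*", re.DOTALL),
    XSD + "integer": re.compile(r"[+-]?[0-9]+"),
    XSD + "boolean": re.compile(r"true|false|1|0"),
    XSD + "decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
}


def check_typed_literals(literals):
    """literals is an iterable of (lexical form, data type URI) pairs.

    Returns (unknown, invalid): literals whose data type is not in the
    allowed set, and literals whose lexical form does not fit the type.
    """
    unknown, invalid = [], []
    for lexical, datatype in literals:
        pattern = ALLOWED.get(datatype)
        if pattern is None:
            unknown.append((lexical, datatype))
        elif not pattern.fullmatch(lexical):
            invalid.append((lexical, datatype))
    return unknown, invalid


literals = [
    ("4", XSD + "integer"),
    ("Four", XSD + "integer"),   # not a valid integer
    ("Jaguar", "xsd:string"),    # prefixed name instead of the full URI
    ("7", "http://www.w3.org/2000/10/XMLSchema#nonNegativeInteger"),
]
unknown, invalid = check_typed_literals(literals)
print(unknown)
print(invalid)
```

Because the allowed set is closed, both a prefixed name used in place of the URI and a data type from a deprecated schema version show up as unknown, without the tool having to guess whether they are typos.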

4.2.2 Language tags

Language tags can be used on plain literals to state the natural language used by the literal. This enables tools to choose to display the most appropriate literals based on their user's language preferences. An example for a literal with a language tag is "university"@en or "Universität"@de.

Language tags look rather simple, but are based on a surprisingly large set of specifications and standards. Language tags are specified in the IETF RFC 4646 (Phillips and Davis, 2006b). IETF RFC 4647 specifies the matching of language tags (Phillips and Davis, 2006a). Language tags are based on the ISO codes for languages, if possible taking the Alpha-2 code (i.e. two ASCII letters representing a language) as defined by (ISO 639-1, 2002), otherwise the Alpha-3 code (three ASCII letters representing a language) defined by (ISO 639-2, 1998). Not all languages have an Alpha-2 code. The specification allows the use of an Alpha-3 code only for languages that do not have an Alpha-2 code. For example, to state that the literal Gift is indeed the English word, we would tag it with en, the Alpha-2 code for the English language. If we wanted to state that it is a German word, we would tag it with de, the Alpha-2 code for German. If it were the Middle English word, it would have to be tagged with enm (since no two letter code exists).

Language tags can be further refined by a script, regional differences, and variants. All these refinements are optional. The script is specified using ISO codes for scripts (ISO 15924, 2004). Russian written in the Latin script would thus be tagged ru-latn. Every language has a default script that should not be specified when used, e.g. en is always assumed to be en-latn, i.e. English is written in the Latin script by default. The IANA registry maintains a complete list of all applicable refinement subtags, and also specifies the default scripts for the used languages.[6] Following the optional script parameter,

[6] http://www.iana.org/assignments/language-subtag-registry

regions can be specified to accommodate regional differences. The codes for the regions are based either on the ISO codes for countries and regions (ISO 3166, 1999) or, alternatively, on the UN M.49 numeric three-digit codes (UN M.49, 1998). So English as spoken in Hong Kong would be either en-hk or en-344. The regional modifiers should only be used when needed, and preferably omitted. For example, instead of using ja-jp for Japanese as spoken in Japan, the tag ja would be preferred. Finally, a relatively small number of further variants can be specified for special cases such as historic language deviations, sign languages, and similar.

It is also possible to define private language tags or tag refiners, denoted by an x. So one could use en-x-semweb as the language tag for the kind of English that is spoken by the Semantic Web research community, where certain terms have meanings deviating from standard English. Private tags should be avoided, and indeed, our analysis of the Watson corpus shows that none are used.

Language tags are case insensitive. So it does not matter if one uses en-us, EN-US, en-US, En-uS, or any other combination of upper and lower case characters. On the Semantic Web it seems to be usual to use lower case characters only, with less than 200 occurrences of upper case codes.

We examined the usage of language tags on the Semantic Web to find if the standards are applied correctly. For such a complex mesh of standards we found surprisingly few errors. All in all, 17,313,981 literals have a language tag (67.6% of all plain literals). English (en) was by far the most often used tag, applied 16,767,502 times (96.8%), followed by Japanese (ja) with 519,191 tags (3.0%). The other languages are less widely used by far: Suomi (fi) is used 9,759 times, German (de) 4,893 times, French (fr) 1,810 times, and Spanish (es) 866 times.

The most often applied refined tags are en-us with 2,284 tags, en-gb with 1,767 tags, and fr-fr with 288 tags. The latter could be regarded as redundant, since fr would be sufficient. Further inappropriate usages or errors that can be found in the language tags are en-uk (it should be en-gb, used 90 times), frp (undefined tag, used 30 times), and jp (should be ja).

In summary, with respect to the complexity of the used standards, the number of errors regarding language tags is extremely low. This probably stems from the fact that most of the advanced features are never used: no single tag specifies a script (besides one example literal) or a variant (besides one example literal)[7], and only a handful of tags specify a region, never using the UN M.49 standard but always ISO 3166 codes.

[7] Both cited example literals are based on the W3C's write-up on its internationalization effort, highlighting the possibilities of language tags. See here: http://www.w3.org/International/articles/language-tags/Overview.en.php


Method 8 (Check language tags) Check that all language tags are valid with regard to their specification. Check if the shortest possible language tag is used (i.e. remove redundant information like restating default scripts or default regions). Check if the stated language and script are actually the ones used in the literal. Check if the literals are tagged consistently within the ontology. This can be checked by counting $n_l$, the number of occurrences of each language tag $l$ in the ontology. Roughly, $n_l$ should be the same for all $l$. Outliers should be inspected.
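Parts of this method can be sketched in Python as follows. The sketch makes several simplifying assumptions that are not part of the method itself: the tag shape covers only language, script, and region subtags (a full validity check needs the IANA subtag registry), the table of redundant refinements contains only examples mentioned in this section, and every tag used less often than the most frequent one is reported as an outlier.

```python
import re
from collections import Counter

# Simplified shape: language, optional script, optional region. Variants,
# extensions, and private use subtags are not covered by this pattern.
TAG_SHAPE = re.compile(r"^[a-z]{2,3}(-[a-z]{4})?(-([a-z]{2}|[0-9]{3}))?$")

# Illustrative excerpt of redundant refinements and their shorter form.
REDUNDANT = {"en-latn": "en", "ja-jp": "ja", "fr-fr": "fr"}


def check_language_tags(tags):
    """tags holds one language tag per tagged literal in the ontology."""
    normalized = [t.lower() for t in tags]  # tags are case insensitive
    unrecognized = sorted({t for t in normalized if not TAG_SHAPE.match(t)})
    shorten = {t: REDUNDANT[t] for t in set(normalized) if t in REDUNDANT}
    counts = Counter(normalized)  # n_l for every tag l
    most = max(counts.values(), default=0)
    outliers = sorted(t for t, n in counts.items() if n < most)
    return unrecognized, shorten, counts, outliers


tags = ["en", "EN", "de", "de", "fr-fr", "english"]
unrecognized, shorten, counts, outliers = check_language_tags(tags)
print(unrecognized, shorten, dict(counts), outliers)
```

Whether the stated language is actually the one used in the literal cannot be decided from the tag alone and would need a language identification step, which is left out here.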

4.2.3 Labels and comments

Labels are used in order to provide a human readable name for an ontological entity. Every ontological entity should have labels in all relevant languages. Almost none of the ontologies in the Watson corpus have a full set of labels in more than one language, i.e. most ontologies are not multi-lingual. Thus they miss a potential benefit of the Semantic Web, i.e. the language independence of ontologies. Comments add further human-readable explanations to a specific entity, and should also be language tagged.

Labels and comments should follow a style guide and be used consistently. A style guide should define if classes are labeled with plural or singular nouns, if properties are labeled with nouns or verbs, and under what circumstances comments should be used. Labels and comments should never use camel case or similar escape mechanisms for multi-word terms, but instead simply use space characters (or whatever is most suitable for the given language). I.e. a URI http://example.org/LargeCity should have a label "large city"@en. External dictionaries such as WordNet (Fellbaum, 1998) can be used to check consistency with regard to a style guide.

In an environment where ontologies are assembled on the fly from other ontologies (Alani, 2006), the assembled parts may follow different style guides. The assembled ontology will then not adhere to a single style guide and thus offer an inconsistent user interface. It is not expected that a single style guide will become ubiquitous on the whole Web. Instead, an ontology may specify explicitly what style guide it follows, and even provide labels following different style guides. For example, the SKOS ontology (Miles and Bechhofer, 2009) offers more specific subproperties for labels such as skos:prefLabel. This would allow the introduction of a subproperty of label that is style guide specific, which would in turn allow for the consistent display of assembled ontologies.

Even when subproperties of rdfs:label are defined, there should always be one label (per supported language) given explicitly by using rdfs:label itself. Even
Even when subproperties of rdfs:label are defined, there should always be one label (per supported language) given explicitly by using rdfs:label itself. Even

though this is semantically redundant, many tools (especially visualization tools) do not apply reasoning for fetching the labels of an entity but simply look for the explicit triple stating the entity's label.

Method 9 (Check labels and comments) Define the set of relevant languages for an ontology. Check if all label and comment literals are language tagged. Check if all entities have a label in all languages defined as being relevant. Check if all entities that need a comment have one in all relevant languages. Check if the labels and comments follow the style guide defined for the ontology.

4.3 Blank nodes

Blank nodes are an RDF feature that allows the use of a node in the RDF graph without giving it a URI. This way the node can only be indirectly referenced (if at all), for example by using an inverse functional property. Blank nodes relieve the author of an RDF graph of having to come up with good URIs for every node, which would impose additional costs on creating the graph.

There are two different scenarios for using blank nodes. First, blank nodes are used for the structural representation of certain OWL axioms within RDF graphs. Second, blank nodes are used for anonymous ontology entities.

An example of the first scenario is the representation of a disjoint union axiom in RDF. The axiom DisjointUnion(C D E) will be represented by the following RDF triples:

    C owl:disjointUnionOf _:x .
    _:x rdf:first D .
    _:x rdf:rest _:y .
    _:y rdf:first E .
    _:y rdf:rest rdf:nil .

Blank nodes in RDF are represented by using the namespace prefix _ (underscore). In the given example, there are two blank nodes, _:x and _:y. They do not represent any entities in the domain, but are only introduced out of structural necessity, since we cannot state the relationship between three entities (C, D, and E from the original axiom) directly with triples. We defined this kind of blank node to be a structural blank node. Even though such nodes could be given URIs, these URIs would not represent any concept in our conceptualization and thus should be avoided.


The second scenario uses blank nodes to represent anonymous ontology entities. For example, in the first few years it was regarded as good practice not to define a URI for persons in FOAF documents but to use blank nodes instead. The argument in favor of using blank nodes for persons was that it was regarded as inappropriate to name people via URIs. This echoes the sentiment of "I am not a number", or rather, "I am not a URI". FOAF preferred to identify persons by using inverse functional properties such as email addresses or their hash sums.

Web architecture later suggested that all entities of interest should have a URI (Jacobs and Walsh, 2004). The FOAF project also deprecated the use of blank nodes for persons (since they are definitely entities of interest). Using a URI for an entity allows for all the advantages described earlier about linked data (see Section 4.1.1), most importantly the possibility to look up further data about the given entity by resolving the URI.

In summary, blank nodes should be avoided unless structurally necessary.

Method 10 (Check for superfluous blank nodes) Tables 2.1 and 2.2 list all cases of structurally necessary blank nodes in RDF graphs. Check for every blank node whether it belongs to one of these cases. Besides those, no further blank nodes should appear in the RDF graph. All blank nodes that are not structurally necessary should be listed as potential errors.
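The check in Method 10 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: triples are given as plain (subject, predicate, object) string tuples with blank nodes written as `_:label`, and, as a deliberate simplification of Tables 2.1 and 2.2, only RDF list cells (subjects of rdf:first or rdf:rest) are treated as structurally necessary. The function names are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Assumption: only RDF list cells count as structurally necessary here.
# A full check would encode every case from Tables 2.1 and 2.2.
STRUCTURAL_PREDICATES = {RDF + "first", RDF + "rest"}


def is_blank(term):
    """Blank nodes are written with the underscore prefix, e.g. _:x."""
    return term.startswith("_:")


def superfluous_blank_nodes(triples):
    """Return all blank nodes that are not subjects of a structural triple."""
    blank = set()
    structural = set()
    for s, p, o in triples:
        for term in (s, o):
            if is_blank(term):
                blank.add(term)
        if is_blank(s) and p in STRUCTURAL_PREDICATES:
            structural.add(s)
    return sorted(blank - structural)


triples = [
    # DisjointUnion(C D E): _:x and _:y are structural list cells.
    ("ex:C", OWL + "disjointUnionOf", "_:x"),
    ("_:x", RDF + "first", "ex:D"),
    ("_:x", RDF + "rest", "_:y"),
    ("_:y", RDF + "first", "ex:E"),
    ("_:y", RDF + "rest", RDF + "nil"),
    # An anonymous person: a candidate for a proper URI.
    ("_:p", FOAF + "name", '"Rudi Studer"'),
]

print(superfluous_blank_nodes(triples))
```

Here only `_:p` is reported, since `_:x` and `_:y` occur as list cells. The reported nodes are potential errors in the sense of Method 10, not definite ones.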

Chapter 5 Syntax

Words differently arranged have a different meaning, and meanings differently arranged have a different effect.
(Blaise Pascal, 1623–1662, Pensées (Pascal, 1670))

Web ontologies are serialized in a big (and growing) number of different surface syntaxes. Surface syntaxes can be classified into two different groups: the ones that describe a graph (which in turn describes the ontology), and the ones that describe the ontology directly (a graph can still be calculated from the ontology based on the transformation described in (Patel-Schneider and Motik, 2009)). Examples of the former group are RDF/XML (Bechhofer et al., 2004) or NTriples (Grant and Beckett, 2004); examples of the latter are the Manchester Syntax (Horridge et al., 2006), the OWL Abstract Syntax (Patel-Schneider et al., 2004), or the OWL XML Presentation Syntax (Hori et al., 2003). All these syntaxes are transformable automatically from and into each other.

Each surface syntax warrants its own evaluation methods. Common features that can be evaluated over most of the syntaxes in a fairly uniform way are the proper and consistent indentation in the file and the order of the triples (for graph-based syntaxes) or axioms (for ontology-based syntaxes). Triples forming complex axioms should be grouped together. Groups of axioms forming an ontology pattern should also be grouped together. Terminological axioms should precede facts, and terminological axioms and facts about the same entity should be grouped together. Entities should be declared before usage (see Section 4.1.5). The file's encoding should be appropriate.


There should be a good reason for using anything but UTF-8 as the file encoding (Unicode Consortium, 2006). Even though ontologies will rarely be edited in simple text editors, these guidelines will help tremendously once it does happen. Understanding a disjoint union axiom when the triples making up the axiom are scattered throughout the graph serialization creates an unnecessary challenge. We assume that the capability to debug the source file of an ontology increases dramatically when the guidelines given above are implemented. These guidelines are derived from the rules applied in the engineering of large software programs, where similar rules exist for the layout of the code in the source files (Kernighan and Plauger, 1978). In this thesis we refrain from instantiating the given guidelines for any specific surface syntax. The rest of this chapter first discusses two further specifics of the syntax which apply to most surface syntaxes (syntactic comments in Section 5.1 and qualified names in Section 5.2) and closes by introducing the possibility of using XML validation for the completeness test of OWL knowledge bases (Section 5.3).
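Two of the guidelines above lend themselves to a simple automated check. The following Python sketch is an illustration only: it tests whether a file's bytes decode as UTF-8, and it reports subjects whose triples are scattered in a line-based, NTriples-like serialization. Treating "same subject" as the grouping criterion is an assumption made here for brevity; grouping by axiom or ontology pattern would need more structure. Both function names are hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw file content decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def scattered_subjects(lines):
    """Return subjects whose triples are not contiguous in the file.

    Assumes one triple per line with the subject as the first
    whitespace-separated token; empty lines and '#' comments are skipped.
    """
    seen = set()
    scattered = []
    previous = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        subject = line.split(None, 1)[0]
        if subject != previous:
            # The subject starts a new run; if we saw it before,
            # its triples are split across the file.
            if subject in seen and subject not in scattered:
                scattered.append(subject)
            seen.add(subject)
            previous = subject
    return scattered


lines = [
    "_:x rdf:first D .",
    "C owl:disjointUnionOf _:x .",
    "_:x rdf:rest _:y .",
    "_:y rdf:first E .",
    "_:y rdf:rest rdf:nil .",
]

print(is_valid_utf8("Protégé".encode("utf-8")))
print(scattered_subjects(lines))
```

In the sample lines the two triples about `_:x` are separated by a triple about `C`, so `_:x` is reported as scattered.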

5.1 Syntactic comments

Many OWL surface syntaxes allow for comments. For example, an XML-based surface syntax may contain XML-style comments, delimited by <!-- and -->. We call these syntactic comments and distinguish them from RDF-style comments, i.e. comments that are part of the RDF graph (see Section 4.2.3). Syntactic comments are often lost when two ontologies are merged, or when one ontology is transformed from one syntax to another. RDF-style comments on the other hand are stable with regards to transformation and merger. For many syntaxes it is best practice to include some comments with metadata about the ontology document. XML documents, for example, often start with an introductory comment stating the version, author, and copyright of the given document. In OWL document files this is not necessary, since all this information can be expressed as part of the ontology itself. For example, the Protégé editor (Noy et al., 2000) injects an XML-style comment in order to show that the ontology was created with that editor. Instead, a triple could state that the ontology has been created using the Protégé editor:

    ex:Ontology_A ex:authoringtool sw:Protege .

Most of the metadata in syntactic comments can be expressed using statements about the ontology. This allows ontology repositories to automatically and uniformly interpret the metadata about the ontologies in the repository, and to provide access and searches over the ontologies using the very same sophisticated tools that are used to access the data within the ontologies themselves.
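As a small illustration of how syntactic comments could be located before they get lost in a transformation, the following Python sketch extracts XML-style comments from a serialized document so that their content can be reviewed and, where appropriate, re-expressed as statements about the ontology. It is a sketch under assumptions: the regular expression does not account for comment-like text inside CDATA sections, and the sample document and its comment text are invented for this example.

```python
import re

# XML comments start with <!-- and end with -->; DOTALL lets them span lines.
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def syntactic_comments(xml_text):
    """Return the text of every XML-style comment in the document."""
    return [match.strip() for match in COMMENT.findall(xml_text)]


# Hypothetical RDF/XML fragment with an editor-injected comment.
document = """<?xml version="1.0"?>
<!-- Created with a hypothetical editor -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
</rdf:RDF>
"""

for comment in syntactic_comments(document):
    print("syntactic comment found:", comment)
```

Each reported comment is a candidate for a triple such as the ex:authoringtool statement above, which survives merging and syntax conversion.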

5.2 Qualified names

Most serialization formats include mechanisms to abbreviate URI references. They are often based on known XML-based approaches, such as XML entities (Bray et al., 2008) or XML namespaces (Bray et al., 2006). Thus, instead of having to write the full URL http://xmlns.com/foaf/0.1/knows we can abbreviate names either using XML entities as &foaf;knows or using XML namespaces as foaf:knows.

When defining abbreviations and prefixes in an ontology document, care should be taken to bind well known abbreviations with the appropriate URIs. For example, the foaf prefix (Brickley and Miller, 2005) is so prevalent on the Web that it should always be defined as http://xmlns.com/foaf/0.1/. Some tools even assume that certain prefixes such as rdf, rdfs, or owl are defined as specified in the OWL standards (Smith et al., 2004), and fail to load ontologies properly where this is not the case. Table 1.1 lists the prefixes that are used in this thesis.

Swoogle (Ding et al., 2004) is a Semantic Web crawler and ontology repository that offers a number of statistics on the crawled ontologies. Swoogle usage statistics (http://ebiquity.umbc.edu/blogger/2007/09/23/top-rdf-namespaces/) can be used to offer default definitions for namespaces, so that we can check automatically if the prefix definitions follow common usage patterns on the Web. Another site with namespace statistics is offered by Ping the Semantic Web (http://pingthesemanticweb.com/stats/namespaces.php), a service that helps with crawling and maintaining an ontology repository. prefix.cc (http://prefix.cc) is a socially constructed website, where users can add, edit, and vote on the prefixes and their resolution. prefix.cc also offers Web service interfaces to allow the automatic querying of the service for usage in a serialization or validation tool.

Note that using namespaces inconsistently with common usage will not lead to errors. When merging and combining ontologies, the underlying OWL libraries will handle the namespaces correctly, and do not care how we define the prefixes. It merely helps to avoid confusion for the human user not to break expectations regarding the definitions of namespaces, and to follow common practice on the Web.

5.3 XML validation

The Semantic Web was created as a set of standards building on top of each other, and incorporating already existing and well established standards such as URIs as identifiers (Berners-Lee et al., 2005) or XML for serialization (Bray et al., 2008). The inherent promise was that due to the reuse of these standards, widely deployed tools and expensively built expertise will not be lost but remain relevant and in continued use. This promise has not been fully realized.

A characteristic of the standard RDF/XML syntax (Beckett, 2004) is that it can be used to let ontologies, particularly simple knowledge bases, mimic traditional XML documents and even be accompanied by an XML schema. But we will show in Section 5.3.2 that most XML oriented tools can only deal superficially with RDF/XML files. Due to the numerous ways in which an RDF graph can be expressed in XML, creating applications using the XML set of tools and expertise is often inefficiently expensive and unreasonably hard. This is particularly true for typical XML evaluation approaches such as document validation. Basically, they are currently not applicable.

In this section an approach resolving this problem is presented. It uses well-established standards to sufficiently constrain the serialization of RDF/XML in order to be usable by classic XML processing approaches. We suggest using the same approach humans would use in order to serialize their conceptualizations in XML, namely following an XML schema. Three popular XML schema languages are currently widely used: the XML-native Document Type Definitions (DTD) (Bray et al., 2008), the W3C XML Schema Definition language (XSD) (Fallside and Walmsley, 2004), and the Regular Language for XML Next Generation (RELAX NG) (Clark and Murata, 2001). Due to its legibility and high availability we choose DTD for our prototype implementation, but the work can be extended to use the other schema languages as well, as will be discussed in Section 5.3.7. Note that by restricting RDF/XML with DTDs, the resulting files are still fully compliant RDF files, and therefore can be used by other Semantic Web tools out of the box.

Method 11 (Validating against an XML schema) An ontology can be validated using a standard XML validator under specific circumstances. In order to apply this, the ontology needs to be serialized using a pre-defined XML schema. The semantic difference between the serialized ontology and the original ontology will help in discovering incompleteness of the data (by finding individuals that were in the original ontology but not in the serialized one). The peculiar advantage of this approach is that it can be used with well-known tools and expertise.

Depending on the schema, this method may severely impact the extensibility of the ontology document. The advantage of using an ontology document that validates against an XML schema is that we can check if the document is data complete with regards to a previously made specification. This way tools can check beforehand not only whether the data is consistent with a given ontology, but also whether there is sufficient data in order to perform a specific task.
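The kind of data completeness check that a validating XML parser performs can be imitated with a few lines of Python. The sketch below is a stand-in, not a DTD validator: Python's standard library parsers are non-validating, so the constraints are hard-coded here as assumptions (every foaf:Person directly below rdf:RDF must carry an rdf:about attribute and exactly one foaf:name child). The function name and the example.org URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def incomplete_persons(xml_text):
    """Return (index, problem) pairs for persons violating the constraints.

    Assumed constraints (hard-coded in place of a schema): each
    foaf:Person child of the root needs rdf:about and one foaf:name.
    """
    root = ET.fromstring(xml_text)
    problems = []
    for index, person in enumerate(root.findall(f"{{{FOAF_NS}}}Person")):
        about = person.get(f"{{{RDF_NS}}}about")
        names = person.findall(f"{{{FOAF_NS}}}name")
        if about is None:
            problems.append((index, "missing rdf:about"))
        if len(names) != 1:
            problems.append((index, f"expected one foaf:name, found {len(names)}"))
    return problems


document = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://example.org/id/a">
    <foaf:name>A</foaf:name>
  </foaf:Person>
  <foaf:Person rdf:about="http://example.org/id/b"/>
</rdf:RDF>"""

for index, problem in incomplete_persons(document):
    print(f"person #{index}: {problem}")
```

The second person in the sample is reported because its name is not given in the document. This mirrors the point of Method 11: the check concerns what is actually stated in the serialization, not what could be inferred.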

The major theoretical challenges lie in the details of the interpretation of the XML schema file as a specification for generating the requested serialization. XML schemas can be roughly regarded as formal grammars meant to check a given input document for validity. Instead, we use the very same XML schema file in order to query a given RDF graph and then define how to serialize the results. This forces us to define a number of constraints on the covered XML schemas (i.e., arbitrary DTDs cannot be used, as described in Section 5.3.3). This interpretation allows us to describe a dialog-based tool to quickly create RDF/XML-compliant DTDs that contain the data that is requested by the developer, thus allowing the developer to easily adhere to these constraints without deep understanding of Semantic Web standards.

We provide a prototype implementation of the approach, and present an example instantiation of the method in order to demonstrate its usefulness and applicability. For our example we take FOAF files from the Web, pipe them through our implementation, and then use standard XML processing techniques (XSLT) in order to generate HTML pages. We provide the example workflow as a Web service.

Section 5.3.3 describes the approach to create the serializations. Section 5.3.4 proposes how the dialog-driven DTD creation tool works. This is followed by a description of the current prototype implementation and demonstration in Section 5.3.5. Section 5.3.6 describes related approaches towards the goal we are outlining, before we close with a discussion of further research questions. As a running example we describe a Web service that takes arbitrary FOAF-files (Brickley and Miller, 2005) and translates them to HTML and vCard presentation.

5.3.1 Example

An example of an XML-based serialization of RDF is given by RSS 0.9 (Beged-Dov et al., 2000). RSS (either used as a shortcut for RDF Site Summary or Really Simple Syndication) is a format to enable the syndication of Web feeds of e.g. blogs, podcasts, video diaries, updates, etc. RSS files are both valid RDF files and valid DTD-described XML files, where the DTD describes the document structure. Listing 5.1 is an abbreviated version of the DTD document for RSS 0.9 by Dan Libby (entities defining special characters and comments have been removed for brevity; the original source at http://my.netscape.com/rdf/simple/0.9/ is unavailable, copy according to http://www.rssboard.org/rss-0.9.dtd).

Listing 5.1: Document type definition for RSS 0.9 documents.

The DTD has the root element rdf:RDF (defined in the actual RSS document, see Listing 5.2). The DTD gives the grammar for the XML elements: the root rdf:RDF contains the elements channel, image, item, and textinput in arbitrary order and number, but nothing else (note that the * around the brackets in line 1 makes the ? and + inside the brackets meaningless). All these may contain a further number of elements, which are all flat #PCDATA fields.

An RSS document that is valid with regards to the given DTD could look like the example in Listing 5.2. A standard RDF tool will hardly serialize its RDF graph like this. The DTD defines the order of the elements, and it defines the names of the XML elements to be used. For an RDF library the order does not matter, and there are a number of different ways to serialize triples in XML. In order to serialize an RDF graph so that we can use it for validation against a schema, the serializer must be aware of the schema.

Listing 5.2: Example RSS document.

[Listing 5.2 shows an RSS 0.9 document with an rdf:RDF root element containing a channel (title "semanticweb.org", link "http://semanticweb.org", description "Semantic Web Community"), an image, and two item elements, each with a title and a link.]

The validator will provide us with a list of all validation errors. If the file validates, we do not only know that it is a valid XML document and a valid RDF file, but also that the entities described in the graph have certain properties filled (i.e. all items will have a title property). This allows us to state data completeness. OWL allows us to infer that every item has to have a title property, be it given or not (by stating SubClassOf(Item ExactCardinality(1 title))), but validating against the above DTD will guarantee us that we know the actual value of the title property for every item. This means that for certain properties we do not only know that they exist, but we also know that we know their values (this is similar to the usage of autoepistemic logic as described in Section 9.2.3).

5.3.2 Motivation

Today, one of the major hindrances towards achieving the wide deployment of Semantic Web technologies is the lack of sufficient expertise. If at all, Semantic Web technologies have only recently been introduced to students' curricula, and most general audience books on Semantic Web technology have only been published in the last two years (Pollock, 2009; Segaran et al., 2009; Allemang and Hendler, 2008; Hitzler et al., 2009). On the other hand, XML expertise is widely available. There are numerous books and courses on XML technologies, and many websites provide access to communities of practice, often even within the specific domain of the user. [...] of them require a deeper understanding of the technology than is readily available.
</p>
<p>Strengthening the ties between RDF and XML allows not only the reuse of existing expertise, but also re-enables the already existing tools. To illustrate the problems with using RDF/XML, consider the following ontology, serialized in N3 (Berners-Lee, 2006):</p>
<p>aifb:Rudi_Studer rdf:type foaf:Person .</p>
<p>The following four documents are examples that all serialize this single triple in RDF/XML:</p>
<p>Listing 5.3: Expanded RDF/XML serialization</p>
<p><rdf:RDF>
  <rdf:Description rdf:about="&aifb;Rudi_Studer">
    <rdf:type>
      <rdf:Description rdf:about="&foaf;Person"/>
    </rdf:type>
  </rdf:Description>
</rdf:RDF></p>
<p>An object may be moved to the property element as an attribute value:</p>
<p>Listing 5.4: Object as attribute value</p>
<p><rdf:RDF>
  <rdf:Description rdf:about="&aifb;Rudi_Studer">
    <rdf:type rdf:resource="&foaf;Person"/>
  </rdf:Description>
</rdf:RDF></p>
<p>Typing information can be moved to the element:</p>
<p>Listing 5.5: Typing by element name</p>
<p><rdf:RDF>
  <foaf:Person rdf:about="&aifb;Rudi_Studer"/>
</rdf:RDF></p>
<p>Finally, the encompassing rdf:RDF root element is optional if it only has one child node:</p>
<p>Listing 5.6: Removing optional root element</p>
<p><foaf:Person rdf:about="&aifb;Rudi_Studer"/></p>
<p>All documents have very different XML infosets and very different XML serializations, but equal RDF semantics. Thus, creating a simple list of all persons according to the RDF semantics is not trivial without using an RDF parser. An XSLT or XQuery transformation will be cumbersome to write. In order to simplify the authoring of XML-based solutions, we provide a workflow that will normalize the RDF/XML serialization following a given XML schema.</p>
<p>The example in Listing 5.7 shows an XML DTD that normalizes the way to express FOAF data in RDF/XML files. It states that the root element rdf:RDF has an arbitrary number of foaf:Persons (line 1) and each foaf:Person must have a foaf:name and may have some mail address (given by foaf:mbox, line 6). Lines 2-5 define the namespaces (Bray et al., 2006) of the resulting document. Note that the namespaces are fixed. This is to circumvent the fact that DTDs per se are not namespace-aware (unlike other XML schema languages like XSD or RelaxNG). Line 7 ensures that every instance has a URI and may not be a blank node.</p>
<p>Listing 5.7: A DTD for a fragment of FOAF</p>
<p> 1 <!ELEMENT rdf:RDF (foaf:Person*)>
 2 <!ATTLIST rdf:RDF xmlns:rdf CDATA #FIXED
 3   "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 4 <!ATTLIST rdf:RDF xmlns:foaf CDATA #FIXED
 5   "http://xmlns.com/foaf/0.1/">
 6 <!ELEMENT foaf:Person (foaf:name, foaf:mbox*)>
 7 <!ATTLIST foaf:Person rdf:about CDATA #REQUIRED>
 8 <!ELEMENT foaf:name (#PCDATA)>
 9 <!ELEMENT foaf:mbox EMPTY>
10 <!ATTLIST foaf:mbox rdf:resource CDATA #REQUIRED></p>
<p>A FOAF-file that validates against the given DTD in Listing 5.7 is given in Listing 5.8. The result is a reasonably legible XML file and also a fully valid RDF file, which can be used with all Semantic Web tools.</p>
<p>Listing 5.8: FOAF-file after normalization</p>
<p><rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://www.aifb.kit.edu/id/Rudi_Studer">
    <foaf:name>Rudi Studer</foaf:name>
    <foaf:mbox rdf:resource="mailto:studer@kit.edu"/>
  </foaf:Person>
</rdf:RDF></p>
<p>The approach does not allow creating arbitrary XML (specifically, it does not allow creating XML documents that are not valid RDF) and thus cannot be used as a one-step solution towards reusing already existing DTDs. We will show, however, that one can create a custom DTD to serialize the RDF file first, and then translate it to any representation the user requires (even in non-XML formats) by using readily available XSLT tools and expertise. Therefore it is possible to ground any RDF file into an already existing file format in two steps, given that all required information is available.</p>
<p>5.3.3 Normalizing the serialization</p>
<p>In order to create the normalized serialization we use the provided DTD to generate a series of SPARQL queries. The results of the queries are then used to write the actual serialization. In this section we describe how this is accomplished. The given DTD has to fulfill a number of constraints that will be listed explicitly in the following section. First we need to read and define all given namespaces from the root element and declare them appropriately in the SPARQL query. Furthermore we add the RDF and RDFS namespaces, if not already given (for the namespaces used here, see Table 1.1). Next we go through every possible child element type of the root element rdf:RDF. In our example we can see in line 1 that there is only one possible child element type, foaf:Person. This is the type of the sought-for instances, i.e. we translate it into the following query fragment:</p>
<p>SELECT ?individual WHERE { ?individual rdf:type foaf:Person . }</p>
<p>Next, for every required child element of the found elements, we add a respective line to the WHERE-clause. In our example, foaf:Person requires only foaf:name (line 6, foaf:mbox is optional). 
So we add the following line (introducing a new variable every time):</p>
<p>?individual foaf:name ?v1 .</p>
<p>If foaf:name itself had pointed to another element, this element would be the type of ?v1, and in turn we would add all required properties for ?v1 by adding the subelements of this type, and so on. The result of this query is a list of all individuals that are described with all the necessary properties as defined by the DTD. In the example they are, thus, not only instances of foaf:Person but also of K foaf:name.⊤, i.e. we know that they have at least one foaf:name given (see Section 9.2.3).</p>
<p>Next we iterate over the result list, asking for every required and optional property individually. In the example, we would issue the following two queries:</p>
<p>SELECT ?result WHERE { aifb:Rudi_Studer foaf:name ?result . } LIMIT 1</p>
<p>SELECT ?result WHERE { aifb:Rudi_Studer foaf:mbox ?result . }</p>
<p>Since line 6 of the DTD states that there is only one foaf:name, we limit the first query to 1. Again, if the subelement of a property had been stated in the DTD, it would have been required to add all required properties for ?result, just as it was done for ?v1 above. Furthermore, each required and optional property of ?result also has to be gathered in the same way as we do for all the instances of the root element's children.</p>
<p>With the results of the queries, we can start generating the serialization. Listing 5.8 shows such a serialization that was created with the normalization tool.</p>
<p>A further note on the approach: since the resulting queries are all describing conjunctive queries, we can use the queries on a reasoning-aware SPARQL endpoint. This allows us to employ the power of RDFS and OWL to provide for a fully transparent ontology mapping. For example, even if the individuals in the input file actually have the class swrc:Student, as long as this is a subclass of foaf:Person the results will come back as expected, and the normalized serialization will contain all instances of direct and indirect subclasses. This provides a powerful and easy way to quickly add further knowledge to existing XML-based tool chains.</p>
<p>5.3.4 Creation of compliant schemas</p>
<p>The provided DTDs have to fulfill a number of constraints so that they can be used by the normalization tool. This section lists those constraints explicitly. Since it would require quite some expertise with regards to RDF/XML-serializations to create DTDs fulfilling these constraints (which we do not expect), we also propose a dialog-based tool to easily create such DTDs.</p>
<p>An RDF-compliant DTD that can be used by the normalizer described in the previous section has to fulfill the following constraints:</p>
<p>• the resulting XML-file must be a valid RDF/XML-file</p>
<p>• all used namespaces have to be fixed in the DTD and defined in the root element of the resulting XML-file</p>
<p>• the root element of the XML-file has to be rdf:RDF</p>
<p>• since DTDs are not context-sensitive, the DTD can use each element only a single time</p>
<p>Especially the last constraint is a severe restriction, necessary due to the shortcomings of DTDs. In Section 5.3.7 we will take a look at possible remedies for this problem using other, more modern XML schema languages. The first constraint is basically impossible to fulfill without deep knowledge of the RDF/XML serializations. Because of that we suggest a tool that analyses a given dataset or ontology and then guides the developer through understandable dialog options to create a conformant DTD. The tool proceeds as follows:</p>
<p>1. the tool loads a number of RDF files. 
It does not matter if these files contain terminological ontologies, knowledge bases, or populated ontologies.</p>
<p>2. the tool lets the developer select a class from the given RDF files. rdfs:Class and owl:Class are both considered. The developer has to decide if the result should contain exactly one instance, or an arbitrary number of instances.</p>
<p>3. for the selected class, all sensible properties are offered. Sensible properties are those that either are defined with a domain being the given class, or where the instances of the given class have assertions using the property. For each selected property the developer has to decide if this property is required, can be repeated arbitrarily many times, or both.</p>
<p>4. for each selected property the developer has to decide on the type of the filler, especially if it is a datatype value or an individual, and if the latter, if it is of a specific class (which again will be selected from a provided list, based both on the range of the property and the classes of the actual fillers in the knowledge base).</p>
<p>5. if a class was selected, continue recursively with Step 3.</p>
<p>6. as soon as a property is selected and fully described, the developer can select another property by repeating from Step 3 on.</p>
<p>7. as soon as a class is fully described, the developer can continue with another class by repeating from Step 2.</p>
<p>The tool has to be careful not to allow the selection of any element twice. This constraint is stronger than required, and we expect future work and more thorough analysis to relax it and thus to extend the expressivity of the tool. We are also aware that especially the list of sensible properties may be incomplete. This is true when moving to more powerful schema languages that are aware of the context an element is used in. We discuss this in Section 5.3.7.</p>
<p>5.3.5 Implementation</p>
<p>In order to demonstrate the feasibility of our approach, we have developed a prototype Web service implementation. The input is an arbitrary RDF file. It should describe a person using the FOAF vocabulary. The Web service (http://km.aifb.uni-karlsruhe.de/services/RDFSerializer/) uses an extension of the DTD given in Listing 5.7. The full DTD can be accessed online from our demonstration site.</p>
<p>The first step of the output is an RDF file describing one person, using a specified syntax (an incomplete example is given in Listing 5.8; for complete examples please refer to the demo site). Now developers can post-process the output of the serializer with the same ease they would have with any XML files, and they do not need to have any further knowledge of Semantic Web technologies to do so. There are numerous technologies for the further processing of XML files. In our example implementation we use the XML transformation language XSLT (Clark, 1999). XSLT is capable of translating XML files either to other XML files, or to any other format as well. In our example, we provide an XSLT file to turn the resulting RDF file into a vCard file. vCard is a pre-XML IETF standard for exchanging contact data, which is in wide use (Dawson and Howes, 1998). Listing 5.9 gives the full XSLT for this translation. The demo website also offers a translation into HTML. An XSLT file offering the same results over arbitrarily serialized RDF would have been much longer, and harder to write and maintain.</p>
<p>Listing 5.9: XSLT for transforming FOAF to vCard</p>
<p><xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
<xsl:output method="text" indent="yes" media-type="text/x-vcard"/>
<xsl:template match="foaf:Person">
BEGIN:VCARD
VERSION:3.0
UID:<xsl:value-of select="@rdf:about" />
N;CHARSET=UTF-8:<xsl:value-of select="foaf:family_name" />;<xsl:value-of select="foaf:firstName" />;;;
FN;CHARSET=UTF-8:<xsl:value-of select="foaf:name" />
<xsl:for-each select="foaf:mbox">
EMAIL;TYPE=internet:<xsl:value-of select="@rdf:resource" />
</xsl:for-each>
URL:<xsl:value-of select="foaf:homepage/@rdf:resource" />
CLASS:PUBLIC
ORG;CHARSET=UTF-8:;
END:VCARD
</xsl:template>
</xsl:stylesheet></p>
<p>The demonstration site uses Xalan (http://xml.apache.org/xalan-j/) as the XSLT processor. Applying XSLT transformations to the sample RDF file yields the result given in Listing 5.10.</p>
<p>Listing 5.10: A vCard file created by transforming RDF/XML</p>
<p>BEGIN:VCARD
VERSION:3.0
UID:http://www.aifb.kit.edu/id/Rudi_Studer
N;CHARSET=UTF-8:Studer;Rudi;;;
FN;CHARSET=UTF-8:Rudi Studer
EMAIL;TYPE=internet:mailto:rudi.studer@kit.edu
URL:http://www.aifb.kit.edu/web/Rudi_Studer
CLASS:PUBLIC
ORG;CHARSET=UTF-8:;
END:VCARD</p>
<p>5.3.6 Related approaches</p>
<p>In this section we discuss alternative approaches towards bridging the gap between the Semantic and the Syntactic Web.</p>
<p>1. The main related approach is to combine or extend XSLT with capabilities to seamlessly deal with RDF, and still continue to provide the same output formatting power. There are a number of implementations towards this goal, such as RDF Twig (http://rdftwig.sourceforge.net/), TreeHugger (http://rdfweb.org/people/damian/treehugger/index.html), or RDFXSLT (http://www.wsmo.org/TR/d24/d24.2/v0.1/20070412/rdfxslt.html). XSPARQL (http://xsparql.deri.org/spec/) provides similar capabilities, but by extending SPARQL. These approaches have all proven feasible, but do not resolve the original issue: they all require the developer to understand RDF.</p>
<p>2. A very early approach is DTDMaker (Erdmann and Studer, 2001), which automatically creates DTDs based on the vocabulary defined in (F-logic) ontologies. It also reports about the same issues identified here (about the shortcomings of DTDs) but is not flexibly customizable and also does not create RDF-conformant DTDs.</p>
<p>3. Another approach is to write custom hard-coded translators that first read the RDF graph and then create the serialization. This may be the easiest way, but it requires hard-coding a proprietary solution for each use case, and it requires the programmer to learn the API of at least one RDF parser. Our solution does not require any newly written, proprietary code.</p>
<p>4. One approach is to use SPARQL first, and then use the SPARQL XML result format as the basis for the XSLT translations. This approach is viable, but it requires the developer to understand both SPARQL and the SPARQL result format, and then to create XSLT scripts that deal with the representation of their data in the SPARQL result format instead of an XML vocabulary that is close to the domain.</p>
<p>5. The serializers of RDF generating tools could be rewritten so that they always return the required serialization. But this would mean that only the output of these tools can be used, and we lack interoperability with other tools using the same ontology, since the XML serialization would usually not be specified. If the XML serialization is indeed specified, then this would have been done using an XML schema, and therefore it just hard-codes our proposed approach into the RDF source. This is the approach that RSS has chosen, as shown in Section 5.3.1.</p>
<p>6. Another approach is to change the tool using the data so that it becomes RDF aware, i.e. instead of translating the RDF graph to the required format we can enable the tool to use RDF. Even though we personally would prefer this approach, in our example use case this would require extending all existing tools that can consume vCard to also accept RDF descriptions of persons. This renders this solution unlikely.</p>
<p>We conclude that the major difference of our approach is in the zero-knowledge assumption in using it: no knowledge of Semantic Web technologies is required. Expertise in widespread XML technologies is sufficient to start using Semantic Web knowledge bases as data sources.</p>
<p>5.3.7 Open questions for XML schema-based RDF validation</p>
<p>We described and implemented an approach towards bridging the gap between classic XML technologies and the novel Semantic Web technology stack. The current implementation is already usable, but it exhibits a number of limitations. We list these limitations here in order to explicitly name open challenges.</p>
<p>• The given approach can be redone using other, more powerful and modern XML schema languages. These languages add further features, e.g. cardinalities.</p>
<p>• DTDs do not allow for context-sensitive grammars. Therefore elements can only appear once in every DTD, which severely constrains their expressive power. For example, it is not possible to ensure that every person in a FOAF-file requires a name and mailbox and may have friends (which are persons), and at the same time define that friends should not have a mailbox. Using a context-sensitive XML schema language can remedy this.</p>
<p>• Even without moving to more powerful schema languages, the given constraints in Section 5.3.4 can be relaxed. Further analysis is required to understand this bigger language fragment.</p>
<p>• For now the prototype implementation ignores that a property that appears several times should have several different values, i.e. 
is basically a cardinality declaration. This could be expanded.</p><p>• A number of features have not been explored in this first implementation and will be taken into account in future work, e.g. datatypes, language tags, the collection and XML literal parse types, and blank nodes.</p><p>• As we have seen, DTDs have to use fixed namespaces and prefixes, whereas XML namespace-aware schema languages could deal with them more elegantly.</p><p>For now we provide the current implementation and a Web-accessible demonstration workflow to show the advantages of the described approach. We expect that the approach will be applied in a number of early use cases in order to gain more insight into the actual usage and to direct the future development of the given project.</p><p>Chapter 6 Structure</p><p>Caught up in circles confusion is nothing new</p><p>(Cyndi Lauper, b. 1953, Time after Time (Lauper and Hyman, 1983))</p><p>The most widely explored measures used on ontologies are structural measures. Graph measures can be applied on the complete or partial RDF graph describing the ontology. An example of an extensively investigated subgraph would be the one consisting only of edges with the name rdfs:subClassOf and the nodes connected by these edges (i.e. the explicit class hierarchy). This subgraph can be checked to see if the explicit class hierarchy is a tree, a set of trees, or if it has circularities, etc. If it is indeed a tree, the depth and breadth of the tree can be measured. Current literature defines more than forty different metrics (Gangemi et al., 2005; Gangemi et al., 2006b; Lozano-Tello and Gómez-Pérez, 2004; Tartir et al., 2005) that measure the structure of the ontology. Structural measures have a number of advantages:</p><p>• they can be calculated effectively from the ontology graph. Graph metrics libraries are available and can be used for this task.</p><p>• they simply yield numbers. This makes tracking the evolution of the ontology easy, because even in case the meaning of the number itself is not well understood, its change often is.</p><p>• their results can be checked automatically against constraints. For example, constraining the maximal number of outgoing edges of the type rdfs:subClassOf from a single node to five can be checked on each commit of the ontology to a version control system. Upon violation, an appropriate message can be created.</p><p>• they can be simply visualized and reported.</p><p>Due to these advantages and their simple implementation, most ontology toolkits provide ready access to a number of these metrics. Ontology repositories also often use these metrics to annotate and filter the ontologies they contain. But in practice, structural metrics are often not well-defined. That is, based on their definitions in the literature, it is hard to implement them unambiguously. There is also often confusion with regard to their meaning. Often they define a new graph structure (and do not use the existing RDF translation), but then fail to define a proper translation of the ontology features to the graph (e.g. how to translate that a property is transitive, how to translate domains and ranges, etc.). Section 6.1 examines four metrics taken from the literature, provides an exemplary analysis of the shortcomings of their definitions, and offers remedies for these definitional problems. Besides measures counting structural features of the ontology, the structure can also be investigated with regard to certain patterns. The best known example is to regard cycles within the taxonomic structure of the ontology as an error (Gómez-Pérez, 2004).
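</p><p>As a minimal sketch of such a cycle check, the following Python function searches the explicit subclass graph depth-first and returns one cycle if any exists. The function name `find_subsumption_cycle` and the input format (pairs of subclass and superclass names) are assumptions made for this illustration; the class names are hypothetical.

```python
from collections import defaultdict


def find_subsumption_cycle(sub_class_of):
    """Return one cycle of the explicit subclass graph as a list of names, or None."""
    graph = defaultdict(list)
    for sub, sup in sub_class_of:
        graph[sub].append(sup)

    WHITE, GREY, BLACK = 0, 1, 2      # unvisited / on the current path / finished
    colour = defaultdict(int)
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:   # edge back into the current path: a cycle
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):          # copy, because visit() may add empty entries
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical explicit hierarchy in which B and C subsume each other.
cyclic = [("B", "A"), ("B", "C"), ("C", "B"), ("D", "C")]
acyclic = [("B", "A"), ("C", "A"), ("D", "C")]
print(find_subsumption_cycle(cyclic))   # a list such as ['B', 'C', 'B']
print(find_subsumption_cycle(acyclic))  # None
```

The check is purely structural: it only sees explicitly stated subsumptions, not subsumptions that a reasoner would infer.</p><p>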
But more subtle patterns (or anti-patterns) and heuristics can also be used to discover structural errors: disjointness axioms between classes that are distant in the taxonomic structure (Lam, 2007), as well as certain usages of the universal quantifier (Vrandečić, 2005). Section 6.2 applies the RDF query language SPARQL over the OWL graph in order to detect structural patterns and anti-patterns. Section 6.3 shows how the meta-ontology defined in Section 3.2 can be used to detect constraint violations within the ontology structure. As an example, we will formalize the OntoClean constraints in OWL and show how OWL reasoners can be used to discover OntoClean constraint violations (Guarino and Welty, 2002).</p><p>6.1 Structural metrics in practice</p><p>In this thesis we focus on the foundational aspects that form the base for automatically acquirable measures. Therefore we will not define a long list of metrics and measures, but rather take a step back and discuss conditions that measures have to adhere to in order to be regarded as semantically aware ontology metrics. This also helps to understand clearly what it means for a metric to remain on a structural level. Thus the scope of this work compares best to other metric frameworks, such as the QOOD (quality oriented ontology description) framework (Gangemi et al., 2006b) and the O² and oQual models (Gangemi et al., 2006a). The authors created semiotic models for ontology evaluation and validation, and thus describe how measures should be built in order to actually assess quality. They also describe the relation between the ontology description, the ontology graph, and the conceptualization that is expressed within the graph, and they define measures for the structural, functional, and usability dimension. A graphical overview of O² is given in Figure 11.1 on page 189.</p><p>Simple structural measures can uncover a number of interesting features such as the reasoning complexity of the given ontology. Complexity is actually a measure of an ontology language, and defines the complexity for reasoning over instances of that language. The complexity of the OWL languages is known, but that does not give us much information on particular ontologies. The expressivity of the used language merely defines an upper bound on the complexity that applies to the reasoning tasks. With a simple list of the constructs used within the ontology one can further refine the used language fragment, and thus arrive at a possibly lower complexity bound and thus a better estimation of the reasoning task. For example, OWL DL is known to correspond to the description logic SHOIN(D) (Horrocks and Patel-Schneider, 2004), and thus reasoning tasks such as satisfiability checks are known to be NExpTime-complete (Schaerf, 1994).</p><p>Most OWL ontologies do not use the more expressive constructs (Wang et al., 2006; d'Aquin et al., 2007a). Ontology editors such as SWOOP (Kalyanpur et al., 2006) show the language fragment that is actually being used in the ontology. If the ontology only uses a certain set of constructs, we know that we are, e.g., in a tractable fragment (as an example, the Semantic MediaWiki query language described in Chapter 10 corresponds to a tractable fragment of OWL). Furthermore, even if more expressive constructs are used, queries to the ontology can often be answered much more efficiently than the theoretical upper bound suggests. Experiments indicate that a priori estimates of reasoning speed can be pursued based purely on structural features (Wang and Parsia, 2007).</p><p>Method 12 (Ontology complexity)</p><p>We define measures counting the appearance of each ontology language feature. We do this by first defining a filter function $O_T\colon O \to O$ with $T$ being an axiom or an expression type (as given in Chapter 2). $O_T$ returns all the axioms of axiom type $T$ or all axioms having an expression of type $T$. We can further define a counting metric $N_T\colon O \to \mathbb{N}$ as $N_T(O) = |O_T(O)|$. We also define $N(O) = |O|$.</p><p>Then we can further define a few shortcuts, derived from the respective letters defining DL languages (Baader et al., 2003), for example:</p><p>• Number of subsumptions $N_{\sqsubseteq}(O) = N_{\text{SubClassOf}}(O)$: the number of subsumption axioms in the ontology</p><p>• Number of transitivities $N_{+}(O) = N_{\text{TransitiveProperty}}(O)$: the number of properties being described as transitive</p><p>• Number of nominals $N_{\mathcal{O}}(O) = N_{\text{OneOf}}(O)$: the number of axioms using a nominal expression</p><p>• Number of unions $N_{\sqcup}(O) = N_{\text{UnionOf}}(O)$: the number of axioms using a union class expression</p><p>• etc.</p><p>With these numbers we can use a look-up tool such as the description logics complexity navigator (Zolin, 2010). If $N_{\mathcal{O}} > 0$, then the nominals feature has to be selected, if $N_{+} > 0$ we need to select role transitivity, etc. The navigator will then give us the complexity of the used language fragment (as far as known).
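</p><p>The filter function and the counting metric can be sketched in a few lines of Python. This is only an illustration under stated assumptions: axioms are modelled as nested tuples whose first element is the axiom or expression type, and the names `expression_types`, `filter_by_type` and `count_by_type` as well as the example ontology are hypothetical.

```python
def expression_types(term):
    """Yield the type of a term and of every expression nested inside it."""
    if isinstance(term, tuple):
        yield term[0]
        for argument in term[1:]:
            yield from expression_types(argument)


def filter_by_type(ontology, type_name):
    """O_T: the axioms of type T, or containing an expression of type T."""
    return [axiom for axiom in ontology
            if type_name in set(expression_types(axiom))]


def count_by_type(ontology, type_name):
    """N_T(O) = |O_T(O)|."""
    return len(filter_by_type(ontology, type_name))


# Hypothetical ontology; plain strings stand for simple names.
ontology = [
    ("SubClassOf", "Student", "Person"),
    ("SubClassOf", "Weekend", ("OneOf", "saturday", "sunday")),
    ("TransitiveProperty", "partOf"),
    ("EquivalentClasses", "Parent", ("UnionOf", "Mother", "Father")),
]

# Features that would have to be selected in a complexity look-up tool.
selected = {
    "nominals": count_by_type(ontology, "OneOf") > 0,
    "role transitivity": count_by_type(ontology, "TransitiveProperty") > 0,
}
print(len(ontology), count_by_type(ontology, "SubClassOf"), selected)
```

Counting each axiom at most once per type follows the reading of $N_T$ as the size of the filtered axiom set, not as the number of occurrences of an expression.</p><p>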
We further define H(O): O → O as the function that returns only the simple subsumptions in O, i.e. only those SubClassOf axioms that connect two simple class names.</p><p>The rest of this section discusses exemplary measures from the literature, shows how they are ambiguous, and offers actions to remedy these ambiguities (in favor of a purely structural application). In Chapter 8 these measures will be revisited in order to redefine them with the help of normalization, introduced in Chapter 7. Note that the following criticizes only the actual description of the individual measures, not the framework they are described in. The remedies are often simple, and after applying them the remedied measures can be used with the original intention.</p><p>6.1.1 Maximum depth of the taxonomy</p><p>OntoMetric (Lozano-Tello and Gómez-Pérez, 2004) is a methodology for selecting an ontology, based on five dimensions which in turn are organized into selection factors. The content dimension is the one most related to the work presented in this thesis, organized in the four factors concepts, relations, taxonomy, and axioms. We will take a closer look at one of the metrics related to the factor taxonomy, the maximum depth. The maximum depth of the concept hierarchy is "defined as the largest existing path following the inheritance relationships leading through the taxonomy".¹ This definition leads to a number of problems. First, cycles in the class hierarchy will lead to the result that the maximum depth is ∞, which may not be a very useful result. Furthermore, consider the following ontology:</p><p>¹ Translated from the original Spanish: "La profundidad máxima en la jerarquía de conceptos: definida como el mayor camino existente siguiendo las relaciones de herencia que puede alcanzar la taxonomía." (Lozano-Tello, 2002, p.
72)</p><p>EquivalentClasses(A MinCardinality 1 R)
EquivalentClasses(B MinCardinality 2 R)</p><p>Since the definition only considers explicit inheritance relationships, it will miss that B is a subclass of A, and thus it will report a maximum depth of only 0 (no explicit inheritance relationships) instead of 1. In Section 7.1 we will introduce the technique of normalization, which will allow us to use the maximum depth metric as it is intuitively meant to be used (see Section 8.2). In order to define a purely structural maximum depth, it would be required to first rename the metric so that the meaning of the resulting number is clear. The appropriate name obviously depends on the actual definition of the metric, and one possible definition could be: the biggest number of subsumption axioms consecutively connecting simple class names without the same class name ever being repeated. An appropriate name for such a metric could be maximum subsumption path length. Compared to the original definition, this definition can deal with cycles in the subsumption graph. Some issues still remain. Consider the following ontology, given in Figure 6.1:</p><p>SubClassOf(B A)
SubClassOf(B C)
SubClassOf(C B)
SubClassOf(D C)
SubClassOf(D B)</p><p>Figure 6.1: Example for a circular hierarchy path.</p><p>The longest path is four (either ABCD or ACBD), even though we would expect the maximal depth of the class hierarchy to be 3 (since B and C are on the same hierarchy level). But this is an inherent flaw of structural metrics.
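</p><p>The proposed definition can be sketched as a brute-force search for the longest chain of subsumption axioms that never repeats a class name. The function name `max_subsumption_path_length` and the pair-based input format are assumptions of this sketch. Note that the sketch counts axioms, so a chain through four class names is reported as 3.

```python
from collections import defaultdict


def max_subsumption_path_length(sub_class_of):
    """Largest number of subsumption axioms chained through distinct class names."""
    graph = defaultdict(set)
    for sub, sup in sub_class_of:
        graph[sub].add(sup)

    def longest_from(node, visited):
        best = 0
        for nxt in graph.get(node, ()):
            if nxt not in visited:          # never repeat a class name
                best = max(best, 1 + longest_from(nxt, visited | {nxt}))
        return best

    nodes = set(graph) | {sup for sups in graph.values() for sup in sups}
    return max((longest_from(node, {node}) for node in nodes), default=0)


# The five simple subsumptions listed above.
example = [("B", "A"), ("B", "C"), ("C", "B"), ("D", "C"), ("D", "B")]
print(max_subsumption_path_length(example))  # 3 axioms: D, C, B, A

# An ontology without explicit simple subsumptions yields 0.
print(max_subsumption_path_length([]))       # 0
```

The search is exponential in the worst case, which is acceptable for a sketch on small hierarchies but not for large ontologies.</p><p>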
Only by renaming the metric from a name that implies that it measures a semantic feature (maximum depth of the taxonomy) to a name that states explicitly that it measures a structural feature (maximum subsumption path length) can we break the original intuition and thus achieve a metric that does not deceive due to its name.</p><p>6.1.2 Class / relation ratio</p><p>In (Gangemi et al., 2005) the authors introduce 32 "measures of the structural dimension". Note that these measures do not fully correspond to what we define to be structural measures within this thesis. E.g., (M24) (Consistency ratio) is defined as $\frac{n_{Cons}}{n_G}$ with $n_{Cons}$ being the number of consistent classes and $n_G$ the number of classes. In order to count the number of consistent classes we need a reasoner, and thus this measures not a structural, but rather what we define as a representational aspect of an ontology. Measure (M29) in (Gangemi et al., 2005) is called the "Class / relation ratio", suggesting that it returns the ratio between classes and relations (or properties). The exact definition of the measure is: "$\frac{n_{G \in S}}{n_{R \in S}}$ where $n_{G \in S}$ is the cardinality of the set of classes represent[ed] by nodes in g, and $n_{R \in S}$ is the cardinality of the set of relations represented by arcs in g" (Gangemi et al., 2005). But applying the definition yields the ratio between the number of nodes representing classes and the number of nodes representing relations within the ontology graph, which will be a different number since a number of nodes, and thus names, can all denote the same class or relation. Therefore the metric is improperly named, since it does not yield the ratio between classes and relations, but rather between class names and relation names. In Section 8.3 we will return to this metric and redefine it properly to capture the intuition behind the name. For now, in order to achieve a structural metric, it is again required to rename the metric, e.g.
to "class name / property name ratio". It is unclear if this metric is useful, but since it is far easier to obtain than the actual "class / relation ratio" metric and will often correlate with it (compare Chapter 8), we expect it to remain displayed by ontology tools.</p><p>6.1.3 Relationship richness</p><p>OntoQA is an analysis method that includes a number of metrics (Tartir et al., 2005), and thus it allows for the automatic measurement of ontologies. They define metrics such as richness, population, or cohesion. Whereas all these metrics are interesting, they fail to define if they are structurally or semantically defined, which is a common lapse. As an example we choose the metric (RR) "relationship richness". (RR) is defined as $RR = \frac{|P|}{|SC|+|P|}$, "as the ratio of the number of relationships (P) defined in the schema, divided by the sum of the number of subclasses (SC) (which is the same as the number of inheritance relationships) plus the number of relationships" (Tartir et al., 2005). In our terminology $|P|$ is the number of property names, and $|SC|$ the number of subsumptions with the subclass being a class name. According to the authors, this metric "reflects the diversity of relations and placement of relations in the ontology" and is based on the assumption that an ontology that "contains many relations other than class-subclass relations is richer than a taxonomy with only class-subclass relationships". We will look at a number of examples in order to evaluate this claim.</p><p>Consider the following top level ontology (the class hierarchy is from Proton (Terziev et al., 2005), but the axioms are written differently to illustrate the argument. The Proton ontology indeed uses simple subsumptions):</p><p>DisjointUnion(proton:Entity proton:Abstract proton:Happening proton:Object)
EquivalentClasses(proton:Object UnionOf(proton:Agent proton:Location proton:Product proton:Service proton:Statement))</p><p>Since we have not used any explicit subsumption axioms, $|SC|$ is 0. Since no properties have been introduced, $|P|$ is 0 as well, leading to $RR = \frac{0}{0+0}$, which is undefined. Now we can easily imagine adding classes to the ontology, all defined by class equivalences or disjoint unions; both $|SC|$ and $|P|$ will always be 0, and $RR$ will remain undefined.</p><p>But now we add the following axiom from Proton:</p><p>PropertyRange(proton:isOwnedBy proton:Agent)</p><p>$RR$ will suddenly become $\frac{1}{0+1} = 1$, i.e. as rich as possible. Even adding further properties will not increase the relationship richness at all, it will constantly remain 1. Now consider the following axiom:</p><p>SubClassOf(proton:Agent proton:Object)</p><p>This axiom explicates one of the subsumption relations stated indirectly in the class equivalence axiom of the ontology, i.e. it is redundant. But if we add this axiom to our original ontology instead of the range axiom, the $RR$ value would have dropped to 0, since $|SC|$ would be 1 and $|P|$ would remain 0. And now we can add a mix of property axioms and explicated subsumption axioms, and basically get any $RR$ value that we want, simply by explicating some of the subsumption axioms. Therefore we state that $RR$ is not a useful measure to capture relationship richness. We return to this measure in Section 8.4 and discuss how the measure can be repaired.</p><p>Figure 6.2: Two class hierarchies with identical semantics.
Semantic similarity measure ssm of C1 and C3 is lower in the left ontology than in the right one.</p><p>6.1.4 Semantic similarity measure</p><p>Another example of a structural metric is given by (Alani and Brewster, 2006), where the authors describe metrics for ranking ontologies, such as the class match or the density measure. Interestingly, even the so-called semantic similarity measure ssm is not a semantic measure in the sense described here, since they apply all these measures on the graph that describes the ontology, not on the ontological model. ssm is defined between two classes and is the reciprocal of the length of the shortest path in the ontology graph from one class to the other (Alani and Brewster, 2006).</p><p>In Figure 6.2 we see two class hierarchies that represent the same semantics. The structure in the right hand side ontology contains a number of redundant, explicit subsumptions, i.e. the ontology graph contains a few redundant rdfs:subClassOf arcs. Since the semantics have not changed, one would expect that the semantic similarity measure would remain constant in both ontologies, which is not the case.</p><p>In fact, one could always connect any class C with an explicit subsumption to the top class without changing the semantics, and thus have any two classes be connected in two links via the top class. Note that when introducing ssm, (Alani and Brewster, 2006) already note that further studies are required in order to find whether the assumption that ssm describes the semantic similarity depends on certain properties of the ontology. As shown here, it does: in Section 8.5 we will describe these properties.</p><p>6.2 SPARQL for finding patterns</p><p>Patterns are crucial elements in all mature fields of engineering (Alexander et al., 1977; Gamma et al., 1995). In ontology engineering, research has focused on ontology design patterns to build ontologies using a set of predefined patterns (Gangemi, 2005; Svatek, 2004; Staab et al., 2001). In this case, patterns are used in order to formalize common configurations of entities. But patterns can also emerge, i.e. they can appear in an ontology without the prior intention to use a specific pattern, especially in the case of networked ontologies.</p><p>One of the major tasks when working with patterns is to recognize them. In order to offer the best support, ontology engineering tools need to recognize patterns and offer appropriate user support for working with them. Otherwise patterns can easily become compromised by tools that are unaware of them. We introduce an approach for detecting patterns in ontologies using readily available SPARQL query engines (Prud'hommeaux and Seaborne, 2008).</p><p>We investigate several issues and problems that arise when using SPARQL to detect patterns in OWL ontologies. We discuss three approaches towards resolving these problems and report empirical results when applying one of them. In order to keep the examples simple we will concentrate on one of the best known ontology engineering patterns, the partition pattern. We expect many of the problems and solutions investigated here to also apply to more complex patterns.</p><p>The following section describes how DL patterns can be translated to RDF graphs in numerous different ways. In Section 6.2.2 we discuss several approaches to remedy these problems, and empirically test the most naïve approach with surprisingly good results. We then apply the results on the notion of anti-patterns in Section 6.2.5.</p><p>6.2.1 Translating patterns from OWL to RDF</p><p>The partition pattern, as defined by (Gómez-Pérez et al., 2003), is maybe the best known ontology engineering pattern. Partitions are also often used as a building block for more complex patterns, as for example the values as subclasses partitioning a quality pattern (Rector, 2005). The pattern partitions a class A into a number of disjoint subclasses B1 ... Bn. For simplicity, we assume n = 2 in the following examples.</p><p>SubClassOf(B1 A)
SubClassOf(B2 A)
SubClassOf(IntersectionOf(B1 B2) owl:Nothing)</p><p>The partition pattern is displayed in Figure 6.3, using the same graphical syntax as in the Semantic Web best practice notes (Rector, 2005). An example for the application of this pattern is the partition of animal into biped and quadruped (this also exemplifies the difference to a complete partition, as defined by (Gómez-Pérez et al., 2003), that would make the far stronger claim that every individual of the class A needs to be an individual of exactly one of the subclasses B1 ... Bn by adding the axiom SubClassOf(A UnionOf(B1 B2 ... Bn))). The DisjointUnion axiom type was introduced in OWL2 and can state a complete partition directly (see Section 2.2.2). This is not true for the simple partition that is taken as an example in this section.</p><p>Figure 6.3: The partition pattern: class A is partitioned into the subclasses B1 ... Bn</p><p>Since SPARQL is a query language for RDF we need to consider how this pattern is represented in RDF. This means that the pattern needs to be translated from OWL to RDF (as shown in Tables 2.1 and 2.2). Unfortunately, this does not necessarily lead to a unique representation in RDF. In order to make the relation to SPARQL more obvious, we use the Notation3 syntax for RDF (Berners-Lee, 2006) instead of the more verbose standard RDF/XML serialization (Klyne and Carroll, 2004).</p><p>B1 rdfs:subClassOf A .
B2 rdfs:subClassOf A .
_:a rdf:type owl:Class .
_:a owl:intersectionOf _:b .
_:b rdf:first B1 .
_:b rdf:rest _:c .
_:c rdf:first B2 .
_:c rdf:rest rdf:nil .
_:a rdfs:subClassOf owl:Nothing .</p><p>Another, semantically equivalent possibility to describe the pattern in OWL is by using the constructor for disjoint classes directly:</p><p>SubClassOf(B1 A)
SubClassOf(B2 A)
DisjointClasses(B1 B2)</p><p>This representation has the advantage of being easily extensible in case n > 2, since we do not need to add numerous SubClassOf axioms stating incoherent intersections like in the first variant. As an RDF graph the second variant could look like this:</p><p>B1 rdfs:subClassOf A .
B2 rdfs:subClassOf A .
B1 owl:disjointWith B2 .</p><p>Note that the DisjointClasses axiom leads to at least $\frac{n^2-n}{2}$ triples (since it needs to be stated for all possible pairs of Bs), and thus will be less terse for n > 2.</p><p>Another possible instantiation of the partition would be given by the following semantically equivalent version of the partition pattern:</p><p>SubClassOf(B1 A)
SubClassOf(B2 intersectionOf(A complementOf(B1)))</p><p>This in turn can be translated to the following RDF graph:</p><p>B1 rdfs:subClassOf A .
B2 rdfs:subClassOf _:a .
_:a rdf:type owl:Class .
_:a owl:intersectionOf _:b .
_:b rdf:first A .
_:b rdf:rest _:c .
_:c rdf:first _:d .
_:c rdf:rest rdf:nil .
_:d rdf:type owl:Class .
_:d owl:complementOf B1 .</p><p>We assume that the partition pattern will usually be represented by one of these three patterns, or by patterns similar to these. Even though many more are conceivable, they would be more complicated and thus less likely.</p><p>6.2.2 Querying with SPARQL</p><p>The RDF graphs can be translated straightforwardly to the following SPARQL query.</p><p>select distinct ?A ?B1 ?B2 where {
?B1 rdfs:subClassOf ?A .
?B2 rdfs:subClassOf ?A .
?B1 owl:disjointWith ?B2 .
}</p><p>With an OWL DL aware SPARQL implementation this query should return the same result on all the variants above. But none of the implementations we are aware of gives this result. This is unsurprising, since SPARQL is not a query language that fully addresses querying OWL DL ontologies, and the relation between the two is underspecified in the SPARQL standard (Prud'hommeaux and Seaborne, 2008). Most often SPARQL engines allow querying for ABox results, i.e. so-called conjunctive queries, but forbid TBox queries such as the one given above. Implementations that are not aware of OWL DL cannot be used to discover all instances of the pattern with the given SPARQL query. In this case all possible variants of the pattern would need to be queried to gain a complete result, which is not practical. In order to be able to search for knowledge engineering patterns using SPARQL, we thus can follow three possible approaches:</p><p>1. Implementing and using an OWL DL aware SPARQL engine</p><p>2. Materializing or normalizing the relevant parts of the ontology and then using an OWL DL unaware SPARQL engine</p><p>3. Committing to an incomplete search by searching only for specific variants of the pattern's implementations</p><p>We consider the first option to be out of scope for this thesis, and thus we will discuss the other two options in the following two sections.</p><p>6.2.3 Ontology normalization</p><p>In Section 7.1 we define ontology normalization as an approach to make features of the semantics of an ontology explicit within the ontology's structure. With an OWL DL unaware SPARQL engine, SPARQL is useful only for querying the structure of the ontology, i.e. its explicit RDF graph. In order to use the query successfully with a SPARQL engine that is not aware of the OWL DL semantics, we first have to use an OWL DL reasoner to materialize all relevant parts of the ontology.
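</p><p>The purely structural behaviour of an OWL DL unaware engine can be illustrated with a toy triple-pattern matcher. The following sketch is not a SPARQL implementation; the function `match` and the constants are names assumed for this illustration, and the two triple sets are copied from the first two variants in Section 6.2.1.

```python
def match(patterns, triples, binding=None):
    """Yield the variable bindings under which all triple patterns occur in triples."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    head, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for want, have in zip(head, triple):
            if want.startswith("?"):                 # variable: bind or compare
                if candidate.setdefault(want, have) != have:
                    break
            elif want != have:                       # constant: must be identical
                break
        else:
            yield from match(rest, triples, candidate)


PARTITION_QUERY = [
    ("?B1", "rdfs:subClassOf", "?A"),
    ("?B2", "rdfs:subClassOf", "?A"),
    ("?B1", "owl:disjointWith", "?B2"),
]

VARIANT_1 = [  # disjointness stated via an empty intersection
    ("B1", "rdfs:subClassOf", "A"),
    ("B2", "rdfs:subClassOf", "A"),
    ("_:a", "rdf:type", "owl:Class"),
    ("_:a", "owl:intersectionOf", "_:b"),
    ("_:b", "rdf:first", "B1"),
    ("_:b", "rdf:rest", "_:c"),
    ("_:c", "rdf:first", "B2"),
    ("_:c", "rdf:rest", "rdf:nil"),
    ("_:a", "rdfs:subClassOf", "owl:Nothing"),
]

VARIANT_2 = [  # disjointness stated directly
    ("B1", "rdfs:subClassOf", "A"),
    ("B2", "rdfs:subClassOf", "A"),
    ("B1", "owl:disjointWith", "B2"),
]

print(list(match(PARTITION_QUERY, VARIANT_2)))  # one binding: the partition is found
print(list(match(PARTITION_QUERY, VARIANT_1)))  # no binding, although equivalent
```

Both triple sets describe the same partition, but only the second one matches the query, which is why either materialization or a commitment to an incomplete search is needed.</p><p>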
In the case of the partition, we would need to materialize the subClassOf and disjointWith constructs. But only direct subClassOf relations should be materialized. Stating disjointness between classes that are not direct siblings is not considered a partition. If we materialize the complete subClassOf graph, we could not recognize direct siblings anymore. Therefore we need to use the first three normalization steps introduced in Section 7.1 in order to avoid this problem.</p><p>Furthermore, the disjointWith graph should not be fully materialized either, since the same problem would emerge. Subclasses of a class B that was declared disjoint to a class A would all be disjoint to A as well. Such inherited disjointness should not be materialized.</p><p>Note that the normalization steps that would be required for recognizing the partition pattern are valid only for the partition pattern. If we wanted to recognize other patterns, we would have to consider anew which normalization steps would be required, and which could be left out.</p><p>6.2.4 Incomplete searches</p><p>In this section we explore empirically what happens if we drop the requirement of guaranteeing the discovery of a knowledge pattern. We took the SPARQL query from Section 6.2.2 and ran it against our corpus (see Section 11.3). We furthermore performed several other tests in order to understand how complete the results of the query are.</p><p>Of the 1331 ontologies of the Watson OWL corpus (see Section 11.3), only 55 were using disjointness axioms at all (4.1%). This is roughly consistent with previous surveys on the usage of expressive axioms in OWL ontologies on the Web: (d'Aquin et al., 2007a) reported that 4.5% of OWL DL ontologies were using the disjointWith construct, and (Wang et al., 2006) reported that 7.6% of the surveyed ontologies included the ALC language fragment or stronger. The differences can be easily explained by the different approaches towards collecting and filtering the ontology corpora.</p><p>Using the SPARQL query to detect partition patterns, we discovered the partition pattern in 47 of these 55 ontologies (85.5%). This means that most of the time when the disjointness axiom is used, the partition pattern is also applied. This fact was recognized by the DAML+OIL ontology language (Horrocks et al., 2001), which included some more powerful constructs to present partitions and complete partitions, such as disjointUnion (which has been introduced again in OWL2 (Grau et al., 2008)).</p><p>We have taken a look at the 8 ontologies that did not show the partition pattern but still used disjointness (B90, B98, C37, E71, G13, H71, L22, and L66). E71, G13, and L22 turned out to be versions of the time ontology that implement an anti-pattern, as discussed in Section 6.2.5. We will argue there that these ontologies should instantiate the partition pattern, but fail to do so. B90, B98, and L66 are obviously test ontologies and not meant to be used. C37 uses disjointness to partition the whole domain. Finally, H71 contains a partition that is unrecognized by the SPARQL pattern, because the partitioned class is not subclassed directly but rather through an IntersectionOf construct.</p><p>In order to see if any of the other two variants (or even similar implementations) given in Section 6.2.1 were used, we issued further queries. In order to detect variant 1 and similar patterns, we used the essential part of the variant 1 pattern:</p><p>select ?a where {
?a rdfs:subClassOf owl:Nothing .
}</p><p>This query did not return any results.
Whereas in DL subclassing ⊥, i.e. the empty class, is often used to state constraints, in OWL this possibility goes practically unused. Since OWL includes numerous powerful constructs to state such constraints directly, the rather counterintuitive usage of the empty class is usually avoided. Based on this result we can conclude that variant 1 and any similar patterns were not used to create partitions. Based on these empirical results we may claim that any appearance of owl:Nothing in an OWL ontology is an anti-pattern (see Section 6.2.5) and thus indicates problems with the formalization. Furthermore, any appearance of owl:disjointWith outside of a partition is a strong indicator for either a problem in the ontology or an incomplete formalization. Further anti-patterns should be gathered from the Web. In order to detect instantiations of variant 3, we search for any subclasses that are used to define a complement: select ?a ?b where { ?c owl:complementOf ?b . ?b rdfs:subClassOf ?a . }</p><p>This query returned 12 ontologies (A23, B14, C35, C53, F96, I27, J57, K13, L03, L77, M31, and N11). We examined them manually and found that in all but two cases the complement was used to state a restriction on the values of a property (e.g. Vegetarian ≡ ∀eats.¬Meat). In the remaining two cases, the complement appeared once in a complex class description² and once to partition the whole domain.³ Based on these empirical results we see that the query has indeed detected all but three (C37, H71 and N11) instances of the partition pattern, i.e. 47 of 50 instances, yielding a recall of 94%. So even though we have discussed numerous problems and drawbacks of using SPARQL queries for detecting ontology patterns, the experimental results are very promising. Future work needs to investigate further patterns in order to see if these positive results will be confirmed for other, more complicated patterns. We assume though that current ontology engineering practice is lagging well behind the state of the art in ontology pattern research. Therefore the possibilities for large-scale experiments will be rather limited.</p><p>² F96: ViewableFile ≡ File ⊓ ¬(MediaFile ⊓ ¬ImageFile) ³ N11: IntangibleEntity ≡ ¬TangibleEntity</p><p>6.2.5 Querying for anti-patterns</p><p>Detecting so-called anti-patterns is at least as important as detecting patterns in ontologies. Anti-patterns are strong indicators of problems in an ontology. The notion of anti-patterns was introduced for software engineering by (Koenig, 1995). He defines an anti-pattern to be similar to a pattern, “except that instead of a solution it gives something that looks superficially like a solution, but isn’t one.”</p><p>Staying close to our example of the partition pattern, we introduce the similar anti-pattern of a skewed partition. In a skewed partition it is not the sibling classes that are declared to be disjoint, but rather classes in an uncle-relation, i.e. a class is disjoint to the sibling of its superclass (Lam, 2007). The following SPARQL query detects such an anti-pattern:</p><p>select distinct ?A ?B1 ?B2 ?C1 where { ?B1 rdfs:subClassOf ?A . ?B2 rdfs:subClassOf ?A . ?C1 rdfs:subClassOf ?B1 . ?C1 owl:disjointWith ?B2 . }</p><p>The pattern was detected in five ontologies (E71, F54, G13, L22, and N25). Upon manual inspection they all turned out to be different versions of the time ontology (Hobbs and Pan, 2004). Figure 6.4 illustrates the relevant part of the ontology.</p><p>Figure 6.4: The upper levels of the time ontology. Note that ProperInterval and Instant are declared disjoint even though they are not sibling classes.</p><p>⁴ The discussed semantics of the time ontology are not fully captured in the OWL version of the time ontology, but are formalized as described in its first order logic description (Hobbs and Pan, 2004).</p><p>In the time ontology, a TemporalEntity has a start and an end time. There are two kinds of TemporalEntity: Instant and Interval.⁴ An Instant is defined as a TemporalEntity that starts and ends at the same time, whereas a ProperInterval is defined as an Interval that has a start time that is different from the end time. Now what is the meaning of the class Interval? Interval was introduced to capture also degenerate intervals that start and end at the same time. But such degenerate intervals are equivalent to Instant, and thus Interval is equivalent to TemporalEntity. The upper level of the time ontology could thus be simplified by removing the Interval class and creating a partition of TemporalEntity into ProperInterval and Instant (actually, a complete partition). Cleaning up the ontology in such a way increases the understandability and avoids confusion (users would rightly assume a difference between interval and temporal entity, why else would there be two distinct classes?). 
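</p><p>The skewed-partition query can also be mimicked outside a SPARQL engine. The following is a minimal sketch in plain Python, assuming the ontology is available as a list of (subject, predicate, object) string triples. The function name find_skewed_partitions and the four hand-written triples (a simplified reading of Figure 6.4) are illustrative assumptions and not part of the evaluated corpus. Unlike the literal query, the sketch treats owl:disjointWith as symmetric and requires the two siblings to be distinct.</p>

```python
from itertools import product

SUBCLASS = "rdfs:subClassOf"
DISJOINT = "owl:disjointWith"


def find_skewed_partitions(triples):
    """Return sorted (A, B1, B2, C1) bindings of the skewed-partition pattern.

    B1 and B2 are distinct direct subclasses of A, C1 is a direct subclass
    of B1, and C1 is declared disjoint with its "uncle" B2.
    """
    sub = {(s, o) for s, p, o in triples if p == SUBCLASS}
    # Assumption: accept the disjointness axiom in either direction.
    dis = {(s, o) for s, p, o in triples if p == DISJOINT}
    dis |= {(o, s) for s, o in dis}

    hits = set()
    for (b1, a), (b2, a2) in product(sub, repeat=2):
        if a != a2 or b1 == b2:
            continue  # not siblings, or the same class twice
        for c1, parent in sub:
            if parent == b1 and (c1, b2) in dis:
                hits.add((a, b1, b2, c1))
    return sorted(hits)


# Hypothetical, hand-written triples for the upper level of the time ontology.
time_ontology = [
    ("Instant", SUBCLASS, "TemporalEntity"),
    ("Interval", SUBCLASS, "TemporalEntity"),
    ("ProperInterval", SUBCLASS, "Interval"),
    ("ProperInterval", DISJOINT, "Instant"),
]

print(find_skewed_partitions(time_ontology))
```

<p>On these four triples the sketch reports TemporalEntity, Interval, Instant and ProperInterval as the bindings of ?A, ?B1, ?B2 and ?C1, which is the situation depicted in Figure 6.4. Only stated (direct) subclass triples are inspected, in line with the earlier remark that the complete subClassOf-graph should not be materialized.</p><p>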
Changing to a complete partition as discussed above and removing a class makes the ontology smaller, and thus easier to understand, and brings the conceptual model evoked when studying the ontology closer to its formal semantics.</p><p>Method 13 (Searching for Anti-Patterns) SPARQL queries over the ontology graph can be used to discover potentially problematic patterns. For example, results of the following queries have been found to almost always indicate problems. Detecting the anti-pattern of subsuming nothing:</p><p> select ?a where { ?a rdfs:subClassOf owl:Nothing . }</p><p>Detecting the anti-pattern of skewed partitions:</p><p>select distinct ?A ?B1 ?B2 ?C1 where { ?B1 rdfs:subClassOf ?A . ?B2 rdfs:subClassOf ?A . ?C1 rdfs:subClassOf ?B1 . ?C1 owl:disjointWith ?B2 . }</p><p>A bigger library of such anti-patterns would help to flag areas in ontologies that warrant further investigation. Since such a library can be checked automatically, we can easily include it in an ontology build system.</p><p>6.3 The AEON approach</p><p>OntoClean (Guarino and Welty, 2000; Guarino and Welty, 2002) is a methodology for ontology evaluation. More precisely, OntoClean enables the formal analysis of classes and their subsumption hierarchy based on the well-known philosophical notions rigidity, unity, dependency and identity (known as the OntoClean meta-properties). By tagging the classes with the OntoClean meta-properties, the ontology engineer can capture more of what the ontology means in a concise and formal way. For example, stating that a certain class is rigid or not helps to understand if this class is meant to be used as a role that applies to an individual, or rather as an essential type. Raising these questions and answering them makes the ontology more specific and easier to use consistently.</p><p>From a practical perspective OntoClean provides means to derive measurable mismatches of a taxonomy with respect to an ideal structure which takes into account the semantics of subsumption. Such mismatches have a structural nature, e.g. one is able to derive that a certain concept should not be the subconcept of another concept. OntoClean provides an explanation of why mismatches occur, which subsequently might help to improve the taxonomic structure. The philosophical notions of OntoClean may be subjects of long discussions; strictly speaking, however, this is not part of the evaluation but of the ontology engineering, because deciding the proper nature of a class forces the engineering itself to commit to a more specified meaning, which in turn allows for a more objective evaluation technique.</p><p>The application of OntoClean consists of two main steps. First, all classes are tagged with regards to the OntoClean meta-properties. Second, the tagged classes are checked against predefined constraints, with constraint violations indicating potential misconceptualizations in the subsumption hierarchy. Although OntoClean is well documented in numerous publications (Guarino and Welty, 2002; Guarino and Welty, 2004; Guarino and Welty, 2000), and its importance is widely acknowledged, it is still used rather infrequently due to the high costs for application. Several tools supporting the OntoClean methodology have been developed and integrated into ontology editors such as ODEClean for WebODE (Fernández-López and Gómez-Pérez, 2002), OntoEdit (Sure et al., 2003) or Protégé (Grosso et al., 1999). In order to foster the adoption of OntoClean, we have developed AEON, an approach to automate both steps of OntoClean. By means of AEON, we can automatically tag any given ontology with respect to the OntoClean meta-properties and perform the constraint checking. 
For creating the taggings, our implementation of AEON⁵ makes extensive use of the World Wide Web as the currently biggest existing source of common sense knowledge. In line with several approaches such as (Cimiano et al., 2005) and (Etzioni et al., 2004) we defined a set of domain independent patterns which can be considered as indicators for or against Rigidity, Unity, Dependence and Identity of given concepts in an ontology. To evaluate our automatic tagging approach we created a gold standard, i.e. we created a manually tagged medium-sized real-world ontology, and compared AEON results against it. A number of OntoClean experts as well as ontology engineering experts were involved in the creation of the more than 2,000 taggings in the gold standard. Each expert had to tag the PROTON ontology (Terziev et al., 2005) with OntoClean meta-properties. Even though from a philosophical perspective one may argue that there can be only one OntoClean tagging for a given ontology, our experiments yielded the interesting and important finding that the experts agreed only to a certain extent on how to tag each individual concept. This shows again the difficulty of applying OntoClean in real-world settings. We see it as an advantage of our approach that it is based on the text corpus of the whole Web, instead of being defined by a small group or a single person. As the key result of our evaluation, our approach compares favorably with respect to the quality of the automatic taggings while significantly reducing the time needed to do the tagging. In order to check the OntoClean constraints automatically, we decided to reuse an existing OWL DL formalization of the OntoClean meta-properties and constraints (OntoClean ontology). We used the meta-ontology given in Section 3.2 to represent the tagged ontology and were then able to automatically check the tagged ontology according to the OntoClean ontology. We expected two types of errors when analyzing the inconsistencies. 
First, the tagging of a concept is incorrect, and second, the corresponding taxonomic relationship is incorrect. We found both kinds of errors in our experimental data and looked at some of the errors in more detail to understand the rationale behind them. In the next section, we briefly introduce the idea behind OntoClean. Then we describe the four meta-properties and the most important OntoClean constraints. The automatic tagging has been mostly developed by Johanna Völker and is described in (Völker et al., 2008) in full detail, providing also the experimental validation of the automatic tagging approach. In Section 6.3.3, we describe the second step of OntoClean, i.e. AEON’s support for checking the constraints based on the meta-property taggings. For this purpose we reused an existing OWL DL version of the OntoClean constraints and reified the tagged ontology. We also present some examples for the kind of inconsistencies we found in the tagged ontology.</p><p>⁵ http://ontoware.org/projects/aeon/</p><p>6.3.1 OntoClean in theory</p><p>We provide a brief introduction to OntoClean; for a more thorough description refer for example to (Guarino and Welty, 2000). In the OntoClean vocabulary, properties are what we call classes, as introduced in Chapter 2. Meta-properties are therefore properties of properties. Within this section we will use the term meta-property in the usual OntoClean sense, whereas we will refrain from using the term property and rather consistently use the term class, as in the rest of this thesis.</p><p>Applying the OntoClean methodology consists of two main steps.</p><p>• First, every single class of the ontology to be evaluated is tagged with occurrences of the core meta-properties, which are described in Section 6.3.2. Thus, every class has a certain tagging such as +R+U-D+I, where for example +R denotes that a concept carries Rigidity and +U denotes that the concept carries Unity. We call an ontology with tagged classes a tagged ontology (with regards to OntoClean, to be precise).</p><p>• Second, after the tagging, all subsumptions of the ontology are checked according to the OntoClean constraints (described in Section 6.3.3). Any violation of a constraint indicates a potential misconceptualization in the subsumption hierarchy.</p><p>The key idea of OntoClean is to constrain the possible taxonomic relations by disallowing subsumptions between specific combinations of tagged classes. This way, OntoClean provides a unique approach by formally analyzing the classes’ intensional content and their subsumption relationships. In other words, applying OntoClean means comparing the taxonomical part of a tagged ontology with a predefined ideal taxonomic structure, which is defined by the combination of meta-properties and constraints.</p><p>After performing the two steps the result is a tagged ontology and a (potentially empty) list of misconceptualizations. According to this list an ontology engineer may repair (in an OntoClean sense) the ontology, in order to remove all discovered misconceptualizations.</p><p>6.3.2 OntoClean meta-properties</p><p>As already indicated, the main ingredients of OntoClean are four meta-properties and a number of rules. The four meta-properties are: rigidity (R), unity (U), dependence (D) and identity (I). They are based on philosophical notions as developed by (Strawson, 1976) and others, even dating back to Aristotle (Aristotle, 330 BC). Here we will offer a short description of these meta-properties.</p><p>• Rigidity. Rigidity is based on the notion of essence. A class is essential for an instance iff it is necessarily an instance of this class, in all worlds and at all times. 
Iff a class is essential to all of its instances, the class is called rigid and is tagged with +R. Iff it is not essential to some instances, it is called non-rigid, tagged with -R. An anti-rigid class is one that is not essential to any of its instances. It is tagged ∼R. An example of an anti-rigid class would be teacher, as no teacher has always been, nor is necessarily, a teacher, whereas human is a rigid class because all humans are necessarily humans and neither became nor can stop being a human at some time.</p><p>• Unity. Unity tells us what is part of the object, what is not, and under what conditions the object is whole (Guarino and Welty, 2004). This answer is given by a unity criterion (UC), which describes the conditions that must hold among the parts of a certain entity to consider that entity as a whole. For example, there is a unity criterion for the parts of a human body, as we can say for every human body which parts belong to it. Classes carrying a UC have Unity and are tagged +U, else -U.</p><p>• Dependence. A class C1 is dependent on a class C2 (and thus tagged +D), iff for every instance of C1 an instance of C2 must exist. An example for a dependent class would be food, as instances of food can only exist if there is something for which these instances are food. This does not mean that an entity being food ceases to exist the moment all animals die out that regarded it as food, it just stops being food. Another way to regard dependency is to distinguish between intrinsic and extrinsic classes. Intrinsic classes are independent, whereas extrinsic classes need to be given to an instance by circumstances or definitions.</p><p>• Identity. A class with identity is one where all instances can be identified as themselves, by virtue of this class or a superclass. This means that the class carries an identity criterion (IC). It is tagged with +I, and with -I otherwise. 
It is not important to answer the question of what this IC is (this may be hard to answer); it is sufficient to know that the class carries an IC. For example, the class human carries an IC, as we are able to identify someone as being the same or not, even though we may not be able to say what IC we actually used for that. On the other hand, a class such as red would be tagged -I, as we cannot tell instances of red apart because of their color.</p><p>OntoClean differentiates between the two tags I and O, whereby the first means that the concept simply carries an Identity Criterion (also through inheritance) and the second that it carries its own Identity Criterion. The difference is not relevant for this thesis, as the tagging +O may just be treated like the tagging +I, since +O implies +I anyway and there are no subsumption constraints about the tag O.</p><p>6.3.3 OntoClean constraints</p><p>A number of OntoClean rules is applied on the tagged ontology. We use the existing OntoClean rules to check a tagged ontology for consistency. Here, we will give some illustrative examples for these rules. For a full list refer to (Guarino and Welty, 2004). As shown in (Sure et al., 2003) such rules can be formalized as logical axioms and validated automatically by an inference engine. We will do so in the following section.</p><p>• ∼R can’t subsume +R. Having a class C1 subsuming the class C2, with C1 tagged ∼R and C2 tagged +R, would lead to the following inconsistency: C2 must always hold true for all of its instances (this is the meaning of the tagging +R: C2 is a rigid concept). C2, as a subsumed concept, would always imply C1 for all of its instances. Therefore there are at least some instances of C1 that are necessarily C1, as they are C2. Thus C1 cannot be anti-rigid, as the tagging says, because this would mean that it is not necessarily true for any of its instances, which would be a contradiction. Example: food, an anti-rigid class, subsuming apple, a rigid class. As it is explained in (Guarino and Welty, 2004), nothing is essentially food: it may or may not be eaten. On the other hand, an apple is always essentially an apple. But if apples were subsumed by food, there would be some food that would essentially be food, namely apples, since every apple would always be an apple and thus food, which would be a contradiction to the statement that no food is essentially food.</p><p>• +I can’t subsume -I. If this rule was broken, it would mean that instances of the subsumed class cannot be identified, although they are also instances of the subsuming class, which explicitly allows for the identification of the instances. This would be a contradiction, revealing an error in our taxonomy (or tagging).</p><p>• +D can’t subsume -D. food is an example for a dependent class. Modeling the class candy, we decide that everything with more than 20% sugar is candy, thus the class would be independent. We let food subsume candy, and the formal analysis shows that this rule is broken. This points us to either an error in the taxonomy or in the tagging. In this example we see that the quality of the taxonomical analysis is only as good as the quality of the applied tagging.</p><p>6.3.4 Constraint checking</p><p>Equipped with the OntoClean taggings we are able to check the hierarchical part of the ontology with regards to the meta-property constraints defined by OntoClean. 
In order to check these constraints automatically, we use the meta-ontology described in Section 3.2 and extend it with a formalization of the constraints in OWL, which allows us to check the reified ontology.</p><p>Listing 6.1: OntoClean constraints meta-ontology in OWL.</p><p>(1) TransitiveProperty(meta:subClassOf)
(2) DisjointUnion(meta:Class oc:RigidClass oc:NonRigidClass)
(3) DisjointUnion(meta:Class oc:UnityClass oc:NonUnityClass)
(4) DisjointUnion(meta:Class oc:DependentClass oc:NonDependentClass)
(5) DisjointUnion(meta:Class oc:SortalClass oc:NonSortalClass)
(6) SubClassOf(oc:AntiRigidClass oc:NonRigidClass)
(7) SubClassOf(oc:AntiUnityClass oc:NonUnityClass)
(8) EquivalentClasses(oc:RigidClass AllValuesFrom(meta:subClassOf ComplementOf(oc:AntiRigidClass)))
(9) SubClassOf(oc:UnityClass AllValuesFrom(meta:subClassOf ComplementOf(oc:AntiUnityClass)))
(10) SubClassOf(oc:DependentClass AllValuesFrom(inverseOf(meta:subClassOf) DependentClass))
(11) SubClassOf(oc:SortalClass AllValuesFrom(inverseOf(meta:subClassOf) SortalClass))</p><p>The formalization given in Listing 6.1 is based on the OntoClean formalization in OWL DL as described in (Welty, 2006). The ontology builds on top of the meta-ontology introduced in Section 3.2, and it is updated to use features from OWL 2.⁶ Axiom (1) adds the transitivity of the meta:subClassOf property, so that indirect subsumptions are checked as well. Axioms (2)-(7) describe the tagging hierarchy and the complete partitions (i.e. each class is either +R or -R, +U or -U, etc.) described in Section 6.3.2. The class oc:Sortal describes the identity meta-property. Finally, the axioms (8)-(11) describe the actual constraints, partially given in Section 6.3.3. Just to take an example, Axiom (10) describes the constraint with regards to dependency, i.e. a class subsumed by a class tagged +D needs to be tagged +D as well. Note that the ontology, in particular axioms (1) and (8), infers that all rigid classes have to be subsumed by rigid classes as well.</p><p>⁶ The original ontology can be found at http://www.ontoclean.org/ontoclean-dl-v1.owl</p><p>We took the tagging created in the previous sections and formalized it in OWL DL as well, taking each tagging and interpreting it as an instantiation of the respective class described in the constraint ontology. Then we added the reification of the class hierarchy. For each tag, we declare the individual that relates to the class from the original ontology to belong to the class corresponding to that tag within the constraint ontology. For example, if a class C is tagged +R, take the reified class individual iC and add the fact ClassAssertion(oc:RigidClass iC). If a class C is tagged -U, take iC and add the fact ClassAssertion(oc:NonUnityClass iC), etc.</p><p>The thus created ontology can be simply checked for satisfiability by an OWL DL reasoner (actually, even much weaker reasoners would be sufficient due to the limited language fragment used in the constraint ontology). Standard reasoning services and ontology debugging tools can be applied in order to discover and repair inconsistencies.</p><p>For our experiments, we used RaDON (Ji et al., 2009), a tool for inconsistency diagnosis based on KAON2 (Motik, 2006). RaDON features a number of algorithms for checking consistency and coherence of both TBox and ABox, among them an algorithm for identifying a set of minimal inconsistent subontologies for any given ontology. The algorithm starts with the inconsistent ontology, and (i) tries to find any minimal inconsistent subontology (Haase et al., 2005). Then, (ii) it removes any axiom from this minimal inconsistent subontology, which fixes (at least) one inconsistency in this part of the ontology. Finally, (iii) the algorithm starts all over again beginning with step (i) until the whole ontology is consistent. Obviously, this algorithm is non-deterministic, but it gives us a good approximation of the total number of inconsistencies in the ontology.</p><p>The disadvantage of this algorithm is that it does not compute the absolute number of minimal inconsistent subontologies, because in step (ii) it potentially fixes more than one inconsistency. And finally, it is not deterministic. Therefore, different experiments can result in different sets of minimal inconsistent subontologies.</p><p>Take the classes apple and food. apple is tagged +R, whereas food is tagged ∼R (as described in Section 6.3.3). Now for the sake of the example let’s assume that apple is defined as a subclass of food. We reify this axiom as described above, which results in the following formalization:</p><p>(a) ClassAssertion(oc:AntiRigidClass food)
(b) ClassAssertion(oc:RigidClass apple)
(c) PropertyAssertion(meta:subClassOf apple food)</p><p>Consider these facts together with axiom (8) from the constraint ontology in Listing 6.1:</p><p>(8) EquivalentClasses(oc:RigidClass AllValuesFrom(meta:subClassOf ComplementOf(oc:AntiRigidClass)))</p><p>This leads to an unsatisfiability: apple is a RigidClass (b), which has a local range axiom for the subClassOf relation (8), so that from the instantiated relation (c) we must infer that food belongs to the ComplementOf(AntiRigidClass) class description, which is a clear contradiction to the given fact (a) ClassAssertion(AntiRigidClass food). 
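</p><p>The same contradiction can be found without a reasoner when only explicitly stated taggings and subsumptions are considered. The following plain-Python sketch is a simplification under stated assumptions, not the OWL DL formalization or the RaDON procedure used in the experiments: it computes the transitive closure of the stated subclass relation (mirroring axiom (1)) and flags the four tag combinations excluded by axioms (8)-(11). The names superclasses and violations, the tag strings (with "~R" standing in for ∼R) and the dictionary layout are hypothetical.</p>

```python
def superclasses(cls, direct):
    """All direct and indirect superclasses of cls (transitive closure)."""
    seen, stack = set(), [cls]
    while stack:
        for parent in direct.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def violations(direct, tags):
    """List (superclass, subclass, rule) triples that break a constraint.

    direct maps a class name to its directly stated superclasses.
    tags maps a class name to a set of tag strings such as {"+R", "-D"}.
    Untagged classes are ignored, which is a closed-world simplification.
    """
    found = []
    for sub in sorted(tags):
        for sup in sorted(superclasses(sub, direct)):
            sub_t, sup_t = tags[sub], tags.get(sup, set())
            if "~R" in sup_t and "+R" in sub_t:   # axiom (8)
                found.append((sup, sub, "~R cannot subsume +R"))
            if "~U" in sup_t and "+U" in sub_t:   # axiom (9)
                found.append((sup, sub, "~U cannot subsume +U"))
            if "+D" in sup_t and "-D" in sub_t:   # axiom (10)
                found.append((sup, sub, "+D cannot subsume -D"))
            if "+I" in sup_t and "-I" in sub_t:   # axiom (11)
                found.append((sup, sub, "+I cannot subsume -I"))
    return found


# The apple/food example: facts (a), (b) and (c) from the text.
direct = {"apple": ["food"]}
tags = {"apple": {"+R"}, "food": {"~R"}}
print(violations(direct, tags))
```

<p>For the apple and food facts the sketch reports a single violation of the rigidity rule. Because it inspects only stated tags, it cannot derive new taggings the way axiom (8) does through its equivalence, so it should be read as an illustration of the constraints rather than a replacement for the reasoner-based check.</p><p>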
Reifying the subsumption hierarchy in OWL DL (Vrandečić et al., 2006c) and formalizing the OntoClean constraints in OWL DL as well allowed us to simply merge the two and reuse standard tools in order to detect inconsistencies with regards to the formal properties of the ontologies.</p><p>6.3.5 Analysis and Examples</p><p>For evaluating our approach, we have created a set of manual taggings for an ontology provided by three independent annotators A1, A2 and A3 (Völker et al., 2008). Table 6.1 shows the number of inconsistencies (each of them corresponding to a constraint violation) which were detected by RaDON for these tagging sets. On average, 17 constraint violations were found per annotator, most of them related to rigidity and identity. This also holds for the agreements, i.e. the data sets consisting of those taggings where two or three annotators agreed upon the same meta-property tagging for each concept. A further analysis and discussion of the annotator agreement can be found in (Völker et al., 2008). As shown by the lower part of the table, the overall number of inconsistencies drops for the intersection of any two human taggings to an average of 3.0 constraint violations per data set.⁷ This can be explained by the fact that the number of agreed taggings is much lower than the overall number of tagged concepts. How does this compare to the automatically generated taggings? After training a classifier on each of the data sets we obtained seven fully automatic taggings. Since AEON so far has not been trained to distinguish between an anti-rigid and non-rigid (respectively, anti-unity and non-unity) tagging, we converted all taggings to their stricter counterpart wherever possible, in order to obtain an upper bound for the number of constraint violations. The results of the inconsistency diagnosis computed for these taggings are presented in Table 6.2. As expected, the average number of constraint violations per data set increased significantly, from 17 to 40.7. The automatic unity taggings seem to cause far more inconsistencies than is the case for any of the manually created tagging sets, probably due to the fact that anti-unity tags are very rare in the manual taggings.</p><p>⁷ The agreement statistics represent lower bounds as they are computed in a cautious manner with respect to the number of possible inconsistencies. If at least one of the individual annotators tagged the concept in question as -R whereas the others agreed upon ∼R, we assumed the agreement to be -R (or -U, respectively), i.e. the weaker tagging.</p><p>Table 6.1: Constraint Violations for Manual Taggings. The lower part of the table shows constraint violations based on taggings from the inter-annotator agreed sets, e.g. A2 / A3 shows the number of violations based on only the taggings where annotators A2 and A3 agreed.
                Inconsistencies      R      U      D      I
A1                           24     20      2      1      1
A2                           14      7      0      1      6
A3                           13      3      0      0     10
avg                        17.0   10.0    0.7    0.7    5.7
A1 / A2                       5      1      1      1      2
A1 / A3                       2      1      0      0      1
A2 / A3                       2      1      0      0      1
avg                         3.0    1.0    0.3    0.3    1.3
A1 / A2 / A3                  2      0      0      0      2</p><p>Table 6.2: Constraint Violations for Automatic Taggings
                Inconsistencies      R      U      D      I
A1                           74     23     43      1      7
A2                           17      3      8      5      1
A3                           31      1      0      0     30
avg                        40.7    9.0   17.0    2.0   12.7
A1 / A2                      13      1      6      1      5
A1 / A3                       5      2      0      0      3
A2 / A3                       7      3      0      0      4
avg                         8.3    2.0    2.0    0.3    4.0
A1 / A2 / A3                  3      0      0      0      3</p><p>In the following we present some illustrative examples for inconsistencies which were detected in the ontology, and discuss possible reasons for the corresponding constraint violations.</p><p>We expected two different kinds of errors when analyzing the unsatisfiabilities: (i) the tagging was incorrect, e.g., because the annotators interpreted a concept in a different way than its author (presumably) did, or (ii) the subsumption relation was wrong, i.e. contradictory to a plausible meta-property assignment. We assumed that the OntoClean constraints as given in Listing 6.1 are correct.</p><p>An example of an error of the first kind is given by the following facts:</p><p>(a) ClassAssertion(oc:AntiRigidClass company)
(b) ClassAssertion(oc:RigidClass mediaCompany)
(c) PropertyAssertion(meta:subClassOf mediaCompany company)</p><p>The facts contradict axiom (8) from the constraint ontology:</p><p>(8) EquivalentClasses(oc:RigidClass AllValuesFrom(meta:subClassOf ComplementOf(oc:AntiRigidClass)))</p><p>This error uncovers the improper tagging given by the taggers: MediaCompany and Company should be tagged in a compatible way. As of now, Company is said to be not rigid for all of its instances, whereas MediaCompany is rigid for its instances. Granted that the subsumption of MediaCompany by Company is correct, the error must be with the tagging of the two concepts. But here the taggers seem to have had two different notions of companies in mind when they tagged Company and MediaCompany. An anti-rigid company is the role an organization can have: a university, which is an educational organization, can become a company; or a company can turn into a not-for-profit charity organization. In this case, the concept company means an organization that is meant to generate profit. On the other hand, a company that is tagged rigid actually is a type for individuals: now, a company cannot cease to be a company any more, and a change as described above would basically require generating a new individual. It depends heavily on the conceptualisation which of the two company concepts is useful within a given ontology. 
The Company example shows a confusion with regards to the concepts, and thus a possible source of errors. An error of the second kind was discovered in the subsumption relation between Group and PoliticalParty, which is only an indirect relation: Group is, in PROTON, actually a superclass of Organization which in turn is a superclass of PoliticalEntity which is a superclass of PoliticalParty. The problem here is indeed in the subsumption relation between Group and Organization: a group is defined in PROTON as “a group of agents, which is not organized in any way.” (Terziev et al., 2005). This description is not applicable for an organization (since an organization is, by its very nature and as shown in its name, organized). In formal terms, Group was tagged as ∼U, whereas Organization (and also PoliticalParty) were tagged as +U, which causes an inconsistency (based on axioms 1 and 24 of the constraints meta-ontology). The ontology would need to be changed to reflect that (such a change is discussed in (Guarino and Welty, 2004), where, incidentally, this very same pair, organization and group, is taken as an example).</p>
<p>Another example of a subsumption relation that required closer inspection was the subsumption of Hotel by Building. Is Hotel rather the role or the function of a building, whereas the building describes the object itself? Then the building would have a height, whereas the hotel would have a manager. A good example to illustrate the difference would be the number of rooms: it is counted differently for hotels than for buildings. This illustrates that it is not obvious if Hotel should be subsumed by Building or not, as the constraint violation suggested.</p>
<p>All the given examples are taken from the automatically tagged ontology. They illustrate that AEON points to possible taxonomic errors in an ontology, and guides the ontology engineer towards problematic parts of the ontology.</p>
<p>Method 14 (OntoClean meta-property check) An ontology can be tagged with the OntoClean meta-properties and then automatically checked for constraint violations. Since the tagging of classes is expensive, we provide an automatic tagging system, AEON. All constraint violations, i.e. inconsistencies in the meta-ontology, come from two possible sources:</p>
<p>• an incorrect meta-property tagging, or</p>
<p>• an incorrect subsumption.</p>
<p>The evaluator has to carefully consider each inconsistency, discover which type of error is present, and then either correct the tagging or redesign the subsumption hierarchy.</p>
<p>Chapter 7 Semantics</p>
<p>So, so you think you can tell heaven from hell, blue skies from pain? (Pink Floyd, 1965–1994, Wish You Were Here (Waters and Gilmour, 1975))</p>
<p>As in many other related fields, one can only control what one can measure (DeMarco, 1982). Measuring ontologies is necessary to evaluate ontologies both during engineering and application, and is a necessary precondition to perform quality assurance and control the process of improvement. Metrics allow the fast and simple assessment of an ontology and also make it possible to track its subsequent evolution. In the last years, many ontology metrics and measures have been suggested and some initial work has been done to study the nature of metrics and measures for ontologies in general. We are extending this work.</p>
<p>There is a recurring set of problems with existing ontology metrics and measures. We have argued in Chapter 6 that most metrics are based on structural notions without taking into account the semantics, which leads to incomparable measurement results. First, most ontology metrics are defined over the RDF graph that represents an OWL DL ontology and thus are basically graph metrics solely. Second, a very small number of metrics take the semantics of OWL DL into account (subsumption etc.). Third, few metrics consider the open world assumption. We believe that foundational work addressing these issues will substantially facilitate the definition of proper ontology metrics in the future.</p>
<p>In this chapter we will describe these issues in more detail, and suggest methods to avoid them. These issues are not always problematic: we will also explore under which circumstances they are problematic, and when they can be considered irrelevant. We will outline the foundations for a novel set of metrics and measures, and discuss the advantages and problems of the given solutions. Our approach is based on notions of ontology normalization for measuring (Section 7.1), of stable metrics (Section 7.2), and of language completeness (Section 7.3). Normalization will help to properly define better ontology metrics in subsequent research in this area. Stability and completeness will help us understand metrics better.</p><p>7.1 Normalization</p><p>We define ontological, or semantic, metrics to be those which do not measure the structure of the ontology, but rather the models that are described by that structure. In a naïve way, we could state that we base our metrics not on the explicit statements, but on every statement that is entailed by the ontology. But measuring the entailments is much harder than measuring the structure, and we definitely need a reasoner to do that. We also need to distinguish between a statement X that is entailed by an ontology O to be true (O ⊨ X), a statement that is not entailed by an ontology (O ⊭ X), and a statement that is entailed not to be true (O ⊨ ¬X).
Properly regarding this difference leads us to so-called stable metrics that can deal with the open world assumption of OWL DL. Note that measuring the entailments is an intuitive description rather than an exact definition. In many cases – for example for a measure that simply counts the number of statements in an ontology – measuring all entailed statements instead of measuring all explicit statements leads to an infinite number of statements. Just to give one example, the ontology</p><p>SubClassOf(SomeValuesFrom(R owl:Thing) C)</p><p>also entails the statements</p><p>SubClassOf(SomeValuesFrom(R SomeValuesFrom(R owl:Thing)) C)</p><p>and</p><p>SubClassOf(SomeValuesFrom(R SomeValuesFrom(R SomeValuesFrom(R owl:Thing))) C)</p><p>and so on, an endless chain of SomeValuesFrom-descriptions. But only terminating measures are of practical interest, and thus we need approaches that allow us to capture ontological metrics in a terminating way.</p>
<p>In order to gain the advantage of the simple and cheap measurement of structural features, we can transform the structure of the ontology. These transformations need to preserve the semantics of the ontology, that is, they need to describe the same models. But they also need to make certain semantic features of the ontology explicit in their structure – thus we can take structural measures of the transformed ontology and interpret them as ontological measures of the original ontology. We call this kind of transformations normalization (Vrandečić and Sure, 2007). We discuss this with examples in Section 8.1.</p>
<p>We define five normalization steps:</p>
<p>1. name all relevant classes, so no anonymous complex class descriptions are left (Section 7.1.1)</p>
<p>2. name anonymous individuals (Section 7.1.2)</p>
<p>3. materialize the subsumption hierarchy and normalize names (Section 7.1.3)</p>
<p>4. instantiate the most specific class or property for each individual and property instance (Section 7.1.4)</p>
<p>5. normalize property instances (Section 7.1.5)</p>
<p>Normalization offers the advantage that metrics are much easier to define on the normalized ontology, since some properties of the graph are guaranteed: the ontology graph will have no cycles, the number of normal class names and actual classes will be equal, and problems of mapping and redundancy are dealt with. We give an example in Section 7.1.6 to illustrate some of the steps.</p>
<p>Often normalizations do not result in canonical, unique results (like, for example, conjunctive normal forms). The normalization as described here can be extended in order to result in canonic normalized forms, but the benefits of such an extension are not clear. Considering that common serializations, like the RDF/XML serialization of OWL ontologies (Smith et al., 2004), lack a canonic translation anyway, they cannot be compared character by character as some version control systems like CVS or SVN would require.</p>
<p>Also, normalization is not an applicable solution for every metric. For example, if we want to know the number of atomic classes in an ontology, first normalizing it and then calculating the number will actually return the wrong result in the general case. The goal of normalization is to provide the metric designer with some tools in order to simplify the description of their metric. In Section 8.1 we describe an example of how to apply the normalization for the description of a metric.</p>
<p>Further note that the algorithms provided in this section are merely effective but not efficient. They are given for the purpose of understanding normalization, and not as a blueprint for implementing them.
Implementing the given algorithms will be unnecessarily slow, and more clever strategies for efficiently normalizing ontologies remain an open issue.</p><p>7.1.1 First normalization</p><p>In the first normalization our aim is to get rid of anonymous complex class descriptions. After the first normalization, there will be only two types of class axioms left: definitions (i.e. class equivalence axioms between a simple class name and a class description) and simple subsumptions (i.e. subsumptions between two simple class names). Other class axioms (i.e. disjoints, disjoint unions, class equivalences involving more than one complex class description, and subsumptions involving any complex class descriptions) will be reformulated. Class and property assertions will both use only simple class or property names, and no complex class descriptions. The first normalization can be done as follows:</p>
<p>1. replace all axioms of the form DisjointUnion(C D E ...) by the following axioms: EquivalentClasses(C UnionOf(D E ...)) EquivalentClasses(owl:Nothing IntersectionOf(D E ...)) (with more than two disjoint classes, one such intersection axiom is needed for each pair of classes)</p>
<p>2. replace all axioms of the form Disjoint(C D) by the following axiom: EquivalentClasses(owl:Nothing IntersectionOf(C D))</p>
<p>3. for every axiom of the form SubClassOf(C D) where C (or D) is a complex class description, add a new axiom EquivalentClasses(A C) (or EquivalentClasses(B D)) with A (or B) being a new class name. Replace the original axiom with SubClassOf(A D) (or SubClassOf(C B) or even SubClassOf(A B)), so that only simple class names remain in the subsumption axiom</p>
<p>4. in all axioms of the form EquivalentClasses(C D) where both C and D are complex class descriptions, replace that axiom with the following two axioms: EquivalentClasses(A C) EquivalentClasses(A D) with A being a new simple class name</p>
<p>5. replace all axioms of the form EquivalentClasses(C A) where C is a complex class description and A a simple class name with the axiom EquivalentClasses(A C)</p>
<p>6. in all axioms having one of the following forms: ClassAssertion(C a) PropertyDomain(R C) PropertyRange(R C) HasKey(C R S) where C is a complex class description, replace C with A (being a new simple class name) and add the following axiom: EquivalentClasses(A C)</p>
<p>None of these structural changes change the possible models, which means that they are semantically equivalent. They do introduce new class names to the ontology, which may not be desirable in all cases (for example for presentation purposes, for counting the classes, and so on), but it has to be noted that normalization is only done for measuring processes, and not for the purpose of engineering and processing the ontology (i.e., an ontology normalized this way is not meant to be published). Note that named classes could be introduced that are unsatisfiable. This does not mean that the ontology becomes unsatisfiable, but solely these newly introduced classes. In the third normalization step (Section 7.1.3) we can remove these additional names again.</p>
<p>7.1.2 Second normalization</p>
<p>The second normalization gets rid of anonymous individuals. This means that every blank node that is an (asserted or inferred) individual needs to be replaced with a URI reference. Especially in FOAF files (Brickley and Miller, 2005) this occurs regularly since, for some time, it was regarded as good practice not to define URIs for persons. Integration of data was not done via the URI, but with inverse functional properties. This practice is problematic, since the semantics of blank nodes in RDF are often not fully understood, and should thus be avoided. The second normalization as defined here captures the semantics most users seem to want to express anyway, as exemplified by the discussions around FOAF on their mailing list. We already argued in Section 4.3 that such anonymous individuals should rather be avoided.</p>
<p>It is possible that these newly introduced individual names add a further name to already existing (or other newly introduced) individuals. But since OWL DL does not adhere to a unique name assumption, this is not a problem. Furthermore, the next step of normalization will take care to resolve such synonyms.</p>
<p>7.1.3 Third normalization</p><p>The third normalization will materialize the subsumption hierarchy and normalize the names. The first step requires a reasoner.</p><p>1. for all pairs of simple class names (A, B) in the ontology, add the axiom SubClassOf(A B) if the ontology entails that axiom (that is, materialize all subsumptions between simple named classes)</p><p>2. detect all cycles in the subsumption structure. For each set Mi = {A1 ... An} of classes that participate in a cycle, remove all subsumption axioms from the ontology where both classes of the axiom are members of this set. For each such set Mi introduce a new class name Bi. In subsumption axioms where only one class is a member of this set, replace the class with Bi in the axioms. Add the axioms EquivalentClasses(Bi A1), ..., EquivalentClasses(Bi An) to the ontology. If Bi is unsatisfiable, take owl:Nothing instead of Bi. If Bi is equal to owl:Thing, take owl:Thing instead of Bi</p><p>3. regarding solely the subontology that consists of all subsumption axioms of the ontology, remove all redundant subsumption axioms (that is, remove all subsumption axioms that are redundant due to the transitivity of the subsumption relation alone).
This also removes all subsumption axioms involving owl:Thing and owl:Nothing</p><p>The subsumption structure now forms a directed acyclic graph that represents the complete subsumption hierarchy of the original ontology. We define a set of normal class names of an ontology as follows: every class name that participates in a subsumption axiom after the third normalization of an ontology is a normal class name of that ontology. Now in all axioms of the types ClassAssertion, PropertyDomain, PropertyRange, and HasKey we can replace every non-normal class name with its equivalent normal class name. Note that instead of creating a new class name for each detected cycle, often it will make more sense to choose a name from the set of classes involved in that cycle, based on some criterion (e.g. the class name belonging to a certain namespace, the popularity of the class name on the Web, etc.). For many ontology metrics, this does not make any difference, so we disregard it for now, but we expect the normalizations to have beneficial effects in other scenarios as well, in which case some steps of the normalization need to be revisited in more detail.</p>
<p>Compared to classes, properties are often neglected. Besides inverse properties no other complex property descriptions can be stated in OWL. Therefore property normalization can be regarded as normalizing inverses and property names analogous to class name normalization. All normal property names have to be stated explicitly to be equivalent to all other property names they are equal to (that is, we materialize the equality relations between the normal property names and the non-normal ones). All occurrences of non-normal property names (besides within the axiom stating equality with the normal property name, and besides within annotation property instances) are replaced with the normal property name. We can also normalize the property hierarchy just as we normalized the class hierarchy.</p>
<p>The same holds true for individuals. In case an individual has more than one name, we decide on or introduce a normal one and then replace all occurrences of the non-normal individual names with the normal one (besides within the axiom stating equality with the normal name, and besides within annotation property instances).</p>
<p>We disregard annotation property instances since they may be used to state annotations about the URI, and not about the actual class, property, or individual. There could be annotations that describe when a certain URI was introduced, who created it, its deprecation state, or that point to a discussion related to the introduction of the URI. Some annotations on the other hand may be useful for the normal name as well – especially labels, or sometimes comments. Since annotations do not have impact on the DL semantics of the ontology anyway, they may be dropped for the purpose of measuring semantic metrics. Nevertheless, if the normalization is done for some other purpose, and it is planned to further use the normalized version of the ontology in some scenario, then the possible replacement of names within annotation property instances depends both on the scenario and the instantiated annotation property (for example, it may be useful to normalize the label when the ontology will be displayed on the user interface, but it may be bad to normalize versioning information that is captured within annotations).</p>
<p>7.1.4 Fourth normalization</p>
<p>The fourth normalization aims towards instantiating the most specific classes and properties, as this conveys the most information explicitly (and deriving instantiations of higher levels is very cheap because of the asserted explicitness of the hierarchy due to third normalization). This does not mean that every instance will belong to only one class; multiple instantiations will still be necessary in general.</p>
<p>Here is a possible algorithm to perform the fourth normalization of an ontology O:</p>
<p>1. for each normal class name C and each normal individual name i in O, add ClassAssertion(C i) to O if it is entailed by the ontology</p><p>2. for each normal object property instance PropertyAssertion(R i j) and each normal object property name S so that SubPropertyOf(S R) is an explicit axiom in O, add PropertyAssertion(S i j) if it is entailed by the ontology. Check this also for the property instances added this way (this step will terminate since the subsumption hierarchy is finite)</p><p>3. for each normal data property instance PropertyAssertion(T i d) and each normal data property name U proceed as in the previous step</p><p>4. create a subontology IO out of O including only the facts and the explicitly stated subsumption hierarchy of the classes and properties (after third normalization)</p><p>5. remove all facts from O that are redundant in IO</p><p>We do not want to remove all redundant facts from the ontology at this step, since there may be some facts that are redundant due to an interplay of different other terminological axioms.
For example, in the following ontology</p><p>ClassAssertion(Person Adam) PropertyAssertion(likes Adam Eve) PropertyDomain(likes Person)</p><p>the first statement is actually redundant, but would not be removed by the above algorithm. This is because we only remove axioms that are redundant within the subontology IO, and the axiom stating the domain of likes would not be part of it.</p><p>7.1.5 Fifth normalization</p><p>The fifth normalization finally normalizes the properties: we materialize property instances of symmetric, reflexive and inverse properties, and we clean the transitivity relationship. This can be done similarly to the creation of the subsumption hierarchy in the third normalization: after materializing all property instances, we remove all that are redundant in the subontology TO, which contains only the property instances of all transitive properties, and the axioms stating the transitivity of these properties.</p><p>7.1.6 Examples of normalization</p><p>The metric we will regard in this example is the maximum depth of the taxonomy as defined by (Lozano-Tello and Gómez-Pérez, 2004) and described in Section 6.1.1. What we want to measure is intuitively described as the length of the subsumption hierarchy, or else the number of levels the class hierarchy has. We name the measure md.</p>
<p>Let us regard the following ontology:</p>
<p>EquivalentClasses(C MinCardinality(1 R)) EquivalentClasses(D MinCardinality(2 R)) EquivalentClasses(E MinCardinality(3 R))</p>
<p>By the definition of md, the depth of the ontology is 1 (since there are no explicitly stated subsumption axioms, every path has one node). But after normalization the ontology gets transformed to this:</p>
<p>EquivalentClasses(C MinCardinality(1 R)) EquivalentClasses(D MinCardinality(2 R)) EquivalentClasses(E MinCardinality(3 R)) SubClassOf(D C) SubClassOf(E D)</p>
<p>Now the very same metric, applied to the normalized ontology, actually captures the intuition of the depth of the ontology and returns 3.</p>
<p>As discussed earlier, this example also shows us that some metrics will not work with normalization. In (Gangemi et al., 2005), metric (M30) is the axiom/class ratio. On the original ontology it is 1, but it rises to 5/3 in the normalized version. In case the original ontology is being distributed and shared, (M30) – for example as stated in some kind of ontology metadata repository (Hartmann et al., 2005) – should be 1, and not calculated on the normalized version.</p>
<p>Let us regard another example. In the following ontology</p>
<p>SubClassOf(D C) SubClassOf(E D) SubClassOf(D E) SubClassOf(F E)</p>
<p>md will be ∞ due to the subsumption cycle between D and E. The cycle can be resolved by rewriting the axioms in the following way:</p>
<p>SubClassOf(D C) EquivalentClasses(D E) SubClassOf(F E)</p>
<p>But due to the definition, md would yield 2 here – there are two explicit subsumption paths, (C, D) and (E, F), both having two nodes, and thus the longest path is 2. The structural measure again does not bring the expected result. After normalization, though, the ontology will look like this:</p><p>SubClassOf(A C) EquivalentClasses(A D) EquivalentClasses(A E) SubClassOf(F A)</p><p>We have introduced a new class name A that replaces the members of the cycle (D, E). Now the depth of the ontology is 3, as we would have expected from the start, since the cycle is treated appropriately. Existing structural metrics, as discussed in Chapter 6, often fail to capture what they are meant for. Normalization is a tool that is easy to apply and that can easily repair a number of such metrics.
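</p>
<p>The depth values quoted in this section can be retraced with a few lines of code. The following is a sketch under simplifying assumptions, not an implementation of the full normalization: it works only on explicitly asserted SubClassOf pairs (no reasoner), the function names are hypothetical, and instead of introducing a new class name A for a cycle it picks the alphabetically smallest member as representative, in the spirit of the remark in Section 7.1.3 about choosing a name from the cycle.</p>

```python
import math
from collections import defaultdict

def md(classes, subclass_of):
    """Nodes on the longest explicit subsumption path; math.inf if there is a cycle."""
    parents = defaultdict(set)
    for sub, sup in subclass_of:
        parents[sub].add(sup)
    state = {}  # class -> "open" while on the search stack, otherwise its depth

    def depth(cls):
        if state.get(cls) == "open":
            return math.inf  # reached a class that is still being expanded: cycle
        if cls in state:
            return state[cls]
        state[cls] = "open"
        result = 1 + max((depth(p) for p in parents[cls]), default=0)
        state[cls] = result
        return result

    return max((depth(c) for c in classes), default=0)

def collapse_cycles(classes, subclass_of):
    """Merge classes that subsume each other (asserted axioms only)."""
    reach = {c: {c} for c in classes}  # class -> itself plus all its superclasses
    changed = True
    while changed:
        changed = False
        for sub, sup in subclass_of:
            new = reach[sup] - reach[sub]
            if new:
                reach[sub] |= new
                changed = True
    # Representative of a class: smallest name among the mutually reachable classes.
    rep = {c: min(d for d in classes if c in reach[d] and d in reach[c]) for c in classes}
    edges = {(rep[a], rep[b]) for a, b in subclass_of if rep[a] != rep[b]}
    return set(rep.values()), edges

classes = {"C", "D", "E", "F"}
cyclic = [("D", "C"), ("E", "D"), ("D", "E"), ("F", "E")]
print(md(classes, cyclic))                     # inf: D and E form a cycle
print(md(classes, [("D", "C"), ("F", "E")]))   # 2: the rewritten axioms
print(md(*collapse_cycles(classes, cyclic)))   # 3: after collapsing the cycle
```

<p>The three printed values correspond to the three versions of the second example above.</p>
<p>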
Even seemingly simple metrics, as demonstrated here with the ontology depth, are defined in a way that makes too many assumptions with regards to the structure of the measured ontologies.</p><p>As we can see in this chapter, simple structural measures on the ontology do yield values, and often these values may be highly interesting. If we know that md resolves to ∞, then this tells us that we have a cycle in the subsumption hierarchy. Also, a high number of classes and complex axioms combined with a low md may indicate an ontology that is expensive to reason about, since the major part of the taxonomy seems to be implicitly stated (but such claims need to be evaluated appropriately). But both results do not capture what the measure was meant to express, that is, the depth of the class hierarchy.</p><p>7.2 Stability</p><p>Another aspect of semantic metrics is their stability with regards to the open world assumption of OWL (Vrandečić and Sure, 2007). The question is, how does the metric fare when further axioms are added to the ontology? For example, a taxonomy may have a certain depth, but new axioms could be added that declare the equivalence of all leaves of the taxonomy with its root, thus leading to a depth of 1. This often will not even raise an inconsistency, but is still an indicator for a weak ontology. Stable metrics are metrics that take the open world assumption properly into account. Stable metrics allow us to make statements about the behavior of an ontology in the context of a dynamic and changing World Wide Web, where ontologies may frequently be merged together in order to answer questions over integrated knowledge.</p>
<p>When an ontology is built, a stable metric will indicate if and how an ontology can be changed. An ontology engineer can prevent certain changes that will render the ontology useless. By adding more heavy-weight axioms to the ontology (for example complete partitions) the minimal depth may rise, indicating a more robust ontology with regards to future changes. Stable metrics are indicators for stable ontologies.</p>
<p>Stability with regards to the knowledge base can also be used by closing certain classes. In some cases we know that a knowledge base offers complete coverage: for example, we may publish a complete list of all members of a certain work group, or a complete list of all countries. In this case we can use nominals to close off the class and declare its completeness. But note that such a closure often has undesirable computational side effects in many reasoners.</p>
<p>Often metrics intend to capture features of the ontology that are independent of the actual representation of the ontology. But as we have seen, structural transformations of the ontology description often lead to differences in the metrics even though the semantics remained untouched. Normalization offers a way to overcome these problems in many cases.</p>
<p>In order to illustrate metrics stability, consider the following ontology:</p>
<p>PropertyAssertion(author paper York) PropertyAssertion(author paper Denny) PropertyAssertion(author paper Zdenko)</p>
<p>Now let us ask the simple question: how many authors does the paper have? It seems that the answer should be 3. But now, if you knew that Zdenko is just another name for Denny, and thus state</p>
<p>SameIndividual(Zdenko Denny)</p>
<p>then you suddenly would change your answer to 2, or even, becoming more careful, give an answer such as “I am not sure, it is either 1 or 2”. So finally we can state that</p>
<p>DifferentIndividuals(York Denny)</p>
<p>and thus arrive at the answer that the paper indeed has 2 authors (and even that is possibly wrong if we consider that we could add further statements at any time in an open world – all that we know as of now is that the paper has at least two authors).</p>
<p>When creating a metric, we have to ask ourselves the following, similar question: how does the metric behave when additions to the ontology happen? Since ontologies are meant to be smushed and integrated constantly and dynamically, can we predict how certain properties of the ontology will behave, that is, if M(O1) and M(O2) for a metric M and two ontologies O1 and O2 are known, what can we state about M(O1 ∪ O2)? Or even, can we give a function fM so that we can calculate fM(M(O1), M(O2)) = M(O1 ∪ O2) without having to calculate M(O1 ∪ O2) directly (which may be much more expensive)? In the previous section we have discussed the simple example of ontology depth. Given an ontology O1:</p><p>SubClassOf(D C) SubClassOf(E D)</p><p>and a second ontology O2:</p><p>SubClassOf(C D) SubClassOf(E D)</p><p>In this case, md(O1) = 3, md(O2) = 2. We may expect md(O1 ∪ O2) to be 3, since md is defined as the maximal depth, but since the union of both ontologies actually creates a cycle in the subsumption hierarchy, md is ∞ – or, after normalization, just 2, and thus even smaller than the maximal depth before the union. We can avoid such behavior of the metrics by carefully taking the open world assumption into account when defining the metric. But this leads us to three possibilities for defining metrics,</p><p>1. to base the value on the ontology as it is,</p><p>2. to measure an upper bound, or</p><p>3.
to measure a lower bound.</p><p>We need a more complicated example to fully demonstrate these metrics:</p><p>DisjointUnion(C D E) SubClassOf(F E) EquivalentClasses(G ComplementOf(C)) SubClassOf(H C) ClassAssertion(F i) ClassAssertion(D j) ClassAssertion(G k)</p><p>The normalized version of this ontology looks like this (shortened slightly for readability):</p><p>EquivalentClasses(C UnionOf(D E)) EquivalentClasses(owl:Nothing IntersectionOf(D E)) SubClassOf(D C) SubClassOf(E C) SubClassOf(F E) EquivalentClasses(G ComplementOf(C)) SubClassOf(H C) ClassAssertion(F i) ClassAssertion(D j) ClassAssertion(G k)</p>
<p>md of this ontology is 3 (C, E, F). But besides the actual depth, we can also calculate the minimal depth of this ontology, that is, no matter what axioms are added, what is the smallest number of levels the ontology class hierarchy will have (under the condition that the ontology remains satisfiable)?</p>
<p>In the given example, if we add the axiom EquivalentClasses(F E) to the ontology, md will decrease to 2. But on the other hand, no matter what axiom we further add, there is no way to let C collapse with D or E, therefore C is a proper superset of both (that is, it contains more individuals than each of D or E alone). And because C cannot become owl:Thing (due to k being outside of C), the minimum depth of the ontology is 2.</p>
<p>The maximum depth of an ontology is usually ∞ (since we can always add axioms about an arbitrarily long class hierarchy). Therefore we need to define maximum depth in a slightly different way in order to be of practical value. In the following, we will discuss two possible definitions.</p>
<p>Instead of allowing for arbitrary axioms that may be added, we only allow to add axioms of the form SubClassOf(A B) with A and B being normal class names of the normalized ontology. Thus, in the above example, we may add the axiom SubClassOf(H F) to the ontology in order to increase md from 3 to 4. No longer subsumption path is possible, since all the other named classes would become unsatisfiable when added to an existing path. So this metric will provide a maximum depth of the ontology, assuming no new class names are added.</p>
<p>Another possibility to constrain the axioms to be added is to allow only for axioms that do not relate to the existing ontology, that is, the intersection of the signatures of the two ontologies is empty. The signature of an ontology is the set of all names used in the ontology (besides the names from the OWL, RDF, RDFS, and XSD namespaces). In this case, md of the merged ontology is the maximal md of the single ontologies, since no interaction between the axioms can happen that may increase or reduce md. We can thus define</p>
<p>fmd(md(O1), md(O2)) = max(md(O1), md(O2))</p>
<p>which is much cheaper to calculate than md(O1 ∪ O2).</p><p>Stable metrics are metrics that take the open world assumption into account. Stable metrics will help us to evaluate ontologies for the Wide Wild Web. Since we expect ontologies to be merged on the Web dynamically, stable metrics allow us to state conditions that the ontology will fulfill in any situation.
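</p>
<p>The difference between merging ontologies with overlapping and with disjoint signatures can be illustrated as follows. This is a sketch under stated assumptions: an ontology is reduced to a set of asserted (subclass, superclass) pairs, md is the purely structural depth (∞ on a cycle), and the function names are hypothetical. The shortcut fmd = max is only taken when the signatures are disjoint; otherwise the depth is recomputed on the union.</p>

```python
import math

def signature(axioms):
    """All class names used in a set of (subclass, superclass) pairs."""
    return {name for pair in axioms for name in pair}

def md(axioms):
    """Nodes on the longest SubClassOf path; math.inf when the axioms contain a cycle."""
    classes = signature(axioms)
    depth = {c: 1 for c in classes}
    # Number of asserted superclasses of each class that are not processed yet.
    pending = {c: sum(1 for sub, _ in axioms if sub == c) for c in classes}
    ready = [c for c in classes if pending[c] == 0]
    done = 0
    while ready:
        sup = ready.pop()
        done += 1
        for sub, s in axioms:
            if s == sup:
                depth[sub] = max(depth[sub], depth[sup] + 1)
                pending[sub] -= 1
                if pending[sub] == 0:
                    ready.append(sub)
    # Classes on a cycle never become ready, so some classes stay unprocessed.
    return math.inf if done < len(classes) else max(depth.values(), default=0)

def merged_md(o1, o2):
    """Use f_md = max only for disjoint signatures; otherwise recompute on the union."""
    if signature(o1).isdisjoint(signature(o2)):
        return max(md(o1), md(o2))
    return md(o1 | o2)

o1 = {("D", "C"), ("E", "D")}
o2 = {("C", "D"), ("E", "D")}
o3 = {("Y", "X")}  # hypothetical ontology sharing no names with o1
print(md(o1), md(o2))     # 3 2
print(merged_md(o1, o2))  # inf: the union contains a cycle between C and D
print(merged_md(o1, o3))  # 3: disjoint signatures, so max(3, 2)
```

<p>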
The depth of an ontology may be too simple an example to demonstrate the advantages of stable metrics, but imagine a dynamic, ontology-based graphical user interface. Having certain guarantees with regard to the future development of the properties of the ontology may help the designer of the user interface tremendously, even if it is a seemingly trivial statement such as “the depth of the ontology is never less than 3”. There is no simple recipe to follow in order to turn a metric into a stable metric, but the question outlined at the beginning of this section, and then discussed throughout the rest – how does the ontology behave when axioms are added? – can be used as a guideline in achieving a stable metric.</p><p>Method 15 (Ensuring a stable class hierarchy) Calculate a normalized class depth measure, i.e. calculate the length of the longest subsumption path on the normalized version of the ontology, $md(N(O))$. Now calculate the stable minimal depth of the ontology, $md_{min}(O)$. If</p><p>$md(N(O)) \neq md_{min}(O)$</p><p>then the ontology hierarchy is not stable and may collapse.</p><p>We expect that the ready availability of metrics that take the open world assumption into account will lead to more robust ontologies. Since ontology engineers will have these numbers available at engineering and maintenance time, they will learn more easily how to achieve their actual goals. For example, ontology engineers who want to create a class hierarchy that will not collapse to fewer levels can always check if the minimum depth as described above corresponds to the asserted depth. This would be useful when regarding a class hierarchy with a certain number of levels which are known not to collapse (e.g. a biological taxonomy). The ontology engineer could now check if the well-known number of levels indeed corresponds to the calculated minimum depth. Tools could guide the ontology engineer towards achieving such goals. 
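</p><p>As a worked instance of Method 15, consider the example ontology of this section, for which the text gives a depth of 3 and a minimal depth of 2. The annotations in parentheses are explanatory and restate the reasoning given there:</p>

```latex
\begin{align*}
md(N(O)) &= 3 && \text{(longest path: } C, E, F\text{)}\\
md_{min}(O) &= 2 && \text{(adding EquivalentClasses(F E) collapses } E \text{ and } F\text{)}\\
md(N(O)) &\neq md_{min}(O) && \Rightarrow\ \text{the hierarchy is not stable}
\end{align*}
```

<p>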
Ontology engineers become more aware of such problems, and at the same time get tools to measure, and thus potentially control, them.</p><p>7.3 Language completeness</p><p>Language completeness is defined on a certain ontology with regard to a specific ontology language (or subset of the language). Given a specific signature (i.e. set of names), language completeness measures the ratio between the knowledge that can be expressed and the knowledge that is stated. For example, if we have an ontology with the signature Adam, Eve, Apple, knows, eats, Person, Food, we can ask which of the individuals are persons, and which of the individuals know each other. Thus, assuming a simple assertional language such as RDF, language completeness with regard to that language (or, in short, assertional completeness) is achieved by knowing about all possible ground facts that can be described by the ontology (i.e. for each fact in $\{\text{ClassAssertion}(C\ i) \mid \forall C \in O, \forall i \in O\} \cup \{\text{PropertyAssertion}(R\ i\ j) \mid \forall R \in O, \forall i \in O, \forall j \in O\}$ we can say if it is true or not, and none of them is unknown).</p><p>An expressive ontology language allows numerous more questions to be asked besides ground facts: is the domain of knows Person? Is it the range? Is eats a subproperty of knows? In order to have a language complete ontology with regard to the more expressive language, the ontology must offer defined answers for all questions that can be asked with the given language.</p><p>We introduce language completeness over a language fragment as $C_i(O)$, with the index $i$ being a language fragment (if none is given, the assertional fragment is assumed).</p><p>Method 16 (Measuring language completeness) We define a function $\Upsilon_i$ from an ontology $O$ to the set of all possible axioms over the signature of $O$ given the language fragment $i$.</p><p>$C_i(O) = \frac{|\{X \mid X \in \Upsilon_i(O),\ O \models X \lor O \models \neg X\}|}{|\Upsilon_i(O)|}$</p><p>Note that the language fragment the completeness measure is using is not tied to the language fragment the ontology is using. Consider for example the following ontology using the above example signature.</p><p>Disjoint(Food Person) PropertyDomain(knows Person) PropertyRange(eats Food) ClassAssertion(Person Adam) PropertyAssertion(knows Eve Adam) PropertyAssertion(eats Eve Apple)</p><p>With the help of Table 7.1 we can calculate the assertional completeness $C(O) = \frac{17}{24} \approx 0.71$. We see that we are using a far more expressive language to state the ontology than the simple assertional fragment we use for calculating the completeness. Relational exploration is a method to explore language fragments of higher expressivity, and to calculate the smallest set of questions that have to be answered in order to achieve a language complete ontology (Rudolph et al., 2007). In order to improve completeness we can thus add further axioms, either by adding more facts such as</p><p>PropertyAssertion(knows Adam Eve) NegativePropertyAssertion(eats Apple Apple)</p><p>or by adding terminological axioms that allow us to infer that certain facts hold, such as</p><p>SymmetricProperty(knows) IrreflexiveProperty(eats)</p><p>which in this case adds exactly the same amount of information to our given signature using the same number of axioms (i.e. improving the completeness to $\frac{19}{24} \approx 0.79$).</p>
<table>
<tr><th>knows</th><th>Apple</th><th>Adam</th><th>Eve</th><th>eats</th><th>Apple</th><th>Adam</th><th>Eve</th><th>Class</th><th>Apple</th><th>Adam</th><th>Eve</th></tr>
<tr><td>Apple</td><td>✗</td><td>✗</td><td>✗</td><td>Apple</td><td>?</td><td>✗</td><td>✗</td><td>Food</td><td>✓</td><td>✗</td><td>✗</td></tr>
<tr><td>Adam</td><td>?</td><td>?</td><td>?</td><td>Adam</td><td>?</td><td>✗</td><td>✗</td><td>Person</td><td>✗</td><td>✓</td><td>✓</td></tr>
<tr><td>Eve</td><td>?</td><td>✓</td><td>?</td><td>Eve</td><td>✓</td><td>✗</td><td>✗</td><td></td><td></td><td></td><td></td></tr>
</table>
<p>Table 7.1: Class and property assertions to calculate the language completeness of the example ontology.</p><p>Even though both sets of axioms improve the ontology with the same information content, the second set seems intuitively better as it further describes the terms intensionally instead of just adding facts extensionally. How can we capture that in a completeness metric? Instead of using assertional completeness, which indeed is not suited for capturing intensional completeness, we have to use a more expressive language fragment. For example, by adding the symmetry axiom to the language fragment used for computing language completeness, we see that the second set indeed displays a higher completeness (0.77) than the first set (0.73). The more expressive the language fragment used for calculating the completeness, the more the measure will reflect the value of intensional axioms.</p><p>Chapter 8 Representation</p><p>Ceci n’est pas une pipe. (René Magritte, 1898–1967, The Treachery of Images (Magritte, 1929))</p><p>Representational aspects of an ontology deal with the relation between the semantics and the structure, i.e. the way the semantics are structurally represented. This will often uncover mistakes and omissions within the relation between the formal specification and the shared conceptualization – or at least the models which are supposedly isomorphic to the conceptualizations.</p><p>8.1 Ontological metrics</p><p>Normalization helps with the definition of ontological metrics, since it helps a metric designer to be explicitly aware of their choices when creating the metric. Furthermore, it offers the ontology designer ready-to-use methods to capture what they mean to express with the designed metric. Sometimes simple structural metrics are sufficient for the task at hand, and many structural metrics exist today. Sometimes an understanding of the semantic model is required, and we have introduced a way to gain that.</p><p>The aspect of representation covers how the structure represents the semantics. In order to evaluate features of the representation, we compare the results of the structural measures to the results of the semantic measures. Using the normalization described in Chapter 7, we can often even use the same (or a very similar) metric as described in Chapter 6, applied before and after normalization, and compare the respective results. Any deviations between the results of the two measurements indicate elements of the ontology that require further investigation. For example, consider the ontology given in Figure 8.1. The number of classes before normalization is 5, and after normalization 3. This difference shows that several classes collapse into one, which may be an error or intentional. In case this is an error, it needs to be corrected. If this is done intentionally, the rationale for this design decision should be documented in the ontology. This becomes especially evident if you imagine removing any single one of the subsumption relations between B, C and E. The result will be very different, i.e. such an erroneous axiom has a high impact on the resulting conceptualization.</p><p>Figure 8.1: A simple taxonomy before (left) and after (right) normalization. The arrows denote subsumption.</p><p>By contrasting the two ontology structures in Figure 8.1 we see that the right one is a more faithful representation of the semantics of the ontology. Both structures have the same semantics, i.e. allow the same sets of models. The right one is more concise, and for most cases more suitable than the left one. 
Evaluation methods dealing with representational aspects can uncover such differences and indicate problematic parts of an ontology (Vrandečić and Sure, 2007).</p><p>In the remainder of this section, we will discuss the four metrics we have introduced in Section 6.1 as examples of how they can be turned into semantic metrics that actually reflect their descriptions. This is a prerequisite for the discussion of the representational metrics introduced subsequently, and their meaning. For notation, we formalize the five steps of normalization as the functions $N_1$ to $N_5 : O \to O$, where $N_{i+1}(O)$ always means $N_{i+1}(N_i(O))$, $N_0$ is the identity function, and $N(O)$ is a shortcut for $N_5(O)$.</p><p>8.2 Maximum depth of the taxonomy</p><p>In Section 6.1.1 we have introduced the maximum depth of the taxonomy metric as defined by (Lozano-Tello and Gómez-Pérez, 2004). We have argued that the metric would be more appropriately named maximum subsumption path length. Now that we have the tool of normalization at hand, we can actually use it to capture the original meaning of the metric: the maximum depth of the taxonomy (let that be $D_T(O)$) of an ontology $O$ equals the maximum subsumption path length (let that be $SL(O)$) of the normalized version of the ontology (to be exact, after the third normalization), i.e.</p><p>$D_T(O) = SL(N_3(O))$</p><p>This resolves all the problems we have mentioned in Section 6.1.1, and the result of the metric does indeed capture its meaning.</p><p>Now we can introduce a new metric,</p><p>$ET(O) = \frac{D_T(O)}{SL(O)}$</p><p>which describes the explicitness of the subsumption hierarchy.</p><p>Method 17 (Explicitness of the subsumption hierarchy) Calculate $ET(O)$.</p><p>• If $ET(O) = 1$ everything seems fine</p><p>• If $ET(O) &lt; 1$ then some of the classes in the ontology have collapsed. Find the collapsed classes and repair the explicit class hierarchy</p><p>• If $ET(O) > 1$ part of the class hierarchy has not been explicated. Find that part and repair the class hierarchy</p><p>Note that this test does not necessarily discover all errors – one could imagine an ontology where parts of the class hierarchy collapse, and part of the class hierarchy is not explicated, so that the result balances out to 1 again. But whenever the metric $ET(O)$ does not result in 1, there is a high probability of an error. In order to find all possible errors in the class hierarchy we would rather calculate</p><p>$D = H(O) \setminus H(N(O)) \cup H(N(O)) \setminus H(O)$</p><p>with $H : O \to O$ a function that selects only the simple subsumptions. This calculates all single subsumptions that are not part of the hierarchy, and the other way around. Each axiom $x \in D$ is thus a potentially problematic axiom that should be checked.</p><p>8.3 Class / relation ratio</p><p>In Section 6.1.2 we discussed the (M29) class / relation ratio from (Gangemi et al., 2005), being</p><p>$M29(O) = \frac{|C(O)|}{|P(O)|}$</p><p>(with $C(O)$ yielding the set of used class names in $O$, and $P(O)$ the set of property names in $O$). We have shown that the metric would be better named class name / property name ratio. But making the same modification as above and yielding a new metric</p><p>$M29^*(O) = M29(N(O)) = \frac{|C(N(O))|}{|P(N(O))|}$</p><p>also would not yield the ratio between classes and relations, since the normalization does not remove synonymous class and property names (but rather potentially adds further such names). 
Instead we need to define a metric that counts the number of normal class and property names (we do that by introducing $C_N(O)$, yielding the set of normal class names in the normalized version of $O$, and $P_N(O)$, yielding the set of normal property names in the normalized version of $O$), thus leading to a new metric</p><p>$N29(O) = \frac{|C_N(O)|}{|P_N(O)|}$</p><p>Comparing the two ratios $M29(O)/N29(O)$ does not yield a value with an obvious meaning. Instead we should regard the ratio between each of the two components, i.e. the ratio of classes and class names</p><p>$R_C(O) = \frac{|C_N(O)|}{|C(O)|}$</p><p>and the ratio of properties and property names</p><p>$R_P(O) = \frac{|P_N(O)|}{|P(O)|}$</p><p>Method 18 (Explicit terminology ratio) Calculate $R_C(O)$ and $R_P(O)$.</p><p>• If $R_C(O) = R_P(O) = 1$ then this indicates no problems with the coverage of elements with names in the ontology</p><p>• If $R_C(O) &lt; 1$ or $R_P(O) &lt; 1$ and the ontology does not include a mapping to an external vocabulary then this indicates possible problems, since a number of names have collapsed to describe the same class</p><p>• If $R_C(O) &lt; 1$ or $R_P(O) &lt; 1$ and the ontology includes a mapping to an external vocabulary we can remove all axioms providing the mapping and calculate $R_C$ and $R_P$ anew</p><p>• If $R_C(O) > 1$ or $R_P(O) > 1$ then this indicates that not all interesting classes or properties have been given a name, i.e. the coverage of classes and properties with names may not be sufficient</p><p>In this metric we see that we cannot just mechanically replace the ontology with its normalized version in order to yield a metric that fits the desired definition. It is also not true that simply comparing the metrics of the normalized and the original ontology yields interesting metrics that provide us with more insight into the quality of the ontology.</p><p>8.4 Relationship richness</p><p>We redefine the original relationship richness metric from (Tartir et al., 2005), as described in Section 6.1.3, as</p><p>$RR(O) = \frac{|P(O)|}{|H(O)| + |P(O)|}$</p><p>(with $H$ as defined in Section 8.2 and $P$ as defined in Section 8.3). As discussed in Section 6.1.3 this value is pretty much meaningless as it is.</p><p>In order to repair this we cannot just redefine $RR^*(O) = RR(N(O))$, because $P(N(O))$ does not yield the number of properties. Instead we could use the number of normal property names, resulting in</p><p>$\frac{|P_N(O)|}{|H(N(O))| + |P_N(O)|}$</p><p>which is probably the closest we can get to the original intended definition.</p><p>Regarding the rationale for this metric, the authors are looking for a metric that “reflects the diversity of relations and placement of relations in the ontology”, based on the assumption that an ontology that “contains many relations other than class-subclass relations is richer than a taxonomy with only class-subclass relationships”. We question this assumption: it seems more straightforward to simply use the number of classes than the number of class-subclass relations. 
We do agree with the original definition that the relationship richness should be of the form $\frac{|P_N(O)|}{X + |P_N(O)|}$, but we disagree that $|H(N(O))|$ is a good value for $X$; instead we suggest $|C_N(O)|$. In order to understand the difference we first investigate the relationship between the number of class-subclass relations and the number of classes in an ontology. We understand the number of class-subclass relations to be $|H(N(O))|$, i.e. the number of simple subsumptions in the normalized ontology. The number of classes is $|C_N(O)|$, i.e. the number of normal class names in an ontology. Now we can define the set of all root classes, i.e. of all classes that have no given superclass (besides owl:Thing), as</p><p>$R(O) = \{C \mid C \in C_N(O) \land \forall D \in C_N(O) : \text{SubClassOf}(C\ D) \notin H(N(O))\}$</p><p>Further we can define the treelikeness of the class hierarchy as</p><p>$t(O) = \frac{|C_N(O) \setminus R(O)|}{|H(N(O))|}$</p><p>(or 0 if $|H(N(O))| = 0$). The closer the value is to 1, the more treelike the class hierarchy is. So if there is exactly one simple subsumption for each class that is not a root class, then the treelikeness of the class hierarchy is 1 (this allows us to easily give a formal definition for the terms tree and set of trees describing the taxonomy of an ontology: a tree is given if $t(O) = 1 \land |R(O)| = 1$, a set of trees if $t(O) = 1 \land |R(O)| > 1$).</p><p>So if $|C_N(O)|$ is fixed, an increased value of $|H(N(O))|$ leads to a less treelike class hierarchy, but has no other obvious effect on the ontology. The treelikeness of the hierarchy seems to be independent of the relationship richness. Therefore we suggest choosing another function for $X$: obvious candidates seem to be the size of the ontology, i.e. the number of axioms $|O|$, the number of terminological axioms, or simply the number of classes. We think that the number of classes is a better choice, since a growth in the number of axioms with a fixed number of entities indicates an overall growth of richness. 
Therefore it would be counterintuitive for the relational richness to decrease if existing classes are described in more detail. This effect does not happen if we instead choose $X$ to be the number of classes, i.e. $|C_N(O)|$. So we suggest that the best metric to capture relational richness that is still close to the original metric as defined by (Tartir et al., 2005) is</p><p>$RR(O) = \frac{|P_N(O)|}{|C_N(O)| + |P_N(O)|}$</p><p>Again it would not make sense to compare this metric to the original metric. But we see that normalization is a useful tool to define metrics more precisely, and to get closer to what we want to capture with a given metric.</p><p>8.5 Semantic similarity measure</p><p>The semantic similarity measure is a function $ssm : O \times C \times C \to \mathbb{R}$ that describes the similarity of two classes in an ontology. It is defined as the reciprocal value of the distance over the subsumption graph of the ontology. Continuing the argumentation from Section 6.1.4, we can see that first normalizing the ontology and then calculating the distance avoids a number of pitfalls.</p><p>The problem we run into here, though, is that normalization removes all explicit subsumption axioms connecting owl:Thing with the root classes $R(O)$, thus potentially leading to a number of disconnected graphs. In this case, we can either define $ssm$ to be 0, or we can add a further step to the normalization procedure, namely $\forall C \in R(O)$ add the axiom SubClassOf(C owl:Thing) to the ontology. This way we connect the whole subsumption graph and will always have a finite number for the distance.</p><p>When we introduced normalization, we stated explicitly that the given steps are neither necessary nor sufficient for all tasks. They provide the metric engineer with a new tool to define and test metrics, but as we see in this example we need to further extend the preprocessing of the ontology before we can measure the value of a specific metric. Again, this step is a semantic-preserving syntactic transformation, but it is required in order to measure the ontology.</p><p>It is unclear what the comparison of $ssm$ over the original and the normalized ontologies will yield. Whereas one may think that the normalized ontology provides a better base for calculating a semantic similarity, the opposite is true: usually, the inclusion of a redundant subsumption axiom has a rationale, even though the relation may already be represented in the ontology. In learned ontologies, it may indeed be based on evidence for exactly that relation; in human-engineered ontologies it may often be the case that such an axiom is meant to put emphasis on a specific relation (even though that does not mean anything in the formal semantics). So explicit connections may indicate some semantic closeness that the formal semantics do not properly capture.</p><p>This evaluation is out of scope for this thesis. Our task is to provide a framework in which to discuss and define metrics for ontology qualities. We have shown in this section a number of metrics as examples and demonstrated the usefulness of the tools provided in this thesis.</p><p>
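The two options for handling disconnected subsumption graphs in the semantic similarity measure can be sketched as follows. This is a minimal illustration under stated assumptions: the distance is taken to be the number of edges on the shortest undirected path in the graph of simple subsumptions (the definition from Section 6.1.4 is not repeated here), identical classes are not handled, and the names ssm, connect_roots and hierarchy are illustrative.</p>

```python
from collections import deque

THING = "owl:Thing"


def ssm(subclass_of, c1, c2, connect_roots=False):
    """Reciprocal of the shortest-path distance between two classes.

    Assumption: distance = number of edges on the shortest undirected
    path in the subsumption graph. Returns 0.0 if no path exists.
    """
    if c1 == c2:
        raise ValueError("distance 0 has no reciprocal")
    edges = set(subclass_of)
    if connect_roots:
        # Extra normalization step: add SubClassOf(C owl:Thing) for roots.
        subs = {s for s, _ in edges}
        sups = {p for _, p in edges}
        roots = {c for c in subs | sups | {c1, c2}
                 if c not in subs and c != THING}
        edges |= {(r, THING) for r in roots}
    neighbours = {}
    for s, p in edges:
        neighbours.setdefault(s, set()).add(p)
        neighbours.setdefault(p, set()).add(s)
    # Breadth-first search for the shortest path from c1 to c2.
    seen, queue = {c1}, deque([(c1, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == c2:
            return 1.0 / dist
        for n in neighbours.get(node, ()):
            if n not in seen:
                seen.add(n)
                queue.append((n, dist + 1))
    return 0.0


# Hypothetical normalized hierarchy with two disconnected trees.
hierarchy = {("B", "A"), ("D", "C")}
print(ssm(hierarchy, "B", "D"))                      # 0.0
print(ssm(hierarchy, "B", "D", connect_roots=True))  # 0.25
```

<p>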
it evaluation or Section an this be in 4.1.1 ) to in well, AEON Section input context as in in as a aspects (as data be other classes linked to of of for Web evaluations properties (as Semantic for vocabulary the Web the of the method. ing evaluation whole use ad- the the we and Thus consider The ontology not sense. evaluations. the the do for further we both context that perform load a Note then can defi- provides and and tool thus evaluating ontology creation artifact ditional the An the describing and ontology. literature artifact an the additional accompanying in artifacts approaches of of nition number a are There Context 9 Chapter n fteeris prahstwr nooyeauto a h introduction the was evaluation ontology toward approaches earliest the of One optnyquestions competency ..qetosta h nooysol eal oanswer to able be should ontology the that questions i.e. , oncintewr implies. word the the connection is matters What word. Rama-Kandra: Neo: love? of speak program Rama-Kandra: Neo: ahwk n Wachowski, and (Wachowski tsa ua emotion. human . . a. It’s . . never. have just I MtxRevolutions (Matix o ti a is it No, a heard . . . 2003)) 151</p><p>9 Chapter 9 Context questions needs to be assessed: if the ontology-based tool cannot ask the question, why should the correct answer be important? The additional artifact, in this case, would be the set of formalized competency questions and the required correct answers. But in some cases a list of formalized competency questions and their correct answers is not feasible or possible to generate. Instead, we often can state certain constraints the answers need to fulfill in order to be possibly correct. Again we have to come to the conclusion that we cannot create an automatic system that allows us to check if the ontology is correct – but merely a system that sends out warnings when something seems to be wrong. The constraints in turn may often require expressivity beyond what OWL offers. 
We address problems in ontology engineering and maintenance that arose during the work with ontologies within the case studies in the European FP6 project SEKT.<sup>1</sup> As they often reminded us of problems that occurred in software engineering, a solution that was successfully introduced to software engineering was examined – unit testing. Although the notion of unit testing needed to be adapted for ontologies, it inspired a slew of possible approaches. Section 9.1 will show how unit testing for ontologies can be applied. In Section 9.2 we discuss how further expressivity that goes well beyond the available standardized languages can be used in order to guarantee that the evaluated ontologies fulfill certain formalized properties. Since these semantics cannot be expressed within OWL, they necessarily have to be regarded as contextual in the sense of our definition, i.e. as extra-ontological artifacts that can be used for the evaluation of the ontology.</p><p>9.1 Unit tests</p><p>In the SEKT project one of the case studies aimed at providing an intelligent FAQ system to help newly appointed judges in Spain (Benjamins et al., 2005). The system depends on an ontology to find the best answers and to find references to existing cases in order to provide the judge with further background information. The applied ontology is built and maintained by legal experts with almost no experience in formal knowledge representation (Casanovas et al., 2005). As the ontology evolved and got refined (and thus changed), the legal experts noticed that some of their changes had undesired side effects. To give a simplified example, consider the class hierarchy depicted in Figure 9.1. Let’s assume that this ontology has been used for a while already, before someone notices that not every academic necessarily needs to be a member of a university. So Academic becomes a direct subclass of Person, instead of University_member. 
But due to this change, Professor is also no longer a subclass of University_member (a change that may have been hidden from the ontology engineer, as the ontology development environment may not have displayed all subclasses of Academic). The resulting ontology remains perfectly satisfiable. But a tool that, for example, creates a Web page for all members of the university may now skip the professors, since they are not classified as university members any more – an error that would only become apparent in the use of the tool much later and would potentially be hard to track down to that particular ontology change operation. Unit testing for ontologies can discover such problems, and a few other ones as well.</p><p>Figure 9.1: Example class hierarchy.</p><p>1 http://cordis.europa.eu/ist/kct/sekt_synopsis.htm</p><p>In software engineering, the idea of unit testing (Beck, 1999) was introduced to counter the complexities of modern software engineering efforts. Unit tests are meant to facilitate the development of program modules or units, and to ensure the interplay of such units in the combined system. It results in code that is easier to refactor and simpler to integrate, and that has a formalized documentation (although not necessarily complete). Unit tests can be added incrementally during the maintenance of a piece of software, in order to not accidentally stumble upon an old bug and hunt it down repeatedly. Unit tests in software engineering became popular with the object-oriented language Smalltalk, and to this day remain focused on languages with strong possibilities to create smaller units of code. They are based on several decomposition techniques, most importantly information hiding.</p><p>Ontologies behave quite differently than program units. 
As there is no notion of information hiding in ontology engineering, and thus no black box components, the idea of unit testing at first does not seem applicable to ontologies. Therefore we need to adapt the idea for ontologies. The main purpose of unit tests for ontologies is similar to their purpose in software engineering: whenever an error is encountered with an axiom which is falsely inferred or incorrectly not inferred, the ontology maintainer may add this piece of knowledge to the appropriate test ontology. Whenever the ontology is changed, the changed ontology can be automatically checked against the test ontology, containing the formalized knowledge of previously encountered errors. We investigate the benefits of unit testing applied to ontologies, especially their possibilities to facilitate regression tests, to provide a test framework that can grow incrementally during the maintenance and evolution phase of the ontology, and that is reasonably simple to use. In order for unit tests for ontologies to be useful, they need to be reasonably easy to use and maintain. This will depend heavily on the given implementation. The following approach is informed by the idea of design by contract, i.e. we make it possible to formalize which statements should and should not be derivable from an ontology being developed or maintained, either as formalized competency questions (Section 9.1.1) or as explicit ontology statements (Sections 9.1.2 and 9.1.3).</p><p>9.1.1 Formalized competency questions</p><p>Competency questions, as defined by some methodologies for ontology engineering (such as OTK (Sure and Studer, 2002) or Methontology (Fernández-López et al., 1999)), describe what kind of knowledge the resulting ontology is supposed to answer. These questions are necessarily formalizable in a query language (since otherwise the ontology management module would actually not be able to give the answer to the system later). 
Formalizing the queries instead of writing them down in natural language, and formalizing the expected answers as well, allows a system to automatically check if the ontology meets the requirements stated with the competency questions.</p><p>Method 19 (Checking competency questions against results) Formalize your competency question as a SPARQL query. Write down the expected answer as a SPARQL query result, either in XML (Beckett and Broekstra, 2008) or in JSON (Clark et al., 2007). Compare the actual and the expected results. Note that the order of results is often undefined.</p><p>This approach is especially interesting when the expressivity of the query language is outside of the expressivity of the knowledge representation language. This is, for example, the case with SPARQL and OWL. The following SPARQL query for example returns all instances of Good_mother, i.e. those mothers that are also friends with their child (we omit namespace declarations for readability). Since there are no property intersections in OWL, one cannot describe a class including all instances of Good_mother in OWL.</p><p>SELECT ?Good_mother WHERE { ?child :mother ?Good_mother . ?child :friend ?Good_mother . }</p><p>We consider this method especially useful not for the maintenance of the system, but rather for its initial build, in order to define the extent of the ontology. Note that competency questions are usually just exemplary questions – answering all competency questions does not mean that the ontology is complete. Also note that sometimes, although the question is formalizable, the answer does not necessarily need to be known at the time of writing the question. This is especially true for dynamic ontologies, i.e. ontologies that reflect properties of the world that keep changing often (like the song the user of the system is listening to at the time). In that case we can define some checks if the answer is sensible or even possible (like that the answer indeed needs to be a song).</p><p>How can we test such constraints? Instead of using a SPARQL SELECT query, we can use a SPARQL CONSTRUCT query to create a new ontology with the given results (again, namespace declarations are omitted):</p><p>CONSTRUCT { ?Good_mother rdf:type :Good_mother } WHERE { ?child :mother ?Good_mother . ?child :friend ?Good_mother . }</p><p>This will result in an ontology that only consists of class instantiations for the Good_mother class. We can now merge the ontology resulting from the SPARQL CONSTRUCT query with the background ontology (usually the ontology used for query answering) and a constraint ontology that includes constraints on the results, like the following.</p><p>DisjointClasses(Good_mother Father)</p><p>The ontology states that fathers cannot be good mothers. Now should one of the results actually be an instance of Father, the resulting merged ontology will be inconsistent.</p><p>Method 20 (Checking competency questions with constraints) Formalize your competency question as a SPARQL CONSTRUCT query that formulates the result in RDF as an ontology R. Merge R with O and a possibly empty ontology containing further constraints C. Check the merged ontology for inconsistencies.</p><p>9.1.2 Affirming derived knowledge</p><p>Unit tests for ontologies test if certain axioms can or cannot be derived from the ontology (Vrandečić and Gangemi, 2006). This is especially useful in the case of evolving or dynamic ontologies: we can automatically test certain assumptions with regard to a given ontology O. 
We create two test ontologies $T^+$ (called the positive test ontology) and $T^-$ (the negative test ontology), and define that an ontology O, in order to fulfill the constraints imposed by the test ontologies, needs to fulfill the following conditions: each axiom $A_1^+, \ldots, A_n^+ \in T^+$ must be derivable from O, i.e.</p><p>$O \models A_i^+ \quad \forall A_i^+ \in T^+$</p><p>and each axiom $A_1^-, \ldots, A_n^- \in T^-$ must not be derivable from O, i.e.</p><p>$O \not\models A_i^- \quad \forall A_i^- \in T^-$</p><p>Note that the first condition is trivially fulfilled if O is not satisfiable, whereas an empty ontology trivially fulfills the second condition. So it is not hard to come up with ontologies that fulfill the conditions, which shows that unit tests are not meant to be complete formalizations of the requirements of an ontology, but rather helpful indicators towards possible errors or omissions in the tested ontologies. To come back to our previous example in Section 9.1, a simple test ontology $T^+$ that consists of the single axiom SubClassOf(Professor University_member) would have been sufficient to discover the problem described. So after the error is discovered, this statement is added to the test ontology, and the same error will be detected automatically next time by running the unit tests. The test ontologies are meant to be created and grown during the maintenance of the ontology. Every time an error is encountered in the usage of the ontology, the error is formalized and added to the appropriate test ontology (like in the example above). Experienced ontology engineers may add appropriate axioms in order to anticipate and counter possible errors in maintenance. In software engineering it is often the case that the initial development of a program is done by a higher skilled, better trained, and more consistent team, whereas the maintenance is then performed by a less expensive group, with less experienced members, that changes more frequently. 
So in software engineering, the more experienced developers often anticipate frequent errors that can happen during maintenance, and create unit tests accordingly in order to put appropriate constraints on the future evolution of the software. We expect a similar development in ontology engineering and maintenance, as soon as ontologies become more common components of information systems. The framework proposed here offers the same possibilities to an ontology engineer.</p><p>Method 21 (Unit testing with test ontologies) For each axiom $A_i^+$ in the positive test ontology $T^+$ test if the axiom is being inferred by the tested ontology O. For every axiom that is not being inferred, issue an error message. For each axiom $A_i^-$ in the negative test ontology $T^-$ test if the axiom is being inferred by the tested ontology O. For every axiom that is being inferred, issue an error message.</p><p>A Protégé plug-in implementing an OWL Unit Test framework2 exists, that allows to perform what we have described with $T^+$ testing for affirming derived knowledge (Horridge, 2005). We generalize and expand the theoretical foundations of the framework.</p><p>Formalized competency questions and positive unit test ontologies can sometimes be translated from one into the other, but are largely complementary. We have already shown that SPARQL enables queries beyond the expressivity of OWL, and test ontologies in turn are a much more natural way to express OWL constructs than SPARQL is (compare with Section 6.2). Finally, the notion of negative test ontologies expands the possibilities of unit testing well beyond formalized competency questions.</p><p>Why should an ontology engineer not just add the axioms from $T^+$ to O, and $\neg A_i^-$ for each $A_i^-$ in $T^-$? There are several reasons:</p><p>1. not every axiom $A_i^-$ can be negated. For example, a subsumption statement cannot be negated without inventing new entities.</p><p>2. adding such axioms increases redundancy in the ontology, and thus makes it harder to edit and maintain.</p><p>3. the axioms may potentially increase reasoning complexity, or else use language constructs that are not meant to be used within the ontology.</p><p>4. as discussed in Section 9.1.3, the axioms in $T^-$ may be contradictory.</p><p>5. finally, due to the open world assumption, $O \not\models A_i^- \;\forall A_i^- \in T^-$ is not the same as $O \models \neg A_i^- \;\forall A_i^- \in T^-$, so that the negative test ontology cannot actually be simulated with the means of OWL DL.</p><p>2 http://www.co-ode.org/downloads/owlunittest/</p><p>9.1.3 Asserting agnosticism</p><p>Whereas ontologies in general should be consistent in order to be considered useful, this is not true for the negative test ontology $T^-$. Since $T^-$ is simply a collection of all axioms that should not be inferrable from the tested ontology O, it does not need to be satisfiable. Thus there may be two (or more) sets of axioms (subsets of $T^-$) that contradict each other. What does this mean? The existence of such contradicting sets means that O must not make a decision about the truth of either of these sets, thus formalizing the requirement that O must be agnostic towards certain statements. For example, an ontology of complexity classes (where each complexity class is an individual) should have the following two axioms in its negative test ontology:</p><p>DifferentIndividuals(P NP) SameIndividual(P NP)</p><p>It is obvious that the test ontology is inconsistent, but what it states is that any tested ontology will neither allow to infer that P and NP are equal, nor that they are different. 
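</p><p>The procedure of Method 21, applied to the agnosticism example above, can be sketched in a few lines of Python. This is only an illustration under strong assumptions: axioms are modelled as plain tuples, and the hypothetical function entails stands in for a real reasoner by checking asserted axioms only (up to argument order for symmetric axiom types). All names in the sketch are invented for this example.</p>

```python
from typing import Iterable, List, Set, Tuple

# An axiom is modelled as a plain tuple, e.g. ("SameIndividual", "P", "NP").
Axiom = Tuple[str, ...]

# Axiom types whose arguments may be given in any order.
SYMMETRIC = {"SameIndividual", "DifferentIndividuals"}


def entails(ontology: Set[Axiom], axiom: Axiom) -> bool:
    """Toy stand-in for a reasoner: an axiom counts as entailed if it is
    asserted, ignoring argument order for symmetric axiom types."""
    if axiom in ontology:
        return True
    if axiom[0] in SYMMETRIC:
        return (axiom[0], *reversed(axiom[1:])) in ontology
    return False


def run_unit_tests(ontology: Set[Axiom],
                   positive: Iterable[Axiom],
                   negative: Iterable[Axiom]) -> List[str]:
    """Return one error message per violated test axiom (Method 21)."""
    errors = []
    for axiom in positive:          # axioms in T+ must be derivable
        if not entails(ontology, axiom):
            errors.append(f"not inferred: {axiom}")
    for axiom in negative:          # axioms in T- must not be derivable
        if entails(ontology, axiom):
            errors.append(f"wrongly inferred: {axiom}")
    return errors


T_PLUS = [("ClassAssertion", "ComplexityClass", "P")]
T_MINUS = [("DifferentIndividuals", "P", "NP"), ("SameIndividual", "P", "NP")]

agnostic = {("ClassAssertion", "ComplexityClass", "P"),
            ("ClassAssertion", "ComplexityClass", "NP")}
opinionated = agnostic | {("SameIndividual", "NP", "P")}

print(run_unit_tests(agnostic, T_PLUS, T_MINUS))
print(run_unit_tests(opinionated, T_PLUS, T_MINUS))
```

<p>The first ontology passes both test ontologies, while the second one takes a stance on P and NP and therefore fails the negative test. In a realistic setting, entails would delegate to an OWL reasoner, since derivable axioms are usually not asserted literally.</p><p>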
Every ontology that is successfully tested against this negative test ontology will be agnostic regarding this fact. Besides known unknowns (such as the P=NP? problem) we can also assert agnosticism in order to preserve privacy. If we want to ensure that it cannot be inferred from an ontology (i.e. a set of statements, possibly gathered from the Web) whether Alice knows Bob, we can state that in a test ontology:</p><p>PropertyAssertion(foaf:knows Alice Bob) NegativePropertyAssertion(foaf:knows Alice Bob)</p><p>Note that if such a test ontology is indeed used for privacy reasons, it should not be made public, since knowing it may lead to important clues about the actual statements that were meant to remain secret.</p><p>9.2 Increasing expressivity for consistency checking</p><p>Certain language constructs, or their combination, may increase reasoning time considerably. In order to avoid this, ontologies are often kept simple. But instead of abstaining from the use of more complex constructs, an ontology could also be modularized with regards to its complexity: an ontology may come in different versions, one just defining the used vocabulary, and maybe their explicitly stated taxonomic relations, and a second ontology adding much more knowledge, such as disjointness or domain and range axioms. If the simple ontology is properly built, it can lead to an ontology which often yields the same results to queries as the complete ontology (depending, naturally, on the actual queries). The additional axioms can be used in order to check the consistency of the ontology and the knowledge base with regards to the higher axiomatized version, but for querying them the simple ontology may suffice.</p><p>We investigate the relationship of heavy- to lightweight ontologies, and how they can interplay with regards to ontology evaluation in Section 9.2.1. We then move to an exemplary formalism going well beyond the expressivity of OWL, by adding rules to the ontology for evaluating it in Section 9.2.2. Autoepistemic operators lend themselves also to be used in the testing of ontologies, especially with regards to their (relative) completeness, since they are a great way to formalize the introspection of ontologies (Section 9.2.3). We also regard a common error in ontology modeling with description logics based languages, and try to turn this error into our favor in Section 9.2.4.</p><p>9.2.1 Expressive consistency checks</p><p>Ontologies in information systems often need to fulfill the requirement of allowing reasoners to quickly answer queries with regards to the ontology. Light weight ontologies usually fulfill this task best. Also, many of the more complex constructors of OWL DL often do not add further information, but rather are used to restrict possible models. This is useful in many applications, such as ontology mapping and alignment, or information integration from different sources.</p><p>For example, a minimal cardinality constraint will, due to the open world assumption, hardly ever lead to any inferred statements in an OWL DL ontology. Nevertheless the statement can be useful as an indicator for tools that want to offer a user interface to the ontology, or for mapping algorithms that can take this information into account.</p><p>Further expressive constraints on the ontology, such as disjointness of classes, can be used to check the ontology for consistency at the beginning of the usage, but after this has been checked, a light weight version of the ontology, that potentially enables reasoners to derive answers with a better response time, could be used instead.</p><p>Formally, we introduce a test ontology C for an ontology O, including additional axioms on the entities used in O that constrain the possible models of the ontology, and check for the satisfiability of the merged ontology O ∪ C.</p><p>For example, consider the following ontology O about a family:</p><p>PropertyAssertion(father Adam Seth) PropertyAssertion(mother Eve Seth)</p><p>Now consider ontology C, formalizing further constraints on the terms used in the ontology:</p><p>SubPropertyOf(father parent) SubPropertyOf(mother parent) PropertyDomain(parent Person) PropertyRange(father Male) PropertyRange(mother Female) DisjointClasses(Male Female)</p><p>Now, O ∪ C will be recognized as inconsistent by a reasoner. This is because the axioms in C define the properties father and mother to point to the father and the mother respectively, whereas in O they are used to point from the father and the mother. This is recognized because C adds range axioms on both properties, thus letting us infer that Seth has to be an instance of both Female and Male, which is not possible due to the DisjointClasses axiom in C.</p><p>Method 22 (Increasing expressivity) An ontology O can be accompanied by a highly axiomatized version of the ontology, C. The merged ontology O ∪ C has to be consistent, otherwise the inconsistencies point to errors in O.</p><p>9.2.2 Consistency checking with rules</p><p>The consistency checks with context ontologies do not need to be bound by the expressivity of OWL, but can instead use languages with a different expressivity, such as SWRL (Horrocks et al., 2003; Brockmans and Haase, 2006). 
For our example, we will use the transformation of the ontology to a logic programming language like datalog (Grosof et al., 2003) as implemented by the OWL tools (Motik et al., 2005) wrapping KAON2 (Motik, 2006). We can then add further integrity constraints to the resulting program. This may exceed the expressivity available in the original ontology language. Consider constraints on the parent relationship that state that a parent has to be born before the child. In datalog we can concatenate the translation of the ontology to datalog (i.e. LP(O)) with the constraint program C, and test the resulting program for violations of the integrity constraints. Note that LP(O) cannot fully translate arbitrarily complex ontologies O, but for our use case it is sufficient to regard only the resulting logic program. We can simply ignore the rest. Consider (a repaired version of) the family ontology O given in Section 9.2.1. We also add years of birth for Adam and Seth:</p><p>PropertyAssertion(father Seth Adam) PropertyAssertion(mother Seth Eve) PropertyAssertion(birthyear Adam "-3760"^^xsd:gYear) PropertyAssertion(birthyear Seth "-3890"^^xsd:gYear)</p><p>Translating it to datalog will yield the following result LP(O):</p><p>father(Seth, Adam) mother(Seth, Eve) birthyear(Adam, -3760) birthyear(Seth, -3890)</p><p>We also translate the constraint ontology C from Section 9.2.1 as LP(C):</p><p>parent(X, Y) ← father(X, Y)</p><p>parent(X, Y) ← mother(X, Y)</p><p>Person(X) ← parent(X, Y)</p><p>Male(Y) ← father(X, Y)</p><p>Female(Y) ← mother(X, Y)</p><p>← Male(X) ∧ Female(X)</p><p>Now we regard further integrity constraints such as the following R (with < meaning before):</p><p>← parent(X, Y) ∧ birthyear(X, BX) ∧ birthyear(Y, BY) ∧ (BX < BY)</p><p>Here X is the child and Y the parent, so R states that if the year of birth of the parent's child is before the year of birth of the parent, then we have an inconsistency. We can now concatenate LP(O), LP(C) and R and check for inconsistencies. Indeed, one will be raised, since Adam was born 130 years after Seth, and thus the ontology must be inconsistent.</p><p>Method 23 (Inconsistency checks with rules) Translate the ontology to be evaluated and possible constraint ontologies to a logic program. This translation does not have to be complete. Formalize further constraints as rules or integrity constraints. Concatenate the translated ontologies and the further constraints or integrity constraints. Run the resulting program. If it raises any integrity constraints, then the evaluated ontology contains errors.</p><p>9.2.3 Use of autoepistemic operators</p><p>One approach using increased expressivity is to describe what kind of information an ontology is supposed to have, using autoepistemic constructs such as the K- and A-operators (Grimm and Motik, 2005). In this way we can, for example, define that every person in our knowledge base needs to have a known name and an address, and put this into a description. Note that this is different from just using the existential construct. An axiom like</p><p>SubClassOf(Human SomeValuesFrom(parent Human))</p><p>tells us that every human has a human parent, whereas using the K operator we could require that every human's parent has to be explicitly given in the knowledge base, or else an inconsistency would be raised. Autoepistemic operators allow for a more semantic way to test the completeness of knowledge bases than the syntactic XML based schema definitions described in Section 5.3. This way the ontology can be evaluated with regards to its data completeness. Data completeness is defined with regards to a tool that uses the data. 
The tool needs to explicate which properties it expects when being confronted with an individual belonging to a certain class. The operators allow to map completeness checks to satisfiability checks, and to use a common language to express these checks. Otherwise the completeness checks have to be performed programmatically by the tool. In (Grimm and Motik, 2005) an extension of OWL DL with autoepistemic operators is described. The axiom above translated to DL would be</p><p>Human ⊑ ∃parent.Human</p><p>Using the K- and A-operators instead, we would define that</p><p>KHuman ⊑ ∃Aparent.AHuman</p><p>i.e. for every human a parent must be known who also must be known to be human (i.e. either stated explicitly or inferrable) in the ontology, or else the ontology will not be satisfiable.3 Thus we are able to state what should be known, and a satisfiability check will check if this knowledge is indeed present. On the Semantic Web, such a formalism will prove of great value, as it allows to simply discard data that does not adhere to a certain understanding of completeness. For example, a crawler may gather event data on the Semantic Web. But instead of simply collecting all instances of event, it may decide to only accept events that have a start and an end date, a location, a contact email, and a classification with regards to a certain term hierarchy. Although this will decrease the recall of the crawler, the data will be of a higher quality, i.e. of a bigger value, as it can be sorted, displayed, and actually used by calendars, maps, and email clients in order to support the user. The formalization and semantics of autoepistemic operators for the usage in Web ontologies is described in (Grimm and Motik, 2005).</p><p>3 The example follows ex. 3.3 in (Donini et al., 2002)</p><p>9.2.4 Domain and ranges as constraints</p><p>When novice ontology engineers have a background in programming, they often find the semantics of domain and range confusing (Allemang and Hendler, 2008). They expect domains and ranges to be used like type constraints in programming languages, i.e. if they say that father is a relation with the range Male and one applies it between Seth and Eve, who is a Female, they expect an inconsistency to be raised. Instead, Eve will be classified as a Male by the reasoner (for the sake of the example no disjointness axiom is stated between Male and Female). When programming in a strongly typed language like Java or C++, if we had a function defined as father(Male dad, Person child), calling the function with an object instance of Female will raise an exception (or, due to polymorphism, not actually find a suitable function; this assumes that Female is not a subclass of Male).</p><p>In order to simulate a mechanism similar to the expectations of programmers, we need to introduce two new axiom types, PropertyDomainConstraint and PropertyRangeConstraint, to describe a constraint on the domain of a property and on its range respectively. Besides their name, they have the same syntax as the PropertyDomain and PropertyRange axioms. Using autoepistemic logic, the axiom PropertyDomainConstraint(R C) translates to the following semantics</p><p>∃KR.⊤ ⊑ AC</p><p>stating that everything that is known to be in the domain of R has to be a known instance of C.</p><p>Instead of using a new axiom type, we could also use the fact that ontology engineers often add domain and range axioms erroneously, not in order to add more inferences, but with the meaning intended by the new axiom types, i.e. as type constraints. Based on that we suggest to consciously misinterpret the semantics of domain and range axioms for the sake of ontology evaluation (note that this is explicitly not meant as a reinterpretation of the already existing semantics for using the ontology, but merely for evaluating it before usage). The evaluator will quickly figure out if this is a useful misinterpretation for a given ontology or not. We checked both the Watson EA and the Watson 130 corpora (see Section 11.3) and we did not find a single case where a domain or range axiom added an inferred class assertion to an individual using that property with the individual not being already explicitly asserted to be an instance of that class. This indicates that this approach could indeed prove to be fruitful.</p><p>Part III</p><p>Application</p><p>10 Collaborative ontology evaluation in Semantic MediaWiki</p><p>11 Related work</p><p>12 Conclusions</p><p>Chapter 10 Collaborative ontology evaluation in Semantic MediaWiki</p><p>Given enough eyeballs, all bugs are shallow. (Eric Raymond, b. 1957, The Cathedral and the Bazaar (Raymond, 2001))</p><p>Wikis have become popular tools for collaboration on the Web, and many vibrant online communities employ wikis to exchange knowledge. For a majority of wikis, public or not, primary goals are to organize the collected knowledge and to share information. Wikis are tools to manage online content in a quick and easy way, by editing some simple syntax known as wikitext. This is mainly plain text with some occasional markup elements. For example, a link to another page is created by enclosing the name of the page in brackets, e.g. by writing [[Semantic Web]].</p><p>But in spite of their utility, the content in wikis is barely machine-accessible and only weakly structured. In this chapter we introduce Semantic MediaWiki (SMW) (Krötzsch et al., 2007c), an extension to MediaWiki (Barret, 2008), a widely used wiki software. SMW enhances MediaWiki by enabling users to annotate the wiki's contents with explicit information. Using this semantic data, SMW addresses core problems of today's wikis:</p><p>• Consistency of content: The same information often occurs on many pages. How can one ensure that information in different parts of the system is consistent, especially as it can be changed in a distributed way?</p><p>• Accessing knowledge: Large wikis have thousands of pages. Finding and comparing information from different pages is challenging and time-consuming.</p><p>• Reusing knowledge: Many wikis are driven by the wish to make information accessible to many people. But the rigid, text-based content of classical wikis can only be used by reading pages in a browser or similar application.</p><p>Figure 10.1: Architecture of SMW's main components in relation to MediaWiki.</p><p>SMW is a free and open source extension of MediaWiki, released under the GNU General Public License. Figure 10.1 provides an overview of SMW's core components and architecture. 
The integration between MediaWiki and SMW is based on MediaWiki's extension mechanism: SMW registers for certain events or requests, and MediaWiki calls SMW functions when needed. SMW thus does not overwrite any part of MediaWiki, and can be added to existing wikis without much migration cost. Usage information about SMW, installation instructions, and the complete documentation are found at SMW's homepage.1 Section 10.1 explains how structural information is collected in SMW, and how this data relates to the OWL ontology language. Section 10.2 surveys SMW's main features for wiki users: semantic browsing, semantic queries, and data exchange on the Semantic Web. Queries are the most powerful way of retrieving data from SMW, and their syntax and semantics are presented in detail. In Section 10.3 we survey related systems. Based on our definition of ontology, it is clear that SMW is an ontology engineering tool that aims at a massively collaborative usage. Based on that, Section 10.4 discusses how ontology evaluation can be performed collaboratively within SMW using some of the approaches introduced in Part II of this thesis.</p><p>1 http://semantic-mediawiki.org</p><p>10.1 Annotation of wiki pages</p><p>The main prerequisite of exploiting semantic technologies is the availability of suitably structured data. For this purpose, SMW introduces ways of adding further structure to MediaWiki by means of annotating the textual content of the wiki. In this section, we recall some of MediaWiki's current means of structuring data (Section 10.1.1), and introduce SMW's annotations with properties (Section 10.1.2). Finally, a formal semantic interpretation of the wiki's structure in terms of OWL is presented (Section 10.1.3).</p><p>10.1.1 Content structuring in MediaWiki</p><p>The primary method for entering information into a wiki is wikitext, a simplified markup language that is transformed into HTML pages for reading. Accordingly, wikitext already provides many facilities for describing formatting, and even some for structuring content. For defining the interrelation of pages within a wiki, hyperlinks are arguably the most important feature. They are vital for navigation, and are sometimes even used to classify articles informally. In Wikipedia, for example, articles may contain links to pages of the form [[as of 2010]] to state that the given information might need revalidation or updates after that year.</p><p>The primary structural mechanism of most wikis is the organization of content in wiki pages. In MediaWiki, these pages are further classified into namespaces, which distinguish different kinds of pages according to their function. Namespaces cannot be defined by wiki users, but are part of the configuration settings of a site. A page's namespace is signified by a specific prefix, such as User: for user homepages, Help: for documentation pages, or Talk: for discussion pages on articles in the main namespace. Page titles without a known namespace prefix simply belong to the main namespace. Most pages are subject to the same kind of technical processing for reading and editing, denoted Page display and manipulation in Figure 10.1. The major exception are so-called special pages (built-in query forms without user-edited content) that use Special: as a namespace prefix.</p><p>Many wiki engines generally use links for classifying pages. For instance, searching for all pages with a link to the page [[France]] is a good way to find information about that country. In MediaWiki, however, this use has been replaced by a more elaborate category system (Schindler and Vrandečić, 2010). Every page can be assigned to one or many categories, and each category is represented with a page in the Category: namespace. Category pages in turn can be used to browse the classified pages, and also to organize categories hierarchically. Page categories and their hierarchy can be edited by all users via special markup within the wiki. Overall, the category system probably is the one function of MediaWiki that is closest in spirit to the extensions of SMW.</p><p>Another structuring problem of large wikis are synonymous and homonymous titles. In case of synonyms, several different pages for the same subject may emerge in a decentralized editing process. MediaWiki therefore has a redirect mechanism by which a page can be caused to forward all requests directly to another page. This is useful to resolve synonyms but also for some other tasks that suggest such forwarding (e.g. the mentioned articles of the form [[as of 2010]] are redirects to the page about the respective year). Homonyms in turn occur whenever a page title is ambiguous, and may refer to many different subjects depending on context. This problem is addressed by so-called disambiguation pages that briefly list the different possible meanings of a title. Actual pages about a single sense then either use a unique synonym or are augmented with parentheses to distinguish them, e.g. in the case of [[1984 (book)]]. A final formatting feature of significance to the structure of the wiki is MediaWiki's template system. The wiki parser replaces templates with the text given on the template's own page. The template text in turn may contain parameters. This can be used to achieve a higher consistency, since, for example, a table is then defined only once, and so all pages using this table will look similar. The idea of capturing semantic data in templates has been explored inside Wikipedia2 and in external projects such as DBpedia (Auer and Lehmann, 2007). 
In addition to the above, MediaWiki knows many ways of structuring the textual content of pages themselves, e.g. by sections or tables, presentation markup (e.g. text size or font weights), etc. SMW, however, aims at collecting information about the (abstract) concept represented by a page, not about the associated text. The layout and structure of article texts are not used for collecting semantic annotations, since they should follow didactic considerations.</p><p>10.1.2 Semantic annotations in SMW</p><p>Adhering to MediaWiki’s basic principles, semantic data in SMW is also structured by pages, such that all semantic content explicitly belongs to a page. Using the terms from Section 2.3, every page corresponds to an ontology entity (including classes and properties). This locality is crucial for maintenance: if knowledge is reused in many places, users must still be able to understand where the information originated. Different namespaces are used to distinguish the different kinds of ontology entities: they can be individuals (the majority of the pages, describing elements of the domain of interest), classes (represented by categories in MediaWiki, used to classify individuals and also to create subcategories), properties (relationships between two individuals or an individual and a data value), and types (used to distinguish different kinds of properties). Categories have been available in MediaWiki since 2004, whereas properties and types were introduced by SMW.</p><p>2 See, e.g., http://de.wikipedia.org/wiki/Hilfe:Personendaten.</p><p>'''London''' is the capital city of [[England]] and of the [[United Kingdom]]. As of [[2005]], the population of London was estimated 7,421,328. Greater London covers an area of 609 square miles. [[Category:City]]</p><p>'''London''' is the capital city of [[capital of::England]] and of the [[capital of::United Kingdom]]. As of [[2005]], the population of London was estimated [[population::7,421,328]]. Greater London covers an area of [[area::609 square miles]]. [[Category:City]]</p><p>Figure 10.2: Source of a page about London in MediaWiki (top) and in SMW (bottom).</p><p>SMW collects semantic data by letting users add annotations to the wiki source text of pages via a special markup. The processing of this markup is performed by the components for parsing and rendering in Figure 10.1. While the annotation syntax is most relevant (and most visible) to wiki editors, it is but a small part of the overall SMW system. The underlying conceptual framework, based on properties and types, is rather more relevant. We also expect the annotation syntax to become more hidden, as exemplified by a number of extensions on top of SMW.</p><p>Properties in SMW are used to express binary relationships between one individual (as represented by a wiki page) and some other individual or data value. Each wiki community is interested in different relationships depending on its topic area, and therefore SMW lets wiki users control the set of available properties. SMW’s property mechanism follows standard Semantic Web formalisms where binary properties also are a central expressive mechanism. But unlike RDF-based languages, SMW does not view property statements (subject-predicate-object triples) as primary information units. SMW rather adopts a page-centric perspective where properties are a means of augmenting a page’s contents in a structured way.</p><p>MediaWiki offers no general mechanism for assigning property values to pages, and a surprising amount of additional data becomes available by making binary relationships in existing wikis explicit. The most obvious kind of binary relations in current wikis are hyperlinks. Each link establishes some relationship between two pages, without specifying what kind of relationship this is, or whether it is significant for a given purpose. SMW allows links to be characterized by properties, such that the link’s target becomes the value of a user-provided property. But not all properties take other wiki pages as values: numeric quantities, calendar dates, or geographic coordinates are examples of other available types of properties.</p><p>Figure 10.3: A semantic view of London.</p><p>For example, consider the wikitext shown in Figure 10.2 (top). The markup elements are easy to read: triple quotes '''. . . ''' are used for text that should appear bold-faced, and text within square brackets [[. . . ]] is transformed into links to the wiki page of that name. The given links to [[England]], [[United Kingdom]], and [[2005]] do not carry any machine-understandable semantics yet. To state that London is the capital of England, one just extends the link to [[England]] by writing [[capital of::England]]. This asserts that London has a property called capital of with the value England. This is even possible if the property capital of has not been introduced to the wiki before. Figure 10.2 (top) shows further interesting data values that do not correspond to hyperlinks, e.g. the given population number. A syntax for annotating such values is not as straightforward as for hyperlinks, but we eventually decided to use the same markup in both cases. An annotation for the population number therefore could be added by writing [[population::7,421,328]]. In this case, 7,421,328 does not refer to another page, and we do not want our statement to be rendered as a hyperlink.
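The annotation syntax described above can be illustrated with a small Python sketch that extracts property-value pairs from wikitext. It is not SMW’s parser: the regular expression and the function name `extract_annotations` are assumptions made for illustration, and annotations that carry an alternative caption after a `|` are simply skipped.

```python
import re

# [[property::value]] with no nested brackets and no '|' caption.
# Plain links such as [[2005]] and category tags such as [[Category:City]]
# do not contain '::' directly after the first segment and are therefore ignored.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)\]\]")


def extract_annotations(wikitext):
    """Return (property, value) pairs in order of appearance."""
    return [(m.group(1).strip(), m.group(2).strip())
            for m in ANNOTATION.finditer(wikitext)]


if __name__ == "__main__":
    london = (
        "'''London''' is the capital city of [[capital of::England]] and of the "
        "[[capital of::United Kingdom]]. As of [[2005]], the population of London "
        "was estimated [[population::7,421,328]]. Greater London covers an area of "
        "[[area::609 square miles]]. [[Category:City]]"
    )
    for prop, value in extract_annotations(london):
        print(prop, "->", value)
```

The sketch returns every value as a plain string. Whether a value denotes a page, a number, or a quantity with a unit depends on the type declared for the property, which the pattern alone cannot decide.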
To prevent such a value from being rendered as a hyperlink, users must first declare the property population and specify that it is of a numerical type. This mechanism is described below. If a property is not declared yet, then SMW assumes that its values denote wiki pages, such that annotations will become hyperlinks. An annotated version of the wikitext for London is shown in Figure 10.2 (bottom), and the resulting page is displayed in Figure 10.3.</p><p>Properties are introduced to the wiki by just using them on some page, but it is often desirable to specify additional information about properties. SMW supports this by introducing wiki pages for properties. For example, a wiki might contain a page [[Property:Population]] where Property: is the namespace prefix. A property page can contain a textual description of the property that helps users to employ it consistently throughout the wiki, but it can also specify semantic features of a property. One such feature is the aforementioned (data)type of the property. In the case of [[Property:Population]] one would add the annotation [[has type::Number]] to describe that the property expects numerical values. The property has type is a built-in property in SMW with the given special interpretation. It can also be described on its property page, but it cannot be modified or deleted.</p><p>SMW provides a number of datatypes that can be used with properties. Among those are String (character sequences), Date (points in time), Geographic coordinate (locations on earth), and the default type Page that creates links to other pages. Each type provides its own methods to process user input and to display data values. SMW supplies a modular Datatype API, as shown in Figure 10.1, that can also be extended by application-specific datatypes. Just like properties, types also have dedicated pages within the wiki, and every type declaration creates a link to the according page. To some extent, it is also possible to create new customized datatypes by creating new type pages. These pages of course cannot define the whole computational processing of a data value, but they can create parameterized versions of existing types. The main application of this is to endow numerical types with conversion support for specific units of measurement. For example, the property Area in Figure 10.2 (bottom) might use a custom type that supports the conversion between km² and square miles (as can be seen in Figure 10.3). Unit conversion is of great value for consolidating annotations that use different units, which can hardly be avoided in a larger wiki.</p><p>10.1.3 Mapping to OWL</p><p>The formal semantics of annotations in SMW, as well as their mapping for the later export (see Section 10.2.3), is given via a mapping to the OWL ontology language. Most annotations can easily be exported in terms of OWL, using the obvious mapping from wiki pages to OWL entities: normal pages correspond to abstract individuals, properties correspond to OWL properties, categories correspond to OWL classes, and property values can be abstract individuals or typed literals. Most annotations thus are directly mapped to simple OWL statements.</p><p>OWL further distinguishes object properties, datatype properties, and annotation properties. SMW properties may represent any of those depending on their type. Types themselves do not have OWL semantics, but may decide upon the XML Schema type used for literal values of a datatype property. Finally, containment of pages in MediaWiki’s categories is interpreted as class membership in OWL.</p><p>SMW offers a number of built-in properties that may also have a special semantic interpretation. The above property has type, for instance, has no equivalent in OWL and is interpreted as an annotation property. Many properties that provide SMW-specific meta-information (e.g. for unit conversion) are treated similarly. MediaWiki supports the hierarchical organisation of categories, and SMW can be configured to interpret this as an OWL class hierarchy (this may not be desirable for all wikis (Voss, 2006)). Moreover, SMW introduces a special property subproperty of that can be used for property hierarchies. Overall, the schematic information representable in SMW is intentionally shallow, since the wiki is not intended as a general purpose ontology editor that requires users to have specific knowledge about semantic technologies.</p><p>Figure 10.4: Inverse search in SMW, here giving a list of everyone born in London.</p><p>10.2 Exploiting semantics</p><p>However simple the process of semantic annotation may be, the majority of users will neglect it as long as it does not bear immediate benefits. In the following we introduce several features of SMW that show contributors the usefulness of semantic markup.</p><p>10.2.1 Browsing</p><p>As shown in Figure 10.3, the rendered page may include a so-called factbox, which is placed at the bottom of the page to avoid disturbing normal reading. The factbox summarizes the given annotations, provides feedback on possible errors, e.g. if a given data value does not fit a property’s type, and offers links to related functions. Note that most SMW instances do not display the factbox but rather choose to customize the user’s experience by using inline queries to display the semantic data. The links in the factbox can be used to browse the wiki based on its semantic content.
The page title in the factbox heading leads to a semantic browsing interface that shows not only the annotations within the given page, but also all annotations where the given page is used as a value. The magnifier icon behind each value leads to an inverse search for all pages with similar annotations (Figure 10.4). Both of those user interfaces are realized as special pages, architecturally similar to the special page OWL Export in Figure 10.1. In addition, the factbox shows links to property pages, which in turn list all annotations for a given property. All those browsing features are interconnected by appropriate links, so that users can easily navigate within the semantic knowledge base.</p><p>10.2.2 Querying</p><p>SMW includes a query language that allows access to the wiki’s knowledge. The query language can be used in three ways: to directly query the wiki via a special query page, to add the answer to a page by creating an inline query (Figure 10.1), or by using concepts.</p><p>Inline queries enable editors to add dynamically created lists or tables to a page, thus making up-to-date query results available to readers who are not even aware of the semantic capabilities of the underlying system. Figure 10.6 shows a query result as it might appear within an article about Switzerland. Compared to manually edited listings, inline queries are more accurate, easier to create, and easier to maintain.</p><p>Concepts are the intensional counterparts to MediaWiki’s extensional categories. They occupy a new namespace (Concept:) and allow a query to be defined there that describes the class. Individual pages cannot be tagged explicitly with a concept; instead an individual instantiates a concept implicitly by fulfilling the query description. This allows the definition of concepts such as ISWC Conference as [[Category:Conference]] [[series::ISWC]]. All conferences that are properly annotated will then automatically be recognized as ISWC conferences. Concepts can be used in queries just as normal categories, and allow a higher abstraction than categories do.</p><p>The syntax of SMW’s query language is closely related to wiki text, whereas its semantics corresponds to specific class expressions in OWL (see footnote 3). Each query is a disjunction of conjunctions of conditions. Fundamental conditions are encoded as query atoms whose syntax is similar to that of SMW’s annotations. For instance, [[located in::England]] is the atomic query for all pages with this annotation. Queries with other types of properties and category memberships are constructed following the same principle. Instead of single fixed values one can also specify ranges of values, and even specify nested query expressions.</p><p>3 SMW’s query language has never been officially named, but some refer to it as AskQL (Ell, 2009).</p><p>A simplified form of SMW’s query language is defined in Figure 10.5 (top). The main control symbols used to structure queries are: OR and || as the disjunction operators (depending on the context), <q> and </q> as (sub)query delimiters, + as the empty condition that matches everything, and <, >, and ! to express the comparison operators ≤, ≥, and ≠ (note that the given grammar assumes a way of expressing these comparisons on literal values directly in OWL, whereas they actually need to be defined using appropriate concrete domain definitions). Some nonterminals in Figure 10.5 are not defined in the grammar: TITLE is for page titles, FULLTITLE is for page titles with namespace prefix, and STR is for Unicode strings. In those, we do not permit symbols that could be confused with other parts of the query, e.g. page titles must not start with <. SMW provides means to escape such characters.</p><p>
QUERY ::= CONJ (’OR’ CONJ)*
CONJ ::= ATOM (ATOM)*
ATOM ::= SUB | PROP | CAT | PAGE
SUB ::= ’<q>’ QUERY ’</q>’
PROP ::= ’[[’ TITLE ’::’ VALUE (’||’ VALUE)* ’]]’
VALUE ::= ’+’ | SUB | ((’>’|’<’|’!’)? STR )
CAT ::= ’[[Category:’ TITLE (’||’ TITLE)* ’]]’
PAGE ::= ’[[:’ FULLTITLE (’||’ FULLTITLE)* ’]]’
</p><p>
QUERY ::= ’UnionOf(’ CONJ (CONJ)* ’)’
CONJ ::= ’IntersectionOf(’ ATOM (ATOM)* ’)’
ATOM ::= SUB | PROP | CAT | PAGE
SUB ::= QUERY
PROP ::= ’SomeValuesFrom(’ TITLE ’ UnionOf(’ VALUE (VALUE)* ’))’
VALUE ::= ’owl:Thing’ | SUB | (>=|<=|!=)? STR ’)’
CAT ::= ’UnionOf(’ TITLE (TITLE)* ’)’
PAGE ::= ’OneOf(’ FULLTITLE (FULLTITLE)* ’)’
</p><p>Figure 10.5: Production rules for SMW queries (top) and according OWL descriptions (bottom).</p><p>The following is an example query for all cities that are located in an EU country or that have more than 500,000 inhabitants:</p><p>[[Category:City]] <q> [[located in::<q>[[Category:Country]] [[member of::EU]]</q>]] || [[population:: >500,000]] </q></p><p>The formal semantics of such queries is given by a mapping to class descriptions in OWL, i.e. a query retrieves all inferred members of the according OWL class. It is not hard to see that every SMW query emerges from a unique sequence of production steps, and we can follow the same steps in the grammar in the lower part of Figure 10.5. The result is a complex class description. The given example is translated to:</p><p>
1 UnionOf(
2  IntersectionOf(
3   UnionOf(City)
4   UnionOf(
5    SomeValuesFrom(located_in
6     UnionOf(
7      IntersectionOf(
8       UnionOf(Country)
9       SomeValuesFrom(member_of OneOf(EU)))))
10    SomeValuesFrom(population UnionOf(>=500,000)))))
</p><p>The result is a syntactic form that is close to the functional style syntax used in this thesis, but which is not fully correct OWL yet. It is not hard to obtain a correct OWL class description from this expression. First, the identifiers used here correspond to names of wiki entities; for a formal interpretation, they must be replaced by proper URIs. Second, the UnionOf descriptions in lines 1, 3, 6, 8, and 10 contain only a single class description and can easily be removed, since they are not useful (and not allowed in OWL) in such cases. Third, the pseudo expression UnionOf(>=500,000) must be transformed into a valid OWL data range. This can be done by using OWL’s DatatypeRestriction feature. In the case of object properties (SMW properties of type Page), OneOf can be used instead. Moreover, the exact shape of the relevant description also depends on the datatype used for a given data property, since restrictions like “greater or equal” are realized by suitable facets of the according XML Schema datatype (Motik et al., 2009b). We omit this easy but tedious description of the formal details here.</p><p>Just like OWL, SMW’s query language does not support explicit variables, which essentially disallows cross-references between parts of the query. This ensures that all queries are tree-like. For instance, it is not possible to ask for all people who died in the city they were born in. This restriction makes query answering tractable, which is essential for SMW’s usage in large wikis. In contrast, when variables are allowed, querying is at least NP-hard, and it becomes harder still even for tractable fragments of OWL 2 (Krötzsch et al., 2007a).</p><p>SMW queries as introduced above merely define a result set of pages. In order to retrieve more information about those results, SMW allows so-called print requests as parts of queries. For instance, adding ?has capital as a query parameter will cause all values of the property has capital to be displayed for each result. Figure 10.6 shows a typical output for a query with multiple print requests. Using further parameters in query invocation, result formatting can be controlled to a large degree. In addition to tabular output, SMW also supports various types of lists and enumerations, interactive timelines using SIMILE’s timeline code (see footnote 4), and many further custom formats.</p><p>Figure 10.6: A semantic query for all cantons of Switzerland, together with their capital, population, and languages. The data stems from an automatically annotated version of Wikipedia.</p><p>10.2.3 Giving back to the Web</p><p>The Semantic Web is all about exchanging and reusing knowledge, facilitated by standard formats that enable the interchange of structural information between producers and consumers. Section 10.1.3 explained how SMW’s content is grounded in OWL, and this data can also be retrieved via SMW’s Web interface as an OWL export. As shown in Figure 10.1, this service is implemented as a special page. It can be queried for information about certain elements. The link RDF feed within each factbox also leads to this service (see Figure 10.3).</p><p>Exported data is provided in OWL/RDF encoding, using appropriate URIs as identifiers to prevent confusion with URLs of the wiki’s HTML documents. The semantic data is not meant to describe the HTML document but rather its (intended) subject. The generated OWL/RDF is “browseable” in the sense that URIs can be used to locate further resources, thus fulfilling the linked data principles (as described in Section 4.1.1). All URIs point to a Web service of the wiki that uses content negotiation to redirect callers either to the OWL export service or to the according wiki page.
Together with the compatibility with both OWL and RDF, this setup enables maximal reuse of SMW’s data. Tools such as Tabulator (Berners-Lee et al., 2006a) that incrementally retrieve RDF resources during browsing can easily retrieve additional semantic data on user request. SMW furthermore provides scripts for generating the complete export of all data within the wiki, which is useful for tools that are not tailored towards online operation, such as the faceted browser Longwell (see footnote 5). Sample files of such an export are found at http://semanticweb.org/RDF/.</p><p>SMW makes sure to generate valid URIs for all entities within the wiki. It does not burden the user with following the rules and guidelines for “cool URIs” (Sauermann and Cyganiak, 2008), but generates them automatically from the article name. Users can at any time introduce new individuals, properties, or classes. Because of that it does not make sense to use a hash namespace (see Section 4.1.2), as the returned file would be an ever-growing and ever-changing list of entity names. Instead a slash namespace is used, so that SMW can basically use the local name as a parameter in creating the required export of data.</p><p>4 http://simile.mit.edu/timeline/</p><p>5 http://simile.mit.edu/wiki/Longwell</p><p>10.3 Related systems</p><p>Before SMW, some other semantic wikis have been created, but most of them have been discontinued by now (Campanini et al., 2004; Souzis, 2005; Nixon and Simperl, 2006). Many of the early semantic wikis emphasized the semantic side and disregarded some of the strengths of wikis, such as their usability and low learning curve. One of the major design paradigms of SMW has been that SMW is a wiki first, and a semantic system second.</p><p>The most notable (and stable) related system currently is KiWi (Schaffert et al., 2009), previously known as IkeWiki (Schaffert, 2006). KiWi is similar to SMW with respect to the supported kinds of easy-to-use inline wiki annotations, and various search and export functions. In contrast to SMW, KiWi introduces the concept of ontologies into the wiki, which emphasizes use cases of collaborative ontology editing that are not the main focus of SMW. KiWi uses URIs explicitly to identify concepts, but provides interfaces for simplifying annotation, e.g. by suggesting properties.</p><p>Besides text-centered semantic wikis, various collaborative database systems have appeared recently. Examples of such systems include OntoWiki (Auer et al., 2006), OpenRecord (see footnote 6), freebase (see footnote 7), and (to some extent) OmegaWiki (see footnote 8). Such systems typically use form-based editing and are used to maintain data records, whereas SMW concentrates on text, the main type of content in wikis. OntoWiki draws from concepts of semantic technologies and provides a built-in faceted (RDF) browser. The other systems have their background in relational databases.</p><p>6 http://www.openrecord.org</p><p>7 http://www.freebase.com</p><p>8 http://www.omegawiki.org</p><p>There are two extensions to SMW that help with making SMW more similar to such a form-based editing system: Semantic Forms, developed by Yaron Koren (see footnote 9), and Halo, developed by ontoprise (see footnote 10). SMW has become the base for a number of further research works in the area of semantic wikis. Rahhal et al. (2009) describe a Peer2Peer extension of SMW that allows the distributed editing of the semantic wiki. Bao et al. (2009a) describe the usage of SMW as a lightweight application model, implementing two applications on top of it. The MOCA extension (Kousetti et al., 2008) to SMW fosters the convergence of the emerging vocabulary within an SMW instance.
A convergent vocabulary is not only a requirement for the proper usage of a semantic wiki, but also for the automatic content checks described in Section 10.4.</p><p>10.4 Collaborative ontology evaluation</p><p>Semantic wikis have been shown to be feasible systems for enabling communities to collaboratively create semantically rich content. They enable users to make the knowledge within the wiki explicit. This also allows the wiki to automatically check and evaluate the content. In this section we present a number of approaches that provide facilities to ensure the content quality of a wiki, including the application of constraint semantics and autoepistemic operators in ways that are easily accessible for the end user.</p><p>Wikis such as Wikipedia do not work solely because of their underlying software, but due to their rather complex socio-technical dynamics, which rest on often implicit community processes and rules (Ayers et al., 2008). We first introduce a number of technical implementations of some exemplary evaluations that can be performed automatically on top of knowledge formalized within an SMW (Vrandečić, 2009c): concept cardinality in Section 10.4.1, class disjointness in Section 10.4.2, and property cardinality constraints in Section 10.4.3. We close this chapter with a discussion of the social aspects of collaborative ontology evaluation in Section 10.4.4.</p><p>10.4.1 Concept cardinality</p><p>Concept cardinality states how many results a query within the wiki should have. Besides exact numbers, minimal and maximal cardinalities are also allowed. For the description of the implementation we assume that the query is captured by a concept. Then Template:Cardinality can be added to the concept page.</p><p>{{#ifeq:{{#ask:[[Concept:{{PAGENAME}}]]|format=count}} |{{{1}}}|OK|Not OK}}</p><p>9 http://www.mediawiki.org/wiki/Extension:Semantic_Forms</p><p>10 http://smwforum.ontoprise.com</p><p>The template assumes one parameter, the cardinality. Thus the template can be instantiated on a concept page such as US states as follows:</p><p>{{Cardinality|50}}</p><p>The format count simply returns the number of results. The MediaWiki parser function #ifeq checks if the first parameter, i.e. the query result, is equal to the second parameter, i.e. the first parameter of the template call (in our example 50). If they are equal, the third parameter will be returned (OK), otherwise the fourth parameter will be printed (Not OK). In a production setting the resulting text should be more refined to make sure that the user understands the raised issue. In case we do not want to check the exact cardinality, but rather a maximal or minimal cardinality, we can use the MediaWiki parser function #ifexpr, which checks if an expression is true or not (see footnote 11).</p><p>11 http://www.mediawiki.org/wiki/Help:Extension:ParserFunctions#.23ifexpr</p><p>10.4.2 Class disjointness</p><p>Class disjointness states that a page should not be categorized by the two specified categories at the same time. For this case we suggest Template:Disjoint with the following definition:</p><p>Testing {{{1}}} and {{{2}}} - {{#ask: [[Category:{{{1}}}]] [[Category:{{{2}}}]] |default=OK|Intro=Inconsistent individuals: }}</p><p>The triple-brackets are replaced with the first and second parameter of the template call, respectively. The inline query will return OK if no results exist, or the list of individuals prepended with the Intro. This template can be called on any wiki page like this:</p><p>{{Disjoint|Man|Woman}}</p><p>The template call will result in a little report that could look like this:</p><p>Testing Man and Woman - Inconsistent individuals: [[Hermes]], and [[Aphrodite]]</p><p>Since the inconsistent individuals are already linked, the user can quickly navigate to their pages and check them. Note that the template is necessary neither in this case nor in the previous one. We could simply add the query directly on any wiki page. The template is merely used to allow other users to easily add tests to pages without having to understand how the SMW query language works.</p><p>10.4.3 Property cardinality constraints</p><p>Property cardinalities are statements about how often a property should be used on a specific page or point to a specific page. We can start with a similar approach as for concept cardinalities, introducing Template:Property cardinality:</p><p>{{#ifexpr:{{#ask:[[{{{Property}}}::{{{Value}}}]]|format=count}} <= {{{Maximal cardinality}}}|OK|Not OK}}</p><p>We could now call it for any page to check if that page’s individual fulfills the constraint or not. The following template call checks if the individual Seth has indeed a maximum of one father.</p><p>{{Property cardinality |Property=Father |Maximal cardinality=1 |Value=Seth }}</p><p>We would prefer not to call this manually for every individual, but rather for all individuals of a category at once. To achieve this we can use the query format “template”. It applies the given template to every query result. We first define the formatting template Template:Property cardinality format:</p><p>{{{1}}} ({{Property cardinality |Property=Father |Maximal cardinality=1 |Value={{{1}}} }})</p><p>The parameter {{{1}}} means the first parameter, in this case each result of the query. Now we can use this template to format all results of a query, whereby it evaluates the given cardinality restrictions.
The query could look like this:</p><p>{{#ask:[[Category:Person]] |link=none |format=template |template=Property cardinality format |sep=, }}</p><p>This query will return a comma-separated list of all persons, followed by the evaluation result for each person in brackets.</p><p>Note that the expressivity of SMW itself is more restricted than OWL, but exports from SMW have been used in combination with more expressive background ontologies and then evaluated by external OWL inference engines (Vrandečić and Krötzsch, 2006), as suggested in Section 9.2.1.</p><p>It has to be noted that the above checks in SMW do not follow the normal OWL semantics. Concept and property cardinality do not relate to nominals and OWL property cardinalities, but rather follow the semantics of autoepistemic operators as discussed in Section 9.2.3. Within the wiki, we assume a closed world and unique names if not otherwise stated. The deviation from OWL semantics has to be carefully considered when exporting from the wiki.</p><p>Not all possible evaluations can be expressed within the wiki. For example, domain and range constraints in the sense of Section 9.2.4 obviously cannot be implemented within SMW (see footnote 12).</p><p>12 We currently assume that SMW together with parser functions is Turing-complete. If this is the case, any evaluation that can be done automatically at all could be expressed within SMW. But it can turn out that an actual implementation is too hard to be feasible. This is what we mean with “obviously implementable”.</p><p>10.4.4 Social and usability aspects</p><p>An evaluation framework such as the one available in SMW allows the organic growth and addition of further content evaluations as deemed necessary by the wiki community. We expect that some wikis will introduce specific evaluation approaches that only apply to the domain at hand. Applying the above constraints in the wiki must be simple enough to allow contributors to actually use these features. The selection is based on the fact that they can be represented within the wiki simply and unambiguously, i.e. contributors will always know where to look for a specific piece of information. This is a necessary requirement in order to keep a wiki maintainable. The same does not hold for the actual evaluations themselves. They can be put on any given wiki page. In particular, users can create their own private pages where they define their own evaluation criteria and run their own tests. Or the evaluations can be collaboratively run and maintained on a more central page. This way each member of the wiki community can decide on their own how and what part of the ontology to monitor.</p><p>A major decision embedded in the given ontology evaluation framework is indeed to allow contributors to introduce inconsistencies. When a page is modified, the wiki could check if that edit would turn the wiki inconsistent and then cancel the edit. But regardless of whether a real-time check would even be computationally feasible, this seems to conflict heavily with the wiki paradigm. Instead of prohibiting edits that lead to inconsistencies, we introduce means to report discovered problems and allow contributors to repair them efficiently by linking to the respective pages.</p><p>With all the evaluations in the wiki being implementable ad hoc by any contributor, without the requirement of programming PHP and having access to the server and the underlying code, we expect numerous evaluations to bloom. The price of this freedom, though, is a high computational tax. Executing multiple interwoven layers of template calls and inline queries by the MediaWiki parser is computationally expensive.
We expect that some template combinations will turn out to be both useful and expensive, in which case new extensions can be written natively in PHP to replace these template calls with calls to new parser functions. This has happened before in Wikipedia with several features (Schindler and Vrandečić, 2010). We regard this as a prime example of the Web Science process (Berners-Lee et al., 2006b), where technological and social approaches interplay heavily with each other in order to come up with a Web-based solution to a problem.</p><p>Chapter 11 Related work</p><p>Wanting connections, we found connections – always, everywhere, and between everything. The world exploded in a whirling network of kinships, where everything pointed to everything else, everything explained everything else . . .</p><p>(Umberto Eco, b. 1932, Foucault’s Pendulum (Eco, 1988))</p><p>This thesis presents a framework for ontology evaluation. In this chapter we discuss other frameworks with a similar intention, and relevant aspects in the creation of the underlying framework, in Section 11.1. We discuss further methods and approaches of ontology evaluation in Section 11.2 and show how they fit into our framework. We expect that future work – especially newly developed evaluation methods, further implementations of existing methods, and evaluations regarding existing corpora – can be similarly integrated into our framework. Section 11.3 then closes this chapter by discussing the corpora used for ontology evaluation within this thesis.</p><p>Note that the relation to our own previously published work is given in Section 1.4.</p><p>11.1 Frameworks and aspects</p><p>There are already a number of frameworks for ontology evaluation. The main goal of the framework in this thesis is the assessment of the qualities of one given ontology. The other frameworks often have slightly different, although related goals. 
The most common such goals are ontology ranking and ontology selection. Ontology ranking has the goal of sorting a given set of ontologies based on some criteria. Often these criteria can be parameterized with a context, usually a search term. Ontology search engines such as FalconS (Cheng et al., 2008), Watson (d’Aquin et al., 2007b), SWSE (Harth et al., 2009), or Swoogle (Ding et al., 2004) have to perform the task of ontology ranking when deciding which ontologies to display, and in what order, to the user who searched for ontologies. Other ontology ranking frameworks are (Alani and Brewster, 2006) and (Tartir et al., 2005). Ontology selection can be regarded as a specialization of ontology ranking as it selects only a single result, often for reuse in an ontology engineering task. The Cupboard system (d’Aquin and Lewen, 2009) presents a similar, though more fine-grained approach that allows the selection not of whole ontologies but of single axioms. It incorporates a topic-specific open rating system (Lewen et al., 2006) for assessing the ontologies and then provides further algorithms to present the individual axioms, but the final decision lies with the ontology engineer who selects the axioms to reuse. The framework presented here differs from ranking and selection frameworks as it does not regard and sort sets of given ontologies, but only assesses individual ontologies by themselves. Many of the methods presented here can also be used in a ranking or selection framework, but some have to be adapted: a number of the methods do not yield a numerical score but rather a list of problematic parts of an ontology (such as Method 1 on page 67 or Method 11 on page 86); others may yield a number, but this number is not simply proportional to ontology quality and may have a complex relation to it (if at all – compare the metrics introduced in Method 12 on page 101). 
Some ontology evaluation frameworks are based on defining several criteria or attributes; for each criterion, the ontology is evaluated and given a numerical score. Additionally, a weight is assigned (in advance) to each criterion, and an overall score for the ontology is then computed as a weighted sum of its per-criterion scores. (Burton-Jones et al., 2005) proposes an approach of this type, with ten simple evaluation methods (called attributes) such as lawfulness, interpretability, comprehensiveness, etc. The methods are grouped in so-called metric suites, comparable to ontology aspects in our framework. The four suites are syntax, semantics, pragmatics, and social, but even though the names are partially equal, the metric suites do not correspond to the individual aspects in our framework, i.e. they are differently defined. (Burton-Jones et al., 2005) further assumes that every method is a metric M : O → [0, 1]. Each metric suite is a weighted average of its attributes, and the overall ontology quality is a weighted average over the results of the metric suites, thus achieving a simple, overall quality measure for ontologies between 0 and 1, with 1 denoting the perfect ontology.</p><p>(Fox and Gruninger, 1998) proposes an alternative set of criteria for ontology evaluation: functional completeness, generality, efficiency, perspicuity, precision granularity, and minimality. These criteria can be mapped to our catalog: functional completeness is within completeness, generality mostly in adaptability, efficiency in computational efficiency, perspicuity within clarity, and minimality in conciseness. Precision granularity asks: “Is there a core set of ontological primitives that are partitionable, or do they overlap in meaning? Does the representation support reasoning at various levels of abstraction and detail?” The further description of this criterion indicates that this is overlapping with our criteria of accuracy, clarity, and completeness. The set is fairly compatible with our set, the main difference being that the criteria in (Fox and Gruninger, 1998) are not defined as desiderata used to group methods, but as fully reachable goals with a metric to measure them (and thus would rather be evaluation methods than criteria in our framework, really).</p><p>(Lozano-Tello and Gómez-Pérez, 2004) defines an even more detailed set of 117 criteria, organized in a three-level framework. The criteria cover various aspects of the formal language used to describe the ontology, the methodology used, the contents of the ontology, the tools available, and the costs (hardware, software, licensing, etc.) of using the ontology.</p><p>Our methodology for selecting the ontology evaluation criteria is presented in Section 3.6. For the selection of criteria for the ontology evaluation framework presented in this thesis we have focused on ontology evaluation literature. There are a number of related research fields, such as information and software quality, software engineering (especially evaluation of software architectures and software models), and database engineering (especially the field of database schemas and data modeling).</p><p>In systems engineering, quality attributes as non-functional requirements are sometimes called ilities due to the suffix many of the words share (Voas, 2004). In data and information quality research many of these ilities are regarded as quality attributes: accessibility, accountability, accuracy, auditability, availability, credibility, compatibility, effectiveness, extensibility, etc. Data are defined as having a high quality “if they are fit for their intended uses in operations, decision making and planning” (Juran and Godfrey, 1999). A standard for data quality is currently being developed (ISO 8000, 2009), covering criteria such as conformance to a specification (ISO 8000-110, 2009), vocabulary (ISO 8000-102, 2009), provenance (ISO 8000-120, 2009), accuracy (ISO 8000-130, 2009), and completeness (ISO 8000-140, 2009).</p><p>We regard the criteria catalog in Section 3.6 not as a fixed, unchanging list, but as the current state of the art in ontology evaluation research based on our literature survey. The criteria are used to categorize ontology evaluation methods and thus help users to find methods relevant for their tasks quickly. The list may be extended or changed and may potentially include further relevant criteria originating in related fields of study or in further work in ontology evaluation.</p><p>As noted in Section 3.4, this thesis mostly covers the field of ontology verification (as opposed to ontology validation). A complementing work covering the area of ontology validation is provided in (Obrst et al., 2007). It provides a concise overview of many evaluation methods and techniques not discussed within this thesis. They are:1</p><p>• the evaluation of the use of an ontology in an application (see Section 11.2.2).</p><p>• the comparison against a source of domain data (see Section 11.2.3).</p><p>• assessment by humans against a set of criteria. Human experts are used to gauge an ontology against a set of principles “derived largely from common sense”.</p><p>• natural language evaluation techniques. 
This evaluates the ontology within a natural language processing application such as information extraction, question answering or abstracting.</p><p>• using reality as a benchmark. Here the notion of a “portion of reality” (POR) is introduced, to which the ontology elements are compared.</p><p>It is not claimed that the list of techniques is a complete list. We further point out that the list is not even a list of techniques: one of the points describes a dimension of evaluation methods (whether the evaluation is performed automatically, semi-automatically, or manually, i.e. by humans), one is a specialization of another (natural language evaluation techniques specialize the application-based evaluation), and the last one describes a methodological condition for ontology evaluation techniques (using reality as a benchmark). As shown in Section 3.5 we do not commit to a strong theory of reality, i.e. an accessible objective ideal against which an ontology can be compared. We furthermore do not think that ontologies should be restricted to specifying conceptualizations of domains within reality, but should also be allowed to specify conceptualizations of fictional domains, such as the family tree of the mythological Greek gods, the history of Tolkien’s Middle-earth, or scientific theories about the luminiferous aether. Whereas we disagree with the framework described in (Obrst et al., 2007), we agree that many of the listed methods and techniques are important for a comprehensive ontology evaluation, especially for ontology validation. Ontology validation is usually the only way to ensure the correctness of the knowledge encoded in the ontology. But most validation approaches require the close cooperation of domain and ontology engineering experts. Validation often cannot be performed automatically. 
Since this thesis focuses on automatic evaluation approaches, we leave it to (Obrst et al., 2007) to provide an overview of validation approaches.</p><p>1 There is also a section on ontology accreditation, certification, and maturity model, but it is made clear that this is a discussion about the future of ontology evaluation and not describing a technique per se</p><p>188 a ubro datgs ecnie hsa o etitv.I u framework, our In this restrictive. Although too as example). ontology. an this measured for consider the we about advantages, ontology descriptions of an number in a metadata has as result its every Furthermore, profile ontology framework. our in in And case explicitly. the structurally not is is ontology to which an corresponds meta-ontology, framework the our in In of described set arcs). and a nodes as (i.e. defined well metrics as proposed graphs the are regarding ambiguities of O number a (Gangemi to in leads This ontology. the term the oprdto compared ) 37 page on 3.1 Section in given framework the our of of overview an gives 11.1 Figure (Gangemi in presented from notions main the of diagram UML 11.1: Figure conceptualization constituted by aninformationobject We consideran ontology to bea 2.1 O AND BACKGROUND: 2. THEORETICAL between parameters, and the more detail, the items that are crucial for evalua on ontology descriptions which ontology elements, processesanda We modelontology evaluation asa and selection 2.2 our current and future work. In Section 5, we draw some conc (cf. [14]). rational agent as anexplanation, an instruction, a command, etc pattern, whereas thatrepresentation isinterpretable by some equivalent pattern thatto any The basicintuition behind this pr analytic case for a trade-off. principles withconflictingpara 2 u rmwr ssrnl nune ythe by influenced strongly is framework our 3, Chapter in discussed As an</p><p> oQual Fig. 
1AUML classdiagramdepictingthemain notionsfrom oQual 2 : asemioticmeta-ontology ontology : anontologyofon ontology oQual</p><p> establishedwithin a</p><p> may tal. et ontology involves: model nooydescription ontology san is 3.1 Section (see required not is this but ontologies, as expressed be trade-offs ic in Since 6.1.2. Section in depth in discussed exemplarily 2005), , hc ntr sa nooy ..a nooymaueexpresses measure ontology an i.e. ontology, an is turn in which axioms ting andselecting ontologies.In xlctybtrte sue htthe that assumes rather but explicitly make explicitthose knowledge lusions andsketch a picture of meters. Wealsoprovidean is usedtorepresent another semiotic object, i.e. anobject oposal isthat information is ttributes. This taskisbased nooygraph ontology norframework. our in tal. et needed whencomposing tology evaluation communication setting diagnostic task and anintended n h osbeaim n hi osiunsaefully are constituents their and axioms possible the and , Gangemi 2005; , over O O 2 2 . </p><p> nooy eew pcf h andifferences main the specify we Here ontology. tas osssof consists also it , in nooydescription ontology O O O 2 2 turn, formalizes the following specification: In agent) of conceptualizations, i.e. internal representations (by a rational are graph-like structures; c)they a) anontology isinformationof aspecialkind; b) itspatterns tasks tasks of ontology descriptiontype • new pattern called pattern called This intuition is formalized by applying anontology design ontology. ontology. good profiletypically enhances or annotations, suchasprovenanc possible quality criteria and values, as well as its lifecycle graph,properties ofanontology itsresulting attributes, its ontology, e.g. a method tomeasurethestructural orfunctional containing metadata conceptualization. Anagent can also provide a profile graph, while internally representing itsintended ‘kept together’ arationalby ag conceptualization. 
Thegraphandtheconceptualization are conceptualization andaformal semantic space admitted by the the requirement tobeable parts. For ontology, and have elementary qoods(called requirement for the ontology to be question. InFig.2,the : ontologygraphs,profiles,de 2 osntinclude not does Quality-Oriented OntologyDescriptions ( tal. et orsod to corresponds O of, respectively, the elements andprocesses 2</p><p> entity types</p><p>(Fig.1) anontology graphhas anintended example, ofqood is atype Gangemi 2006a; , O Information 2 11Faeok n aspects and Frameworks 11.1 O fo Gangemi (from O . 2 (because it is a “meta-ontology”). O (becauseitisa“meta-ontology”). 2 that express a“description” ofthe nooyelements ontology . Description ↔Description nooygraph ontology eatcspace Semantic retrieve (Fig.1), that provide the nooyevaluation ontology vlainmethods evaluation scriptions, measures, etc. </p><p> e and informal annotations. Aand informalannotations. e ent whoencodes/interprets the O answer acertain competency enforcesthe usability ofan able to retrieve the“family sepesdb an by expressed is 2 type is instantiated as a retrieve : O [3], and itoriginates a 2 tal. et qoods O express , which formalizes tal. et osntuse not does 2 principles ) framework , whicharea </p><p> intended from/on an from/on an 2006b). , roles roles 2005). , equals</p><p> in ) as 2 and , in that 189 O</p><p>2</p><p>11 Chapter 11 Related work</p><p>11.2 Methods and approaches</p><p>Work in ontology evaluation has grown considerably during the last few years, also due to two workshops on ontology evaluation (Vrandeˇci´c et al., 2006a; Vrandeˇci´c et al., 2007a). Even though we already included numerous methods within our framework in Chapters4–9, the list cannot be exhaustive. We hope that future work will regard this framework as an orientation and localize new methods within the framework. 
In the following, we regard a number of methods that have not been mentioned explicitly in this thesis and are here placed within our framework, defining the two dimensions of aspects and criteria for each of the methods. A major line of research that has hardly been discussed in this thesis is the logical consistency or satisfiability of an ontology and the debugging of inconsistencies (Parsia et al., 2005). Other papers extend the notion of satisfiability. For example, (Gómez-Pérez, 2004) defines that class hierarchy cycles or partition problems also mean inconsistent ontologies. Such issues are often addressed by specific ontology repair tools such as SWOOP2 or ODEval.3 Inconsistency explanation (Ji et al., 2009) deals with the issue of creating user-understandable explanations of inconsistencies so that the user can deal with the given problem. Satisfiability is an issue of the representation aspect, and is covered by the criteria consistency and computational efficiency. Integrity constraints are extended on top of Section 6.3.4 in (Briggs, 2008; Sirin and Tao, 2009) and can also be regarded as extending the definition of satisfiable ontologies. They often belong to the context aspect since they add the integrity constraints as an external artifact, and further the accuracy and consistency criteria. Some newer ontology engineering methodologies such as DILIGENT (Tempich et al., 2005) and HCOME (Kotis et al., 2005) take into account the fundamental role of discourse and agreement between different stakeholders. They identify the sharedness of the ontology as an explicit value and provide processes to guarantee that sharedness is achieved. Even though we do not regard sharedness as an ontology quality criterion (Section 3.6) of its own, it is an integral part of our conceptualization (Section 3.5). Sharedness regards the aspect of semantics and touches a number of criteria: accuracy, clarity, and organizational fitness. 
The larger a group that commits to an ontology (and the shared conceptualization it purports), the harder it is to reach a consensus – but also the larger the potential benefit. Thus the status of the ontology with regards to relevant standardization bodies in the given domain is a major criterion when evaluating an ontology. Ontologies may be standardized or certified by a number of bodies such as W3C, OASIS, IETF, and other organizations that may standardize ontologies in their area of expertise (Obrst et al., 2007) – or just a group of peers that declare an ontology to be a standard (see the history of RSS (Beged-Dov et al., 2000) or FOAF (Brickley and Miller, 2005) for examples). In certain cases, adoption by relevant peers (especially business partners) may be even more important than standardization. Tools like Swoogle (Ding et al., 2004) allow one to check how often an ontology is instantiated, and thus to measure the adoption of the ontology on the Web. The standardization refers to the vocabulary aspect and the criterion of organizational fitness.</p><p>2 http://www.mindswap.org/2004/SWOOP/ 3 http://minsky.dia.fi.upm.es/odeval</p><p>A related question is the grounding of the terms in an ontology (Jakulin and Mladenić, 2005), i.e. how will the terms in the ontology be understood by the users of the ontology (either directly, or via a certain tool). Since there is no way to encode the meaning of a term (besides the very weak understanding of meaning as the model-theoretic formal semantics) we need to make sure that a term such as foaf:Person is well grounded, usually through the documentation and shared understanding. In certain cases, the meaning can be completely grounded in a computer: since computers are able to recognize and handle XML files, for example, the XML file class can be fully “understood” by the computer (Oberle et al., 2006). But for classes relating to the world outside of the computer this often remains a challenge. Evaluating the grounding is an evaluation of the vocabulary aspect addressing the criterion of clarity.</p><p>In (Stojanović, 2004) the theory and practice of ontology evolution is discussed. Ontology change operations and ontology evolution strategies are introduced. Based on this, (Haase and Stojanović, 2005) extends this work for OWL DL ontologies, and investigates the evolution of ontologies with regards to consistency, implemented in the so-called evOWLution framework. As the theoretical work allows generic and user-defined consistency checks, the ideas presented here could be regarded as a number of ways to formalize further aspects of the ontology, and enable more expressive consistency checks beyond simple logical satisfiability. This way we can extend the evolution framework over all aspects of the ontology, clearly improving the adaptability of the ontology.</p><p>The most widely evaluated aspect in current research is the context aspect. Frequently ontologies are paired with a context and then evaluated with regards to this context. In a survey of ontology evaluation methods provided by (Brank et al., 2005), the evaluation methods are classified into four approaches:</p><p>• comparing the ontology to a golden standard,</p><p>• task-based ontology evaluation, i.e. using the ontology with an application and evaluating the application,</p><p>• data-driven ontology evaluation, and</p><p>• evaluations performed by humans against a set of predefined criteria, standards or requirements.</p><p>We think that the last point does not belong to this list as it describes a dimension of the ontology evaluation method independent of the other approaches. 
The other approaches, though, are all part of what the framework in this thesis defines as being an evaluation of the context aspect, with the context being a golden standard, a task or application, or a set of external data respectively. In the following, we will regard evaluation methods that belong to these three categories of context.</p><p>11.2.1 Golden standard – similarity-based approaches</p><p>The evaluation based on a golden standard builds on the idea of using similarity measures to compare an ontology with an existing ontology that serves as a reference. This approach is particularly useful to evaluate automatically learned ontologies with a golden standard. The similarity between ontologies can be calculated using similarity functions (Ehrig et al., 2005). A similarity function for ontologies is a real-valued function sim : O × O → [0, 1] measuring the degree of similarity between two ontologies. Typically, such measures are reflexive and symmetric. The similarity function to compare the ontologies can be used directly as the evaluation function, if we keep one of the arguments – the golden standard GS – fixed. (Ehrig et al., 2005) shows how similarity functions for ontologies based on individual similarity measures for specific aspects and elements of the ontologies can be defined. We present specific similarity measures for some aspects of an ontology in the following. On the vocabulary aspect, the similarity between two strings can be measured by the Levenshtein edit distance (Levenshtein, 1966), normalized to [0, 1]. A string matching measure between two sets of strings is then defined by taking each string of the first set, finding its similarity to the most similar string in the second set, and averaging this over all strings of the first set. In an evaluation setting, the second set is a “golden standard” set of strings that are considered a good representation of the classes of the problem domain under consideration. 
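</p><p>The string matching measure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the cited work: the function names are hypothetical, the edit distance is normalized by the length of the longer string (one common choice; the text only says “normalized to [0, 1]”), and returning 0.0 for an empty label set or an empty golden standard is likewise an assumption.</p>

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def string_similarity(a: str, b: str) -> float:
    """Edit distance mapped to [0, 1]; 1.0 means the strings are identical.

    Assumption: normalization divides by the length of the longer string.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def string_matching(labels, golden_standard) -> float:
    """Average, over all labels, of the similarity to the closest golden standard string."""
    labels = list(labels)
    golden = list(golden_standard)
    if not labels or not golden:
        return 0.0  # assumption: nothing to compare yields the lowest score
    best_matches = (max(string_similarity(label, g) for g in golden) for label in labels)
    return sum(best_matches) / len(labels)


if __name__ == "__main__":
    ontology_labels = ["Person", "Organisation"]
    golden_labels = ["Person", "Organization", "Project"]
    print(string_matching(ontology_labels, golden_labels))
```

<p>Note that the measure is asymmetric: it averages over the labels of the evaluated ontology only, so golden standard strings without a counterpart in the ontology do not lower the score.</p><p>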
The golden standard could be another ontology, as in (Maedche and Staab, 2002), based on a document corpus, or provided by experts. The vocabulary can also be evaluated using precision and recall, as known in information retrieval. In this context, precision is the fraction of the labels that also appear in the golden standard relative to the total number of labels. Recall is the percentage of the golden standard lexical entries that also appear as labels in the ontology, relative to the total number of golden standard lexical entries. A disadvantage of these definitions is that they are strict with respect to spelling (e.g. different use of hyphens in multi-word phrases would not match, etc.).</p><p>(Velardi et al., 2005) describes an approach for the evaluation of an ontology learning system which takes a body of natural-language text and tries to extract from it relevant domain-specific classes (terms and phrases), and then find definitions for them (using Web searches and WordNet entries) and connect some of the classes by subsumptions. Part of their evaluation approach is to generate natural-language glosses for multiple-word terms. The glosses are of the form “x y = a kind of y, definition of y, related to the x, definition of x”, where y is typically a noun and x is a modifier such as an adjective. A gloss such as this would then be shown to human domain experts, who would evaluate it to see if the word sense disambiguation algorithm selected the correct definitions of x and y. An advantage of this kind of approach is that domain experts might be unfamiliar with formal languages in which ontologies are commonly described, and thus it might be easier for them to evaluate the natural-language glosses. Of course, the disadvantage of this approach is that it nevertheless requires a lot of work on part of the domain experts. The precision of these disambiguations defines the ontology evaluation function.</p><p>(Maedche and Staab, 2002) proposes several measures, such as the semantic cotopy of two hierarchies, for comparing the structural aspect of two ontologies. With a golden standard, these measures can be used for ontology evaluation.</p><p>Given a golden standard, evaluation of an ontology on the semantic aspect can also be based on precision and recall measures, just like on the lexical layer. (Spyns, 2005) discusses an approach for automatically extracting a set of lexons, i.e. triples of the form (term1 property term2), from natural-language text. The result can be interpreted as an ontology, with terms corresponding to classes or individuals.</p><p>11.2.2 Task-based evaluations</p><p>The output of an ontology-based application, or its performance on a given task, will be better or worse depending on the utility of the ontology used in it. Often ontologies are tightly interwoven with an application, so that the ontology cannot be simply exchanged. It may drive parts of the user interface, the internal data management, and parts of it may be hard-coded into the application. On the other hand, the user never accesses an ontology directly but always through some application. Often the application needs to be evaluated with the ontology, regarding the ontology as merely another component of the used tool. Such a situation has the advantage that well-known software evaluation methods can be applied, since the system can be regarded as an integrated system where the fact that an ontology is used is of less importance.</p><p>A utility-based evaluation is presented in (Porzel and Malaka, 2004). There a scenario is described where the ontology is primarily used to determine how closely related the meaning of two classes is. The task is a speech recognition problem, where there may be several hypotheses about what a particular word in the sentence really means; a hypothesis should be coherent, which means that the interpretations of individual words should be classes that are relatively closely related to each other. Thus the ontology is used to measure distance between classes and thereby to assess the coherence of hypotheses (and choose the most coherent one). The correctness of the results directly maps to the quality of the ontology with regards to its use in this scenario.</p><p>The evaluation function presented in (Haase and Sure, 2005) captures the intuition that the quality of an ontology built for searching is determined by how efficiently it allows the users to obtain relevant individuals. To measure the efficiency, a cost model is introduced to allow us to quantify the user effort necessary to arrive at the desired information. For the case of navigating a class graph, this cost is determined by the complexity of the hierarchy in terms of its breadth and depth. The breadth here means the number of choices (sibling nodes of the correct class) the user has to consider to decide for the right branch to follow: the broader the hierarchy, the longer it takes to make the correct choice. The depth means how many links the user needs to follow to arrive at the correct class, under which the desired individual is classified: the deeper the hierarchy, the more “clicks” need to be performed. To minimize the cost, both depth and breadth need to be minimized, i.e. the right balance between them needs to be found.</p><p>(Sabou et al., 2007) create custom-tailored ontologies on the fly from the formulation of a task (Alani, 2006) and evaluate them afterwards. 
Several problems are encountered, ranging from broken links to incompatible axioms due to different contexts and points of view.</p><p>Utility-based approaches often have drawbacks:</p><p>• They allow one to argue that the ontology is good or bad when used in a particular way for a particular task, but it is difficult to generalize this observation (what if the ontology is used for a different task, or differently for the same task?)</p><p>• The evaluation may be sensitive in the sense that the ontology could be only a small component of the application and its effect on the outcome may be relatively small (or depend considerably on the behavior of the other components)</p><p>• If evaluating a large number of ontologies, they must be sufficiently compatible that the application can use them all (or the application must be sufficiently flexible)</p><p>11.2.3 Data-driven evaluation – fitting the data set</p><p>An ontology may also be evaluated by comparing it to existing data about the domain to which the ontology refers. This can, e.g., be a collection of text documents. For example, (Jakulin and Mladenić, 2005) proposed the ontology grounding process based on the data representing individuals within the ontology classes. A set of errors in ontology grounding is shown to the user to help in the ontology refinement process. Ontology grounding can be used in the construction of a new ontology or in data-driven ontology evaluation.</p><p>Domain completeness is given when an ontology covers the complete domain of interest. This can only be measured automatically if the complete domain is accessible automatically and can be compared to the ontology. A way to assess the completeness of an ontology with respect to a certain text corpus is to use natural language processing techniques to detect all relevant terms in a corpus (Velardi et al., 2005). The learned terms are then compared to the evaluated ontology to measure the coverage of the corpus, i.e. the domain.</p><p>(Patel et al., 2003) proposed an approach to determine if the ontology refers to a particular topic, and to classify the ontology into a directory of topics: one can extract textual data from the ontology (such as names of classes and properties, and other suitable natural-language strings) and use this as the input to a text classification model. The model itself can be trained by standard machine learning algorithms from the area of text classification; a corpus of documents on a given subject can be used as the input to the learning algorithm.</p><p>(Brewster et al., 2004) suggested using a data-driven approach to evaluate the degree of structural fit between an ontology and a corpus of documents. (1) Given a corpus of documents from the domain of interest, a clustering algorithm is used to determine a probabilistic mixture model of hidden “topics” such that each document can be modeled as having been generated by a mixture of topics. (2) Each class C of the ontology is represented by a set of terms including its name in the ontology and the hypernyms of this name, taken from WordNet. (3) The probabilistic models obtained during clustering can be used to measure, for each topic identified by the clustering algorithm, how well the class C fits that topic. (4) At this point, if we require that each class fits at least some topic reasonably well, we obtain a technique for lexical-layer evaluation of the ontology. Alternatively, we may require that classes associated with the same topic should be closely related in the ontology. This would indicate that the structure of the ontology is reasonably well aligned with the hidden structure of topics in the domain-specific corpus of documents.</p><p>In the case of more extensive and sophisticated ontologies that incorporate a lot of factual information such as Cyc (Lenat, 1995), the corpus of documents could also be used as a source of “facts” about the external world, and the evaluation measure is the percentage of these facts that can also be derived from information in the ontology, or that are consistent with the axioms of the ontology.</p><p>Many techniques for the automated generation of ontologies, e.g. ontology learning algorithms, provide different kinds of evidences with respect to the correctness and the relevance of ontology elements for the domain in question. For instance, in order to learn subsumptions, Text2Onto (Cimiano and Völker, 2005) applies a variety of algorithms exploiting the hypernym structure of WordNet (Fellbaum, 1998), matching Hearst patterns (Hearst, 1992) in the corpus, and applying linguistic heuristics (Velardi et al., 2005). Based on the evidences one can compute confidences which model the certainty about whether a particular axiom holds for a certain domain.</p><p>11.3 Watson corpus</p><p>For a number of experiments throughout this thesis we have used corpora derived from the Watson collection. This section describes the corpora and how we created them.</p><p>In order to test our approach on a realistic corpus of Web ontologies we have created and made available a corpus of ontologies based on the Watson corpus. Watson is a search engine developed by the Knowledge Media Institute (d’Aquin et al., 2007b). The complete corpus is simply called the Watson corpus and contains roughly 130,000 ontologies. 
It is available as part of the Billion Triple Challenge corpus. We used the 2008 edition.<sup>4</sup></p><p>We have further randomly sampled two subcorpora. Each ontology has a name based on a hash sum of the ontology’s URI. We defined two subcorpora based on these names: the Watson EA corpus (all ontologies where the hash starts with EA) and the Watson 130 corpus (all ontologies where the hash starts with 130). Watson 130 contains 35 ontologies, Watson EA 515 ontologies.</p><p>Some of the evaluations are based on an earlier corpus. We received an early copy of the Watson corpus in spring 2007, containing 5873 files. We filtered these files to retain only valid OWL DL ontologies (checked using KAON2 (Motik, 2006)), so that only 1331 ontologies remained. These ontologies are made available online,<sup>5</sup> including metadata about the ontologies extracted during the experiments.<sup>6</sup> The set of valid OWL ontologies is called the Watson OWL corpus.</p><p>The ontologies were given short labels for easier reference (A00 to N30). All ontologies can be retrieved in order to examine the results in this thesis within the context of the complete ontology. The metadata about the ontologies offers several key metrics about each ontology, e.g. the number of class names, the number of axioms, etc. Using the metadata file one can easily filter and select ontologies with specific properties.</p><p>This corpus is by no means meant to be a full view of the Semantic Web, but just a partial, representative snapshot that should allow us to draw conclusions about current OWL DL ontology engineering practice on the Web. We assume that Watson is a random sample of the Web.</p><p>The experiments were partially run using the RDFlib<sup>7</sup> Python library in version 2.4.0 instead of KAON2, which was used for DL checking. As of now, the KAON2 SPARQL engine allows only for conjunctive queries on the ABox of the ontology, but does not allow querying the TBox. 
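To make the ABox/TBox distinction concrete, the following minimal sketch separates schema-level (TBox) triples from instance-level (ABox) triples and then matches a simple TBox pattern, here a chain of two subclass axioms. It is not the code used in the experiments: it uses plain Python tuples rather than RDFlib or KAON2, and the sample triples, the prefixed names, and the helpers `is_tbox` and `subclass_chains` are hypothetical. The split by vocabulary is a simplifying heuristic.

```python
# Illustrative only: triples are (subject, predicate, object) tuples with
# prefixed names; no RDF library is involved and all data is made up.
SCHEMA_PREDICATES = {
    "rdfs:subClassOf", "rdfs:subPropertyOf", "rdfs:domain", "rdfs:range",
    "owl:equivalentClass", "owl:disjointWith",
}
SCHEMA_TYPES = {"owl:Class", "owl:ObjectProperty", "owl:DatatypeProperty"}


def is_tbox(triple):
    """Heuristic: a triple is terminological if it uses schema vocabulary."""
    _, predicate, obj = triple
    if predicate in SCHEMA_PREDICATES:
        return True
    return predicate == "rdf:type" and obj in SCHEMA_TYPES


def subclass_chains(triples):
    """Return (a, b, c) where a is a subclass of b and b a subclass of c."""
    sub = [(s, o) for s, p, o in triples if p == "rdfs:subClassOf"]
    return sorted((a, b, c) for a, b in sub for b2, c in sub if b == b2)


triples = [
    ("ex:City", "rdf:type", "owl:Class"),
    ("ex:Capital", "rdfs:subClassOf", "ex:City"),
    ("ex:City", "rdfs:subClassOf", "ex:Place"),
    ("ex:London", "rdf:type", "ex:Capital"),
    ("ex:London", "ex:locatedIn", "ex:England"),
]

tbox = [t for t in triples if is_tbox(t)]
abox = [t for t in triples if not is_tbox(t)]
chains = subclass_chains(tbox)
print(len(tbox), len(abox), chains)
```

A query engine restricted to the ABox would only see the two triples about `ex:London` and could not find the subclass chain, which lives entirely in the TBox.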
Since most, though not all, ontology engineering patterns are indeed expressed in the TBox, we had to resort to another SPARQL engine for some of the experiments.</p><p><sup>4</sup> http://challenge.semanticweb.org/ <sup>5</sup> http://www.aifb.uni-karlsruhe.de/WBS/dvr/research/corpus <sup>6</sup> http://www.aifb.uni-karlsruhe.de/WBS/dvr/research/corpus/meta.rdf <sup>7</sup> Available from http://rdflib.net/</p><p>Chapter 12 Conclusions</p><p>Uh huh. Uh huh. Okay. Um, can you repeat the part of the stuff where you said all about the... things. Uh... the things?</p><p>(Homer Simpson, The Simpsons, Season 7, Episode 17 (Swartzwelder, 1996))</p><p>When we started with this thesis, we had the naïve goal of achieving a simple, fully automatically computable, real-valued quality function: $Q : O \to [0, 1]$. Given two ontologies $O_1$ and $O_2$ we wanted to be able to use the measure, get results such as $Q(O_1) = 0.73$ and $Q(O_2) = 0.62$, and thus not only be able to state that one ontology is better than the other, but also how much better it really is.</p><p>As said, it was a naïve goal. In (Burton-Jones et al., 2005) such a measure is indeed defined, but there are so many shortcomings with a measure like this: How can a simple number capture the many dimensions of an ontology? How does this number help in engineering and maintaining ontologies? Just having this number does not tell us how to improve the ontology, nor does it point out the problems an ontology may have. We had to redefine our goal: instead of such a measure we aimed for methods that help us tell how good an ontology is, to assess the quality of an ontology.</p><p>But also this quest for quality did not lead to a satisfying result, especially since “Quality cannot be defined” (Pirsig, 1984). So once again we changed our goal: instead of aiming for evaluation methods that tell us if an ontology is good, we settled for the goal of finding ontology evaluation methods that tell us if an ontology is bad, and if so, in which way. This turned out to be the most useful approach in order to get closer to our goals: improving the quality of ontologies on the Web in general and thus gaining advantages from better ontologies, increasing the availability of ontologies by providing usable methods to test ontologies before release, and lowering the maintenance costs for ontologies by providing methods that point out possible errors well in advance. But now it should be clear that none of the methods, neither alone nor in combination, can guarantee a good ontology.</p><p>This final chapter summarizes the achievements of this thesis in Section 12.1 and lists the many open research questions and development challenges in Section 12.2.</p><p>12.1 Achievements</p><p>The result of this thesis is a comprehensive framework for the evaluation of ontologies. The framework organizes ontology evaluation methods in two dimensions: ontology quality criteria (accuracy, adaptability, clarity, completeness, computational efficiency, conciseness, consistency, and organizational fitness) and ontology aspects (vocabulary, syntax, structure, semantics, representation, and context). For all criteria and for all aspects we presented methods to evaluate the given criterion or aspect. We added a number of new techniques to the toolbox of an ontology engineer, such as stable metrics, XML based ontology validation, reasoning over a meta-ontology, and others.</p><p>Unlike other evaluation frameworks and methods we separated an ontology into the given aspects, thus making it clear what is actually being evaluated. A common error in current research is to mix up semantics and structure. 
Chapters 6–8 show how to keep these levels separate, and offer the tool of normalization in order to assess exactly what the metrics engineer claims to assess. This will clarify the conceptualization surrounding the evaluation of ontologies, and help with describing new ontology evaluation methods and what their benefits will be.</p><p>The framework in this thesis is also novel insofar as it puts some emphasis on the evaluation of the “lower aspects” of the ontology, i.e. vocabulary, syntax, and structure (Chapters 4–6). Only recently, with the strong shift towards Linked Data, have these lower levels gained increased scrutiny. This is not yet reflected so much in research work but rather in informal groups such as the Pedantic Web.<sup>1</sup> Other evaluation frameworks in published research almost exclusively focus on the aspects of semantics and context. But our extensive survey of existing ontological data shows that many ontologies have easily reparable issues on those low levels already. Without means to evaluate those aspects it is hard to fix them. We hope that our framework will prove to be a more comprehensive and inclusive framework that takes into account both parts of ontology evaluation, and will ultimately help with improving the overall quality of ontologies on the Web. This will hopefully not only improve the availability of semantic data on the Web, but also point out a path to reconcile the two research streams on linked data and expressive ontologies, highlighting the mutual benefit they can gain from each other.</p><p><sup>1</sup> http://pedantic-web.org</p><p>12.2 Open questions</p><p>In the following we list a number of open questions and research challenges raised by this thesis. Many of the methods have their own, specific list of open issues: XML schema validation can be extended with more powerful schema languages, normalization may offer benefits in other areas besides ontology evaluation, and it would be interesting to investigate if it is possible to define a method that turns a normal metric into a stable metric. We pointed out specific research questions throughout the thesis. Here we will list more general research questions that pertain to the framework as a whole.</p><p>It is obvious that a domain- and task-independent verification, as discussed here, provides some common minimum quality level, but can only go so far. In order to properly evaluate an ontology, the evaluator always needs to come up with methods appropriate for the domain and task at hand, and decide on the relative importance of the evaluation criteria. But the minimum quality level discussed here will at least provide the ontology engineer with the confidence that they eliminated many errors and can publish the ontology. Providing a framework for creating and understanding domain- and task-aware evaluation methods, integrating the rich work in this field, remains an open issue.</p><p>As we have seen, numerous quality evaluation methods have been suggested in literature. But only few of them have been properly designed, defined, implemented, and experimentally verified. The relation between evaluation methods and ontology quality criteria is only badly understood and superficially investigated, if at all. For example, (Gangemi et al., 2005) discusses a number of quality criteria and how some of the metrics may serve as indicators for certain criteria, either positive or negative. But they do not report any experiments investigating the correlations between the results of the methods and the criteria. Recent experiments in turn point to some indeed counterintuitive relations: a higher tangledness actually increases efficiency when using the ontology in a browsing task (Yu et al., 2007). This contradicts the more intuitive relationship described in (Gangemi et al., 2005) where tangledness is a negative indicator for cognitive ergonomics, which includes exploitability of the ontology by the user.</p><p>The lack of experimental evaluations matching methods and criteria will hinder meaningful ontology evaluations. A better understanding of the connection between the features of an ontology and quality criteria remains easily the most important open research challenge in ontology evaluation. Also evaluations using scientific tools from the field of psychology are expected to further show their usefulness in evaluating ontologies. (Yamauchi, 2007) provides an example, where the difference between two possibilities to represent formal models is evaluated, but we expect this to be merely the beginning towards a better understanding of human conceptualizations, which, in the end, form the foundation for every ontology.</p><p>Most of the presented methods in this thesis are only prototypically implemented, be it as tools of their own (like the XML schema-based validation) or be it as part of a toolset (like the structural metrics implemented in the KAON2 OWL tools). What this thesis did not achieve is the implementation of a comprehensive application that applies the various described evaluation methods and provides a summarizing report, either as a part of an ontology development environment or as a stand-alone application. We expect such a validation tool to be of great use.</p><p>Also many of the current implementations are not efficient. We have defined the results formally, but for a number of our prototypical implementations we do not expect them to scale to realistically sized ontologies. We expect that future research will realize efficient implementations of those methods that have proven useful. 
We have implemented a number of the evaluation methods within a collaborative semantic authoring system, Semantic MediaWiki. SMW was developed and implemented during the creation of this thesis. We expect the field of collaborative ontology evaluation to become an increasingly important topic for collaborative knowledge construction. But what we see today is just the beginning of this interesting, new research track. We expect the near future to show hitherto unknown levels of cooperation between groups of humans and federations of machine agents, working together to solve the wicked problems we face today.</p><p>Part IV</p><p>Appendix</p><p>List of Methods</p><p>List of Tables</p><p>List of Figures</p><p>Bibliography</p><p>Full Table of Contents</p><p>List of Methods</p><p>1 Check used protocols</p><p>2 Check response codes</p><p>3 Look up names</p><p>4 Check naming conventions</p><p>5 Metrics of ontology reuse</p><p>6 Check name declarations</p><p>7 Check literals and data types</p><p>8 Check language tags</p><p>9 Check labels and comments</p><p>10 Check for superfluous blank nodes</p><p>11 Validating against an XML schema</p><p>12 Ontology complexity</p><p>13 Searching for Anti-Patterns</p><p>14 OntoClean meta-property check</p><p>15 Ensuring a stable class hierarchy</p><p>16 Measuring language completeness</p><p>17 Explicitness of the subsumption hierarchy</p><p>18 Explicit terminology ratio</p><p>19 Checking competency questions against results</p><p>20 Checking competency questions with constraints</p><p>21 Unit testing with test ontologies</p><p>22 Increasing expressivity</p><p>23 Inconsistency checks with rules</p><p>List of Tables</p><p>1.1 Namespace declaration in this thesis</p><p>2.1 Semantics of OWL axioms. Axiom types noted with * may hold more than the given parameters.</p><p>2.2 Semantics of OWL expressions using object properties (datatype properties are analogous). Expression types with * may hold more parameters.</p><p>4.1 An overview of what different response codes imply for the resolved HTTP URI reference $U$. $I$ is the information resource that is returned, if any. $L$ is the URI given in the location field of the response. The table covers the most important responses only, the others do not imply any further facts.</p><p>4.2 The five hash and slash namespaces with the biggest number of names.</p><p>6.1 Constraint Violations for Manual Taggings. The lower part of the table shows constraint violations based on taggings from the inter-annotator agreed sets, e.g. $A_2$ / $A_3$ shows the number of violations based on only the taggings where annotators $A_2$ and $A_3$ agreed.</p><p>6.2 Constraint Violations for Automatic Taggings</p><p>7.1 Class and property assertions to calculate the language completeness of the example ontology.</p><p>List of Figures</p><p>3.1 Framework for ontology evaluation. The slashed arrow represents the expresses relation.</p><p>3.2 A subsumption axiom (on the left) and its reification. Dotted lines represent instantiation, slashed lines annotations, circles individuals, and squares classes.</p><p>3.3 The semantic spectrum for ontologies.</p><p>3.4 Two agents $X$ and $Y$ and their conceptualizations of domain $d$ (the tree), each other, and their respective conceptualizations.</p><p>3.5 Three agents and an ontology. $Y$’s conceptualization is omitted for space reasons. $Z$ internalizes ontology $O$, thus connecting it to or creating its own conceptualization $C_Z$ of domain $d$, in this case, the tree.</p><p>4.1 Distribution of the HTTP response codes on the HTTP URIs from the Watson EA corpus. The left hand side shows the slash URIs, the right hand side hash URIs.</p><p>4.2 The fifteen most often used data types in the Watson corpus. Note the logarithmic scale.</p><p>6.1 Example for a circular hierarchy path.</p><p>6.2 Two class hierarchies with identical semantics. Semantic similarity measure ssm of $C_1$ and $C_3$ is lower in the left ontology than in the right one.</p><p>6.3 The partition pattern: class $A$ is partitioned into the subclasses $B_1 \ldots B_n$</p><p>6.4 The upper levels of the time ontology. Note that ProperInterval and Instant are declared disjoint even though they are not sibling classes. Arrows denote subsumption.</p><p>8.1 A simple taxonomy before (left) and after (right) normalization.</p><p>9.1 Example class hierarchy.</p><p>10.1 Architecture of SMW’s main components in relation to MediaWiki.</p><p>10.2 Source of a page about London in MediaWiki (top) and in SMW (bottom).</p><p>10.3 A semantic view of London.</p><p>10.4 Inverse search in SMW, here giving a list of everyone born in London.</p><p>10.5 Production rules for SMW queries (top) and according OWL descriptions (bottom).</p><p>10.6 A semantic query for all cantons of Switzerland, together with their capital, population, and languages. The data stems from an automatically annotated version of Wikipedia.</p><p>11.1 UML diagram of the main notions from O2 (from Gangemi et al., 2005).</p><p>Bibliography</p><p>Harith Alani and Christopher Brewster. Metrics for ranking ontologies. In Denny Vrandečić, Mari Carmen Suárez-Figueroa, Aldo Gangemi, and York Sure, editors, Proceedings of the 4th International Workshop on Evaluation of Ontologies for the Web (EON2006) at the 15th International World Wide Web Conference (WWW 2006), pages 24–30, Edinburgh, Scotland, May 2006.</p><p>Harith Alani. Position paper: ontology construction from online ontologies. In Les Carr, David De Roure, Arun Iyengar, Carole A. Goble, and Michael Dahlin, editors, Proceedings of the 15th international conference on World Wide Web (WWW2006), pages 491–495, Edinburgh, Scotland, May 2006. ACM.</p><p>Christopher Alexander, Sara Ishikawa, Murray Silverstein, Max Jacobson, Ingrid Fiksdahl-King, and Shlomo Angel. A Pattern Language. Oxford University Press, New York, NY, 1977.</p><p>Dean Allemang and James Hendler. Semantic Web for the Working Ontologist: Effective Modeling in RDFS and OWL. Morgan Kaufman, San Francisco, CA, 2008.</p><p>Aristotle. Metaphysics. Oxford University Press, 330 BC. translated by W. D. Ross.</p><p>Sören Auer and Jens Lehmann. What have Innsbruck and Leipzig in common? Extracting semantics from wiki content. In Enrico Franconi, Michael Kifer, and Wolfgang May, editors, Proc. 4th European Semantic Web Conference (ESWC), 2007.</p><p>Sören Auer, Sebastian Dietzold, and Thomas Riechert. OntoWiki – A tool for social, semantic collaboration. In Yolanda Gil, Enrico Motta, Richard V. Benjamins, and Mark Musen, editors, Proc. 5th Int. Semantic Web Conference (ISWC’05), number 4273 in LNCS, pages 736–749. Springer, 2006.</p><p>Phoebe Ayers, Charles Matthews, and Ben Yates. How Wikipedia works. No Starch Press, San Francisco, CA, October 2008.</p><p>Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Peter F. Patel-Schneider, editors. The description logic handbook: theory, implementation, and applications. Cambridge University Press, New York, NY, USA, 2003.</p><p>Jie Bao, Li Ding, Rui Huang, Paul Smart, Dave Braines, and Gareth Jones. A semantic wiki based light-weight web application model. In Proceedings of the 4th Asian Semantic Web Conference, pages 168–183, 2009.</p><p>Jie Bao, Sandro Hawke, Boris Motik, Peter F. Patel-Schneider, and Axel Polleres. rdf:PlainLiteral: A Datatype for RDF Plain Literals, 2009. W3C Recommendation 27 October 2009, available at http://www.w3.org/TR/rdf-text/.</p><p>Daniel J. Barret. MediaWiki. O’Reilly, 2008.</p><p>Sean Bechhofer, Frank van Harmelen, Jim Hendler, Ian Horrocks, Deborah L. McGuinness, Peter F. Patel-Schneider, and Lynn Andrea Stein. OWL Web Ontology Language Abstract Reference, 2004. W3C Rec. 10 February 2004.</p><p>Kent Beck. Extreme Programming. Addison-Wesley, Reading, MA, 1999.</p><p>Dave Beckett and Jeen Broekstra. SPARQL Query Results XML Format. W3C Recommendation 15 January 2008, 2008. available at http://www.w3.org/TR/rdf-sparql-XMLres/.</p><p>Dave Beckett. RDF/XML syntax specification (revised). W3C Recommendation, February 2004.</p><p>Gabe Beged-Dov, Dan Brickley, Rael Dornfest, Ian Davis, Leigh Dodds, Jonathan Eisenzopf, David Galbraith, R.V. 
Guha, Ken MacLeod, Eric Miller, Aaron Swartz, and Eric van der Vlist. RDF site summary (RSS) 1.0, December 2000. Available at http://web.resource.org/rss/1.0/spec.</p><p>V. Richard Benjamins, Pompeu Casanovas, Jesús Contreras, José Manuel López Cobo, and Lisette Lemus. Iuriservice: An intelligent frequently asked questions system to assist newly appointed judges. In V.R. Benjamins, P. Casanovas, A. Gangemi, and B. Selic, editors, Law and the Semantic Web, LNCS, Berlin Heidelberg, 2005. Springer.</p><p>Tim Berners-Lee, Larry Masinter, and Mark McCahill. Universal Resource Locators (URL). Technical Report 1738, Internet Engineering Task Force, December 1994.</p><p>Tim Berners-Lee, Jim Hendler, and Ora Lassila. The semantic web. Scientific American, 2001(5), 2001. available at http://www.sciam.com/2001/0501issue/0501berners-lee.html.</p><p>Tim Berners-Lee, Roy Fielding, and Larry Masinter. Uniform Resource Identifier (URI): Generic Syntax. Technical Report 3986, Internet Engineering Task Force, June 2005. RFC 3986 (available at http://www.ietf.org/rfc/rfc3986.txt).</p><p>Tim Berners-Lee, Yuhsin Chen, Lydia Chilton, Dan Connolly, Ruth Dhanaraj, James Hollenbach, Adam Lerer, and David Sheets. Tabulator: Exploring and analyzing linked data on the Semantic Web. In Lloyd Rutledge, m.c. schraefel, Abraham Bernstein, and Duane Degler, editors, Proceedings of the Third International Semantic Web User Interaction Workshop SWUI2006 at the International Semantic Web Conference ISWC2006, 2006.</p><p>Tim Berners-Lee, Wendy Hall, James A. Hendler, Kieron O’Hara, Nigel Shadbolt, and Daniel J. Weitzner. A framework for web science. Foundations and Trends in Web Science, 1(1):1–130, 2006.</p><p>Tim Berners-Lee. Cool URIs don’t change. W3C Style, 1998. available at http://www.w3.org/Provider/Style/URI.html.</p><p>Tim Berners-Lee. Notation 3 - a readable language for data on the web, 2006. available at http://www.w3.org/DesignIssues/Notation3.</p><p>Diego Berrueta and Jon Phipps. Representing classes as property values on the semantic web, 2005. W3C Working Group Note 5 April 2005, avail. at http://www.w3.org/TR/swbp-vocab-pub/.</p><p>Diego Berrueta and Jon Phipps. Best practice recipes for publishing RDF vocabularies, 2008. W3C Working Group Note 28 August 2008, avail. at http://www.w3.org/TR/swbp-vocab-pub/.</p><p>Janez Brank, Marko Grobelnik, and Dunja Mladenić. A survey of ontology evaluation techniques. In Proceedings of 8th International Multi-Conference of the Information Society, pages 166–169, 2005.</p><p>Tim Bray, Dave Hollander, Andrew Layman, and Richard Tobin. Namespaces in XML 1.0 (second edition), 2006. W3C Recommendation 16 August 2006, available at http://www.w3.org/TR/REC-xml-names/.</p><p>Tim Bray, Jean Paoli, C. Michael Sperberg-McQueen, Eve Maler, and Francois Yergeau. Extensible markup language (XML) 1.0 (fifth edition). W3C Recommendation 26 November 2008, 2008. available at http://www.w3.org/TR/REC-xml.</p><p>Christopher Brewster, Harith Alani, Srinandan Dasmahapatra, and Yorick Wilks. Data-driven ontology evaluation. In Proceedings of the Language Resources and Evaluation Conference (LREC 2004), pages 164–168, Lisbon, Portugal, 2004. European Language Resources Association.</p><p>Dan Brickley and Libby Miller. The Friend Of A Friend (FOAF) vocabulary specification, July 2005.</p><p>Thomas Henry Briggs. Constraint Generation and Reasoning in OWL. PhD thesis, University of Maryland, 2008.</p><p>Saartje Brockmans and Peter Haase. A Metamodel and UML Profile for Rule-extended OWL DL Ontologies – A Complete Reference. Technical report, Universität Karlsruhe, March 2006. 
http://www.aifb.uni-karlsruhe.de/WBS/sbr/publications/owl-metamodeling.pdf.</p><p>Andrew Burton-Jones, Veda C. Storey, Vijayan Sugumaran, and Punit Ahluwalia. A semiotic metrics suite for assessing the quality of ontologies. Data and Knowledge Engineering, 55(1):84–102, October 2005.</p><p>Stefano Emilio Campanini, Paolo Castagna, and Roberto Tazzoli. Towards a semantic wiki wiki web. In Giovanni Tummarello, Christian Morbidoni, Paolo Puliti, Francesco Piazza, and Luigi Lella, editors, Proceedings of the 1st Italian Semantic Web Workshop (SWAP2004), Ancona, Italy, December 2004.</p><p>Pompeu Casanovas, Núria Casellas, Marta Poblet, Joan-Josep Vallbé, York Sure, and Denny Vrandečić. Iuriservice II ontology development. In Pompeu Casanovas, editor, Workshop on Artificial Intelligence and Law at the XXIII. World Conference of Philosophy of Law and Social Philosophy, May 2005.</p><p>Pompeu Casanovas, Nuria Casellas, Christoph Tempich, Denny Vrandečić, and Richard Benjamins. OPJK and DILIGENT: ontology modeling in a distributed environment. Artificial Intelligence and Law, 15(1), 2 2007.</p><p>Werner Ceusters and Barry Smith. A realism-based approach to the evolution of biomedical ontologies. In Proceedings of the AMIA 2006 Annual Symposium, November 2006.</p><p>Gong Cheng, Weiyi Ge, and Yuzhong Qu. FALCONS: Searching and browsing entities on the semantic web. In Proceedings of the World Wide Web Conference, 2008.</p><p>Philipp Cimiano and Johanna Völker. A framework for ontology learning and data-driven change discovery. In Proceedings of the 10th International Conference on Applications of Natural Language to Information Systems (NLDB’2005), 2005.</p><p>Philipp Cimiano, Günter Ladwig, and Steffen Staab. Gimme the context: Context-driven automatic semantic annotation with C-PANKOW. In Allan Ellis and Tatsuya Hagino, editors, Proceedings of the 14th World Wide Web Conference, pages 332–341, Chiba, Japan, May 2005. 
ACM Press.</p><p>James Clark and Makoto Murata. RELAX NG Specification, December 2001. OASIS committee specification.</p><p>Kendall Grant Clark, Lee Feigenbaum, and Elias Torres. Serializing SPARQL Query Results in JSON. W3C Working Group Note 18 June 2007, 2007. available at http://www.w3.org/TR/rdf-sparql-json-res/.</p><p>James Clark. XSL Transformations (XSLT). W3C Recommendation 16 November 1999, 1999. available at http://www.w3.org/TR/1999/REC-xslt-19991116.</p><p>John Cowan and Richard Tobin. XML information set (second edition). W3C Recommendation 4 February 2004, 2004. available at http://www.w3.org/TR/xml-infoset/.</p><p>Anne Cregan, Malgorzata Mochol, Denny Vrandečić, and Sean Bechhofer. Pushing the limits of OWL, Rules and Protégé – A simple example. In Bernardo Cuenca Grau, Ian Horrocks, Bijan Parsia, and Peter Patel-Schneider, editors, OWL: Experiences and Directions, Galway, Ireland, 11 2005.</p><p>Mathieu d’Aquin and Holger Lewen. Cupboard – a place to expose your ontologies to applications and the community. In The Semantic Web: Research and Applications, 6th European Semantic Web Conference, ESWC 2009, Heraklion, Crete, Greece, May 31-June 4, 2009, Proceedings, volume 5554 of Lecture Notes in Computer Science, pages 913–918. Springer, Mai 2009.</p><p>Mathieu d’Aquin, Claudio Baldassarre, Larian Gridinoc, Sofia Angeletou, Marta Sabou, and Enrico Motta. Characterizing knowledge on the semantic web with Watson. In Raúl García-Castro, Denny Vrandečić, Asunción Gómez-Pérez, York Sure, and Zhisheng Huang, editors, Proceedings of the Workshop on Evaluation of Ontologies and Ontology-based tools, 5th International EON Workshop (EON2007) at ISWC/ASWC’07, pages 1–10, Busan, Korea, November 2007.</p><p>Mathieu d’Aquin, Claudio Baldassarre, Laurian Gridinoc, Marta Sabou, Sofia Angeletou, and Enrico Motta. Watson: Supporting next generation semantic web applications. In WWW/Internet conference, Vila real, Spain, 2007.</p><p>Frank Dawson and Tim Howes. vCard MIME Directory Profile. RFC 2426, Internet Engineering Task Force, 9 1998.</p><p>Tom DeMarco. Controlling Software Projects: Management, Measurement & Estimation. Yourdon Press, New York, 1982.</p><p>Rose Dieng and Olivier Corby, editors. Proceedings of the 12th International Conference on Knowledge Engineering and Knowledge Management: Methods, Models, and Tools (EKAW 2000), volume 1937 of Lecture Notes in Artificial Intelligence (LNAI), Juan-les-Pins, France, 2002. Springer.</p><p>Li Ding, Tim Finin, Anupam Joshi, Rong Pan, R. Scott Cost, Yun Peng, Pavan Reddivari, Vishal Doshi, and Joel Sachs. Swoogle: a search and metadata engine for the semantic web. In CIKM ’04: Proceedings of the thirteenth ACM international conference on Information and knowledge management, pages 652–659, New York, NY, USA, 2004. ACM.</p><p>Francesco M. Donini, Daniele Nardi, and Riccardo Rosati. Description logics of minimal knowledge and negation as failure. ACM Transactions on Computational Logic, 3(2):177–225, 2002.</p><p>Umberto Eco. Foucault’s Pendulum. Secker & Warburg, London, 1988.</p><p>Marc Ehrig, Peter Haase, Nenad Stojanović, and Mark Hefke. Similarity for ontologies - a comprehensive framework. In 13th European Conf. on Information Systems, 2005.</p><p>Basil Ell. Integration of external data in semantic wikis. 
Master thesis, Hochschule Mannheim, December 2009.</p><p>Michael Erdmann and Rudi Studer. How to structure and access XML documents with ontologies. Data Knowl. Eng., 36(3):317–335, 2001.</p><p>Oren Etzioni, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. Web-scale information extraction in KnowItAll (preliminary results). In Proceedings of the 13th International Conference on the World Wide Web (WWW 2004), pages 100–109, 2004.</p><p>Ronald Fagin, Joseph Halpern, Yoram Moses, and Moshe Vardi. Reasoning about Knowledge. MIT Press, Cambridge, MA, USA, 2003.</p><p>David C. Fallside and Priscilla Walmsley. XML schema part 0: Primer second edition, 2004. W3C Rec. 28 October 2004.</p><p>Adam Farquhar, Richard Fikes, and James Rice. The Ontolingua Server: A tool for collaborative ontology construction. In Proceedings of the 10th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop (KAW’95), Banff, Canada, November 1996.</p><p>Christine Fellbaum. WordNet: An Electronic Lexical Database (Language, Speech, and Communication). MIT Press, May 1998.</p><p>Mariano Fernández-López and Asunción Gómez-Pérez. The integration of OntoClean in WebODE. In Proceedings of the EON2002 Workshop at 13th EKAW, 2002.</p><p>Mariano Fernández-López, Asunción Gómez-Pérez, Juan Pazoz Sierra, and Alejandro Pazoz Sierra. Building a chemical ontology using Methontology and the Ontology Design Environment. IEEE Intelligent Systems, 14(1), January/February 1999.</p><p>Roy Fielding, James Gettys, Jeffrey Mogul, Henrik Frystyk, Larry Masinter, Paul Leach, and Tim Berners-Lee. Hypertext Transfer Protocol – HTTP/1.1. RFC 2616, June 1999.</p><p>Mark S. Fox and Michael Gruninger. Enterprise modeling. AI Magazine, 19:109–121, Fall 1998.</p><p>Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns. Elements of Reusable Object-Oriented Software. Addison Wesley, Reading, Massachusetts, 1995.</p><p>Aldo Gangemi, Carola Catenacci, Massimiliano Ciaramita, and Jos Lehmann. Ontology evaluation and validation: an integrated formal model for the quality diagnostic task. Technical report, Laboratory of Applied Ontologies – CNR, Rome, Italy, 2005. http://www.loa-cnr.it/Files/OntoEval4OntoDev_Final.pdf.</p><p>Aldo Gangemi, Carola Catenacci, Massimiliano Ciaramita, and Jos Lehmann. Modelling ontology evaluation and validation. In Proceedings of the Third European Semantic Web Conference (ESWC), Budva, Montenegro, 2006.</p><p>Aldo Gangemi, Carola Catenacci, Massimiliano Ciaramita, and Jos Lehmann. Qood grid: A metaontology-based framework for ontology evaluation and selection. In Denny Vrandečić, Mari del Carmen Suárez-Figueroa, Aldo Gangemi, and York Sure, editors, Proceedings of the 4th International Workshop on Evaluation of Ontologies for the Web (EON2006) at the 15th International World Wide Web Conference (WWW 2006), volume 179 of CEUR-WS, pages 8–15, Edinburgh, Scotland, May 2006.</p><p>Aldo Gangemi, Nicola Guarino, Claudio Masolo, Alessandro Oltramari, and Luc Schneider. Sweetening ontologies with DOLCE. In A. Gómez-Pérez and V. R. Benjamins, editors, Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web (EKAW 2002), volume 2473 of Lecture Notes in Artificial Intelligence (LNAI), pages 166–181, Siguenza, Spain, 2002. Springer.</p><p>Aldo Gangemi. Ontology design patterns for semantic web content. In Yolanda Gil, Enrico Motta, V. Richard Benjamins, and Mark A. Musen, editors, Proceedings of the 4th International Semantic Web Conference (ISWC2005), volume 3729 of LNCS. Springer Verlag Berlin-Heidelberg, November 2005.</p><p>Asunción Gómez-Pérez, Mariano Fernández-López, and Oscar Corcho. Ontological Engineering. Advanced Information and Knowledge Processing. Springer, 2003.</p><p>Asunción Gómez-Pérez. Ontology evaluation. In Steffen Staab and Rudi Studer, editors, Handbook on Ontologies, First Edition, chapter 13, pages 251–274. Springer, 2004.</p><p>Jan Grant and Dave Beckett. RDF test cases. W3C Recommendation, February 2004.</p><p>Bernardo Cuenca Grau, Ian Horrocks, Boris Motik, Bijan Parsia, Peter Patel-Schneider, and Ulrike Sattler. OWL 2: The next step for OWL. Web Semantics: Science, Services and Agents on the World Wide Web, 6(4):309–322, 2008.</p><p>Stephan Grimm and Boris Motik. Closed world reasoning in the semantic web through epistemic operators. In Bernardo Cuenca Grau, Ian Horrocks, Bijan Parsia, and Peter Patel-Schneider, editors, Second International Workshop on OWL: Experiences and Directions (OWLED 2006), Galway, Ireland, 2005.</p><p>Benjamin Grosof, Ian Horrocks, Raphael Volz, and Stefan Decker. Description Logic Programs: Combining Logic Programs with Description Logic. In Proceedings of the Twelfth International World Wide Web Conference, WWW2003, Budapest, Hungary, 20-24 May 2003, pages 48–57. ACM, 2003.</p><p>William E. Grosso, Henrik Eriksson, Ray W. Fergerson, Samson W. Tu, and Mark A. Musen. Knowledge modeling at the millennium: the design and evolution of PROTEGE-2000. In Proceedings of the 12th International Workshop on Knowledge Acquisition, Modeling and Management (KAW-99), Banff, Canada, October 1999.</p><p>Thomas R. Gruber. Towards principles for the design of ontologies used for knowledge sharing. International Journal of Human-Computer Studies, 43(5/6):907–928, 1995.</p><p>Michael Grüninger and Mark S. Fox. Methodology for the design and evaluation of ontologies. In IJCAI95 Workshop on Basic Ontological Issues in Knowledge Sharing, Montreal, 1995.</p><p>Nicola Guarino and Christopher Welty. A formal ontology of properties. 
In Dieng and Corby (2002), pages 97–112.</p><p>Nicola Guarino and Christopher Welty. Evaluating ontological decisions with OntoClean. Communications of the ACM, 45(2):61–65, February 2002.</p><p>Nicola Guarino and Chris A. Welty. An overview of OntoClean. In Steffen Staab and Rudi Studer, editors, Handbook on Ontologies in Information Systems, First Edition, pages 151–172. Springer, 2004.</p><p>Nicola Guarino. Ontology of information objects. FOFIS Deliverable 2, Istituto di Scienze e Tecnologie della Cognizione del Consiglio Nazionale delle Ricerche, Mar 2006.</p><p>Peter Haase and Guilin Qi. An analysis of approaches to resolving inconsistencies in DL-based ontologies. In Proceedings of International Workshop on Ontology Dynamics (IWOD’07), pages 97–109, June 2007.</p><p>Peter Haase and Ljiljana Stojanović. Consistent evolution of OWL ontologies. In Asunción Gómez-Pérez and Jérôme Euzenat, editors, Proceedings of the Second European Semantic Web Conference, volume 3532, pages 182–197, Heraklion, Crete, Greece, 2005. Springer.</p><p>Peter Haase and York Sure. Usage tracking for ontology evolution. SEKT Deliverable D3.2.1, Institute AIFB, University of Karlsruhe, June 2005.</p><p>Peter Haase, Frank van Harmelen, Zhisheng Huang, Heiner Stuckenschmidt, and York Sure. A framework for handling inconsistency in changing ontologies. In Y. Gil, E. Motta, V. R. Benjamins, and M. A. Musen, editors, Proceedings of the Fourth International Semantic Web Conference (ISWC2005), volume 3729 of LNCS, pages 353–367. Springer, November 2005.</p><p>Andreas Harth, Aidan Hogan, Jürgen Umbrich, and Stefan Decker. SWSE: Object before documents! In Amit P. Sheth, Steffen Staab, Mike Dean, Massimo Paolucci, Diana Maynard, Tim Finin, and Krishnaprasad Thirunarayan, editors, Proceedings of the 7th International Semantic Web Conference, Semantic Web Challenge 2008, volume 5318 of LNCS, Karlsruhe, Germany, October 2009. Springer.</p><p>Jens Hartmann, York Sure, Peter Haase, Raul Palma, and Mari del Carmen Suárez-Figueroa. OMV – Ontology Metadata Vocabulary. In Chris Welty, editor, Ontology Patterns for the Semantic Web Workshop, Galway, Ireland, 2005.</p><p>Patrick Hayes. RDF Semantics. W3C Recommendation 10 February 2004, 2004. available at http://www.w3.org/TR/rdf-mt/.</p><p>Marti A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th International Conference on Computational Linguistics, pages 539–545, 1992.</p><p>Pascal Hitzler and Denny Vrandečić. Resolution-based approximate reasoning for OWL DL. In Yolanda Gil, Enrico Motta, V. Richard Benjamins, and Mark A. Musen, editors, Proceedings of the Fourth International Semantic Web Conference (ISWC’05), volume 3729 of LNCS. Springer Verlag Berlin-Heidelberg, November 2005.</p><p>Pascal Hitzler, Markus Krötzsch, and Sebastian Rudolph. Semantic Web Foundations. Springer, 2009.</p><p>Jerry R. Hobbs and Feng Pan. An ontology of time for the semantic web. ACM Transactions on Asian Language Information Processing (TALIP), 3(1):66–85, 2004.</p><p>Masahiro Hori, Jérôme Euzenat, and Peter F. Patel-Schneider. OWL Web Ontology Language XML presentation syntax, 2003. W3C Note 11 June 2003.</p><p>Matthew Horridge, Nick Drummond, John Goodwin, Alan Rector, Robert Stevens, and Hai Wang. The Manchester OWL syntax. In OWLED2006 Second Workshop on OWL Experiences and Directions, Athens, GA, USA, 2006.</p><p>Matthew Horridge. The Protégé OWL unit test framework, 2005. Website at http://www.co-ode.org/downloads/owlunittest/.</p><p>Ian Horrocks and Peter F. Patel-Schneider. 
Reducing OWL Entailment to Description Logic Satisfiability. Journal of Web Semantics, 1(4):7–26, 2004.</p><p>Ian Horrocks, Frank van Harmelen, Peter Patel-Schneider, Tim Berners-Lee, Dan Brickley, Dan Connolly, Mike Dean, Stefan Decker, Dieter Fensel, Richard Fikes, Pat Hayes, Jeff Heflin, James A. Hendler, Ora Lassila, Deborah L. McGuinness, and Lynn Andrea Stein. DAML+OIL (March 2001), 2001. Joint Committee, http://www.daml.org/2001/03/daml+oil-index.</p><p>Ian Horrocks, Peter Patel-Schneider, Harold Boley, Said Tabet, Benjamin Grosof, and Mike Dean. SWRL: a semantic web rule language combining OWL and RuleML, 2003.</p><p>ISO 15924. Codes for the representation of names of scripts. Technical report, International Standard ISO, 2004.</p><p>ISO 2108. Information and documentation – International standard book number (ISBN). Technical report, International Standard ISO, 2005.</p><p>ISO 24824. ISO/IEC 24824-1 (Fast Infoset). Technical report, International Standard ISO, 2007.</p><p>ISO 3166. Codes for the representation of names of countries and their subdivisions. Technical report, International Standard ISO, 1999.</p><p>ISO 639-1. Codes for the representation of names of languages – Part 1: Alpha-2 code. Technical report, International Standard ISO, 2002.</p><p>ISO 639-2. Codes for the representation of names of languages – Part 2: Alpha-3 code. Technical report, International Standard ISO, 1998.</p><p>ISO 8000-102. Data quality – Part 102: Master data: Exchange of characteristic data: Vocabulary. Technical report, International Standard ISO, 2009.</p><p>ISO 8000-110. Data quality – Part 110: Master data: Exchange of characteristic data: Syntax, semantic encoding, and conformance to data specification. Technical report, International Standard ISO, 2009.</p><p>ISO 8000-120. Data quality – Part 120: Master data: Exchange of characteristic data: Provenance. Technical report, International Standard ISO, 2009.</p><p>ISO 8000-130. Data quality – Part 130: Master data: Exchange of characteristic data: Accuracy. Technical report, International Standard ISO, 2009.</p><p>ISO 8000-140. Data quality – Part 140: Master data: Exchange of characteristic data: Completeness. Technical report, International Standard ISO, 2009.</p><p>Ian Jacobs and Norman Walsh. Architecture of the World Wide Web, Vol. 1, 2004. W3C Recommendation 15 December 2004, avail. at http://www.w3.org/TR/webarch/.</p><p>Aleks Jakulin and Dunja Mladenić. Ontology grounding. In Proceedings of 8th International Multi-Conference Information Society IS-2005, pages 170–173, 2005.</p><p>Qiu Ji, Peter Haase, Guilin Qi, Pascal Hitzler, and Steffen Stadtmüller. RaDON – repair and diagnosis in ontology networks. In Siegfried Handschuh, Peter Haase, Brian Davis, Enrico Motta, Kim Viljanen, Eero Hyvönen, Wolfgang Nejdl, Christian Meilicke, Robert Stevens, and Heiner Stuckenschmidt, editors, Proceedings of the 6th European Semantic Web Conference (ESWC 2009), pages 863–867, Heraklion, Greece, June 2009.</p><p>Joseph M. Juran and A. Blanton Godfrey. Juran’s Quality Handbook. McGraw-Hill, 5th edition, 1999.</p><p>Aditya Kalyanpur, Bijan Parsia, Evren Sirin, Bernardo Cuenca Grau, and James Hendler. Swoop: A web ontology editing browser. Journal of Web Semantics: Science, Services and Agents on the World Wide Web, 4(2):144–153, June 2006.</p><p>Brian W. Kernighan and P.J. Plauger. The Elements of Programming Style. McGraw-Hill, 2nd edition, 1978.</p><p>Michael Kifer, Georg Lausen, and James Wu. Logical foundations of object-oriented and frame-based languages. Journal of the ACM, 42:741–843, 1995.</p><p>Graham Klyne and Jeremy Carroll. Resource Description Framework (RDF): Concepts and abstract syntax. W3C Recommendation 10 February 2004, 2004.</p><p>Andrew Koenig. Patterns and antipatterns. 
Journal of Object-Oriented Programming, 8(1):46–48, March 1995.</p><p>Konstantinos Kotis, George A. Vouros, and Jerónimo P. Alonso. HCOME: A tool-supported methodology for engineering living ontologies. Semantic Web and Databases, pages 155–166, 2005.</p><p>Chrysovalanto Kousetti, David Millard, and Yvonne Howard. A study of ontology convergence in a semantic wiki. In Ademar Aguiar and Mark Bernstein, editors, WikiSym 2008, September 2008.</p><p>Markus Krötzsch, Denny Vrandečić, and Max Völkel. Wikipedia and the semantic web – the missing links. In Proceedings of Wikimania 2005 – The First International Wikimedia Conference, Frankfurt, Germany, July 2005. Wikimedia Foundation.</p><p>Markus Krötzsch, Pascal Hitzler, Denny Vrandečić, and Michael Sintek. How to reason with OWL in a logic programming system. In Thomas Eiter, Enrico Franconi, Ralph Hodgson, and Susie Stephens, editors, Proceedings of the 2nd International Conference on Rules and Rule Markup Languages for the Semantic Web (RuleML2006), pages 17–26, Athens, GA, USA, 11 2006. IEEE Computer Society.</p><p>Markus Krötzsch, Denny Vrandečić, and Max Völkel. Semantic MediaWiki. In Isabel Cruz, Stefan Decker, Dean Allemang, Chris Preist, Daniel Schwabe, Peter Mika, Mike Uschold, and Lora Aroyo, editors, Proceedings of the 5th International Semantic Web Conference (ISWC2006), volume 4273 of LNCS, pages 935–942, Athens, GA, USA, November 2006. Springer.</p><p>Markus Krötzsch, Sebastian Rudolph, and Pascal Hitzler. Conjunctive queries for a tractable fragment of OWL 1.1. In Karl Aberer, Key-Sun Choi, and Natasha Noy, editors, Proc. 6th Int. Semantic Web Conf. (ISWC’07). Springer, 2007.</p><p>Markus Krötzsch, Sebastian Schaffert, and Denny Vrandečić. Reasoning in semantic wikis. 
In Grigoris Antoniou, Uwe Assmann, Cristina Baroglio, Stefan Decker, Nicola Henze, Paula-Lavinia Patranjan, and Robert Tolksdorf, editors, Proceedings of the 3rd Reasoning Web Summer School, volume 4636 of LNCS, pages 310–329, Dresden, Germany, September 2007. Springer.</p><p>Markus Krötzsch, Denny Vrandečić, Max Völkel, Heiko Haller, and Rudi Studer. Semantic Wikipedia. Journal of Web Semantics, 5:251–261, September 2007.</p><p>Joey Lam. Methods for resolving inconsistencies in ontologies. PhD thesis, University of Aberdeen, 2007.</p><p>Cyndi Lauper and Rob Hyman. Time after time. In She’s So Unusual. Epic Records, 1983.</p><p>Douglas B. Lenat. CYC: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):32–38, November 1995.</p><p>Vladimir Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710, 1966.</p><p>Holger Lewen, Kaustubh Supekar, Natalya F. Noy, and Mark A. Musen. Topic-specific trust and open rating systems: An approach for ontology evaluation. In Proceedings of the 4th International Workshop on Evaluation of Ontologies for the Web (EON2006) at the 15th International World Wide Web Conference (WWW 2006), Edinburgh, UK, May 2006.</p><p>Rhys Lewis. Dereferencing HTTP URIs, 2007. Draft TAG Finding 31 August 2007, avail. at http://www.w3.org/2001/tag/doc/httpRange-14/2007-08-31/HttpRange-14.html.</p><p>Uta Lösch, Sebastian Rudolph, Denny Vrandečić, and Rudi Studer. Tempus fugit – towards an ontology update language. In Lora Aroyo et al., editors, Proceedings of the 6th European Semantic Web Conference (ESWC 2009), volume 5554 of Lecture Notes in Computer Science (LNCS). Springer-Verlag Berlin Heidelberg, 6 2009.</p><p>Adolfo Lozano-Tello and Asunción Gómez-Pérez. OntoMetric: A method to choose the appropriate ontology. Journal of Database Management Special Issue on Ontological analysis, Evaluation, and Engineering of Business Systems Analysis Methods, 15(2), 2004.</p><p>Adolfo Lozano-Tello. Métrica de idoneidad de ontologías. PhD thesis, Universidad de Extremadura, 2002.</p><p>Alexander Maedche and Steffen Staab. Measuring similarity between ontologies. In Proc. Of the European Conference on Knowledge Acquisition and Management - EKAW-2002. Madrid, Spain, October 1-4, 2002, volume 2473 of LNCS/LNAI. Springer, 2002.</p><p>René Magritte. The treachery of images, 1929.</p><p>Deborah L. McGuinness. Ontologies come of age. In Dieter Fensel, Jim Hendler, Henry Lieberman, and Wolfgang Wahlster, editors, Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003.</p><p>Alistair Miles and Sean Bechhofer. SKOS Simple Knowledge Organization System Reference, 2009. W3C Recommendation 18 August 2009, available at http://www.w3.org/TR/skos-reference/.</p><p>Malgorzata Mochol, Anne Cregan, Denny Vrandečić, and Sean Bechhofer. Exploring OWL and rules: a simple teaching case. International Journal of Teaching and Case Studies (IJTCS), 1(4):299–318, 11 2008.</p><p>Boris Motik and Ian Horrocks. Problems with OWL syntax. In OWLED2006 Second Workshop on OWL Experiences and Directions, Athens, GA, USA, 2006.</p><p>Boris Motik, Denny Vrandečić, Pascal Hitzler, York Sure, and Rudi Studer. dlpconvert - Converting OWL DLP statements to logic programs. In Heiner Stuckenschmidt, editor, Poster and Demonstration Proceedings of the ESWC 2005, 5 2005.</p><p>Boris Motik, Peter F. Patel-Schneider, and Bernardo Cuenca Grau. OWL2 Web Ontology Language: Direct semantics, 2009. W3C Recommendation 27 October 2009, available at http://www.w3.org/TR/owl2-semantics/.</p><p>Boris Motik, Peter F. 
Patel-Schneider, and Bijan Parsia. OWL2 Web Ontology Language: Structural specification and functional-style syntax, 2009. W3C Recommendation 27 October 2009, available at http://www.w3.org/TR/owl2-syntax/.</p><p>Boris Motik. Reasoning in Description Logics using Resolution and Deductive Databases. PhD thesis, Universität Fridericiana zu Karlsruhe (TH), Germany, 2006.</p><p>Boris Motik. On the properties of metamodeling in OWL1. Journal of Logic and Computation, 17(4):617–637, 2007.</p><p>Lyndon J. B. Nixon and Elena Paslaru Bontas Simperl. Makna and MultiMakna: towards semantic and multimedia capability in wikis for the emerging web. In Sebastian Schaffert and York Sure, editors, Proc. Semantics 2006. Österreichische Computer Gesellschaft, 2006.</p><p>Natalya F. Noy, R. Fergerson, and Mark Musen. The knowledge model of Protégé-2000: Combining interoperability and flexibility. In Dieng and Corby (2002), pages 17–32.</p><p>Daniel Oberle, Steffen Lamparter, Stephan Grimm, Denny Vrandečić, Steffen Staab, and Aldo Gangemi. Towards ontologies for formalizing modularization and communication in large software systems. Applied Ontology, 1(2):163–202, 2006.</p><p>Leo Obrst, Werner Ceusters, Inderjeet Mani, Steve Ray, and Barry Smith. The evaluation of ontologies. In Christopher J.O. Baker and Kei-Hoi Cheung, editors, Revolutionizing Knowledge Discovery in the Life Sciences, chapter 7, pages 139–158. Springer, 2007.</p><p>Bijan Parsia, Evren Sirin, and Aditya Kalyanpur. Debugging OWL ontologies. In Proceedings of the 14th World Wide Web Conference (WWW2005), Chiba, Japan, May 2005.</p><p>Blaise Pascal. Pensées. 1670.</p><p>Chintan Patel, Kaustubh Supekar, Yugyung Lee, and E. K. Park. OntoKhoj: a semantic web portal for ontology searching, ranking and classification. In Proceedings of Fifth ACM International Workshop on Web information and data management, pages 58–61, New York, NY, USA, 2003.</p><p>Peter F. Patel-Schneider and Boris Motik. OWL2 Web Ontology Language: Mapping to RDF graphs, 2009. W3C Recommendation 27 October 2009, available at http://www.w3.org/TR/owl-mapping-to-rdf/.</p><p>Peter F. Patel-Schneider, Patrick Hayes, and Ian Horrocks. OWL Web Ontology Language Semantics and Abstract Syntax, 2004. W3C Recommendation 10 February 2004, available at http://www.w3.org/TR/2004/REC-owl-semantics-20040210/.</p><p>Addison Phillips and Mark Davis. Matching of Language Tags. Technical Report RFC 4647, Internet Engineering Task Force, September 2006. RFC 4647 (available at http://www.ietf.org/rfc/rfc4647.txt).</p><p>Addison Phillips and Mark Davis. Tags for Identifying Languages. Technical Report RFC 4646, Internet Engineering Task Force, September 2006. RFC 4646 (available at http://www.ietf.org/rfc/rfc4646.txt).</p><p>Robert M. Pirsig. Zen and the Art of Motorcycle Maintenance: An Inquiry into Values. Bantam, 1984.</p><p>Plato. Phaedrus. Oxford University Press, 370 BC. translated by Benjamin Jowett.</p><p>Jeffery T. Pollock. Semantic Web for Dummies. Wiley, 2009.</p><p>Robert Porzel and Rainer Malaka. A task-based approach for ontology evaluation. In Paul Buitelaar, Siegfried Handschuh, and Bernardo Magnini, editors, Proceedings of ECAI 2004 Workshop on Ontology Learning and Population, Valencia, Spain, August 2004.</p><p>Neil Postman. Language education in a knowledge context. ETC: A Review of General Semantics, 37(1):25–37, 1980.</p><p>Eric Prud’hommeaux and Andy Seaborne. SPARQL query language for RDF. W3C Recommendation 15 January 2008, January 2008. available at http://www.w3.org/TR/rdf-sparql-query/.</p><p>Charbel Rahhal, Hala Skaf-Molli, Pascal Molli, and Stéphane Weiss. Multi-synchronous collaborative semantic wikis. In Gottfried Vossen, Darrell D. E. 
Long, and Jeffrey Xu Yu, editors, Proceedings of the International Conference on Web Information Systems Engineering (Wise 2009), volume 5802 of LNCS, Poznan, Poland, October 2009.</p><p>Eric Raymond. The Cathedral and the Bazaar. O’Reilly, Sebastopol, CA, revised edition, January 2001. Available at http://catb.org/~esr/writings/cathedral-bazaar/.</p><p>Alan Rector. Representing specified values in OWL: “value partitions” and “value sets”. W3C Working Group Note, May 2005. available at http://www.w3.org/TR/swbp-specified-values/.</p><p>Sebastian Rudolph, Johanna Völker, and Pascal Hitzler. Supporting lexical ontology learning by relational exploration. In Uta Priss, Simon Polovina, and Richard Hill, editors, Proceedings of the Conference on Conceptual Structures: Knowledge Architectures for Smart Applications (ICCS 2007), volume 4604 of LNAI, pages 488–491, Sheffield, UK, July 2007. Springer.</p><p>Marta Sabou, Jorge Gracia, Sofia Angeletou, Mathieu d’Aquin, and Enrico Motta. Evaluating the semantic web: A task-based approach. In Karl Aberer, Key-Sun Choi, Natasha Fridman Noy, Dean Allemang, Kyung-Il Lee, Lyndon J. B. Nixon, Jennifer Golbeck, Peter Mika, Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and Philippe Cudré-Mauroux, editors, Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC2007+ASWC2007), volume 4825 of Lecture Notes in Computer Science, pages 423–437, Busan, South Korea, November 2007. Springer.</p><p>Leo Sauermann and Richard Cyganiak. Cool URIs for the semantic web, March 2008. W3C Interest Group Note, available at http://www.w3.org/TR/cooluris/.</p><p>Andrea Schaerf. Reasoning with individuals in concept languages. Data and Knowledge Engineering, 13(2):141–176, September 1994.</p><p>Sebastian Schaffert, Julia Eder, Szaby Grünwald, Thomas Kurz, Mihai Radulescu, Rolf Sint, and Stephanie Stroka. KiWi - a platform for semantic social software. 
In Christoph Lange, Sebastian Schaffert, Hala Skaf-Molli, and Max Völkel, editors, 4th Workshop on Semantic Wikis (SemWiki2009) at the European Semantic Web Conference (ESWC 2009), volume 646 of CEUR-WS, Herakleion, Greece, June 2009.</p><p>Sebastian Schaffert. IkeWiki: A semantic wiki for collaborative knowledge management. In Robert Tolksdorf, Elena Simperl, and Klaus Schild, editors, 1st International Workshop on Semantic Technologies in Collaborative Applications (STICA 2006) at the 15th IEEE International Workshops on Enabling Technologies: Infrastructures for Collaborative Enterprises (WETICE 2006), pages 388–396, June 2006.</p><p>Mathias Schindler and Denny Vrandečić. Introducing new features to Wikipedia: Case studies for Web Science. IEEE Intelligent Systems, 2010. to appear.</p><p>Toby Segaran, Jamie Taylor, and Colin Evans. Programming the Semantic Web. O’Reilly, July 2009.</p><p>William Shakespeare. Romeo and Juliet. John Danter, London, 1597.</p><p>Evren Sirin and Jiao Tao. Towards integrity constraints in OWL. In Peter Patel-Schneider and Rinke Hoekstra, editors, Proceedings of the Workshop OWL Experiences and Directions 2009 (OWLED 2009) at the 8th International Semantic Web Conference (ISWC 2009), October 2009.</p><p>Evren Sirin, Bijan Parsia, Bernardo Cuenca Grau, Aditya Kalyanpur, and Yarden Katz. Pellet: A practical OWL-DL reasoner. Journal of Web Semantics, 5(2):51–53, June 2007.</p><p>Barry Smith and Christopher Welty. FOIS introduction: Ontology—towards a new synthesis. In Proceedings of the 2nd international conference on Formal Ontology in Information Systems 2001 (FOIS 2001), pages III–IX, Ogunquit, ME, October 2001. ACM Press.</p><p>Michael K. Smith, Chris Welty, and Deborah McGuinness. OWL Web Ontology Language Guide, February 2004. W3C Recommendation 10 February 2004, available at http://www.w3.org/TR/owl-guide/.</p><p>Adam Souzis. Building a semantic wiki. IEEE Intelligent Systems, 20(5):87–91, September / October 2005.</p><p>Peter Spyns. EvaLexon: Assessing triples mined from texts. Technical Report 09, Star Lab, Brussels, Belgium, May 2005.</p><p>Steffen Staab, Michael Erdmann, and Alexander Maedche. Engineering ontologies using semantic patterns. In Alun Preece and Daniel O’Leary, editors, Proceedings of the Workshop on E-Business & the Intelligent Web at the International Joint Conference on Artificial Intelligence (IJCAI 2001), Seattle, WA, August 2001.</p><p>Ljiljana Stojanović. Methods and Tools for Ontology Evolution. PhD thesis, Universität Karlsruhe (TH), Karlsruhe, Germany, August 2004.</p><p>Peter F. Strawson. Entity and identity. In Hywel David Lewis, editor, Contemporary British Philosophy Fourth Series. Allen and Unwin, London, England, 1976.</p><p>York Sure and Rudi Studer. On-To-Knowledge methodology. In John Davies, Dieter Fensel, and Frank van Harmelen, editors, On-To-Knowledge: Semantic Web enabled Knowledge Management, chapter 3, pages 33–46. J. Wiley and Sons, November 2002.</p><p>York Sure, Jürgen Angele, and Steffen Staab. OntoEdit: Multifaceted inferencing for ontology engineering. In Stefano Spaccapietra, Salvatore March, and Karl Aberer, editors, Journal on Data Semantics I, volume 2800 of LNCS, pages 128–152. Springer, October 2003.</p><p>York Sure, Christoph Tempich, and Denny Vrandečić. SEKT methodology: Final description including guidelines, best practices, and lessons learned. SEKT Deliverable 7.2.2, Institute AIFB, University of Karlsruhe, January 2007.</p><p>Vojtech Svatek. Design Patterns for Semantic Web Ontologies: Motivation and Discussion. 
In Witold Abramowicz, editor, Proceedings of the 7th International Conference on Business Information Systems (BIS 2004), Poznan, Poland, April 2004.</p><p>John Swartzwelder. Homer the Smithers. The Simpsons, 7(17), February 1996.</p><p>Samir Tartir, I. Budak Arpinar, Michael Moore, Amit P. Sheth, and Boanerges Aleman-Meza. OntoQA: Metric-based ontology quality analysis. In Doina Caragea, Vasant Honavar, Ion Muslea, and Raghu Ramakrishnan, editors, Proceedings of IEEE Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources at Fifth IEEE International Conference on Data Mining (ICDM 2005), pages 45–53, November 2005.</p><p>Christoph Tempich, Helena Sofia Pinto, York Sure, and Steffen Staab. An argumentation ontology for DIstributed, Loosely-controlled and evolvInG Engineering processes of oNTologies (DILIGENT). In Asunción Gómez-Pérez and Jérôme Euzenat, editors, Proceedings of the Second European Semantic Web Conference (ESWC 2005), volume 3532 of LNCS, pages 241–256, Heraklion, Greece, May / June 2005.</p><p>Ivan Terziev, Atanas Kiryakov, and Dimitar Manov. Base upper-level ontology (BULO) guidance. SEKT deliverable 1.8.1, Ontotext Lab, Sirma AI EAD (Ltd.), July 2005.</p><p>Barbara Tillett. What is FRBR? – A conceptual model for the bibliographic universe. The Australian Library Journal, 54(1), February 2005.</p><p>Dmitry Tsarkov and Ian Horrocks. FaCT++ description logic reasoner: System description. In Ulrich Furbach and Natarajan Shankar, editors, Proceedings of the Third International Joint Conference on Automated Reasoning (IJCAR 2006), volume 4130 of LNAI, pages 292–297, Seattle, WA, August 2006. Springer.</p><p>UN M.49. Standard Country or Area Codes for Statistical Use, Revision 4. Technical Report 98.XVII.9, United Nations Statistics Division UNSD, 1998.</p><p>The Unicode Consortium. The Unicode Standard, Version 5.0.0. Addison-Wesley, Boston, MA, November 2006.</p><p>Michael Uschold and Michael Gruninger. Ontologies and semantics for seamless connectivity. SIGMOD Record, 33(4):58–64, December 2004.</p><p>Paola Velardi, Roberto Navigli, Alessandro Cucchiarelli, and Francesca Neri. Evaluation of OntoLearn, a methodology for automatic population of domain ontologies. In Paul Buitelaar, Philipp Cimiano, and Bernardo Magnini, editors, Ontology Learning from Text: Methods, Applications and Evaluation, volume 123 of Frontiers in Artificial Intelligence and Applications, pages 92–106. IOS Press, July 2005.</p><p>Jeffrey Voas. Software’s secret sauce: The “-ilities”. IEEE Software, 21(6):14–15, November / December 2004.</p><p>Max Völkel, Markus Krötzsch, Denny Vrandečić, Heiko Haller, and Rudi Studer. Semantic Wikipedia. In Les Carr, David De Roure, Arun Iyengar, Carole A. Goble, and Michael Dahlin, editors, Proceedings of the 15th international conference on World Wide Web (WWW2006), pages 491–495, Edinburgh, Scotland, May 2006. ACM.</p><p>Johanna Völker, Denny Vrandečić, and York Sure. Automatic evaluation of ontologies (AEON). In Yolanda Gil, Enrico Motta, V. Richard Benjamins, and Mark A. Musen, editors, Proceedings of the Fourth International Semantic Web Conference (ISWC 2005), volume 3729 of LNCS, pages 716–731, Galway, Ireland, November 2005. Springer.</p><p>Johanna Völker, Denny Vrandečić, York Sure, and Andreas Hotho. Learning disjointness. In Enrico Franconi, Michael Kifer, and Wolfgang May, editors, Proceedings of the 4th European Semantic Web Conference (ESWC 2007), volume 4519 of LNCS, pages 175–189, Innsbruck, Austria, June 2007. Springer.</p><p>Johanna Völker, Denny Vrandečić, Andreas Hotho, and York Sure. AEON - an approach to the automatic evaluation of ontologies. Applied Ontology, 3(1-2):41–62, January 2008.</p><p>Jakob Voss. Collaborative thesaurus tagging the Wikipedia way. CoRR, abs/cs/0604036, April 2006. Available at http://arxiv.org/abs/cs/0604036.</p><p>Denny Vrandečić, H. Sofia Pinto, York Sure, and Christoph Tempich. The DILIGENT knowledge processes. Journal of Knowledge Management, 9(5):85–96, October 2005.</p><p>Denny Vrandečić and Aldo Gangemi. Unit tests for ontologies. In Mustafa Jarrar, Claude Ostyn, Werner Ceusters, and Andreas Persidis, editors, Proceedings of the 1st International Workshop on Ontology content and evaluation in Enterprise (OntoContent 2006) at On the Move Federated Conferences (OTM2006), volume 4278 of LNCS, pages 1012–1020, Montpellier, France, October 2006. Springer.</p><p>Denny Vrandečić and Markus Krötzsch. Reusing ontological background knowledge in semantic wikis. In Max Völkel, Sebastian Schaffert, and Stefan Decker, editors, Proceedings of the 1st Workshop on Semantic Wikis – From Wikis to Semantics (SemWiki2006) at the 3rd European Semantic Web Conference (ESWC 2006), volume 206 of CEUR-WS, June 2006.</p><p>Denny Vrandečić and York Sure. How to design better ontology metrics. In Enrico Franconi, Wolfgang May, and Michael Kifer, editors, Proceedings of the 4th European Semantic Web Conference (ESWC 2007), volume 4519 of LNCS, pages 311–325, Innsbruck, Austria, June 2007. Springer.</p><p>Denny Vrandečić, Mari del Carmen Suárez-Figueroa, Aldo Gangemi, and York Sure, editors. 
Proceedings of the 4th International Workshop on Evaluation of Ontologies for the Web (EON2006) at the 15th International World Wide Web Conference (WWW 2006), volume 179 of CEUR-WS, Edinburgh, Scotland, May 2006.</p>
<p>Denny Vrandečić, York Sure, and Christoph Tempich. SEKT methodology: Initial lessons learned and tool design. SEKT Deliverable 7.2.1, Institute AIFB, University of Karlsruhe, January 2006.</p>
<p>Denny Vrandečić, Johanna Völker, Peter Haase, Duc Thanh Tran, and Philipp Cimiano. A metamodel for annotations of ontology elements in OWL DL. In York Sure, Saartje Brockmans, and Jürgen Jung, editors, Proceedings of the Second Workshop on Ontologies and Meta-Modeling, Karlsruhe, Germany, October 2006. GI Gesellschaft für Informatik.</p>
<p>Denny Vrandečić, York Sure, Raul Palma, and Francisco Santana. Ontology repositories and content evaluation. Knowledge Web Deliverable D1.2.10v2, Institute AIFB, University of Karlsruhe, June 2007.</p>
<p>Denny Vrandečić, Frank Dengler, Sebastian Rudolph, and Michael Erdmann. RDF syntax normalization using XML validation. In Lalana Kagal, Ora Lassila, and Tim Finin, editors, Proceedings of the Workshop Semantics for the Rest of Us 2009 (SemRUs 2009) at the 8th International Semantic Web Conference (ISWC 2009), Chantilly, VA, November 2009.</p>
<p>Denny Vrandečić. Explicit knowledge engineering patterns with macros. In Chris Welty and Aldo Gangemi, editors, Proceedings of the Ontology Patterns for the Semantic Web Workshop (OPSW2005) at the 8th International Semantic Web Conference (ISWC 2005), Galway, Ireland, November 2005.</p>
<p>Denny Vrandečić, Raúl García-Castro, Asunción Gómez-Pérez, York Sure, and Zhisheng Huang, editors. Proceedings of the 5th International Workshop on Evaluation of Ontologies for the Web (EON2007) at the 6th International Semantic Web Conference (ISWC 2007), volume 329 of CEUR-WS, Busan, South Korea, November 2007.</p>
<p>Denny Vrandečić. Knowledge leveraging and repair – early prototypes. ACTIVE Deliverable D1.4.1, Institute AIFB, University of Karlsruhe, March 2009.</p>
<p>Denny Vrandečić. Ontology evaluation. In Rudi Studer and Steffen Staab, editors, Handbook on Ontologies, 2nd edition, chapter 13. Springer, August 2009.</p>
<p>Denny Vrandečić. Towards automatic content quality checks in semantic wikis. In Mark Greaves, Li Ding, Jie Bao, and Uldis Bojars, editors, Social Semantic Web: Where Web 2.0 Meets Web 3.0, AAAI Spring Symposium 2009, Stanford, CA, March 2009. AAAI Press.</p>
<p>Laurence Wachowski and Paul Wachowski. Matrix Revolutions. Warner Bros., USA, November 2003.</p>
<p>Taowei David Wang and Bijan Parsia. Ontology performance profiling and model examination: First steps. In Karl Aberer, Key-Sun Choi, Natasha Fridman Noy, Dean Allemang, Kyung-Il Lee, Lyndon J. B. Nixon, Jennifer Golbeck, Peter Mika, Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and Philippe Cudré-Mauroux, editors, Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC2007+ASWC2007), volume 4825 of Lecture Notes in Computer Science, pages 595–608, Busan, South Korea, November 2007. Springer.</p>
<p>Taowei David Wang, Bijan Parsia, and James A. Hendler. A survey of the web ontology landscape.
In Isabel Cruz, Stefan Decker, Dean Allemang, Chris Preist, Daniel Schwabe, Peter Mika, Michael Uschold, and Lora Aroyo, editors, Proceedings of the Fifth International Semantic Web Conference (ISWC’06), volume 4273 of Lecture Notes in Computer Science, pages 682–694, Athens, Georgia, November 2006. Springer.</p>
<p>Roger Waters and David Gilmour. Wish You Were Here. In Pink Floyd – Wish You Were Here. Harvest Records, September 1975.</p>
<p>Chris Welty. OntOWLClean: Cleaning OWL ontologies with OWL. In Brandon Bennett and Christiane Fellbaum, editors, Proceedings of the Fourth International Conference on Formal Ontologies in Information Systems (FOIS 2006), volume 150 of Frontiers in Artificial Intelligence and Applications, Baltimore, MD, November 2006. IOS Press.</p>
<p>Takashi Yamauchi. The semantic web and human inference: A lesson from cognitive science. In Karl Aberer, Key-Sun Choi, Natasha Fridman Noy, Dean Allemang, Kyung-Il Lee, Lyndon J. B. Nixon, Jennifer Golbeck, Peter Mika, Diana Maynard, Riichiro Mizoguchi, Guus Schreiber, and Philippe Cudré-Mauroux, editors, Proceedings of the 6th International Semantic Web Conference and 2nd Asian Semantic Web Conference (ISWC2007+ASWC2007), volume 4825 of LNCS, pages 609–622, Busan, South Korea, November 2007. Springer.</p>
<p>Jonathan Yu, James A. Thom, and Audrey Tam. Ontology evaluation using Wikipedia categories for browsing. In Alberto H. F. Laender, André O. Falcão, Øystein Haug Olsen, Mário J. Silva, Ricardo Baeza-Yates, Deborah L. McGuinness, and Bjorn Olstad, editors, Proceedings of the ACM Sixteenth Conference on Information and Knowledge Management (CIKM), pages 223–232, Lisboa, Portugal, November 2007. ACM.</p>
<p>Evgeny Zolin. Complexity of reasoning in description logics. http://www.cs.man.ac.uk/~ezolin/dl/, 2010. Last accessed: 14 January 2010.</p>
<p>Full Table of Contents</p>
<p>Acknowledgements ...... 5
Abstract ...... 7</p>
<p>I Foundations 11</p>
<p>1 Introduction 13
1.1 Motivation ...... 14
1.1.1 Advantages of better ontologies ...... 15
1.1.2 Increasing ontology availability ...... 15
1.1.3 Lower maintenance costs ...... 16
1.2 Contribution ...... 16
1.2.1 A framework for ontology evaluation ...... 16
1.2.2 Methods for ontology evaluation ...... 17
1.2.3 Implementation ...... 17
1.3 Readers’ guide ...... 18
1.3.1 Foundations ...... 18
1.3.2 Aspects ...... 19
1.3.3 Application ...... 20
1.4 Relation to previous publications ...... 20</p>
<p>2 Terminology and Preliminaries 23
2.1 Ontologies ...... 23
2.2 Axioms ...... 25
2.2.1 Facts ...... 25
2.2.2 Class axioms ...... 26
2.2.3 Property axioms ...... 28
2.2.4 Annotations ...... 29
2.3 Entities ...... 29
2.3.1 Individuals ...... 30
2.3.2 Classes ...... 30
2.3.3 Properties ...... 30
2.3.4 Ontologies ...... 31
2.4 Semantics ...... 31</p>
<p>3 Framework 37
3.1 Overview ...... 37
3.2 Meta-ontology ...... 39
3.2.1 Reifying ontologies ...... 39
3.2.2 Reifying URIs ...... 41
3.2.3 Advantages of a meta-ontology ...... 41
3.3 Types of ontologies ...... 42
3.3.1 Terminological ontology ...... 42
3.3.2 Knowledge base ...... 43
3.3.3 Semantic spectrum ...... 44
3.3.4 Classification example ...... 47
3.4 Limits ...... 48
3.5 Conceptualizations ...... 49
3.6 Criteria ...... 53
3.6.1 Accuracy ...... 56
3.6.2 Adaptability ...... 56
3.6.3 Clarity ...... 57
3.6.4 Completeness ...... 57
3.6.5 Computational efficiency ...... 58
3.6.6 Conciseness ...... 58
3.6.7 Consistency ...... 59
3.6.8 Organizational fitness ...... 59
3.7 Methods ...... 60
3.8 Aspects ...... 61</p>
<p>II Aspects 63</p>
<p>4 Vocabulary 65
4.1 URI references ...... 65
4.1.1 Linked data ...... 66
4.1.2 Hash vs slash ...... 70
4.1.3 Opaqueness of URIs ...... 72
4.1.4 URI reuse ...... 73
4.1.5 URI declarations and punning ......
75
4.2 Literals ...... 75
4.2.1 Typed literals and datatypes ...... 76
4.2.2 Language tags ...... 78
4.2.3 Labels and comments ...... 80
4.3 Blank nodes ...... 81</p>
<p>5 Syntax 83
5.1 Syntactic comments ...... 84
5.2 Qualified names ...... 85
5.3 XML validation ...... 85
5.3.1 Example ...... 87
5.3.2 Motivation ...... 89
5.3.3 Normalizing the serialization ...... 92
5.3.4 Creation of compliant schemas ...... 93
5.3.5 Implementation ...... 95
5.3.6 Related approaches ...... 96
5.3.7 Open questions for XML schema-based RDF validation ...... 98</p>
<p>6 Structure 99
6.1 Structural metrics in practice ...... 100
6.1.1 Maximum depth of the taxonomy ...... 102
6.1.2 Class / relation ratio ...... 104
6.1.3 Relationship richness ...... 104
6.1.4 Semantic similarity measure ...... 106
6.2 SPARQL for finding patterns ...... 107
6.2.1 Translating patterns from OWL to RDF ...... 107
6.2.2 Querying with SPARQL ...... 110
6.2.3 Ontology normalization ...... 110
6.2.4 Incomplete searches ...... 111
6.2.5 Querying for anti-patterns ...... 113
6.3 The AEON approach ...... 115
6.3.1 OntoClean in theory ...... 117
6.3.2 OntoClean meta-properties ...... 118
6.3.3 OntoClean constraints ...... 119
6.3.4 Constraint checking ...... 120
6.3.5 Analysis and Examples ...... 122</p>
<p>7 Semantics 127
7.1 Normalization ...... 128
7.1.1 First normalization ...... 130
7.1.2 Second normalization ...... 131
7.1.3 Third normalization ...... 132
7.1.4 Fourth normalization ...... 133
7.1.5 Fifth normalization ...... 134
7.1.6 Examples of normalization ...... 134
7.2 Stability ...... 136
7.3 Language completeness ...... 141</p>
<p>8 Representation 143
8.1 Ontological metrics ...... 143
8.2 Maximum depth of the taxonomy ...... 145
8.3 Class / relation ratio ...... 146
8.4 Relationship richness ...... 147
8.5 Semantic similarity measure ...... 149</p>
<p>9 Context 151
9.1 Unit tests ...... 152
9.1.1 Formalized competency questions ...... 154
9.1.2 Affirming derived knowledge ...... 156
9.1.3 Asserting agnosticism ...... 158
9.2 Increasing expressivity for consistency checking ...... 158
9.2.1 Expressive consistency checks ...... 159
9.2.2 Consistency checking with rules ...... 160
9.2.3 Use of autoepistemic operators ...... 161
9.2.4 Domain and ranges as constraints ...... 163</p>
<p>III Application 165</p>
<p>10 Collaborative ontology evaluation in Semantic MediaWiki 167
10.1 Annotation of wiki pages ...... 169
10.1.1 Content structuring in MediaWiki ...... 169
10.1.2 Semantic annotations in SMW ...... 170
10.1.3 Mapping to OWL ...... 173
10.2 Exploiting semantics ...... 174
10.2.1 Browsing ...... 174
10.2.2 Querying ...... 175
10.2.3 Giving back to the Web ...... 178
10.3 Related systems ...... 179
10.4 Collaborative ontology evaluation ...... 180
10.4.1 Concept cardinality ...... 180
10.4.2 Class disjointness ...... 181
10.4.3 Property cardinality constraints ...... 182
10.4.4 Social and usability aspects ...... 183</p>
<p>11 Related work 185
11.1 Frameworks and aspects ...... 185
11.2 Methods and approaches ...... 190
11.2.1 Golden standard – similarity-based approaches ...... 192
11.2.2 Task-based evaluations ...... 193
11.2.3 Data-driven evaluation – fitting the data set ...... 194
11.3 Watson corpus ...... 196</p>
<p>12 Conclusions 197
12.1 Achievements ...... 198
12.2 Open questions ...... 199</p>
<p>IV Appendix 201</p>
<p>List of Methods ...... 203
List of Tables ...... 205
List of Figures ...... 207
Bibliography ...... 209
Full Table of Contents ...... 230</p> </div> </article> </div> </div> </div> </html>