Evaluating Ontology Completeness via SPARQL and Relations-between-classes based Constraints
Philippe Martin
To cite this version: Philippe Martin. Evaluating Ontology Completeness via SPARQL and Relations-between-classes based Constraints. 11th International Conference on the Quality of Information and Communications Technology, Sep 2018, Coimbra, Portugal. pp.255-263. hal-01924566

HAL Id: hal-01924566
https://hal.univ-reunion.fr/hal-01924566
Submitted on 16 Nov 2018

Philippe A. Martin
EA2525 LIM, Uni. of La Réunion (and adjunct researcher of the School of ICT, Griffith University, Australia)
F-97490 Sainte Clotilde, France

Abstract—This article first distinguishes constraints from rules, and descriptive constraints from prescriptive ones. Both kinds can be used to calculate a constraint-based completeness (as opposed to a real-world-based completeness), i.e. to evaluate how much of a knowledge base is complete with respect to some constraints, e.g. to evaluate how well this base follows given ontology design patterns or best practices. Such evaluations may also guide knowledge elicitation and modeling. This article explores the ways constraints can be represented via relations between classes, hence via any knowledge representation language (KRL) that has an expressiveness at least equal to RDF or RDFS. Compared to the popular practice of both representing and checking constraints via queries, this approach is as simple, offers more possibilities for exploiting both knowledge and constraints, and permits the selection and use of inference engines adapted to the expressiveness of the exploited knowledge instead of the use of restricted or ad hoc constraint-validation tools. This approach is also modular in the sense that it separates content from usage: the represented “content focused constraints” can then be exploited via a few “content independent” queries, one for each usage and kind of constraint.

Keywords—constraints, ontology completeness, OWL, SPARQL

I. INTRODUCTION

Knowledge representations (KRs) are formal descriptions enabling automatic logical inferencing, and thus automatic KR comparison, search, merge, etc. KRs are logic formulas, e.g. the binary predicates of first-order logic; these predicates are called triples or property instances in RDF and binary relations in Conceptual Graphs (CGs) [1]. For the purpose of clarity, this article uses the intuitive terminology of CGs: (information) objects are either types or individuals, and types are either relation types or concept types (classes and datatypes in RDF). A formal knowledge base (KB) is a collection of such objects written using a KR language (KRL). An ontology is a KB that is essentially about types, rather than about individuals.

Creating a KB or evaluating its quality (for knowledge sharing or exploitation purposes, or for designing or generating software, or evaluating their qualities) are difficult tasks. Models and constraints (e.g. design patterns) help these tasks and can be stored into an ontology. E.g., the author of this article is building an ontology representing and organizing ontology design patterns as well as software design patterns. Reference [2], a survey on quality assessment for Linked Data, provides many dimensions and metrics for evaluating the quality of KBs and hence helping the selection or design of KBs. One of the quality dimensions is the (degree of) completeness of a KB with respect to some criteria or constraints: concisely, “its completeness”.

Evaluating this degree is common in various tasks or fields but is performed differently by different tools and sometimes in implicit or ad hoc ways. Examples of such tasks or fields are: i) the automatic/manual extraction of knowledge or the creation of a KB, ii) the exploitation of ontology design patterns, KB design libraries (e.g., the KADS library) or top-level ontologies (e.g., DOLCE), and iii) the evaluation of ontologies or, more generally, datasets. In this last field, as noted in [2], completeness commonly refers to a degree to which the “information required to satisfy some given criteria or a given query” is present in the considered dataset. To complement this very general definition, this article distinguishes two kinds of completeness:

Constraint-based completeness measures the percentage of elements in a dataset that satisfy explicit representations of what must or must not be represented in the dataset. These representations are constraints such as integrity constraints or, more generally, those expressed by ontology design patterns and schemas of databases or of structured documents. E.g.: the constraint that, in a particular dataset, at least one movie must be associated with each movie actor.

Real-world-based completeness measures the degree to which certain real-world information is represented in the dataset. E.g., regarding movies associated with an actor, calculating the completeness may consist in dividing “the number of movies associated with this actor in the dataset” by “the number of movies he actually played in”. Either the missing information is found in a gold standard dataset or the degree is estimated via completeness oracles [3], i.e. rules or queries estimating what is missing in the dataset to answer a given query correctly. The four kinds of completeness collected by [2] (schema, property, population and interlinking completeness) assume a closed world and a gold standard dataset. Thus, they are kinds of real-world-based completeness.

One way to define or calculate the constraint-based completeness of a KB is to divide “the number of statements satisfying the constraints in that KB” by “the total number of statements in the KB”. As a variant, instead of statements only, one may want to consider objects, i.e. measure the percentage of objects for which all relations from/to them satisfy the constraints. Other variants may be defined by considering only certain kinds of objects or statements. Defining constraints via KRs, instead of via queries, permits the definition of “content-independent (alias, domain-independent) queries” to exploit these constraints. Otherwise, a different (content-dependent) query has to be created for each variant of constraint-based checking or completeness. Because of this lack of modularity, when stored in an ontology, content-dependent queries are also less easily organized than content-independent ones.

This article does not address real-world-based completeness, but the techniques it proposes may also be used for representing certain domain-specific parts of the rules used for calculating real-world-based completeness. From now on, “completeness” refers to constraint-based completeness.

Section II explores the first research question of this article: what does the expression “must and must not be represented in the dataset” entail or, more precisely, given the “descriptive vs. prescriptive” distinction, what kinds of constraints need to be considered for evaluating constraint-based completeness via content-independent queries?

Section III proposes an approach to answer a second research question: how to represent constraints in a KRL-independent way (or, more precisely, in any KRL that has an expressiveness at least equal to RDF or RDFS) even though actually defining the semantics of some of these constraints would require much more expressive logics? The proposed solution relies i) on the representation of constraints via restricted constructs based on relations between classes (or to [...]

[...] constraints can only be used for checking statements, i.e. that they are not rules allowing the derivation of non-modal statements. More formally, this means that such positive and negative constraints can respectively be translated into the forms “A ∧ ¬B =>> false” and “A ∧ B =>> false” where A and B do not contain a “must” modality and A may be empty. As an example, consider the positive constraint “if x is a Person, x must have a parent”. From this constraint and the fact “Tom is a Person”, an inference engine must not derive “Tom has a parent”. It may derive “Tom must have a parent” but, in practice, such a derivation is not made. As a somewhat opposite example, RDFS-aware engines do not exploit relations of type rdfs:domain or rdfs:range as relation signature constraints but as inference supporting statements: when a relation r has a type partially defined by an rdfs:domain (vs. rdfs:range) relation, RDFS-aware engines may infer a type for the source (vs. destination) of r.

In this article, constraints that are directly represented in a form ending with “=>> false” (or, equivalently, “=>> ⊥”) are called constraints in inconsistency-implying form. Not all KRLs allow rules to be represented (instead of, or in addition to, implications); in those that do, representing negative constraints using the inconsistency-implying