Completing Description Logic Knowledge Bases Using Formal Concept Analysis∗

Completing Description Logic Knowledge Bases using Formal Concept Analysis∗ Franz Baader,1 Bernhard Ganter,1 Barıs¸ Sertkaya,1 and Ulrike Sattler2 1TU Dresden, Germany and 2The University of Manchester, UK Abstract concerned with a different quality dimension: completeness. We provide a basis for formally well-founded techniques and We propose an approach for extending both the ter- tools that support the ontology engineer in checking whether minological and the assertional part of a Descrip- an ontology contains all the relevant information about the tion Logic knowledge base by using information application domain, and to extend the ontology appropriately provided by the knowledge base and by a domain if this is not the case. expert. The use of techniques from Formal Con- A DL knowledge base (nowadays often called ontology) cept Analysis ensures that, on the one hand, the usually consists of two parts, the terminological part (TBox), interaction with the expert is kept to a minimum, which defines concepts and also states additional constraints and, on the other hand, we can show that the ex- (so-called general concept inclusions, GCIs) on the interpre- tended knowledge base is complete in a certain, tation of these concepts, and the assertional part (ABox), well-defined sense. which describes individuals and their relationship to each other and to concepts. Given an application domain and a DL 1 Introduction knowledge base (KB) describing it, we can ask whether the KB contains all the relevant information1 about the domain: Description Logics (DLs) [Baader et al., 2003] are a suc- • cessful family of logic-based knowledge representation for- Are all the relevant constraints that hold between con- malisms, which can be used to represent the conceptual cepts in the domain captured by the TBox? knowledge of an application domain in a structured and for- • Are all the relevant individuals existing in the domain mally well-understood way. They are employed in vari- represented in the ABox? ous application domains, such as natural language process- ing, configuration, databases, and bio-medical ontologies, but As an example, consider the OWL ontology for human pro- their most notable success so far is due to the fact that DLs tein phosphatases that has been described and used in [Wol- provide the logical underpinning of OWL, the standard ontol- stencroft et al., 2005]. This ontology was developed based ogy language for the semantic web [Horrocks et al., 2003]. on information from peer-reviewed publications. The human As a consequence of this standardization, several ontology protein phosphatase family has been well characterised exper- editors support OWL [Knublauch et al., 2004; Oberle et al., imentally, and detailed knowledge about different classes of 2004; Kalyanpur et al., 2006a], and ontologies written in such proteins is available. This knowledge is represented in OWL are employed in more and more applications. As the the terminological part of the ontology. Moreover, a large set size of these ontologies grows, tools that support improving of human phosphatases has been identified and documented their quality become more important. The tools available un- by expert biologists. These are described as individuals in the til now use DL reasoning to detect inconsistencies and to infer assertional part of the ontology. One can now ask whether the consequences, i.e., implicit knowledge that can be deduced information about protein phosphatases contained in this on- from the explicitly represented knowledge. There are also tology is complete. Are all the relationships that hold among promising approaches that allow to pinpoint the reasons for the introduced classes of phosphatases captured by the con- inconsistencies and for certain consequences, and that help straints in the TBox, or are there relationships that hold in the the ontology engineer to resolve inconsistencies and to re- domain, but do not follow from the TBox? Are all possible move unwanted consequences [Schlobach and Cornet, 2003; kinds of human protein phosphatases represented by individ- Kalyanpur et al., 2006b]. These approaches address the qual- uals in the ABox, or are there phosphatases that have not yet ity dimension of soundness of an ontology, both within it- been included in the ontology or even not yet been identified? self (consistency) and w.r.t. the intended application domain Such questions cannot be answered by an automated tool (no unwanted consequences). In the present paper, we are alone. Clearly, to check whether a given relationship between ∗Supported by DFG (GRK 334/3) and the EU (IST-2005-7603 1The notion of “relevant information” must, of course, be formal- FET project TONES and NoE 507505 Semantic Mining). ized appropriately for this problem to be addressed algorithmically. IJCAI-07 230 concepts—which does not follow from the TBox—holds in ing the exploration process. the domain, one needs to ask a domain expert, and the same In the next section, we introduce our variant of FCA that is true for questions regarding the existence of individuals not can deal with partial contexts, and describe an attribute ex- described in the ABox. The rôle of the automated tool is to ploration procedure that works with partial contexts. In Sec- ensure that the expert is asked as few questions as possible; tion 3, we show how a DL knowledge base gives rise to a in particular, she should not be asked trivial questions, i.e., partial context, define the notion of a completion of a knowl- questions that could actually be answered based on the rep- edge base w.r.t. a fixed model, and show that the attribute resented knowledge. In the above example, answering a non- exploration algorithm developed in the previous section can trivial question regarding human protein phosphatases may be used to complete a knowledge base. In Section 4 we de- require the biologist to study the relevant literature, query scribe ongoing and future work. This paper is accompanied existing protein databases, or even to carry out new exper- by a technical report [Baader et al., 2006] containing more iments. Thus, the expert may be prompted to acquire new details and in particular full proofs of our results. biological knowledge. Attribute exploration is an approach developed in Formal 2 Exploring Partial Contexts [ ] Concept Analysis (FCA) Ganter and Wille, 1999 that can In this section, we extend the classical approach to Formal be used to acquire knowledge about an application domain Concept Analysis (FCA), as described in detail in [Ganter by querying an expert. One of the earliest applications of this and Wille, 1999], to the case of individuals (called objects [ ] approach is described in Wille, 1982 , where the domain is in FCA) that have only a partial description in the sense that, lattice theory, and the goal of the exploration process is to for some properties (called attributes in FCA), it is not known find, on the one hand, all valid relationships between prop- whether they are satisfied by the individual or not. The con- erties of lattices (like being distributive), and, on the other nection between this approach and the classical approach is hand, to find counterexamples to all the relationships that do explained in [Baader et al., 2006]. In the following, we as- not hold. To answer a query whether a certain relationship sume that we have a finite set M of attributes and a (possibly holds, the lattice theory expert must either confirm the rela- infinite) set of objects. tionship (by using results from the literature or carrying out a new proof for this fact), or give a counterexample (again, by Definition 2.1 A partial object description (pod) is a tuple either finding one in the literature or constructing a new one). (A, S) where A, S ⊆ M are such that A ∩ S = ∅.Wecall A ∪ S = M Although this sounds very similar to what is needed in our such a pod a full object description (fod) if .A context, we cannot directly use this approach. The main rea- set of pods is called a partial context and a set of fods a full son is the open-world semantics of description logic knowl- context. edge bases. Consider an individual i from an ABox A and Intuitively, the pod (A, S) says that the object it describes a concept C occurring in a TBox T . If we cannot deduce satisfies all attributes from A and does not satisfy any attribute from the TBox T and A that i is an instance of C,thenwe from S. For the attributes not contained in A ∪ S, nothing is do not assume that i does not belong to C. Instead, we only known w.r.t. this object. A partial context can be extended by accept this as a consequence if T and A imply that i is an either adding new pods or by extending existing pods. ¬C instance of . Thus, our knowledge about the relationships (A,S) between individuals and concepts is incomplete: if T and A Definition 2.2 We say that the pod extends the pod (A, S), and write this as (A, S) ≤ (A,S),ifA ⊆ A and imply neither C(i) nor ¬C(i), then we do not know the re- i C S ⊆ S . Similarly, we say that the partial context K extends lationship between and . In contrast, classical FCA and K K≤K attribute exploration assume that the knowledge about indi- the partial context , and write this as , if every pod in K is extended by some pod in K.IfK is a full context and viduals is complete: the basic datastructure is that of a formal K≤K K K (A, S) context, i.e., a crosstable between individuals and properties. ,then is called a realizer of .If is a fod (A, S) ≤ (A, S) (A, S) A cross says that the property holds, and the absence of a and , then we also say that realizes (A, S) cross is interpreted as saying that the property does not hold.

Load more