A Formal Approach for RDF/S Ontology Evolution

George Konstantinidis and Giorgos Flouris and Grigoris Antoniou and Vassilis Christophides1

Abstract. In this paper, we consider the problem of ontology evo- the cases and options considered are exhaustive. lution in the face of a change operation. We devise a general-purpose To overcome these limitations, we propose an ontology evolution algorithm for determining the effects and side-effects of a requested framework and elaborate on its formal foundations. Our methodol- elementary or complex change operation. Our work is inspired by ogy is driven by ideas and principles of the belief revision literature belief revision principles (i.e., validity, success and minimal change) [3]. In particular, we adopt the Principle of Success (every change and allows us to handle any change operation in a provably ratio- operation is actually implemented) and the Principle of Validity (re- nal and consistent manner. To the best of our knowledge, this is the sulting ontology is valid, i.e., it satisfies all the validity constraints of first approach overcoming the limitations of existing solutions, which the underlying language). Satisfying both these requirements is not deal with each change operation on a per-case basis. Additionally, trivial, as the straightforward application of a change operation upon we rely on our general change handling algorithm to implement spe- an ontology can often lead to invalidity, in which case certain ad- cialized versions of it, one per desired change operation, in order to ditional actions (side-effects) should be executed to restore validity. compute the equivalent of effects and side-effects.2 Sometimes, there may be more than one ways to do so, in which case a selection mechanism should be in place to determine the “best” op- tion. In this paper, we employ a technique inspired by the Principle of 1 INTRODUCTION Minimal Change [3] (stating that the appropriate result of changing Stored knowledge, in any knowledge-based application, may need an ontology should be as “close” as possible to the original). to change due to various reasons, including changes in the modeled The general idea of our approach is to first determine all the inva- world, new information on the domain, newly-gained access to infor- lidities that any given change (elementary or composite) could cause mation previously unknown, and other eventualities [11]. Here, we upon the updated ontology, using a formal, well-specified validity consider the case of ontologies expressed in RDF/S (as most of the model, and then to determine the best way to overcome potential Semantic Web Schemas (85,45%) are expressed in RDF/S [14]) and invalidity problems in an automatic way, by exploring the various al- introduce a formal framework to handle the evolution of an ontology ternatives and comparing them using a selection mechanism based given a change operation. on an ordering relation on potential side-effects. In particular, our We pay particular attention to the semantics of change operations formal approach is parameterizable to this relation, thus providing which can, in principle, be either elementary (involving a change in a customizable way to guarantee the determination of the “best” re- a single ontology construct) or composite (involving changes in mul- sult. Although our framework is general, in this paper we focus on a tiple constructs) [5]. Even though RDF/S does not support negation, fragment of the RDF/S model which exhibits interesting properties the problem is far from trivial as inconsistencies may rise due to the for deciding query containment and minimization [10]. To the best of validity rules associated with RDF/S ontologies. In fact, naive set- our knowledge, our implementation is the first one that allows pro- theoretical addition or removal of ontological constructs (i.e., direct cessing any type of change operation, and in a fully automatic way. application of a change) has been acknowledged as insufficient for ontology evolution [4, 6, 12]. 2 PROBLEM FORMULATION Most of the implemented ontology management systems (e.g., [1, 2, 8, 13]), are designed using an ad-hoc approach, that solves the 2.1 Modeling RDF/S, ontologies and updates problems related to each change operation on a per-case basis. More In order to abstract from the syntactic peculiarities of the underly- specifically, they explicitly define a finite, and thus incomplete, set of ing language and develop a uniform framework, we will map RDF change operations that they support, and have determined, a priori, to First-Order Logic (FOL). Table 1 (restricted for presentation pur- the semantics of each such operation. Hence, given the lack of a for- poses) shows the FOL representation of certain RDF statements. mal methodology, the designers of these systems have to determine, The language’s semantics is not carried over during the mapping, in advance, all the possible invalidities that could occur in reaction so we need to combine the FOL representation with a set of validity to a change, the various alternatives for handling any such possible rules that capture such semantics. For technical reasons, we assume invalidity, and to pre-select the preferable solutions for implementa- that all constraints can be encoded in the form of (or can be broken tion per case [6]; this selection is hard-coded into the systems’ im- down into a conjunction of) DEDs (disjunctive embedded dependen- plementations. This approach requires a highly tedious, case-based cies), which have the following general form: reasoning which is error-prone and gives no formal guarantee that

∀~uP (~u) → ∨i=1,...,n∃~viQi(~u,~vi) (DED) 1 Institute of Computer Science, FO.R.T.H., Heraklion, Greece, email:gconstan,fgeo,antoniou,[email protected] where: 2 This work was partially supported by the EU projects CASPAR (FP6-2005- IST-033572) and KP-LAB (FP6-2004-IST-27490). • ~u,~vi are tuples of variables Table 1. Representation of RDF facts using FOL predicates Table 2. Validity Rules

RDF triple Intuitive meaning Predicate Rule ID/Name Integrity Constraint Intuitive Meaning C rdf:type rdfs: C is a class CS(C) R2 Domain ∀x, y ∈ Σ: Domain applies to prop- P rdf:type rdf:Property P is a property PS(P ) Applicability Domain(x, y) → erties; the domain of a x rdf:type rdfs:Resource x is a class instance CI(x) PS(x) ∧ CS(y) property is a class P rdfs:domain C domain of property Domain(P,C) R4 C IsA ∀x, y ∈ Σ: Class IsA applies be- P rdfs:range C range of property Range(P,C) Applicability C IsA(x, y) → tween classes C1 rdfs:subClassOf C2 IsA between classes C IsA(C1,C2) CS(x) ∧ CS(y) P1 rdfs:subPropertyOf P2 IsA between properties P IsA(P1,P2) R6 C Inst ∀x, y ∈ Σ: Class Instanceof applies x rdf:type C class instantiation C Inst(x, C) Applicability x P y property instantiation PI(x, y, P ) C Inst(x, y) → between a class instance CI(x) ∧ CS(y) and a class R8 Domain is ∀x, y, z ∈ Σ: The domain of a prop- unique Domain(x, y) → erty is unique • P , Qi are conjunctions of relational atoms of the form ¬Domain(x, z) ∨ (y = 0 z) R(w1, ..., wn) and equality atoms of the form (w = w ), where R10 Domain and ∀x ∈ Σ, ∃y, z ∈ Σ: Each property has a do- 0 w1, ..., wn, w, w are variables or constants Range exists PS(x) → main and a range • P may be the empty conjunction Domain(x, z) ∧ Range(x, y) R11 C IsA ∀x, y, z ∈ Σ: Class IsA is Transitive We employ DEDs, as they are expressive enough for capturing the Transitivity C IsA(x, y) ∧ semantics of different RDF fragments and other simple data mod- C IsA(y, z) → els which are appropriate for our purposes in this paper. Moreover, C IsA(x, z) R12 C IsA ∀x, y, ∈ Σ: Class IsA is Irreflexive DEDs will prove suitable for constructing a convenient mechanism Irreflexivity C IsA(x, y) → for detecting and repairing invalidities. ¬C IsA(y, x) R15 Determining ∀x, y, z ∈ Σ: Class instance propaga- Table 2 shows some rules that are used to capture the semantics of C Inst C Inst(x, y) ∧ tion the various RDF constructs (e.g., R11 captures IsA transitivity), as C IsA(y, z) → C Inst(x, z) well as the restrictions imposed by our RDF model (e.g., R8 captures R17 Property ∀x, y, z, w ∈ Σ: Instanceof between that the domain of a property should be unique). It should be stressed Instance of and PI(x, y, z) ∧ properties reflect in their Domain sources/domains that the semantics of the language captured by tables 1 and 2 essen- Domain(z, w) → 3 C Inst(x, w) tially corresponds to a fragment of the standard RDF/S data model R23 P IsA ∀x, y ∈ Σ: Property IsA is Irreflex- in which there is a clear role distinction between ontology prim- Irreflexivity P IsA(x, y) → ive ¬P IsA(y, x) itives, no cycles in the subsumption relationships, while property subsumption respects corresponding domain/range subsumption re- lationships. Such a fragment has been first studied in [10] in an effort positive ground facts that are already in the ontology. The above two to provide a group of sound and complete algorithms for query con- properties together with the CWA semantics, imply that: tainment and minimization while it is compatible with W3C guide- lines4 for devising restricted fragments of the RDF/S data model. • P (x) ∈ K ⇔ K ` P (x) ⇔ K 0 ¬P (x) Similarly, the general-purpose change handling algorithm presented • P (x) ∈/ K ⇔ K ` ¬P (x) ⇔ K 0 P (x) in this paper can be also applied to other fragments of RDF/S (see An application of these properties is that updating K with ¬P (x) also [7, 9]) or the standard RDF/S semantics. corresponds to contracting P (x) from K, because “incorporating” In Table 2, Σ denotes the set of constants in our language. We ¬P (x) in an ontology could be achieved only by removing P (x) equip our FOL with closed semantics, i.e., CWA (closed world as- from K. Therefore, updating an ontology with negative ground sumption). This means that, for two formulas p, q, if p 0 q then facts corresponds to contraction/erasure in the standard terminology, p ` ¬q. Abusing notation, for two sets of ground facts U, V , we will whereas updating an ontology with positive ground facts corresponds say that U implies V (U ` V ) to denote that U ` p for all p ∈ V . to revision/update in the standard terminology. Any expression of the form P (x1, ..., xk) is called a positive ground fact where P is a predicate of arity k and x1, ..., xk are constant sym- bols. Any expression of the form ¬P (x1, ..., xk) is called a negative 2.2 Updating under constraints ground fact iff P (x , ..., x ) is a positive ground fact. L denotes the 1 k We say that an ontology K satisfies a validity rule c, iff K ` c. set of all well-formed formulae that can be formed in our FOL. We Obviously for a set C of validity rules, K satisfies C (K ` C) iff denote by L+ the set of positive ground facts, L− the set of negative K ` c for all c ∈ C. It is easy to see that for a simple constraint ground facts and set L0 = L+ ∪ L−, called the set of ground facts of the form c = ∀uP (u) → Q(u), where P,Q are simple positive of the language. We define: predicates and u is a variable, it holds that: • An ontology is a set K ⊆ L+ K ` c iff for all constants x : K ` {¬P (x)} or K ` {Q(x)}. • An update is a set U ⊆ L0 This can be easily extended to the general case. Suppose that c = In simple words, an ontology is any set of positive ground facts ∀~uP (~u) → ∨ ∃~v Q (~u,~v ), where P (~u) = P (~u) ∧ ... ∧ whereas an update is any set of positive or negative ground facts. i=1,...,n i i i 1 P (~u) for some k > 0 and Q (~u,~v ) = Q (~u,~v ) ∧ ... ∧ Applying an update to an ontology should result in the incorporation k i i i1 i Q (~u,~v ) for some m > 0 depending on i. Then K ` c iff for of the update in the ontology. im i all tuples of constants x at least one of the following is true (note that By definition, ontologies have two properties: (a) they are always in case of obvious reference to tuples of constants or variables we consistent (in the purely logical sense) and (b) they imply only the will be omitting the 0~0 symbol): 3 http://www.w3.org/TR/rdf-concepts/ 4 http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#technote • There is some j : 0 < j 6 k such that K ` {¬Pj (x)}. • There is some i : 1 6 i 6 n and some tuple of constants z such negative ground facts, so they are updates in our terminology. This is that for all j = 1, 2, ..., m K ` {Qij (x, z)}. a very useful remark, as we will subsequently take advantage of the elements of Comp(c, x), applying them as updates. We can conclude that K ` c iff for all tuples of constants x at least In our example, the validity of rule R2.2, for x = P, y = D can one of the following sets is implied by K: be restored iff either {¬Domain(P,D)} or {CS(D)} are added as additional updates (side-effects) to the ontology. Note that side- • {¬Pj (x)}, 0 < j 6 k effects could trigger side-effects of their own if violating any rules. • {Qi1(x, z) ∧ Qi2(x, z) ∧ ... ∧ Qim(x, z)}, 1 6 i 6 n, z:constant

Based on the above observation, we define the component set of c Table 3. Some validity rules in component set form with respect to some tuple of constants x as follows: Rule ID/Name Components of the rule R2 Domain R2.1 : ∀x, y ∈ Σ: Comp(R2.1, (x, y))= Comp(c, x) = {{¬Pj (x)}|0 < j 6 k} ∪ {{Qi1(x, z) ∧ Qi2(x, z) Applicability {{¬Domain(x, y)}, {PS(x)}} ∧... ∧ Qim(x, z)} |1 6 i 6 n, z : constant} R2.2 : ∀x, y ∈ Σ: Comp(R2.2, (x, y))= {{¬Domain(x, y)}, {CS(y))}} R8 Domain is ∀x, y, z ∈ Σ: Comp(R8, (x, y, z))= Prop. 1 will subsequently help us define a valid ontology. unique {{¬Domain(x, y)}, {¬Domain(x, z)}, {(y = z)}} Prop. 1 K ` c iff for all constants x there is some V ∈ Comp(c, x) R10 Domain and R10.1 : ∀x ∈ Σ, ∃z ∈ Σ: such that K ` V . Range exists Comp(R10.1, (x, z))= {{¬PS(x)}, {Domain(x, z)}} R10.2 : ∀x ∈ Σ, ∃y ∈ Σ: Def. 1 Consider a FOL language L and a set of validity rules C. Comp(R10.1, (x, y))= An ontology K will be called valid with respect to L and C iff K is {{¬PS(x)}, {, Range(x, y)}} R17 Property ∀x, y, z, w ∈ Σ: consistent and it satisfies the validity rules C. Instance of and Comp(R17, (x, y, z, w)) = Domain {{¬PI(x, y, z)} Note that a valid ontology, by our rules of Table 2, contains all ,{¬Domain(z, w)}, {C Inst(x, w)}} its implicit knowledge as well (i.e., it is closed with respect to in- ference). Due to the special characteristics of our framework (e.g., P D P A B A B CWA, the form of rules, etc), one does not need to employ full FOL Make D reasoning to determine whether an ontology K is valid (i.e., using domain of P Def. 1 and Prop. 1); instead, we can use the specialized procedure P P described below (Prop. 2). a b a b (a) (b) Prop. 2 A ground fact P (x), added to an ontology K, would violate rule c, iff there is some set V and tuple of constants ~u for which ¬P (x) ∈ V and V ∈ Comp(c, ~u) and for all V 0 ∈ Comp(c, ~u), Figure 1. Adding a new domain to a property. V 6= V 0 it holds that K 0 V 0.

As an example, consider the ontology of Fig. 1(a). The orig- 2.3 Selection of side-effects: ordering inal ontology in our case, per Table 1, is: K = { CS(A), CS(B), CI(a), CI(b), PS(P ), Domain(P,A), Range(P,B), If there were no validity rules or we were not interested in the result PI(a, b, P ), C Inst(a, A), C Inst(b, B)} and the update is: U = being a valid ontology the most rational way to perform an update {Domain(P,D)}. To detect rule violations in an automated way, would be to simply apply the changes in U upon K. according to Prop. 2, we must find all the rules that contain Def. 2 The raw application of an update U to an ontology K is de- ¬Domain(x, y), set x = P , y = D, and determine whether some noted by K + U and is the following ontology: K + U = {P (x) ∈ other component for the particular instantiation is implied by the L+|P (x) ∈ K ∪ Uand¬P (x) ∈/ U} ontology. If the answer is no, then the addition of Domain(P,D) would violate the particular instantiation of this rule. In our case, When a set of changes (i.e., an update U) is raw applied to a valid this is true for rule R2.2 (domain applicability), for x = P, y = D ontology K, some of the changes that appear in U may be void, and rule R8 (unique domain) for x = P, y = D, z = A as well i.e., they don’t need to be performed because they are already im- as for x = P, y = A, z = D (see also Table 3 for some rules plemented (implied) by the original ontology. We define an operator in their component set format). Moreover, it violates rule R17 for which, given a resulting ontology K0 that an update would produce x = a, y = b, z = P, w = D. on a valid ontology K, calculates the actual effects of the update: One nice property of our detection mechanism is that it provides an immediate way to restore invalidities as well, i.e., generate po- Def. 3 For K a valid ontology and K0 an ontology: tential side-effects that would restore the violation. In particular, the Delta(K,K0) = {P (x) ∈ L0|K0 ` {P (x)} and K 0 {P (x)}} violation that Prop. 2 detects can be restored by making any of the 5 elements of Comp(c, ~u) true in the ontology. At this point note that Delta function is some kind of “edit distance” between K and K0; if K0 = K + U, then Delta represents the actual changes that when a Qij (x, z) in some set V ∈ Comp(c, x) is an equality of 0 the form w = w0, then the truth value of this equality is revealed U enforces upon K. Thus, K + U = K + Delta(K,K + U) = K , as soon as we instantiate this rule’s variables to constants. There- so Delta(K,K + U) produces the same result as U when applied fore, by evaluating an equality as a tautology (>) or contradiction upon an ontology; however they may be different as U could contain (⊥) and replace it accordingly in the rule’s instances, we are able void changes. to eliminate all the equality atoms from the components sets. With- 5 Note that the term “edit-distance” is usually used for sequences and not sets out equalities, the elements of Comp(c, x) contain only positive and (i.e., edit scripts) As already mentioned, the raw application of an update would not • Limit Cases: If K is not a valid ontology or U is an infeasible update, work for our case, because it may not respect the validity constraints then: K • U = K. of the language. Thus, applying an update involves the application • General Case: If K is a valid ontology and U is a feasible update, then: of some side-effects. In some cases, it may not be possible to find – Principle of Success: K • U ` U adequate side-effects for the update at hand; such updates are called – Principle of Validity: K • U is a valid ontology infeasible and cannot be executed. For example, any inconsistent up- – Principle of Minimal Change: For all valid ontologies K0 such that 0 0 date (such as, U = {CS(A), ¬CS(A)}) is infeasible. K ` U, it holds that Delta(K,K • U) 6K Delta(K,K ) In most cases though, an update has several possible alternative Def. 5 dictates that applying a rational change operator between an sets of side-effects, which implies that a selection should be made. update and an ontology should result (in the general case) in a valid Consider an update U with alternative side-effects U and U . Then, 1 2 ontology (Principle of Validity), which implies the update (Principle the set of changes that should be raw applied on the initial ontology, of Success). Moreover, for any other valid ontology K0 that could be in order to reach a valid result, is either U ∪ U or U ∪ U . Accord- 1 2 an alternative result, the set of (non-void) side-effects leading to K0 ing to the Principle of Minimal Change we should choose the one (captured by the Delta function) is more “expensive” than the set of which causes the “mildest” effects upon the ontology; to determine (non-void) side-effects leading to the result of the rational change op- the “relative mildness” (or “relative cost”) of such effects, we define eration. In effect, the rational change operator applies the minimum, an ordering 6 between updates. Note that this ordering should de- with respect to 6 , set of effects and side-effects. pend on K itself: for example, the “cost” of removing an IsA relation K between A and B should depend on the importance of the concepts A, B in the RDF graph itself. The following conditions have proven 3 ALGORITHMS necessary for an ordering to produce “rational” results. 3.1 General-purpose algorithm Def. 4 An ordering 6K is called update- We now present our general-purpose algorithm shown in Table 5. generating iff the following conditions hold: 0 0 0 For a given language L (predicates and rules) and update-generating Delta Antisymmetry: For any U, U : U 6K U and U 6K U implies Delta(K,K + U)=Delta(K,K + ordering 6, the function takes as inputs the ontology K, an update U 0). U, and the set ESE (initially empty) of already considered effects 0 00 0 0 Transitivity: For any U, U , U : U 6K U and U 6K and side-effects. The algorithm identifies all invalidities caused by 00 00 U implies U 6K U . 0 0 0 the predicates of the update, appends the update with each possible Totality: For any U, U : U 6K U or U 6K U. 0 0 alternative side-effect separately and calls itself. Upon returning, it Conflict Sensitivity: For any U, U : U 6K U iff Delta(K,K + 0 U) 6K Delta(K,K + U ). compares the different alternatives with the “min” function (which 0 0 0 Monotonicity: For any U, U : U ⊆ U implies U 6K U . implements 6). Upon termination it returns the effects (U) and the Similarly, an ordering scheme {6 |K : a valid ontology} is called K minimal (per 6K ) set of side-effects of U upon K. The output of the update-generating iff 6K is update-generating for all valid algorithm, if not infeasible, is ready to be raw applied to K, as stated ontologies K. in Theorem 1. For our RDF case we introduced a particular update-generating or- dering, which is based on the ordering shown in Table 4 (among the Table 5. Function Update: (U, K, ESE) → Delta(K,K • U) positive and negative predicates presented in Table 1 for simplicity). The details of the expansion of this ordering to refer to ground facts STEP 1: If U ∪ ESE is inconsistent, then return INFEASIBLE. and sets of ground facts (i.e., updates) is omitted due to space limita- STEP 2: If (K ∪ ESE) ` U, then return ∅ tions. In short, the general idea is that an update U is “cheaper” (or STEP 3: Take an arbitrary ground fact P (x) ∈ U \ ESE such that K 0 1 {P (x)} preferable) than U2 (denoted by U1 6K U2) iff the “most expensive” STEP 4: Find a rule r, such that there is some set V and tuple of con- 0 predicate used in update U1, is “cheaper” than the “most expensive” stants ~y for which ¬P (x) ∈ V and V ∈ Comp(r, ~y) and for all V ∈ 0 0 predicate used in update U2 where the predicates’ relative preference Comp(r, y), V 6= V it holds that V * U ∪ ESE. is determined by the order shown in Table 4. Ties are resolved us- STEP 5: If there is no such rule, then return {P (x)} ∪ Update(U, K, ESE ∪ {P (x)}). ing cardinality considerations and/or the relative importance of the STEP 6: Otherwise, select (arbitrarily) one such rule, say r, and return predicate’s arguments in the original ontology (details omitted). Our min{Update(U ∪ V 0,K,ESE)}, where the min is taken over all V 0 ∈ ordering was based on the results of experimentation on various al- Comp(r, ~y), V 0 6= V . ternative orderings and results to an efficient and intuitive implemen- tation. Nonetheless, our algorithm works with any update-generating Theorem 1 The raw application of the output of Update(K, U, ∅) ordering; each different ordering would model and impose a differ- to K implements a rational change operation. ent global evolution policy on our algorithm. Fig. 1(b) depicts the outcome of the requested update with respect to our ordering. The complexity of our algorithm depends on the parameters (lan- guage, rules, ordering). When considering an infinite number of con- Table 4. Ordering of predicates stants in our language, we could have an infinite number of rule in-

P I