Proceedings, Fifteenth International Conference on Principles of Knowledge Representation and Reasoning (KR 2016)

A MIS Partition Based Framework for Measuring Inconsistency

Said Jabbour1, Yue Ma2,3, Badran Raddaoui4, Lakhdar Sa¨ıs1 and Yakoub Salhi1 1CRIL CNRS UMR 8188, , 2LRI, Univ. -Sud, CNRS, Universite´ Paris-Saclay, France 3 Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, China 4LIAS - ENSMA, , France {jabbour,sais,salhi}@cril.fr, [email protected], [email protected]

Abstract this case, it is not desired to immediately conclude that both agents have a same conflict degree with A. Instead, some In this paper, we propose a general framework, both pa- proper inconsistency measures are necessary. rameterized and parameter-free, for defining a family of fine-grained inconsistency measures for propositional Among many possible ways to define an inconsistency knowledge bases. The parameterized approach allows measure (Hunter 2006; Grant and Hunter 2008; Hunter and to encompass several existing inconsistency measures Konieczny 2010; Ma, Qi, and Hitzler 2011; Jabbour et al. as specific cases, by properly setting its parameter. And 2014), minimal inconsistent subsets (MISes) are often used the parameter-free approach is defined to avoid the dif- because a MIS forms a direct representation of an incon- ficulty in choosing a suitable parameter in practice but sistency core in a KB. For example, a classical measure still keeps a desired ranking for knowledge bases by their IMI (Hunter and Konieczny 2010) is defined as the num- inconsistency degrees. The fine granularity of our frame- ber of MISes of a base K, i.e., IMI(K)=|MISes(K)|, work is based on the notion of MIS partition that con- I (A ∪ B )=I (A ∪ B )=2 siders the inner structure of all the minimal inconsistent by which we have MI 1 MI 2 subsets of a knowledge base. Moreover, MinCostSAT- for the previously mentioned example. Although useful based encodings are provided, which enable the use of for many scenarios (Hunter and Konieczny 2010; 2008; efficient SAT solvers for the computation of the pro- Mu, Liu, and Jin 2011), IMI measure fails to recommend posed measures. We implement these algorithms and B1 or B2 for A. We claim that B1 has more conflicts with test them on some real-world datasets. The preliminary A than B2 because there are two independent contradictory experimental results for a variety of inputs show that the topics between A and B1 (i.e. {t1, ¬t1}, {t2, ¬t2}), each of proposed framework gives a wide range of possibilities which should be modified to make an agreement between A for evaluating large knowledge bases. and B1. However, the disagreement between A and B2 (i.e. {t1, ¬t1}, {t1,t2, ¬t1 ∨¬t2}) can be handled by revising Introduction only one topic, for instance, if A deletes t1. The problem of no distinction between B1 and B2 for A is due to the fact that Reasoning about inconsistent knowledge bases (KBs) has IMI treats all MISes equally in terms of their contributions to been a long-standing challenge in the AI community. In re- the inconsistency degree. Recently, another measure, called cent years, measuring inconsistency has proved useful in di- ICC, has been developed as a lower bound of all standard in- verse scenarios, including software specifications (Barragans- consistency measures (Jabbour, Ma, and Raddaoui 2014), by Martinez, Arias, and Vilas 2004), belief merging (Qi, Liu, considering the most representative MISes. The ICC metric and Bell 2005), news reports (Hunter 2006), integrity con- can distinguish B1 and B2 for A because ICC(A ∪ B1)=2 straints (Grant and Hunter 2006; 2013), and multi-agents and ICC(A ∪ B2)=1. However, if another agent is avail- systems (Hunter, Parsons, and Wooldridge 2014; Jabbour, able, say B3 = {¬t1,t1 → t2},wegetICC(A ∪ B3)=1. Ma, and Raddaoui 2014). That is, the agent A can not distinguish between B2 and Inconsistencies are often unavoidable in real-world appli- B3, which is again insufficient in the following aspect: A cations. To achieve a certain goal, an agent may need to co- and B2 have more groups of topics in conflict (i.e. {t1, ¬t1}, operate with another agent, even in the presence of conflicts {t1,t2, ¬t1 ∨¬t2}) than A and B3 (i.e. {t1, ¬t1}). From the between them. In this case, the agent would prefer one that above discussion about IMI and ICC, a fine-grained analysis has the least disagreement with herself. For instance, suppose of the inner structures of MISes is clearly necessary. that an agent A with the support on two topics t1 and t2, writ- Studies on inner structures of MISes have been performed ten A = {t1,t2}, has to collaborate with one of the following in many different settings. For instance, a dependence relation two agents: B1 = {¬t1, ¬t2}, and B2 = {¬t1, ¬t1 ∨¬t2}. among MISes has been identified by the fact that resolving Clearly, A is in conflict with both, since B1 and B2 agree some MISes allows automatic resolution of others (Benferhat, upon the topic ¬t1 that contradicts with the topic t1 of A.In Dubois, and Prade 1995). In the context of ontology debug- Copyright c 2016, Association for the Advancement of Artificial ging, root or derived axioms for unsatisfiability have been Intelligence (www.aaai.org). All rights reserved. distinguished (Kalyanpur et al. 2005). A graphical represen-

84 tation of the relationships among justifications and axioms • Independence: I(K∪{α})=I(K) if α ∈ free(K∪{α}). is provided (Bail et al. 2011). In this paper, we propose the • MinInc: I(M)=1if M ∈ MISes(K). notion of MIS partition that results in a general framework • I(K ∪ ...∪K )=Σn I(K ) of inconsistency measures. Ind-decomposability: 1 n i=1 i if MISes(K ∪...∪K )=MISes(K )  ...MISes(K ) Our contribution can be summarized as follows: 1 n 1 n , where  is the multi-set union over sets, and unfree(Ki) ∩ • Based on the MIS partition, we define a new family of unfree(Kj)=∅ for 1 ≤ i = j ≤ n. weighted inconsistency measures and show that these mea- The first four properties seem natural for an inconsistency sures satisfy some rational properties, and capture several measure. The idea behind the Ind-decomposability (Jabbour, existing inconsistency measures. Ma, and Raddaoui 2014) is that the inconsistency degrees • To overcome the difficulty in choosing a suitable parameter of several KBs should be additive if these bases are dis- in practice, we further provide a parameter-free inconsis- joint and have disjoint MISes. Notice that this revised prop- tency measure that can preserve the desired ranking (see erty is introduced to overcome the limitations of the De- below for details) generated from the MIS partition. composability property (Hunter and Konieczny 2010). An- I(K∪{α}) ≥ • We present Minimum-Cost Satisfiability (MinCostSAT) other property named Dominance says that I(K∪{β}) α  β based encodings so that we can benefit from cutting-edge if , which is however criticized due to SAT solvers for computing the proposed fine-grained in- its unsuitability for characterizing inconsistency measures consistency measures. based on minimal inconsistent subsets (Mu et al. 2011; Besnard 2014). Therefore, in the rest of this paper, we only • An experimental study is conducted to show the relevance consider the five postulates given above which are widely of the proposed measures for finely quantifying the conflict accepted by the AI community. status of real-world KBs. Definition 2 An inconsistency measure is called a standard Preliminaries measure if it satisfies the Consistency, Monotonicity, Indepen- dence, MinInc, and Ind-decomposability properties. L A propositional language is built over a finite set of I propositional symbols PS using classical logical connec- It is easy to check that MI is a standard measure. {¬, ∧, ∨, →, ↔} ⊥ tives . The symbol denotes contradiction. I a, b, c, . . . represent atoms in PS. A literal is an atom a MIS-based Measure CC Revisited or its negation ¬a. A clause C is a disjunction of literals: In (Jabbour, Ma, and Raddaoui 2014), the authors proposed C = a1 ∨ ...∨ an. A formula α in conjunctive normal form a new inconsistency measure, denoted ICC, based on a subtle (CNF) is a conjunction of clauses. Let Var(α) denotes the analysis of the dependencies among the formulas of the KB. set of variables in α. An interpretation is a total function For example, in the ICC measure, overlap among MISes are from PS to {true, false}. An interpretation B is a model taken into account. Indeed, to characterize the inner struc- of α iff B(α)=true.AKBK is a finite set of propositional ture of MISes, each KB K is associated with an hypergraph formulas. For a set S, we denote by |S| its cardinality, and by GK whose vertices are associated with the formulas of K 2S its power set. K is inconsistent if K⊥, where  is the and edges with the MISes of K. This hypergraph based rep- classical consequence relation. Minimal inconsistent subsets, resentation gives a better insight of the correlations among defined below, are often used to analyze inconsistency in a MISes so that it becomes useful for analyzing inconsistencies. KB. For example, the notion of strong-partition of a KB can be considered in the light of the connected components of GK, Definition 1 Let K be a KB and M ⊆K. M is a Minimal based on which the ICC measure can be defined. Inconsistent Subset (MIS)ofK iff M ⊥and ∀M M, M ⊥. The set of all minimal inconsistent subsets of K is Definition 3 (Jabbour, Ma, and Raddaoui 2014) Let K be denoted MISes(K). a KB and R ⊆K.Astrong-partition of K is a pair {K1,...,Kn},R such that: A formula α ∈Kis called a free formula iff M ∈ (1) Ki ⊆Kand Ki ⊥, ∀i (1 ≤ i ≤ n); MISes(K) s.t. α ∈ M. The class of free formulas of K free(K)=K\ MISes(K) (2) {K1,...,Kn,R} is a partition of K; is written , and its complement n is named unfree formulas set: unfree(K)=K\free(K). (3) MISes(K1 ∪ ...∪Kn)= i=1 MISes(Ki). I An inconsistency measure is a function that maps a KB The ICC measure is defined as ICC(K)=m if there is a to a non-negative real number such that higher value indicates strong-partition D, R where |D| = m, and there is no larger conflict. strong-partition D,R such that |D| >m. Several desired properties have been defined to charac- That is, when removing the set R from K, ICC corresponds terize inconsistency measures (Hunter and Konieczny 2010; exactly to the number of connected components of the hyper- Jabbour, Ma, and Raddaoui 2014; Besnard 2014). In this pa- graph representing K\R. per, given an arbitrary inconsistency measure I we focus on One thing to note is that the measure ICC is defined to the following properties: serve as a lower bound of all standard measures. • I(K)=0 K Consistency: iff is consistent. Proposition 1 ((Jabbour, Ma, and Raddaoui 2014)) For • Monotonicity: if K⊆K, then I(K) ≤ I(K ). any KB K and any standard measure I, ICC(K) ≤ I(K).

85 Example 1 Consider the KB K = {a, ¬a, a ∧ b, (a ∨ direction holds because a closed set packing T for U = K c) ∧ d, ¬d, ¬c ∧ e, ¬e, ¬e ∧ f} with its formulas named and S = MISes(K) is indeed a set packing for U = K\R u1,u2,...,u8, respectively. K has six MISes as depicted and S = MISes(K\R) with R = K\ Q⊆T Q.  I (K)=2 in the hypergraph below. It follows that CC , which For Example 1, if we take R = {u1,u4,u7}, we will get (K) means that MISes are highly correlated, that is, the hy- the maximal set packing of K\R = {s2,s6} which is indeed pergraph can not be decomposed into more connected com- a maximal closed set packing of K. This proposition says that ponents even if some formulas are removed. the optimal closed set packing corresponds to the maximal set packing with some sub-bases removed. Using Proposition 2 and Definition 3, the ICC measure can be nicely characterized by the closed set packing problem as stated by the following proposition. Corollary 3 For any KB K, we have:

ICC(K)=σMCSP(K, MISes(K)).

Proof: (Sketch) By Definition 3, ICC value corresponds to In the following, we provide some additional properties of the size of the largest subset of MISes(K) having pairwise ICC using the closed set packing problem defined in (Jabbour disjoints MISes and closed by union. This corresponds to et al. 2015). the solutions of the defined MCSP(U, S).  We first review the classical set packing problem defined The last result shows that ICC value can be computed by as follows. leveraging solutions to the closed set packing problem. The Definition 4 Let U be a universe and S a family of subsets detailed algorithm is given in the section dedicated to the of U.Aset packing is a subset P ⊆ S such that, ∀Si,Sj ∈ P computation issues. with Si = Sj, Si ∩ Sj = ∅. Towards Weight-Based Standard Measures The closed set packing problem is defined to circumscribe a special type of set packing. As mentioned earlier, the ICC measure defines a lower bound for standard measures. Unfortunately, the lower bound con- Definition 5 Let U be a universe and S a family of subsets S S siders only a subset of MISes forming a closed set packing of U. We define the function fS :2 → 2 as fS(P )= of MISes (cf. Corollary 3). That is, ICC does not take into {S ∈ S | S ⊆∪ S } P ⊆ S i i S ∈P . Then, a set packing is account the contribution of each MIS to the whole inconsis- called a closed set packing (CSP) if P is a fixed point of the f f (P )=P tency. A key step for designing a more accurate inconsistency function S, i.e., S . metric is to analyse the contribution of each MIS to the in- That is, a CSP P of S satisfies the following condition: the consistency of a given KB. union of the selected subsets does not contain any unselected One of such analyses can be done via a weighted assess- subsets of S, i.e., ∀Si ∈ S \ P , Si ⊆ S∈P S . ment of the relevance of each MIS by taking into account the overall structure of the whole set of MISes. To illustrate Example 2 (Example 1 contd.) By taking the universe this point, let us consider the KB K = {a, ¬a, a ∨ b, ¬b} U = {u1,u2,u3,u4,u5,u6,u7,u8} and S = MISes(K), with two MISes M1 = {a, ¬a}, and M2 = {¬a, a ∨ b, ¬b}. various CSPs can be constructed. For instance, {S2,S4} is Since M1 ∩ M2 = ∅, a “relevant” standard inconsistency one, but not {S2,S4,S6} since S3 ⊆ S2 ∪ S4 ∪ S6. Indeed, measure I should satisfy I(K)

Proof: (Sketch) The ≤ direction can be seen from the 1Note that the Ind-decomposability cannot be applied to K be- fact that for any R, a set packing w.r.t. U = K\R and cause its MISes are inner connected. Hence, for a standard measure S = MISes(K\R) is closed under the function fS. The ≥ I, it is not required that I(K)=2.

86 indicates that the conflicts spread over the whole KB with inconsistency value to both K and K , since ICC(K)=|p1| various independent sources of inconsistency. To resolve such and ICC(K )=|p1| by Corollary 3. In contrast, the ordering sort of inconsistency, all these conflicts need to be handled defined above allows us to compare two KBs according to separately. To this end, our proposed approach exploits the the remaining elements of the conflict vectors. closed set packing problem to characterize the correlation To quantify the conflict degree of a KB, we will use the set among different MISes. of c-partitions with an associated weighting scheme. This will Before defining our inconsistency measures, we need to allows us to better quantify the contribution of the different introduce a MIS partition that partitions the set of MISes of a MISes to the inconsistency of the KB. KB into clusters of closed set packing, called c-partition. Definition 11 Let K beaKBandP = {p1,...,pn}∈ K U = K S = Definition 6 Let be a KB, and PMISes (K). We define the weighting function W of P as: MISes(K) P = {p ,...,p } K . 1 n is called a c-partition of MISes(K)= pi pi if 1≤i≤n , where each is a closed set W(P)= |pi|×wi, packing for U and S. pi∈P In the sequel, let PMISes (K) denote the set of c-partitions {w }+∞ w = of K. Obviously, PMISes (K) = ∅ if K⊥. Indeed, we can where n n=1 is a decreasing positive sequence with 1 1 build a c-partition where each pi contains exactly one MIS. . Now, we will associate with each c-partition P = p {p ,...,p } V (P)= That is, the MISes belonging to the same i are equally 1 n the following numeric vector weighted and the weight associated to each c-partition is the |π(p1)|,...,|π(pn)| where π is a permutation of P such |π(p )|≥... ≥|π(p )| sum of the individual weight associated to each MIS. that 1 n . Clearly, each c-partition pos- Now, we are ready to define our new class of inconsistency sesses a unique ordered numeric vector. metrics induced by W as follows: In order to compare two c-partitions of a given KB, it will be convenient to assume a lexicographic ordering relation  Definition 12 Let K be a KB. The weighted inconsistency over their associated numeric vectors: measure of K, w.r.t. a weighting function W, is defined as Definition 7 Let u, v ∈ Nn × Nm be two vectors. Suppose follows: that u = u1,...,un and v = v1,...,vm. Then, u is IW (K)=max{W(P) |P∈PMISes (K)}. lexicographically less than v, denoted by u  v,iffu = v, k ≤ min(n, m) u

87 +∞ Proposition 4 Let K beaKBandμ = |MISes(K)|− Proposition 6 Let K be a KB and {wn}n=1 a weighting ICC(K). Then, we have: sequence such that w1 =1and wn = λ for all n>1, where λ is a non negative constant satisfying 0 ≤ λ ≤ 1. Then, μ ( wi)+ICC(K) ≤ IW (K) ≤ IMI(K). IW (K)=(1− λ) × ICC(K)+λ × IMI(K). i=2

Proof: Note that w1 =1and wn = λ for all n>1, I Proof: By definition of CC, there exists a strong-partition it is clear that a partition P = {p1,...,pn} maximizing D, R D = {K ,...,K } R = K\ where 1 ICC(K) and W(P) p1 the must put in the maximal set of MISes that ∪ K P = {p ,...,p } p = K ∈D . Let us consider 1 μ where 1 forms a closed set packing, which corresponds to ICC, and the {M ,...M } M ∈ MISes(K ) 1 ≤ i ≤ 1 ICC(K) s.t. i i for remaining MISes can be distributed into the other elements ICC(K), and {p2,...,pμ} = MISes(K) \ p1. Clearly, pi (1

Proof: ⎧ ⎪ • Consistency and Independence are verified be- ⎨ICC(K) if λ =0 ICC(K)+IMI(K) 1 cause MISes(K) = ∅ iff K is inconsistent, and IW (K)= if λ = MISes(K)=MISes(K∪{α}) α ∈ free(K∪{α}) ⎩⎪ 2 2 for . IMI(K) if λ =1 • Monotonicity: Let K and K be two KBs, and P = {p1,...,pn} a c-partition of MISes(K). Proof: Direct consequence of Proposition 6.  We have MISes(K) ⊆ MISes(K∪K). Let +∞ {M1,...,Mm} = MISes(K∪K) \ MISes(K). Then, Corollary 8 Let K beaKBand{wn}n=1 a weighting se- P = {p1,...,pn, {M1},...,{Mm}} is a c-partition of quence. Then, we have: K∪K. So, W(P ) ≥W(P). Consequently, IW (K∪ I (K)=I (K) ∀n ∈[1, +∞[,w =1. W MI if and only if n K ) ≥ IW (K). I • MinInc: For a single MIS M, there exists a unique c- Finally, let us stress that the definition of W is a gen- eral method to define an inconsistency measure. Follow- partition P = {M}. Since w1 =1, IW (M)=1. ing the same way, another measure can be obtained by • Ind-decomposability: The proof follows from the defining the partitions by set packing instead of closed set existence of a bijection between the partition of K∪K packing (cf. Definition 6), which leads to the following K K and the union of the partitions of and . new inconsistency value: ISP(K)=max{W(P) |P ∈  PMISes (K) such that ∀p ∈P, p is a set packing}. Now we can compare different KBs by conflict degrees. Example 5 Consider the following KBs K1 = {a, ¬a, a ∨ Definition 13 Let K and K be two KBs. We say that K is b, ¬b, b} and K2 = {a, ¬a ∧ b, ¬b ∧ c, ¬c}. Suppose that we w = 1 ,i∈ N∗ less inconsistent than K iff IW (K) ≤ IW (K ). use a sequence i i . We have: 1 5 Example 4 Let us consider the following two KBs K1 = • IW (K1)=ISP(K1)=2+2 = 2 , {a, ¬a, a∨b, ¬b, b}, and K2 = {c, ¬c∧d, ¬d∧e∧f,¬e, ¬f}. • I (K )=1+1 + 1 = 11 I (K )=2+1 = 5 1 ∗ W 2 2 3 6 and SP 2 2 2 . Given a sequence wi = ,i∈ N , i K K M 1 = Note that both 1 and 2 have three MISes: K1 • We have, MISes(K1)={{a, ¬a}, {b, ¬b}, {¬a, a ∨ 2 3 {a, ¬a},MK = {¬a, a ∨ b, ¬b},MK = {b, ¬b}, and b, ¬b}}. Then, the maximal c-partition of K1 is P = 1 1 M 1 = {a, ¬a ∧ b},M2 = {¬a ∧ b, ¬b ∧ c},M3 = {p1,p2} such that p1 = {{a, ¬a}, {b, ¬b}} and p2 = K2 K2 K2 {¬a, a ∨ b, ¬b}. It follows that IW (K1)=|p1|×w1 + {¬b ∧ c, ¬c}. Similarly, they both own two sized set pack- |p |×w =2× 1+1× 1 = 5 {M 1 ,M3 } {M 1 ,M3 } 2 2 2 2 . ings, e.g. K1 K1 and K2 K2 , respectively. 1 1 1 25 However, their closed set packings are different, so are • In a similar way, we obtain IW (K2)=1+ + + = . 2 3 4 12 the c-partitions corresponding to IW (K1) and IW (K2): I (K )

88 Proposition 9 Let K be a KB. The following inequality v1, 1,v2, 1,...,vn−1, 1,vn: holds: 1 I (K)=v + . ISP(K) ≥ IW (K). cf 1 1 1+ 1 v + Towards a Parameter-Free Standard Measure 2 1 1+ . 1 An important property of the previously defined weight-based .. + v measure IW lies in its ability to encompass some existing n inconsistency measures, such as IMI and ICC, as specific Note that the extra “1”s in the above definition are added cases. However, such a measure is defined using an additional into the vector to guarantee that Theorem 11 (below) holds. parameter w that must be tuned in practice. In the following, we will study the impact of the parameter w on the induced Example 7 (Example 6 contd.) From Definition 14, we I (K )=2+ 1 = 8 I (K )= 34 I have cf 1 1 3 and cf 2 13 whose W value. Considering that an inconsistency measure is often 1+ 2 e used to rank different KBs, the goal is, therefore, to have a corresponding extended sequence are V (K1)=2, 1, 2 e parameter-free measure that can return a meaningful ranking and V (K2)=2, 1, 1, 1, 1, 1, 1. Then, we deduce that K1 of different KBs based on their degrees of conflict. is more inconsistent than K2, i.e., K2 is less conflicting than First, we examine the ranking of different KBs under the K1, under the measure Icf . I W measure, illustrated by the following example. Similar to IW , we can see that Icf (K) satisfies all the desired properties of a standard measure. Example 6 Let us consider two KBs Proposition 10 Icf measure is a standard measure. K1 = {u1,u2,u3,u4,u5,u6,u7} and K2 = {v1,v2,v3,v4,v5,v6,v7} whose MISes are depicted For any KBs K and K , suppose that their conflict vectors below by ellipses: are V (K)=v1, ··· ,vn and V (K )=v1, ··· ,vm, re- spectively. Recall that we have a ranking between K and K by their conflict vectors, that is, K is less conflicting than K if V (K)  V (K).

Example 8 (Example 6 and 7 contd.) Since V (K1)= 2, 2 and V (K2)=2, 1, 1, 1, we have V (K2)  V (K1), that is, K2 is of less conflict than K1 according to their con- flict vectors. We can see that some maximal c-partitions can be obtained I (the left hand side for K1, the right hand side for K2): The key property of the cf measure is characterized by the following proposition. Theorem 11 For any KBs K and K , Icf (K) ≤ Icf (K ) if KK.

So we have V (K1)=2, 2 and V (K2)=2, 1, 1, 1. Now, Proof: It is obvious that if K = K , Icf (K)=Icf (K ). wi = 1 wi = 1 ,i∈ we take two different sequences 1 i and 2 2i−1 Now we suppose K≺Kand V (K)=v1, ··· ,vn, ∗ i 1 V (K)=v , ··· ,v  1 ≤ k ≤ N . Then, for w1 we have IW (K1)=2× 1+2× 2 =3and 1 m . Then, there exists 1 1 1 37 min(n, m) such that vi = v for 0 ≤ iIW (K2), k 1+ 1 k by noting that k and k are positive vk+1+... which contradicts the previous ordering of K1 and K2 under i integers according to the definition of conflict vector. Conse- w1. quently, Icf (K) ≤ Icf (K ).  This example shows that there exist cases where different settings of the parameter w can lead to different rankings Theorem 11 tells us that the ranking of KBs under the Icf of KBs. To overcome this problem, we introduce below a measure coincides with that defined by their conflict vectors. new inconsistency measure, called Icf , which is defined as a special continued fraction constructed from MIS partitions On the Computation of Standard Measures of a KB. According to Corollary 3, computing the lower bound of stan- dard measures is equivalent to computing a solution to a max- Definition 14 Let K be a KB and V(K)=v1, ..., vn imum closed set packing problem. In this section, we extend its conflict vector. The inconsistency measure Icf is 0 if this result to the fine-grained measure IW and Icf . Indeed, V(K) is a zero vector; otherwise Icf is the continued frac- we provide encodings for all the three measures into the Min- tion corresponding to the extended sequence Ve(K)= imum Cost Satisfiability problem (MinCostSAT) (Miyazaki,

89 Iwama, and Kambayashi 1996), which allows us to take a In order to compute ICC, we need to maximize the sum YS full advantage of the continuous progresses in practical SAT Si∈S i . To encode this maximization problem as a Min- solving. CostSAT problem, we only need to rename each variable Y ¬Y Y α = Si with Si ( Si is a fresh propositional variable) in MinCostSAT-based Encoding of ICC (1) ∧ (3) ∧ (4), for all Si ∈ S. We note such renamed for- R(α) MinCostSAT I (U, S) I Let us recall that ICC is the cardinality of the maximum closed mula . The - CC , encoding CC, set packing of S = MISes(K) over the universe U = K. No- is defined as follows: tice that a direct encoding of this problem can be obtained Problem: MinCostSAT-ICC(U, S) by modeling the conflicting sets of MISes, i.e., sets of MISes which are not closed set packings. Unfortunately, this ap- min F(B)= f(x) ×B(x) proach is clearly inefficient because of the huge number of x∈Var(α) possible conflicting sets of MISes. To overcome this problem, subject to we propose an original encoding that considers both S and U B∈M(R(α)) with f defined as follows: in MinCostSAT. ∀Si ∈ S, f(YS )=1; Definition 15 (MinCostSAT) Let α be a CNF formula and i ∀X ∈ Var(α) \{Y | S ∈ S},f(X)=0. f a cost function that associates a non-negative cost to each Si i variable in Var(α). The MinCostSAT is the problem of find- ing a model B for α that minimizes the objective function: MinCostSAT-based Encoding of IW Similarly to the previous encoding of ICC, we provide a F(B)= f(x) ×B(x) MinCostSAT-based encoding to compute IW value. Indeed, x∈Var(α) we look for a partition of MISes such that each element of U S U the partition is a closed set packing (see Definition 6). Let be a universe and a set of subsets of .We P = {P ,...,P } S X Y e ∈ U Let 1 n be a partition of . Here, we asso- associate a boolean variable e (resp. Si ) to each j ciate a binary variable Xe to each element e in U to express (resp. Si ∈ S). The first formula allows us to only consider j that e ∈ Pj and a set of binary variables Y ∈{0, 1} to each the pairwise disjoint subsets in S: Si   subset Si in S to express that Si ∈ Pj. YS ≤ 1 (1) The first formula allows us to only consider the pairwise i S P e∈U Si∈S|e∈Si disjoint subsets in w.r.t. j: The inequalities in (1) correspond to the AtMostOne con- n Y j ≤ 1 straint which is a special case of the well-known cardinality Si (5) constraint. Several efficient encodings of the cardinality con- e∈U j=1 Si∈S|e∈Si straint to CNF have been proposed, most of which try to im- j The following formulas allow us to express that Y =1 prove the efficiency of constraint propagation (e.g. (Bailleux Si ∀e ∈ S Xj =1 and Boufkhad 2003; Sinz 2005)). In this case, the inequality if and only if i, e : YS ≤ 1 is encoded as follows using a sequen- Si∈S|e∈Si i n tial counter (Sinz 2005; Marques-Silva and Lynce 2007) (we j j n ¬Y ∨ X (6) Y Y Si e suppose that S ∈S|e∈S Si can be rewritten as i=1 Si ): i i j=1 Si∈S e∈Si n   (¬YS ∨ q1) ∧ (¬YS ∨¬qn−1) j j 1 n (Y ∨ ¬Xe ) Si (7) ((¬Y ∨ q ) ∧ (¬q ∨ q ) ∧ (¬Y ∨¬q )) j=1 Si∈S e∈Si Si i i−1 i Si i−1 1

¬YSi ∨ Xe (3) A MinCostSAT encoding can be obtained by renaming the j j j Si∈S e∈Si Y ¬Z Z variables of the form Si as Si ( Si is a fresh variable),   in the same way as in our encoding of ICC.WeuseR (α) to (YS ∨ ¬Xe) (4) i denote the formula obtained from α by using this renaming Si∈S e∈Si where α =(5)∧ (6) ∧ (7) ∧ (8). It is worth noticing that any solution to the inequalities (1), (3) and (4) represents a closed set packing of S. Problem: MinCostSAT-IW (U, S)

90 Datasets 3 contain randomly generated instances, named mpfs m n, having m MISes and each MIS is of the size n. min F(B)= f(x) ×B(x) The artificial datasets are designed because both eMUS and x∈Var(α) JUST can only terminate over instances whose MISes are subject to highly correlated, which is reflected by the small ICC values B∈M(R(α)) with f defined as follows: for the datasets 1 and 2 as shown in Table 1. The procedure is done by generating a random family of sets {S1,...,Sm} ∀S ∈ S ∀j ∈{1,...,n},f(Zj )=w ; i and Si j to represent the set of MISes, with m varying between 50 ∀X ∈ Var(α) \Z,f(X)=0 and 200. Each Si (corresponding to a MIS) is generated as a j set of integers selected from the interval [1, 100] to stand for where Z = {Z | Si ∈ S, j ∈{1,...,n}}. Si formulas in Si by their indexes. Table 1 reports, for each instance, the inconsistency values I Computation of cf under the four standard measures, i.e., IMI, ICC, IW (where w = 1 I Let us recall that the computation of Icf is based on the n n ), and cf , from which we can draw the following conflict vector corresponding to a maximal c-partition w.r.t. conclusions: a lexicographical ordering. The encoding of c-partitions is The first interesting observation is that both IW and Icf described by the constraints Cp =(5)∧ (6)∧ (7)∧ (8) used for metrics assign all instances with distinct inconsistency de- I I computing IW . To calculate a maximal c-partition P, we enu- grees. In contrast, many instances have equivalent CC or MI merate the models of Cp in a lexicographical ordering. Each values. The ICC is more problematic due to its small values of time a model is found, a lex-constraint is added dynamically being either 1 or 2 in most cases for the first two real-world in order to search for a new model (c-partition) that is greater datasets. This means that the ICC is not adequate in practice I than the previous one w.r.t. pr. In a second step, we derive to be used as an improvement of MI to take into account a conflict vector associated to the maximal c-partition of K, inner structure of MISes while estimating inconsistencies. I I then we compute Icf (see Definition 14). Meanwhile, it shows that our W and cf measures can be used to better distinguish different KBs according to their Experiments different inconsistency degrees, which benefits from the pro- posed MIS partition for a fine-grained analysis of the inner The previous encodings provide a way to benefit from the structure of MISes. efficient SAT solvers for computing the family of standard Secondly, compared to IW , the parameter-free measure inconsistency measures. In this section, we conduct a com- Icf has an important theoretical property that it guarantees parative evaluation of IMI, ICC, IW , and Icf measures. Our the ordering defined over conflict vectors. However, from goal is to discover their strengths and complementarities. We the experimental result, an advantage of the measure IW in implemented the algorithms based on an optimization-based practice exists in a clear difference between the inconsistency SAT solver MiniSAT 2.2 (Een´ and Sorensson¨ 2003). The ex- values. For instance, IW (Snomed typeI out2) = 6.53 and periments were performed on a Xeon 3.2GHz (2 GB RAM) IW (Snomed typeI out48) = 4.16 but their Icf values are cluster with a timeout of one hour of CPU time based on the too close with a difference of less than 10−7. following three datasets: Finally, we note that there is a big difference between 2 Datasets 1 were taken from the MaxSAT competition IMI and the other measures (i.e. ICC, IW ,Icf ). For example, used in the context of MISes Enumeration (Previti and the instances C220 FV RZ 13 and apex gr 2pin w4.shuffled∗ Marques-Silva 2013). These datasets encode real-world prob- from the datasets 1 have large IMI values (6772 and 1500, lems coming from rocket domain or automotive product con- respectively), but much smaller ICC, IW , and Icf degrees figuration of car lines. For these instances, the state-of-the-art (≤ 10). Small values of ICC, IW , and Icf indicate that the MISes enumerator eMUS (Previti and Marques-Silva 2013) MISes of such instances are strongly interconnected, which was used to generate MISes of each KB. Note that eMUS is a can not be reflected by merely using the IMI measure. partial MISes enumerator, so for certain instances, we only consider subsets of their MISes if eMUS cannot get all MISes before timeout. Related Work Datasets 2 are instances from the error-tolerant reasoning In this section, we provide a brief overview of some works (Ludwig and Penaloza˜ 2014) over large biomedical ontolo- related to inconsistency measures. gies ( SNOMED version 13.11d and the NCI thesaurus3) To measure inconsistencies in classical logics, a range of with their EL++ versions (Baader, Brandt, and Lutz 2005). logic-based approaches have been proposed. For instance, The data instances correspond to different brave and cautious one can cite those based on probabilistic models (Knight consequences of the given ontologies and errors (Ludwig and 2002; Doder et al. 2010), multi-valued semantics (Grant Penaloza˜ 2014). The MISes for these data were computed 1978; Hunter 2002; 2003; Oller 2004; Hunter 2006; Grant and by JUST tool (Ludwig 2014). We used only the instances Hunter 2008; Ma et al. 2010; Xiao et al. 2010; Ma, Qi, and treatable by JUST under its predefined timeout. Hitzler 2011), maximal consistent subsets (Ammoura et al. 2015), minimal inconsistent subsets (Hunter and Konieczny 2http://www.satcompetition.org/2011/ 2008; Mu, Liu, and Jin 2011; 2012; Xiao and Ma 2012; 3http://evs.nci.nih.gov/ftp1/NCI Thesaurus Jabbour, Ma, and Raddaoui 2014) and hitting sets (Mu 2015;

91 Instance #vars #clauses IMI ICC IW Icf C168 FW UT 851 1909 6758 102 1 5.2 1, 1, ··· , 1102 C220 FV RZ 13 1728 4014 6772 1 9.39 1, 1, ··· , 16772 c880 gr rcs w5.shuffled 3280 44291 70 1 4.83 1, 1, ··· , 170 rocket ext.b 283 1844 75 1 4.9 1, 1, ··· , 175 ∗ c7552-bug-gate-0 2640 6989 1000 1 7.48 1, 1, ··· , 11000 ∗ apex gr 2pin w4.shuffled 1322 10940 1500 2 8.89 1, 1, ··· , 11500 ∗ wb conmax1.dimacs.filtered 277950 1221020 20 2 4.54 2.73205080739 ∗ wb 4m8s4.dimacs.filtered 463080 1759150 20 9 13.36 9.61803278689 ∗ wb1.dimacs.filtered 49525 140091 71 50 59.16 50.61803398875 Snomed typeI out2 SNOMED metrics 112 2 6.53 2.73205080757 Snomed typeI out30 #axioms: 369194 8 1 3.78 1, 1, ··· , 18 Snomed typeII out48 #concepts: 310013 19 2 4.16 2.73205080512 Snomed typeII out152 #roles: 58 21 2 4.54 2.73205080739 Snomed typeII out189 37 2 5.59 2.61803398875 NCI typeI out36 NCI metrics 41 1 4.96 1, 1, ··· , 141 NCI typeII out157 #axioms: 159 805 79 2 5.2 2.61803398875 NCI typeI out233 #concepts: 104 087 82 1 5.96 1, 1, ··· , 182 NCI typeII out425 #roles: 92 46 5 10.78 5.85410196625 mfsp 50 20 mfsp n m 50 5 11.12 5.82842712475 mfsp 100 50 n: # MISes 100 22 47.55 22.9582607431 mfsp 120 80 m: MIS size 120 20 46.03 20.9544511501 mfsp 150 60 150 11 29.50 11.9160797831 mfsp 200 50 200 11 34.79 11.9226162893

1 Table 1: Comparative Evaluation of IMI, ICC, Icf , and IW with wn = n .ForIcf , the results are either represented as the value or in the form of conflict vector 1, 1, ··· , 1n where n denotes the length of a conflict vector. Note that once ICC(K)=1, the K 1, 1, ··· , 1 conflict vector of will be |IMI(K)|, which can be directly used for ranking KBs.

Thimm 2016). Conclusion and Future Work In this paper, based on the MIS partition, we have first pre- sented an original framework for defining a family of in- Inconsistency measures based on a Shapley value are an- consistency measures of KBs which encompasses several other alternative that exploits existing inconsistency measures well-known measures as specific cases. Then, we simplified to define a coalition-based game and then use the Shapley the framework to get a parameter-free measure which keeps value to analyze the amount of inconsistency that can be im- a desired property in ranking inconsistent KBs. Moreover, puted to each formula in a given KB (Hunter and Konieczny the computational aspects of these new measures and of an 2010). Recently, based on logical argumentation theory, an- existing lower bound of all standard measures have been ex- other family of inconsistency measures for propositional plored using a MinCostSAT based encoding which enables logic has been proposed (Raddaoui 2015). Conspicuously, the use of efficient SAT solvers. We have implemented the it is hardly possible to have a complete comparison of the algorithms and tested them over real-world datasets. The existing measures. One way to categorize the existing metrics preliminary but encouraging experimental results highlight is with respect to their dependence on syntax or semantics. that the new inconsistency measures can better distinguish Semantic-based measures aim to compute the proportion of different KBs by their conflict degrees in comparison to two the language that is affected by the inconsistency. The incon- well-known inconsistency metrics. sistency measures belonging to this class are often based on As future work, we plan to conduct further experimental some paraconsistent semantics and, thus, syntax independent, validations of our proposed framework on application do- because we can still find paraconsistent models for incon- mains where inconsistency measure would be very helpful. sistent KBs. Whilst, syntax-based approaches are concerned Such domains include belief merging, argumentation and with the minimal number of formulas that cause inconsis- heterogeneous source integration and management. tencies. An overview of inconsistency measures for classical logics can be found in (Grant and Hunter 2011). Acknowledgments The second author benefits from the support of the ANR under the number ANR-15-CE23-0022. There is also related work on inconsistency measure- ment for quantitative logics. In particular, several works References have extended existing inconsistency measures for classi- Ammoura, M.; Raddaoui, B.; Salhi, Y.; and Oukacha, B. cal frameworks to the probabilistic setting and investigates 2015. On measuring inconsistency using maximal consistent their properties. One can quote for example the family of sets. In ECSQARU, 267–276. inconsistency metrics, proposed by (Picado-Muino˜ 2011; EL Thimm 2013), based on the quantification of the minimal Baader, F.; Brandt, S.; and Lutz, C. 2005. Pushing the adjustments in the degrees of certainty (i.e., probabilities) envelope. In IJCAI, 364–369. of the statements necessary to make the KB consistent. In Bail, S.; Horridge, M.; Parsia, B.; and Sattler, U. 2011. The (Rodder and Xu 2001), another inconsistency measure for justificatory structure of the NCBO bioportal ontologies. In probabilistic conditional logic is proposed. It is based on ISWC, 67–82. generalized divergence which is a specific distance for proba- Bailleux, O., and Boufkhad, Y. 2003. Efficient CNF encoding bility functions. of boolean cardinality constraints. In CP, 108–122.

92 Barragans-Martinez, A. B.; Arias, J. P.; and Vilas, A. F. 2004. Ludwig, M. 2014. Just: a tool for computing justications On measuring levels of inconsistency in multi-perspective w.r.t. EL ontologies. In ORE. requirements specifications. In PRISE, 21–30. Ma, Y.; Qi, G.; Xiao, G.; Hitzler, P.; and Lin, Z. 2010. Compu- Benferhat, S.; Dubois, D.; and Prade, H. 1995. A local ap- tational complexity and anytime algorithm for inconsistency proach to reasoning under inconsistency in stratified knowl- measurement. Int. J. Software and Informatics 4(1):3–21. edge bases. In ECSQARU, 36–43. Ma, Y.; Qi, G.; and Hitzler, P. 2011. Computing inconsistency Besnard, P. 2014. Revisiting postulates for inconsistency measure based on paraconsistent semantics. J. Log. Comput. measures. In JELIA, 383–396. 21(6):1257–1281. Doder, D.; Raskovic, M.; Markovic, Z.; and Ognjanovic, Z. Marques-Silva, J. P., and Lynce, I. 2007. Towards robust cnf 2010. Measures of inconsistency and defaults. Int. J. Approx. encodings of cardinality constraints. In CP, 483–497. Reasoning 51(7):832–845. Miyazaki, S.; Iwama, K.; and Kambayashi, Y. 1996. Database Een,´ N., and Sorensson,¨ N. 2003. An extensible SAT-solver. queries as combinatorial optimization problems. In CODAS, In SAT, 502–518. 477–483. Grant, J., and Hunter, A. 2006. Measuring inconsistency in Mu, K.; Liu, W.; Jin, Z.; and Bell, D. A. 2011. A syntax- knowledgebases. J. Intell. Inf. Syst. 27(2):159–184. based approach to measuring the degree of inconsistency for belief bases. Int. J. Approx. Reasoning 52(7):978–999. Grant, J., and Hunter, A. 2008. Analysing inconsistent first- order knowledgebases. Artif. Intell. 172(8-9):1064–1093. Mu, K.; Liu, W.; and Jin, Z. 2011. A general framework for measuring inconsistency through minimal inconsistent sets. Grant, J., and Hunter, A. 2011. Measuring the good and the Knowl. Inf. Syst. 27(1):85–114. bad in inconsistent information. In IJCAI, 2632–2637. Mu, K.; Liu, W.; and Jin, Z. 2012. Measuring the blame of Grant, J., and Hunter, A. 2013. Distance-based measures of each formula for inconsistent prioritized knowledge bases. J. inconsistency. In ECSQARU, 230–241. Log. Comput. 22(3):481–516. Grant, J. 1978. Classifications for inconsistent theories. Notre Mu, K. 2015. Responsibility for inconsistency. International Dame Journal of Formal Logic 19(3):435–444. Journal of Approximate Reasoning 61:43–60. Hunter, A., and Konieczny, S. 2008. Measuring inconsistency Oller, C. A. 2004. Measuring coherence using lp-models. J. through minimal inconsistent sets. In KR, 358–366. Applied Logic 2(4):451–455. Hunter, A., and Konieczny, S. 2010. On the measure Picado-Muino,˜ D. 2011. Measuring and repairing incon- of conflicts: Shapley inconsistency values. Artif. Intell. sistency in probabilistic knowledge bases. Int. J. Approx. 174(14):1007–1026. Reasoning 52(6):828–840. Hunter, A.; Parsons, S.; and Wooldridge, M. 2014. Measuring Previti, A., and Marques-Silva, J. 2013. Partial MUS enu- inconsistency in multi-agent systems. Unstliche Intelligenz meration. In AAAI. 28(3):169–178. Qi, G.; Liu, W.; and Bell, D. A. 2005. Measuring conflict and Hunter, A. 2002. Measuring inconsistency in knowledge via agreement between two prioritized belief bases. In IJCAI, quasi-classical models. In AAAI, 68–73. 552–557. Hunter, A. 2003. Evaluating significance of inconsistencies. Raddaoui, B. 2015. Computing inconsistency using deduc- In IJCAI, 468–478. tive argumentation. In ICAART, 164–172. Hunter, A. 2006. How to act on inconsistent news: Ignore, Rodder, W., and Xu, L. 2001. Elimination of inconsistent resolve, or reject. Data Knowl. Eng. 57(3):221–239. knowledge in the probabilistic expert system-shell SPIRIT Jabbour, S.; Ma, Y.; Raddaoui, B.; and Sa¨ıs, L. 2014. Prime (in German). In Operations Research - Springer-Verlag, 260– implicates based inconsistency characterization. In ECAI, 265. 1037–1038. Sinz, C. 2005. Towards an optimal cnf encoding of boolean Jabbour, S.; Ma, Y.; Raddaoui, B.; Sais, L.; and Salhi, Y. cardinality constraints. In CP, 827–831. 2015. On structure-based inconsistency measures and their Thimm, M. 2013. Inconsistency measures for probabilistic computations via closed set packing. In AAMAS, 1749–1750. logics. Artif. Intell. 197:1–24. Jabbour, S.; Ma, Y.; and Raddaoui, B. 2014. Inconsistency Thimm, M. 2016. Stream-based inconsistency measurement. measurement thanks to MUS-decomposition. In AAMAS, Int. J. Approx. Reasoning 68:68–87. 877–884. Xiao, G., and Ma, Y. 2012. Inconsistency measurement Kalyanpur, A.; Parsia, B.; Sirin, E.; and Hendler, J. A. 2005. based on variables in minimal unsatisfiable subsets. In ECAI, Debugging unsatisfiable classes in OWL ontologies. J. Web 864–869. Sem. 3(4):268–293. Xiao, G.; Lin, Z.; Ma, Y.; and Qi, G. 2010. Computing Knight, K. 2002. Measuring inconsistency. J. Philosophical inconsistency measurements under multi-valued semantics Logic 31(1):77–98. by partial Max-SAT solvers. In KR. Ludwig, M., and Penaloza,˜ R. 2014. Brave and cautious reasoning in EL. In DL, 274–286.

93