14 Feature Article: Jingli Lu, Xiaowei Yan, Dingrong Yuan and Zhangyan Xu

A New Similarity Measure for Vague Sets

Jingli Lu, Xiaowei Yan, Dingrong Yuan, Zhangyan Xu

Abstract--Similarity measure is one of important, effective and their similarity should not be 1. In [5] Dug Hun Hong widely-used methods in data processing and analysis. As vague put forward another similarity measure MH, according to the theory has become a promising representation of fuzzy concepts, in formulae of MH, we can obtain the similarity of vague values [0.3, this paper we present a similarity measure approach for better 0.7] and [0.4, 0.6] is M ([0.3,0.7],[0.4,0.6]) = 0.9 , in the same understanding the relationship between two vague sets in applica- H tions. Compared to existing similarity measures, our approach is far model, we can get M H ([0.3,0.6],[0.4,0.7]) = 0.9 . In a voting more reasonable, practical yet useful in measuring the similarity model, the vague value [0.3, 0.7] can be interpreted as: “the vote between vague sets. for a resolution is 3 in favor, 3 against and 4 abstentions”; [0.4,

Index Terms-- Fuzzy sets, Vague sets, Similarity measure 0.6] can be interpreted as: “the vote for a resolution is 4 in favor, 4 against and 2 abstention”. [0.3, 0.6] and [0.4, 0.7] can have I. INTRODUCTION similar interpretation. Intuitively, [0.3, 0.7] and [0.4, 0.6] may be more similar than [0.3, 0.6] and [0.4, 0.7]. Therefore sometimes In the classical introduced by Cantor, a German the results of M model are not accordant with our intuition. mathematician, values of elements in a set are only one of 0 and H After analyzing most existing vague sets and vague values simi- 1. That is, for any , there are only two possibilities: in or larity measures, we find out that almost each measure has its not in the set. Therefore, the theory cannot handle the data with defect. In section 2.2, we will illustrate them with more examples. ambiguity and uncertainty. Then we have to make a choice according to the applications. Zadeh proposed fuzzy theory in 1965 [1]. The most important A more reasonable approach is proposed to measure similarity in feature of a is that fuzzy set A is a of objects that this paper, after analyzing existing methods. satisfy a certain (or several) property. Each object x has a mem- The remaining of this paper is organized as follows. In Section bership degree of A, denoted as µ (x). This membership A 2, several methods of similarity measure for vague set are dis- has the following characteristics: The single degree contains the cussed. An improved similarity measure method and its proper- evidences for both supporting and opposing x. It cannot only ties are given in Section 3. Section 4 concludes this paper. represent one of the two evidences, but it cannot represent both at the same time too. II. PRELIMINARIES In order to deal with this problem, Gau and Buehrer proposed the concept of vague set in 1993 [2], by replacing the value of an A. Vague set element in a set with a sub- of [0, 1]. Namely, a true- In this section, we review some basic of vague values membership function tv(x) and a false-membership function fv(x) and vague sets from [2], [3], [4]. are used to describe the boundaries of membership degree. These 1 Vague Sets [2]: Let X be a space of points (ob- two boundaries form a sub-interval [tv(x), 1 – fv(x)] of [0, 1]. The jects), with a generic element of X denoted by x. A vague set V in vague set theory improves of the objective real world, X is characterized by a -membership function t and a false- becoming a promising tool to deal with inexact, uncertain or v vague knowledge. Many researchers have applies this theory to membership function fv . tv is a lower bound on the grade of many situations, such as fuzzy control, decision-making, knowl- membership of x derived from the evidence for x, and fv is a edge discovery and fault diagnosis. And the tool has presented lower bound on the negation of x derived from the evidence more challenging than that with fuzzy sets theory in applications. against x, t and f both associate a real number in the interval In intelligent activities, it is often needed to compare and cou- v v ple between two fuzzy concepts. That is, we need to check [0,1] with each point in X, where tv + f v ≤ 1 . That is whether two knowledge patterns are identical or approximately same, to find out functional dependence relations between con- tv : X → [0,1] ; fv : X → [0,1] cepts in a data mining system. Many measure methods have been This approach bounds the grade of membership of x to a sub- proposed to measure the similarity between two vague sets (val- interval [t (x),1− f (x)] of [0,1] ues). Each of them is given from different side, having its own v v counterexamples. Such as Shyi-Ming Chen proposed a similarity When X is continuous, a vague set V can be written as measure MC in [3], whereas from the MC model we can gain the V = ∫ [tV (x),1− fV (x)] / x , x ∈ X . similarity of vague values [0.5, 0.5] and [0, 1] is 1, obviously X When X is discrete, a vague set V can be written as Department of Computer Science, Guangxi Normal University, Gulin, 541004, P.C.China. n [email protected]; [email protected]. V = ∑[tV (xi ),1− fV (xi )]/ xi , xi ∈ X . i=1

November 2005 Vol.6 No.2 IEEE Intelligent Informatics Bulletin Feature Article: A New Similarity Measure for Vague Sets 15

Definition 2: Let x and y be two vague values, where 0.8]). According to our intuition, pair of ([0.4, 0.8], [0.5, 0.8]) is more similar than pair of ([0.4, 0.8], [0.5, 0.7]), but in the model x = [t x ,1 − f x ] and y = [t y ,1 − f y ] . If t x = t y and of ML, the two pairs of vague values have the same similarity. f x = f y , then the vague values x and y are called equal (i.e., The MO model also reflects the equal concerns between the difference of true-membership degrees and the difference of [t x ,1 − f x ] = [t y ,1 − f y ] ). false-membership degrees. But similar to the MH model, the MO Definition 3: Let A and B be vague sets of the universe of dis- model does not consider whether the differences are positive or course U, U ={u1 ,u2 ,Lun }, where negative. The above methods of similarity measure can be used to solve A = [t (u ),1− f (u )] / u +[t (u ),1− f (u )] / u A 1 A 1 1 A 2 A 2 2 the problem of how to determine the similarity between two +L+[t A (un ),1− f A (un )] / un vague values in a certain extent. But each of them focuses on B [t (u ),1 f (u )]/u [t (u ),1 f (u )]/u = B 1 − B 1 1 + B 2 − B 2 2 different aspects. There are three factors which affect the similar- +L+[tB (un ),1− fB (un )]/un ity of vague values: true-membership function tx, false- membership function f , and 1 – t – f . The there are If ∀i , [t A (ui ),1− f A (ui )] = [tB (ui ),1− f B (ui )] , then the vague x x x many counterexamples under the measures of the M , M and M sets A and B are called equal, where 1 ≤ i ≤ n . H L O models is that the weights of |tx – ty|, | fx – fy| and |(ty + fy) – (tx + B. Research into similarity measure fx)| in the above methods are constants. Its explicit characteristic Currently, there have been many similarity measurements for is that it is not considered whether the difference is positive or vague set (value). Suppose that X = [tx, 1 – fx] and Y = [ty, 1 – fy] negative. Based on this idea, a new weighted and variable simi- are two vague values over the discourse universe U. Let S(x) = tx larity measure is proposed. It can considerably reduce the possi- – fx,, S(y) = ty – fy, the MC, MH, ML and MO models are defined bility of similarity coincidence. respectively in [3], [5], [6] and [7] as follows, | S(x) − S(y) | III. MEASURING THE SIMILARITY BETWEEN VAGUE SETS M (x, y) = 1− C 2 This section constructs a new approach for measuring the (1) | (t x − t y ) − ( f x − f y ) | similarity between vague sets, analyzes the properties and illus- = 1− trates the use by examples. 2 | t − t | + | f − f | A. A New Similarity Measure M (x, y) = 1− x y x y (2) H 2 We first give an example. Assume that there are four candi-

| S(x) − S(y) | | t x − t y | + | f x − f y | dates A, B, C, D, and ten voters. One voter supports A, one op- M L (x, y) = 1− − poses A; two support B, one opposes B; seven support C, one 4 4 (3) opposes C ; eight support D, and one opposes D. The voting re- | (t − t ) − ( f − f ) | + | t − t | + | f − f | = 1− x y x y x y x y sults of A, B, C, and D can be viewed as four vague values, A[0.1, 4 0.9], B[0.2, 0.9], C[0.7, 0.9], and D[0.8, 0.9]. Now we compare 2 2 the similarities between A and B, and between C and D. By for- (t x − t y ) + ( f x − f y ) M (x, y) = 1− (4) mulae (1), (2), (3) and (4), we obtain the similarities as shown in O 2 Table 1. Comparisons among the M , M , M , M models can also be C H L O found in [7]. TABLE 1. From the definition of the MC model, we know that AN EXAMPLE OF SIMILARITY CALCULATION x y Mc MH ML MO M’ t – f = t – f ⇒ M ≡ 1, x x y y C A,B [0.1,0.9] [0.2, 0.9] 0.95 0.95 0.95 0.929 0.968 C,D [0.7, 0.9] [0.8, 0.9] 0.95 0.95 0.95 0.929 0.953 i.e. the MC model is too rough when tx – fx = ty – fy.

The MH model pays equal attention both to the difference of two true-membership degrees and to the difference of two false- From Table 1, we can see that the similarities between A and B membership degrees, between two vague values. Pairs of vague (and between C and D) are all the same by using the Mc, MH, ML values, which have both the same difference of true-membership and MO models. If only a candidate can be selected and renuncia- degrees and the same difference of false-membership degrees, tion is considered, D is most possible to be selected. The possi- have the same similarity. But it does not distinguish the positive bility of selecting C is smaller than that of selecting D. Selecting difference and negative difference between true- and false- A or B has rather low possibility. Intuitively, it should be easier membership degrees. to say that A and B are similar than C and D, because A and B are all impossible options, D might be selected, and C might not be The ML model inherits the advantages of the MC and MH mod- els, paying equal attentions to the support of vague value, true- selected. Among A, B, C, and D, we would be most concerned membership degree, and false-membership degree, respectively. with the similarity between C and D. We need to enlarge the But it uses absolute values, and hence increases the possibility of difference of similarities where we are concerned. similarity coincidence. For example, it cannot distinguish be- For another example, assume that there are other four candi- tween pair of ([0.4, 0.8], [0.5, 0.7]) and pair of ([0.4, 0.8], [0.5, dates E, F, G, H, and ten voters. One voter supports E, nine op-

IEEE Intelligent Informatics Bulletin November 2005 Vol.6 No.2 16 Feature Article: Jingli Lu, Xiaowei Yan, Dingrong Yuan and Zhangyan Xu pose E; one supports F, eight oppose F; one supports G, two when the support is large and the opposition is small. oppose G; one supports H, and one opposes H. The voting results From the above definition, we obtain the following properties. of E, F, G, and H can also be viewed as four vague values, E[0.1, Property 1: M'(x, y) ∈ [0, 1].

0.1], F[0.1, 0.2], G[0.1, 0.8], and H[0.1, 0.9]. Now we compare Proof: Since tx ∈ [0, 1], ty ∈ [0, 1], fx ∈ [0, 1], fy ∈ [0, 1], we the similarities between E and F, and between G and H. By for- have mulae (1), (2), (3) and (4), we obtain the similarities as shown in |tx – ty| ∈ [0, 1], |fx – fy| ∈ [0, 1], |(tx – ty) – (fx – fy) | ∈ [0, 2] Table 2. (t + t ) ⋅ 0 + (2 − f − f ) ⋅ 0 + 0 M ′ ≤ 1 − x y x y = 1 TABLE 2. (t x + t y ) + (2 − f x − f y ) + 2 AN EXAMPLE OF SIMILARITY CALCULATION x y Mc MH ML MO M’ (t + t ) + (2 − f − f ) + 2 E,F [0.1,0.1] [0.1, 0.2] 0.95 0.95 0.95 0.929 0.948 M ′ 1 x y x y 0 ■ G,H [0.1, 0.8] [0.1, 0.9] 0.95 0.95 0.95 0.929 0.931 ≥ − = (t x + t y ) + (2 − f x − f y ) + 2

We definitely know only the minority of voters support the Property 2: M'(x, y) = M'(y, x). candidates E, F, G or H, and the majority of voters oppose E or F, It is obtained directly from the definition of the M' model. whereas we have little information about G or H because there Property 3: M'(x, y) = 0 ⇔ x = [0, 0] and y = [1, 1]; or x = [1, are so many abstainers. Intuitively compared with E and F, G 1] and y = [0, 0]. and H should have less similarity. But from the Table 2, we can Proof: For x = [0, 0] and y = [1, 1] (or for x = [1, 1] and y = [0, see the similarities between E and F (and between G and H) are 0]), by the definition, we obviously have M'(x, y) = 0; and all the same by using the Mc, MH, ML and MO models. If M'(x, y) = 0, we have tx – ty = 1 and fx – fy = – 1; Then we need a new similarity measure which can magnify or tx – ty = – 1, fx – fy = 1 what we are concerned. That is, if tx – ty is equal to fx – fy, the Hence, x = [0, 0] and y = [1, 1]; similarities can still be different. For example, the larger the or x = [1, 1], y = [0, 0] ■ support (tx + ty) is, the smaller the similarity should be. Analogi- Property 4: M'(x, y) = 0 ⇔ x = y. cally, the smaller the opposition (fx + fy) is, the smaller the simi- Proof: If x = y, from the definition, it is clear that larity should be. M'(x, y) = 1. Based on the above discussion, we propose a weight-varied If M'(x, y) = 1 ⇒ tx – ty = 0, fx – fy = 0, that is, x = y. ■ similarity measure M', i.e. Example 1: In table 3, seven groups of vague values (x, y) are given. Intuitively, the similarity of the first pair vague values (t x + t y ) | t x − t y | M ′ = 1− should be larger than the second pair, namely M(x1, y1) > M(x2, (t x + t y ) + (2 − f x − f y ) + 2 y2). And experientially M(x4, y4) < M(x5, y5); M(x6, y6) <

(2 − f x − f y ) | f x − f y | M(x7, y7). Consider the 7 groups of data pairs (x, y) in the sec- − (5) (t + t ) + (2 − f − f ) + 2 ond and third rows of Table 2. We compare our measure method x y x y with others. The results are shown in 4th — 8th rows of Table 3. | (t x − t y ) − ( f x − f y ) | TABLE 3. − COMPARISONS OF VARIOUS SIMILARITY MEASURES (t x + t y ) + (2 − f x − f y ) + 2 x y MC MH ML MO M’

Coefficient (tx + ty) of |tx – ty| implies that when |tx – ty| is the 1 [0.3, [0.4, 1 0.9 0.9 0.9 0.95 same, similarity should be smaller if (tx + ty) is larger. Coefficient 0.7] 0.6] 5 2 [0.3, [0.4, 0.9 0.9 0.9 0.9 0.9 (2 – fx – fy) of |fx – fy| means that when |fx – fy| is the same, similar- 0.6] 0.7] ity should be smaller if (fx + fy) is smaller. 3 [0.3, [0.4, 1 0.9 0.9 0.9 0.94 Let tx + ty = p, tx – ty = q, fx + fy = m, fx – fy = n. Then, formula (5) can be reduced as 0.8] 0.7] 5 8 4 [1, 1] [0, 1] 0.5 0.5 0.5 0.3 0.6 | pq | + | (2 − m)n | + | q − n | 5 [0.5, [0, 1] 1 0.5 0.7 0.5 0.75 M ′ = 1− (6) p + (2 − m) + 2 0.5] 5 6 [0.4, [0.5, 1 0.9 0.9 0.9 0.94 Where the M' model pays attention to both the difference of 0.8] 0.7] 5 5 true-membership degrees and the difference of false-membership 7 [0.4, [0.5, 0.9 0.9 0.9 0.92 0.95 degrees between vague values. It also implies the attention to the 0.8] 0.8] 5 5 5 9 8 support of vague value. Because of the introduction of |(tx – ty) – (fx – fy)|, The M' model can distinguish positive difference and From the Table 3, we can see sometimes the similarities negative difference. The strategy of varied-weight leads to the gained by formulae of MC, MH, ML and MO are counterintuitive. reduced possibility of similarity coincidence, and the weights For example, M C (x1, y1 ) = M C ([0.3,0.7],[0.4,0.6]) = 1 , ap- meet the requirement that we are concerned with those similari- parently we know the similarity of [0.3,0.7] and [0.4,0.6] is ties where supports are high and oppositions are low. For the absolutely not 1. sake of comparison, we enlarge the difference of similarities

November 2005 Vol.6 No.2 IEEE Intelligent Informatics Bulletin Feature Article: A New Similarity Measure for Vague Sets 17

Another example, compare the similarity of the first group fx(ui) – fy(ui) = n(ui) data pair ([0.3, 0.7], [0.4, 0.6]) with the second group data pair From the above definition, we have the following properties. ([0.3, 0.6], [0.4, 0.7]) using the several different similarity meas- Property 5: T '(A, B) ∈ [0, 1]. ures. We get Property 6: T '(A, B) = T '(B, A). M (x , y ) = M (x , y ) , M (x , y ) = M (x , y ) , H 1 1 H 2 2 O 1 1 O 2 2 Property 7: whereas intuitively the first data pair should be more similar than n n T ′(A, B) = 0 ⇔ A = [0,0] / u , B = [1,1] / u the second data pair, namely M (x1, y1 ) > M (x2 , y2 ) , but only ∑ i ∑ i i=1 i=1 M L and M ' can be accordant with our intuition — n n or A = [1,1] / ui , B = [0,0] / ui M (x1, y1 ) > M (x2 , y2 ) . ∑ ∑ i=1 i=1 Then compare the similarity of the sixth group data pair ([0.4, Property 8: T '(A, B) = 1 ⇔ A = B. 0.8], [0.5, 0.7]) with the seventh group data pair ([0.4, 0.8], [0.5, Example 2: Let A and B be two vague sets over the discourse 0.8]). Intuitively, the similarity of the sixth group and the seventh universe U = {u1, u2, u3, u4}, where group should satisfy M (x , y ) < M (x , y ) , but from the 6 6 7 7 A = [0.3, 0.7] / u1 + [0.5, 0.5] / u2 + [0.4, 0.8] / u3 + [1.0, 1.0] / u4

above result we can see only M H , M O , and M ' can satisfy the B = [0.4, 0.6] / u1 + [0.0, 1.0] / u2 + [0.5, 0.7] / u3 + [0.0, 1.0] / u4 limitation. From formula (7), we have the following similarity between A To sum up, none but M ' can distinguish those groups vague and B. 4 values, to some extend, according with our intuition. 1 T ′(A, B) = ∑ M '(VA(ui ),VB(ui )) In table 3, we give more comparison of different similarity 4 i=1 measures. = [(1− 0.05) + (1− 0.25) + (1− 0.055) + (1− 0.4)] / 4 In the new similarity measure M ' , the three factors are con- = 0.811 sidered equally which affect the similarity of vague values: true-

membership function tx, false-membership function fx, and 1 – tx C. Weighted Similarity Measure between Vague Sets – fx. The weights of |tx – ty|, | fx – fy| and |(ty + fy) – (tx + fx)| in new Suppose that A and B are two vague sets over the discourse

similarity measure M ' are variable, and the variable weights universe U = {u1, u2, ..., un}, wi is the weight of ui, wi ∈ [0, 1], 1 reduce the possibility of similarity coincidence. Simultaneously, ≤ i ≤ n. Then, the weighted similarity between A and B can be the new similarity measure M ' enlarges the difference of simi- obtained by calculating the following W(A, B). larities where we are concerned then it is easier for us to do some n n decision. W ′(A, B) = ∑ wi M ′(V A (ui ),VB (ui )) ∑ wi i=1 i=1 B. Similarity measure between Vague Sets n ⎛ | p(u )q(u ) | + | (2 − m(u ))n(u ) | + | q(u ) − n(u ) | ⎞ n = w ⎜1− i i i i i i ⎟ w Assume that A and B are two vague sets over the discourse ∑ i ⎜ ⎟ ∑ i i=1 ⎝ p(ui ) + (2 − m(ui )) + 2 ⎠ i=1 universe U = {u , u , ..., u }. V (u ) = [t (u ), 1 – f (u )] is the 1 2 n A i A i A i membership value of u in vague set A, and V (u ) = [t (u ), 1 – i B i B i (8) f (u )] is the membership value of u in vague set B. Let B i i Example 3: Let A and B be the same as that in Example 2, the

n weights of elements u1, u2, u3, and u4 in discourse universe U are A = ∑[t A (ui ),1− f A (ui )] / ui 0.4, 0.2, 0.8, and 0.6, respectively. From (8), we have the i=1 weighted similarity between A and B 0.4(1− 0.05) + 0.2(1− 0.25) + 0.8(1− 0.055) + 0.6(1− 0.4) n W ′(A, B) = B = ∑[t B (ui ),1− f B (ui )] / ui 0.4 + 0.2 + 0.8 + 0.6 i=1 = (0.38 + 0.15 + 0.756 + 0.36) / 2.0 Then, the similarity between vague sets A and B can be ob- = 0.823 tained by the following function T '. 1 n T ′(A, B) = ∑ M '(VA(ui),VB(ui)) n IV. CONCLUSIONS i=1 n 1 | p(ui )q(ui ) | + | (2 − m(ui ))n(ui ) | + | q(ui ) − n(ui ) | After analyzing the limitations in current similarity measures = ∑ (1− ) for vague sets, we have proposed a new method for measuring n i=1 p(ui ) + (2 − m(ui )) + 2 the similarity between vague sets in this paper. The basic idea is (7) to deeply understand the support, the difference of true- where, membership and the difference of false-membership, to signifi-

tx(ui) + ty(ui) = p(ui) cantly distinguish the directions of difference (positive and nega- tive), and properly use varied-weights in the differences of true- tx(ui) – ty(ui) = q(ui) and false-membership, for two vague sets. The examples have illustrated that our approach is effective and practical, and pre- fx(ui) + fy(ui) = m(ui) sents much better discernibility than existing ones at measuring

IEEE Intelligent Informatics Bulletin November 2005 Vol.6 No.2 18 Feature Article: Jingli Lu, Xiaowei Yan, Dingrong Yuan and Zhangyan Xu the similarity between vague sets.

REFERENCES [1] Zadeh, L A. Fuzzy sets. Information and Control, 1965, 8: 338-356. [2] Gau, W. L. and Buehrer, D. J. Vague sets. IEEE Transactions on Systems, Man and Cybernetics (Part B), 1993, 23(2): 610-614. [3] Chen, S. M. Measures of similarity between vague sets. Fuzzy Sets and Systems, 1995, 74(2): 217-223. [4] Chen, S. M. Similarity measures between vague sets and between elements. IEEE Transactions on Systems, Man and Cybernetics (Part B), 1997, 27(1): 153-158. [5] Hong, D. H. and Kim, C. A note on similarity measures between vague sets and elements. Information Sciences, 1999, 115: 83-96. [6] Li, F. and Xu, Z. Y. Similarity measures between vague sets. Chinese Jour- nal of Software, 2001, 12(6): 922-927. [7] Li, Y. H., Chi, Z. X., and Yan, D. Q. Vague similarity and vague entropy. Computer Science (Chinese journal), 2002, 29(12): 129-132. [8] Li, F. and Xu, Z. Y. Vague sets. Computer Science (Chinese journal), 2000, 27(9): 12-14.

November 2005 Vol.6 No.2 IEEE Intelligent Informatics Bulletin