http://www.paper.edu.cn

A Vague Relational Model and Algebra¹

Faxin Zhao, Z.M. Ma*, and Li Yan
School of Information Science and Engineering, Northeastern University, Shenyang, P.R. China 110004

Abstract

Imprecision and uncertainty in data values are pervasive in real-world environments and have received much attention in the literature. Several methods have been proposed for incorporating uncertain data into relational databases. However, the current approaches have many shortcomings and have not established an acceptable extension of the relational model. In this paper, we propose a consistent extension of the relational model to represent and deal with fuzzy information by means of vague sets. We present a revised relational structure and extend the relational algebra. The extended algebra is shown to be closed and reducible to the fuzzy relational algebra and further to the conventional relational algebra.

Keywords: Vague set, Relational data model, Algebra

1 Introduction

Over the last 30 years, relational databases have gained widespread popularity and acceptance in information systems. Unfortunately, the relational model has no comprehensive way to handle imprecise and uncertain data, although such data exist everywhere in the real world. Consequently, the relational model cannot represent the inherently imprecise and uncertain nature of real-world data, and the literature has identified the need to extend it so that such data can be supported. Fuzzy set theory was introduced by Zadeh [1] to handle vaguely specified data values by generalizing the notion of membership in a set, and fuzzy information has since been investigated extensively in the context of the relational model. Based on various fuzzy relational database models, many studies have addressed data integrity constraints [2, 3]. There have also been studies on fuzzy query languages [4] and fuzzy relational algebra [5]. In [4], the existing query language SQL was extended for fuzzy queries, and some fuzzy aggregation operators were developed. In fuzzy set theory, for a given fuzzy set F, each object u ∈ U (where U is a universe of discourse) is assigned a single value between zero and one, called its grade of membership. In [6], Gau and Buehrer introduced the concept of vague sets, a generalization of fuzzy sets. They pointed out that the drawback of using a single membership value is that the evidence for u ∈ U and the evidence against u ∈ U are in fact mixed together, and that the single number reveals nothing about its own accuracy. To tackle this problem, they proposed the notion of a vague set. A vague set, like an Intuitionistic Fuzzy Set (IFS) [7], is a further generalization of a fuzzy set.
However, vague sets are not isomorphic to IFSs, and some features for handling vague data are unique to vague sets. For example, vague sets allow a more intuitive graphical representation of vague data, which significantly facilitates the analysis of data relationships, incompleteness, and similarity measures [8]. In [8], the notion of vague sets was initially incorporated into relations. In this paper, we propose an extension of the relational model: the vague relational database (VRDB) model. Based on the proposed model and the similarity measure of vague sets, we define vague relational operations in the relational algebra, which can be used to query and update vague relational databases.

The remainder of this paper is organized as follows. Section 2 presents basic notions of vague set theory. Based on this theory, Section 3 proposes a vague relational model. Section 4 investigates vague data redundancies and their removal. Section 5 defines vague relational operations. Section 6 concludes the paper.

¹ Supported by the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No. 20050145024.

2 Basic Knowledge

Let U be a universe of discourse, where an element of U is denoted by u.

Definition 2.1. A vague set V in U is characterized by a truth-membership function tV and a false-membership function fV. Here tV(u) is a lower bound on the grade of membership of u derived from the evidence for u, and fV(u) is a lower bound on the negation of u derived from the evidence against u. Both tV and fV associate a real number in the interval [0, 1] with each element of U, where tV(u) + fV(u) ≤ 1. That is, tV: U → [0, 1] and fV: U → [0, 1].

Suppose U = {u1, u2, …, un}. A vague set V of the universe of discourse U can be represented by

$$V = \sum_{i=1}^{n} [t_V(u_i),\, 1 - f_V(u_i)]/u_i, \quad \forall u_i \in U,$$

where tV(ui) ≤ µV(ui) ≤ 1 − fV(ui) and 1 ≤ i ≤ n.

This approach bounds the grade of membership of u to a subinterval [tV (u), 1 - fV (u)] of [0, 1]. In other words, the exact grade of membership µV (u) of u may be unknown, but is bounded by tV (u) ≤ µV (u) ≤ 1 - fV (u), where tV (u) + fV (u) ≤ 1.

In the notation [tV(u), 1 − fV(u)]/u, the interval [tV(u), 1 − fV(u)] is the vague value of the object u. For example, if [tV(u), 1 − fV(u)] = [0.5, 0.8], then tV(u) = 0.5, 1 − fV(u) = 0.8, and fV(u) = 0.2. This can be interpreted as "the degree to which object u belongs to the vague set V is 0.5, and the degree to which u does not belong to V is 0.2". In a voting process, the vague value [0.5, 0.8] can be interpreted as "the vote on the resolution is 5 in favor, 2 against, and 3 neutral (abstaining)".

The precision of the knowledge about u is characterized by the difference 1 − tV(u) − fV(u). If this difference is small, the knowledge about u is relatively precise; if it is large, we know correspondingly little. If tV(u) equals 1 − fV(u), the knowledge about u is exact, and vague set theory reduces to fuzzy set theory. If tV(u) and 1 − fV(u) are both equal to 1 or both equal to 0, depending on whether u does or does not belong to V, the knowledge about u is crisp, and the theory reduces to ordinary set theory. In the following, we present several special vague sets and operations on vague sets.
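As a minimal sketch that is not part of the original paper, a single vague value can be stored as the pair (tV(u), fV(u)) and validated against the constraint tV(u) + fV(u) ≤ 1 from Definition 2.1. The class name `VagueValue`, the property name `hesitation` for the difference 1 − tV(u) − fV(u), and the small rounding tolerance are hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VagueValue:
    """Vague value of one element u: t = t_V(u), f = f_V(u) (Definition 2.1)."""

    t: float  # lower bound on membership, from the evidence for u
    f: float  # lower bound on the negation of u, from the evidence against u

    def __post_init__(self) -> None:
        if not (0.0 <= self.t <= 1.0 and 0.0 <= self.f <= 1.0):
            raise ValueError("t and f must lie in [0, 1]")
        if self.t + self.f > 1.0 + 1e-12:  # tolerance for floating-point rounding
            raise ValueError("t + f must not exceed 1")

    @classmethod
    def from_interval(cls, lower: float, upper: float) -> VagueValue:
        """Build a value from the interval notation [t, 1 - f]."""
        return cls(t=lower, f=1.0 - upper)

    @property
    def interval(self) -> Tuple[float, float]:
        """Bounds [t, 1 - f] on the unknown exact membership grade mu_V(u)."""
        return (self.t, 1.0 - self.f)

    @property
    def hesitation(self) -> float:
        """1 - t - f: small means precise knowledge, large means little is known."""
        return 1.0 - self.t - self.f

    def is_fuzzy(self, eps: float = 1e-12) -> bool:
        """True when t = 1 - f, i.e. the value reduces to a single fuzzy grade."""
        return abs(self.hesitation) <= eps
```

In the voting reading, `VagueValue.from_interval(0.5, 0.8)` corresponds to t = 0.5 (5 of 10 votes in favor), f = 0.2 (2 against), and a hesitation of 0.3 (3 abstentions), up to floating-point rounding.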

Definition 2.2. A vague set V is the empty vague set if and only if its truth-membership function tV = 0 and its false-membership function fV = 1 for all u ∈ U. We use ∅ to denote it.

Definition 2.3. The complement of a vague set V, denoted V', is defined by tV'(u) = fV(u) and 1 − fV'(u) = 1 − tV(u).

Definition 2.4. A vague set A is contained in another vague set B, written as A ⊆ B, if and only if tA ≤ tB and 1 − fA ≤ 1 − fB.

Definition 2.5. Two vague sets A and B are equal, written as A = B, if and only if A ⊆ B and B ⊆ A, namely, tA = tB and 1 − fA = 1 − fB.

Definition 2.6. (Union) The union of two vague sets A and B is a vague set C, written as C = A ∪ B, whose truth-membership and false-membership functions are related to those of A and B by tC = max(tA, tB) and 1 − fC = max(1 − fA, 1 − fB) = 1 − min(fA, fB).


Definition 2.7. (Intersection) The intersection of two vague sets A and B is a vague set C, written as C = A ∩ B, whose truth-membership and false-membership functions are related to those of A and B by tC = min (tA, tB) and 1 - fC = min (1 - fA, 1 - fB) = 1 – max (fA, fB).
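Definitions 2.2 to 2.7 can be illustrated with a short sketch that stores a vague set over a finite universe as a Python dict mapping each element to its interval (t, 1 − f). This representation, the function names, and the convention that an element missing from the dict carries the empty value [0, 0] (t = 0 and f = 1, as in Definition 2.2) are assumptions made for the example, not part of the paper.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical representation: element -> (t, 1 - f)
Interval = Tuple[float, float]
VagueSet = Dict[Hashable, Interval]
EMPTY: Interval = (0.0, 0.0)  # t = 0, f = 1 (Definition 2.2)


def value(v: VagueSet, u: Hashable) -> Interval:
    """Vague value of u in v; elements that are not listed are treated as empty."""
    return v.get(u, EMPTY)


def complement(v: VagueSet, universe: Iterable[Hashable]) -> VagueSet:
    """Definition 2.3: t' = f and 1 - f' = 1 - t, i.e. [t, 1 - f] -> [f, 1 - t]."""
    result: VagueSet = {}
    for u in universe:
        t, upper = value(v, u)
        result[u] = (1.0 - upper, 1.0 - t)
    return result


def is_contained(a: VagueSet, b: VagueSet, universe: Iterable[Hashable]) -> bool:
    """Definition 2.4: t_A <= t_B and 1 - f_A <= 1 - f_B for every u."""
    return all(
        value(a, u)[0] <= value(b, u)[0] and value(a, u)[1] <= value(b, u)[1]
        for u in universe
    )


def is_equal(a: VagueSet, b: VagueSet, universe: Iterable[Hashable]) -> bool:
    """Definition 2.5: A = B iff A is contained in B and B is contained in A."""
    universe = list(universe)  # the universe is iterated twice
    return is_contained(a, b, universe) and is_contained(b, a, universe)


def union(a: VagueSet, b: VagueSet) -> VagueSet:
    """Definition 2.6: t_C = max(t_A, t_B), 1 - f_C = max(1 - f_A, 1 - f_B)."""
    return {
        u: (max(value(a, u)[0], value(b, u)[0]), max(value(a, u)[1], value(b, u)[1]))
        for u in set(a) | set(b)
    }


def intersection(a: VagueSet, b: VagueSet) -> VagueSet:
    """Definition 2.7: t_C = min(t_A, t_B), 1 - f_C = min(1 - f_A, 1 - f_B)."""
    return {
        u: (min(value(a, u)[0], value(b, u)[0]), min(value(a, u)[1], value(b, u)[1]))
        for u in set(a) | set(b)
    }
```

Applying `complement` twice returns the original intervals up to floating-point rounding, and the intersection of two vague sets is always contained in their union.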

3 Vague Relational Model

In order to incorporate fuzzy information into relational databases, various attempts to enhance the relational database model with fuzzy extensions can be found in the literature [5, 9]. In this section, we describe an approach that enhances the relational model by means of vague set theory, resulting in the vague relational database (VRDB) model. To this end, we extend some attribute domains to sets of vague sets and define a vague relation instance as a subset of the Cartesian product of such attribute domains.

Definition 3.1. Let Ai, for i = 1, …, n, be attributes defined on the universes of discourse Ui, respectively. Then a vague relation r on the relation schema R(A1, A2, …, An) is defined as a subset of the Cartesian product of collections of vague subsets:

r ⊆ V(U1) × V(U2) × … × V(Un),

where V(Ui) denotes the collection of all vague subsets of the universe of discourse Ui.

Each tuple t of r consists of a Cartesian product of vague subsets on the respective Ui's, i.e., t[Ai] = π(Ai), where π(Ai) is a vague subset of attribute Ai defined on Ui, for all i. Note that vague relations can be considered an extension of classical relations (all vague values are [1, 1]) and of possibilistic relations (all vague values are possibility distributions, i.e., each degree is [a, a] with 0 ≤ a ≤ 1); vague relations can therefore capture more information about vagueness.

4 Vague Data Redundancies and Removal

4.1 Similarity Measure of Vague Data

Several studies have addressed how to measure the degree of similarity between vague sets [10-12]. In [12], it was pointed out that the similarity measures proposed by Chen [10, 11] do not fit well in some cases, and a set of modified measures was proposed that turn out to be more reasonable than Chen's in more general cases.

Definition 4.1. Let x and y be two vague values such that x = [tx, 1 − fx] and y = [ty, 1 − fy], where 0 ≤ tx ≤ 1 − fx ≤ 1 and 0 ≤ ty ≤ 1 − fy ≤ 1. Let SE(x, y) denote the similarity measure between x and y. Then:

$$SE(x, y) = 1 - \frac{|t_x - t_y| + |f_x - f_y|}{2}$$
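A direct transcription of Definition 4.1, as a minimal sketch assuming that vague values are passed as intervals (t, 1 − f); the function name `se_value` is hypothetical. SE is symmetric, equals 1 for identical values, and equals 0 for the opposite extremes [0, 0] and [1, 1].

```python
from typing import Tuple

Interval = Tuple[float, float]  # (t, 1 - f)


def se_value(x: Interval, y: Interval) -> float:
    """SE(x, y) = 1 - (|t_x - t_y| + |f_x - f_y|) / 2 (Definition 4.1)."""
    tx, upper_x = x
    ty, upper_y = y
    fx, fy = 1.0 - upper_x, 1.0 - upper_y  # recover the false-membership degrees
    return 1.0 - (abs(tx - ty) + abs(fx - fy)) / 2.0
```

For example, for x = [0.2, 0.3] and y = [0.3, 0.5], the truth degrees differ by 0.1 and the false degrees (0.7 and 0.5) differ by 0.2, so SE(x, y) = 1 − 0.3/2 = 0.85.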

Definition 4.2. Let U = {u1, u2, …, un} be the universe of discourse. Let A and B be two vague sets on U, where:

$$A = \sum_{i=1}^{n} [t_A(u_i),\, 1 - f_A(u_i)]/u_i, \quad \forall u_i \in U,$$

where tA(ui) ≤ µA(ui) ≤ 1 − fA(ui) and 1 ≤ i ≤ n, and

$$B = \sum_{i=1}^{n} [t_B(u_i),\, 1 - f_B(u_i)]/u_i, \quad \forall u_i \in U,$$

where tB(ui) ≤ µB(ui) ≤ 1 − fB(ui) and 1 ≤ i ≤ n. Let SE(A, B) denote the similarity measure between A and B. Then:

$$SE(A, B) = \frac{1}{n}\sum_{i=1}^{n} SE\big([t_A(u_i), 1 - f_A(u_i)], [t_B(u_i), 1 - f_B(u_i)]\big) = \frac{1}{n}\sum_{i=1}^{n}\left(1 - \frac{|t_A(u_i) - t_B(u_i)| + |f_A(u_i) - f_B(u_i)|}{2}\right)$$

The notion of a similarity measure between attribute values represented by vague sets can be extended to a similarity measure between two tuples. Let ti = <ti[A1], ti[A2], …, ti[An]> and tj = <tj[A1], tj[A2], …, tj[An]> be two vague tuples in a vague relation instance r over schema R(A1, A2, …, An). Let SE(ti, tj) denote the similarity measure between ti and tj. Then

SE (ti, tj) = min {SE (ti [A1], tj [A1]), SE (ti [A2], tj [A2]), …, SE (ti [An], tj [An])}
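Definition 4.2 and the tuple measure can be sketched as follows. The sketch assumes, beyond what the paper states, that U is taken to be the set of elements listed in either vague set (unlisted elements read as [0, 0]) and that a crisp attribute value such as Name or Sex has similarity 1 to an equal value and 0 otherwise. Under these assumptions it reproduces SE(A, B) = 0.82 from Example 4.1 and the Weight similarities 0.87 and 0.92 used with Tables 1 and 2 in Section 5.1. All names are hypothetical.

```python
from typing import Any, Dict, Hashable, Tuple

Interval = Tuple[float, float]        # (t, 1 - f)
VagueSet = Dict[Hashable, Interval]   # element -> interval
Row = Dict[str, Any]                  # attribute -> crisp value or VagueSet
EMPTY: Interval = (0.0, 0.0)          # unlisted element: t = 0, f = 1


def se_value(x: Interval, y: Interval) -> float:
    """Definition 4.1; |f_x - f_y| equals |(1 - f_x) - (1 - f_y)|."""
    return 1.0 - (abs(x[0] - y[0]) + abs(x[1] - y[1])) / 2.0


def se_set(a: VagueSet, b: VagueSet) -> float:
    """Definition 4.2: mean element-wise similarity over U."""
    universe = set(a) | set(b)  # assumption: U = elements listed in a or b
    if not universe:
        return 1.0  # two empty vague sets are identical
    total = sum(se_value(a.get(u, EMPTY), b.get(u, EMPTY)) for u in universe)
    return total / len(universe)


def se_attr(x: Any, y: Any) -> float:
    """Vague values use se_set; crisp values score 1 if equal, else 0 (assumption)."""
    if isinstance(x, dict) and isinstance(y, dict):
        return se_set(x, y)
    return 1.0 if x == y else 0.0


def se_tuple(t1: Row, t2: Row) -> float:
    """SE(t_i, t_j) = min over attributes A_k of SE(t_i[A_k], t_j[A_k])."""
    return min(se_attr(t1[a], t2[a]) for a in t1)
```

For instance, for the Weight values of t1 and u1, `se_set` averages 0.6, 1, and 1 over the weights 130, 140, and 150, which gives about 0.87.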

4.2 Data Redundancies

Using the similarity measure between vague sets, equivalence redundancy can be evaluated. Unlike in classical set theory, the condition SE(A, B) = 1 is only a special case for vague sets because of the vagueness of the data. In general, a threshold should be considered when evaluating the similarity measure between two vague sets.

Definition 4.3. Let A and B be two vague sets and α be a threshold. If SE(A, B) ≥ α, A and B are said to be equivalently α-redundant.

If vague sets A and B on U = {u1, u2, …, un} are equivalently α-redundant, the redundancy can be removed by merging A and B into a new vague set C. The following three merging operations are defined:

(a) Union:

$$C = A \cup_V B = \{[t_C(w), 1 - f_C(w)]/w \mid (\exists [t_A(u_i), 1 - f_A(u_i)]/u_i)(\exists [t_B(u_i), 1 - f_B(u_i)]/u_i)(t_C(w) = \max(t_A(u_i), t_B(u_i)) \wedge 1 - f_C(w) = \max(1 - f_A(u_i), 1 - f_B(u_i))) \wedge (w = u_i) \wedge u_i \in U \wedge 1 \le i \le n\}$$

(b) Intersection:

$$C = A \cap_V B = \{[t_C(w), 1 - f_C(w)]/w \mid (\exists [t_A(u_i), 1 - f_A(u_i)]/u_i)(\exists [t_B(u_i), 1 - f_B(u_i)]/u_i)(t_C(w) = \min(t_A(u_i), t_B(u_i)) \wedge 1 - f_C(w) = \min(1 - f_A(u_i), 1 - f_B(u_i))) \wedge (w = u_i) \wedge u_i \in U \wedge 1 \le i \le n\}$$

(c) Difference:

$$C = A -_V B = \{[t_C(w), 1 - f_C(w)]/w \mid (\exists [t_A(u_i), 1 - f_A(u_i)]/u_i)(\exists [t_B(u_i), 1 - f_B(u_i)]/u_i)(t_C(w) = \max(0, t_A(u_i) - (1 - f_B(u_i))) \wedge 1 - f_C(w) = \max(0, 1 - f_A(u_i) - t_B(u_i))) \wedge (w = u_i) \wedge u_i \in U \wedge 1 \le i \le n\}$$

Example 4.1. Let A = [0.2, 0.3]/u1 + [0.3, 0.7]/u2 + [0.1, 0.3]/u3 + [0.8, 0.8]/u5 and B = [0.3, 0.5]/u1 + [0.2, 0.6]/u2 + [0.4, 0.5]/u4 + [0.8, 0.8]/u5 be vague sets on U = {u1, u2, u3, u4, u5}, where an element not listed in a vague set has the vague value [0, 0]. Then

SE(A, B) = (0.85 + 0.9 + 0.8 + 0.55 + 1)/5 = 0.82.

If the threshold α = 0.8 is given, A and B are equivalently α-redundant. Applying the merging operations defined above gives the following results:

A ∪V B = [0.3, 0.5]/u1 + [0.3, 0.7]/u2 + [0.1, 0.3]/u3 + [0.4, 0.5]/u4 + [0.8, 0.8]/u5

A ∩V B = [0.2, 0.3]/u1 + [0.2, 0.6]/u2 + [0.8, 0.8]/u5

A −V B = [0, 0.5]/u2 + [0.1, 0.3]/u3

The processing of vague set redundancy can be extended to that of vague tuple redundancy.
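The three merging operations can be checked against Example 4.1 with the following sketch. It stores a vague set as a dict from element to (t, 1 − f), reads unlisted elements as [0, 0], and, like the example, omits elements whose merged value is [0, 0]; this representation and the function names are illustrative assumptions, not the paper's.

```python
from typing import Callable, Dict, Hashable, Tuple

Interval = Tuple[float, float]        # (t, 1 - f)
VagueSet = Dict[Hashable, Interval]   # element -> interval
EMPTY: Interval = (0.0, 0.0)          # unlisted element: t = 0, f = 1
EPS = 1e-12


def _merge(a: VagueSet, b: VagueSet,
           combine: Callable[[Interval, Interval], Interval]) -> VagueSet:
    """Apply combine element-wise over the elements of a and b, dropping [0, 0]."""
    result: VagueSet = {}
    for u in set(a) | set(b):
        t, upper = combine(a.get(u, EMPTY), b.get(u, EMPTY))
        if t > EPS or upper > EPS:
            result[u] = (t, upper)
    return result


def vague_union(a: VagueSet, b: VagueSet) -> VagueSet:
    # (a): t_C = max(t_A, t_B), 1 - f_C = max(1 - f_A, 1 - f_B)
    return _merge(a, b, lambda x, y: (max(x[0], y[0]), max(x[1], y[1])))


def vague_intersection(a: VagueSet, b: VagueSet) -> VagueSet:
    # (b): t_C = min(t_A, t_B), 1 - f_C = min(1 - f_A, 1 - f_B)
    return _merge(a, b, lambda x, y: (min(x[0], y[0]), min(x[1], y[1])))


def vague_difference(a: VagueSet, b: VagueSet) -> VagueSet:
    # (c): t_C = max(0, t_A - (1 - f_B)), 1 - f_C = max(0, (1 - f_A) - t_B)
    return _merge(a, b, lambda x, y: (max(0.0, x[0] - y[1]), max(0.0, x[1] - y[0])))


A = {"u1": (0.2, 0.3), "u2": (0.3, 0.7), "u3": (0.1, 0.3), "u5": (0.8, 0.8)}
B = {"u1": (0.3, 0.5), "u2": (0.2, 0.6), "u4": (0.4, 0.5), "u5": (0.8, 0.8)}
```

The union and intersection results match the example exactly, because max and min return one of their inputs. The difference involves subtraction, so its values should be compared with a small tolerance.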

Definition 4.4. Let r be a vague relation on the relational schema R(A1, A2, …, An). Let t = <π(A1), π(A2), …, π(An)> and t' = <π'(A1), π'(A2), …, π'(An)> be two vague tuples in r, and let α ∈ [0, 1] be a threshold. The vague tuples t and t' are equivalently α-redundant if and only if min_{1≤i≤n} SE(t[Ai], t'[Ai]) ≥ α, that is, SE(t, t') ≥ α.

5 Vague Relational Operations

5.1 Vague Relational Algebra

Union. Let r and s be two union-compatible vague relations on the schema R(A1, A2, …, An), and let α ∈ [0, 1] be a given threshold. Then the union of r and s is defined as follows. Because it depends on the threshold α, the vague union is essentially an α-union.

$$r \cup s = \{t \mid (t \in r \wedge (\forall y)(y \in s \Rightarrow SE(t, y) < \alpha)) \vee (t \in s \wedge (\forall x)(x \in r \Rightarrow SE(t, x) < \alpha)) \vee (\exists x)(\exists y)(x \in r \wedge y \in s \wedge SE(x, y) \ge \alpha \wedge t = x \cup_v y)\}$$

Table 1: Vague relation r
      Name   Sex      Weight
t1    Jack   male     [0.3,0.5]/130 + [0.8,0.8]/140 + [0.6,0.7]/150
t2    Mary   female   [0.4,0.6]/90 + [0.9,0.9]/100 + [0.6,0.8]/110
t3    Hans   male     [0.2,0.3]/130 + [0.4,0.5]/150 + [0.7,0.9]/170

Table 2: Vague relation s
      Name   Sex      Weight
u1    Jack   male     [0.8,0.8]/140 + [0.6,0.7]/150
u2    Rose   female   [0.2,0.5]/95 + [0.8,0.9]/110 + [0.5,0.7]/120
u3    Hans   male     [0.3,0.4]/130 + [0.4,0.6]/150 + [0.8,0.8]/170

Let r and s be the two vague relations shown in Tables 1 and 2, respectively, and let the threshold be α = 0.8 (the same threshold is used in the following examples). Tuple t2 in r and tuple u2 in s are clearly not redundant with any other tuple. Now consider the similarity measures between t1 and u1 and between t3 and u3.

SE(t1[Weight], u1[Weight]) = 0.87 ≥ α and

SE(t3[Weight], u3[Weight]) = 0.92 ≥ α. Therefore,

SE(t1, u1) = min(1, 1, 0.87) = 0.87 ≥ α and

SE(t3, u3) = min(1, 1, 0.92) = 0.92 ≥ α.
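The vague α-union of Tables 1 and 2 can be reproduced with a small sketch. It makes several assumptions that the paper does not spell out: each Weight comparison runs over the weights listed in either value (unlisted weights read as [0, 0]), crisp attributes (Name, Sex) have similarity 1 when equal and 0 otherwise, ∪v is applied attribute-wise to the Weight values of a redundant pair, and each tuple of r is merged with at most one redundant tuple of s. All function and variable names are hypothetical.

```python
from typing import Any, Dict, Hashable, List, Tuple

Interval = Tuple[float, float]        # (t, 1 - f)
VagueSet = Dict[Hashable, Interval]   # element -> interval
Row = Dict[str, Any]                  # attribute -> crisp value or VagueSet
EMPTY: Interval = (0.0, 0.0)          # unlisted element: t = 0, f = 1


def se_set(a: VagueSet, b: VagueSet) -> float:
    """Definitions 4.1 and 4.2 over the elements listed in a or b."""
    universe = set(a) | set(b)
    if not universe:
        return 1.0
    total = 0.0
    for u in universe:
        (ta, ua), (tb, ub) = a.get(u, EMPTY), b.get(u, EMPTY)
        total += 1.0 - (abs(ta - tb) + abs(ua - ub)) / 2.0
    return total / len(universe)


def se_tuple(x: Row, y: Row) -> float:
    """Minimum attribute-wise similarity; crisp values score 1 if equal, else 0."""
    return min(
        se_set(x[a], y[a]) if isinstance(x[a], dict) else float(x[a] == y[a])
        for a in x
    )


def merge_union(x: Row, y: Row) -> Row:
    """x ∪v y applied attribute-wise; crisp values of a redundant pair are equal."""
    merged: Row = {}
    for a in x:
        if isinstance(x[a], dict):
            merged[a] = {
                u: (max(x[a].get(u, EMPTY)[0], y[a].get(u, EMPTY)[0]),
                    max(x[a].get(u, EMPTY)[1], y[a].get(u, EMPTY)[1]))
                for u in set(x[a]) | set(y[a])
            }
        else:
            merged[a] = x[a]
    return merged


def alpha_union(r: List[Row], s: List[Row], alpha: float) -> List[Row]:
    """Keep tuples without a redundant partner; merge alpha-redundant pairs."""
    result: List[Row] = []
    used = set()  # indices of tuples in s that have already been merged
    for x in r:
        idx = next((k for k, y in enumerate(s)
                    if k not in used and se_tuple(x, y) >= alpha), None)
        if idx is None:
            result.append(x)
        else:
            used.add(idx)
            result.append(merge_union(x, s[idx]))
    result.extend(y for k, y in enumerate(s) if k not in used)
    return result


R = [  # Table 1
    {"Name": "Jack", "Sex": "male", "Weight": {130: (0.3, 0.5), 140: (0.8, 0.8), 150: (0.6, 0.7)}},
    {"Name": "Mary", "Sex": "female", "Weight": {90: (0.4, 0.6), 100: (0.9, 0.9), 110: (0.6, 0.8)}},
    {"Name": "Hans", "Sex": "male", "Weight": {130: (0.2, 0.3), 150: (0.4, 0.5), 170: (0.7, 0.9)}},
]
S = [  # Table 2
    {"Name": "Jack", "Sex": "male", "Weight": {140: (0.8, 0.8), 150: (0.6, 0.7)}},
    {"Name": "Rose", "Sex": "female", "Weight": {95: (0.2, 0.5), 110: (0.8, 0.9), 120: (0.5, 0.7)}},
    {"Name": "Hans", "Sex": "male", "Weight": {130: (0.3, 0.4), 150: (0.4, 0.6), 170: (0.8, 0.8)}},
]
```

Calling `alpha_union(R, S, 0.8)` returns four tuples. Jack and Hans are merged with ∪v and match the Weight values of Table 3, while Mary and Rose are passed through unchanged. Tuple order may differ from Table 3.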

One can conclude that tuples t1 in r and u1 in s are redundant, and that tuples t3 in r and u3 in s are redundant. According to the definition of the vague union, the resulting relation r ∪ s is shown in Table 3.

Table 3: Vague relation r ∪ s
Name   Sex      Weight
Jack   male     [0.3,0.5]/130 + [0.8,0.8]/140 + [0.6,0.7]/150
Mary   female   [0.4,0.6]/90 + [0.9,0.9]/100 + [0.6,0.8]/110
Rose   female   [0.2,0.5]/95 + [0.8,0.9]/110 + [0.5,0.7]/120
Hans   male     [0.3,0.4]/130 + [0.4,0.6]/150 + [0.8,0.9]/170

Difference. Let r and s be as above. Their difference is defined as follows:

$$r - s = \{t \mid (t \in r \wedge (\forall y)(y \in s \Rightarrow SE(t, y) < \alpha)) \vee (\exists x)(\exists y)(x \in r \wedge y \in s \wedge SE(x, y) \ge \alpha \wedge t = x -_v y)\}$$

Cartesian Product. The Cartesian product of vague relations is the same as in classical relational databases. Let r and s be two vague relations on schemas R and S, respectively. Then:

$$r \times s = \{t[R \cup S] \mid (\exists x)(\exists y)(x \in r \wedge y \in s \wedge t[R] = x[R] \wedge t[S] = y[S])\}$$

Selection. Let r(R) be a vague relation and P a predicate denoting the selection condition. In a vague relational database environment, the predicate P may be vague, denoted Pv, in order to implement vague queries over vague databases. In Pv, constants and attributes may be vague, so expressions may also be vague. Hence the comparison operator θ in Pv can be one of the vague comparison operations ≻α, ≺α, ⪰α, ⪯α, ≈α, and ≉α, where α is a threshold. Let A and B be two vague sets over U = {u1, u2, …, un}. Then


(a) A ≈α B if SE(A, B) ≥ α

(b) A ≉α B if SE(A, B) < α

(c) A ≻α B if A ≉α B and max(supp(A)) > max(supp(B))

(d) A ⪰α B if A ≈α B or A ≻α B

(e) A ≺α B if A ≉α B and max(supp(A)) < max(supp(B))

(f) A ⪯α B if A ≈α B or A ≺α B

Here supp(A)={u| u∈ U, tA(u) > 0} and supp(B)={u| u∈ U, tB(u) > 0}.
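Comparisons (a) to (f) and their use in a selection condition can be sketched as below. The sketch assumes vague sets over a numeric universe stored as dicts from element to (t, 1 − f), evaluates Definition 4.2 over the elements listed in either set (unlisted elements read as [0, 0]), and assumes nonempty supports so that max(supp(·)) is defined. The query constant {150: (1.0, 1.0)}, a crisp weight of 150 written as a vague set, and all function names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Set, Tuple

Interval = Tuple[float, float]      # (t, 1 - f)
VagueSet = Dict[float, Interval]    # numeric element -> interval
EMPTY: Interval = (0.0, 0.0)


def se_set(a: VagueSet, b: VagueSet) -> float:
    """Definitions 4.1 and 4.2 over the elements listed in a or b."""
    universe = set(a) | set(b)
    if not universe:
        return 1.0
    total = 0.0
    for u in universe:
        (ta, ua), (tb, ub) = a.get(u, EMPTY), b.get(u, EMPTY)
        total += 1.0 - (abs(ta - tb) + abs(ua - ub)) / 2.0
    return total / len(universe)


def supp(a: VagueSet) -> Set[float]:
    """supp(A) = {u | t_A(u) > 0}."""
    return {u for u, (t, _) in a.items() if t > 0}


def vague_approx(a: VagueSet, b: VagueSet, alpha: float) -> bool:      # (a)
    return se_set(a, b) >= alpha


def vague_not_approx(a: VagueSet, b: VagueSet, alpha: float) -> bool:  # (b)
    return se_set(a, b) < alpha


def vague_gt(a: VagueSet, b: VagueSet, alpha: float) -> bool:          # (c)
    return vague_not_approx(a, b, alpha) and max(supp(a)) > max(supp(b))


def vague_ge(a: VagueSet, b: VagueSet, alpha: float) -> bool:          # (d)
    return vague_approx(a, b, alpha) or vague_gt(a, b, alpha)


def vague_lt(a: VagueSet, b: VagueSet, alpha: float) -> bool:          # (e)
    return vague_not_approx(a, b, alpha) and max(supp(a)) < max(supp(b))


def vague_le(a: VagueSet, b: VagueSet, alpha: float) -> bool:          # (f)
    return vague_approx(a, b, alpha) or vague_lt(a, b, alpha)


def select(rows: List[Dict[str, Any]],
           predicate: Callable[[Dict[str, Any]], bool]) -> List[Dict[str, Any]]:
    """Keep the tuples t of r for which the vague predicate Pv(t) holds."""
    return [t for t in rows if predicate(t)]


R = [  # Name and Weight columns of Table 1
    {"Name": "Jack", "Weight": {130: (0.3, 0.5), 140: (0.8, 0.8), 150: (0.6, 0.7)}},
    {"Name": "Mary", "Weight": {90: (0.4, 0.6), 100: (0.9, 0.9), 110: (0.6, 0.8)}},
    {"Name": "Hans", "Weight": {130: (0.2, 0.3), 150: (0.4, 0.5), 170: (0.7, 0.9)}},
]
Q = {150: (1.0, 1.0)}  # hypothetical query constant: crisp weight 150

heavier = select(R, lambda t: vague_gt(t["Weight"], Q, 0.8))
```

With α = 0.8, none of the Weight values is ≈α-similar to Q. Only Hans has a support maximum above 150, so `heavier` contains just that tuple. Mary satisfies ≺α, and Jack satisfies neither ≻α nor ≺α.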

Then the selection on r for Pv is defined as follows:

$$\sigma_{P_v}(r) = \{t \mid t \in r \wedge P_v(t)\}$$

Projection. Let r(R) be a vague relation and S ⊂ R an attribute subset. The projection of r on S is defined as follows:

$$\Pi_S(r) = \{t \mid (\exists x)(x \in r \wedge t = \cup_v(x[S]))\}$$

where ∪v merges the projected subtuples x[S] that are equivalently α-redundant, so that the result contains no redundant tuples.

The five operations above are called primitive operations in relational databases. There are three additional operations that can be defined by the primitive operations: intersection, join, and division.

Intersection. Let r and s be two union-compatible vague relations. Then the vague intersection of these two relations can be defined in terms of the vague difference operation as r ∩ s = r − (r − s).

Join. Let r(R) and s(S) be any two vague relations, and let Pv be a conditional predicate of the form A θ B, where θ ∈ {≻α, ≺α, ⪰α, ⪯α, ≈α, ≉α}, A ∈ R, and B ∈ S. Then the vague join of these two relations can be defined in terms of the vague selection operation as

$$r \bowtie_{P_v} s = \sigma_{P_v}(r \times s).$$

When attributes A and B are identical and θ is ≈α, the vague join becomes the vague natural join, denoted r ⋈ s. As a special case of the vague join, the vague natural join can be evaluated with the definition above. In the following, the definition of the vague natural join is given directly. Let Q = R ∩ S. Then:

$$r \bowtie s = \{t[(R - Q) \cup S] \mid (\exists x)(\exists y)(x \in r \wedge y \in s \wedge SE(x[Q], y[Q]) \ge \alpha \wedge t[R - Q] = x[R - Q] \wedge t[S - Q] = y[S - Q] \wedge t[Q] = x[Q] \cap_v y[Q])\}$$

Division. Division, sometimes called the quotient operation, is used to find the subrelation r ÷ s of a relation r that contains the subtuples of r having, as complements in r, all the tuples of a relation s. In classical relational databases, the division operation is defined by

$$r \div s = \{t \mid (\forall u)(u \in s \Rightarrow (t, u) \in r)\},$$

where u is a tuple of s and t is a subtuple of r such that (t, u) is a tuple of r. Let r(R) and s(S) be two vague relations, where S ⊂ R, and let Q = R − S. Then the vague division of r and s can be defined as:

$$r \div s = \Pi_Q(r) - \Pi_Q\big((\Pi_Q(r) \times s) - r\big)$$

5.2 Properties of Vague Relational Algebra

In this subsection, several properties of the vague relational algebra are discussed. As in conventional relational databases, the proposed vague relational algebra is sound in the sense that it is closed: the results of all operations are valid relations. In detail, the result relations produced by the vague relational operations satisfy the following three criteria:

(a) the attribute values must come from an appropriate attribute domain;

(b) there are no duplicate tuples in a relation;

(c) the relation must be a finite set of tuples.

Projection, division, and selection extract part of the source relation, either column-wise or row-wise. Because the attribute values in the source relation belong to the appropriate attribute domains, the attribute values in these three result relations also come from the appropriate attribute domains. Union, difference, and intersection are performed on union-compatible relations, which satisfies the first criterion. In join and Cartesian product, the attribute values of the result relations come from the two source relations and therefore lie within the appropriate attribute domains. For selection, if the source relation contains no redundant tuples, neither does the result relation. The results of union, difference, intersection, join, Cartesian product, division, and projection contain no redundant tuples either; this is ensured by the definitions of these operations, which take the removal of redundancies into account. Therefore, the second criterion is satisfied.

Now consider the third criterion. Let r and s be two vague relations, and let |r| and |s| denote the numbers of tuples in r and s, respectively. For vague selection, 0 ≤ |σPv(r)| ≤ |r|: when no tuple in r satisfies the selection condition, the result relation is empty; when all tuples in r satisfy it, |σPv(r)| = |r|; and when only some of the tuples satisfy it, |σPv(r)| is greater than 0 and less than |r|. For projection ΠS(r) on a nonempty relation r, if no two tuples of r are redundant after projection, then |ΠS(r)| = |r|; if all projected tuples are equivalently α-redundant with one another, they are merged into a single tuple and |ΠS(r)| = 1; in the other situations, |ΠS(r)| lies strictly between 1 and |r|. Hence 1 ≤ |ΠS(r)| ≤ |r|. Additionally, |r ∪ s| is at most |r| + |s|; |r − s|, |r ∩ s|, and |r ÷ s| are at most |r|; and |r ⋈Pv s| and |r × s| are at most |r| × |s|. Since the number of tuples in each result relation is bounded by the sizes of the source relations, and the source relations are finite, the result relations are finite. In addition, vague set operations in the relational algebra have the same properties as the corresponding classical set operations. Let r, s, and u be three union-compatible vague relations. Then:

6 Conclusion

Although relational databases enjoy very widespread popularity in modern information systems, they lack the power to model imprecision and uncertainty in data items. In this paper, we present an extension of the relational model, together with an algebra, that uses vague sets, a generalized version of fuzzy sets, to express imprecision and uncertainty about object properties. We discuss the basic structure of vague relations and formally define the necessary operations. The associated relational algebra is shown to be closed. It is also a consistent extension of the conventional relational algebra, reducing to the fuzzy relational algebra and further to the conventional relational algebra.

References

[1] Zadeh, L. A. (1965) Fuzzy sets. Information and Control 8(3):338-353.

[2] Bosc, P. & Pivert, O. (2003) On the Impact of Regular Functional Dependencies When Moving to a Possibilistic Database Framework. Fuzzy Sets and Systems 140(1):207-227.

[3] Sözat, M.I. & Yazici, A. (2001) A Complete Axiomatization for Fuzzy Functional and Multivalued Dependencies in Fuzzy Database Relations. Fuzzy Sets and Systems 117(2):161-181.

[4] Bosc, P. & Pivert, O. (1995) SQLf: A Relational Database Language for Fuzzy Querying. IEEE Transactions on Fuzzy Systems 3(1):1-17.

[5] Ma, Z. M. & Mili, F. (2002) Handling Fuzzy Information in Extended Possibility-Based Fuzzy Relational Databases. International Journal of Intelligent Systems 17(10):925-942.

[6] Gau, W.L. & Buehrer, D.J. (1993) Vague Sets. IEEE Transactions on Systems, Man, and Cybernetics 23(2):610-614.

[7] Atanassov, K. (1986) Intuitionistic fuzzy sets. Fuzzy Sets and Systems 20(1):87-96.

[8] Lu, A. & Ng, W. (2005) Vague Sets or Intuitionistic Fuzzy Sets for Handling Vague Data: Which One Is Better? Lecture Notes in Computer Science 3716:401-416.


[9] Buckles, B.P. & Petry, F.E. (1982) A Fuzzy Representation of Data for Relational Databases. Fuzzy Sets and Systems 7:213-226.

[10] Chen, S.M. (1995) Measures of Similarity between Vague Sets. Fuzzy Sets and Systems 74(2):217-223.

[11] Chen, S.M. (1997) Similarity Measure between Vague Sets and between Elements. IEEE Transactions on Systems, Man, and Cybernetics 27(1):153-158.

[12] Hong, D.H. & Kim, C. (1999) A Note on Similarity Measures between Vague Sets and between Elements. Information Sciences 115:83-96.
