A Decision-Theoretic Rough Set Model
Yiyu Yao and Jingtao Yao
Department of Computer Science, University of Regina
Regina, Saskatchewan, Canada S4S 0A2
{yyao,jingtaoy}[email protected]

Special thanks to Professor Günther Ruhe and Dr. Jingzhou Li for providing this opportunity.

Introduction to Rough Sets

• Rough sets is a mathematical theory for dealing with vagueness and uncertainty.
• The theory is different from, and complementary to, fuzzy sets. It is a generalization of standard set theory.
• The theory is a concrete model of granular computing (GrC).
• The theory has been successfully applied in many fields, for example machine learning, data mining, data analysis, medicine, cognitive science, and expert systems.

• Basic assumption: objects are defined, represented, or characterized by a finite number of attributes or properties.
• Implications: we cannot distinguish some objects; we can only observe, measure, or define certain sets of objects as wholes rather than as many individuals; only some subsets in the power set can be measured or defined.
• Type of uncertainty: the uncertainty comes from our inability to distinguish certain objects.

• Basic problem: how can undefinable subsets be represented in terms of definable subsets?
• Solution: an undefinable subset is approximately represented by two definable subsets, called the lower and upper approximations.

• A motivating example: an information table.

  Object  Height  Hair   Eyes   Classification
  o1      short   blond  blue   +
  o2      short   blond  brown  -
  o3      tall    red    blue   +
  o4      tall    dark   blue   -
  o5      tall    dark   blue   -
  o6      tall    blond  blue   +
  o7      tall    dark   brown  -
  o8      short   blond  brown  -

• Objects are described by three attributes: Height, Hair, and Eyes.
• If only the attribute Height is used, we obtain the partition

  {{o1, o2, o8}, {o3, o4, o5, o6, o7}}.

  Based on Height, we cannot distinguish objects o1, o2, and o8.
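The partitions used throughout the example can be computed mechanically from the information table. The following is a minimal Python sketch (the table encoding and the function name `partition` are ours, not part of the slides):

```python
from collections import defaultdict

# The information table: object -> (Height, Hair, Eyes).
table = {
    "o1": ("short", "blond", "blue"),
    "o2": ("short", "blond", "brown"),
    "o3": ("tall",  "red",   "blue"),
    "o4": ("tall",  "dark",  "blue"),
    "o5": ("tall",  "dark",  "blue"),
    "o6": ("tall",  "blond", "blue"),
    "o7": ("tall",  "dark",  "brown"),
    "o8": ("short", "blond", "brown"),
}

def partition(table, attrs):
    """Group objects that agree on all of the chosen attribute indices."""
    blocks = defaultdict(set)
    for obj, values in table.items():
        blocks[tuple(values[i] for i in attrs)].add(obj)
    return list(blocks.values())

# Height is attribute 0: two blocks, exactly as above.
for block in partition(table, [0]):
    print(sorted(block))
# ['o1', 'o2', 'o8']
# ['o3', 'o4', 'o5', 'o6', 'o7']
```

Using indices [0, 1] reproduces the four-block Height-and-Hair partition discussed next.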
• They represent the set of short people.
• If the two attributes Height and Hair are used, we have the partition

  {{o1, o2, o8}, {o3}, {o4, o5, o7}, {o6}}.

• For Height and Hair, the set of all definable subsets consists of: the empty set ∅; the entire set U = {o1, ..., o8}; and the unions of blocks of the partition {{o1, o2, o8}, {o3}, {o4, o5, o7}, {o6}}. For example, {o1, o2, o3, o8} is a definable set.
• The set of positive objects, X+ = {o1, o3, o6}, is an undefinable subset.
• X+ can be approximated from below and above by two definable subsets:

  {o3, o6} ⊆ {o1, o3, o6} ⊆ {o1, o2, o3, o6, o8}.

Significance of Rough Set Theory

• It provides a formal theory for dealing with a particular type of uncertainty, namely the uncertainty induced by indistinguishability.
• It precisely defines the notion of what is definable and what is undefinable.
• It presents a philosophy of representing what is undefinable (unknown) in terms of what is definable (known).

Rough Set Theory: Formal Development

• Let U be a finite set called the universe.
• Let E be an equivalence relation on U; that is, E is reflexive, symmetric, and transitive.
• Let U/E be the partition induced by the equivalence relation.
• Let [x]_E denote the equivalence class containing x.
• The pair apr = (U, E) is called an approximation space.
• Let Def(U) be the family of all definable subsets.

Rough Set Theory: Granule-Based Definition

For any subset X ⊆ U, a pair of lower and upper approximations (written apr_ and apr¯ here; usually typeset with an underline and an overline) is defined by:

  apr_(X) = ∪ {[x]_E | [x]_E ⊆ X},
  apr¯(X) = ∪ {[x]_E | [x]_E ∩ X ≠ ∅}.

• The lower approximation apr_(X) is the union of the equivalence classes that are subsets of X.
• The upper approximation apr¯(X) is the union of the equivalence classes that have non-empty overlap with X.
• The granule-based definition provides a model for granular computing.
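The granule-based definition translates directly into code. A short sketch, assuming the Height-and-Hair partition and the positive class from the motivating example (the function names are ours):

```python
def lower_approx(blocks, X):
    """apr_(X): union of the equivalence classes contained in X."""
    result = set()
    for b in blocks:
        if b <= X:          # block is a subset of X
            result |= b
    return result

def upper_approx(blocks, X):
    """apr¯(X): union of the equivalence classes overlapping X."""
    result = set()
    for b in blocks:
        if b & X:           # block has non-empty intersection with X
            result |= b
    return result

# Partition by Height and Hair, and the positive class X+.
blocks = [{"o1", "o2", "o8"}, {"o3"}, {"o4", "o5", "o7"}, {"o6"}]
X = {"o1", "o3", "o6"}

print(sorted(lower_approx(blocks, X)))  # ['o3', 'o6']
print(sorted(upper_approx(blocks, X)))  # ['o1', 'o2', 'o3', 'o6', 'o8']
```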
Rough Set Theory: Element-Based Definition

For any subset X ⊆ U, a pair of lower and upper approximations is defined by:

  apr_(X) = {x | ∀y ∈ U [xEy ⟹ y ∈ X]},
  apr¯(X) = {x | ∃y ∈ U [xEy ∧ y ∈ X]}.

• An element x belongs to the lower approximation apr_(X) if all of its equivalent elements belong to X.
• An element x belongs to the upper approximation apr¯(X) if at least one of its equivalent elements belongs to X.
• The element-based definition relates rough set theory to modal logic.

Rough Set Theory: Sub-System-Based Definition

For any subset X ⊆ U, a pair of lower and upper approximations is defined by:

  apr_(X) = ∪ {Y | Y ∈ Def(U) ∧ Y ⊆ X},
  apr¯(X) = ∩ {Y | Y ∈ Def(U) ∧ X ⊆ Y}.

• The lower approximation apr_(X) is the largest definable set contained in X.
• The upper approximation apr¯(X) is the smallest definable set containing X.
• This definition is related to topological spaces, closure systems, and other mathematical systems, as well as to belief functions.

Rough Set Theory: Algebraic Systems

• The pair of approximations can be viewed as a pair of dual unary set-theoretic operators, called approximation operators.
• The system (2^U, ¬, apr_, apr¯, ∩, ∪) is an extension of the standard set algebra (2^U, ¬, ∩, ∪) with two added unary operators. It is called a rough set algebra.
• A rough set algebra is an example of a Boolean algebra with added operators.

Rough Set Theory: Rough Classification

• Based on the lower and upper approximations, the universe U can be divided into three pairwise disjoint regions: the positive region POS(X), the negative region NEG(X), and the boundary region BND(X):

  POS(X) = apr_(X),
  NEG(X) = U − apr¯(X),
  BND(X) = apr¯(X) − apr_(X).

• The boundary region consists of objects that cannot be classified without uncertainty, due to our inability to differentiate some distinct objects.
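The element-based definition and the three-region classification can be sketched together. The helper names below are illustrative; the data reuses the Height-and-Hair partition from the motivating example:

```python
def block_of(x, blocks):
    """[x]_E: the unique partition block containing x."""
    return next(b for b in blocks if x in b)

def three_regions(universe, blocks, X):
    """POS, NEG, BND via the element-based definition."""
    lower = {x for x in universe if block_of(x, blocks) <= X}  # all equivalents in X
    upper = {x for x in universe if block_of(x, blocks) & X}   # some equivalent in X
    return lower, universe - upper, upper - lower

universe = {f"o{i}" for i in range(1, 9)}
blocks = [{"o1", "o2", "o8"}, {"o3"}, {"o4", "o5", "o7"}, {"o6"}]
X = {"o1", "o3", "o6"}

pos, neg, bnd = three_regions(universe, blocks, X)
print(sorted(pos))  # ['o3', 'o6']
print(sorted(neg))  # ['o4', 'o5', 'o7']
print(sorted(bnd))  # ['o1', 'o2', 'o8']
```

The three sets are pairwise disjoint and cover U, matching the definitions above.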
Rough Classification: Another Formulation

• For X ⊆ U, the rough membership function is defined by:

  μ_X(x) = |[x]_E ∩ X| / |[x]_E| = P(X | [x]_E).

• The three regions are then defined by:

  POS(X) = {x ∈ U | μ_X(x) = 1},
  NEG(X) = {x ∈ U | μ_X(x) = 0},
  BND(X) = {x ∈ U | 0 < μ_X(x) < 1}.

• Obviously, they use only the extreme values of μ_X, namely 0 and 1.

Rough Classification: Observations

• All elements with non-zero and non-full membership values are classified into the boundary region.
• The quantitative information given by the conditional probability P(X | [x]_E) is not taken into consideration.
• In practice, a less rigid classification may be more useful: an object may be classified into the positive region if the conditional probability is sufficiently large, and into the negative region if the conditional probability is sufficiently small.

Decision-Theoretic Rough Sets

• Basic question: how do we determine the threshold values that decide the three regions?
• Answer: use the Bayesian decision procedure.

Bayesian Decision Procedure

• Let Ω = {w1, ..., ws} be a finite set of s states.
• Let A = {a1, ..., am} be a finite set of m possible actions.
• Let P(wj | x) be the conditional probability of an object being in state wj, given that the object is described by x.
• Let λ(ai | wj) denote the loss, or cost, of taking action ai when the state is wj.

• For an object with description x, suppose action ai is taken. Since P(wj | x) is the probability that the true state is wj given x, the expected loss associated with taking action ai is:

  R(ai | x) = Σ_{j=1..s} λ(ai | wj) P(wj | x).

  The quantity R(ai | x) is also called the conditional risk.
• Given description x, a decision rule is a function τ(x) that specifies which action to take.
• The overall risk is defined by:

  R = Σ_x R(τ(x) | x) P(x),

  where the summation is over the set of all possible descriptions of objects.
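The conditional risk is simply a loss-weighted average over states. A generic sketch follows; the two-state loss matrix and probabilities are illustrative numbers of our own, not values from the slides:

```python
def conditional_risk(loss_row, probs):
    """R(a_i | x) = sum_j lambda(a_i | w_j) * P(w_j | x)."""
    return sum(l * p for l, p in zip(loss_row, probs))

def bayes_action(loss, probs):
    """Index of the minimum-risk action (ties broken by lowest index)."""
    risks = [conditional_risk(row, probs) for row in loss]
    return risks.index(min(risks)), risks

# Rows are actions, columns are states w1, w2 (illustrative values).
loss = [[1.0, 9.0],
        [5.0, 2.0]]
probs = [0.7, 0.3]  # P(w1 | x), P(w2 | x)

i, risks = bayes_action(loss, probs)
print(i, [round(r, 2) for r in risks])  # 0 [3.4, 4.1]
```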
• Bayesian decision procedure: for every x, compute the conditional risk R(ai | x) for i = 1, ..., m, and select the action for which the conditional risk is minimum. If more than one action minimizes R(ai | x), any tie-breaking rule can be used.

A Simple Example

• Set of states: s0, the meeting will be over in less than 2 hours; s1, the meeting will last more than 2 hours.
• P(s0) = 0.8, P(s1) = 0.2.
• Set of actions: a0, put the car on the meter (pay $2.00); a1, put the car in the parking lot (pay $7.00).
• Loss function, where u(·) converts a dollar amount into a loss value:

  λ(a0 | s0) = u($2.00) = 2,
  λ(a1 | s0) = u($7.00) = 10,
  λ(a0 | s1) = u($2.00 + $15.00) = 60,
  λ(a1 | s1) = u($7.00) = 10.

• The expected costs of actions a0 and a1:

  R(a0) = λ(a0 | s0) P(s0) + λ(a0 | s1) P(s1) = 2 × 0.8 + 60 × 0.2 = 13.6,
  R(a1) = λ(a1 | s0) P(s0) + λ(a1 | s1) P(s1) = 10 × 0.8 + 10 × 0.2 = 10.0.

• Choose action a1.

Decision-Theoretic Rough Sets

• The set of states: Ω = {X, ¬X}.
• The set of actions: A = {a1, a2, a3}, for deciding POS(X), deciding NEG(X), and deciding BND(X), respectively.
• Description of x: [x]_E.
• Conditional probabilities: P(X | [x]_E) and P(¬X | [x]_E) = 1 − P(X | [x]_E).
• Loss function: λ_i1 = λ(ai | X) and λ_i2 = λ(ai | ¬X), for i = 1, 2, 3.
• The expected loss R(ai | [x]_E) associated with taking each individual action can be expressed as:

  R(a1 | [x]_E) = λ_11 P(X | [x]_E) + λ_12 P(¬X | [x]_E),
  R(a2 | [x]_E) = λ_21 P(X | [x]_E) + λ_22 P(¬X | [x]_E),
  R(a3 | [x]_E) = λ_31 P(X | [x]_E) + λ_32 P(¬X | [x]_E).

• The Bayesian decision procedure leads to the following minimum-risk decision rules:

  (P) If R(a1 | [x]_E) ≤ R(a2 | [x]_E) and R(a1 | [x]_E) ≤ R(a3 | [x]_E), decide POS(X);
  (N) If R(a2 | [x]_E) ≤ R(a1 | [x]_E) and R(a2 | [x]_E) ≤ R(a3 | [x]_E), decide NEG(X);
  (B) If R(a3 | [x]_E) ≤ R(a1 | [x]_E) and R(a3 | [x]_E) ≤ R(a2 | [x]_E), decide BND(X).

• Since P(X | [x]_E) + P(¬X | [x]_E) = 1, the decision rules can be simplified so that they use only the probability P(X | [x]_E).
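Rules (P), (N), and (B) amount to a small three-way classifier driven by P(X | [x]_E). In this sketch the loss values are illustrative placeholders of our own, chosen so that misclassification costs more than deferring to the boundary; they are not values from the slides:

```python
# lam[i] = (loss when the state is X, loss when the state is not X);
# rows correspond to a1 (decide POS), a2 (decide NEG), a3 (decide BND).
lam = [
    (0.0, 4.0),  # a1: accepting a non-member is costly
    (6.0, 0.0),  # a2: rejecting a member is costlier still
    (1.0, 1.0),  # a3: deferral carries a small fixed cost
]

def three_way_decision(p, lam):
    """Apply rules (P), (N), (B) for p = P(X | [x]_E); ties go to the first rule."""
    risks = [lx * p + ln * (1.0 - p) for lx, ln in lam]
    return ("POS", "NEG", "BND")[risks.index(min(risks))]

print(three_way_decision(0.9, lam))  # POS
print(three_way_decision(0.5, lam))  # BND
print(three_way_decision(0.1, lam))  # NEG
```

Solving R(a1 | [x]_E) ≤ R(a3 | [x]_E) and R(a2 | [x]_E) ≤ R(a3 | [x]_E) for p yields the probability thresholds on P(X | [x]_E) that delimit the three regions, which is exactly the simplification mentioned in the last bullet.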