A Decision-Theoretic Rough Set Model

Yiyu Yao and Jingtao Yao, Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2, {yyao,jtyao}@cs.uregina.ca

Special Thanks to Professor Günther Ruhe and Dr. Jingzhou Li for providing this opportunity.

1 Introduction to Rough Sets

• Rough set theory is a new mathematical theory for dealing with vagueness and uncertainty.
• The theory is different from, and complementary to, fuzzy sets. It is a generalization of standard set theory.
• The theory is a concrete model of granular computing (GrC).
• The theory has been successfully applied in many fields, for example, data analysis, medicine, cognitive science, expert systems, and many more.

2 Introduction to Rough Sets

• Basic assumption: Objects are defined, represented, or characterized by a finite number of attributes or properties.
• Implications: We cannot distinguish some objects. We can only observe, measure, or define a certain set of objects as a whole rather than as individuals. Only some subsets in the power set can be measured or defined.
• Type of uncertainty: The uncertainty comes from our inability to distinguish certain objects.

3 Introduction to Rough Sets

• Basic problem: How can undefinable subsets be represented in terms of definable subsets?
• Solution: An undefinable subset is approximately represented by two definable subsets, called the lower and upper approximations.

4 • A motivating example: an information table

Object  Height  Hair   Eyes   Classification
o1      short   blond  blue   +
o2      short   blond  brown  -
o3      tall    red    blue   +
o4      tall    dark   blue   -
o5      tall    dark   blue   -
o6      tall    blond  blue   +
o7      tall    dark   brown  -
o8      short   blond  brown  -

5 Introduction to Rough Sets

• Objects are described by three attributes: Height, Hair, and Eyes.
• If only the attribute Height is used, we obtain the partition:

{{o1, o2, o8}, {o3, o4, o5, o6, o7}}.

Based on Height, we cannot distinguish objects o1, o2, and o8; they represent the set of short people.
• If the two attributes Height and Hair are used, we have the partition:

{{o1, o2, o8}, {o3}, {o4, o5, o7}, {o6}}.
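These partitions can be reproduced with a short sketch; the table data is from the example above, while `TABLE` and `partition` are illustrative names:

```python
# A sketch reproducing the attribute-induced partitions of the example.
TABLE = {
    "o1": {"Height": "short", "Hair": "blond", "Eyes": "blue"},
    "o2": {"Height": "short", "Hair": "blond", "Eyes": "brown"},
    "o3": {"Height": "tall",  "Hair": "red",   "Eyes": "blue"},
    "o4": {"Height": "tall",  "Hair": "dark",  "Eyes": "blue"},
    "o5": {"Height": "tall",  "Hair": "dark",  "Eyes": "blue"},
    "o6": {"Height": "tall",  "Hair": "blond", "Eyes": "blue"},
    "o7": {"Height": "tall",  "Hair": "dark",  "Eyes": "brown"},
    "o8": {"Height": "short", "Hair": "blond", "Eyes": "brown"},
}

def partition(attrs):
    """Group objects that agree on every attribute in attrs."""
    blocks = {}
    for obj, row in TABLE.items():
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, set()).add(obj)
    return sorted(blocks.values(), key=sorted)

print(partition(["Height"]))          # the two-block partition above
print(partition(["Height", "Hair"]))  # the four-block partition above
```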

6 Introduction to Rough Sets

• For Height and Hair, the set of all definable subsets consists of: the empty set ∅; the entire set U = {o1, . . . , o8}; and the unions of blocks in the partition {{o1, o2, o8}, {o3}, {o4, o5, o7}, {o6}}. For example, {o1, o2, o3, o8} is a definable set.

• The set + = {o1, o3, o6} is an undefinable subset.
• + can be approximated by two definable subsets from below and above:

{o3, o6} ⊆ {o1, o3, o6} ⊆ {o1, o2, o3, o6, o8}.

7 Significance of Rough Set Theory

• It provides a formal theory for dealing with a particular type of uncertainty induced by indistinguishability.
• It precisely defines the notions of what is definable and what is undefinable.
• It presents a philosophy of representing what is undefinable (unknown) based on what is definable (known).

8 Rough Set Theory: Formal Development

• Let U be a finite set called the universe.
• Let E be an equivalence relation on U; that is, E is reflexive, symmetric, and transitive.
• Let U/E be the partition induced by the equivalence relation E.

• Let [x]E denote the equivalence class containing x.
• The pair apr = (U, E) is called an approximation space.
• Let Def(U) be the family of all definable subsets.

9 Rough Set Theory: Granule-based definition

For any subset X ⊆ U, a pair of lower and upper approximations is defined by:

\underline{apr}(X) = ∪{[x]E | [x]E ⊆ X},
\overline{apr}(X) = ∪{[x]E | [x]E ∩ X ≠ ∅}.

• The lower approximation apr(X) is the union of equivalence classes that are subsets of X.
• The upper approximation apr(X) is the union of equivalence classes that have a non-empty overlap with X.
• The granule-based definition provides a model for granular computing.
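A minimal sketch of these two definitions, applied to the Height and Hair partition of the motivating example (the helper names are my own):

```python
# Granule-based approximations: unions of equivalence classes.
# U_over_E is U/E for the attributes Height and Hair of the example.

U_over_E = [{"o1", "o2", "o8"}, {"o3"}, {"o4", "o5", "o7"}, {"o6"}]

def lower(X, blocks):
    """Union of the equivalence classes entirely contained in X."""
    return set().union(*(b for b in blocks if b <= X))

def upper(X, blocks):
    """Union of the equivalence classes that intersect X."""
    return set().union(*(b for b in blocks if b & X))

X = {"o1", "o3", "o6"}
print(lower(X, U_over_E), upper(X, U_over_E))
```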

10 Rough Set Theory: Element-based definition

For any subset X ⊆ U, a pair of lower and upper approximations is defined by:

\underline{apr}(X) = {x | ∀y ∈ U [xEy ⇒ y ∈ X]},
\overline{apr}(X) = {x | ∃y ∈ U [xEy ∧ y ∈ X]}.

• An element x belongs to the lower approximation apr(X) if all of its equivalent elements belong to X.
• An element x belongs to the upper approximation apr(X) if at least one of its equivalent elements belongs to X.
• The element-based definition relates rough set theory to modal logic.

11 Rough Set Theory: Sub-system based definition

For any subset X ⊆ U, a pair of lower and upper approximations is defined by:

\underline{apr}(X) = ∪{Y | Y ∈ Def(U) ∧ Y ⊆ X},
\overline{apr}(X) = ∩{Y | Y ∈ Def(U) ∧ X ⊆ Y}.

• The lower approximation apr(X) is the largest definable set contained in X.
• The upper approximation apr(X) is the smallest definable set containing X.
• This definition is related to topological spaces, closure systems, and other mathematical systems, as well as to belief functions.

12 Rough Set Theory: Algebraic systems

• The pair of approximations can be viewed as a pair of dual unary set-theoretic operators, called approximation operators.
• The system (2^U, ¬, \underline{apr}, \overline{apr}, ∩, ∪) is an extension of the standard set algebra (2^U, ¬, ∩, ∪) with two added unary operators. It is called a rough set algebra.
• A rough set algebra is an example of a Boolean algebra with added operators.

13 Rough Set Theory: Rough Classification

• Based on the lower and upper approximations, the universe U can be divided into three disjoint regions, the positive region POS(X), the negative region NEG(X), and the boundary region BND(X):

POS(X) = \underline{apr}(X),
NEG(X) = U − \overline{apr}(X),
BND(X) = \overline{apr}(X) − \underline{apr}(X).

• The boundary region consists of objects that cannot be classified without uncertainty, due to our inability to differentiate some distinct objects.

14 Rough Classification: Another formulation

• For X ⊆ U, its rough membership function is defined by:

µX(x) = |[x]E ∩ X| / |[x]E| = P(X | [x]E).

• The three regions are then defined by:

POS(X) = {x ∈ U | µX(x) = 1},
NEG(X) = {x ∈ U | µX(x) = 0},
BND(X) = {x ∈ U | 0 < µX(x) < 1}.

• Obviously, these definitions use only the extreme values of µX, i.e., 0 and 1.
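The rough membership function and the three regions it induces can be sketched as follows; the helper names are illustrative, and the blocks are the Height and Hair partition of the earlier example:

```python
# Rough membership mu_X(x) = |[x]_E ∩ X| / |[x]_E| and the three
# regions defined by its extreme values.

def mu(block, X):
    """Membership value shared by every x with [x]_E = block."""
    return len(block & X) / len(block)

def regions(blocks, X):
    """Split the universe into POS, NEG, and BND regions."""
    pos, neg, bnd = set(), set(), set()
    for b in blocks:
        m = mu(b, X)
        if m == 1:
            pos |= b
        elif m == 0:
            neg |= b
        else:
            bnd |= b
    return pos, neg, bnd

blocks = [{"o1", "o2", "o8"}, {"o3"}, {"o4", "o5", "o7"}, {"o6"}]
pos, neg, bnd = regions(blocks, {"o1", "o3", "o6"})
```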

15 Rough Classification: Observations

• All elements with non-zero and non-full membership values are classified into the boundary region.
• The quantitative information given by the conditional probability P(X | [x]E) is not taken into consideration.
• In practice, a less rigid classification may be more useful. An object may be classified into the positive region if the conditional probability is sufficiently large. Likewise, an object may be classified into the negative region if the conditional probability is sufficiently small.

16 Decision-Theoretic Rough Sets

• Basic question: How do we determine the threshold values for deciding the three regions? • Answer: Use the Bayesian decision procedure.

17 Bayesian decision procedure

• Let Ω = {w1, . . . , ws} be a finite set of s states.

• Let A = {a1, . . . , am} be a finite set of m possible actions.

• Let P (wj|x) be the conditional probability of an object x being in state wj given that the object is described by x.

• Let λ(ai|wj) denote the loss, or cost, for taking action ai when the state is wj.

18 Bayesian decision procedure

• For an object with description x, suppose action ai is taken.

• Since P (wj|x) is the probability that the true state is wj given x, the expected loss associated with taking action ai is given by:

R(ai|x) = Σ_{j=1}^{s} λ(ai|wj) P(wj|x).

The quantity R(ai|x) is also called the conditional risk.
• Given a description x, a decision rule is a function τ(x) that specifies which action to take.

19 • The overall risk is defined by:

R = Σ_x R(τ(x)|x) P(x),

where the summation is over the set of all possible descriptions of objects.
• Bayesian decision procedure: For every x, compute the conditional risk R(ai|x) for i = 1, . . . , m, and select the action for which the conditional risk is minimum. If more than one action minimizes R(ai|x), any tie-breaking rule can be used.
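The procedure can be sketched generically; the loss matrix and probabilities below are illustrative, not taken from the slides:

```python
# A generic sketch of the Bayesian decision procedure: compute each
# conditional risk R(a_i|x) and choose a minimizing action.

def conditional_risk(losses, probs):
    """R(a|x) = sum over j of lambda(a|w_j) * P(w_j|x)."""
    return sum(lam * p for lam, p in zip(losses, probs))

def bayes_action(loss_matrix, probs):
    """Index of an action with minimum conditional risk (ties -> first)."""
    risks = [conditional_risk(row, probs) for row in loss_matrix]
    return risks.index(min(risks))

# Two states, two actions, 0-1 loss: pick the more probable state.
loss_matrix = [[0.0, 1.0],
               [1.0, 0.0]]
best = bayes_action(loss_matrix, [0.7, 0.3])   # action 0 has risk 0.3
```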

20 A Simple Example

• Set of states: s0 – the meeting will last less than 2 hours; s1 – the meeting will last more than 2 hours.

• P(s0) = 0.8, P(s1) = 0.2.
• Set of actions: a0 – park the car at a meter (pay $2.00); a1 – park the car in the parking lot (pay $7.00).
• Loss function:

λ(a0 | s0) = u($2.00) = 2, λ(a1 | s0) = u($7.00) = 10,

λ(a0 | s1) = u($2.00 + $15.00) = 60, λ(a1 | s1) = u($7.00) = 10.

21 • The expected cost of actions a0 and a1:

R(a0) = λ(a0 | s0)P (s0) + λ(a0 | s1)P (s1) = 2 ∗ 0.8 + 60 ∗ 0.2 = 13.6,

R(a1) = λ(a1 | s0)P (s0) + λ(a1 | s1)P (s1) = 10 ∗ 0.8 + 10 ∗ 0.2 = 10.0.

• Choose action a1.
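A quick check of the arithmetic above, with the probabilities and losses taken from the example:

```python
# Checking the parking example: expected cost of each action.

P = {"s0": 0.8, "s1": 0.2}
loss = {("a0", "s0"): 2, ("a0", "s1"): 60,
        ("a1", "s0"): 10, ("a1", "s1"): 10}

R = {a: sum(loss[a, s] * P[s] for s in P) for a in ("a0", "a1")}
best = min(R, key=R.get)   # a1, since 10.0 < 13.6
```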

22 Decision-Theoretic Rough Sets

• The set of states: Ω = {X, ¬X}.

• The set of actions: A = {a1, a2, a3}, for deciding POS(X), deciding NEG(X), and deciding BND(X), respectively.

• Description of x: [x]E.
• Conditional probabilities: P(X | [x]E) and P(¬X | [x]E) = 1 − P(X | [x]E).

• Loss function: λi1 = λ(ai|X) and λi2 = λ(ai|¬X), for i = 1, 2, 3.

23 • The expected loss R(ai|[x]E) associated with taking each individual action can be expressed as:

R(a1|[x]E) = λ11P(X|[x]E) + λ12P(¬X|[x]E),
R(a2|[x]E) = λ21P(X|[x]E) + λ22P(¬X|[x]E),
R(a3|[x]E) = λ31P(X|[x]E) + λ32P(¬X|[x]E).

24 • The Bayesian decision procedure leads to the following minimum-risk decision rules:

(P) If R(a1|[x]E) ≤ R(a2|[x]E) and R(a1|[x]E) ≤ R(a3|[x]E), decide POS(X);

(N) If R(a2|[x]E) ≤ R(a1|[x]E) and R(a2|[x]E) ≤ R(a3|[x]E), decide NEG(X);

(B) If R(a3|[x]E) ≤ R(a1|[x]E) and R(a3|[x]E) ≤ R(a2|[x]E), decide BND(X).

• Since P(X|[x]E) + P(¬X|[x]E) = 1, the decision rules can be simplified using only P(X|[x]E).

25 • Consider a special kind of loss function with λ11 ≤ λ31 < λ21 and λ22 ≤ λ32 < λ12.
• We have:

(P) If P (X|[x]E) ≥ γ and P (X|[x]E) ≥ α, decide POS(X);

(N) If P (X|[x]E) ≤ β and P (X|[x]E) ≤ γ, decide NEG(X);

(B) If β ≤ P(X|[x]E) ≤ α, decide BND(X);

where

α = (λ12 − λ32) / ((λ31 − λ32) − (λ11 − λ12)),
γ = (λ12 − λ22) / ((λ21 − λ22) − (λ11 − λ12)),
β = (λ32 − λ22) / ((λ21 − λ22) − (λ31 − λ32)).    (1)
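The threshold computation in Eq. (1) can be sketched as follows; the losses used for illustration are those of the later 1/4-classification example (λ12 = λ21 = 4, λ31 = λ32 = 1, λ11 = λ22 = 0):

```python
# A sketch of the threshold formulas in Eq. (1).

def thresholds(l11, l12, l21, l22, l31, l32):
    """Compute (alpha, gamma, beta) from the six losses."""
    alpha = (l12 - l32) / ((l31 - l32) - (l11 - l12))
    gamma = (l12 - l22) / ((l21 - l22) - (l11 - l12))
    beta = (l32 - l22) / ((l21 - l22) - (l31 - l32))
    return alpha, gamma, beta

alpha, gamma, beta = thresholds(l11=0, l12=4, l21=4, l22=0, l31=1, l32=1)
# alpha = 0.75 >= gamma = 0.5 >= beta = 0.25
```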

26 • Suppose further:

(λ12 − λ32)(λ21 − λ31) ≥ (λ31 − λ11)(λ32 − λ22);

then we have α ≥ γ ≥ β.

• This leads to the following decision rules:

(P1) If P(X|[x]E) ≥ α, decide POS(X);
(N1) If P(X|[x]E) ≤ β, decide NEG(X);
(B1) If β < P(X|[x]E) < α, decide BND(X).

27 • When α = β, we have α = γ = β. In this case, we use the decision rules:

(P2) If P(X|[x]E) > α, decide POS(X);
(N2) If P(X|[x]E) < α, decide NEG(X);
(B2) If P(X|[x]E) = α, decide BND(X).

28 • Example 1:

• λ12 = λ21 = 1, λ11 = λ22 = λ31 = λ32 = 0.
• α = 1 > β = 0, α = 1 − β, and γ = 0.5.
• This recovers the classical rough set regions:

POS(X) = {x ∈ U | µX(x) = 1},
NEG(X) = {x ∈ U | µX(x) = 0},
BND(X) = {x ∈ U | 0 < µX(x) < 1}.

29 • Example 2:

• λ12 = λ21 = 1, λ31 = λ32 = 0.5, λ11 = λ22 = 0.
• α = β = γ = 0.5.
• The 0.5-classification:

POS(X) = {x ∈ U | µX(x) > 0.5},
NEG(X) = {x ∈ U | µX(x) < 0.5},
BND(X) = {x ∈ U | µX(x) = 0.5}.

30 • Example 3:

• λ12 = λ21 = 4, λ31 = λ32 = 1, λ11 = λ22 = 0.
• α = 0.75, β = 0.25, and γ = 0.5.
• The 1/4-classification:

POS(X) = {x ∈ U | µX(x) ≥ 0.75},
NEG(X) = {x ∈ U | µX(x) ≤ 0.25},
BND(X) = {x ∈ U | 0.25 < µX(x) < 0.75}.
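This 1/4-classification can be reproduced with a small sketch; the function name is mine, while the thresholds α = 0.75 and β = 0.25 come from the example:

```python
# Three-way decision with alpha = 0.75, beta = 0.25 (rules P1, N1, B1).

def region(m, alpha=0.75, beta=0.25):
    """Decide POS/NEG/BND from the membership value mu_X(x) = m."""
    if m >= alpha:
        return "POS"
    if m <= beta:
        return "NEG"
    return "BND"

labels = [region(m) for m in (1.0, 0.8, 0.5, 0.2, 0.0)]
```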

31 Conclusions

• Rough set theory provides a sound and useful framework for studying many issues.
• The language of rough sets can be used to describe many problems concisely.
• Rough set theory has a solid foundation and is related to many other theories.
• Decision-theoretic rough set theory is a probabilistic generalization of standard rough set theory.
• Decision-theoretic rough set theory extends the application domain of rough sets.

32 Thank you!
