Advanced Review Fuzzy set theory H.-J. Zimmermann∗
Since its inception in 1965, the theory of fuzzy sets has advanced in a variety of ways and in many disciplines. Applications of this theory can be found, for example, in artificial intelligence, computer science, medicine, control engineering, decision theory, expert systems, logic, management science, operations research, pattern recognition, and robotics. Mathematical developments have advanced to a very high standard and are still forthcoming to day. In this review, the basic mathematical framework of fuzzy set theory will be described, as well as the most important applications of this theory to other theories and techniques. Since 1992 fuzzy set theory, the theory of neural nets and the area of evolutionary programming have become known under the name of ‘computational intelligence’ or ‘soft computing’. The relationship between these areas has naturally become particularly close. In this review, however, we will focus primarily on fuzzy set theory. Applications of fuzzy set theory to real problems are abound. Some references will be given. To describe even a part of them would certainly exceed the scope of this review. 2010 John Wiley & Sons, Inc. WIREs Comp Stat 2010 2 317–332
ost of our traditional tools for formal modeling, assumes that precise symbols are being employed. It Mreasoning, and computing are crisp, determin- is, therefore, not applicable to this terrestrial life but istic, and precise in character. Crisp means dichoto- only to an imagined celestial existence’.1 L. Zadeh mous, that is, yes-or-no type rather than more-or-less referred to the second point, when he wrote: ‘As the type. In traditional dual logic, for instance, a state- complexity of a system increases, our ability to make ment can be true or false—and nothing in between. precise and yet significant statements about its behav- In set theory, an element can either belong to a set ior diminishes until a threshold is reached beyond or not; in optimization a solution can be feasible or not. Precision assumes that the parameters of a which precision and significance (or relevance) become 2 model represent exactly the real system that has been almost mutually exclusive characteristics’. For a long modeled. This, generally, also implies that the model time, probability theory and statistics have been the is unequivocal, that is, that it contains no ambigui- predominant theories and tools to model uncertainties ties. Certainty eventually indicates that we assume the of reality.3,4 They are based—as all formal theo- structures and parameters of the model to be definitely ries—on certain axiomatic assumptions, which are known and that there are no doubts about their val- hardly ever tested, when these theories are applied to ues or their occurrence. Unluckily these assumptions reality. In the meantime more than 20 other ‘uncer- and beliefs are not justified if it is important, that the tainty theories’ have been developed,5 which partly model describes well reality (which is neither crisp contradict each other and partly complement each nor certain). In addition, the complete description of a real system would often require far more detailed other. Fuzzy set theory—formally speaking—is one data than a human being could ever recognize simul- of these theories, which was initially intended to be taneously, process, and understand. This situation has an extension of dual logic and/or classical set theory. already been recognized by thinkers in the past. In During the last decades, it has been developed in the 1923, the philosopher B. Russell referred to the first direction of a powerful ‘fuzzy’ mathematics. When point when he wrote: ‘All traditional logic habitually it is used, however, as a tool to model reality better than traditional theories, then an empirical validation is very desirable.6 In the following sections, the for- ∗Correspondence to: [email protected] mal theory is described and some of the attempts RWTH Aachen, Korneliusstr. 5, 52076 Aachen, Germany to verify the theory empirically in reality will be DOI: 10.1002/wics.82 summarized.
Volume 2, May/June 2010 2010 John Wiley & Sons, Inc. 317 Advanced Review www.wiley.com/wires/compstats
HISTORY 2. As an application-oriented ‘fuzzy technology’, that is, as a tool for modeling, problem solving, A short historical review may be useful to better and data mining that has been proven superior to understand the character and motivation of this existing methods in many cases and as attractive theory. The first publications in fuzzy set theory ‘add-on’ to classical approaches in other by Zadeh7 and Goguen8 show the intention of the cases. authors to generalize the classical notion of a set and a proposition to accommodate fuzziness in the In 1992, in three simultaneous conferences sense that it is contained in human language, that is, in Europe, Japan, and the United States, the in human judgment, evaluation, and decisions. Zadeh three areas of fuzzy set theory, neural nets, and writes: ‘The notion of a fuzzy set provides a convenient evolutionary computing (genetic algorithms) joined point of departure for the construction of a conceptual forces and are henceforth known under ‘compu- framework which parallels in many respects the tational intelligence’.8,12,13 In a similar way, the framework used in the case of ordinary sets, but term ‘soft computing’ is used for a number of is more general than the latter and, potentially, may approaches that deal essentially with uncertainty and prove to have a much wider scope of applicability, imprecision. particularly in the fields of pattern classification and information processing. Essentially, such a framework provides a natural way of dealing with problems in which the source of imprecision is the absence of MATHEMATICAL THEORY AND sharply defined criteria of class membership rather EMPIRICAL EVIDENCE than the presence of random variables’.7 ‘Imprecision’ here is meant in the sense of vagueness rather than Basic Definitions and Operations the lack of knowledge about the value of a parameter The axiomatic bases of fuzzy set theory are manifold. (as in tolerance analysis). Fuzzy set theory provides a Gottwald offers a good review.14–16 We shall strict mathematical framework (there is nothing fuzzy concentrate on the elements of the theory itself: about fuzzy set theory!) in which vague conceptual phenomena can be precisely and rigorously studied. Definition 1 If X is a collection of objects denoted It can also be considered as a modeling language, generically by x, then a fuzzy set AinXisasetof˜ well suited for situations in which fuzzy relations, ordered pairs: criteria, and phenomena exist. The acceptance of this theory grew slowly in the 1960s and 1970s of ˜ A = (x, µ ˜ (x)|x ∈ X) (1) the last century. In the second half of the 1970s, A however, the first successful practical applications in the control of technological processes via fuzzy rule- µA˜ (x) is called the membership function (generalized based systems, called fuzzy control (heating systems, characteristic function) which maps X to the cement factories, etc.), boosted the interest in this area membership space M. Its range is the subset of considerably. Successful applications, particularly in nonnegative real numbers whose supremum is finite. = Japan, in washing machines, video cameras, cranes, For sup µA˜ (x) 1: normalized fuzzy set. subway trains, and so on triggered further interest In Definition 1, the membership function of the and research in the 1980s so that in 1984 already fuzzy set is a crisp (real-valued) function. Zadeh also approximately 4000 publications existed and in 2000 defined fuzzy sets in which the membership functions more than 30,000. Roughly speaking, fuzzy set theory themselves are fuzzy sets. Those sets can be defined as during the last decades has developed along two follows: lines: Definition 2 A type m fuzzy set is a fuzzy set whose membership values are type m − 1,m> 1, fuzzy sets 1. As a formal theory that, when maturing, became on [0, 1]. more sophisticated and specified and was enlarged by original ideas and concepts as well Because the termination of the fuzzification on as by ‘embracing’ classical mathematical areas, stage r ≤ m seems arbitrary or difficult to justify, such as algebra,9,10 graph theory,11 topology, Hirota17 defined a fuzzy set the membership function and so on by generalizing or ‘fuzzifying’ them. of which is pointwise a probability distribution: the This development is still ongoing. probabilistic set.
318 2010 John Wiley & Sons, Inc. Volume 2, May/June 2010 WIREs Computational Statistics Fuzzy set theory
Definition 3 A probabilistic set17 A on X is defined the α-level set: by defining function µ , A = ∈ | ≥ Aα x X µA˜ α (5) µ : X × (x, ω) → µ (x, ω) ∈ (2) A A C = ∈ | Aα x X µA˜ >α is called strong α-level set or strong α-cut. and ( C, BC) = [0, 1] are Borel sets.
˜ Definition 4 A linguistic variable2 is characterized Definition 8 A fuzzy set A is convex if by a quintuple (x, T(x), U, G, M˜ ), in which x is the µ ˜ (λx + (1 − λ)x ) ≥ min µ ˜ (x ), µ ˜ (x ) , name of the variable, T(x) (or simply T) denotes the A 1 2 A 1 A 2 ∈ ∈ term set of x, that is, the set of names of linguistic x1, x2 X, λ [0, 1]. (6) values of x. Each of these values is a fuzzy variable, denoted generically by X and ranging over a universe Alternatively, a fuzzy set is convex if all α-level sets of discourse U, which is associated with the base are convex. variable u; G is a syntactic rule (which usually has the form of a grammar) for generating the name, X, of Definition 9 For a finite fuzzy set A,˜ the cardinality values of x. M is a semantic rule for associating with | ˜ | ˜ A is defined as each X its meaning. M(X) is a fuzzy subset of U. A particular X, that is, a name generated by G, is called | ˜ |= A µA˜ (x). (7) a term (e.g., Figure 1). x∈X Of course, the fuzzy sets in Figure 1 representing || ˜ || = |A˜ | ˜ the values of the linguistic variable ‘AGE’ may more A |X| is called the relative cardinality of A. appropriately be continuous fuzzy sets. A variety of definitions exist for fuzzy measures ˜ and measures of fuzziness. The interested reader is Definition 5 A fuzzy number M is a convex normalized referred to Refs 18–21. fuzzy set M˜ of the real line R such that Operations on Fuzzy Sets 1. it exists exactly one x0 ∈ R µ ˜ (x0) = 1 (x0 is 7 ˜ M In his first publication, Zadeh defined the following called the mean value of M). operations for fuzzy sets as generalization of crisp sets 2. µM˜ (x) is piecewise continuous. and of crisp statements (the reader should realize that the set theoretic operations intersection, union and complement correspond to the logical operators and, Dubois and Prade9 defined LR-fuzzy numbers as inclusive or and negation): follows: Definition 10 Intersection (logical and): the member- Definition 6 A fuzzy number M˜ is of LR-type if there ship function of the intersection of two fuzzy sets A˜ exist reference functions L (for left) and R (for right), and B˜ is defined as: and scalars α>0, β>0,with = ∀ ∈ µA˜ ∩B˜ (X) Min(µA˜ (X), µB˜ (X)) x X (8) − = m x ≤ µM˜ (x) L for x m (3) α x − m Definition 11 Union (exclusive or): the membership = R for x ≤ m (4) β function of the union is defined as:
= ∀ ∈ (see Figure 2) µA˜ ∪B˜ (X) Max(µA˜ (X), µB˜ (X)) x X (9) Some other useful definitions are the following:
Definition 7 The support of a fuzzy set A,˜ S(A˜ ) is the Definition 12 Complement (negation): the member- ∈ ship function of the complement is defined as: crisp set of all x X such that µA˜ (x) > 0. The (crisp) set of elements that belong to ˜ = − ∀ ∈ the fuzzy set A at least to the degree α is called µA˜ (X) 1 µA˜ (X) x X (10)
Volume 2, May/June 2010 2010 John Wiley & Sons, Inc. 319 Advanced Review www.wiley.com/wires/compstats
Age Linguistic variable
Values of age
Very young Young Old Very old
20 25 30 35 40 45 50 55 60 65 70 75 80 Age | 2 Base variable FIGURE 1 Linguistic variable ‘Age’.
(2a) bounded difference: t1(µ ˜ (x), µ ˜ (x)) = max 0, µ ˜ (x) + µ ˜ (x) − 1 1 A B A B (13)
(2b) bounded sum: = + s1(µA˜ (x), µB˜ (x)) min 1, µA˜ (x) µB˜ (x) (14) 510 (3a) Einstein product: FIGURE 2| LR-representation of a fuzzy number. t1.5(µA˜ (x), µB˜ (x)) These definitions were later extended. The µ ˜ (x) · µ ˜ (x) ‘logical and’ (intersection) can also be modeled as = A B − µ ˜ + µ ˜ − µ ˜ · µ ˜ a t-norm18,21–28 and the ‘inclusive or’ (union) as a 2 A(x) B(x) A(x) B(x) t-conorm. Both types are monotonic, commutative, (15) and associative. Typical dual pairs of nonparameterized t-norms (3b) Einstein sum: and t-conorms are compiled below: µ ˜ (x) + µ ˜ (x) = A B s1.5(µA˜ (x), µB˜ (x)) (16) (1a) drastic product: 1 + µ ˜ (x) · µ ˜ (x) A B tw µA˜ (x), µB˜ (x) (4a) Hamacher product: min µ ˜ (x), µ ˜ (x) if max µ ˜ (x), µ ˜ (x) = 1 = A B A B 0otherwise t2.5(µA˜ (x), µB˜ (x)) (11) µ ˜ (x) · µ ˜ (x) = A B + − · (17) µA˜ (x) µB˜ (x) µA˜ (x) µB˜ (x) (1b) drastic sum: (4b) Hamacher sum: sw µ ˜ (x), µ ˜ (x) A B max µ ˜ (x), µ ˜ (x) if min µ ˜ (x), µ ˜ (x) = 0 = A B A B s2.5(µ ˜ (x), µ ˜ (x)) A B 1otherwise µ ˜ (x) + µ ˜ (x) − 2µ ˜ (x) · µ ˜ (x) = A B A B (18) (12) − · 1 µA˜ (x) µB˜ (x)
320 2010 John Wiley & Sons, Inc. Volume 2, May/June 2010 WIREs Computational Statistics Fuzzy set theory
(5a) minimum: Definition 13 The compensatory and operator is defined as follows: t (µ ˜ (x), µ ˜ (x)) = min µ ˜ (x), µ ˜ (x) (19) − 3 A B A B m (1 γ ) m γ µ ˜ (x) = µ (x) 1 − (1 − µ (x)) (5b) maximum: Ai,comp i i i=1 i=1 ∈ ≤ ≤ = x X, γ 1. (28) s3(µA˜ (x), µB˜ (x)) max µA˜ (x), µB˜ (x) . (20) In effect, each convex combination of a These operators are ordered as follows: t-norm with the respective t-conorm could be used as averaging operator. tw ≤ t1 ≤ t1.5 ≤ t2 ≤ t2.5 ≤ t3 (21) Definition 14 An OWA-Operator30 is defined as s3 ≤ s2.5 ≤ s2 ≤ s1.5 ≤ s1 ≤ sw. (22) follows: Also a number of parameterized operators were = defined, such as: µOWA(x) wjµj(x) (29) j
={ } 1. Hamacher union: where w w1,..., wn is a vector of weights wi with wi ∈ [0, 1] and wi = 1. (γ − 1)µ ˜ (x) + µ ˜ (x) + µ ˜ (x) i = B A B µ ˜ ∪ ˜ (x) A B + µj(x)isthejth largest membership value for 1 γ µA˜ (x)µB˜ (x) (23) an element x for which the (aggregated) degree of membership shall be determined. The rationale behind 2. Yager intersection: this operator is again the observation that for an ‘and’ aggregation (modeled i.e., by the min-operator) the = − − p smallest degree of membership is crucial, whereas for µA˜ ∩B˜ (x) 1 min 1, (1 µA˜ (x)) an ‘or’ aggregation (modeled by ‘max’) the largest 1/p p degree of membership of an element in all fuzzy sets +(1 − µ ˜ ) , p ≥ 1 (24) B(x) is to be aggregated. Therefore, a basic aspect of this operator is the re-ordering step. In particular, the 3. Yager union: degree of membership of an element in a fuzzy set is not associated with a particular weight. Rather a = p + p 1/p ≥ µA˜ ∩B˜ (x) min 1, µA˜ (x) µB˜ (x) , p 1 weight is associated with a particular ordered position (25) of a degree of membership in the ordered set of relevant degrees of membership (Table 1). 4. Dubois and Prade intersection: The Extension Principle µ ˜ (x) · µ ˜ (x) = A B ∈ µA˜ ∩B˜ (x) , α [0, 1] One of the most basic concepts of fuzzy set theory max µA˜ (x), µB˜ (x), α that can be used to generalize crisp mathematical (26) concepts to fuzzy sets is the extension principle. In its elementary form, it was already implied in Zadeh’s 5. Union: first contribution. In the meantime, modifications have been suggested. Following Zadeh and Dubois and ˜ + ˜ − ˜ · ˜ − µA(x) µB(x) µA(x) µB(x ) 2,10 − Prade, we define the extension principle as follows: min µA˜ (x), µB˜ (x), (1 α) µ ˜ ˜ (x) = . A∪B − − max (1 µA˜ (x)), (1 µB˜ (x)), α Definition 15 Let X be a Cartesian product of ˜ ˜ (27) universes X = X1 ×···×Xr,andA1, ..., Ar be r fuzzy sets in X1, ..., Xr, respectively. f is a mapping 29 from X to a universe Y, y = f(x1, ..., xr). Then the Finally, a class of ‘averaging operators’ were ˜ defined, which do not have the mathematical prop- extension principle allows us to define a fuzzy set Bin erties of the t-norms and t-conorms. A justification Yby for those operators will be given in Section Empirical ˜ = | = ∈ Evidence. Two of the best known are the following B (y, µB˜ (y)) y f(x1, ..., xr), (x1, ..., xr) X two: (30)
Volume 2, May/June 2010 2010 John Wiley & Sons, Inc. 321 Advanced Review www.wiley.com/wires/compstats
TABLE 1 Classification of Aggregation Operators Intersection operators Union operators References t-norms Averaging operators t-conorms Nonparameterized Zadeh 1965 Minimum Maximum Algebraic product Algebraic sum Giles 1976 Bounded sum Bounded difference Hamacher 1978 Hamacher product Hamacher sum Mizumoto 1982 Einstein product Einstein sum Dubois and Prade Drastic product Drastic sum 1980,1982 Dubois and Prade Arithmetic mean 1984 Geometric mean Silvert 1979 Symmetric summation Symmetric difference Parameterized families Hamacher 1978 Hamacher intersection Hamacher union operators operators Yager 1980 Yager intersection operators Yager union operators Dubois and Prade Dubois intersection operators Dubois union operators 1980, 1982, 1984 Werners 1984 ‘fuzzy and’, ‘fuzzy or’ Zimmermann and ‘compensatory and’, convex comb. Zysno 1984 of maximum and minimum , or algebraic product and algebraic sum Yager 1988 OWA operators where Fuzzy Relations and Graphs Fuzzy relations are fuzzy sets in product space.31,32 sup ∈ −1 min{µ ˜ (x1), ..., µ ˜ (xr)} (x1,...,xr) f (y) A1 Ar −1 = ⊆ R µ ˜ = if f (y) Definition 16 Let X, Y be universal sets, then B(y) 0otherwise (31) ˜ = | ⊆ × R (x, y), µR˜ (x, y) (x, y) X Y (34) where f −1 is the inverse of f. For r = 1, the extension principle, of course, is called a fuzzy relation on X × Y. reduces to = = R ˜ = ˜ ˜ Example 1 Let X Y and R : ‘considerably = = µ ˜ | = ∈ B f(A) (y, B(y)) y f(x), x X (32) larger than’. The membership function of the fuzzy relation, which where is, of course, a fuzzy set on X × Ycanthenbe − − { } 1 = supx∈f 1(y) min µA˜ (x) if f (y) µ ˜ (y) = ≤ B 0otherwise 0forx y (x−y) µ ˜ = < ≤ (33) R(x, y) 10y for y x 11y (35) 1forx > 11y
322 2010 John Wiley & Sons, Inc. Volume 2, May/June 2010 WIREs Computational Statistics Fuzzy set theory
A different membership function for this relation Fuzzy Analysis could be A fuzzy function is a generalization of the concept of a classical function. A classical function f is a mapping 0forx ≤ y µ ˜ (x, y) = − − (36) (correspondence) from the domain D of definition of R (1 + (y − x) 2) 1 for x > y. the function into a space S; f(D) ⊆ S is called the range of f. Different features of the classical concept of a For discrete supports, fuzzy relations can also function can be considered fuzzy rather than crisp. be defined by matrices.21 Quite a number of different Therefore, different ‘degrees’ of fuzzification of the kinds of fuzzy relations have been defined. As an classical notion of a function are conceivable: example, we will here only define one kind: similarity relations33: 1. There can be a crisp mapping from a fuzzy set that carries along the fuzziness of the domain Definition 17 A fuzzy relation that is reflexive, and, therefore, generates a fuzzy set. The image symmetric, and transitive is called a similarity relation. of a crisp argument would again be crisp. A fuzzy relation R˜ (x x) is called 0 2. The mapping itself can be fuzzy, thus blurring the image of a crisp argument. This is normally reflexive if µ ˜ (x, x) = 1 R (37) µ ˜ = µ ˜ called a fuzzy function. Dubois and Prade call symmetric if R(x, y) R(y, x) 10 this ‘fuzzifying function’. max − min 3. Ordinary functions can have properties or be if µ ˜ ◦ ˜ ≤ µ ˜ (x, y). (38) transitive R R R constrained by fuzzy constraints.
Particularly for Case 2, definitions for classical ˜ Definition 18 Max-min composition: let R1(x, y), notions of analysis, such as, extrema of fuzzy ˜ (x, y) ∈ X × YandR2(y, z), (y, z) ∈ Y × Zbetwo functions, integration of fuzzy functions, integration fuzzy set relations. The max-min composition of fuzzy functions aver a crisp interval, integration ˜ ˜ R1 max − min R2 is then the fuzzy set of a crisp function over a fuzzy interval, fuzzy 9,10,34,35 differentiation, and so on have been suggested. ˜ ◦ ˜ It would exceed the scope of this survey to describe R1 R2 = (x, z), max min µ ˜ (x, y), µ ˜ (y, z) R1 R2 y even a larger part of them in more detail. |x ∈ X, y ∈ Y, z ∈ Z . Empirical Evidence (39) So far fuzzy sets and their extensions and operations were considered as formal concepts that need no proof µ ˜ ◦ ˜ is again the membership function of a fuzzy R1 R2 by reality. If, however, fuzzy concepts are used, for relation on fuzzy sets. example, to model human language, then one has The ‘counterpart’ of similarity relations are to make sure that the concepts really model, what a order relations or preference relations21 which human says or thinks. One has, with other words, to are further differentiated in fuzzy preorders, fuzzy ‘extract’ thoughts from the human brain and compare total orders, or fuzzy strict total orders, which them with the modeling tools and concepts. This are particularly often used in multicriteria analysis is a problem of psycho-linguistics.6,36 Bellman and and preference theory. Fuzzy relations can also be Zadeh37 suggested in their paper that the ‘and’ by considered as representing fuzzy graphs.11 Let the which in a decision objective functions and constraints elements of fuzzy relations, as defined in Definition are combined can be modeled by the intersection of 16 be the nodes of a fuzzy graph that is represented the respective fuzzy sets and mathematically modeled by this fuzzy relation. The degrees of membership by ‘min’ or by the product. of the elements of the related fuzzy sets define the The interpretation of a decision as the intersec- ‘strength’ of or the flow in the respective nodes of tion of fuzzy sets implies no positive compensation the graph, whereas the degrees of membership of the (trade-off) between the degrees of membership of the corresponding pairs in the relation are the ‘flows’ or fuzzy sets in question, if either the minimum or the the ‘capacities’ of the edges. Fuzzy graph theory has product is used as an operator. Each of them yields in the meantime become an extended area of (fuzzy) degrees of membership of the resulting fuzzy set (deci- mathematics. sion) which are on or below the lowest degree of
Volume 2, May/June 2010 2010 John Wiley & Sons, Inc. 323 Advanced Review www.wiley.com/wires/compstats membership of all intersecting fuzzy sets. This is also the truth tables of dual logic. Limited empirical tests true if any other t-norm is used to model the inter- have also been executed for shapes of membership section or operators, which are no t-norms but map functions, linguistic approximation, and hedges.40–45 below the min-operator. It should be noted that, in fact, there are decision situations in which such a APPLICATIONS ‘negative compensation’ (i.e., mapping below the min- imum) is appropriate. The interpretation of a decision It shall be stressed that ‘applications’ in this review as the union of fuzzy sets, using the max-operator, mean applications of fuzzy set theory to other formal leads to the maximum degree of membership achieved theories or techniques and not to real problems. The by any of the fuzzy sets representing objectives or latter would certainly exceed the scope of this review. constraints. This amounts to a full compensation of The interested reader is referred to Refs 46–54. lower degrees of membership by the maximum degree of membership. No membership will result, however, Fuzzy Logic, Approximate Reasoning, and which is larger than the largest degree of member- Plausible Reasoning ship of any of the fuzzy sets involved. Observing Logics as bases for reasoning can be distinguished managerial decisions, one finds that there are hardly essentially by three topic-neutral items: truth values, any decisions with no compensation between either vocabulary (operators), and reasoning procedures different degrees of goal achievement or the degrees (tautologies, syllogisms). In dual logic, truth values to which restrictions are limiting the scope of deci- can be‘true’ (1) or ‘false’ (0) and operators are defined sions. It may be argued that compensatory tendencies via truth tables (e.g., Table 2). in human aggregation are responsible for the failure A and B represent two sentences or statements of some classical operators (min, product, max) in which can be true (1) or false (0). These statements empirical investigations.38 can be combined by operators. The truth values of The following conclusions can probably be the combined statements are shown in the columns drawn: neither t-norms nor t-conorms can alone under the respective operators ‘and’, ‘inclusive or’, cover the scope of human aggregating behavior. It ‘exclusive or’, implication, and so on. Hence, the truth is very unlikely that a single nonparametric operator values in the columns define the respective operators. can model appropriately the meaning of ‘and’ or Considering the modus ponens as one tautology: ‘or’ context independently, that is, for all persons, ∧ ⇒ ⇒ at any time and in each context. There seem to be (A (A B)) B (40) three ways to remedy this weakness of t-norms and t-conorms: one can either define parameter-dependent or: t-norms or t-conorms that cover with their parameters Premise: A is true the scope of some of the nonparametric norms and Implication: If A then B can, therefore, be adapted to the context. A second Conclusion: B is true. way is to combine t-norms and their respective t-conorms and such cover also the range between Here four assumptions are being made: t-norms and t-conorms (which may be called the range of partial positive compensation). The disadvantage 1. A and B are crisp. is that generally some of the useful properties of 2. A in premise is identical to A in implication. t- and s-norms get lost. The third way is, eventually, to design operators, which are neither t-norms nor 3. True = absolutely true t-conorms, but which model specific contexts well. False = absolutely false. Empirical validations of suggested operators are very 4. There exist only two quantifiers: ‘All’ and ‘There scarce. The ‘min’, ‘product’, ‘geometric mean’, and exists at least one case’. the γ -operator have been tested empirically and it γ has turned out, the -operator models the ‘linguistic TABLE 2 Truth Table of Dual Logic and’, which lies between36,39 the ‘logical and’ and the ‘logical exclusive or’, best and context dependently. A B ∧ ∨ x ∨ ⇒ ⇔ ? It is the convex combination of the product (as 1111 0 1 11 a t-norm) and the generalized algebraic sum (as 1001 1 0 01 a t-conorm). In addition, it could also be shown 0101 1 1 00 that this operator is pointwise injective, continuous, monotonous, commutative and in accordance with 0000 0 1 10
324 2010 John Wiley & Sons, Inc. Volume 2, May/June 2010 WIREs Computational Statistics Fuzzy set theory
67 Knowledge semantically contained the context of the rules. Expert User engineer Naturally the inference engine had to be substituted by a system that was able to infer from fuzzy statements. It shall be called here ‘computational unit’. The first, very successful, applications of these systems were Dialogue modules in control engineering. They were, therefore, called ‘fuzzy controllers’.20,68–71 The input to these systems Knowledge was numerical (measures of the process output) and Inference Explanatory modules had to be transformed into fuzzy statements (fuzzy engine module (acquisition) sets), which was called ‘fuzzification’. This input, together with the fuzzy statements of the rule base, Knowledge base was then processed in the computational unit, which (rules, facts, domain knowledge) delivered again fuzzy sets as output. Because the out- Expert system put was to control processes, it had to be transformed again into real numbers. This process was called ‘defuzzification’72 (Figure 4). This is one difference to FIGURE 3| Expert system. fuzzy expert systems, which are supposed to replace or support human experts. Hence, the output should be These assumptions are relaxed in fuzzy linguistic and, rather than having ‘defuzzification’ at reasoning.55–61 the end, these systems use ‘linguistic approximation’73 In fuzzy logic, the truth values are no longer to provide user-friendly output. restricted to the two values ‘true’ and ‘false’ but Several methods have been developed for are expressed by the linguistic variables ‘true’ and fuzzification, inference, and defuzzification, and at ‘false’. In approximate reasoning62 additionally the the beginning of the 1980s the first commercial statements A and/or B can be fuzzy sets. In systems were put on the market (control of video plausible reasoning,21 the A in the premise does not cameras, cement kilns, cameras, washing machines, have to be identical (but similar) to the A in the etc.). This development started primarily in Japan implication. In all forms of fuzzy reasoning, the and then spread to Europe and the United States. It implications can be modeled in many different ways.60 boosted the interest in fuzzy set theory tremendously, Which one is the most appropriate can be evaluated such that in Europe one was talking of a ‘fuzzy either empirically36 or axiomatically.60 Of course, boom’ around 1990. Research and development in the models for implications can also be chosen with the area of fuzzy control is, however, still ongoing respect to their computational efficiency (which does to day. not guaranty that the proper model has been chosen for a certain context). Fuzzy Data Mining With the development of electronic data processing, Fuzzy Rule-Based Systems (Fuzzy Expert more and more data were available electronically. Systems and Fuzzy Control) This led to the situation in which the masses of Knowledge-based systems are computer-based sys- data, for instance in data warehouses, exceeded the tems, normally to support decisions in which mathe- human capabilities to recognize important structures matical algorithms63 are replaced by a knowledge base in these data. Classical methods to ‘data mine’, such and an inference engine. The knowledge base64 con- as cluster techniques, and so on, were available, but tains expert knowledge. There are different ways to often they did not match the needs. Cluster techniques, acquire and store expert knowledge.65,66 The most fre- for instance, assumed that data could be subdivided quently used way to store this knowledge are if–then crisply into clusters, which did not fit the structures rules. These are then considered as ‘logical’ state- that existed in reality. Fuzzy set theory seemed to ments, which are processed in the inference engine offer good opportunities to improve existing concepts. to derive a conclusion or decision. Generally, these Bezdek31,74 was one of the first, who developed fuzzy systems are called ‘expert systems’ (Figure 3). Classi- cluster methods with the goals, to search for structure cal expert systems processed the truth values of the in data to reduce complexity and to provide input for statements. Hence, they were actually not processing control and decision making (Figure 5). knowledge but symbols. Different approaches have been followed: hierar- In the 1970s, crisp rules in the knowledge chical approaches, semi-formal heuristic approaches, base were substituted by fuzzy statements, which and objective function clustering. Because the area
Volume 2, May/June 2010 2010 John Wiley & Sons, Inc. 325 Advanced Review www.wiley.com/wires/compstats
Fuzzy controller
w z Rule base e u x Fuzzification Defuzzification Process r Computational unit
| Primary FIGURE 4 Fuzzy detector controller.
Processdescription and pre-processing
Determ. of memberships Determ. of scale-levels
Feature analysis Classifier design Factor analysis Clustering Discriminant analysis structured modelling Regression analysis
Classification Pattern recognition Diagnosis discrete Ling. approximation
Defuzzification continuous
FIGURE 5| Fuzzy data mining. of (intelligent) data mining becomes more and more problems with several objective functions. In all these important, the development of further fuzzy methods approaches, objective functions were considered to be in this area can be expected.46,75–82 real valued and the actions as crisply defined. For the case that the objective function or the constraints are Fuzzy Decisions not crisply defined Bellman and Zadeh suggested in In the logic of decision making, a decision is defined 1970 the following symmetric model:37 as the choice of a (feasible) action by which the utility function is maximized. Hence, it is the search for Definition 19 Let µ ˜ , i = 1, ..., m be membership Ci an optimal and feasible action or strategy. Feasibility functions of constraints on X, defining the decision is either defined by enumerating the feasible actions space and µ ˜ , j = 1, ..., n the membership functions Gj or by constraints that define the feasibility space. of objective (utility) functions or goals on X. The feasibility space is unordered with respect to A decision is then defined by its membership utility and the utility or objective function defines function an order in this space. Hence, the problem is not µ ˜ = (µ ˜ ∗···∗µ ˜ ) × (µ ˜ ∗···∗µ ˜ ) symmetric. The problem becomes more complicated D C1 Cm G1 Gn if several objective functions exist (several orders on =∗ ×∗ iµC˜ jµG˜ (41) the same space). This leads to the area of ‘multicriteria i j analysis’. This area has grown very much since the where ×, ∗ denote appropriate, possibly context 1970s. Many approaches have been suggested to solve dependent, ‘aggregators’ (connectives).
326 2010 John Wiley & Sons, Inc. Volume 2, May/June 2010 WIREs Computational Statistics Fuzzy set theory
The most frequent interpretation of ×, ∗ is the If we assume that the LP-decision has to be min-operator. made in fuzzy environments, quite a number of possible modifications exist. First of all, the decision It should be clear that the intersection of maker might really not want to actually maximize the objective functions and the constraints (called or minimize the objective function. Rather he might ‘decision’) is again a fuzzy set. If a crisp decision is want to reach some aspiration levels which might needed, one could, for instance, determine the solution not even be definable crisply. Thus, he might want that has the highest degree of membership in this fuzzy to ‘improve the present cost situation considerably’, set. Let us call this solution the ‘maximizing solution’. and so on. Second, the constraints might be vague This definition complicated and facilitated the in one of the following ways: the ‘≤’ sign might classical model at the same time: it facilitated not be meant in the strictly mathematical sense but the model by suggesting a symmetric rather than smaller violations might well be acceptable. This can an asymmetric model. It complicated the problem happen if the constraints represent aspiration levels because now to determine the optimal action, not real numbers (of the real-valued utility function) as mentioned above or if, for instance, the constraints but functions (the membership functions of fuzzy represent sensory requirements (taste, color, smell, numbers) had to be compared and ranked. The later etc.) which cannot adequately be approximated by problem led to many publications that focused on a crisp constraint. Of course, the coefficients of the the ranking of fuzzy numbers.83,84 The symmetry of vectors b or c or of the matrix A itself can have a fuzzy the model led to very efficient approaches in the character either because they are fuzzy in nature or area of ‘multiobjective decision making’, that is, in because perception of them is fuzzy. Finally, the role multiobjective mathematical programming, in which of the constraints can be different from that in classical several continuous objective functions have to be linear programming where all constraints are of equal optimized subject to constraints. In fact, mathematical weight. For the decision maker, constraints might be of programming can be considered as a special case of different importance or possible violations of different decisions in the logic of decision making. This will constraints may be acceptable to him to different be considered in more detail in the next section. For degrees. Fuzzy linear programming offers a number fuzzy multistage decisions, see Refs 85,86. of ways to allow for all those types of vagueness and we shall discuss some of them below. If we assume Fuzzy Optimization that the decision maker can establish an aspiration Fuzzy optimization87 was already discussed briefly in level, z, of the objective function, which he wants to Section Fuzzy Analysis. Here, we shall concentrate on achieve as far as possible and if the constraints of ‘constrained optimization’, which is generally called this model can be slightly violated—without causing ‘mathematical programming’.88–95 It will show the unfeasibility of the solution—then the model can be potential of the application of fuzzy set theory to written as follows: classical techniques very clearly. We shall look at the easiest, furthest developed, and most frequently used Find x T ≥˜ version, the linear programming. It can be defined as s.t. c x z ≤˜ (42) follows: Ax b x≥˜ 0. Definition 20 Linear programming model: ≤˜ ≤ max f(x) = z = cTx Here, denotes the fuzzified version of and s.t. Ax ≤ b has the linguistic interpretation ‘essentially smaller ≥˜ x ≥ 0 than or equal’. denotes the fuzzified version of ≥ and has the linguistic interpretation ‘essentially ∈ Rn ∈ Rm ∈ Rm×n with c, x , b , A . greater than or equal’. The objective function might In this model it is normally assumed that all have to be written as a minimizing goal to consider z coefficients of A, b,andc are real (crisp) numbers, as an upper bound. Obviously the asymmetric linear that ‘≤’ is meant in a crisp sense, and that ‘maximize’ is programming model has now been transformed into a strict imperative. This also implies that the violation a symmetric model. To make this even more visible, of any single constraint renders the solution unfeasible we shall rewrite the model as and that all constraints are of equal importance (weight). Strictly speaking, these are rather unrealistic Find x ≤˜ assumptions, which are partly relaxed in fuzzy linear s.t. Bx d (43) ≥ programming.96–100 x 0.
Volume 2, May/June 2010 2010 John Wiley & Sons, Inc. 327 Advanced Review www.wiley.com/wires/compstats
The fuzzy set ‘decision’ D˜ is then This is a normal crisp linear programming model for which very efficient solution methods exist. µ ˜ (x) = min {µ (x)}. (44) The main advantage, compared with the unfuzzy D = + i i 1,...,m 1 problem formulation, is the fact that the decision maker is not forced into a precise formulation because µ (x) can be interpreted as the degree to which x i of mathematical reasons although he might only be fulfills (satisfies) the fuzzy inequality B x ≤˜ d (where i i able or willing to describe his problem in fuzzy terms. B denotes the ith row of B). Assuming that the i Linear membership functions are obviously only a very decision maker is interested not in a fuzzy set but in rough approximation. Membership functions which a crisp ‘maximizing’ solution, we could suggest the monotonically increase or decrease, respectively, in the ‘maximizing solution’, which is the solution to the interval of [d , d + p ] can also be handled quite easily. possibly nonlinear programming problem i i i It should also be observed that the classical assumption { }= of equal importance of constraints has been relaxed: max min µi(x) max µD˜ (x). (45) x≥0 i=1,...,m+1 x≥0 the slope of the membership functions determines the ‘weight’ or importance of the constraint. The slopes, As membership functions seem suitable. however, are determined by the pis. The smaller the pi the higher the importance of the constraint. For p = 0, ≤ i 1ifBix di − the constraint becomes crisp, that is, no violation is − Bix di ≤ + = 1 p if di < Bix di pi, allowed. Because of the symmetry of the model, it µi(x) i (46) i = 1, ..., m + 1 is very easy to add additional objective functions, 0ifBix > di + pi. which solves the problem of multiobjective decision making.101–107 Because this is a crisp model, existing The pi are subjectively chosen constants of crisp constraints can also be added easily. So far, admissible violations of the constraints and the objec- two major assumptions have been made to arrive at tive function. Substituting these membership functions ‘equivalent models’ which can be solved efficiently by into the model yields, after some rearrangements and standard LP-methods: with some additional assumptions 1. Linear membership functions were assumed for Bix − di max min 1 − . (47) all fuzzy sets involved. x≥0 i=1,...,m+1 pi 2. The use of the minimum-operator for the Introducing one new variable, λ, which corre- aggregation of fuzzy sets was considered to be sponds essentially to the degree of membership of x in adequate. the fuzzy set decision, we arrive at re1 The linear membership functions used so far max λ could all be defined by fixing two points, the s.t. λp + B (x) ≤ d + p , i = 1, ..., m + 1 (48) i i i i upper and lower aspiration levels or the two x ≥ 0. bounds of the tolerance interval. The obvious A slightly modified version of this model results way to handle nonlinear membership functions if the membership functions are defined as follows: is probably to approximate them piecewise by linear functions.108,109 a variable ti, i = 1, ..., m + 1, 0 ≤ ti ≤ pi is defined which measures the degree of violation of the ith re2 As already mentioned earlier, the min-operator constraint. The membership function of the ith row is models not always the real meaning of the then ‘linguistic and’. Hence, the modeler might want t to use other operators. The computational µ (x) = 1 − i . (49) i p efficiency, by which the problem can then be i solved, depends only on the type of crisp The crisp equivalent model is then equivalent model, for example, whether it is linear or nonlinear, and that depends on the max λ combination of the type of membership function s.t. λp + t ≤ p i i i and the operator used in the fuzzy model. Table 3 B x − t ≤ d , i = 1, ..., m + 1 (50) i i i shows these possible combinations and the type t ≤ p i i of resulting crisp equivalent model. x, t ≥ 0.
328 2010 John Wiley & Sons, Inc. Volume 2, May/June 2010 WIREs Computational Statistics Fuzzy set theory
TABLE 3 Resulting Equivalent Models Membership function Operator Model Linear min LP Logistic min LP Hyperbolic min LP Linear γ min +(1 − γ )max MILP Linear/nonlinear Product nonconvex NLP Linear/nonlinear γ -operator nonconvex NLP ˜ Linear and LP Linear or˜ MILP ˜ Logistic and NLP with lin. const.