Fuzzy Sets and Systems 112 (2000) 117–125

On the optimization of fuzzy decision trees

Xizhao Wang^a,*, Bin Chen^b, Guoliang Qian^b, Feng Ye^b
^a Department of Mathematics, Hebei University, Baoding 071002, Hebei, People's Republic of China
^b Group, Department of Computer Science, Harbin Institute of Technology, Harbin 150001, People's Republic of China
Received June 1997; received in revised form November 1997

Abstract

The induction of fuzzy decision trees is an important way of acquiring imprecise knowledge automatically. Fuzzy ID3 and its variants are popular and efficient methods of making fuzzy decision trees from a group of training examples. This paper points out the inherent defect of the likes of Fuzzy ID3, presents two optimization principles of fuzzy decision trees, proves that the complexity of constructing a kind of minimum fuzzy decision tree is NP-hard, and gives a new algorithm which is applied to three practical problems. The experimental results show that, with regard to the size of trees and the classification accuracy for unknown cases, the new algorithm is superior to the likes of Fuzzy ID3. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Machine learning; Learning from examples; Knowledge acquisition and learning; Fuzzy decision trees; Complexity of fuzzy decision trees; NP-hardness

Notation

X: a given finite set
F(X): the family of all fuzzy subsets defined on X
A, B, C, P, N, N_i, A_i, etc.: fuzzy subsets defined on X, i.e. values of attributes (or fuzzy classification), i.e. nodes of fuzzy decision trees
M(·): the cardinality of a fuzzy subset
|·|: the cardinality of a crisp subset
X_i, C: groups of fuzzy subsets, i.e. attributes
p_i^(k), q_i^(k): relative frequencies concerning some classes
[X = A_i]: a branch of a fuzzy decision tree
f_j^i = P([X = A_i] | C_j): the conditional frequency
Entr_i^(k): fuzzy entropy
Ambig_i^(k): classification ambiguity

1. Introduction

Machine learning is the essential way to acquire intelligence for any computer system. Learning from examples, i.e. concept acquisition, is one of the most important branches of machine learning. It has been generally regarded as the bottleneck of expert system development.

* Corresponding author. E-mail address: [email protected] (X. Wang).


The induction of decision trees is an efficient way of learning from examples [8]. Many methods have been developed for constructing decision trees [9] and these methods are very useful in building knowledge-based expert systems. Cognitive uncertainties, such as vagueness and ambiguity, have been incorporated into the knowledge induction process by using fuzzy decision trees [14]. The fuzzy ID3 algorithm and its variants [2, 4, 10, 12-14] are popular and efficient methods of making fuzzy decision trees. Fuzzy ID3 can generate fuzzy decision trees without much computation. It has great matching speed and is especially suitable for large-scale learning problems.

Most existing algorithms for constructing (fuzzy) decision trees focus on the selection of expanded attributes (e.g. [1, 4, 7, 8, 10, 13, 14]). They attempt to obtain a small-scale tree via the expanded attribute selection and to improve the classification accuracy for unknown cases. Generally, the smaller the scale of a decision tree, the stronger its generalizing capability. It is possible that the reduction of decision tree scale results in the improvement of the classification accuracy for unknown cases. An important problem is whether or not there exists an exact algorithm for constructing the smallest-scale fuzzy decision tree.

Centering on this optimization problem, this paper discusses the representation of fuzzy decision trees and gives a comparison between fuzzy decision trees and crisp ones; proves that the algorithmic complexity of constructing a kind of smallest-scale fuzzy decision tree is NP-hard; points out the inherent defect of fuzzy ID3 and presents a new algorithm for constructing fuzzy decision trees; and applies the new algorithm to three practical problems and shows the improvement of accuracy (in comparison with the fuzzy ID3 algorithm).

2. Fuzzy decision trees

Consider a directed tree of which each edge links two nodes, the initial node and the terminal node. The former is called the fathernode of the latter, while the latter is said to be the sonnode of the former. The node having no fathernode is the root, whereas the nodes having no sonnodes are called leaves.

A decision tree is a kind of directed tree. The induction of decision trees is an important way of inductive learning. It has two key aspects, training and matching. The former is a process of constructing trees from a training set, which is a collection of objects whose classes are known, while the latter is a process of judging the classification for unknown cases. ID3 is a typical algorithm for generating decision trees [8].

A fuzzy decision tree is a generalization of the crisp case. Before giving the definition, we explain some symbols. Throughout this paper, X is a given finite set, F(X) is the family of all fuzzy subsets defined on X, and the cardinality measure of a fuzzy subset A is defined by M(A) = Σ_{x∈X} A(x) (the membership function of a fuzzy subset is denoted by the subset itself).

Definition 1. Let X_i ⊆ F(X) (1 ≤ i ≤ m) be m given groups of fuzzy subsets, with the property |X_i| > 1 (|·| denotes the cardinality of a crisp set). T is a directed tree satisfying
(a) each node of the tree belongs to F(X),
(b) for each non-leaf node N, whose sonnodes constitute a subset of F(X) denoted by C, there exists i (1 ≤ i ≤ m) such that C = X_i ∩ N, and
(c) each leaf corresponds to one or several values of the classification decision.
Then T is called a fuzzy decision tree. Each group of fuzzy subsets, X_i, corresponds to an attribute and each fuzzy subset corresponds to a value of the attribute.

Example 1. Consider Table 1 [14]. Each column corresponds to a fuzzy subset defined on X = {1, 2, 3, ..., 16}; for instance, Sunny = 0.9/1 + 0.8/2 + 0.0/3 + ... + 1.0/16. The four attributes are as follows:
Outlook = {Sunny, Cloudy, Rain} ⊆ F(X),
Temperature = {Hot, Mild, Cool} ⊆ F(X),
Humidity = {Humid, Normal} ⊆ F(X),
Wind = {Windy, Not Windy} ⊆ F(X).
In the last column, the three symbols V, S and W denote three sports to play on weekends: Volleyball, Swimming and Weight-lifting, respectively.

A fuzzy decision tree can be constructed by using the algorithm in [14] to train Table 1. It is shown as Fig. 1.

The general matching strategy of fuzzy decision trees is described as follows. (a) Matching starts from the root and ends at a leaf along the branch of the maximum membership. (b) If the maximum membership at the node is not unique, matching proceeds along several branches. (c) The decision with the maximum degree of truth is assigned to the matching result.

Example 2. Consider two examples remaining to be classified, e1 = (0.0, 0.6, 0.4; 0.3, 0.7, 0.0; 0.5, 0.5; 0.5, 0.5) and e2 = (0.9, 0.1, 0.0; 0.8, 0.2, 0.0; 0.5, 0.5; 0.8, 0.2). The process of matching in the fuzzy decision tree (Fig. 1) is shown in Fig. 2.
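The matching rules (a)-(c) can be sketched as a small recursive procedure. The tree layout, attribute names and leaf truth degrees below are hypothetical illustrations only (Fig. 1 itself is not reproduced here), not the tree actually learned from Table 1.

```python
def match(node, example):
    """node is ('leaf', {class_label: degree_of_truth}) or
       ('internal', attribute, {value_name: child_node});
       example maps each attribute to {value_name: membership}."""
    if node[0] == 'leaf':
        return dict(node[1])
    _, attr, children = node
    memberships = example[attr]
    best = max(memberships[v] for v in children)            # rule (a): maximum membership
    result = {}
    for value, child in children.items():
        if memberships[value] == best:                       # rule (b): possibly several branches
            for label, truth in match(child, example).items():
                result[label] = max(result.get(label, 0.0), truth)
    return result

def classify(root, example):
    degrees = match(root, example)
    return max(degrees, key=degrees.get)                     # rule (c): maximum degree of truth

# A purely illustrative two-level tree and an example in the style of e2 above.
tree = ('internal', 'Outlook', {
    'Sunny':  ('leaf', {'S': 0.8}),
    'Cloudy': ('leaf', {'V': 0.7}),
    'Rain':   ('internal', 'Wind', {'Windy': ('leaf', {'W': 0.9}),
                                    'Not windy': ('leaf', {'V': 0.6})}),
})
e2 = {'Outlook': {'Sunny': 0.9, 'Cloudy': 0.1, 'Rain': 0.0},
      'Wind': {'Windy': 0.8, 'Not windy': 0.2}}
print(classify(tree, e2))  # 'S' for this illustrative tree
```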

Table 1 A small training set

No.   Outlook                Temperature           Humidity          Wind                 Class
      Sunny  Cloudy  Rain    Hot   Mild   Cool     Humid  Normal     Windy  Not windy     V    S    W

1 0.9 0.1 0.0 1.0 0.0 0.0 0.8 0.2 0.4 0.6 0.0 0.8 0.2 2 0.8 0.2 0.0 0.6 0.4 0.0 0.0 1.0 0.0 1.0 1.0 0.7 0.0 3 0.0 0.7 0.3 0.8 0.2 0.0 0.1 0.9 0.2 0.8 0.3 0.6 0.1 4 0.2 0.7 0.1 0.3 0.7 0.0 0.2 0.8 0.3 0.7 0.9 0.1 0.0 5 0.0 0.1 0.9 0.7 0.3 0.0 0.5 0.5 0.5 0.5 0.0 0.0 1.0 6 0.0 0.7 0.3 0.0 0.3 0.7 0.7 0.3 0.4 0.6 0.2 0.0 0.8 7 0.0 0.3 0.7 0.0 0.0 1.0 0.0 1.0 0.1 0.9 0.0 0.0 1.0 8 0.0 1.0 0.0 0.0 0.2 0.8 0.2 0.8 0.0 1.0 0.7 0.0 0.3 9 1.0 0.0 0.0 1.0 0.0 0.0 0.6 0.4 0.7 0.3 0.2 0.8 0.0 10 0.9 0.1 0.0 0.0 0.3 0.7 0.0 1.0 0.9 0.1 0.0 0.3 0.7 11 0.7 0.3 0.0 1.0 0.0 0.0 1.0 0.0 0.2 0.8 0.4 0.7 0.0 12 0.2 0.6 0.2 0.0 1.0 0.0 0.3 0.7 0.3 0.7 0.7 0.2 0.1 13 0.9 0.1 0.0 0.2 0.8 0.0 0.1 0.9 1.0 0.0 0.0 0.0 1.0 14 0.0 0.9 0.1 0.0 0.9 0.1 0.1 0.9 0.7 0.3 0.0 0.0 1.0 15 0.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 0.8 0.2 0.0 0.0 1.0 16 1.0 0.0 0.0 0.5 0.5 0.0 0.0 1.0 0.0 1.0 0.8 0.6 0.0

Fig. 1. A fuzzy decision tree generated by training Table 1. (The percentage attached to each decision is the degree of truth of the decision.)

Fig. 2. The matching process of the two examples. e1: the matching result is W. e2: the matching result is S.

Table 2 A comparison between the fuzzy decision tree and the crisp case

Crisp decision tree:
(1) Nodes are crisp subsets of X.
(2) If N is not a leaf and {N_i} is the family of all sonnodes of N, then ∪_i N_i = N.
(3) A path from the root to a leaf corresponds to a production rule.
(4) An example remaining to be classified matches only one path in the tree.
(5) The intersection of subnodes located on the same layer is empty.

Fuzzy decision tree:
(1) Nodes are fuzzy subsets of X.
(2) If N is not a leaf and {N_i} is the family of all sonnodes of N, then ∪_i N_i ⊆ N.
(3) A path from the root to a leaf corresponds to a fuzzy rule with some degree of truth.
(4) An example remaining to be classified can match several paths in the tree.
(5) The intersection of subnodes located on the same layer can be nonempty.

The fuzzy decision tree, regarded as a generalization of the crisp case, is more robust in tolerating imprecise information. Compared with the crisp case, the fuzzy decision tree has the characteristics listed in Table 2.

There are many methods for representing fuzzy decision trees. In essence, the representation in this paper is the same as that in [14], where the fuzzy decision tree is regarded as a fuzzy partitioning.

3. Optimization of fuzzy decision trees

There are two key points in the process of constructing fuzzy decision trees. One is the selection of expanded attributes. They are such attributes that, according to their values (which are fuzzy subsets), trees are expanded at the nodes considered, i.e. sonnodes of the nodes are generated. The other is the judgment on leaves. Nodes are usually regarded as leaves if the relative frequency of one class is greater than or equal to a given threshold value.

Learning algorithms typically use heuristics to guide their search. A general learning algorithm for generating fuzzy decision trees can be described as follows.

Consider the whole training set, which is regarded as the first candidate node.
WHILE there exist candidate nodes DO
    select one using the search strategy;
    if the selected one is not a leaf, then generate its sonnodes by selecting the expanded attribute using a heuristic;
    these sonnodes are regarded as new candidate nodes.
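The loop above can be rendered schematically as follows; the callables is_leaf and expand are placeholders standing for the leaf test (relative frequency of one class at least a threshold) and for the heuristic-driven expansion, not interfaces defined in the paper.

```python
from collections import deque

def grow_tree(training_set, is_leaf, expand):
    """Generic fuzzy-decision-tree growing loop.
       is_leaf(node): the leaf judgment; expand(node): selects the expanded
       attribute with a heuristic, attaches the sonnodes and returns them."""
    root = training_set                      # the whole training set is the first candidate node
    candidates = deque([root])
    while candidates:                        # WHILE there exist candidate nodes DO
        node = candidates.popleft()          # select one using the search strategy
        if not is_leaf(node):
            candidates.extend(expand(node))  # sonnodes become new candidate nodes
    return root
```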

From existing references, two powerful heuristics guiding the selection of expanded attributes in fuzzy decision tree generation can be found. One, called Fuzzy ID3, is based on the minimum fuzzy entropy [2, 4, 10, 13], whereas the other is based on the reduction of classification ambiguity [14]. The latter is a variant of the former. These two heuristics are briefly described as follows.

Let there be N training examples and n attributes A^(1), ..., A^(n). For each k (1 ≤ k ≤ n), the attribute A^(k) takes m_k values of fuzzy subsets, A_1^(k), ..., A_{m_k}^(k). For simplicity, the fuzzy classification is considered to be two fuzzy subsets, denoted by P and N respectively. For each attribute value (fuzzy subset) A_i^(k) (1 ≤ k ≤ n, 1 ≤ i ≤ m_k), its relative frequencies concerning P and N are

p_i^(k) = M(A_i^(k) ∩ P) / M(A_i^(k))

and

q_i^(k) = M(A_i^(k) ∩ N) / M(A_i^(k)),

respectively. The relative frequency is regarded as the degree of truth of a fuzzy rule in [14].

Heuristic (1). Fuzzy ID3 based on the minimum fuzzy entropy.
Select such an integer k (the kth attribute) that E_k = Min_{1 ≤ k ≤ n} E_k, where

E_k = Σ_{i=1}^{m_k} ( M(A_i^(k)) / Σ_{j=1}^{m_k} M(A_j^(k)) ) · Entr_i^(k),   k = 1, 2, ..., n;

Entr_i^(k) = − p_i^(k) log p_i^(k) − q_i^(k) log q_i^(k)

denotes the fuzzy entropy of classification.

Heuristic (2). A variant of Fuzzy ID3 based on the minimum classification ambiguity.
Select such an integer k (the kth attribute) that G_k = Min_{1 ≤ k ≤ n} G_k, where

G_k = Σ_{i=1}^{m_k} ( M(A_i^(k)) / Σ_{j=1}^{m_k} M(A_j^(k)) ) · Ambig_i^(k),   k = 1, 2, ..., n;

Ambig_i^(k) = Min(p_i^(k), q_i^(k)) / Max(p_i^(k), q_i^(k))

denotes the ambiguity of classification.
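A sketch of how heuristics (1) and (2) can be computed for the two-class case follows; it reuses the list representation and the min-intersection assumed earlier, together with the usual 0·log 0 = 0 convention, none of which are spelled out in the paper.

```python
from math import log

def M(a):                                  # sigma-count cardinality M(A)
    return sum(a)

def cut(a, b):                             # fuzzy intersection, assumed pointwise minimum
    return [min(x, y) for x, y in zip(a, b)]

def relative_frequencies(value, P, N):
    """p_i^(k) and q_i^(k) for one attribute value A_i^(k)."""
    return M(cut(value, P)) / M(value), M(cut(value, N)) / M(value)

def E(values, P, N):
    """Heuristic (1): weighted average of the fuzzy entropies Entr_i^(k)."""
    total = sum(M(v) for v in values)
    score = 0.0
    for v in values:
        p, q = relative_frequencies(v, P, N)
        entr = -(p * log(p) if p > 0 else 0.0) - (q * log(q) if q > 0 else 0.0)
        score += M(v) / total * entr
    return score

def G(values, P, N):
    """Heuristic (2): weighted average of the classification ambiguities Ambig_i^(k)."""
    total = sum(M(v) for v in values)
    score = 0.0
    for v in values:
        p, q = relative_frequencies(v, P, N)
        score += M(v) / total * (min(p, q) / max(p, q) if max(p, q) > 0 else 0.0)
    return score

def select_attribute(attributes, P, N, score=E):
    """attributes[k] is the list of fuzzy values A_1^(k), ..., A_{m_k}^(k);
       returns the index k minimising E_k (or G_k when score=G)."""
    return min(range(len(attributes)), key=lambda k: score(attributes[k], P, N))
```

With the membership lists of Table 1, attributes would be a list such as [[sunny, cloudy, rain], [hot, mild, cool], ...]; for the three classes V, S and W the same weighting extends in the multi-class way indicated below.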
The heuristics (1) and (2) can be easily extended to the case of a fuzzy classification with more than two fuzzy subsets. Heuristic (2) has an option of significance level which will affect the generation of fuzzy decision trees. A fuzzy decision tree, as shown in Fig. 1, is generated by using heuristic (2) where the significance level is taken to be 0.5. The same fuzzy decision tree can be generated by using heuristic (1); it is just a coincidence. Generally, different heuristics will result in different trees. Intuitively, both the weighted average of fuzzy entropies and the weighted average of classification ambiguities will decrease when

Max(p_i^(k), q_i^(k)) → 1 or Min(p_i^(k), q_i^(k)) → 0.

In the process of generating fuzzy decision trees, both heuristic (1) and heuristic (2) attempt to reduce the average depth of trees, i.e. on average, to generate leaves as soon as possible.

One of the main objectives of fuzzy decision tree induction is to generate a tree with high accuracy of classification for unknown cases. Which factors influence the accuracy? Experimental results [1] show, at least, that the selection of expanded attributes is an important factor. Most existing research on fuzzy (crisp) decision trees is focused on this selection (e.g. [1, 4, 7, 8, 10, 13, 14]). In the following, we analyze this problem from the optimization angle.

Essentially, the accuracy of the classification for unknown cases depends mainly on two aspects, the size of the decision tree and the representativeness of the training data. This paper considers only the former. For two decision trees generated by training the same data, one prefers to select the small-scale tree for classifying unknown cases. There are several norms for measuring the scale of decision trees; that is, the complexity can be defined in various ways, but common measures include the total number of leaves and the average depth of the decision tree. One naturally hopes that the total number is as few as possible and the average depth is as little as possible, respectively.

Based on this viewpoint, two optimization problems concerning fuzzy decision trees are proposed as follows.
(A) Look for a fuzzy decision tree with the minimum total number of leaves.

(B) Look for a fuzzy decision tree with the minimum average depth of leaves.

The following theorem gives us the computational complexity of problem (A). (We guess that problem (B) is also NP-hard, but cannot give the proof.) The proof of the theorem and the relative explanation are given in the last part of this section.

Theorem. The problem (A) is NP-hard.

A computational problem is said to be a P-problem if there exists an algorithm by which the exact solution can be obtained within polynomial time; if not, it is called NP-hard. So far, only heuristic algorithms concerning approximate solutions can be given for NP-hard problems. It plays a key role in computational complexity theory that either an algorithm for obtaining the exact solution is given or the problem is proved to be NP-hard. Details of NP-hard problems can be found in [3, 11].

It is unrealistic to find an exact algorithm for problem (A) because it is NP-hard. Common methods are to use heuristics to look for approximately optimal solutions. Heuristics used in the likes of fuzzy ID3 attempt to reduce only the average depth of leaves (e.g. heuristics (1) and (2) mentioned above), but neglect the number of leaves. This is the inherent defect of the likes of the fuzzy ID3 algorithm. Our research shows that just the number of leaves plays a key role in improving the accuracy of classification for unknown cases. Drawing inspiration from these two optimization problems, we design a new induction of fuzzy decision trees, called the MB algorithm, in the following section.

Proof of the theorem. Let A and B be two problems. If A is NP-hard and A can be reduced to B within polynomial time, then B is also NP-hard. The following problem of optimal set cover has been proved to be NP-hard in [5].

Problem of optimal set cover: Let T = {1, 2, ..., m}, F = {S_1, ..., S_p}, S_i ⊆ T and ∪_{S∈F} S = T. If G is such a subfamily that G ⊆ F, ∪_{S∈G} S = T and |G| = minimum, then G is called an optimal set cover of T, where |G| denotes the cardinality of G.

We will accomplish the proof by reducing the problem of optimal set cover to the problem of the decision tree with the minimum number of leaves. Without losing generality, we illustrate how to construct a transformation to implement the reduction by using a concrete example.

Let T = {1, 2, 3, ..., 7} and F = {S_1, ..., S_6} = {{1, 4, 5, 7}, {3, 4}, {2, 5, 7}, {1, 2, 6}, {1, 3, 7}, {3, 5, 6}}. It is obvious that F is a set cover of T, i.e. ∪_{S∈F} S = T. Constructing a characteristic table (Table 3) of this set cover, we regard Table 3 as 7 positive examples and e = (0, 0, 0, 0, 0, 0) as the unique negative example. In Table 3, F_i corresponds to an attribute; S_i = [F_i ≠ 0] holds for each i (1 ≤ i ≤ 6).

Table 3
A characteristic table of the set cover F

No.   F_1   F_2   F_3   F_4   F_5   F_6
1     1     0     0     1     1     0
2     0     0     1     1     0     0
3     0     1     0     0     1     1
4     1     1     0     0     0     0
5     1     0     1     0     0     1
6     0     0     0     1     0     1
7     1     0     1     0     1     0

Each set cover of T corresponds to a fuzzy decision tree. For example, three of the subsets S_i form a set cover of T, and the corresponding fuzzy decision tree is shown as Fig. 3. The corresponding tree is fuzzy since the intersection of subnodes located on the same layer is not empty. It is easy to see that the number of subsets of the set cover = the number of leaves of the corresponding tree + 1.

Fig. 3. A fuzzy decision tree corresponding to a set cover of T consisting of three of the subsets S_i.
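The bookkeeping behind the reduction example can be sketched as follows; the only assumption is that the characteristic table records, for each element of T, which sets S_i contain it (which reproduces Table 3).

```python
T = set(range(1, 8))
F = {1: {1, 4, 5, 7}, 2: {3, 4}, 3: {2, 5, 7},
     4: {1, 2, 6}, 5: {1, 3, 7}, 6: {3, 5, 6}}

def characteristic_table(T, F):
    """One row per element of T (a positive example); entry F_i is 1 iff the element is in S_i."""
    return {x: [1 if x in F[i] else 0 for i in sorted(F)] for x in sorted(T)}

def is_cover(indices, T, F):
    """True iff the chosen subfamily {S_i : i in indices} covers T."""
    return set().union(*(F[i] for i in indices)) == T

table = characteristic_table(T, F)
print(table[1])                   # [1, 0, 0, 1, 1, 0], matching row 1 of Table 3
print(is_cover({1, 3, 6}, T, F))  # True: {S_1, S_3, S_6} is one three-subset cover of T
print(is_cover({2, 3, 5}, T, F))  # False: {S_2, S_3, S_5} leaves element 6 uncovered
```

Each cover found this way corresponds to a fuzzy decision tree of the kind in Fig. 3, so an exact algorithm for the minimum-leaf tree would also solve the optimal set cover problem.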

Now, we have to convert the problem of optimal set cover into the problem of the fuzzy decision tree with the minimum number of leaves, which is just the desired result. The proof is completed.

4. A new algorithm for constructing fuzzy decision trees

In the process of expanding a non-leaf node of the fuzzy decision tree, the new algorithm first selects the expanded attribute using heuristic (1) or (2) mentioned previously, and then uses fuzzy value clustering to merge the branches which the non-leaf node sends out. Merging branches is essential for the new algorithm.

The aim of selecting the expanded attribute via heuristic (1) or (2) mentioned in Section 3 is an attempt to reduce the average distance from the current node to its childnodes, i.e. an attempt to satisfy the optimization principle (B) mentioned in Section 3. It guarantees that the maximum classification information can be obtained when an unknown example is tested on the current non-leaf node. The aim of merging branches is an attempt to guarantee that the number of branches which the current non-leaf node sends out is as few as possible. That is, merging branches attempts to satisfy the optimization principle (A) mentioned in Section 3.

Key points of the new algorithm are listed as follows.
Let X be the expanded attribute at the current non-leaf node, and let X have n values of fuzzy subsets defined on X, A_1, A_2, ..., A_n. The classification is described by m fuzzy subsets, C_1, C_2, ..., C_m.
(a) For fixed i (1 ≤ i ≤ n), compute the conditional frequency
f_j^i = P([X = A_i] | C_j) for j = 1, 2, ..., m,
where P([X = A_i] | C_j) = M(A_i ∩ C_j) / M(C_j).
(b) For fixed i (1 ≤ i ≤ n), select an integer k = k(i) (k depends on i) such that
f_k^i = Max_{1 ≤ j ≤ m} f_j^i.
Then put the label "k(i)-class" on the branch [X = A_i].
(c) Cluster according to the following rule:
if k(i_1)-class = k(i_2)-class, then the two branches [X = A_{i_1}] and [X = A_{i_2}] merge into one branch.
After clustering, the values of the attribute X are divided into L groups. Generally, L ≤ n.
(d) Expand the current non-leaf node; each group corresponds to a new branch (i.e., a new sonnode).

Compared with the original heuristic, the new algorithm makes the number of branches which the current non-leaf node sends out decrease. We call the new algorithm the Merging Branches algorithm, in short, the MB algorithm.

Example 3. Using the MB algorithm, where the selection of the expanded attribute is based on heuristic (1), to train Table 1, we obtain the fuzzy decision tree shown as Fig. 4.
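The merging step (a)-(d) can be sketched as follows; it assumes, beyond what is stated above, that merged branches are combined by the pointwise maximum (the union operator is not specified in the paper) and it reuses the list representation of fuzzy subsets.

```python
def M(a):
    return sum(a)

def cut(a, b):                              # fuzzy intersection, assumed pointwise minimum
    return [min(x, y) for x, y in zip(a, b)]

def union(a, b):                            # fuzzy union, assumed pointwise maximum
    return [max(x, y) for x, y in zip(a, b)]

def merge_branches(values, classes):
    """values: the fuzzy subsets A_1, ..., A_n of the expanded attribute X;
       classes: the fuzzy subsets C_1, ..., C_m describing the classification.
       Returns the L (<= n) merged branches of the current non-leaf node."""
    groups = {}
    for a in values:
        # (a)-(b): conditional frequencies f_j = M(A_i ∩ C_j) / M(C_j); take the k(i)-class
        freqs = [M(cut(a, c)) / M(c) for c in classes]
        label = max(range(len(classes)), key=lambda j: freqs[j])
        # (c): branches carrying the same k(i)-class label are clustered into one branch
        groups[label] = union(groups[label], a) if label in groups else list(a)
    # (d): each group becomes one new branch (sonnode)
    return list(groups.values())
```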

Fig. 4. A fuzzy decision tree constructed by using the new algorithm.

Table 4 An experimental comparison between the Fuzzy ID3 and the MB

Problem Algorithm Number of leaves Test accuracy Training time (s)

Sleep        Fuzzy ID3    288    88%    18
             MB           206    95%    32
RHCC         Fuzzy ID3    602    72%    45
             MB           463    81%    71
Diagnosis    Fuzzy ID3    12     86%    *
             MB           8      96%    *

5. Analysis and comparison of experimental results

On a Pentium 586 microcomputer, we implement the Fuzzy ID3 algorithm and the MB algorithm in the C++ language and apply them to three practical problems, which are as follows.

The first problem is the automatic knowledge acquisition about sleep states, which is provided by Illinois University [6]. It is a crisp case which consists of a group of typical data concerning learning from examples. It provides 1236 examples which are divided into 6 classes. Each example has 11 attributes. We randomly select a training set of 1000 examples and construct two decision trees using ID3 and MB, respectively. The experimental results of testing the remaining 236 examples are shown in Table 4.

The second problem is the recognition of handwritten Chinese characters (RHCC), which possesses much fuzziness since their deformation is very serious. We select 100 frequently used Chinese characters, ask 30 students to write them, and then obtain 3000 examples. These examples, regarded as images, are input by a scanner. After extracting 12 fuzzy features (attributes), we select a training set of 2000 examples and obtain the experimental results shown in Table 4.

The third problem is the fault diagnosis of turbine generators [15]. It is a learning problem with continuous-valued attributes. It provides 40 examples divided into 4 types of faults. Each example has 8 continuous-valued attributes. After fuzzifying a training set of 30 examples, we obtain the experimental results shown in Table 4.

The experimental results in Table 4 show that the size of each tree and the test accuracy of the MB algorithm are superior to those of the Fuzzy ID3 algorithm. But the training speed of the former is slightly less than that of the latter, and the selection of expanded attributes in the MB algorithm can be repeated. It can be seen that the MB algorithm is a reasonable compromise between the training speed and the test accuracy.

6. Conclusions

This paper discusses the optimization of fuzzy decision trees and has the following main results:
(1) The computational complexity of making the minimum fuzzy decision tree is NP-hard.
(2) The likes of the Fuzzy ID3 algorithm have an inherent defect and, to some degree, the new algorithm presented in this paper remedies this defect.
(3) With regard to the number of leaves and the classification accuracy for unknown cases, the new algorithm is superior to fuzzy ID3.
(4) The computational complexity of the optimization problem (B) mentioned in Section 3 remains to be studied further.

References

[1] W. Buntine, T. Niblett, A further comparison of splitting rules for decision-tree induction, Machine Learning 8 (1992) 75–85.
[2] K.J. Cios, L.M. Sztandera, Continuous ID3 algorithm with fuzzy entropy measures, Proc. IEEE Internat. Conf. on Fuzzy Systems, San Diego, CA, March 1992, pp. 469–476.
[3] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, 1979.

[4] H. Ichihashi et al., Neuro-Fuzzy ID3, Fuzzy Sets and Systems 81 (1996) 151–167.
[5] D.S. Johnson, Approximation algorithms for combinatorial problems, J. Comput. System Sci. 9 (3) (1973).
[6] R.S. Michalski, I. Mozetic, J.R. Hong, The multipurpose incremental learning system AQ15, Proc. 5th AAAI (1986) 1041–1045.
[7] J. Mingers, An empirical comparison of selection measures for decision-tree induction, Machine Learning 3 (1989) 319–342.
[8] J.R. Quinlan, Induction of decision trees, Machine Learning 1 (1986) 81–106.
[9] S.R. Safavian, D. Landgrebe, A survey of decision tree classifier methodology, IEEE Trans. Systems Man Cybernet. 21 (1991) 660–674.
[10] T. Tani, M. Sakoda, Fuzzy modeling by ID3 algorithm and its application to prediction, Proc. IEEE Internat. Conf. on Fuzzy Systems, San Diego, CA, March 1992, pp. 923–930.
[11] J.F. Traub, Algorithms and Complexity: New Directions and Recent Results, Academic Press, New York, 1976.
[12] L.X. Wang, J.M. Mendel, Generating fuzzy rules by learning from examples, IEEE Trans. Systems Man Cybernet. 22 (1992) 1414–1427.
[13] R. Weber, Fuzzy-ID3: a class of methods for automatic knowledge acquisition, Proc. 2nd Internat. Conf. on Fuzzy Logic and Neural Networks, Iizuka, Japan, 17–22 July 1992, pp. 265–268.
[14] Y. Yuan, M.J. Shaw, Induction of fuzzy decision trees, Fuzzy Sets and Systems 69 (1995) 125–139.
[15] L. Zhansheng, Study on Diagnostic Expert Systems and its Knowledge Acquisition for Rotating Machinery, Ph.D. Dissertation, Harbin Institute of Technology, 1995.