
University of Pennsylvania ScholarlyCommons

Technical Reports (CIS) Department of Computer & Information Science

August 1992

Search Through Systematic Set Enumeration

Ron Rymon University of Pennsylvania



University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-92-66.

Search Through Systematic Set Enumeration

Abstract

In many problem domains, solutions take the form of unordered sets. We present the Set-Enumeration (SE)-tree - a vehicle for representing sets and/or enumerating them in a best-first fashion. We demonstrate its usefulness as the basis for a unifying search-based framework for domains where minimal (maximal) elements of a power set are targeted, where minimal (maximal) partial instantiations of a set of variables are sought, or where a composite decision is not dependent on the order in which its primitive component-decisions are taken. Particular instantiations of SE-tree-based algorithms for some AI problem domains are used to demonstrate the general features of the approach. These algorithms are compared theoretically and empirically with current algorithms.


Search Through Systematic Set Enumeration

MS-CIS-92-66 LINC LAB 234

Ron Rymon

University of Pennsylvania School of Engineering and Applied Science Computer and Information Science Department Philadelphia, PA 19104-6389

August 1992

Search through Systematic Set Enumeration

Ron Rymon* Department of Computer and Information Science University of Pennsylvania Philadelphia, PA 19104

In Proceedings KR-92, Cambridge MA, October 1992

Abstract

In many problem domains, solutions take the form of unordered sets. We present the Set-Enumeration (SE)-tree - a vehicle for representing sets and/or enumerating them in a best-first fashion. We demonstrate its usefulness as the basis for a unifying search-based framework for domains where minimal (maximal) elements of a power set are targeted, where minimal (maximal) partial instantiations of a set of variables are sought, or where a composite decision is not dependent on the order in which its primitive component-decisions are taken. Particular instantiations of SE-tree-based algorithms for some AI problem domains are used to demonstrate the general features of the approach. These algorithms are compared theoretically and empirically with current algorithms.

1 INTRODUCTION

Many problems admit solutions which are elements of a given power-set. Typically, such sets are required to satisfy some problem-specific criterion which designates them as solutions. In many cases, such criteria either include, or are augmented with, some minimality/maximality requirement. Consider, for example, the Hitting-Set (HS) problem [Karp 72]. Given a collection of sets, solutions are required to have a non-empty intersection with each member of the collection. In applications of the HS problem, solutions are typically interesting only if minimal with respect to set inclusion. In a more general class of problems, solutions are partial instantiations of a set of variables. A hitting-set, for example, can also be described as a membership-based mapping from the underlying set of primitive elements to {0,1}. More generally, variables can be instantiated from an arbitrary domain.

Researchers in Artificial Intelligence (AI) have also made use of such abstract problems in their models. The HS problem, for example, was used by [Reiter 87] in his formalization of diagnosis. In a newer characterization, diagnoses are viewed as partial assignments of state to components [de Kleer et al. 90]. Many other AI problems are, or could be, formulated so as to admit sets as solutions.

Our goal in introducing the Set-Enumeration (SE)-tree is to provide a unified search-based framework for solving such problems, albeit their problem-dependent solution criteria. SE-tree-based algorithms for different problems will share their skeletal structure, but will each use additional domain-specific tactics. Furthermore, at a certain level of abstraction, even those tactics are general and can be shared across domains. General tactics identified here include pruning rules which exploit the SE-tree structure, exploration policies, and problem decomposition methods. Incremental versions of SE-tree-based algorithms can be constructed for some problem domains. In what follows, we use particular instantiations of SE-tree-based algorithms to demonstrate the general features of the approach.

Consistency-based formulations of diagnosis will serve as a working example for most of this paper. Section 2 begins with a formal description of the basic SE-tree. Section 3 presents an SE-tree-based hitting-sets algorithm (SE-HS) for Reiter's original formulation. Being best-first, it can use any of a variety of exploration policies. This algorithm is first contrasted, as is, with the original algorithm. The SE-tree structure is then used to improve SE-HS by pruning away unpromising parts of the search space. We conclude with an empirical comparison of the two algorithms.
In Section 4, we extend the SE-tree to fit the more general class of problems, where solutions take the form of partially instantiated sets of variables. Section 5 presents an extended version of SE-HS for a newer characterization of diagnoses [de Kleer et al. 90]. Although derived from a very general search framework, this algorithm corresponds to a prime implicate generation algorithm proposed by [Slagle et al. 70], and is empirically shown to perform quite well compared to a recent algorithm [Ngair 92]. Unlike Slagle et al.'s algorithm, the extended SE-HS can work under diagnostic theories with multiple fault modes, can use a variety of exploration policies for focusing purposes, and has an incremental version. Furthermore, we subsequently augment it with a problem decomposition tactic, thereby obtaining an improved version of Slagle et al.'s algorithm. Finally, we briefly review potential use of the SE-tree in abductive diagnostic frameworks.

In Section 6, we contrast features of the SE-tree with decision trees in the context of learning classification rules from examples. For lack of space, the scope of this study is very limited and the reader is referred to [Rymon 92b] for a more detailed analysis and empirical evaluation.

* Address for correspondence: Ron Rymon, Computer and Information Science, Room 423C, 3401 Walnut Street, Philadelphia PA 19104; e-mail: [email protected].

2 THE BASIC SE-TREE

The Set-Enumeration (SE)-tree is a vehicle for representing and/or enumerating sets in a best-first fashion. The complete SE-tree systematically enumerates elements of a power-set using a pre-imposed order on the underlying set of elements. In problems where the search space is a subset of that power-set that is (or can be) closed under set-inclusion, the SE-tree induces a complete irredundant search technique. Let E be the underlying set of elements. We first index E's elements using a one-to-one indexing ind : E -> N. Then, given any subset S of E, we define its SE-tree view:

Definition 2.1 A Node's View

View(ind,S) =def { e in E | ind(e) > ind(e') for all e' in S }.

Definition 2.2 A Basic Set Enumeration Tree

Let F be a collection of sets that is closed under set-inclusion (i.e. for every S in F, if S' is a subset of S then S' is in F). T is a Set Enumeration tree for F iff:

1. The root of T is labeled by the empty set;
2. The children of a node labeled S in T are { S u {e} in F | e in View(ind,S) }.

Figure 1: SE-tree for P({1,2,3,4})

Figure 1 illustrates an SE-tree for the complete power-set of {1,2,3,4}. Note that restricting a node's expansion to its View ensures that every set is uniquely explored within the tree. By itself, the idea of using an imposed order is not new; it is used for similar purposes in many specific algorithms. Our contribution is in identifying the SE-tree as a recurring search structure, thereby facilitating its use in a general framework and the sharing of particular tactics.

Notice also that the SE-tree can be used as a data structure for caching unordered sets, and as an effective means of checking whether a new set is subsumed by any of those already cached. [de Kleer 92] has made such use of an SE-tree and reports significant improvements in run-time. As a caching device, the SE-tree is a special case of Knuth's trie data structure [Knuth 73], originally offered for ordered sets. While we too use the SE-tree for caching solutions and for subsumption checking, our main objective in this paper is its use in a search framework.
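The following Python sketch (not from the paper; the function names and the breadth-first driver are mine) illustrates Definitions 2.1 and 2.2: children are generated only from a node's View, so every subset is produced exactly once. Running it reproduces the node labels of Figure 1.

```python
def se_tree_children(s, universe, ind):
    """Children of node s: extend s with one element of its View (Definitions 2.1-2.2)."""
    bound = max((ind[e] for e in s), default=-1)
    return [s | {e} for e in universe if ind[e] > bound]

def enumerate_se_tree(universe):
    """Breadth-first enumeration of the complete SE-tree for P(universe)."""
    ind = {e: i for i, e in enumerate(sorted(universe))}   # the pre-imposed order
    frontier = [frozenset()]                               # root: the empty set
    while frontier:
        node = frontier.pop(0)
        yield set(node)
        frontier.extend(se_tree_children(node, universe, ind))

if __name__ == "__main__":
    # Prints the 16 node labels of Figure 1 (SE-tree for P({1,2,3,4})), each exactly once.
    for s in enumerate_se_tree({1, 2, 3, 4}):
        print(sorted(s))
```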

3 AN SE-TREE-BASED HITTING-SET ALGORITHM

In this section, we demonstrate the use of the basic SE-tree structure for a hitting-set algorithm in the context of Reiter's theory of diagnosis. We open with a brief introduction of Reiter's theory, up to the point in which a hitting-set problem is formulated. An SE-tree-based algorithm (SE-HS) is then contrasted with the dag-based algorithm proposed by [Reiter 87, Greiner et al. 89] to show that a large number of calls to a subsumption checking procedure can be saved. Then, the SE-tree systematicity allows improving SE-HS via a domain-specific pruning rule. Empirical comparison of the improved SE-HS with the dag-based implementation of [Greiner et al. 89] supports our claims.

3.1 REITER'S THEORY OF DIAGNOSIS

Reiter's theory of diagnosis [Reiter 87] is among the most widely referenced logic-based approaches to model-based diagnosis. For lack of space, we shall only present the definitions and theorem which Reiter uses to derive his hitting-set algorithm.

Definition 3.1 A Diagnostic Problem [Reiter 87]

A diagnostic problem is a triple (SD, COMPS, OBS):

1. SD - the system description, is a set of first order sentences;
2. COMPS = {c_1,...,c_n} - the system's components, is a set of constants; and
3. OBS - the observations, is also a set of first order sentences.

The language in which diagnostic problems are expressed is thus first order, and is augmented with an extra AB predicate (for abnormal).

Definition 3.2 Conflict Set

Given a diagnostic problem, a conflict is a set of components that cannot all be functioning correctly. Let CONFLICTS denote the collection of conflict sets.

Theorem 3.3 [Reiter 87] Given a diagnostic problem, minimal diagnoses are precisely the minimal hitting sets for CONFLICTS.

Reiter's algorithm is an implementation of Theorem 3.3. In two steps, it first discovers conflicts, and then runs an HS algorithm on the conflicts discovered. We shall concentrate on the latter phase.

3.2 DAG-BASED APPROACH

Given a collection of conflict sets, Reiter's algorithm grows an HS-tree in which nodes represent partial hitting sets and leaves represent complete ones. To avoid highly redundant exploration, Reiter augments this basic algorithm with a set of rules for reusing and pruning nodes. [Greiner et al. 89] present a correction to this algorithm which uses a directed acyclic graph (dag). It proceeds as follows:

1. Let D represent a growing HS-dag. Label its root with an arbitrary C in CONFLICTS;
2. Process nodes in D in a breadth-first order. To process a node n:
(a) Let H(n) be the set of edge labels on the path from the root to n. If H(n) hits all sets in CONFLICTS, mark it as a minimal hitting set. Otherwise, label n with the first set of CONFLICTS which is not hit by H(n).
(b) If n is labeled by C, generate a downward arc labeled by any a in C.

This algorithm is augmented with three types of rules for expanding a node n:

1. Reusing: If there is another node m for which H(m) = H(n) u {a}, do not expand n, but rather link it to m, labeling that link with a.
2. Closing: If there is a node m which is marked as a hitting set, such that H(m) is a subset of H(n), then close n, i.e. do not expand it at all.
3. Pruning: If a set C is to label a node n and it has not been used previously, then try to prune D:
(a) If there is a node m which has been labeled with a set S' such that C is a proper subset of S', then relabel m with C. Prune all arcs emanating from m that are labeled with elements of S'-C.
(b) Interchange S' and C in CONFLICTS.

3.3 SE-TREE-BASED ALTERNATIVE

SE-HS (Algorithm 3.4) is an SE-tree-based hitting set algorithm. In a best-first fashion, it explores nodes in an order conforming to some predetermined priority function. For that purpose, nodes along the tree's expanding fringe are kept in a priority queue and the next node to be expanded is accessed via the Next-Best function. Prioritization allows implementation of various exploration policies, to be discussed shortly. Let us first assume that nodes are explored by their cardinality; i.e. breadth-first.

Algorithm 3.4 Finding Minimal Hitting Sets

Program SE-HS (CONFLICTS)
1. Let HS <- {}; OPEN-NODES <- {{}}
2. Until OPEN-NODES is empty do
3. Expand (Next-Best(OPEN-NODES))

Procedure Expand(S)
1. Let Window(S) <- { c | c in View(ind,S) }
2. For each c in Window(S) which is a member of some set from NYH(S) do
3. Unless there is S' in HS such that S' is a subset of S u {c}
4. If S u {c} is a hitting set, add it to HS;
5. Otherwise, add it to OPEN-NODES.

The main SE-HS program simply implements a best-first search. The algorithm's functionality is embodied in its Expand procedure, where the SE-tree structure is used, and where hitting sets are identified. In choosing viable expansions for a node labeled S, we restrict ourselves to components within S's View. Such components are also required to participate in conflicts not yet hit by S (denoted NYH(S)). Step 3 in Expand prunes away nodes subsumed by minimal hitting sets. It corresponds to the closing step in [Greiner et al. 89]. However, SE-HS avoids the redundancy for which reusing rules were devised, and does not require pruning.

Theorem 3.5 If nodes are prioritized by their label's cardinality, then SE-HS is correct (produces all and only minimal hitting sets).
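A runnable sketch of the search described by Algorithm 3.4 is given below; it is illustrative only, and the function name se_hs, the priority parameter, and the plain-set encoding of conflicts are mine, not the paper's. With the default cardinality priority it explores the SE-tree breadth-first, as assumed by Theorem 3.5.

```python
from heapq import heappush, heappop
from itertools import count

def se_hs(conflicts, priority=len):
    """Best-first SE-tree search for minimal hitting sets, after Algorithm 3.4.

    conflicts: a collection of sets of components.
    priority:  the exploration policy; cardinality (len) gives breadth-first order.
    """
    conflicts = [frozenset(c) for c in conflicts]
    elements = sorted(set().union(*conflicts))            # the underlying set E
    ind = {e: i for i, e in enumerate(elements)}          # pre-imposed order

    hitting_sets = []                                     # HS
    tie = count()                                         # heap tie-breaker
    open_nodes = [(priority(frozenset()), next(tie), frozenset())]   # OPEN-NODES, root = {}

    while open_nodes:
        _, _, s = heappop(open_nodes)                     # Next-Best
        nyh = [c for c in conflicts if not (c & s)]       # NYH(S): conflicts not yet hit by S
        bound = max((ind[e] for e in s), default=-1)
        for e in (e for e in elements if ind[e] > bound): # Window(S) = View(ind, S)
            if not any(e in c for c in nyh):              # must participate in some set of NYH(S)
                continue
            child = s | {e}
            if any(h <= child for h in hitting_sets):     # subsumed by a known hitting set (step 3)
                continue
            if all(c & child for c in conflicts):         # S u {e} hits every conflict
                hitting_sets.append(child)
            else:
                heappush(open_nodes, (priority(child), next(tie), child))
    return [set(h) for h in hitting_sets]
```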

3.4 PROJECTED GAIN

The HS-dag algorithm uses three pruning rules, each of which is computationally expensive and requires numerous calls to a subsumption checking procedure. In examining the purpose of these rules, we note that (1) Reusing is aimed at avoiding redundancy in search, i.e. the phenomenon that the same part of the search space is repeatedly explored within the HS-tree. It requires comparing each new node to every previous node; (2) Closing is aimed at shutting nodes which are supersets of minimal diagnoses. For that purpose, if the HS-dag is explored breadth-first, each node will only have to be compared against previous minimal hitting sets; finally (3) Pruning is aimed at "correcting" the HS-dag from the effects of non-minimal conflict sets. The same effect could also be achieved a priori, by "sorting" CONFLICTS by cardinality.

As previously explained, while closing cannot be avoided, SE-HS requires neither reusing, nor pruning. Avoiding numerous calls to a subsumption checking procedure results in a tremendous improvement in run time (see Section 3.7).

3.5 EXPLORATION POLICIES

Due to its potentially exponential size, it may often be impractical to explore the entire SE-tree, and the choice of exploration policy becomes important. We have already seen that exploration by cardinality is correct. Simpler diagnoses are explored first using this exploration policy. Other interesting policies include exploration by probability, phi(S) =def Prob(S is a diagnosis), and by utility or some other monotonic external criterion imposed on sets.

3.6 PRUNING UNPROMISING PARTS OF THE SEARCH SPACE

So far, nodes were pruned only if subsumed by known hitting-sets, thereby using the minimality requirement and the monotonicity of the SE-tree with respect to set-inclusion. We have not used the systematic ordering of nodes in the SE-tree for that purpose. That ordering provides a restriction on node labels which can occur in a given node's sub-tree. More specifically, let S be a node's label; then the sub-tree rooted at that node will only have nodes whose labels are expansions of S with components from View(ind,S). Thus, in choosing viable expansions for S, we can restrict ourselves to expansions such that every set that will not be hit by the expanded set will still contain components within its View (and thus stand the chance of being hit by any of that node's descendants).

This is, in fact, a general feature of an SE-tree-based search program: the systematic enumeration embedded in the SE-tree structure allows us to ignore parts of the space which do not have the potential to lead to a solution.

To incorporate this pruning rule into SE-HS, it is sufficient to modify the node expansion routine.

Algorithm 3.8 Node Expansion (version 2)

Procedure Expand(S)
1. Let Window(S) <- { c | c in View(ind,S) } intersected with { c | ind(c) < min over C in NYH(S u {c}) of max over c' in C of ind(c') }

3.7 DEMONSTRATED GAIN

To demonstrate the advantages of SE-HS over the dag-based algorithm, we will first work through a complete example (taken from [Reiter 87]), and will then present the results of extensive empirical experiments.

Example 3.9 Consider the following collection of conflicts {{2,4,5}, {1,2,3}, {1,3,5}, {2,4,6}, {2,4}, {2,3,5}, {1,6}} [Reiter 87]. Figure 2 depicts the corresponding HS-dag, where O's mark hitting sets, and X's denote closed nodes. The rightmost branch from the root was pruned by the last node to be explored (itself a descendant of that branch).

Figure 2: An HS-dag

In contrast, Figure 3 shows an SE-tree for the same problem. In comparing, note the difference in notation: nodes in the HS-dag are labeled with the set that is chosen to be hit next, whereas the SE-tree's labels are partial hitting sets. In the HS-dag the partial hitting sets label the path from the root to the particular node. The new pruning rule results in fewer nodes being explored: 16 in SE-HS versus 34 in HS-dag.

Figure 3: An SE-tree

We have empirically compared SE-HS's performance with that of an HS-dag implementation which was provided to us by Barbara Smith [Greiner et al. 89]. The run-time and number of nodes generated by each of the two implementations were tested on hundreds of randomly generated test cases. Test cases were generated using three parameters: number of conflicts (denoted #conf), number of components in each conflict (#lit), and overall number of components (#comp). Figures 4 and 5 use logarithmic scale to present one-way sensitivity analyses with respect to each of the parameters and with respect to both run-time (CPU seconds) and nodes generated. SE-HS's performance is indicated by the shaded squares, and that of HS-dag by the open ones. Each data point was obtained by averaging each algorithm's performance on 10 random cases with the same parameters.

Figure 4: Run Time of SE-HS (shaded) versus HS-dag (open)

Figure 5: Number of Nodes Explored by SE-HS (shaded) versus HS-dag (open)
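The conflict collection of Example 3.9 can be fed directly to the se_hs sketch given earlier (the collection below is transcribed from the example as reconstructed above; se_hs is my illustrative function, not the paper's implementation).

```python
# Conflict collection of Example 3.9 (from [Reiter 87]), as transcribed above.
conflicts_39 = [{2, 4, 5}, {1, 2, 3}, {1, 3, 5}, {2, 4, 6}, {2, 4}, {2, 3, 5}, {1, 6}]

for hs in sorted(se_hs(conflicts_39), key=lambda h: (len(h), sorted(h))):
    print(sorted(hs))        # each line is one minimal hitting set (cf. Figure 3)
```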

4 THE EXTENDED SE-TREE

Sometimes the space being searched consists not of sets of components, but rather of sets of partially instantiated attributes (variables). We next extend the SE-tree accordingly.

Definition 4.1 Partial Descriptions

Let ATTRS =def {A_1,...,A_n} be a set of attributes, with domains {Dom(A_i)}. A partial description is a subset of ATTRS, each of which is instantiated with one value from its domain. It is complete if all attributes are instantiated.

Consider, for example, the space defined by 3 boolean attributes. The set {A2=T} is a partial description in that space. {A1=T, A2=F, A3=F} is a complete description.

As with its basic counterpart, to define the extended SE-tree we first impose an ordering (ind) on ATTRS, and define a node's View as all attributes ranked higher than the highest ranked attribute participating in that node. Then:

Definition 4.2 An Extended Set Enumeration Tree

Let F be a collection of sets of attribute instantiations such that each set contains at most one value for each attribute and such that F is closed under set-inclusion. Then T is an extended SE-tree for F iff:

1. The root of T is labeled by the empty set;
2. The children of a node labeled S in T are { S u {(A=v)} in F | A in View(ind,S), v in Dom(A) }.

Figure 6 depicts an extended SE-tree for the complete space defined by three boolean attributes. Note the use of reduced notation, where i stands for {A_i = T} and -i represents {A_i = F}.

Figure 6: Complete SE-tree for 3 Boolean Attributes
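The short Python sketch below (my own illustration, not the paper's code) generates the extended SE-tree of Definition 4.2 for the three-boolean-attribute space of Figure 6; each partial description is produced exactly once.

```python
def extended_children(s, attrs, domains, ind):
    """Children of a partial description s (dict attr -> value), per Definition 4.2:
    instantiate one attribute from s's View with each value of its domain."""
    bound = max((ind[a] for a in s), default=-1)
    children = []
    for a in (a for a in attrs if ind[a] > bound):        # attributes in View(ind, s)
        for v in domains[a]:
            child = dict(s)
            child[a] = v
            children.append(child)
    return children

def enumerate_partial_descriptions(attrs, domains):
    """Breadth-first enumeration of the complete extended SE-tree (cf. Figure 6)."""
    ind = {a: i for i, a in enumerate(attrs)}
    frontier = [dict()]
    while frontier:
        node = frontier.pop(0)
        yield node
        frontier.extend(extended_children(node, attrs, domains, ind))

if __name__ == "__main__":
    attrs = ["A1", "A2", "A3"]
    doms = {a: ["T", "F"] for a in attrs}
    nodes = list(enumerate_partial_descriptions(attrs, doms))
    print(len(nodes))   # (b+1)^n = 3^3 = 27 partial descriptions, each enumerated exactly once
```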

5 SE-TREE-BASED PRIME IMPLICATE ALGORITHM

In this section, we present an extension of SE-HS and demonstrate its use for the diagnostic framework of [de Kleer et al. 90]. We begin with a short description of the extended theory, where diagnoses are characterized as prime implicants of the (newly defined) set of conflicts. An extension of SE-HS, presented next, can be used to find those kernel diagnoses. The extended SE-HS has other useful properties: it can be flexibly focused; it can work with multiple behavioral modes; and it has an incremental version. A two-mode restriction of this algorithm corresponds to an old prime implicate generation algorithm [Slagle et al. 70]. We first demonstrate the empirical performance of this restricted version compared to a recent prime implicate generation algorithm. We then augment it with a new problem decomposition tactic, thereby obtaining an improved algorithm for prime implicate generation.

5.1 EXTENDED THEORY OF DIAGNOSIS

[de Kleer et al. 90] extended Reiter's theory with the notion of kernel diagnoses. Rather than having a diagnosis represent only faulty components (with the implicit assumption that all other components function properly), the new theory allows a diagnosis to explicitly specify working and non-working condition, without any presumption about other components' state.

Definition 5.1 AB-Clause [de Kleer et al. 90]

Let an AB-literal be AB(c), or -AB(c), for some c in COMPS. An AB-clause is a disjunction of AB-literals containing no complementary pair of AB-literals. An AB-clause is positive if all its AB-literals are positive.

Definition 5.2 Conflict [de Kleer et al. 90]

A conflict is any AB-clause entailed by SD u OBS. A conflict set is its underlying set of AB-literals.

Note that the new definition extends Reiter's original definition which, roughly speaking, allows only positive conflicts. We shall interchangeably speak about conflicts and their underlying sets.

Definition 5.3 Partial Diagnosis [de Kleer et al. 90]

A partial diagnosis is a conjunction of AB-literals P such that P is satisfiable (does not contain complementary pairs), and for any other satisfiable conjunction phi covered by P, SD u OBS u {phi} is satisfiable.

In other words, not only is P consistent with the system description and the observed behavior, but also any extension of P that assigns either AB or -AB to components not mentioned in P is also consistent.

Definition 5.4 Kernel Diagnosis [de Kleer et al. 90]

A kernel diagnosis is a partial diagnosis such that the only partial diagnosis which covers it is itself.

[de Kleer et al. 90] use the notion of prime implicants to characterize kernel diagnoses: the kernel diagnoses are precisely the prime implicants of the set of conflicts.

Algorithm 5.8 Node Expansion (version 3)

Procedure Expand(S)
1. Let Window(S) <- { c | c in View(ind,S) } intersected with { c | ind(c) < min over C in NYH(S u {c}) of max over c' in C of ind(c') }

Figure 7: Run Time of extended SE-HS (shaded) versus PHI-based algorithm (open)

Figure 8: SE-tree without Decomposition

Figure 9: SE-tree Exploration with Decomposition
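The sketch below is not the paper's extended SE-HS; it is a small, self-contained illustration of the characterization above, under the standard reading that the prime implicants of a set of clauses (viewed as a conjunction) are exactly its minimal hitting sets that contain no complementary pair of literals. The function name, the signed-integer encoding of AB-literals, and the breadth-first policy are my assumptions.

```python
from heapq import heappush, heappop
from itertools import count

def prime_implicants(clauses):
    """Minimal complement-free hitting sets of a clause set, i.e. the prime implicants
    of the conjunction of the clauses. Literals are signed integers: +i and -i are
    complementary (read +i as AB(c_i) and -i as -AB(c_i))."""
    clauses = [frozenset(c) for c in clauses]
    literals = sorted(set().union(*clauses), key=abs)
    ind = {l: i for i, l in enumerate(literals)}
    found, tie = [], count()
    heap = [(0, next(tie), frozenset())]
    while heap:
        size, _, s = heappop(heap)
        bound = max((ind[l] for l in s), default=-1)
        open_clauses = [c for c in clauses if not (c & s)]      # clauses not yet hit by s
        for l in literals:
            if ind[l] <= bound or -l in s:                      # View restriction; stay satisfiable
                continue
            if not any(l in c for c in open_clauses):           # l must hit a not-yet-hit clause
                continue
            child = s | {l}
            if any(p <= child for p in found):                  # subsumed by a found implicant
                continue
            if all(c & child for c in clauses):                 # child hits every clause
                found.append(child)
            else:
                heappush(heap, (size + 1, next(tie), child))
    return [set(p) for p in found]
```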

In the generalized set covering (GSC) model [Reggia et al. 85], a diagnostic problem is represented in a bipartite graph in which symptoms and disorders form each of the respective partitions. Each disorder in the graph is linked to all of its symptoms via a causes relation. Given a set of observed symptoms, a diagnosis is defined as a minimal set of disorders which covers all symptoms.

A most-probable-first search algorithm for that problem is described in [Peng & Reggia 87]. It searches the space of sets of disorders for such sets which cover all symptoms. This algorithm, however, is redundant in that partial hypotheses may be discovered repeatedly during search. That redundancy could be avoided if an SE-tree framework were adopted.

Alternatively, the problem can be turned into a hitting set problem. [Reiter 87] presents a transformation of a GSC representation of a diagnostic problem into his own framework. There is, in fact, a better transformation which avoids the conflict generation part of Reiter's theory by mapping the GSC problem directly into an HS one. Then, we could simply use SE-HS. Given a set of symptoms s_i, we could define a "conflict set" for each symptom:

conflict(s_i) =def { d | d is a disease, d causes s_i }

Presented with s_i, the conflict asserts that it is impossible that none of its causing disorders are present. It is easy to prove that a set of disorders is a minimal set cover iff it is a minimal hitting set for such conflicts.

In [Peng & Reggia 87], hypotheses are explored by their likelihood. The SE-tree-based framework allows such exploration, as well as a variety of other exploration policies. In [Peng & Reggia 87], non-minimal hypotheses are also explored. This is easily done in SE-HS by removing the subsumption requirement (Expand, step 3). In addition, pruning rules (cf. Section 3.6) can be used to avoid unpromising parts of the search space. Problem decomposition (cf. Section 5.4) may also be helpful in reducing time and cost. Finally, it seems that other models of diagnosis in which solutions are defined in terms of sets, e.g. [Bylander et al. 91, Poole 91, Console & Torasso 91], can also use an SE-tree-based search framework in their implementations.
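The transformation just described is easy to make concrete. The snippet below (my illustration; the causes relation and the helper name are hypothetical) builds one conflict per observed symptom and hands the result to the se_hs sketch from Section 3.

```python
def gsc_to_conflicts(causes, observed):
    """One 'conflict set' per observed symptom: every disorder that can cause it."""
    return [{d for d, symptoms in causes.items() if s in symptoms} for s in observed]

# Hypothetical causes relation (disorder -> symptoms it can cause) and observations.
causes = {"d1": {"s1", "s2"}, "d2": {"s2", "s3"}, "d3": {"s3"}}
observed = {"s1", "s2", "s3"}

print(se_hs(gsc_to_conflicts(causes, observed)))   # minimal set covers of the observed symptoms
```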
nodes which cannot lead to based framework, and compare the two empirically. minimal rules). Here, we will only contrast features of the two repre- sentations, concentrating on search aspects. Definition 6.2 Candidate Expansions

Definition 6.1 Rules Let S be a node, TSET(S) ef {~ETSETI SCt}. We say that (A=v) as a candidate ezpansion of A training set (TSET) is a collection of examples. S if AgView(ind,S), vgDom(A), and in addition Each example is a complete description for which a TSET(,W{(A=v)))#TSET(S). A node S will be called correct classification (denoted T) is known. A rule is impotent if either (1) TSET(S) is empty; or (2) there a partial description R such that if t,t'€TSET are such exist t,tJ€TSET(S) disagreeing on their class, and that RCt,t', then ~(t)= ~(t'). It is minimal if none only digering in their assignment to attributes not in of its subsets is a rule. View(ind,S).

The objective of a learning system is to learn rules Algorithm 6.3 Induction of Minimal Rules that can be expected to perform well not only on the training set, but also on new examples. While there is Program SE-Learn (TSET) no consensus as to the precise composition of such a 1. Let RULES t {), OPEN-NODES c collection, it is fairly acceptable that general (minimal) {{}I rules are preferable to specific ones. We shall therefore 2. Until OPEN-NODES is empty do concentrate on finding minimal classification rules2.

6.1 PROPOSED SOLUTION Procedure Expand(S) ID3 constructs a decision tree in which internal nodes are labeled with attributes, edges with instantiations 1. For each candidate expansion (A=v), let of these attributes, and leaves with a class prediction. Ref5U{(A=v)}, do Briefly, the tree is constructed by successively parti- tioning the set of training examples until all remaining 2. If R is not impotent, nor is it examples are equally classified. Such node becomes a subsumed by any R'ERULES then leaf and is labeled with that class. 3. If R is a rule then add it to RULES; While construction of an arbitrary decision tree that correctly classifies the training data is straightforward, 4. Or else add it to OPEN-NODES. it is well known that the success of decision-tree-based algorithms on future data is crucially dependent on the Theorem 6.4 If open nodes are prioritized by their particular order in which the attributes were chosen label's cardinality then SE-Learn is correct (produces in the successive refinement steps [Fayyad & Irani 88, all and only minimal rules.)
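The Python sketch below follows Algorithm 6.3 under my own encoding assumptions (examples as complete attribute-value dicts paired with a label; the function name se_learn is mine). For brevity it omits the second impotence test of Definition 6.2, which only affects efficiency, not the set of rules produced.

```python
from heapq import heappush, heappop
from itertools import count

def se_learn(tset, attrs, domains, priority=len):
    """Best-first induction of minimal classification rules, after Algorithm 6.3.

    tset: list of (example, label) pairs; each example is a complete dict attr -> value.
    Returns a list of (rule, label) pairs, where a rule is a partial description (dict).
    """
    ind = {a: i for i, a in enumerate(attrs)}
    def covered(r):                                        # TSET(R)
        return [(t, y) for t, y in tset if all(t[a] == v for a, v in r.items())]

    rules, tie = [], count()
    heap = [(priority({}), next(tie), {})]                 # OPEN-NODES, root = {}
    while heap:
        _, _, s = heappop(heap)                            # Next-Best
        s_cov = covered(s)
        bound = max((ind[a] for a in s), default=-1)
        for a in (a for a in attrs if ind[a] > bound):     # attributes in View(ind, S)
            for v in domains[a]:
                r = dict(s); r[a] = v                      # candidate expansion R = S u {(A=v)}
                r_cov = covered(r)
                if not r_cov or len(r_cov) == len(s_cov):  # empty coverage, or TSET(R) = TSET(S)
                    continue
                if any(set(r0.items()) <= set(r.items()) for r0, _ in rules):
                    continue                               # subsumed by an already-found rule
                labels = {y for _, y in r_cov}
                if len(labels) == 1:                       # R is a rule: covered examples agree
                    rules.append((r, labels.pop()))
                else:
                    heappush(heap, (priority(r), next(tie), r))
    return rules

if __name__ == "__main__":
    attrs = ["A1", "A2", "A3"]
    doms = {a: ["T", "F"] for a in attrs}
    tset = [({"A1": "T", "A2": "T", "A3": "F"}, "+"),
            ({"A1": "T", "A2": "F", "A3": "F"}, "+"),
            ({"A1": "F", "A2": "T", "A3": "T"}, "-")]
    for rule, label in se_learn(tset, attrs, doms):
        print(rule, "->", label)      # each minimal rule with the class it predicts
```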

Given the incompleteness of the examples with which they are presented, learning programs may often have to choose among a number of candidate classifiers, all of which are consistent with the training set. External preference criteria, also referred to as bias [Mitchell 80], may be necessary for that purpose. Within an SE-tree-based framework, exploration policies can serve in the implementation of such bias. In programs such as SE-Learn, where all rules are explored during the learning phase, an exploration policy will serve in the classification of new objects by guiding preference over possibly conflicting rules. In variants of SE-Learn in which only a subset of the rules are learned, an exploration policy will implement a preference among possible subsets. As was the case for SE-HS, any exploration policy that is monotonic will result in a correct algorithm. Important policies include (1) exploration by cardinality, where a preference is given to simpler rules; (2) by probability (using either a known distribution or frequency in the training set), resulting in preference to characterization of denser parts of the search space; (3) using Shannon's information-theoretic measure, preferring more discriminating rules; and (4) by utility or some other monotone preference criterion.

6.3 PROJECTED GAIN AND COST

Three related problems arise when a decision tree is used as a framework for search and representation of minimal rules:

1. The minimality problem - rules will often not be discovered in their minimal form;
2. The multiplicity problem - a minimal rule may be discovered repeatedly, disguised in a number of its supersets; and
3. The incompleteness problem - some minimal rules may not be discovered at all.

The minimality problem is often addressed by subsequently pruning the rules extracted from the decision tree [Quinlan 87]. The replication problem, a special case of multiplicity in which sub-trees are replicated within a single decision tree, has been addressed by several researchers, e.g. [Rivest 87, Pagallo & Haussler 90]. The more general multiplicity problem, however, may take many other forms. Incompleteness is the result of the mutual exclusiveness of decision-tree-based rules (see [Weiss & Indurkhya 91]).

In contrast, the SE-tree-based framework does not suffer from these problems:

1. Rules are always discovered in minimal form;
2. Minimal rules are always discovered uniquely; and
3. All minimal rules are discovered.

The fact that any given decision tree may suffer from those problems suggests that none is globally optimal. The SE-tree, however, can be shown to embed many decision trees. More specifically, all decision trees in which attributes are chosen monotonically with respect to some arbitrary indexing, are topologically and semantically equivalent to a tree formed from a subset of the SE-tree's edges.

Complexity-wise, the SE-tree's exhaustiveness and relatively large initial branching factor are deceiving. Its complexity is fairly close to that of a single decision tree:

Theorem 6.5 SE-Tree Size

If all attributes are b-valued, then the number of nodes in a complete decision tree is Sum_{i=0..n} b^i (approximately b^n). In sharp contrast, the size of a super-tree in which all decision trees are embedded is significantly larger: b^n * n!. The size of a complete SE-tree is only (b+1)^n.
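The counts in Theorem 6.5 are easy to check numerically for small b and n; the snippet below (my own verification aid, with hypothetical function names) prints the three quantities side by side.

```python
from math import factorial

def decision_tree_nodes(b, n):
    """Nodes in a complete b-ary decision tree over n attributes (levels 0..n)."""
    return sum(b ** i for i in range(n + 1))

def se_tree_nodes(b, n):
    """Nodes in the complete extended SE-tree: each attribute absent or set to one of b values."""
    return (b + 1) ** n

for b, n in [(2, 3), (2, 10), (3, 8)]:
    print(f"b={b} n={n}: decision tree {decision_tree_nodes(b, n)}, "
          f"SE-tree {se_tree_nodes(b, n)}, super-tree b^n*n! = {b ** n * factorial(n)}")
```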

7 Summary

Many problems in which partial sets or partially instantiated sets of variables are targeted share a common structure when viewed as search problems. We presented the Set-Enumeration (SE)-tree as a simple, complete and irredundant vehicle for representing and/or enumerating sets in a best-first fashion. As such, it can serve as the basis for a search-based framework for many such problems.

To demonstrate its usefulness and effectiveness, we presented SE-tree-based algorithms for the hitting-set problem, in the context of consistency-based diagnosis. We used the particular instantiations of these algorithms to demonstrate general features of the paradigm, and compare it with current algorithms. Throughout this process, we developed several add-on tactics including SE-tree-based pruning rules, exploration policies, and problem decomposition methods. Besides their particular incarnations in the SE-HS algorithms, those methods are general and can be shared across many problem domains. In the last part of this paper, in the context of rule induction, we compared features of an SE-tree-based representation with one that is based on decision trees.

Acknowledgements

This research was supported in part by a graduate fellowship, ARO Grant DAAL03-89-C0031PRI. I thank Teow-Hin Ngair, Greg Provan, Russ Greiner, Alex Kean, and Ron Rivest for useful discussions and suggestions. I also thank Kevin Atteson, Michael Niv, Philip Resnik, Jeff Siskind, and Bonnie Webber for comments on previous drafts. Finally, I am grateful to Barbara Smith for providing the HS-dag implementation and Teow-Hin Ngair for providing the code of his prime implicate generation algorithm.

References

[Bylander et al. 91] Bylander, T., Allemang, D., Tanner, M. C., and Josephson, J., The Computational Complexity of Abduction. Artificial Intelligence, 49, pp. 25-60, 1991.

[Console & Torasso 91] Console, L., and Torasso, P., A Spectrum of Logical Definitions of Model-Based Diagnosis. Computational Intelligence, 7, pp. 133-141, 1991.

[de Kleer & Williams 89] de Kleer, J., and Williams, B. C., Diagnosis with Behavioral Modes. Proc. IJCAI-89, Detroit MI, pp. 1324-1330, 1989.

[de Kleer et al. 90] de Kleer, J., Mackworth, A. K., and Reiter, R., Characterizing Diagnoses. Proc. AAAI-90, pp. 324-330, Boston MA, 1990.

[de Kleer 91] de Kleer, J., Focusing on Probable Diagnoses. Proc. AAAI-91, pp. 842-848, Anaheim CA, 1991.

[de Kleer 92] de Kleer, J., An Improved Incremental Algorithm for Generating Prime Implicates. Proc. AAAI-92, pp. 780-785, San Jose CA, 1992.

[Fayyad & Irani 88] Fayyad, U. M., and Irani, K. B., What Should be Minimized in a Decision Tree? Proc. AAAI-90, pp. 749-754, Boston MA, 1990.

[Forbus & de Kleer 88] Forbus, K. D., and de Kleer, J., Focusing the ATMS. Proc. AAAI-88, pp. 193-198, Saint Paul MN, 1988.

[Goodman & Smyth 88] Goodman, R., and Smyth, P., Decision Tree Design from a Communication Theory Standpoint. IEEE Trans. on Information Theory, 34(5), 1988.

[Greiner et al. 89] Greiner, R., Smith, B. A., and Wilkerson, R. W., A Correction to the Algorithm in Reiter's Theory of Diagnosis. Artificial Intelligence, 41, pp. 79-88, 1989.

[Holzblatt 88] Holzblatt, L. J., Diagnosing Multiple Failures Using Knowledge of Component States. Proc. IEEE AI Applications, pp. 139-143, 1988.

[Karp 72] Karp, R. M., Reducibility Among Combinatorial Problems. In Miller and Thatcher, eds., Complexity of Computer Computations, Plenum Press, New York, pp. 85-103, 1972.

[Kean & Tsiknis 90] Kean, A., and Tsiknis, G., An Incremental Method for Generating Prime Implicants/Implicates. Journal of Symbolic Computation, 9:185-206, 1990.

[Knuth 73] Knuth, D. E., The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison Wesley, 1973.

[Mitchell 80] Mitchell, T. M., The Need for Biases in Learning Generalizations. TR 5-110, Rutgers University, 1980.

[Ngair 92] Ngair, T., Convex Spaces as an Order-Theoretic Basis for Problem Solving. Ph.D. Thesis, Department of Computer and Information Science, University of Pennsylvania, 1992.

[Pagallo & Haussler 90] Pagallo, G., and Haussler, D., Boolean Feature Discovery in Empirical Learning. Machine Learning, 5, pp. 71-99, 1990.

[Peng & Reggia 87] Peng, Y., and Reggia, J. A., Being Comfortable with Plausible Diagnostic Hypotheses. TR 1753, University of Maryland, 1987.

[Poole 91] Poole, D., Representing Diagnostic Knowledge for Probabilistic Horn Abduction. Proc. IJCAI-91, Sydney, Australia, 1991.

[Provan & Poole 91] Provan, G. M., and Poole, D., The Utility of Consistency-Based Diagnostic Techniques. Proc. KR-91, Cambridge MA, pp. 461-472, 1991.

[Quinlan 86] Quinlan, J. R., Induction of Decision Trees. Machine Learning, 1(1):81-106, 1986.

[Quinlan 87] Quinlan, J. R., Generating Production Rules from Decision Trees. Proc. IJCAI-87, pp. 304-307, 1987.

[Reggia et al. 85] Reggia, J. A., Nau, D. S., and Wang, P. Y., A Formal Model of Diagnostic Inference. I. Problem Formulation and Decomposition. Information Sciences, 37, pp. 227-285, 1985.

[Reiter 87] Reiter, R., A Theory of Diagnosis From First Principles. Artificial Intelligence, 32, pp. 57-95, 1987.

[Rivest 87] Rivest, R., Learning Decision Lists. Machine Learning, 2, pp. 229-246, 1987.

[Rymon 92a] Rymon, R., SE-tree-based Candidate Generation: Systematic Exploration in Model-Based Diagnosis. In preparation, 1992.

[Rymon 92b] Rymon, R., An SE-tree-based Characterization of the Induction Problem. In preparation, 1992.

[Slagle et al. 70] Slagle, J. R., Chang, C., and Lee, R. C., A New Algorithm for Generating Prime Implicants. IEEE Trans. on Computers, 19(4), 1970.

[Tarjan 83] Tarjan, R., Data Structures and Network Algorithms. Society for Industrial and Applied Mathematics, Philadelphia PA, 1983.

[Tison 67] Tison, P., Generalized Consensus Theory and Application to the Minimization of Boolean Functions. IEEE Trans. on Computers, 16(4):446-456, 1967.

[Weiss & Indurkhya 91] Weiss, S. M., and Indurkhya, N., Reduced Complexity Rule Induction. Proc. IJCAI-91, pp. 678-684, Sydney, Australia, 1991.

[Wu 90] Wu, T. D., A Problem Decomposition Method for Efficient Diagnosis and Interpretation of Multiple Disorders. Proc. SCAMC-90, pp. 86-92, Washington DC, 1990.