Meta-Pattern Extraction: Mining Cycles *

Jennifer Seltzer †, James P. Buckley, Alvaro Monge
Computer Science Department
University of Dayton
300 College Park
Dayton, Ohio 45469-2160
{seltzer, buckley, monge}@cps.udayton.edu

* Copyright © 1999, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
† Partially supported under Grant 9806184 by the National Science Foundation.

Abstract

Inductive computing, comprising machine learning, data mining, and knowledge discovery, seeks to extract causal patterns from data. Meta-patterns are patterns of extracted patterns. The meta-pattern we study here is the cycle. In this paper, we illustrate the ubiquity of cycles and argue that, as responsible knowledge engineers, we need to identify and actively seek out cycles in data. We present a methodology for cycle mining, exemplify such a pursuit in a generic relational model, and briefly discuss the authors' implementation, the computer system INDED. System INDED is a machine learning system implementing inductive logic programming techniques of pattern extraction.

Introduction and Motivation

Knowledge discovery in databases has been defined as the non-trivial process of identifying valid, novel, potentially useful, and understandable patterns in data [5]. A pattern is often denoted in the form of an IF-THEN rule (IF antecedent THEN consequent), where the antecedent and consequent are logical conjunctions of predicates (first order logic) or propositions (propositional logic) [10]. In [9], the authors observe that knowledge can take on more complex forms than a simple implication, such as a causal chain or network formed by interconnecting the consequent of one rule to the antecedent of another. Acyclic graphs are used extensively as knowledge representation constructs in knowledge discovery in databases, as seen in [6] and [10]. It is largely our goal to bring to the reader's attention the benefit of employing a graph capable of possessing cycles to represent learned knowledge, and of identifying the appropriate propositional vertices as cycle participants.

Why Cycles Typically Exist in Data

Much of our social construction of reality stems directly from the physical cycles of the earth revolving around the sun, and its rotation on its axis: the temporal calendar by which we live. We invent and self-impose other cycles based on these: the seasons and holidays of the year, the organization of the academic and fiscal years. These cycles are the basis for much of our commonsense domain knowledge, and they spawn related cycles, such as the cycle of seasons justifying the cycle of weather events we might experience.

Human behavior is based on reinforcement, repetition, and routine [7]. This human behavior manifests in the telephone calls and purchases we make, the food we eat, and the hardships, such as illness, we experience. In our speech, we often allude to a negative sequence of human behaviors, such as those manifested as overall poor quality in manufacturing, or more profoundly, violence in our society, as "being caught in a vicious cycle." Or we embrace a positive sequence of behaviors and try to secure a cycle by reinforcing excellent employee performance or by instilling good habits in our children.

Cycles appear in our creative pursuits, as well. An analysis of the tonality of virtually any musical work of Bach will indicate a traversal around the circle of fifths [3]. A study of Escher's art will have the observer entranced in subtle, circular, repetitious patterns [8].

The authors contend that evidence of this cyclic nature of humans and our world exists in our data.

466 SELTZER

For example, consider the cyclic effect advertising has on sales. Assume an inductive learning system for a consumer products company has learned the following patterns or rules (note: the left hand side of → below contains the antecedent, and the right hand side, the consequent):

brand_X_advertises → Jones_visits_store_of_brand_X
Jones_visits_store_of_brand_X → Jones_buys_brand_X
Jones_buys_brand_X → brand_X_profits

The chain of causal dependencies here is simply

brand_X_advertises
↓
Jones_visits_store_of_brand_X
↓
Jones_buys_brand_X
↓
brand_X_profits

At this point, assume the learner discovers the rule

brand_X_profits → brand_X_advertises

This newly discovered rule forms a cycle of causal dependencies with a feedback loop connecting the notion of profits and advertising (the more money that is available, the more the company can spend on advertising). At this point, the system deems all four propositions cycle participants. Identification of cycle participants provides the user a choice: to break or to perpetuate the cycle. This choice is dependent on the goals of the overall system.

Identifying and extracting these special patterns can help us better understand and control our environment. Moreover, because events comprising a cycle are interdependent, we are given a choice as to the particular event on which to focus to perpetuate a cycle, and are assured a sense of continuity until the cycle is broken.

What is a Cycle?

The graph theoretic definition tells us that a cycle is a path in a directed graph that begins and ends at the same node. We represent each logical proposition in the knowledge base as a node.

Definition 1.1 (cycle) Let $G = (V, E)$ be a directed graph where $u, u', v_0, v_1, \ldots, v_k \in V$. A path of length $k$ from $u$ to $u'$ is a sequence $\langle v_0, v_1, \ldots, v_k \rangle$ of vertices such that $u = v_0$ and $u' = v_k$ and $(v_{i-1}, v_i) \in E$ for $i = 1, \ldots, k$. The length of the path is the number of edges in the path. A cycle is a path $\langle v_0, v_1, \ldots, v_k \rangle$ where $v_0 = v_k$; it is a simple cycle if vertices $v_1, \ldots, v_k$ are distinct [4].

We consider a cycle intuitively as a chain of causal dependencies with feedback. By chain, we mean a collection of implications, or rules, where one rule proves the antecedent of another using modus ponens. The feedback manifests by the head (or consequent) of a discovered rule appearing in the body (or antecedent) of a previously discovered rule in which one of its antecedent conjuncts is the head (all other conjuncts are true). Or, through transitive closure, the head appears in the antecedent of a different rule, the head of which participated in the chain of deduction to derive the new antecedent, again, by modus ponens. Figure 1 shows a cycle with its corresponding rules.

Figure 1: Example of a cycle.

In this paper, we use a hypergraph knowledge representation structure discussed in [13] to represent logical implications and dependencies amongst knowledge base propositions. Although the forthcoming cycle detection algorithm works for both simple and general cycles, without loss of generality, we assume all cycles are simple.

Why Cycles are Important

The authors in [9] tell us that an isolated IF-THEN pattern or rule is useful when it helps to achieve a system goal. Cycles are powerful sets of connected IF-THEN rules for two related reasons. First, if the system goal is dependent on a cycle participant, then the user can choose to attain the system goal by perpetuating the cycle. To perpetuate, the user can activate the antecedent of any participating IF-THEN rule. One antecedent fires all of the other participating IF-THEN rules by $k$ applications of modus ponens, where $k$ is the length of the cycle. This means that the user may choose which event or scenario to activate. In essence, because of the cyclic causality, activating one cycle participant is equivalent to activating all $k$ scenarios. In the above example, the company producing brand_X need only activate one cycle participant (such as putting up a billboard, so as to assert the fact brand_X_advertises) to activate the cycle of profit acquisition. Similarly, if the goals of the system warrant that the cycle be broken, only one causal link must be broken to break the circle of fire.

The second powerful aspect of a cycle is its inherent implication of continuity. Above, it may be

UNCERTAIN REASONING 467

all well and good that company X acquires a profit from Jones's purchase one time. It is probably safe to say, however, that the underlying goal of most consumer products companies is to continue to sell to customers, and hence continue to bring in a profit. Extraction of a cyclic pattern alerts a system that targeting any of the cycle participant activities assures continuous attainment of the goal, at least until the cycle is broken.

We have implemented cycle detection in the machine learning area of inductive logic programming. Any inductive pursuit, including data mining and knowledge discovery, can however be used to acquire the constituent IF-THEN patterns of cycles. We therefore present a formal model exemplifying cycle detection in a more traditional relational database setting. We then summarize results attained from the ILP implementation. Regardless of the specific inductive pursuit, we contend that a cycle detection algorithm should be invoked regularly as the knowledge base continues to grow, in order to exploit the benefits of cycle detection.

A Methodology for Discovering Cycles

Discovery of cycles has as much to do with the knowledge representation structure as it does with the rule discovery algorithm itself. The detection algorithm operates by performing a depth first search on a graph. We assume a hypergraph representation where logical predicates appearing as heads of rules are represented as vertices, and sets of (conjuncted) predicates, appearing as rule bodies, are represented as hyperedges. For example, a rule $P(X) \leftarrow R(X), \ldots, Q(Y)$ with head $P(X)$ and body $R(X), \ldots, Q(Y)$ is internally represented as vertex $P(X)$ with incoming hyperedge $R(X), \ldots, Q(Y)$.

Our methodology consists of four steps:

1. Discover constituent IF-THEN patterns.

2. Insert constituent IF-THEN patterns into the dependency hypergraph knowledge structure described above.

3. Perform the cycle identification algorithm. As each rule is learned, the graph is traversed in depth-first manner and, if a cycle is introduced, all cycle participants are identified and the cycle is output. This step exploits a standard algorithm for cycle detection which is presented in [2] and employs postorder numbering of propositions as variables, as shown below.

4. Determine the current state of the knowledge base. As the dependency graph grows with each new pattern, the current state of the combined collection is reevaluated. Each time, a (possibly) new set of true facts or propositions is generated.
   Note: for any given cycle, either all or none of the cycle participants will be included in the current state (i.e., either all or none of the constituent rules will be fired).

Algorithm 1.2 (Cycle Detection Algorithm) If the domain knowledge is initially acyclic, our structure embodies a forest of causal trees.

Input:  Hypergraph representation P of knowledge base
        Newly discovered rule of the form h ← b
Output: Set of cycle participants if cycle exists

BEGIN ALGORITHM 1.2
  for each discovered rule h ← b
    insert rule into P
    With head of new rule h do
      perform an iterative dfs(h); as each proposition is popped, assign its postorder number
      if postorder number of head is >= any of its incoming edge postorder nums,
      then there exists a causal cycle formed by the incoming edge source node to the head, and the path from the head to the source of the incoming edge.
END ALGORITHM 1.2

Formal Model

We exemplify our method using a standard relational model where rules of the form

$X_1 \leftarrow Y_1, \ldots, Y_m$

are discovered. Rules possess individual subgoals $Y_i$ $(1 \le i \le m)$ which are conjuncted (conjunction denoted by comma), forming the body of the rule. The consequent $X_1$ forms the head of the rule.

We define a database $D$ as a set of relations $D = \{R_1, \ldots, R_t\}$ where each relation $R_i$ is a set of ordered n-tuples of the form $(d_1, \ldots, d_n)$ where each $d_j$ is in the domain $D_j$, for $1 \le j \le n$. The cardinality of $R_i$ is the number of n-tuples in the relation. We make the following assumptions originally presented in [12]:

* Closed-world Assumption. Information not contained in the database is assumed false.

* Unique-Name Assumption. Any item in the database schema has a unique name; items with different names are different.

* Domain-Closure Assumption. There are no other individual items than those in the database.

A database instance $r$ is the set of tuples in the database at a given moment in time. To discover any rule, a particular instance of a database is used. A classification rule is a rule in which the individual subgoals equate specific values to attributes, forming the conjuncted rule body, and in which the consequent $X_1$ is formed by such an equation as well. We say that attributes $Y_1, \ldots, Y_n$ determine, or classify, the value of $X_1$. These rules are derived from a given instance of the database with the associated support and confidence measures as defined in [11]:

* the support for a rule $C_2 \leftarrow C_1$ is the percentage of tuples that satisfy $C_1 \wedge C_2$.

* the confidence for a rule $C_2 \leftarrow C_1$ is the percentage of tuples that satisfy $C_2$ given all tuples that satisfy $C_1$.

Mining a Rule

Let $r_1$ be the following instance of database $D' = \{A, B, C, D, E\}$.

A    B    C    D    E
a1   b1   c1   d5   e1
a2   b1   c2   d5   e2
a3   b1   c3   d5   e3
a4   b2   c4   d3   e4
a5   b1   c5   d5   e5
a6   b1   c6   d2   e6
a7   b1   c7   d5   e7
a8   b1   c8   d5   e8
a9   b1   c9   d5   e9
a10  b1   c10  d5   e10

Applying a standard classification rule mining algorithm such as that found in [11], we obtain the rule

$(D = d_5) \leftarrow (B = b_1)$

with confidence 88% and support 80%. This rule will be used to predict this causal relation over all instances of $D'$.

Mining a Cycle

The process of mining an entire cycle may be an incremental one. That is, the entire cycle may be detected in more than one database instance. As database instances change, new patterns are learned that may be cycle participants. Assume that new instances $r_2$, $r_3$, and $r_4$ of the above database are used to extract additional causal patterns. In particular, assume the classification rule

$(A = a_7) \leftarrow (D = d_5)$

from instance $r_2$, and the pattern

$(B = b_1) \leftarrow (A = a_7)$

from instance $r_4$. Also assume these rules are extracted with confidences 70% and 85% and with supports 95% and 91%, respectively. We now have the meta-pattern of a cycle in database $D$. The cycle can be denoted

$((B = b_1), (D = d_5), (A = a_7), (B = b_1))$

We use individual metrics of each constituent rule to quantify the strength of the cycle. These metrics also facilitate determining the starting point of the meta-pattern for future predictive use.

Definition 1.3 (cycle confidence) The confidence of a cycle is the minimum confidence value of any of the constituent rules forming the cycle.

Because detection of cycles provides the user an option to continue or break any given cycle, we quantify the probability of a cycle continuing without any intervention.

Definition 1.4 (support) The support of a cycle is the minimum support value of any of the constituent rules forming the cycle.

To invoke a cycle, we identify the aspect of the state (a rule antecedent in the form of a logical proposition) which most likely will activate the cycle.

Definition 1.5 (starting point) The starting point of a cycle $C$ is the antecedent of rule $r'$ where $r'$ has maximum support of all constituent rules forming the cycle.

The above cycle has confidence 70% and support 80% and has starting point $(D = d_5)$.

Implementation in Inductive Logic Programming

The authors have implemented the cycle detection method in an inductive logic programming system INDED (pronounced "indeed") which performs

both INduction to learn rules comprising an intensional knowledge base, and DEDuction to compute the current state, part of the extensional knowledge base [13]. Although the syntax of the generated rules differs slightly, the meta-pattern of a cycle remains the same. That is, the cycle is formed by a circular sequence of rule heads firing subsequent rules.

Figure: Architecture of System INDED (Inductive Engine with ILP Learner, Deductive Engine with Reasoning System, Background Knowledge).

By providing small data sets for the positive and negative examples, intensional rules, and extensional facts of the background knowledge, we were able to learn cycles in the genealogy domain. We are currently examining the cyclic relationship of stress (indicated quantitatively by blood pressure) and mononucleosis diagnosis (indicated by titers being above an acceptable level). We are also studying and experimenting with diagnosis of Lyme Disease because of its puzzling, circular, and seemingly inconsistent symptom set. Early results have been promising and will be included in a future publication.

Future Work

Along with continued experimentation, we expect to expound on the theory of learned meta-patterns. In particular, we are studying near cycles: chains of deduction which could possibly be forced to form a complete cycle if certain conditions were satisfied. We are also examining cycles that are made up of propositions with alterability coefficients and cycles involving other types of rules, including association rules [1], as well as considering overlapping cycles.

Conclusion

A cycle mining methodology and cycle detection algorithm have been presented. Both of these involve dependency hypergraph knowledge representation in the form of logical implications.

The authors posit that just as unknown patterns exist in data, unknown cycles of patterns also exist. By identifying these meta-patterns we gain more insight into, and hence more control over, the environment being examined. By detecting the meta-pattern of a cycle, we may better predict and control the action to be taken to achieve the encompassing system's goal. That is, we may choose to perpetuate or break the cycle.

References

[1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD Bulletin, pages 207-216, May 1993.
[2] Alfred V. Aho and Jeffrey D. Ullman. Foundations of Computer Science. Computer Science Press, 1995.
[3] Manfred F. Bukofzer. Music in the Baroque Era. W.W. Norton and Company, 1947.
[4] Cormen, Leiserson, and Rivest. Introduction to Algorithms. McGraw-Hill, 1991.
[5] Usama Fayyad and Evangelos Simoudis. Knowledge discovery in databases. Tutorial at IJCAI-95, 1995.
[6] Ronen Feldman and Ido Dagan. Knowledge discovery in textual databases (KDT). In Proceedings of the First International Conference on Knowledge Discovery & Data Mining, pages 112-117, 1995.
[7] Daniel Goleman. Emotional Intelligence. Bantam Books, 1995.
[8] Douglas R. Hofstadter. Gödel, Escher, Bach: An Eternal Golden Braid. Vintage Books, 1980.
[9] Piatetsky-Shapiro and Frawley, editors. Knowledge Discovery in Databases, chapter Knowledge Discovery in Databases: An Overview. AAAI Press/The MIT Press, 1991.
[10] J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.
[11] Raghu Ramakrishnan. Database Management Systems. McGraw-Hill, Inc., 1998.
[12] R. Reiter. Towards a logical reconstruction of relational database theory. On Conceptual Modeling, pages 163-189, 1984.
[13] Jennifer Seltzer. Stable ILP: Exploring the added expressivity of negation in the background knowledge. In Proceedings of the Frontiers in Inductive Logic Programming Workshop, Fifteenth International Joint Conference on Artificial Intelligence, Nagoya, Japan, 1997.
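The cycle detection step of Algorithm 1.2 can be illustrated with a small sketch. This is not the INDED implementation: it flattens each hyperedge into ordinary edges from every body proposition to the head, and it replaces the postorder-numbering test with a plain reachability check (an iterative depth-first search from the new head back to a body proposition). The class name `DependencyGraph` and its methods are hypothetical names chosen for this illustration.

```python
from collections import defaultdict


class DependencyGraph:
    """Propositions are nodes; a rule head <- body adds an edge from each
    body proposition to the head (a flattening of the paper's hyperedges)."""

    def __init__(self):
        self.successors = defaultdict(set)

    def insert_rule(self, head, body):
        """Insert head <- body. Return the cycle participants created by
        this rule, or an empty list if the rule closes no cycle."""
        for prop in body:
            self.successors[prop].add(head)
        return self._path_back_to_body(head, set(body))

    def _path_back_to_body(self, head, body):
        # Iterative depth-first search from the head. Reaching any body
        # proposition closes a cycle through the newly inserted rule.
        parent = {head: None}
        stack = [head]
        while stack:
            node = stack.pop()
            if node in body:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]  # from the head to the body proposition
            for nxt in sorted(self.successors[node]):
                if nxt not in parent:
                    parent[nxt] = node
                    stack.append(nxt)
        return []


# The advertising example: three chained rules, then the feedback rule.
graph = DependencyGraph()
graph.insert_rule("Jones_visits_store_of_brand_X", ["brand_X_advertises"])
graph.insert_rule("Jones_buys_brand_X", ["Jones_visits_store_of_brand_X"])
graph.insert_rule("brand_X_profits", ["Jones_buys_brand_X"])
participants = graph.insert_rule("brand_X_advertises", ["brand_X_profits"])
print(participants)
```

With the fourth rule inserted, the search walks from brand_X_advertises through the chain to brand_X_profits, so all four propositions are reported as cycle participants. Checking reachability on each insertion keeps the sketch short; the postorder-numbering variant described in the text presumably avoids repeating work across insertions.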

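The support and confidence figures in "Mining a Rule" and the cycle metrics of Definitions 1.3 to 1.5 can be checked with a short sketch. The helper names (`rule_metrics`, `Rule`, `cycle_confidence`, `cycle_support`, `starting_point`) are hypothetical, and only columns B and D of instance r1 are encoded. The sketch assumes rows 4 and 6 carry D values other than d5, which is all the calculation depends on; the specific values d3 and d2 are a reading of the table, not a certainty.

```python
from dataclasses import dataclass


def rule_metrics(rows, antecedent, consequent):
    """Return (support, confidence) of consequent <- antecedent.

    antecedent and consequent are (attribute, value) pairs; rows is a
    non-empty list of dicts mapping attribute names to values.
    """
    (a_attr, a_val), (c_attr, c_val) = antecedent, consequent
    matches_body = [row for row in rows if row[a_attr] == a_val]
    matches_both = [row for row in matches_body if row[c_attr] == c_val]
    support = len(matches_both) / len(rows)
    confidence = len(matches_both) / len(matches_body) if matches_body else 0.0
    return support, confidence


# Columns B and D of instance r1 (assumed reading of rows 4 and 6).
b_values = ["b1", "b1", "b1", "b2", "b1", "b1", "b1", "b1", "b1", "b1"]
d_values = ["d5", "d5", "d5", "d3", "d5", "d2", "d5", "d5", "d5", "d5"]
r1 = [{"B": b, "D": d} for b, d in zip(b_values, d_values)]
support, confidence = rule_metrics(r1, ("B", "b1"), ("D", "d5"))


@dataclass(frozen=True)
class Rule:
    head: str
    body: str
    confidence: float
    support: float


def cycle_confidence(rules):
    """Definition 1.3: minimum confidence over the constituent rules."""
    return min(rule.confidence for rule in rules)


def cycle_support(rules):
    """Definition 1.4: minimum support over the constituent rules."""
    return min(rule.support for rule in rules)


def starting_point(rules):
    """Definition 1.5: antecedent of the rule with maximum support."""
    return max(rules, key=lambda rule: rule.support).body


# The three constituent rules with the metrics stated in the text.
cycle = [
    Rule(head="D = d5", body="B = b1", confidence=0.88, support=0.80),
    Rule(head="A = a7", body="D = d5", confidence=0.70, support=0.95),
    Rule(head="B = b1", body="A = a7", confidence=0.85, support=0.91),
]
print(cycle_confidence(cycle), cycle_support(cycle), starting_point(cycle))
```

Under these assumptions, eight of the ten tuples satisfy both (B = b1) and (D = d5), and nine satisfy (B = b1), giving support 8/10 and confidence 8/9, in line with the 80% and 88% reported. The cycle metrics come out as confidence 0.70, support 0.80, and starting point (D = d5), matching the values stated for the example cycle.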