Integrative Windowing

Journal of Artificial Intelligence Research

Johannes Fürnkranz (juffi@cs.cmu.edu)
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA

Abstract

In this paper we re-investigate windowing for rule learning algorithms. We show that, contrary to previous results for decision tree learning, windowing can in fact achieve significant runtime gains in noise-free domains, and we explain the different behavior of rule learning algorithms by the fact that they learn each rule independently. The main contribution of this paper is integrative windowing, a new type of algorithm that further exploits this property by integrating good rules into the final theory right after they have been discovered, thus avoiding the relearning of these rules in subsequent iterations of the windowing process. Experimental evidence in a variety of noise-free domains shows that integrative windowing can in fact achieve substantial runtime gains. Furthermore, we discuss the problem of noise in windowing and present an algorithm that is able to achieve runtime gains in a set of experiments in a simple domain with artificial noise.

1. Introduction

Windowing is a subsampling technique proposed by Quinlan for the ID3 decision tree learning algorithm. Its goal is to reduce the complexity of a learning problem by identifying an appropriate subset of the original data from which a theory of sufficient quality can be learned. For this purpose, it maintains a subset of the available data, the so-called window, which is used as the training set for the learning algorithm. The window is initialized with a small random sample of the available data, and the learning algorithm induces a theory from this sample. This theory is then tested on the remaining examples. If the quality of the learned theory is not sufficient, the window is adjusted, usually by adding more examples from the training data, and a new theory is learned. This process is repeated until a theory of sufficient quality has been found.

There are at least three motivations for studying windowing techniques:

Memory limitations: Almost all learning algorithms still require all training examples and all background knowledge to be kept in main memory. Although memory has become cheap and the capacity of the main memory of available hardware platforms is increasing rapidly, there certainly are datasets too big to fit into the main memory of conventional computer systems.

Efficiency gain: Learning time usually increases, most often super-linearly, with the complexity of a learning problem. Reducing this complexity may be necessary to make a learning problem tractable.

Accuracy gain: It has been observed that windowing may also lead to an increase in predictive accuracy. A possible explanation for this phenomenon is that learning from a subset of the examples may often result in a less overfitting theory.

In this paper, our major concern is the appropriateness of windowing techniques for increasing the efficiency of inductive rule learning algorithms, such as those using the popular separate-and-conquer (or covering) learning strategy (Michalski; Clark & Niblett; Quinlan; Fürnkranz). We will argue that windowing is more suitable for these algorithms than for divide-and-conquer decision-tree learning (Quinlan). Thereafter, we will introduce integrative windowing, a technique that exploits the fact that rule learning algorithms learn each rule independently. We will show that this method makes it possible to significantly improve the performance of windowing by integrating good rules learned in different iterations of the basic windowing procedure into the final theory. While we have primarily worked with noise-free domains, we will also discuss the problem of noise in windowing, together with some preliminary work that shows how windowing techniques can be adapted for noisy domains. Parts of this work have previously appeared in earlier papers (Fürnkranz).

2. A Brief History of Windowing

Windowing dates back to early versions of the ID3 decision tree learning algorithm, where it was devised as an automated teaching procedure that allowed a preliminary version of ID3 to discover complete and consistent descriptions of various problems in a KRKN chess endgame (Quinlan). Figure 1 shows the basic windowing algorithm as described in Quinlan's subsequent seminal paper on ID3.

    procedure Windowing(Examples, InitSize, MaxIncSize)
        Window := RandomSample(Examples, InitSize)
        Test := Examples \ Window
        repeat
            Theory := Induce(Window)
            NewWindow := ∅
            OldTest := ∅
            for Example ∈ Test
                Test := Test \ Example
                if Classify(Theory, Example) ≠ Class(Example)
                    NewWindow := NewWindow ∪ Example
                else
                    OldTest := OldTest ∪ Example
                if |NewWindow| = MaxIncSize exit for
            Test := Append(Test, OldTest)
            Window := Window ∪ NewWindow
        until NewWindow = ∅
        return(Theory)

Figure 1: The basic windowing algorithm.

The algorithm starts by picking a random sample of a user-settable size (InitSize) from the total set of Examples. These examples are used for inducing a theory with a given learning algorithm. This theory is then tested on the remaining examples, and all misclassified examples are removed from the test set and added to the window of the next iteration. In order to keep the size of the training set small, another parameter (MaxIncSize) controls the maximum number of examples that can be added to the training set in one iteration. If this number is reached, no further examples are tested, and the next theory is learned from the new training set. To make sure that all examples are tested in the first few iterations, the examples that have already been tested (OldTest) are appended to the examples that have not yet been used, so that testing will start with new examples in the next iteration. (Quinlan does not explicitly specify how this last case should be handled, but we think this interpretation makes sense.)
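To make the control flow concrete, here is a minimal Python sketch of the loop in Figure 1. The learner interface (an induce function that returns a theory with a classify method) and all identifier names are illustrative assumptions made for this sketch, not part of the original specification.

    import random

    def windowing(examples, induce, init_size, max_inc_size):
        # `examples` is a list of (attributes, class) pairs; `induce` is any
        # learner mapping such a list to a theory with a classify(attributes)
        # method. This interface is an assumption made for illustration.
        data = list(examples)
        random.shuffle(data)                      # Window := RandomSample(...)
        window, test = data[:init_size], data[init_size:]
        while True:
            theory = induce(window)               # Theory := Induce(Window)
            new_window, old_test = [], []
            i = -1
            for i, (x, y) in enumerate(test):
                if theory.classify(x) != y:
                    new_window.append((x, y))     # misclassified -> new window
                else:
                    old_test.append((x, y))       # correctly classified
                if len(new_window) == max_inc_size:
                    break                         # "exit for": cap the growth
            # Untested examples come first in the next round, followed by the
            # already-tested ones: Test := Append(Test, OldTest).
            test = test[i + 1:] + old_test
            window = window + new_window
            if not new_window:                    # until NewWindow = ∅
                return theory

With a consistent learner on noise-free data, the loop terminates as soon as one induced theory classifies all remaining examples correctly; the final window then typically contains only a small fraction of the full training set.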
Quinlan argued that windowing is necessary for ID3 to tackle very large classification problems. Nevertheless, windowing has not played a major role in machine learning research. One reason for this is certainly the rapid development of computer hardware, which made the motivation for windowing seem less compelling. However, a good deal of this lack of interest can be attributed to an empirical study (Wirth & Catlett) in which the authors investigated windowing with ID3 in various domains and concluded that it cannot be recommended as a procedure for improving efficiency. The best results were achieved in noise-free domains, such as the Mushroom domain, where windowing was able to perform on the same level as simple ID3, while its performance in noisy domains was considerably worse.

Despite the discouraging experimental evidence of Wirth and Catlett, Quinlan implemented a new version of windowing in the C4.5 learning algorithm. It differs from the simple windowing procedure originally proposed for ID3 (Quinlan) in several ways:

- While selecting the examples for the window, it takes care to make the class distribution as uniform as possible, which can lead to accuracy gains in domains with skewed class distributions (Catlett).
- It includes at least half of the misclassified examples in the next window, which supposedly guarantees faster convergence (fewer iterations) in noisy domains.
- It can stop before all examples are correctly classified if it appears that no further gains in accuracy are possible.

In addition, C4.5's -t parameter, which invokes windowing, allows it to perform multiple runs of windowing and to select the best tree. Nevertheless, windowing is arguably one of C4.5's least frequently used options.
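As an illustration of the first of these refinements, the following sketch draws an initial window whose class distribution is as uniform as possible by sampling round-robin from per-class pools. It is a plausible reconstruction of the idea for illustration only, not C4.5's actual implementation.

    import random
    from collections import defaultdict

    def uniform_initial_window(examples, init_size):
        # Partition the (attributes, class) pairs by class; shuffle each pool.
        by_class = defaultdict(list)
        for x, y in examples:
            by_class[y].append((x, y))
        for pool in by_class.values():
            random.shuffle(pool)
        # Cycle over the classes, taking one example per class per pass,
        # until the window is full or all pools are exhausted.
        window = []
        while len(window) < init_size and any(by_class.values()):
            for y in list(by_class):
                if by_class[y]:
                    window.append(by_class[y].pop())
                    if len(window) == init_size:
                        break
        return window

In a domain with a skewed class distribution, such a window gives the minority class considerably more weight in the first induced theory than a plain random sample of the same size would.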
Recent work in the areas of Knowledge Discovery in Databases (Kivinen & Mannila; Toivonen) and Intelligent Information Retrieval (Lewis & Catlett; Yang) has re-emphasized the importance of subsampling procedures for reducing both learning time and memory requirements. Thus, interest in windowing techniques has revived as well. We discuss some of the more recent approaches later in the paper.

3. A Closer Look at Windowing

3.1 A Motivating Example

The motivation for our study came from a brief experiment in which we compared windowing with a decision tree algorithm to windowing with a rule learning algorithm in the noise-free Mushroom domain. This domain contains examples represented by symbolic attributes; the task is to discriminate between poisonous and edible mushrooms. Figure 2 shows the results of this experiment.

[Figure 2: Two plots of runtime in seconds against training set size (training examples × 10³, ranging from 0 to 8). Left panel, "Tree Learning and Windowing" (scale 0-30 seconds): C4.5 vs. C4.5 -t 1. Right panel, "Rule Learning and Windowing" (scale 0-1.5 seconds): DOS vs. DOS + Win.]

Figure 2: Results for windowing with the decision tree learner C4.5 and a rule learner (DOS) in the Mushroom domain.

The left graph shows the runtime behavior over different training set sizes of C4.5 invoked with its default parameters versus C4.5 invoked with one pass of windowing (parameter setting -t 1). No significant differences can be observed, although windowing seems to eventually achieve a small runtime gain. The graph is quite similar to one resulting from experiments with ID3 and windowing reported by Wirth and Catlett, so that we believe the differences between the original version of windowing and the one implemented in C4.5 are negligible in this domain. The results are also consistent with those of Quinlan, who for this domain reported only modest runtime savings for windowing with appropriate parameter settings. In any case, it is obvious that the runtime of both C4.5
