Prototype Selection for Composite Nearest Neighbor Classifiers
Prototype Selection for Composite Nearest Neighbor Classifiers

David B. Skalak
Department of Computer Science
University of Massachusetts
Amherst, Massachusetts 01003-4160
[email protected]

CMPSCI Technical Report 96-89

PROTOTYPE SELECTION FOR COMPOSITE NEAREST NEIGHBOR CLASSIFIERS

A Dissertation Presented
by
DAVID B. SKALAK

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

May 1997

Department of Computer Science

© Copyright by DAVID B. SKALAK 1997
All Rights Reserved

PROTOTYPE SELECTION FOR COMPOSITE NEAREST NEIGHBOR CLASSIFIERS

A Dissertation Presented
by
DAVID B. SKALAK

Approved as to style and content by:

Edwina L. Rissland, Chair
Paul E. Utgoff, Member
Andrew G. Barto, Member
Michael Sutherland, Member
David W. Stemple, Department Chair, Computer Science

ACKNOWLEDGMENTS

Edwina Rissland has provided unflagging support during my decade in Amherst. More important than any single piece of research is the influence that a teacher has on one's research career and on one's life. I am fortunate to have worked under Edwina. Andy Barto, Paul Utgoff and Mike Sutherland have each provided intellectual support of the highest rank. I deeply thank Robert Constable, Chair of the Department of Computer Science at Cornell University, for providing me with computing facilities and office space. For valuable assistance, I thank Paul Cohen, Paul Utgoff, Jamie Callan, the members of the Experimental Knowledge Systems Laboratory at UMass, Priscilla Coe, Sharon Mallory, Neil Berkman and Zijian Zheng. I am especially grateful to Andy Golding. Most of all, I thank Claire Cardie.

This work was supported in part by the Air Force Office of Scientific Research under Contract 90-0359, by National Science Foundation Grant No. EEC-9209623 (State-University-Industry Cooperative Research on Intelligent Information Retrieval), by a University of Massachusetts Graduate School Fellowship, and by trading in S&P 100 Index options.

ABSTRACT

PROTOTYPE SELECTION FOR COMPOSITE NEAREST NEIGHBOR CLASSIFIERS

MAY 1997

DAVID B. SKALAK
B.S., UNION COLLEGE
M.A., DARTMOUTH COLLEGE
J.D., HARVARD LAW SCHOOL
M.S., UNIVERSITY OF MASSACHUSETTS AMHERST
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Edwina L. Rissland

Combining the predictions of a set of classifiers has been shown to be an effective way to create composite classifiers that are more accurate than any of the component classifiers. Increased accuracy has been demonstrated in a variety of real-world applications, ranging from protein sequence identification to determining the fat content of ground meat. Despite such individual successes, answers to fundamental questions about classifier combination are not known, such as "Can classifiers from any given model class be combined to create a composite classifier with higher accuracy?" or "Is it possible to increase the accuracy of a given classifier by combining its predictions with those of only a small number of other classifiers?" The goal of this dissertation is to provide answers to these and closely related questions with respect to a particular model class, the class of nearest neighbor classifiers. We undertake the first study that investigates in depth the combination of nearest neighbor classifiers.
Although previous research has questioned the utility of combining nearest neighbor classifiers, we introduce algorithms that combine a small number of component nearest neighbor classifiers, where each of the components stores a small number of prototypical instances. In a variety of domains, we show that these algorithms yield composite classifiers that are more accurate than a nearest neighbor classifier that stores all training instances as prototypes.

The research presented in this dissertation also extends previous work on prototype selection for an independent nearest neighbor classifier. We show that in many domains, storing a very small number of prototypes can provide classification accuracy greater than or equal to that of a nearest neighbor classifier that stores all training instances. We extend previous work by demonstrating that algorithms that rely primarily on random sampling can effectively choose a small number of prototypes.

TABLE OF CONTENTS

ACKNOWLEDGMENTS  v
ABSTRACT  vi
LIST OF TABLES  xi
LIST OF FIGURES  xvi

CHAPTER

1. INTRODUCTION  1
   1.1 The Problem  1
   1.2 Approach to the Problem  5
   1.3 Overview of the Results  8
   1.4 Guide to the Dissertation  9

2. COMPOSITE CLASSIFIER CONSTRUCTION  12
   2.1 Composite Classifier Design Criteria  12
       2.1.1 Accuracy  13
       2.1.2 Diversity  13
             2.1.2.1 Measuring Diversity  14
       2.1.3 Efficiency  15
             2.1.3.1 Number of Component Classifiers  16
             2.1.3.2 Computational Resources of a Component Classifier  16
   2.2 Composite Classifier Architectures  17
       2.2.1 Stacked Generalization  17
             2.2.1.1 Architecture and Algorithm  19
             2.2.1.2 Error-Correction Output Coding  22
             2.2.1.3 Network Construction Algorithms  24
       2.2.2 Boosting  24
       2.2.3 Recursive Partitioning  26
   2.3 Component Classifier Construction  27
       2.3.1 Reasoning Strategy Combination  28
       2.3.2 Divide and Conquer  28
       2.3.3 Model Class Combination  30
       2.3.4 Architecture and Parameter Modification  30
       2.3.5 Randomized Search  31
       2.3.6 Training Set Resampling  32
       2.3.7 Feature Selection  32
       2.3.8 Discussion  33
   2.4 Combining Classifier Construction  33
       2.4.1 Function Approximation  34
             2.4.1.1 Linear  34
             2.4.1.2 Non-linear  35
       2.4.2 Classification  35
             2.4.2.1 Voting  35
             2.4.2.2 Non-voting Methods  38
   2.5 Nearest Neighbor Classifier Construction  39
       2.5.1 Prototype Roles  40
             2.5.1.1 Cognitive Organization  40
             2.5.1.2 Facilitating Case Retrieval  41
             2.5.1.3 Local Learning Algorithms  42
       2.5.2 Prototype Selection for Nearest Neighbor Classifiers  43
       2.5.3 Limitations of Existing Prototype Selection Algorithms  48
   2.6 Summary  50

3. NEAREST NEIGHBOR COMPOSITE CLASSIFIERS  52
   3.1 Characteristics of the Approach  52
       3.1.1 Homogeneous Nearest Neighbor Component Classifiers  52
       3.1.2 Prototype Selection through Sampling  54
       3.1.3 Coarse-Hypothesis Classifiers  55
       3.1.4 Concise Level-1 Representation  55
       3.1.5 Summary  56
   3.2 Experimental Design  56
       3.2.1 Data Sets  56
       3.2.2 Experimental Method  59
   3.3 A Nearest Neighbor Algorithm  59
       3.3.1 Evaluation of Baseline Nearest Neighbor Classifier  62
       3.3.2 Disadvantages of Some Nearest Neighbor Algorithms  63
   3.4 Classifier Sampling  64
       3.4.1 Algorithm  64
       3.4.2 Evaluation  67
             3.4.2.1 Number of Samples  69
             3.4.2.2 Clustering  75
       3.4.3 A Taxonomy of Instance Types  78
   3.5 Summary  82
4. BOOSTING  83
   4.1 Four Algorithms for Boosting  84
       4.1.1 Coarse Reclassification  85
       4.1.2 Deliberate Misclassification  86
       4.1.3 Composite Fitness  87
       4.1.4 Composite Fitness–Feature Selection  90
       4.1.5 Summary  90
   4.2 Coarse Reclassification  91
   4.3 Deliberate Misclassification  93
   4.4 Composite Fitness  100
   4.5 Composite Fitness–Feature Selection  106
   4.6 Comparison of the Four Algorithms  108
       4.6.1 Composite Accuracy  109
       4.6.2 Component Diversity and Accuracy  112
   4.7 Comparison to Schapire's Boosting Algorithm  120
   4.8 Analysis  122
       4.8.1 Failure Analysis  122
       4.8.2 Selective Superiority  126
   4.9 Summary  126

5. STACKED GENERALIZATION  130
   5.1 Comparison of Combining Algorithms  131
   5.2 Number of Component Classifiers  135
       5.2.1 Two Component Classifiers  135
       5.2.2 More than Two Component Classifiers  138
   5.3 Relationship between Accuracy and Diversity  142
   5.4 Class Probability Estimates  148
   5.5 Generalizing the k-Nearest Neighbor Classifier  151
   5.6 Summary  155

6. CONCLUSIONS  157
   6.1 Summary of the Dissertation  157
   6.2 Contributions  161
   6.3 Issues for Future Research  161

APPENDICES

A. INSTANCES RANDOMLY OMITTED FROM CROSS-VALIDATION  163
B. ACCURACY AND DIVERSITY CONTOUR GRAPHS  164
C. SUM OF SQUARES AND CROSS-PRODUCT MATRICES  177

BIBLIOGRAPHY  178

LIST OF TABLES

1.1  Topic summary table.  10
1.2  Component classifier summary table: prototypes stored in component classifiers for boosting and stacked generalization in this thesis.  11
2.1  Classical nearest neighbor editing algorithms.  44
2.2  Example machine learning nearest neighbor editing algorithms.  47
2.3  Published accuracy of machine learning prototype selection algorithms. Blank entries indicate that we could not find a value reported in the research literature for an editing algorithm. "dlM" indicates IB3 results reported by de la Maza [1991]; "C-J" indicates IB3 results reported by Cameron-Jones [1995].  49
3.1  Summary of data sets used in experiments. In the column labeled "Feature Types", "N" designates numeric features; "S" designates symbolic features; "B" designates binary features.  58
3.2  Percent test accuracy and standard deviation for the baseline 1-nearest-neighbor algorithm.  63
3.3  Classification accuracy of the prototype sampling (PS) algorithm, one prototype per class.  67
3.4  Time to sample 100 sets of prototypes, select the most accurate sample (Sample), and classify the data set using that sample (Application), compared to the time to apply the nearest neighbor algorithm to the data set (NN). Values are wall-clock times, in seconds.  68
3.5  Classification accuracy of the prototype sampling (PS) algorithm, two prototypes per class.  69
3.6  Partial trace of the prototype sampling algorithm on the Breast Cancer Ljubljana data. Accuracies in percent correct.  70
3.7  Optimal number of samples to achieve maximal test accuracy.  75
3.8  Comparison of test accuracies based on 100 samples and 1000 samples.  76
3.9  Statistical correlation between (i) clustering of the training set and clustering of the test set, (ii) clustering of the training set and test accuracy, and (iii) clustering of the test set and test accuracy.
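The prototype sampling (PS) procedure summarized in the abstract and in the captions of Tables 3.3 and 3.4 draws many small candidate prototype sets (for example, one or two prototypes per class), scores each candidate set with a 1-nearest-neighbor classifier that stores only those prototypes, and keeps the best-scoring set. The sketch below is illustrative only and is not the dissertation's implementation: the function names, the use of NumPy, the Euclidean distance, and the choice to score each candidate set on the full training data are assumptions made for illustration.

```python
# Minimal sketch of prototype selection by random sampling (assumptions noted above).
import numpy as np

def sample_prototypes(X, y, per_class=1, rng=None):
    """Draw one random candidate prototype set: `per_class` instances per class."""
    rng = np.random.default_rng(rng)
    idx = []
    for label in np.unique(y):
        members = np.flatnonzero(y == label)
        idx.extend(rng.choice(members, size=per_class, replace=False))
    return np.asarray(idx)

def nn_accuracy(proto_X, proto_y, X, y):
    """Accuracy of a 1-nearest-neighbor rule that stores only the prototypes."""
    # Euclidean distance from every instance to every prototype.
    d = np.linalg.norm(X[:, None, :] - proto_X[None, :, :], axis=2)
    predictions = proto_y[np.argmin(d, axis=1)]
    return float(np.mean(predictions == y))

def select_prototypes(X, y, per_class=1, n_samples=100, seed=0):
    """Keep the best of `n_samples` randomly drawn prototype sets."""
    rng = np.random.default_rng(seed)
    best_idx, best_acc = None, -1.0
    for _ in range(n_samples):
        idx = sample_prototypes(X, y, per_class, rng)
        acc = nn_accuracy(X[idx], y[idx], X, y)
        if acc > best_acc:
            best_idx, best_acc = idx, acc
    return best_idx, best_acc
```

A call such as select_prototypes(X_train, y_train, per_class=1, n_samples=100) would mirror the 100-sample, one-prototype-per-class setting reported in Tables 3.3 and 3.4.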