

Algorithm Selection Framework: A Holistic Approach to the Algorithm Selection Problem

THESIS

Marc W. Chalé, Captain, USAF

AFIT-ENS-MS-20-M-137

DEPARTMENT OF THE AIR FORCE AIR UNIVERSITY AIR FORCE INSTITUTE OF TECHNOLOGY

Wright-Patterson Air Force Base, Ohio

DISTRIBUTION STATEMENT A. APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.

The views expressed in this document are those of the author and do not reflect the official policy or position of the United States Air Force, the United States Department of Defense, or the United States Government. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States.

AFIT-ENS-MS-20-M-137

AFIT/ENS ALGORITHM SELECTION FRAMEWORK: A HOLISTIC APPROACH TO THE ALGORITHM SELECTION PROBLEM

THESIS

Presented to the Faculty
Department of Operational Sciences
Graduate School of Engineering and Management
Air Force Institute of Technology
Air University
Air Education and Training Command
in Partial Fulfillment of the Requirements for the Degree of
Master of Science in Operations Research

Marc W. Chalé, BS, MS
Captain, USAF

March 2020

DISTRIBUTION STATEMENT A APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. AFIT-ENS-MS-20-M-137

AFIT/ENS ALGORITHM SELECTION FRAMEWORK: A HOLISTIC APPROACH TO THE ALGORITHM SELECTION PROBLEM

THESIS

Marc W. Chalé, BS, MS
Captain, USAF

Committee Membership:

Dr. J. D. Weir, Chairman
Maj N. D. Bastian, PhD, Member

AFIT-ENS-MS-20-M-137

Abstract

A holistic approach to the algorithm selection problem is presented. The “algorithm selection framework” uses a combination of user input and meta-data to streamline algorithm selection for an analysis task. The framework removes the conjecture of the common trial-and-error strategy and generates a preference-ranked list of recommended analysis techniques. The framework is demonstrated on nine analysis problems, and each recommended analysis technique is implemented on the corresponding data set. Algorithm performance is assessed using the primary metric of recall and the secondary metric of run time. In six of the problems, the recall of the top-ranked recommendation is considered excellent, achieving at least 95 percent of the best observed recall; the average of this metric is 79 percent due to two poorly performing recommendations. The top recommendation is Pareto efficient for three of the problems. The framework measures well against an a priori set of criteria. The framework provides value by filtering the candidate set of analytic techniques and, often, selecting a high performing technique as the top-ranked recommendation. The user input and meta-data used by the framework contain information with high potential for effective algorithm selection. Future work should optimize the recommendation process and expand the scope of techniques to other types of analysis problems. Further, the results of this study should be leveraged in order to better understand the behavior of meta-learning models.

Acknowledgements

My graduate studies at AFIT have been often gratifying and rarely easy. I thank my instructors and committee members for pushing me in the classroom and mentoring me always. I thank my family and friends for fostering an invaluable network of support. It has been a pleasure to collaborate with Lt Clarence Williams, Capt Brandon Hufstetler, and Professor Mark Gallagher. Thank you to everyone who has helped me be my best.

Marc W. Chalé

Contents

Abstract
Acknowledgements
List of Figures
List of Tables

I. Introduction of the Problem
1.1 Introduction to Operations Research
1.2 Rise of Meta-models
1.3 Problem Statement
1.4 Overview of Contents

II. Literature Review
2.1 Machine Learning
2.2 The Taxonomy of Analysis Techniques
Field Guide to Analysis
McGarigal
2.3 Algorithm Selection Frameworks
Booz Allen Hamilton
Big Data Sources
INFORMS Body of Knowledge
2.4 Classification Techniques
2.5 Regression Techniques
2.6 Clustering Techniques
2.7 Data Reduction Techniques
2.8 Performance Metrics for Machine Learning Techniques
2.9 Meta Learning
Background of Meta-Learning
Evaluation of Recommendation System Performance

III. Methodology
3.1 Criteria
Framework
Taxonomy
3.2 Proposed Framework
Characterizing the Problem
Step 1: Map Problem to Category and Approach
Step 2: Score Techniques
Step 3: Rank Recommendations
3.3 Data Sets
3.4 Software and Packages
Workstation Specifications

IV. Experimental Results and Analysis
4.1 Experimental Results
4.2 Evaluation of Taxonomy to Criteria
4.3 Evaluation of the Framework
Analysis of Results
Evaluation of Framework to Criteria

V. Conclusion
VI. Appendix A
VII. Appendix B
VIII. Appendix C
8.1 Ranking Recommended Techniques for Data Factor
Bibliography

List of Figures

Figure 1. The meta-learner adaptation of Rice's framework [34]
Figure 2. The factors identified in this research are superimposed with the stages of analysis which they impact, i.e., determine analysis approach and determine technique
Figure 3. The considerations are shown for each factor which drives analytical approach and analytical technique selection
Figure 4. The 11 possible assigned tasks fall into one or more of the categories of analysis, which are listed on the far left. Each category of analysis maps to one or more analytical approaches on the far right.
Figure 5. A portion of the proposed taxonomy is highlighted to show the structure of the taxonomy
Figure 6. Decision tree used to assign a preference rank for each technique in regards to the data factor
Figure 7. Step 2, scoring, is performed for each factor. Step 3, overall technique ranking, is performed for the Heart data set.
Figure 8. Mean recall for the Heart data set. SVR is a hit.
Figure 9. Mean recall and mean run time for the Heart data set. SVR is Pareto efficient because it dominates recall.
Figure 10. Mean recall for the Spam data set. Naïve Bayes is the top recommendation and produces the worst recall of all recommendations.
Figure 11. Naïve Bayes is a Pareto efficient solution because it dominates run time.
Figure 12. Mean recall for the Election data set. Naïve Bayes is a hit.
Figure 13. Despite producing the ideal recall and a fast run time, the recommended technique, Naïve Bayes, is not a Pareto efficient solution.
Figure 14. Mean recall for the Framingham data set
Figure 15. Mean recall and mean run time for the Framingham data set
Figure 16. Mean recall for the Loan data set
Figure 17. Mean recall and mean run time for the Loan data set
Figure 18. Mean recall for the PMESII data set
Figure 19. Mean recall and mean run time for the PMESII data set
Figure 20. Mean recall for the Cancer data set
Figure 21. Mean recall and mean run time for the Cancer data set
Figure 22. Mean recall for the Urinalysis data set
Figure 23. Mean recall and mean run time for the Urinalysis data set
Figure 24. Mean recall for the Colleges data set
Figure 25. Mean recall and mean run time for the Colleges data set

List of Tables

Table 1. Comparison of the frameworks reviewed
Table 2. Comparison of the taxonomies reviewed
Table 3. The matrix view of mapping from category of analysis to analytical approach
Table 4. Table of the complete results
Table 5. Comparison of the reviewed taxonomies to the proposed taxonomy
Table 6. Evaluation of framework criteria for three reviewed frameworks and the proposed framework
Table 7. AFIT theses reviewed during research

AFIT/ENS ALGORITHM SELECTION FRAMEWORK: A HOLISTIC APPROACH TO THE ALGORITHM SELECTION PROBLEM

I. Introduction of the Problem

1.1 Introduction to Operations Research

Operations research (OR) emerged during World War II as the British military tasked scientists to develop a disciplined approach to military decision making. The modern definition of OR is the science of determining the best decision under a constrained system in order to optimize a goal. OR projects typically incorporate mathematical modelling, a quantitative representation of a real-world system [1].

1.2 Rise of Meta-models

There are three overarching approaches to developing mathematical models: physics based, data-driven, and hybrid. Physics based models are used when the underlying nature of the system is well understood. They require well refined parameter settings in order for the model to be useful, and they may be computationally expensive to execute. The hybrid approach to modelling requires some system expertise to properly employ; however, it also leverages system data to formulate the model. Finally, data driven models are produced solely from system data without regard to system knowledge. Data driven models are also known as meta-models because they are a higher abstraction of the relationship between a system's input and response [2]. An overview of predominant meta-models is presented in Chapter 2. Machine learning may be used to formulate a meta-model. Selection of the best

learning algorithm, including hyper-parameters, for a particular problem instance is a difficult and time consuming task [3]. [4] has confirmed conclusions of [5] and [6] that meta-models' performance varies among problem types and problem instances. [7] uses the Extended Bayesian Formalism to show that given a set of learning algorithms and problems, each algorithm will outperform the others for some (equally sized) subset of problems. This phenomenon has driven researchers to a trial-and-error strategy of identifying the best meta-model for a given problem. The preferred meta-model is selected by comparison of model performance metrics such as accuracy [2]. Unfortunately, the computational run time and human investment required to select a learning algorithm by trial and error generally prohibit finding the optimal choice.

1.3 Problem Statement

Cui et al. successfully implemented a meta-learning approach within Rice's framework [8] for “The Algorithm Selection Problem.” This paper explores an alternate yet related approach to the algorithm selection problem. Within, an approach is presented that builds on Rice's framework by employing rules of thumb, inspired either by literature or developed independently. The research problem is to create an algorithm selection technique for human analysts that also develops the theoretical intuition for meta-learners by characterizing the problem and referencing a taxonomy of analysis techniques. A primary goal of this paper is to explore the nature of tangible recommendation systems in order to better understand automated recommendation systems such as [2]. Further, the paper will identify whether components of the new system can be combined with meta-learners to automatically provide better recommendations.

1.4 Overview of Contents

Chapter 2 of this thesis provides a review of previous work in meta-learning and introduces the applicable meta-models and their performance metrics. Chapter 3 outlines the methodology used to demonstrate the metrics in the meta-learning recommendation framework. The criteria for an acceptable solution are discussed in Chapter 3. Chapter 4 presents the experimental results, and Chapter 5 draws conclusions from the research and suggests areas for future work.

II. Literature Review

2.1 Machine Learning

[9] presented a breakthrough to the artificial intelligence community in 1957 by publishing a mathematical model of a neuron. This model, called the perceptron, could be trained to detect patterns in data, in turn automating decisions. In 1969, [10] published influential findings that machine learning methods, notably perceptrons, were incapable of performing complex classification tasks. This news discouraged advancements in the artificial intelligence field until 1986, when [11] showed the excellent performance of backwards propagating neural networks. The introduction of the backwards propagating neural network marks the beginning of the modern era in machine learning. Today, machine learning algorithms are used to solve a variety of real-world problems ranging from speech recognition [12] to military search and rescue [13]. The United States Department of Defense recognizes the military applications of machine learning. In fact, the 2018 National Defense Strategy calls for accelerated modernization of advanced autonomous systems, to include artificial intelligence and machine learning, in order to achieve a competitive military advantage over adversaries [14].

2.2 The Taxonomy of Analysis Techniques

Many analysts in industry and academia have offered taxonomies to categorize and describe analysis techniques. These products communicate the capabilities and limitations of techniques and group them by common traits. Taxonomies in the literature vary by size, format, intended audience, and purpose. Two existing taxonomies are referenced within due to their wide scope, high level of refinement, and possible application to the algorithm selection problem. [15] shows that the algorithm

recommendation system is closely linked to the taxonomy.

Field Guide to Analysis.

This taxonomy is also very comprehensive, addressing techniques related to classical statistics, statistical learning, machine learning, optimization, and operations research. The techniques are categorized by the types of problems they solve. [15] demonstrated that a thorough framework relies on a comprehensive taxonomy. [15] presents learning techniques as one of three classes within the universe of data analytics. The highest level of discrimination within the class of learning algorithms is the category of analysis. Learning analytics are broken into three categories: regression, clustering, and classification [15]. Regression algorithms assign a continuous numerical response to each input data point. Clustering and classification algorithms assign a class membership to each data point. Clustering techniques follow an unsupervised learning style and classification follows a supervised learning style [16]. Learning techniques may be categorized into three learning styles: unsupervised, supervised, and semi-supervised. Unsupervised learning models are used when no prior information of class membership is available. The supervised learning approach utilizes a training data set in which all observations are labeled with class membership. Semi-supervised learning models are ideal when only some observations contain labels. These models yield more accurate results than unsupervised methods [15]. Offline, reinforcement, and online are the three training styles. An offline training style describes methods for which all training is performed in one training event. Alternatively, online models are trained additively in subsequent training events, each of which updates the model. Although an online model is deployed once, it may improve over time as it gains experience, such as by learning from its prior performance.

Reinforcement learning is a special case of online learning. This training style adapts to features in its environment continuously. The model learns to respond to achieve long term goals by responding to changes in the environment. Recent advancements have allowed reinforcement learning to impact dynamic optimization problems such as day trading and navigation [15].

McGarigal.

[17] offers a taxonomy which discriminates techniques by the form in which data are input to and output from the model. According to theory, the form of the model output indicates the type of problems the technique can effectively solve. Accordingly, the key to algorithm selection lies in the capabilities of each algorithm with respect to problem characteristics, analysis objectives, data compatibility, data sampling, and the underlying mathematical structure of the model. This taxonomy seeks to provide such information. The taxonomy, however, is limited to classical statistics and statistical learning techniques.

2.3 Algorithm Selection Frameworks

Booz Allen Hamilton.

[15] asserts that algorithm selection is an art and not a mechanical “repeatable process.” Further, each analysis problem contains “hidden dependencies or constraints” that require human judgment to mitigate. The process leverages a fractal analytical model to decompose features of the overarching analysis problem into smaller, more tangible analysis problems. The factors of the fractal analytical model are data, goal, and action. Practitioners apply the fractal decomposition process until a specific analytic technique is identified. The process relies on analyst judgment to select analytical techniques. The fractal decomposition model is supplemented by five guiding

factors: “compound analytic goals that create natural segmentation, natural orderings of analytic goals, data types that dictate processing activities, requirements for human-in-the-loop feedback, need to combine multiple data sources.” [15]

Big Data Sources.

[18] provides an algorithm selection framework that emphasizes data compatibility. Increasingly, data is drawn from non-conventional sources such as urban sensors, websites, and mobile applications. In fact, 95 percent of big data is unstructured and is obtained in various volumes, velocities, varieties, and veracities. [18] provides guidance on how to analyze such data. First, the data governance layer is a construct that describes the physical, legal, and ethical ability to obtain and use data. Next, the data analysis layer provides guidance to store, integrate, pre-process, and analyze data as well as publish the results. Finally, the persistence layer describes the need to maintain and update the data over a long time horizon.

INFORMS Body of Knowledge.

[19] affirms that algorithm selection is crucial for effective analysis; however, the source sidesteps providing any specific strategy. The major contribution from the Body of Knowledge is linking categories of analysis to characteristics of a problem. Techniques within the category of descriptive analysis explore patterns and trends of historical data. These techniques answer the question “what happened?” with tools such as summary reports, visualizations, and models. Predictive techniques anticipate trends in the future. These techniques address “what could happen?” with statistical methods, including machine learning. The most sophisticated, and often most insightful, models belong to the category of prescriptive analysis. These techniques identify ways to change actions and improve operational

outcomes [19]. Evidently, techniques fall within categories of analysis and problems can be characterized by these categories.

2.4 Classification Techniques

Classification is a supervised machine learning task which seeks to associate input data to its true class, when the set of all classes is known a priori. The general procedure for constructing a classification model includes two steps: building the classifier model from training data, and evaluating the model with test data. Only once the model is shown to perform adequately on the test data set should it be used to make new predictions. The construction of the classifier is sometimes referred to as the learning step. The training set for the learning step consists of a set of ordered tuples, referred to as X, and a label attribute, A, associated with each tuple. The label attribute, A, is a nominal variable which dictates the true class of each tuple in X [16].
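To make the two-step procedure concrete, the following minimal Python sketch builds a classifier from labeled training tuples and evaluates it on a held-out test set. The data set, library calls, and classifier choice are illustrative assumptions, not the materials used in this research.

# Minimal sketch of the two-step classification procedure:
# (1) learn a classifier from the training tuples X with label attribute A,
# (2) evaluate the learned model on withheld test data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import recall_score

X, A = load_breast_cancer(return_X_y=True)           # tuples X and label attribute A
X_train, X_test, A_train, A_test = train_test_split(X, A, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0)          # any classifier could be substituted
clf.fit(X_train, A_train)                             # learning step
print("test recall:", recall_score(A_test, clf.predict(X_test)))   # evaluation step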

Support Vector Machine.

Support Vector Machine (SVM) is a supervised learning classification technique proposed by [20] in 1992. Since its publication, SVM has become extremely popular for its outstanding performance classifying records in comparison to more computationally costly methods such as neural networks [21]. The SVM model is trained to fit a separating hyperplane to distinguish observations in the training data by class membership. The hyperplane serves as a decision boundary to predict the class of unobserved data points. SVM decision boundaries are resistant to overfitting during training because the loss function, which drives the separating hyperplane, minimizes classification error and maximizes the buffer between points in each class [20].

Any labelled data sets which are not linearly separable can be mapped to a higher dimension where linear separation is possible using a Kernel function. Proper selection of the Kernel ensures a model that classifies well but is not overly complex and therefore biased. The SVM problem which incorporates a kernel transformation can be solved very efficiently using Lagrangian optimization [21]. Multi-class categorization, in which observations may belong to one of n nominal classes, and multi-label categorization, in which an entity can simultaneously belong to multiple classes, are both extensions of the base SVM [21].
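The following hedged sketch shows how an SVM with a kernel transformation might be fit in Python; the synthetic data, the radial basis kernel, and the parameter settings are assumptions for illustration only, not the thesis implementation.

# Illustrative SVM classifier with a kernel transformation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

svm = SVC(kernel="rbf", C=1.0)     # the kernel maps the data to a space where it is separable
svm.fit(X_tr, y_tr)
print("test accuracy:", svm.score(X_te, y_te))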

K-Nearest Neighbor.

K-nearest neighbor was first demonstrated in the early 1950's; however, it wasn't until the 1960's that advancements in technology allowed the technique to be employed for pattern recognition on larger data sets. The method compares an unlabeled tuple to labelled tuples in the training set according to a distance metric such as Euclidean distance. The unlabeled tuple is assigned the most frequently encountered label among the k nearest labelled tuples according to the metric. Nominal attributes must be converted to numerical values via one hot encoding. The value of k is generally set to one and iteratively increased until desirable classification performance is achieved. Therefore, training a K-nearest neighbor model is computationally intensive. Some algorithms train on sub-samples of the available data to save time. K-nearest neighbor is also robust to missing data by making assumptions about the distance for missing attributes [16]. According to [16], Equation 1 provides the n-dimensional Euclidean distance between tuples X1 and X2 as

dist(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}.   (1)
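A short sketch of Equation 1 and of the k-nearest neighbor rule follows; the tiny training set and the choice of k = 3 are hypothetical values used only for illustration.

# Equation 1 (Euclidean distance) and majority voting among the k nearest labelled tuples.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def euclidean(x1, x2):
    # dist(X1, X2) = sqrt(sum_i (x_1i - x_2i)^2)
    return np.sqrt(np.sum((np.asarray(x1) - np.asarray(x2)) ** 2))

X_train = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]])
y_train = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)
print(euclidean([1, 2], [2, 1]))       # distance between two tuples
print(knn.predict([[7.5, 8.5]]))       # label of the majority of the 3 nearest tuples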

The Naïve Bayes algorithm.

The Naïve Bayes Classifier algorithm warrants special attention due to its high performance, which rivals neural networks and decision trees in some applications. The Naïve Bayes learner is designed to determine the best hypothesis h from a space H of hypotheses given the observed data D. In the context of classification, h is a hypothesized class and D is a training set. The algorithm searches all hypotheses in H for the hypothesis with the greatest value P(h|D), which is known as the maximum a posteriori hypothesis [12]. The Naïve Bayes Classifier models the likelihood of each hypothesis under all observed attribute settings. For each unobserved tuple, the model predicts the most likely target v ∈ V associated with an unlabeled tuple with attributes ⟨a1, a2, ..., an⟩. The target v corresponding to the greatest likelihood for a record is the predicted class [12]. The probability of each target h ∈ H is easily calculated for a given data set. It is impractical, however, to solve for P(a1, a2, ..., an|hj) because most training sets do not contain sufficient instances of a1, a2, ..., an to drive an estimate of P(a1, a2, ..., an|hj) with good confidence. Therefore, it is necessary to make the naïve assumption that the values of the attributes are conditionally independent given the class target, v. This statement of conditional independence is expressed mathematically in Equation 2.

P(a_1, a_2, \ldots, a_n \mid v_j) = \prod_{i} P(a_i \mid v_j)   (2)

Equation 3 shows that the Naive Bayes classifier determines the hypothesized target class of maximum likelihood given a set of attributes.

h_{NB} = \arg\max_{h_j \in V} P(h_j) \prod_{i} P(a_i \mid h_j)   (3)

[12]
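A toy, hand-rolled sketch of Equation 3 is given below; the weather-style attributes and class labels are invented for illustration, and Laplace smoothing is omitted for brevity.

# Naive Bayes prediction via Equation 3 on a tiny categorical data set (illustrative only).
from collections import Counter, defaultdict

data = [({"outlook": "sunny", "windy": "no"},  "play"),
        ({"outlook": "sunny", "windy": "yes"}, "stay"),
        ({"outlook": "rain",  "windy": "yes"}, "stay"),
        ({"outlook": "rain",  "windy": "no"},  "play")]

priors = Counter(label for _, label in data)            # counts for P(h_j)
cond = defaultdict(Counter)                             # counts for P(a_i | h_j)
for attrs, label in data:
    for attr, value in attrs.items():
        cond[label][(attr, value)] += 1

def predict(attrs):
    def score(label):
        p = priors[label] / len(data)                   # prior P(h_j)
        for attr, value in attrs.items():               # naive independence assumption
            p *= cond[label][(attr, value)] / priors[label]
        return p
    return max(priors, key=score)                       # argmax over hypotheses

print(predict({"outlook": "sunny", "windy": "no"}))     # -> "play"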

Decision Trees.

Decision trees are a supervised classification technique that partitions training records into branches based on the attributes which contain the most predictive information. The objective during training is to partition observations until each partition is pure, that is, until it contains observations of only one class. The structure of the partitioning is known as a decision tree, and the decision tree generated from training data is used to classify unlabeled observations. The technique is popular because it performs well in many applications and the model is intuitive to interpret. According to [16], decision tree methods were developed in parallel by two groups in the late 1970's into the 1980's, namely [22] and [23]. Each algorithm follows a similar classification strategy but differs in its attribute selection method [16]. Decision trees are commonly applied in fields such as biology, engineering, chemistry, finance, and medical research [24], and may be performed using both categorical and continuous attributes [16]. The two predominant strategies for attribute selection are the Gini Criterion, which was introduced in [23], and Information Gain, which was introduced in [22]. The Gini Criterion branches on the attribute which minimizes impurity in the resulting partitions [23], and the Information Gain heuristic splits on the attribute which maximizes the information gained at a branch. Information is a quantification of the tree's failure to provide pure classifications [22]. [16] provides analysis of the strengths and weaknesses of each attribute selection method.
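The sketch below contrasts the two attribute-selection heuristics named above by cross-validating a tree under each; the data set and settings are assumptions for illustration, not the thesis experiments.

# Decision trees with the Gini criterion versus information gain ("entropy" in scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    print(criterion, cross_val_score(tree, X, y, cv=5).mean())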

Multi-layer Perceptron.

There are many varieties of artificial neural networks. The rudimentary feed forward perceptron model was introduced in [9]. The features common to a feed

forward ANN include layered sets of nodes (also called neurons) connected by arcs. The connected structure of the network is known as its topography. Nodes that define the topography are organized into layers, or neurodes. Data features from the input data are processed through the input layer. Each node in the input layer performs a function, known as the activation function, on the inputs. The weighted outputs of this layer are the input of the nodes of the next neurode. Equation 4 demonstrates that the input value of node j is the weighted sum of the outputs of all connected nodes in the preceding neurode i, plus a bias scalar, θ, which is a free variable [16].

I_j = \sum_{i} w_{ij} \cdot O_i + \theta_j   (4)

The backwards propagating neural network, introduced in 1986 by [11], is a more advanced model that communicates error information to predecessor neurodes. The backwards propagating ANN overcomes learning limitations of the feed forward ANN. ANN models are robust to errors in training data. Although training times may be high for large data sets, trained models are evaluated very quickly. Often, it is impossible, albeit unnecessary, for a human to interpret meaning from the weights assigned to a trained ANN. Therefore, ANN is considered a black box classification technique [12].
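As a numeric illustration of Equation 4, the following sketch computes the input to a single node j from assumed outputs, weights, and bias; the sigmoid activation is one common choice, used here only as an example.

# Equation 4 for one node j: weighted sum of predecessor outputs plus a bias term.
import numpy as np

O_i = np.array([0.2, 0.7, 0.1])       # outputs of the preceding neurode (assumed values)
w_ij = np.array([0.4, -0.3, 0.9])     # weights on the arcs feeding node j (assumed values)
theta_j = 0.05                        # bias scalar

I_j = np.dot(w_ij, O_i) + theta_j     # input value of node j
O_j = 1.0 / (1.0 + np.exp(-I_j))      # example activation function (sigmoid)
print(I_j, O_j)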

2.5 Regression Techniques

Regression is a statistical method that estimates a relationship between predictor variables and response variables. Regression estimates each predictor variable's contribution to a response by generating a coefficient for each predictor. Some regression models estimate interaction coefficients, that is, multiple variables' combined contribution to the response. The best regression models are shown to follow the theoretical true relationship between the predictor variables and response, sometimes called the physics

model. Even the true model fails to predict noise, or random deviations from the model. Regardless, regression is an effective approach for predicting event outcomes and is used in many fields of science and engineering. Under the strong assumption of a causal relationship, regression can be used for controlling engineering systems [25].

Support Vector Regression.

[26] introduced support vector regression (SVR) as an extension of SVM methods. The SVR algorithm produces a function F(x) which models G(x), the true relationship between a predictor data point x and the response, y. F(X, ŵ) is a reparameterization of F, where ŵ is the normal vector defining the optimal hyperplane and x is once again a point in the input space. Equation 5 is a common choice of F, which takes the form

F_1(X, \hat{w}) = z^{t} \cdot \hat{w},   (5)

where z^{t} is defined in Equation 6 by

z^{t} = [x_1^2, \ldots, x_d^2, \ldots, x_i x_j, \ldots, x_{d-1} x_d, x_1, \ldots, x_d, 1],   (6)

for all d ∈ {1, ..., N}. The number of features in z is defined in Equation 7 as the combinatorial expression in

f = \sum_{i=d-1}^{p+d-1} C^{i}_{d-1}   (7)

[27] proposes an alternative form of F in Equation 8, where

F_2(X, \hat{w}) = \sum_{i=1}^{N} (\alpha_i^{*} - \alpha_{i*})(v_i^{t} x + 1)^{p} + b,   (8)

such that α_i^*, α_{i*}, and b are free variables and p indicates the order of the polynomial model. The goal is to select ŵ, the optimal normal vector, such that F(x, ŵ) is the best possible estimate of G(x). To do so, a loss function L is introduced, for example L[·] = [·]^2. In this case, Equation 9 shows that the primal objective function of SVR is defined as the quadratic

U \sum_{j=1}^{N} L[y_j - F(v_j, \hat{w})] + \|\hat{w}\|^{2},   (9)

where U is a regularizer constant, v_i are support vectors, i.e., input data points which fall outside of the acceptable buffer region, and y_i are the corresponding observed values of G(x) including noise. The regularizer constant is a tunable hyperparameter that sets the relative importance of reducing prediction error versus generalizing the function. Geometrically, the first term is the sum of squares between the predicted response and the observed value of the response; the second term is the distance between F and the boundary of the acceptable buffer region.
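A hedged sketch of fitting an SVR model with a polynomial kernel of order p (cf. Equation 8) follows; the noisy quadratic data, kernel degree, and regularization setting are illustrative assumptions, not the settings used in this research.

# Support vector regression with a polynomial kernel; C plays the role of the regularizer U.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = x.ravel() ** 2 + rng.normal(scale=0.5, size=200)    # noisy quadratic response G(x) + noise

model = SVR(kernel="poly", degree=2, C=1.0, epsilon=0.1)
model.fit(x, y)
print(model.predict([[1.5]]))        # estimate of G(1.5), roughly 2.25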

Linear Regression.

Simple linear regression is a technique used to model one predictor variable’s relationship to a single response variable. y denotes the predicted response variable,

β0 denotes the estimated intercept, β1 denotes the estimated regression coefficient for

the predictor variable, and x denotes the input data point. β0 and β1 are commonly estimated using the method of least squares [25]. The simple linear regression formula, Equation 10, predicts the response value as a function of the predictor variable on the surface of a line

y = \beta_0 + \beta_1 x + \varepsilon   (10)

The multiple regression model with k predictor variables and interaction terms can also be generated using the method of least squares. This type of model, seen in Equation 11, follows the same form as the simple regression model with the additional predictor variable terms.

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon   (11)

The multiple linear regression formula predicts the response value as a function of all predictor variables on the surface of a hyperplane. The addition of the interaction term in Equation 12 yields a more complex multiple regression model

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{12} x_1 x_2 + \cdots + \varepsilon.   (12)

A regression model should always be built according to the principle of parsimony, that is, with the minimum order function that represents the data and contextual knowledge of the system. A simple model is always preferred to a complex model. In the case of a curvilinear relationship between predictor and response variables, a transformation may mitigate the need for increasing the order of the model [25].
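The simple model of Equation 10 can be fit by the method of least squares in a few lines; the data points below are fabricated purely for illustration.

# Least-squares estimates of beta_0 and beta_1 for the simple linear model in Equation 10.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])              # roughly y = 2x with noise

X = np.column_stack([np.ones_like(x), x])             # design matrix [1, x]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # method of least squares
beta0, beta1 = beta
print(beta0, beta1)                                   # estimated intercept and slope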

2.6 Clustering Techniques

Clustering is the unsupervised approach to machine learning which partitions observed data points into groups, or clusters, based on perceived similarities. In contrast to classification techniques, the clustering approach to machine learning can be performed on unlabeled data. It has been described as “automatic classification” because the analyst does not require domain knowledge or knowledge of class characteristics or grouping logic. Some algorithms do require the number of desired clusters as an input. Clustering is commonly used to identify outlier data points. Clustering has been

applied successfully to management science, web search, and other domains [16]. Distance metrics such as Euclidean distance quantify the similarity of points in a data set. Clustering algorithms use four families of methods (partitioning, hierarchical, density based, and grid based) to assign tuples to the correct cluster. See chapter 10.1 of [16] for a detailed comparison of these methods. Clustering techniques generally perform well on a range of data types including nominal, ordinal, and binary. Recent research has shown potential applications with less conventional data such as “graphs, sequences, images, and documents”. Some clustering techniques, such as partitioning, generate only spherical decision boundaries while others, such as hierarchical, are more robust to an arbitrary decision surface. Although clustering techniques typically handle high dimensional data, analysts must use caution to avoid biased results due to inclusion of immaterial factors [16].

K-Means.

The K-means clustering algorithm is the most fundamental algorithm among partitioning clustering methods. Initially, all observations ∈ D are randomly assigned to k clusters. The centroid, c_i, of each cluster, C_i, is typically calculated as either the m-dimensional mean or medoid of all points in C_i. The algorithm seeks to minimize the within-cluster variation for all C_i ∈ D. This is an NP-hard problem that can be approximately solved using a greedy heuristic. At each iteration, observations p are reassigned to the nearest centroid, and the centroid locations are updated. This process is repeated until a stopping criterion is met [16].
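A compact sketch of the greedy iteration described above is shown next; the two-cluster synthetic data and the convergence test are assumptions made for illustration.

# Minimal K-means: assign observations to the nearest centroid, update centroids, repeat.
import numpy as np

def kmeans(D, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = D[rng.choice(len(D), size=k, replace=False)]     # initial centroids
    for _ in range(iters):
        labels = np.argmin(((D[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([D[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):                          # stopping criterion
            break
        centroids = new
    return labels, centroids

D = np.vstack([np.random.normal(0, 1, (50, 2)), np.random.normal(5, 1, (50, 2))])
labels, centroids = kmeans(D, k=2)
print(centroids)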

2.7 Data Reduction Techniques

The curse of dimensionality describes a data set which contains too many variables to be interpretable or useful for analysis [28]. Data reduction can be used during pre-processing to reduce the computational burden of algorithms, which increases rapidly with the number of variables [28]. Data reduction methods are used to reduce the number of variables in the data while retaining the integrity of the information represented. If any information is lost, the data reduction is described as lossy. If the original data can be regenerated fully from the compressed data, the compression is known as lossless [16]. Data reduction also improves results when the number of variables is nearly as great as the number of observations or when there is high correlation among variables [28]. Principal component analysis is among the most widely used ordination techniques and is discussed below.

Principal Component Analysis.

Principal component analysis (PCA) is a multivariate technique which describes the variance present in the original variables in a new set of orthogonal components. The algorithm performs a change of basis operation which identifies the basis that contains the most information in the fewest dimensions. It works especially well when there is multicollinearity present in the original data. The ideal outcome of PCA is that a small subset of the new components, the principal components, contain enough information that the remainder of components can be omitted from the model. The result of PCA can be used as an exploratory tool to determine underlying trends in a system. For example, an economist may use PCA to determine the key sectors that serve as an indicator of greater economic behavior [28]. The first step to perform PCA is outlined in Equation 13.

C = X^{T} X   (13)

If the variables of the input data are presented in drastically different scales, the correlation matrix should be used instead of the covariance matrix. The eigenvectors of C are generated by eigenvalue decomposition. The eigenvectors v_i are arranged into matrix A in order of descending eigenvalues, λ_i, where the eigenvalue quantifies the variance explained in each new component i [13]. The proportion of variance explained by each of k total components is expressed in Equation 14 as

\frac{\lambda_i}{\sum_{i=1}^{k} \lambda_i}.   (14)

There are several acceptable methods to determine the number of retained principal components. Commonly, the number of principal components chosen is the fewest that accounts for a predetermined proportion of retained variance. Alternatively, only components that account for a greater than average amount of variance are retained. Another common method is to plot the eigenvalues in descending order and select a cutoff point at the elbow of the scree plot [28].
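The sketch below performs PCA by eigenvalue decomposition of C = X^T X on centered data and reports the proportion of variance per component (Equation 14); the synthetic collinear data are an assumption for illustration, not the thesis data sets.

# PCA via eigen-decomposition of the matrix in Equation 13.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)       # induce multicollinearity

Xc = X - X.mean(axis=0)                              # center the variables
C = Xc.T @ Xc                                        # Equation 13
eigvals, eigvecs = np.linalg.eigh(C)                 # eigenvalue decomposition
order = np.argsort(eigvals)[::-1]                    # descending eigenvalues
explained = eigvals[order] / eigvals.sum()           # Equation 14
scores = Xc @ eigvecs[:, order]                      # data expressed in the new basis
print(explained)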

2.8 Performance Metrics for Machine Learning Techniques

Performance metrics are used to assess the quality of a classifying model and are the basis of algorithm selection [16]. This section addresses how prominent performance metrics are applied in the field of machine learning. These metrics are fundamental to assessing the performance of algorithm selection.

Normalized Root Mean Square Error.

Normalized root mean square error is used to quantify the similarity between the observed responses y_i ∈ Y and their predicted responses ŷ_i [2]. Root mean square error takes the same form as population standard deviation, a metric of the spread in a data set; however, according to [29], RMSE is distinguished in that the reference point of comparison is an observed value and not the set's mean. [2] proposes the normalized root mean square error, shown in Equation 15, which is scaled by the range of values in Y.

NRMSE = \frac{1}{y_{\max} - y_{\min}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}   (15)
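A direct computation of Equation 15 is sketched below with made-up response values.

# Normalized root mean square error, scaled by the range of the observed responses.
import numpy as np

def nrmse(y, y_hat):
    rmse = np.sqrt(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))
    return rmse / (np.max(y) - np.min(y))

y     = [3.0, 5.0, 7.5, 10.0]
y_hat = [2.8, 5.4, 7.0, 10.3]
print(nrmse(y, y_hat))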

Area Under the Curve - Receiver Operator Characteristic.

Area under the curve - Receiver Operator Characteristic (AUC-ROC) was originally developed to convey the tradeoff between a true positive and a false positive detection rate in radar systems during World War II. Today it is commonly used to describe the performance of classification models [16]. Calculating AUC-ROC requires that the model output the probability that each tuple belongs to each class. Therefore this metric is naturally suitable for decision trees and Naïve Bayes classifiers, though it can be extended to other machine learning techniques. [30] has shown precedent by successfully using AUC-ROC as a performance metric in meta-learners [16]. To calculate AUC-ROC, tuples from the test data are sorted in descending order of likelihood of membership in the positive class. Next, the true positive rate, also known

as sensitivity, is calculated as TPR = TP/P. The TP rate (TPR) is plotted as a function of the false positive rate, FPR = FP/N. The curve begins at the vertical axis, TPR, where

the horizontal axis is set to zero. A point is plotted at the origin. Iterating down the sorted list, a point is plotted for each tuple. If the tuple is correctly labeled, the point appears above the previous one. If the tuple is incorrectly labeled, its corresponding plot point appears to the right of the previous point [16]. [31] notes that since there is an inverse relationship between TP and FP, the ROC plot indicates the nature of the tradeoff. The ideal model would show a large TP at every value of FP, resulting in an AUC-ROC score of nearly 1.0. Conceptually, the AUC-ROC score is the probability that a classifier will rank a randomly chosen positive observation higher than a randomly chosen negative observation. A score greater than 0.5 indicates that the model classifies better than a random classifier. Identifying a desirable AUC-ROC score is ultimately a business decision based on judgement [31].

Algorithm Run Time.

An algorithm is defined as a process of discrete steps used to solve a specific problem. Algorithms typically perform operations on input data and output a solution. Complexity classes are metrics that quantify the computational performance of an algorithm. Space complexity refers to the amount of memory the algorithm requires to store data throughout each step. Space complexity is not a major concern in many cases due to the large memory capacities in modern computers. Time complexity, the duration required to complete a computing task, is, however, a consideration for algorithm selection [32]. Time complexity is typically defined within the construct of a theoretical random access machine (RAM). The RAM counts every primitive operation performed within an algorithm such as addition, multiplication, assignment, etc. Running time is the number of primitive operations required to perform all tasks in an algorithm for a specific problem, and is closely related to time complexity. It is assumed that the running time for an algorithm of n primitive operations is cn, where c

is a constant related to a computer's rate of performing primitive operations. There is variability, however, in the running time of algorithms on equally sized problems. For instance, a sorting algorithm may require fewer subroutines for a data set that is nearly sorted than for a data set that is completely random. Therefore, O(n), pronounced Big-Oh notation, is used to describe the worst case of an algorithm on any problem with size n. Computational complexity is the standard proxy for runtime when comparing algorithm performance [32].

Accuracy.

Accuracy, sometimes referred to as recognition rate, provides the data analyst with the overall proportion of correct classifications by a model. For a binary classification problem we define the two classes as positive and negative. Accuracy is defined in Equation 16 as the sum of true positive and true negative classifications divided by the total number of observations which were classified [16].

A = \frac{TP + TN}{P + N}   (16)

Alternatively, this information can be reported as the error rate, where error rate = 1 − accuracy. The accuracy metric does not account for potential imbalance of positive and negative tuples in the test data [16]. Imagine 99% of tuples in the test set are dogs, and 1% are cats. The model may correctly classify all dogs and incorrectly classify all cats but still reflect 99% accuracy. Optimizing a learning model via a loss function related to accuracy may incentivize a base learning algorithm to develop a bias toward the class with more instances. In this case the balanced accuracy metric is preferred because it is centered about each class [33]. [33] shows that balanced accuracy can move beyond point estimates and provide confidence intervals of classification performance in the population of data sets. The formula for the commonly

used point estimate is shown in Equation 17.

A_{adj} = \frac{1}{2}(A_P + A_N)   (17)

Recall.

The recall, also known as sensitivity, of a classifier is the true positive rate of detection for the positive class. The formula for sensitivity is shown in Equation 18.

\mathrm{recall} = \frac{TP}{TP + FN}   (18)

Unlike accuracy, which reflects the classifying performance for all classes, recall reports performance for only one class which may be of particular importance. For example, recall is of more importance than accuracy and balanced accuracy in a model that predicts cancer because failure to identify a true positive tuple results in an undiagnosed cancer patient. Similarly, specificity quantifies the rate of true negatives [16]. It is defined by Equation 19.

\mathrm{specificity} = \frac{TN}{TN + FP}   (19)
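The four metrics in Equations 16 through 19 can all be read off a confusion matrix, as the following sketch with invented labels shows.

# Accuracy, recall (sensitivity), specificity, and balanced accuracy from a confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)    # Equation 16
recall      = tp / (tp + fn)                     # Equation 18
specificity = tn / (tn + fp)                     # Equation 19
balanced    = 0.5 * (recall + specificity)       # Equation 17
print(accuracy, recall, specificity, balanced)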

Model dimensionality.

Incorporating superfluous complexity into an analytical model harms its statistical legitimacy and makes it difficult to interpret. A model with excessive complexity will properly represent the training data but lacks the statistical properties to predict the response of unobserved data [21]. [25] instructs that a [regression] model should always be built to the least complexity that accurately represents the system. Note that a regression model of n−1 terms can always be fit through n data points. Although such a model will exhibit low error on the training data set, it does

not represent the underlying nature of the system and does not serve as an effective predictor [25].

Overfitting.

Overfitting is the phenomenon characterized by an analytical model that is well suited to describe the training data but is unable to perform well on data not observed in the training set. Empirical risk is defined as the optimal value of the loss function regularized by the number of observations in the set. Structural risk is defined as the difference in the empirical risk yielded by the training data set and a test set. A model is said to exhibit overfitting if the structural risk is very high. This indicates the model lacks the underlying statistical nature of the data and will perform poorly as a predictor [21]. There are two causes of overfitting. First, a model with excessive complexity tends to describe data well but is unable to effectively predict using any unseen data, regardless of statistical similarities between the training data set and validation data set. Second, a model trained on data which lacks the required statistical information is unable to predict with unseen data [21].

2.9 Meta Learning

Background of Meta-Learning.

Rice’s algorithm selection framework was presented in 1976 [8]. The framework is performed by employing all algorithms under consideration on all problems in a problem set. One or more performance metrics are chosen, and the performance of each algorithm on each problem is reported. Upon completion of the process, the preferred algorithm for each problem is taken as the one with the best performance metrics [8]. [34] presents a modern depiction of Rice’s framework as phase 1 in figure

1. The classic approach of learning algorithms is known as base learning: a machine learning algorithm builds a data driven model for a specific application [35]. Meta-learning, however, is an approach introduced by [36] in which algorithms learn about the learning process itself. A meta-learning algorithm extracts meta-features f(x) ∈ feature space F from a problem x ∈ problem space P. The meta-model is trained to recommend the best known base learning algorithm a ∈ A to solve x. Additional works such as [37] and [38] further contributed to the theory of meta-learning recommendation systems [35]. In 2014, [39] proposed the concept of applying meta-learning to Rice's model. It was not until 2016, however, that [2] implemented the concept. Figure 1 demonstrates that Cui et al. trained a meta-learning model to correlate problem features to algorithm performance and that the trained model could be used to recommend the best algorithm for unobserved problems within Rice's framework. The meta-learner correctly recommended the best algorithm in 91 percent of test problems. Further, it demonstrated that the time to perform algorithm selection could be reduced from minutes to seconds compared to trial-and-error techniques [2].

Recent Work In Meta-Learning.

[40] proposed landmarking as a novel training strategy for meta-learners. In lieu of training via feature extraction, landmarking determines the geometrical location of a problem instance in the space of all possible problems by testing each problem instance's performance against a baseline learning algorithm. The meta-learner recommends a learning algorithm to be paired with each actual problem instance. Initial findings indicate some success and may warrant future research [40]. Ler et al. has explored the use of clustering analysis to produce meta-features representative of data complexity. Ler shows that purity ratio, size distance, and distance are representative of data complexity and that data complexity is correlated to base learner performance [30]. Similar results were achieved by [41].

Figure 1. The meta-learner adaptation of Rice's framework [34]

Evaluation of Recommendation System Performance.

Analysis is performed to rate the quality of the recommendation system based on the performance metrics listed above. Evaluation of recommendation system performance is a major driver for improving recommendation systems such as [2]. The following techniques address this topic.

Spearman’s Rank Correlation Coefficient.

Spearman's Rank Correlation Coefficient, Equation 20, is a measure of similarity between two ranking schemes for members of a set [2]. [2] utilizes the Spearman's Rank Correlation Coefficient to measure agreement between the meta-learner's predicted ranking of algorithms by performance and the observed algorithm ranking

by performance for a problem. d_i is defined as the difference in assigned rank for algorithm A_i, and n is the number of elements ranked.

\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}   (20)

Perfect rank matching between two ranking schemes produces an SRC of 1, while two opposite ranking schemes produce an SRC of −1, and two uncorrelated ranking schemes are characterized by an SRC of 0. In the case that no ties are present, Spearman's coefficient produces an equivalent value to the widely used Pearson's correlation coefficient when calculated for the ranking scheme, but Spearman's is preferred due to computational simplicity [42] and relaxed statistical assumptions [43]. Likewise, Spearman's coefficient is preferred in the case of moderate or many ties because the difference in the two statistics is negligible [42]. Hypothesis testing is used to determine if the calculated correlation is statistically significant. The null hypothesis states the paired random variates are mutually independent, i.e., the correlation is 0; the alternative hypothesis explicitly states the type of dependency: “either (a) there is a tendency for the larger values of X to be paired with the larger values of Y, or (b) there is a tendency for the smaller values of X to be paired with the larger values of Y [42].” The test is performed by selecting ρ as the test statistic. The critical values for testing the null hypothesis are presented in the table of quantiles of the Spearman's statistic as a function of n and p, the quantile of the standard normal variable. That is to say, the table is a measure of how extreme the statistic is at a specified confidence. For a two-tailed test, reject the null hypothesis if the test statistic is either greater than the corresponding critical value for p = 1 − α/2, or less than the symmetrical critical value for p = α/2.
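In practice the coefficient and its significance test can be obtained directly, as in the sketch below; the two rankings are hypothetical, not results from this research.

# Spearman's rank correlation (Equation 20) between a predicted and an observed ranking.
from scipy.stats import spearmanr

predicted_rank = [1, 2, 3, 4, 5]      # meta-learner's ranking of five algorithms
observed_rank  = [2, 1, 3, 5, 4]      # ranking by observed performance
rho, p_value = spearmanr(predicted_rank, observed_rank)
print(rho, p_value)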

Hit Ratio.

The hit ratio, proposed by [35], is defined as the percentage of trials in which a meta-learner correctly recommends the best performing algorithm for a problem. The metric is akin to the true positive rate, or recall, of a classifying base learner. The metric is shown in Equation 21

R_{hit} = \frac{\sum_{i=1}^{n} h_i}{n},   (21)

where n is the number of problems and h_i is 1 if the recommendation is correct and 0 otherwise.
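A minimal sketch of Equation 21 follows; the recommendation and best-performer lists are hypothetical examples.

# Hit ratio: fraction of problems for which the recommended algorithm was the best performer.
def hit_ratio(recommended, best):
    hits = sum(1 for r, b in zip(recommended, best) if r == b)
    return hits / len(best)

recommended = ["SVM", "Naive Bayes", "KNN", "Decision Tree"]
best        = ["SVM", "Decision Tree", "KNN", "Decision Tree"]
print(hit_ratio(recommended, best))   # 3 hits out of 4 -> 0.75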

III. Methodology

This section communicates the methodology used to solve the problem statement: “create an algorithm selection technique for human analysts that also develops the theoretical intuition for meta-learners.” Notably, the criteria were identified prior to designing the solution and prior to performing any experimentation. This methodology includes the criteria and an outline of the experimental strategy.

3.1 Criteria

A set of criteria was defined to include all of the desirable traits of a solution to the problem statement. Alternative solutions are referred to as frameworks for the algorithm selection problem. Therefore, the following traits define the criteria for the algorithm selection framework under development:

Framework.

• Leverages a taxonomy. The framework must discriminate machine learning techniques by both their intended applications and their internal mechanics. The framework therefore must interface with a comprehensive taxonomy containing all the algorithms under consideration.

• Maps to specific recommendation(s). The framework should produce a rank ordered list of the specific algorithms appropriate for each problem, not just a set of acceptable choices or a statement of guidance. The framework is not, however, required to set hyper-parameters or provide tuning guidance. Therefore, it is acceptable to consider default parameter settings for all algorithms.

• Recommended algorithm performs well. The framework must recommend algorithms that are applicable to the intended task. Furthermore, the recommended

algorithms are expected to produce high quality results according to an appropriate performance metric such as recall, MSE, or accuracy. An excellent recommendation is defined as one which produces results within 5 percent of the best observed performance. A good recommendation is within 10 percent; a satisfactory recommendation is within 20 percent, and a poor recommendation is not within 20 percent of the best observed performance.

• Rigorous and repeatable process. The framework should remove subjectivity from the algorithm selection process. The framework should produce the same recommendation each time it is implemented on a particular data set. The recommendation should be made based on known information, not the practitioner's intuition.

• Fast recommendation. The time required for an analyst to perform the algorithm selection should be negligible in the scope of the project. Specifically, the recommendation time must be an order of magnitude shorter in duration than the analysis algorithm.

• Aids a human analyst. The framework should be easy for a human analyst to implement without any ancillary training. It should mitigate the conventional trial and error procedure for algorithm selection.

• Supports meta-learning problem. The algorithm selection framework must employ a logical decision process in the most efficient way possible. Studying this logic will provide insight into the logic a black box meta-learner may be using to recommend an algorithm. Therefore, if advantageous, aspects of the framework can be incorporated into a meta-learner hybrid model.

Taxonomy.

• Distinguishes techniques by application. The taxonomy must describe algorithms by application as one aspect of providing a quick and intuitive reference for algorithm selection.

• Distinguishes techniques by mechanism. The taxonomy must characterize algorithms by their mathematical model to provide information for predicting algorithm performance. The analyst may assess the compatibility of the mathematical model with aspects of the problem characterization.

• Distinguishes techniques by training style. The taxonomy shall identify the compatible training styles for each technique to inform whether the training data may be provided in a single event, or in successive events.

• Addresses data characteristics. A characterization of the feasible and ideal data features that are compatible with each technique will aid the alignment of techniques to problems. Proper alignment will facilitate good performance metrics.

• Hierarchical structure. A hierarchically structured taxonomy is necessary to clearly organize the taxonomy, to encapsulate the necessary information, and to allow growth over time.

• Comprehensive. The taxonomy must include all commonly used techniques in order to be a useful reference to the analyst.

• Expandable. The taxonomy needs to grow as new techniques emerge and as new technique attributes are deemed necessary to characterize.

Table 1 compares the strengths and weaknesses of several existing analytic frameworks against the criteria above. A green colored box indicates the criterion is

fully met; yellow indicates a criterion is partially met; red indicates that a criterion is poorly met or not addressed at all. Note that none of the frameworks provide a sufficiently rigorous and repeatable recommendation; none of the frameworks aid the understanding of meta-learning recommendation systems. The Analytics Body of Knowledge Framework leverages a taxonomy of analytical techniques which leads the analyst to only consider a subset of applicable techniques for each problem. This often results in analysis that properly solves the correct problem, but does not necessarily identify the best performing technique. The Analytics Body of Knowledge Framework did not, however, remove subjectivity from the recommendation, and was therefore not repeatable. The framework presented in the Field Guide provides the most guidance for matching a problem to a technique. Still, this guidance is largely unspecific, lacking quantitative metrics. Conflicting recommendations could be generated from this guidance depending on its interpretation. Further, the framework from the Field Guide occasionally leads the analyst to techniques which would not be appropriate for the analysis problem. Table 2 compares two existing taxonomies of analysis techniques. Neither of the alternatives sufficiently categorizes techniques by both their application and their mechanism. The taxonomies do properly address learning style. Finally, the taxonomies do not cover a comprehensive scope of all relevant analysis techniques in the universe of analysis.

3.2 Proposed Framework

The proposed framework is derived from discussions regarding how most analysts select a machine learning algorithm for a problem. Evidently, many analysts become comfortable with only a small fraction of the available analysis techniques and often neglect to consider all appropriate algorithms for a problem.

Table 1. Comparison of the frameworks reviewed

Therefore, the framework is built to guide the analyst to the correct technique, agnostic to any personal bias. This approach follows from the stated criteria. The framework should be implemented within the analysis process in order to identify the appropriate analytical approaches and recommend specific analytical techniques. Figure 2 shows that, within the analysis process, four factors are identified which drive the analytical approach and analytical technique selection. They are the input to the algorithm selection framework.

Characterizing the Problem.

The framework is a mechanism to characterize an analysis problem and to determine the algorithms that best match the problem characterization. The four factors each drive analytical approach selection and analytical technique selection in a different way. The factor assigned task pertains to the problem provided by management or a decision maker.

Table 2. Comparison of the taxonomies reviewed

The analyst must decipher the intent of the assignment from the lexicon of the manager into specific analytical terms, which are listed under the Task factor. This list of terms, called considerations, is shown in Figure 3 for each factor. The considerations for the factor data describe the different formats in which analysts commonly receive data for analysis problems. The data factor is important because it relates to the problem's compatibility with the mathematical mechanics of the analysis technique. Likewise, the considerations for the resources factor help the analyst identify which algorithms are compatible with the available resources. Finally, the factor of analyst skill characterizes the human analyst's abilities, which also impact algorithm selection. Education level is used as a coarse proxy for analyst skill level [19]. In reality, work experience and problem solving skills are also relevant considerations, but they are not addressed in this framework due to the subjectivity involved in capturing them.

Figure 2. The factors identified in this research are superimposed with the stages of analysis which they impact, i.e., determine analysis approach and determine technique

The analyst should refer to Figure 3 to evaluate and record the considerations for each factor prior to beginning step 1.
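For concreteness, the four factors and their recorded considerations can be captured in a small data structure before step 1 begins. The sketch below is illustrative only; the field values shown are hypothetical placeholders, and the full consideration lists live in Figure 3.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProblemCharacterization:
    """Records the four factors that drive approach and technique selection."""
    assigned_task: List[str] = field(default_factory=list)  # terms deciphered from the tasking
    data: List[str] = field(default_factory=list)           # formats/characteristics of the data received
    resources: List[str] = field(default_factory=list)      # computing resources available
    analyst_skill: str = ""                                  # education level as a coarse proxy

# Hypothetical example of a characterization recorded prior to step 1
problem = ProblemCharacterization(
    assigned_task=["predict a binary outcome"],
    data=["tabular", "binary target"],
    resources=["laptop workstation"],
    analyst_skill="masters",
)
```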

Figure 3. The considerations are shown for each factor which drives analytical approach and analytical technique selection

Step 1: Map Problem to Category and Approach.

Step 1 leverages information from the problem characterization to identify the appropriate analytical approaches. Each consideration selected from the assigned task factor maps to one or more categories of analysis. The categories of analysis describe the general goal of the analysis problem [19]. Each category of analysis can be implemented by certain analytical approaches. An analytical approach is a technique class referring to the specific type of response the techniques produce. Therefore, the framework leverages a hierarchical taxonomy that groups techniques by both category of analysis and analytical approach. Figure 4 shows the mapping from assigned task to category of analysis, and the mapping of category of analysis to analytical approach; a minimal code sketch of these two mappings follows. An alternate representation is shown in Table 3, where the colored boxes indicate compatibility between the category of analysis and analytical approach.
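The sketch below encodes the two step-1 mappings as lookup tables. Only the predictive category and its three approaches (regression, classification, and multivariate) are taken from this thesis; the task name used as a key is a hypothetical placeholder, since the full lists of tasks and categories appear in Figures 3 and 4.

```python
# Hypothetical mapping from an assigned-task consideration to categories of analysis
task_to_category = {
    "predict a binary outcome": ["predictive"],
}

# Mapping from category of analysis to compatible analytical approaches
# (the predictive row follows Chapter 4: regression, classification, multivariate)
category_to_approach = {
    "predictive": ["regression", "classification", "multivariate"],
}

def map_task(task: str) -> set:
    """Step 1: return the analytical approaches compatible with an assigned task."""
    approaches = set()
    for category in task_to_category.get(task, []):
        approaches.update(category_to_approach.get(category, []))
    return approaches

print(map_task("predict a binary outcome"))
# prints the set {'regression', 'classification', 'multivariate'} (order may vary)
```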

Figure 4. The 11 possible assigned tasks all map into one or more of the categories of analysis, which are listed on the far left. Each category of analysis maps to one or more analytical approaches on the far right.

Table 3. The matrix view of mapping from category of analysis to analytical approach.

An excerpt of the proposed taxonomy is presented in Figure 5. The taxonomy is built with an object-oriented structure to promote flexibility and expandability. As an example, techniques are shown within the regression and classification analytical approaches. The text predictive and descriptive appears at the bottom edge of the regression panel to indicate that regression techniques produce results suitable for either of these two categories of analysis. Applicable considerations are listed below each factor on the panel for each technique. Compatible training styles are listed to the right of the technique name. The object oriented structure allows new techniques to be easily added and new attributes to be included as necessary.
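The object-oriented structure described above can be sketched as a small class hierarchy. The attribute names and the example values below are illustrative assumptions; the thesis's taxonomy carries this information in the panels of Figure 5.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Technique:
    name: str
    training_styles: List[str]      # compatible training styles listed beside the technique name
    considerations: List[str]       # applicable considerations listed under each factor

@dataclass
class AnalyticalApproach:
    name: str                       # e.g., "regression" or "classification"
    categories: List[str]           # categories of analysis the approach supports
    techniques: List[Technique] = field(default_factory=list)

# Regression supports both the predictive and descriptive categories (per Figure 5)
regression = AnalyticalApproach(name="regression", categories=["predictive", "descriptive"])

# New techniques and attributes are added without restructuring the taxonomy
regression.techniques.append(
    Technique(name="support vector regression",
              training_styles=["batch"],                    # hypothetical value
              considerations=["continuous predictors"])     # hypothetical value
)
```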

Step 2: Score Techniques.

The framework thus far identifies a subset of techniques which are compatible with the problem according to application. Next, the framework leverages the remaining three factors (data, resources, and experience) to discern aspects of technique compatibility relating to the mechanics of the mathematical model.

Figure 5. A portion of the proposed taxonomy is highlighted to show the structure of the taxonomy

The techniques are rank ordered by level of compatibility for each of the remaining three factors. The most preferred technique for each factor is assigned the highest ordinal score, and ties are resolved by assigning each tied technique the average of the tied scores.
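Ordinal scoring with averaged ties is the standard average-rank convention; a minimal sketch using scipy follows. The compatibility judgments shown are hypothetical and stand in for the per-factor assessment described above.

```python
from scipy.stats import rankdata

# Hypothetical compatibility judgments for five techniques under one factor
# (larger value = more compatible with the problem characterization)
compatibility = {"DT": 2, "RF": 4, "NB": 4, "SVM": 1, "SVR": 3}

# rankdata assigns 1 to the smallest value and averages tied ranks, so the most
# preferred technique receives the highest ordinal score and ties share the average
scores = dict(zip(compatibility, rankdata(list(compatibility.values()), method="average")))
print(scores)  # {'DT': 2.0, 'RF': 4.5, 'NB': 4.5, 'SVM': 1.0, 'SVR': 3.0}
```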

Step 3: Rank Recommendations.

The final recommendation score $s^{(j)}$ for each technique $j$ is shown below as the product of the scores $s_k^{(j)}$ for each factor $k$. The weights of the scores for each factor are assumed equivalent for this study. The techniques are then ranked by their final score, where the highest number is most preferred.

$$s^{(j)} = \prod_{k=1}^{3} s_k^{(j)} \qquad (22)$$

A decision tree is used to assign the ordinal scores for each technique within each of the data and resources factors. The decision tree is built from features of the data.

Notably, the features pertaining to data also impact the compatibility of techniques with respect to resources. Therefore, it is justified to use the same decision tree, Figure 6, to adjudicate the scores for both factors. A separate decision tree could be produced for each factor; however, it is not necessary to prove the concept. The ranking logic for analyst skill is omitted from this study due to the automation incorporated into the framework, which is discussed in Chapter 4.
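Under the equal-weight assumption, Equation 22 reduces to multiplying the three factor scores for each technique and sorting. The sketch below uses hypothetical factor scores; only the convention that the highest product is the top recommendation comes from the framework itself.

```python
import numpy as np

techniques = ["DT", "RF", "NB", "SVM", "SVR"]

# Hypothetical ordinal scores s_k^(j) for the three factors
factor_scores = np.array([
    [3, 4, 5, 1, 2],   # data factor (assigned by the decision tree in Figure 6)
    [3, 4, 5, 1, 2],   # resources factor (adjudicated by the same tree, as discussed above)
    [3, 3, 3, 3, 3],   # analyst skill factor (held constant in this study)
])

# Equation 22: the final score is the product of the factor scores for each technique
final_scores = factor_scores.prod(axis=0)

# The highest product is most preferred; rank 1 is the top recommendation
order = np.argsort(-final_scores)
for rank, idx in enumerate(order, start=1):
    print(rank, techniques[idx], final_scores[idx])
```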

Figure 6. Decision tree used to assign a preference rank for each technique in regards to the data factor

3.3 Data Sets

The experimental process is performed using nine unique data sets. Each data set is pre-processed such that a binary target is placed in the first column; a short pre-processing sketch follows the list below. The following list outlines the assigned task for each data set and references the source.

1. Heart: Predict presence of heart disease from 13 predictor variables [44]

2. Framingham: Predict presence of heart disease in the Framingham study from 15 predictor variables [45]

3. Spam: Predict if an email is spam based on six predictor variables [46]

4. Loan: Predict whether a consumer purchases a loan from Thera Bank based on 12 predictor variables [47]

5. PMESII: Predict whether the sumintensityofwar metric surpassed a threshold of five for each country each year. The data are a compilation of AFIT's Political, Military, Economic, Social, Information, Infrastructure (PMESII) data set [48], the Correlates of War data set [49], and the Armed Conflict data set, totaling 408 predictor variables

6. Cancer: Predict whether a patient has breast cancer from 30 predictor variables collected in a fine needle aspirate procedure [50]

7. Urinalysis: Predict whether a patient is experiencing formation of calcium ox- alate crystal based on six predictor variables [51]

8. Colleges: Data used to predict whether a college is public or private based on 17 predictor variables [52]

9. Election: Data used to predict the electoral vote for each state in the United States based on five predictor variables [53]
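As noted before the list, each data set is arranged so the binary target occupies the first column. The sketch below shows one way this pre-processing might be done; the file path and target column name are hypothetical placeholders, not values from the thesis.

```python
import pandas as pd

def place_target_first(path: str, target: str) -> pd.DataFrame:
    """Move the binary target into the first column, as the framework code expects."""
    df = pd.read_csv(path)
    cols = [target] + [c for c in df.columns if c != target]
    return df[cols]

# Hypothetical usage for one of the nine data sets
# heart = place_target_first("heart.csv", target="target")
```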

Efficiency.

Recall efficiency is presented as a single value to assess the success of a recommendation for the metric of recall. Equation 23, below, shows that the recall efficiency is calculated as the recall of the top recommended technique divided by the recall of the technique with the best observed recall. A technique obtains a recall efficiency of 100 percent if it generates the best observed recall. Otherwise, the recall efficiency for a technique is greater than 0 percent and less than 100 percent, commensurate with its observed recall.

$$E_R = \frac{R_{bestRec}}{R_{bestObs}} \qquad (23)$$
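Equation 23 is a single division; a minimal sketch is shown below, using the Spam recall values reported later in Table 4 as an example.

```python
def recall_efficiency(recall_top_recommended: float, recall_best_observed: float) -> float:
    """Equation 23: recall of the top recommendation relative to the best observed recall."""
    return recall_top_recommended / recall_best_observed

# Spam data set from Table 4: 0.05 / 0.79 is roughly 6 percent
print(f"{recall_efficiency(0.05, 0.79):.0%}")
```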

3.4 Software and Packages

This research is implemented entirely within the Spyder 3.1.2 integrated development environment using the Python 3.6.0 kernel. The open-source Python library scikit-learn is used to access the machine learning functions DecisionTreeClassifier, RandomForestClassifier, MultinomialNB, svm.SVC, SVR, classification_report, and mean_squared_error. Additional functions are used for data pre-processing. The experimental section details how these functions are implemented as classifiers.
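The corresponding imports would look roughly like the sketch below. The module paths shown are the standard scikit-learn locations and are an assumption here; exact paths may differ slightly across scikit-learn releases.

```python
from sklearn import svm, preprocessing               # svm.SVC and MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVR
from sklearn.metrics import classification_report, mean_squared_error
from sklearn.model_selection import train_test_split
```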

Workstation Specifications.

A Dell Precision 5540 mobile workstation was used for all experimentation and analysis in this study. The workstation ran Windows 10 Enterprise with an Intel i9-9980 CPU running at 4.8 GHz.

IV. Experimental Results and Analysis

4.1 Experimental Results

The recommendation system is implemented on nine binary classification data sets according to the framework described in Chapter 3. Problem characterization is performed by the research team for each of the nine data sets by reading the data set description and adjudicating an assigned task for each data set. In step 1, all nine data sets are mapped to the predictive category of analysis. Therefore, the analytical approaches assigned to each data set are regression, classification, and multivariate. Figure 7 shows that scores of one through five are assigned to the factors data and resources in accordance with the decision tree shown in Figure 6. All techniques are assigned the same score for analytical skill, effectively nullifying the factor. All factors are weighted equally for the final recommendation score. The eligible analytical techniques are ranked from one, highly recommended, to five, least recommended, based on their final recommendation score. Figure 7 reveals that the taxonomy is not comprehensive. The four classification techniques, two multivariate techniques, and two regression techniques used in this study are representative of how the framework is applied to an expandable taxonomy of techniques.

Figure 7. Step 2, scoring, is performed for each factor. Step 3, overall technique ranking, is performed for the Heart data set.

Table 4 reports the framework's recommendations and several analysis metrics. The first line of Table 4 provides the recommended ranking scheme for each data set. The rank is provided for the decision tree, random forest, Naïve Bayes, support vector machine, and support vector regression techniques, in that order, as these were the recommended techniques for each problem. The next line reports the rank scheme for each data set according to the observed recall. The Spearman's coefficient of rank correlation between the two rank schemes is reported in the next line of the table. Eight data sets demonstrate positive correlation and one data set demonstrates a negative correlation; none of these figures are statistically significant using a two-tailed hypothesis test at the 5 percent significance level. The metric of correlation describes the consistency in ranking for both high performing and low performing techniques. Since, in practice, the framework need only implement the top performing technique, the correlation for lower rankings is immaterial. Therefore, the performance of the recommendation system is best understood by assessing the performance of the top recommendation. Accordingly, we attribute greater consideration to the True Hit Ratio, which conveys that a perfect agreement between the top recommended and top performing techniques is observed for four of nine data sets. Additionally, the “Good Hit Ratio” conveys a good (or better) top recommendation for seven of nine

data sets. The average recall efficiency for the top recommendation is 79 percent, which is considered poor, though this figure is skewed downward by two extremely low recall efficiencies. The worst recall efficiency is recorded for the Spam data set, for which the Naïve Bayes algorithm is the most highly recommended technique and also the worst performing. Interestingly, Naïve Bayes is a Pareto efficient solution for the Spam data set when considering the secondary metric of run time. Three of nine recommendations are Pareto efficient. Finally, run time is reported in Table 4 for the top recommended technique as well as for the technique with the best observed recall.
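The Spearman coefficients in Table 4 compare the recommended and observed rank schemes for each data set. The sketch below reproduces the Heart value using scipy and the two Heart schemes from the table; the rank lists are ordered as decision tree, random forest, Naïve Bayes, support vector machine, support vector regression.

```python
from scipy.stats import spearmanr

# Heart data set rank schemes from Table 4, ordered as [DT, RF, NB, SVM, SVR]
recommended_ranks = [4, 3, 2, 5, 1]   # 43251
observed_ranks    = [5, 4, 3, 2, 1]   # 54321

rho, p_value = spearmanr(recommended_ranks, observed_ranks)
print(round(rho, 2))   # 0.4, matching the Heart entry in Table 4
print(p_value)         # well above 0.05, i.e., not statistically significant
```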

Table 4. Table of the complete results

Metric | Heart | Fram | Spam | Loan | PMESII | Cancer | Urinalysis | Colleges | Election | †Avg / *Cnt
Recommended Rank Scheme | 43251 | 31245 | 42135 | 31245 | 31245 | 31245 | 42135 | 31245 | 42135 |
Observed Rank Scheme | 54321 | 23415 | 21534 | 41151 | 21435 | 54231 | 12534 | 43152 | 54131 |
Spearman Rank Correlation | 0.40 | 0.10 | -0.10 | 0.34 | 0.70 | -0.50 | -0.30 | 0.20 | 0.10 | 0.10†
Recall of Top Recommended | 0.91 | 0.54 | 0.05 | 1.00 | 0.99 | 0.96 | 0.14 | 0.95 | 1.00 | 0.72†
Best Observed Recall | 0.91 | 0.57 | 0.79 | 1.00 | 0.99 | 1.00 | 0.71 | 0.99 | 1.00 | 0.89†
True Hit | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE | 4/9†
Recall Efficiency of Recommended | 100% | 93% | 6% | 100% | 100% | 96% | 20% | 96% | 100% | 79%†
Good Recall Efficiency (≥ 90%) | TRUE | TRUE | FALSE | TRUE | TRUE | TRUE | FALSE | TRUE | TRUE | 7/9†
Name of Top Recommended | SVR | RF | NB | RF | RF | RF | NB | RF | NB | RF: 5*
Name of Best Observed Recall | SVR | SVM | RF | RF, NB, SVR | RF | SVR | DT | NB | NB, SVR | SVR: 4*
Run Time of Top Recommended | 0.002 | 0.030 | 0.001 | 0.009 | 0.260 | 0.012 | 0.001 | 0.013 | 0.001 | 0.037†
Run Time of Best Observed Recall | 0.002 | 0.330 | 0.025 | 0.009, 0.001, 0.0002 | 0.263 | 0.007 | 0.001 | 0.001 | 0.001, 0.001 | 0.053†
Pareto Efficient (Recall/Run Time) | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE | 3/9†
(Rank schemes are ordered as decision tree, random forest, Naïve Bayes, support vector machine, support vector regression. † denotes the average across the nine data sets; * denotes a count.)

Two plots are generated to visually convey the performance of the recommendation system on each data set. The bar plots, Figures 8, 10, and 12, compare mean recall for each of the recommended techniques. The most highly recommended technique is represented with a red bar and all other recommended techniques are shown in blue. The 95 percent two-tailed confidence interval for mean recall is represented with whiskers emanating from the top of the bar.
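The whiskers described above are standard two-tailed t-intervals on the mean recall over the trials. A minimal sketch follows; the recall values are hypothetical stand-ins for one technique's 20 trials.

```python
import numpy as np
from scipy import stats

recalls = np.array([0.90, 0.92, 0.88, 0.91, 0.93] * 4)   # hypothetical recall from 20 trials

mean = recalls.mean()
sem = stats.sem(recalls)                                  # standard error of the mean
low, high = stats.t.interval(0.95, len(recalls) - 1, loc=mean, scale=sem)
print(f"mean recall {mean:.3f}, 95% CI ({low:.3f}, {high:.3f})")
```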

Figure 8. Mean recall for the Heart data set. SVR is a hit.

Figure 9. Mean recall and mean run time for the Heart data set. SVR is Pareto efficient because it dominates recall.

Figure 10. Mean recall for the Spam data set. Na¨ıve Bayes is the top recommendation and produces the worst recall of all recommendations.

Figure 11. Na¨ıve Bayes is a Pareto efficient solution because it dominates run time.

Figure 12. Mean recall for the Election data set. Na¨ıve Bayes is a hit.

Figure 13. Despite producing the ideal recall and a fast run time, the recommended technique, Na¨ıve Bayes, is not a Pareto efficient solution

The multi-objective plots, Figures 9, 11, and 13, present the primary objective, maximize recall, plotted against the secondary objective, minimize run time. The most highly recommended technique is identified with a red dot; all other recommended techniques are represented with blue dots. If a Pareto frontier exists, it is represented as a grey line. Otherwise, one solution is shown that dominates both objectives. The recall plot and the multi-objective plot of the Heart data set are shown in Figure 8 and Figure 9, respectively. This set of results demonstrates an excellent overall outcome from the recommendation framework. In Figure 8, the bar corresponding to SVR is red to indicate it is the top recommended analysis technique. The height indicates SVR generated better recall than all other recommendations. The whiskers at the top of the bar indicate the 95 percent confidence interval about the mean of the recall recorded in 20 trials. Figure 9 represents SVR as a red point, also to indicate it is the top recommended technique for the Heart data set. Since no technique outperforms SVR on both objectives, SVR is Pareto efficient. The Pareto frontier is shown in gray, connecting each Pareto efficient recommendation. The recall plot and the multi-objective plot of the Spam data set are shown in Figure 10 and Figure 11, respectively. These results demonstrate that the framework does occasionally provide a poor recommendation. Here, the top recommended technique, Naïve Bayes, yields the worst recall of all recommended techniques, with a recall efficiency of just 6 percent. This recommendation is shown to be Pareto efficient in Figure 11 due to its very low run time. Still, it is a bad recommendation. Naïve Bayes is the top recommendation for three of nine data sets. In two of these instances, Naïve Bayes performed very poorly, with recall efficiencies of 6 percent and 20 percent respectively, which indicates that the ranking logic used to recommend Naïve Bayes may not be optimized. The recall plot and the multi-objective plot of the Election data set are shown in

Figure 12 and Figure 13, respectively. For this data set, the framework succeeds in recommending an algorithm, Naïve Bayes, with perfect recall. In fact, three of the recommended analytic techniques exhibited excellent mean recall, and their recall was statistically equivalent at a confidence of 95 percent using a two-tailed hypothesis test. The 95 percent confidence intervals for all five recommended algorithms indicate the mean recall may be excellent. The reader may therefore ask whether the recommendation framework is necessary for this data set. A posteriori, it is revealed that three of the five recommendations are excellent. Still, the framework is necessary to systematically identify the five techniques. There is a statistical difference between the mean recall of the top recommendation and the two worst performing recommendations; this difference may be practically significant. Noting that an election model is most useful for the several states that are difficult to predict, it would behoove a newspaper editor to call election results using the model that exhibits a perfect true positive rate, even at the cost of an imperfect false positive rate. It is always preferred to recommend the best performing technique, even when several perform generally well. Interestingly, several data sets which yield hits are not Pareto efficient when multiple techniques reflect perfect recall. This occurs if the recommendation is dominated by other solutions for run time. The effect is observed for the Loan data set and the Election data set in Figures 17 and 13, respectively. In both cases, the recall and run time for the recommended technique are not practically different from those of the dominating technique. Charts depicting the results of all other data sets are included in Appendix B.
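Pareto efficiency here means that no other recommended technique achieves both higher recall and lower run time. The sketch below implements that dominance check; the (recall, run time) pairs are hypothetical and stand in for the values plotted in the multi-objective figures.

```python
def pareto_efficient(points):
    """Return indices of (recall, run_time) points not dominated by any other point.

    A point is dominated when another point has recall at least as high and run time
    at least as low, with at least one of the two strictly better.
    """
    efficient = []
    for i, (r_i, t_i) in enumerate(points):
        dominated = any(
            (r_j >= r_i and t_j <= t_i) and (r_j > r_i or t_j < t_i)
            for j, (r_j, t_j) in enumerate(points) if j != i
        )
        if not dominated:
            efficient.append(i)
    return efficient

# Hypothetical (recall, run time in seconds) pairs for five recommended techniques
points = [(0.91, 0.002), (0.85, 0.030), (0.88, 0.001), (0.80, 0.004), (0.91, 0.010)]
print(pareto_efficient(points))  # [0, 2]
```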

4.2 Evaluation of Taxonomy to Criteria

The taxonomy included in this study demonstrates that a recommendation framework benefits from leveraging a taxonomy. Table 5 shows it evaluated generally well

against the taxonomy criteria presented in Chapter 3 and serves the intended purpose within the scope of this research. The taxonomy excels at distinguishing algorithms by intended application. It exhibits a minor shortcoming in distinguishing techniques by mechanism. While the taxonomy does address the spirit of this criterion by characterizing each technique for compatibility with the data, skill, and resources of a problem, there is no specific classification of techniques by mechanism. The taxonomy receives full credit for addressing the compatibility of data with each technique through its analysis of data meta-features. The taxonomy does specify the applicable training styles for each technique. The taxonomy is constructed with a hierarchical relationship between characteristics. An object-oriented structure places each technique within an analytical approach, which in turn is mapped to a broader category of analysis. Attributes can be assigned to objects at any level of the hierarchy. The example used in this study is not comprehensive and must be expanded to incorporate all prevalent analysis techniques. Ultimately, the proposed taxonomy serves as a template and proof of concept. Fortunately, the taxonomy can be easily expanded in both breadth and depth. Future revisions of the taxonomy should include more techniques and more attributes for each technique. In particular, the taxonomy must be adapted to include techniques other than machine learning algorithms.

Table 5. Comparison of the reviewed taxonomies to the proposed taxonomy

4.3 Evaluation of the Framework

Analysis of Results.

The novel framework demonstrates several major improvements over existing frameworks. It successfully meets the intent of each criterion except recommendation performance. Table 4 shows the framework performs inconsistently across the data sets. On average, the Spearman's coefficient of rank correlation demonstrates a slight positive correlation. The highest level of utility for an algorithm recommendation system is to correctly rank order compatible recommendations by performance. This is a difficult task to optimize. We observe that the framework leverages predictive information but that the decision logic is not optimized. Therefore, the criterion "performs well" is partially met. The recommendations reflect an average recall efficiency of 79 percent, which is considered poor. Notably, the recommendations for seven of nine data sets have at least good recall efficiencies, and six of nine have excellent recall efficiencies. The framework is beneficial even when it does not produce a hit.

The framework consistently filters out techniques that are incompatible with the problem characterization. Further, the framework identifies five viable options, some of which perform excellently.

Evaluation of Framework to Criteria.

Table 6. Evaluation of Framework criteria for three reviewed frameworks and the proposed framework.

V. Conclusion

The proposed framework measures well against the stated criteria. The framework successfully filters inadequate analysis techniques from each problem and recommends good techniques in most cases. Although the framework's rank scheme of recommended techniques is positively correlated with the ranking of best observed techniques, the correlation is low. Fundamentally, the meta-data and user input collected by the framework do contain information capable of consistently predicting a good analysis technique for a problem. The process of problem characterization fits well into the framework but does require further refining. The decision tree used to generate rank schemes provided intelligible recommendation logic. The factor of analytical skill proved to be of no importance due to the automation incorporated into the framework. Future work should use the Gini criterion to optimize the recommendation logic and should expand the scope of techniques into other types of analysis problems. The results of this proposed study should be leveraged in order to better understand the behavior of meta-learning models. Aspects of the recommendation framework, such as the technique filtering process, may be incorporated into future meta-learning ventures.

VI. Appendix A

Table 7 lists the AFIT theses referenced in the literature review of this paper.

Table 7. AFIT theses reviewed during research

Thesis Title | Author | Advisor
Spectral Textile Detection in the VNIR/SWIR Band | James A. Arneal, Second Lieutenant | Lt Col Jeffrey D. Clark, PhD
A Metamodel Recommendation System Using Meta-Learning | Megan K. Woods, CTR | Professor Jeffery Weir

VII. Appendix B

Figure 14. Mean recall for the Framingham data set

Figure 15. Mean recall and mean run time for the Framingham data set

Figure 16. Mean recall for the Loan data set

Figure 17. Mean recall and mean run time for the Loan data set

Figure 18. Mean recall for the PMESII data set

Figure 19. Mean recall and mean run time for the PMESII data set

Figure 20. Mean recall for the Cancer data set

Figure 21. Mean recall and mean run time for the Cancer data set

Figure 22. Mean recall for the Urinalysis data set

Figure 23. Mean recall and mean run time for the Urinalysis data set

Figure 24. Mean recall for the Colleges data set

Figure 25. Mean recall and mean run time for the Colleges data set

VIII. Appendix C

8.1 Ranking Recommended Techniques for Data Factor

Characterize the Data Set

```python
import numpy as np
import pandas as pd
from numpy import linalg as LA
from sklearn import preprocessing

# Read in data set (filename_str is provided by the caller)
filename = filename_str
data = pd.read_csv(filename)
# Assign response variable as y (the binary target in the first column)
firstcolname = list(data)[0]
y = data[[firstcolname]]
del data[firstcolname]
# Min-max normalization of the predictor data
min_max_scaler = preprocessing.MinMaxScaler()
final_data = min_max_scaler.fit_transform(data)
# Get n (observations) and m (predictor variables)
n, m = np.shape(final_data)
big_set = n >= 10**3        # 1,000 observations or more
many_vars = m >= 10         # 10 predictor variables or more
# Get MajVarsCat: flag columns with many levels, then test whether they are the majority
type_vect = np.zeros((1, m))
for i in range(0, m):
    if data.iloc[:, i].nunique() >= 12:
        type_vect[0, i] = 1
data_categorical = np.mean(type_vect) >= 0.5
```

```python
# Get Condition: flag an ill-conditioned data matrix
if LA.cond(final_data) >= 10**5:
    ill_cond = True
else:
    ill_cond = False
```

Assign Preference Ranks to Techniques

```python
# Generate recommendation: rank lists are ordered as [DT, RF, NB, SVM, SVR]
# Big set (right side of the tree in Figure 6)
if big_set and many_vars and data_categorical and ill_cond:
    Ranks = [3, 1, 2, 4, 5]   # preference order: RF, NB, DT, SVM, SVR
elif big_set and many_vars and not data_categorical and ill_cond:
    Ranks = [4, 3, 2, 5, 1]   # preference order: SVR, NB, RF, DT, SVM
elif big_set and many_vars and not data_categorical and ill_cond:
    # Note: this branch repeats the previous condition and is therefore never reached
    Ranks = [3, 1, 2, 4, 5]   # preference order: RF, NB, DT, SVM, SVR
elif big_set and many_vars and not data_categorical and not ill_cond:
    Ranks = [3, 2, 1, 4, 5]   # preference order: NB, RF, DT, SVM, SVR
# Small set (left side of the tree)
elif not big_set and data_categorical:
    Ranks = [3, 2, 5, 1, 4]   # preference order: SVM, RF, DT, SVR, NB
elif not big_set and not data_categorical:
    Ranks = [4, 3, 5, 2, 1]   # preference order: SVR, SVM, RF, DT, NB
else:
    Ranks = [4, 2, 1, 3, 5]   # preference order: NB, RF, SVM, DT, SVR

rank_array = Ranks
```

Return Object of Rank Scheme and Data Characterization

```python
# Return an object that reports the characterization and recommended rank scheme
# (this sits at the end of the characterization routine, so `return` is valid here)
class Result:
    def __init__(self, ranks, ranks_df, bigset, manyvars, categorical, illcond):
        self.ranks = ranks
        self.ranks_df = ranks_df
        self.bigset = bigset
        self.manyvars = manyvars
        self.categorical = categorical
        self.illcond = illcond

result_obj = Result(rank_array, Ranks, big_set, many_vars, data_categorical, ill_cond)
return result_obj
```

Split Data to Training and Test Sets

```python
from sklearn.model_selection import train_test_split

# Read in new data set
data = pd.read_csv(filepath_str)
# Assign response variable as y
firstcolname = list(data)[0]
y = data[[firstcolname]]
del data[firstcolname]
# Min-max normalization of data
min_max_scaler = preprocessing.MinMaxScaler()
final_data = min_max_scaler.fit_transform(data)

# Enter loop for each replication
for i in range(1, num_reps + 1):        # generates num_reps iterations
    seed = 18 + i                        # fix random seed for reproducibility
    np.random.seed(seed)
    # Split the final data into train/test sets (80/20), stratified on the target
    x_final_train, x_final_test, y_final_train, y_final_test = \
        train_test_split(final_data, y, test_size=0.2, random_state=seed, stratify=y)
    y_final_train = y_final_train.values.ravel()
    # Enter modelling module
```

Create Metamodels of Data Set

```python
import time
from sklearn.tree import DecisionTreeClassifier

print('1/5: Creating Decision Tree Classifier', flush=True)
r = 1                                    # index number of the technique
start_dt = time.time()                   # record the time the decision tree begins
# Instantiate a DecisionTreeClassifier
# (defaults: max_depth grows until leaves are pure; default criterion is gini)
dt_final = DecisionTreeClassifier(random_state=seed)
# Fit dt to the training set
dt_final.fit(x_final_train, y_final_train)
# Predict the class of each observation of the test set
y_pred_final_DT = dt_final.predict(x_final_test)
# Record the time the decision tree completes
now = time.time()
durationmin_dt = round((now - start_dt) / 60)
durationsec_dt = round((now - start_dt) % 60)
duration_mat[r - 1, i - 1] = now - start_dt    # run time in seconds

print('The Decision Tree model and predictions have been generated in',
      '%2.2d:%2.2d' % (durationmin_dt, durationsec_dt), flush=True)
```
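The recall used as the primary metric can then be pulled from classification_report, one of the scikit-learn functions listed in Section 3.4. The sketch below is not part of the original listing; it continues from the decision tree predictions above, and the positive-class label "1" is an assumption about how the binary target is encoded.

```python
from sklearn.metrics import classification_report

# Recall of the positive class for this trial's decision tree predictions
report = classification_report(y_final_test.values.ravel(), y_pred_final_DT, output_dict=True)
recall_dt = report["1"]["recall"]        # assumes the binary target is encoded as 0/1
print("Decision tree recall:", round(recall_dt, 3))
```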

Bibliography

1. Wayne L Winston and Jeffrey B Goldberg, Operations Research: Applications and Algorithms, vol. 3, Thomson Brooks/Cole Belmont, 2004.

2. Can Cui, Mengqi Hu, Jeffrey D. Weir, and Teresa Wu, “A Recommendation System for Meta-modeling: A Meta-learning Based Approach,” Expert Systems With Applications, vol. 46, pp. 33–34, 2016.

3. Michael R. Smith, Logan Mitchell, Christophe G. Giraud-Carrier, and Tony R. Martinez, “Recommending Learning Algorithms and Their Associated Hyperparameters,” CoRR, vol. abs/1407.1890, pp. 1–2, 2014.

4. Can Cui, Teresa Wu, Mengqi Hu, Jeffery D Weir, and Xianghua Chu, “Accuracy vs. Robustness: Bi-Criteria Optimized Ensemble of Metamodels,” in Proceedings of the Winter Simulation Conference 2014. IEEE, 2014, pp. 616–627.

5. Timothy W Simpson, Jesse Peplinski, Patrick N Koch, and Janet K Allen, “On the Use of Statistics in Design and the Implications for Deterministic Computer Experiments,” Design Theory and Methodology, vol. 14, pp. 14–17, 1997.

6. G Gary Wang and Songqing Shan, “Review of Meta-Modeling Techniques in Support of Engineering Design Optimization,” Journal of Mechanical design, vol. 129, no. 4, pp. 370–380, 2007.

7. David H Wolpert, “The Supervised Learning No-Free-Lunch Theorems,” Soft Computing and Industry, pp. 25–42, 2002.

8. John Rice, “The Algorithm Selection Problem,” Advances in Computers, vol. 15, pp. 75–152, 1976.

9. Frank Rosenblatt, “The Perceptron, a Perceiving and Recognizing Automaton,” Tech. Rep., Cornell Aeronautical Laboratory, 1957.

10. Marvin Minsky and Seymour A Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press, 1969.

11. David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams, “Learning Representations by Back-Propagating Errors,” Nature, vol. 323, no. 6088, pp. 533–536, 1986.

12. Tom M. Mitchell, Machine Learning, McGraw Hill, New York, 1997.

13. James Arneal, “Spectral Textile Detection in the VNIR/SWIR Band,” M.S. thesis, Air Force Institute of Technology, 2015.

14. Department of Defense, Washington, DC, Summary of the 2018 National Defense Strategy of the United States of America, Jan 2018, Available at https://dod.defense.gov/Portals/1/Documents/pubs/2018-National-Defense-Strategy-Summary.pdf.

15. Fred Blackburn, Josh Sullivan, Peter Guerra, Angela Zutavern, Steve Escaravage, Ezmeralda Khalil, Steven Mills, Alex Cosmas, Brian Keller, Stephanie Beben, Kirk Borne, Drew Farris, Paul Yacci, Charles Glover, Michael Kim, Stephanie Rivera, and Aaron Sander, The Field Guide to Data Science, Booz Allen Hamilton, McLean, Virginia, 2015.

16. Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining Concepts and Tech- niques Third Edition, Morgan Kaufmann, Waltham, MA, 2012.

17. Kevin McGarigal, Samuel A Cushman, and Susan Stafford, Multivariate Statistics for Wildlife and Ecology Research, Springer Science & Business Media, 2013.

18. Desamparados Blazquez and Josep Domenech, “Big Data Sources and Methods for Social and Economic Analyses,” Technological Forecasting and Social Change, vol. 130, pp. 99–113, 2018.

19. James J Cochran, INFORMS Analytics Body of Knowledge, John Wiley & Sons, 2018.

20. Bernhard Boser, Isabelle Guyon, and Vladimir Vapnik, “A Training Algorithm for Optimal Margin Classifiers,” Proceedings of the Workshop on Computational Learning Theory, vol. 5, pp. 144–152, 1992.

21. José Luis Rojo-Álvarez, Manel Martínez-Ramón, Jordi Muñoz-Marí, and Gustau Camps-Valls, Digital Signal Processing with Kernel Methods, John Wiley & Sons, Inc., Hoboken, NJ, 2018.

22. J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

23. Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone, “Classification and Regression Trees,” Group, vol. 37, no. 15, pp. 237–251, 1984.

24. Xindong Wu and Vipin Kumar, The Top Ten Algorithms in Data Mining, CRC press, 2009.

25. Douglas C Montgomery, Elizabeth A Peck, and G Geoffrey Vining, Introduction to Linear Regression Analysis, vol. 821, John Wiley & Sons, 2012.

26. Harris Drucker, Christopher JC Burges, Linda Kaufman, Alex J Smola, and Vladimir Vapnik, “Support Vector Regression Machines,” in Advances in Neural Information Processing Systems, 1997, pp. 155–161.

27. Vladimir N Vapnik, The Nature of Statistical Learning Theory, Springer Science, New York, New York, 1995.

28. Brian Everitt and Torsten Hothorn, An Introduction to Applied Multivariate Analysis With R, Springer Science & Business Media, 2011.

29. John F. Kenney, Mathematics of Statistics, Chapman & Hall Ltd, Boston, MA, 1939.

30. D. Ler, H. Teng, Y. He, and R. Gidijala, “Algorithm Selection for Classification Problems via Cluster-Based Meta-Features,” in 2018 IEEE International Conference on Big Data (Big Data), Dec 2018, pp. 4952–4960.

31. Kevin Markham, “ROC Curves and Area Under the Curve (AUC) Explained,” Data School, Nov 2014, Available at https://www.dataschool.io/roc-curves-and-auc-explained/, Accessed 04 Oct 2019.

32. Christos H Papadimitriou, Computational Complexity, Addison-Wesley, 1994.

33. Kay Henning Brodersen, Cheng Soon Ong, Klaas Enno Stephan, and Joachim M Buhmann, “The Balanced Accuracy and its Posterior Distribution,” in 2010 20th International Conference on Pattern Recognition. IEEE, 2010, pp. 3121–3124.

34. Megan K. Woods, “A Metamodel Recommendation System Using Meta-Learning,” M.S. thesis, Air Force Institute of Technology, 2020.

35. Can Cui, Mengqi Hu, Jeffery D. Weir, and Teresa Wu, “A Recommendation System for Meta-Modeling: A Meta-Learning Based Approach,” Expert Systems with Applications, vol. 46, pp. 33–44, 2016.

36. Paul E Utgoff, “Shift of Bias for Inductive Concept Learning,” Machine Learning: An Artificial Intelligence Approach, vol. 2, pp. 107–148, 1986.

37. Larry A Rendell, Raj Sheshu, and David K Tcheng, “Layered Concept-Learning and Dynamically Variable Bias Management,” in IJCAI, 1987, pp. 308–314.

38. Pavel Brazdil, João Gama, and Bob Henery, “Characterizing the Applicability of Classification Algorithms Using Meta-Level Learning,” in European Conference on Machine Learning. 1994, pp. 83–102, Springer.

39. Michael R. Smith, Logan Mitchell, Christophe Giraud-Carrier, and Tony Martinez, “Recommending Learning Algorithms and Their Associated Hyperparameters,” in CEUR Workshop Proceedings. 2014, pp. 39–40, Aachen University.

40. Bernhard Pfahringer, Hilan Bensusan, and Christophe G Giraud-Carrier, “Meta-Learning by Landmarking Various Learning Algorithms,” in ICML, 2000, pp. 743–750.

41. Kyle Hsu, Sergey Levine, and Chelsea Finn, “Unsupervised Learning via Meta-Learning,” in Conference Proceedings ICLR. 2018, pp. 1–24, International Conference on Learning Representations.

42. W. J Conover, Practical Nonparametric Statistics, John Wiley & Sons, Inc, New York, New York, 1971.

43. Myles Hollander, Douglas A Wolfe, and Eric Chicken, Nonparametric Statistical Methods, vol. 751, John Wiley & Sons, 2013.

44. “Heart Data Set,” World Wide Web Page, Available at https://www.kaggle.com/ronitf/heart-disease-uci, Accessed 3 Jan 2020.

45. “Framingham Data Set,” World Wide Web Page, Available at www.kaggle.com/dileep070/heart-disease-prediction-using-logistic-regression#framingham.csv, Accessed 3 Jan 2020.

46. “Spam Data Set,” World Wide Web Page, Available at http://vincentarelbundock.github.io/Rdatasets/doc/DAAG/spam7., Accessed 3 Jan 2020.

47. “Bank Personal Loan Data Set,” World Wide Web Page, Available at https://www.kaggle.com/itsmesunil/bank-loan-modelling#Bank Personal Loan Modelling.xlsx, Accessed 3 Jan 2020.

48. “PMESII Data Set,” World Wide Web Page, Available at https://www.milsuite.mil/book/groups/analyst-stack- exchange/projects/programming-in-r/content?filterID=contentstatus%5Bpublished%5DsortKey=contentstatus%5Bpublished%5D re- centActivityDateDesc, Accessed 3 Jan 2020.

49. “Correlates of War Data Set,” World Wide Web Page, Available at https://correlatesofwar.org/data-sets, Accessed 3 Jan 2020.

50. “Breast Cancer Data Set,” World Wide Web Page, Available at https://goo.gl/U2Uwz2, Accessed 3 Jan 2020.

51. “Urine Data Set,” World Wide Web Page, Available at http://vincentarelbundock.github.io/Rdatasets/csv/boot/urine.csv, Accessed 3 Jan 2020.

52. “College Data Set,” World Wide Web Page, Available at http://vincentarelbundock.github.io/Rdatasets/csv/ISLR/College.csv, Accessed 3 Jan 2020.

53. “2008 Election Data Set,” World Wide Web Page, Available at http://vincentarelbundock.github.io/Rdatasets/csv/Stat2Data/Election08.csv, Accessed 3 Jan 2020.

REPORT DOCUMENTATION PAGE
Form Approved OMB No. 0704–0188

The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to Department of Defense, Washington Headquarters Services, Directorate for Information Operations and Reports (0704–0188), 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202–4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS.

1. REPORT DATE (DD-MM-YYYY): 26-03-2020
2. REPORT TYPE: Master's Thesis
3. DATES COVERED (From - To): SEP 2018 - MAR 2020
4. TITLE AND SUBTITLE: Algorithm Selection Framework: A Holistic Approach to the Algorithm Selection Problem
5a-5f. CONTRACT, GRANT, PROGRAM ELEMENT, PROJECT, TASK, AND WORK UNIT NUMBERS: (blank)
6. AUTHOR(S): Chalé, Marc W., Capt, U.S. Air Force
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Air Force Institute of Technology, Graduate School of Engineering and Management (AFIT/EN), 2950 Hobson Way, WPAFB OH 45433-7765
8. PERFORMING ORGANIZATION REPORT NUMBER: AFIT-ENS-MS-20-M-137
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): Intentionally Left Blank
10. SPONSOR/MONITOR'S ACRONYM(S): (blank)
11. SPONSOR/MONITOR'S REPORT NUMBER(S): (blank)
12. DISTRIBUTION/AVAILABILITY STATEMENT: DISTRIBUTION STATEMENT A: APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.
13. SUPPLEMENTARY NOTES: This work is declared a work of the U.S. Government and is not subject to copyright protection in the United States.
14. ABSTRACT: The “algorithm selection framework” uses a combination of user input and meta-data to streamline the algorithm selection for any data analysis task. The framework removes the conjecture of the common trial and error strategy and generates a ranked list of recommended analysis techniques. In seven of nine sample problems, the recall of the top ranked recommendation was considered “good” with at least 90% of the best observed recall. Pareto efficient recommendations for recall and run time were generated for three of the problems. The framework measured well against the pre-defined criteria. The framework successfully used information in the problem to recommend appropriate algorithms.
15. SUBJECT TERMS: meta-learning, machine learning, algorithm selection, data analysis, artificial intelligence
16. SECURITY CLASSIFICATION OF: a. REPORT: U; b. ABSTRACT: U; c. THIS PAGE: U
17. LIMITATION OF ABSTRACT: UU
18. NUMBER OF PAGES: 82
19a. NAME OF RESPONSIBLE PERSON: Dr. Jeffery D. Weir, Ph.D., AFIT/ENS
19b. TELEPHONE NUMBER (include area code): (937) 255-3636, x4523; jeffery.weir@afit.edu

Standard Form 298 (Rev. 8–98), Prescribed by ANSI Std. Z39.18