
COST-SENSITIVE DECISION TREE LEARNING USING A MULTI-ARMED BANDIT FRAMEWORK

Susan Elaine LOMAX

SCHOOL OF COMPUTING, SCIENCE AND ENGINEERING, INFORMATICS RESEARCH CENTRE, COLLEGE OF SCIENCE AND TECHNOLOGY, UNIVERSITY OF SALFORD

Submitted in Fulfilment of the Requirements of the Degree of Doctor of Philosophy, June 2013

CONTENTS

ACKNOWLEDGEMENTS ...... v
ABSTRACT ...... vi
CHAPTER 1: INTRODUCTION ...... 1
1.1 The motivation for the research in this thesis ...... 1
1.2 Research methodology ...... 3
1.3 Research hypothesis, aims and objectives ...... 6
1.4 Outline of thesis ...... 7
CHAPTER 2: BACKGROUND ...... 9
2.1 Decision tree learning ...... 9
2.2 Game Theory ...... 12
CHAPTER 3: SURVEY OF EXISTING COST-SENSITIVE DECISION TREE ALGORITHMS ...... 18
3.1 Single tree, greedy cost-sensitive decision tree induction algorithms ...... 22
3.2 Multiple tree, non-greedy methods for cost-sensitive decision tree induction ...... 39
3.3 Summary and analysis of the results of the survey ...... 62
CHAPTER 4: THE DEVELOPMENT OF A NEW MULTI-ARMED BANDIT FRAMEWORK FOR COST-SENSITIVE DECISION TREE LEARNING ...... 67
4.1 Analysis of previous cost-sensitive decision tree algorithms ...... 67
4.2 A new framework for cost-sensitive decision tree learning using multi-armed bandits ...... 74
4.3 Potential problems with the MA_CSDT algorithm ...... 83
4.4 Summary of the development of the MA_CSDT algorithm ...... 92
CHAPTER 5: INVESTIGATING PARAMETER SETTINGS FOR MA_CSDT ...... 94
5.1 Parameters allowing continuation of process when it is worthwhile ...... 101
5.2 Determining how many lever pulls, which version and strategy is desirable ...... 108
5.3 Investigate taking advantage of different parameter settings to achieve aim ...... 109
5.4 Developing guidelines for datasets to determine the best combinations of parameter settings ...... 113
5.5 Summary of findings from the investigation ...... 118
CHAPTER 6: AN EMPIRICAL COMPARISON OF THE NEW ALGORITHM WITH EXISTING COST-SENSITIVE DECISION TREE ALGORITHMS ...... 121
6.1 Empirical comparison results ...... 125
6.2 Discussion of the outcome of the empirical evaluation ...... 137
6.3 Summary of the findings of the evaluation ...... 145
CHAPTER 7: CONCLUSIONS AND FUTURE WORK ...... 147
REFERENCES ...... 159
APPENDIX ...... 169


A1 Analysis of datasets used by studies from the survey ...... 169
A2 Details of the datasets used in the main evaluation ...... 171
A3 Details of the misclassification costs used in all experiments ...... 188
A4 Summary of the attributes in the results dataset ...... 191
A5 Parameter settings % frequencies of best and worst results all examples and trees only plus frequency of trees not grown or grown ...... 192
APPENDIX D ...... 193


TABLE OF FIGURES

Figure 1 Decision tree after ID3 has been applied to the dataset in Table 1 ...... 11
Figure 2 Taxonomy of Cost-Sensitive Decision Tree Induction Algorithms ...... 19
Figure 3 A timeline of algorithms ...... 21
Figure 4 Decision tree after EG2 has been applied to the dataset in Table 1 ...... 24
Figure 5 Decision tree when DT with MC has been applied to dataset in Table 1 ...... 31
Figure 6 Linear Machine ...... 34
Figure 7 Example ROC ...... 38
Figure 8 Illustration of mapping ...... 42
Figure 9 Multi-tree using the example dataset ...... 59
Figure 10 Using ICET’s results to demonstrate weaknesses of cost-sensitive decision tree algorithms ...... 71
Figure 11 Illustration of the single pull and look-ahead bandit in the algorithm ...... 77
Figure 12 Generate P bandits and calculate cost at the end of each path ...... 79
Figure 13 Multi-Armed Cost-Sensitive Decision Tree Algorithm (MA_CSDT) ...... 81
Figure 14 To calculate the number of potential unique bandit paths in a dataset ...... 85
Figure 15 Results illustrating the fluctuating values obtained using artificial datasets ...... 87
Figure 16 Desired tree chosen by a given strategy ...... 88
Figure 17 Graphs showing do nothing costs versus cost obtained for each of the types of classes ...... 105
Figure 18 Graphs showing multi-class datasets in their 3 groups of misclassification costs: mixed, low, high ...... 107
Figure 19 Comparing mean values obtained with values obtained by a given strategy ...... 111
Figure 20 Rules extracted using J48 accuracy-based algorithm on examples in the results analysis file ...... 117
Figure 21 Heart dataset processed using pruned versions of the cost-sensitive algorithms and flare dataset using un-pruned versions of the cost-sensitive algorithms ...... 130
Figure 22 The krk dataset: top processed using pruned version; bottom processed using un-pruned version of cost-sensitive algorithms ...... 133
Figure 23 Iris dataset processed using the un-pruned version of the cost-sensitive algorithms ...... 134


TABLE OF TABLES

Table 1 Example dataset ‘Television Repair’ ...... 10
Table 2 Pay-off matrix for Prisoner’s Dilemma ...... 14
Table 3 Cost-sensitive decision tree induction algorithms categorized with respect to taxonomy by time ...... 20
Table 4 Definitions of equations ...... 22
Table 5 Example of a cost matrix of a four class problem ...... 29
Table 6 Typical cost matrix for two-class problems ...... 68
Table 7 Main characteristics of datasets used in the comparison ...... 68
Table 8 Example of the multi-armed bandit algorithm choosing an attribute ...... 80
Table 9 Values of P lever pulls for each dataset ...... 96
Table 10 Details of each experiment, settings and frequency (%) of best and worst results from the analysis file ...... 98
Table 11 Summary of dataset information including mean values obtained from the analysis file ...... 100
Table 12 Manually obtained parameter settings based on description of a dataset and frequency of class 1 examples ...... 114
Table 13 Accuracy rates obtained from J48 predicting classes from parameter settings ...... 115
Table 14 Percent of trees not grown by dataset for the six algorithms in the evaluation with the cost matrices producing the least trees ...... 124
Table 15 Percent that each cost-sensitive algorithm achieves the lowest cost or highest accuracy for a cost matrix for both pruned and un-pruned versions ...... 126
Table 16 Summary of whether MA_CSDT has met its aims for each dataset ...... 127
Table 17 Iris dataset cost and accuracy returned for each cost matrix for un-pruned versions of cost-sensitive algorithms ...... 136


ACKNOWLEDGEMENTS

Thanks to John Bacon, Mathscope, University of Salford for help in the development of the solution to calculating potential unique bandit paths and to my co-supervisor Dr Chris Bryant for pointing me in the right direction. Thanks to Dr K J Abrams for her help in understanding the Annealing dataset along with advice regarding allocation of test costs and groupings of attributes. Thanks also to Duncan Botterill, without whose help the experimental results would have taken even longer to process.

The majority of my thanks go to my supervisor Professor Sunil Vadera for his continuing help and support whilst undertaking this research. He has given me the confidence to believe in myself when otherwise I would not.

As part of this research the following papers have been published which contain some of the content of this thesis:

An Empirical Comparison of Cost-Sensitive Decision Tree Induction Algorithms, published in Expert Systems: The Journal of Knowledge Engineering, July, Vol. 28, No. 3, pp. 227–268.

A Survey of Cost-Sensitive Decision Tree Induction Algorithms published in ACM Computing Surveys, Vol. 45, No. 2, Article 16.

A Multi-Armed Bandit Approach to Cost-Sensitive Decision Tree Learning, published in Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference, 10-13 December 2012, Brussels, Belgium, pp. 162–168.

Acknowledgement also goes to the reviewers and editors of the above publications for their useful comments which led to improvements to the content and presentation of the publications and ultimately this thesis.


ABSTRACT

Decision tree learning is one of the main methods of learning from data. It has been applied to a variety of different domains over the past three decades. In the real world, accuracy is not enough; there are costs involved: those of obtaining the data and those incurred when classification errors occur. A comprehensive survey of cost-sensitive decision tree learning has identified over 50 algorithms and developed a taxonomy that classifies the algorithms by the way in which cost has been incorporated. A recent comparison shows that many cost-sensitive algorithms can process balanced, two-class datasets well, but produce lower accuracy rates in order to achieve lower costs when the dataset is less balanced or has multiple classes.

This thesis develops a new framework and algorithm concentrating on the view that cost-sensitive decision tree learning involves a trade-off between costs and accuracy. Decisions arising from these two viewpoints can often be incompatible, resulting in a reduction of the accuracy rates.

The new framework builds on a specific Game Theory problem known as the multi-armed bandit. This problem concerns a scenario in which exploration and exploitation are required: for example, a player in a casino has to decide which slot machine (bandit) from a selection of slot machines is likely to pay out the most. Game Theory proposes a solution to this problem based on a process of exploration and exploitation in which reward is maximized. This thesis utilizes these concepts from the multi-armed bandit game to develop a new algorithm by viewing the rewards as a reduction in costs, and using the exploration and exploitation techniques so that a compromise between decisions based on accuracy and decisions based on costs can be found. The algorithm employs the adapted multi-armed bandit game to select the attributes during decision tree induction, using a look-ahead methodology to explore potential attributes and exploit the attributes which maximize the reward.

The new algorithm is evaluated on fifteen datasets and compared to six well-known algorithms: J48, EG2, MetaCost, AdaCostM1, ICET and ACT. The results obtained show that the new multi-armed bandit based algorithm can produce more cost-effective trees without compromising accuracy. The thesis also includes a critical appraisal of the limitations of the developed algorithm and proposes avenues for further research.


CHAPTER 1: INTRODUCTION

1.1 The motivation for the research in this thesis

Decision trees are a natural way of presenting a decision-making process, because they are simple and easy for anyone to understand (Quinlan 1986). Learning decision trees from data, however, is more complex, with most methods based on an algorithm known as ID3, which was developed by Quinlan (1979, 1983, 1986). ID3 takes a table of examples as input, where each example consists of a collection of attributes together with an outcome (or class), and induces a decision tree in which each node is a test on an attribute, each branch is an outcome of that test, and the leaf nodes at the end indicate the class to which an example following that path belongs. ID3, and a number of its immediate descendants, such as C4.5 (Quinlan 1993), CART (Breiman et al. 1984) and OC1 (Murthy et al. 1994), focused on inducing decision trees that maximized accuracy.

However, several authors have recognized that in practice there are costs involved (e.g. Breiman et al. (1984); Turney (1995, 2000); Elkan (2001)). For example, it costs time and money for blood tests to be carried out (Quinlan et al. 1987). In addition, when examples are misclassified, they may incur varying costs of misclassification depending on whether they are false negatives (classifying a positive example as negative) or false positives (classifying a negative example as positive). This has led to many studies which develop algorithms that aim to induce cost-sensitive decision trees.

Lomax and Vadera (2013) present a survey of cost-sensitive decision tree algorithms. Some past comparisons by Vadera and Ventura (2001) and a more comprehensive evaluation by Lomax and Vadera (2011) have evaluated algorithms which have incorporated these costs into the construction using extensions to statistical measures, genetic algorithms, or boosting and bagging techniques. Experiments were carried out over a range of cost matrices and showed that using both costs in the construction was the better method. However, a later examination of the results of the algorithm which performed better overall (ICET, introduced by Turney (1995)) revealed variations in performance, with the algorithm producing poorer results than would be expected. Contributing factors to the variation in performance are thought to be large numbers of attributes, differing numbers of attribute values, class distribution, number of classes and differing costs. Trading off against high misclassification costs results in the sacrifice of the accuracy rate. The nature of the dataset may account for some of the discrepancies, since it can influence how easy it is for the algorithm to classify examples.

Although it has always been suggested in the literature that Game Theory is different from decision theory (an idea utilized in some decision tree algorithms) and therefore caters for different decision situations, it has also been suggested that Game Theory can be applied in a Machine Learning capacity (Cesa-Bianchi and Lugosi 2006) and could be used for prediction, as both disciplines have in common the idea that past experience predicts future events.

Machine Learning techniques are used for prediction, with cost-sensitive learning a descendant of this technique. Game Theory can also be used to predict outcomes by choosing strategies according to, and linked with, ‘pay-offs’. The pay-offs vary but can easily be described as ‘costs’. For example, in cost-sensitive learning the goal is to reduce costs, therefore the pay-off is simply the reduction of cost or obtaining the lowest possible cost.


Pay-off functions are assigned to the strategies in order to help make the decisions. Picking strategies which would maximize pay-off is the desired outcome, with a trend towards simplicity. Finding the simplest assumption needed (Occam’s razor¹) is the ideal outcome (Rasmusen 2001). Pay-offs are shown using a matrix and strategies can be illustrated using decision-tree like structures.

Use of costs within the decision tree learning process has introduced many interesting problems involving the trade-off required between accuracy and costs. It is clear that, whilst there are existing cost-sensitive decision tree algorithms which can solve two-class balanced problems well, other types of problems cause difficulties. In particular, several authors have recognized that there can be a trade-off between accuracy and minimizing cost (Lomax and Vadera 2009, 2011) or a reduction in performance (Ting 2000a).

Hence, this research aims to utilize Game Theory as a basis for developing a cost-sensitive decision tree algorithm, which aims to be able to address the trade-off between accuracy and cost that has been observed in previous studies.

1.2 Research methodology

Different methodologies have been studied and the most appropriate one is selected for this PhD. The methodologies studied are grouped under three categories:

¹ The simplest solution is often the most likely one.


• Constructive Methods

These deal with conceptual and technical development and may not be as empirically based as other methods. They may involve evaluating prototypes against defined criteria or testing prototypes (Livari et al. 1998).

• Nomothetic Methods

These are based on positivism, the idea that the world exists externally and that measurement should be through objective methods. It encourages statistics and experiments (Livari et al. 1998). Methods in this category are generally ‘laws’, i.e. laws of physics etc., and scientific methods such as hypothesis testing, mathematical analysis, experiments, field studies and surveys. They are quantitative in the data collection and confirmatory.

• Idiographic Methods

These are based on interpretivism, which is the opposite of positivism. It encourages the appreciation of the constructions and meanings which people have, not reporting facts but dealing with the interpretations of them (Livari et al. 1998). Methods in this category are generally case studies and action research, dealing with direct experience. Case studies and action research deal in on-the-spot fact finding: the researcher learns about system requirements or how things work by visiting the place which requires it or has information about it. They are useful for studying how and why things happen. They are performed by interviews, observations, which are either overt or covert, and document analysis.

Comparing the different methodologies listed above, a method belonging to the second category, Nomothetic methods, is more appropriate for this research, as these methods involve hypothesis testing and experimentation as described by Livari et al. (1998, p. 187).


A method named GQM (Goal, Question, Metric) (Basili and Weiss 1984) is recommended for use in the area of software development. GQM is a measurement mechanism used in order to obtain feedback and evaluation in software development. Its aim is to focus on specific goals and it must be defined in a top-down fashion. It is ideally suited to this area, where there are many observable characteristics such as lines of code, number of defects or complexity, which make methods that are metric-driven and bottom-up unworkable (van Solingen et al. 2002). The GQM approach is to define goals, which are then refined into questions, and metrics are used in order to gain enough information so that the questions can be answered.

In this thesis, the goal is to develop a framework which uses the trade-off required between accuracy and costs in order to achieve the low costs and high accuracy required in cost-sensitive learning. As a result of this goal, the following questions have been developed:

1. How well do existing cost-sensitive decision tree algorithms perform?

2. What are the weaknesses of existing cost-sensitive decision tree algorithms?

3. Is it possible to minimize costs and minimize the sacrifice of the accuracy rate which occurs in cost-sensitive decision tree learning?

4. Will using a technique, which has been developed to deal with trade-off by using pay-offs, help in achieving the aim of cost-sensitive decision tree learning?

In order to determine the answers to these questions, metrics have been devised which measure cost, determining whether costs are minimized, and accuracy, determining whether accuracy is maximized or the sacrifice of accuracy is minimized. A research hypothesis has been developed with the aim of showing what happens to the accuracy and costs when using the trade-off effectively.


1.3 Research hypothesis, aims and objectives

The Research Hypothesis put forward is that cost-sensitive decision tree learning involves a trade-off between decisions based on accuracy and decisions based on costs, and that Game Theory can be utilized to develop an algorithm that improves upon the performance of existing algorithms. By using Game Theory, it may be possible to explore the opposing decisions in such a way as to decide whether a compromise can indeed be reached. Any algorithm which aims to be cost-sensitive will need to achieve this trade-off in order to function correctly. The aim of this PhD is therefore to show what happens to accuracy and costs in the trade-off and to find a framework that can use the trade-off effectively to achieve the low costs and high accuracy required in cost-sensitive learning. In order to test this hypothesis the thesis objectives are:

1. To survey and review existing cost-sensitive decision tree algorithms in order to investigate ways in which costs have been introduced into the decision tree learning process and at which stages they have been introduced

2. To evaluate existing cost-sensitive decision tree algorithms in order to discover whether these algorithms are successful over many types of problems or are only effective for some types of problems, for example binary class datasets or balanced datasets

3. To develop a new cost-sensitive decision tree algorithm which is based on Game Theory

4. To investigate and evaluate the new algorithm against existing algorithms and measure performance in terms of cost to classify and accuracy, in order to test the research hypothesis


1.4 Outline of thesis

The rest of the thesis is structured as follows:

• Chapter 2: Background
This chapter presents the background to decision tree learning and Game Theory

• Chapter 3: Survey of existing cost-sensitive decision tree algorithms
This chapter presents the results of a literature search which identifies existing cost-sensitive decision tree algorithms and categorizes them into classes by the way the costs have been introduced

• Chapter 4: The development of a new multi-armed bandit framework for cost-sensitive decision tree learning
This chapter presents an analysis of previous cost-sensitive decision tree algorithms, highlights their weaknesses and suggests a new framework using multi-armed bandits. Experiments are carried out in order to fine-tune the algorithm and an extension is also developed. The experimental methodology is also presented

• Chapter 5: Investigating parameter settings for MA_CSDT
This chapter presents an extensive investigation into four areas. These address the parameter settings, in particular those which determine whether it is worthwhile to continue the induction process; the number of lever pulls to set for a dataset; which version and strategy is better; and how different combinations of parameter settings and strategies can be used to obtain good results. Guidelines for setting these parameters are also discussed

• Chapter 6: An empirical comparison of the new algorithm with existing cost-sensitive decision tree algorithms
This chapter presents the results of the empirical comparison and evaluation against existing cost-sensitive decision tree algorithms and an accuracy-based algorithm in order to determine whether the aim of the algorithm can be met

• Chapter 7: Conclusions and future work
This chapter summarizes the aims and objectives of this thesis and discusses the results obtained both from the investigation and the empirical comparison. It gives details of work which could be carried out in the next stage of the algorithm’s development and suggests other possible future experiments based on the findings of this thesis


CHAPTER 2: BACKGROUND

2.1 Decision tree learning

Given a set of examples, early decision tree algorithms, such as ID3 and CART, utilize a greedy top-down procedure. An attribute is first selected as the root node using a statistical measure (Quinlan 1979, 1983; Breiman et al. 1984). The examples are then filtered into subsets according to the values of the selected attribute. The same process is then applied recursively to each of the subsets until a stopping condition is met, such as all of the examples in the subset being of the same class. The leaf nodes are then assigned the majority class as the outcome. Researchers have experimented with different selection measures, such as the GINI index (Breiman et al. 1984) and chi-squared (Hart 1985), which have been evaluated empirically (Mingers 1989). The selection measure utilized in ID3 is based on Information Theory, which provides a measure of disorder, often referred to as the entropy, and which is used to define the expected entropy E for an attribute A (Shannon 1948; Quinlan 1979; Winston 1993). The entropy of an attribute A is defined as:

E(A) = ∑_{a∈A} P(a) * ∑_{c∈C} −P(c|a) log2(P(c|a))    (2.1)

where a ∈ A are the values of attribute A, and c ∈ C are the class values.

This formula measures the extent to which the data is homogeneous. For example, if all the data were to belong to the same class, the entropy would be '0'; likewise, if the examples were split evenly between two classes, the entropy would be '1'. ID3 uses an extension of the entropy by calculating the gain in information (I) achieved by each of the attributes if they were chosen for the split and choosing the attribute which maximizes this gain. This is given by equation (2.2).

9

ID3: I_A = E(D) − E(A)    (2.2)

where E(D) = ∑_{c∈C} −P(c) log2(P(c)), calculated on the current training set before splitting.

Although Quinlan adopted this measure for ID3, he noticed that the measure is biased towards attributes which have more values, and hence proposed a normalization, known as the Gain Ratio, which is defined by:

C4.5: GainRatio_A = I_A / SplitInfo_A, where SplitInfo_A = ∑_{a∈A} −P(a) log2(P(a))    (2.3)

C4.5 was also developed to include the ability to process numerical data and deal with missing values. Figure 1 presents the tree that results from applying the ID3 procedure to the examples in Table 1. At each leaf is the class distribution, in the format (faulty, not faulty).

picture quality   sound quality   age   class
poor              good            2     faulty
poor              excellent       1     faulty
good              poor            2     faulty
good              poor            2     faulty
good              excellent       1     not faulty
good              good            1     not faulty
good              good            2     faulty
excellent         good            1     faulty
excellent         excellent       1     not faulty
excellent         good            2     not faulty
good              good            2     faulty
good              good            2     faulty
good              good            1     not faulty
excellent         excellent       1     not faulty
excellent         good            1     not faulty

Table 1 Example dataset ‘Television Repair’
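For concreteness, the following short Python sketch (illustrative only, not the thesis's implementation) computes the entropy of equation (2.1) and the information gain of equation (2.2) for the attributes in Table 1; ID3 would select the attribute with the largest gain.

    from collections import Counter
    from math import log2

    # (picture quality, sound quality, age, class) rows taken from Table 1
    DATA = [
        ("poor", "good", 2, "faulty"), ("poor", "excellent", 1, "faulty"),
        ("good", "poor", 2, "faulty"), ("good", "poor", 2, "faulty"),
        ("good", "excellent", 1, "not faulty"), ("good", "good", 1, "not faulty"),
        ("good", "good", 2, "faulty"), ("excellent", "good", 1, "faulty"),
        ("excellent", "excellent", 1, "not faulty"), ("excellent", "good", 2, "not faulty"),
        ("good", "good", 2, "faulty"), ("good", "good", 2, "faulty"),
        ("good", "good", 1, "not faulty"), ("excellent", "excellent", 1, "not faulty"),
        ("excellent", "good", 1, "not faulty"),
    ]
    ATTRS = {"picture quality": 0, "sound quality": 1, "age": 2}

    def class_entropy(rows):
        """E(D): entropy of the class distribution of a set of examples."""
        counts = Counter(r[-1] for r in rows)
        n = len(rows)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    def expected_entropy(rows, col):
        """E(A) of equation (2.1): weighted entropy of the subset for each value."""
        n = len(rows)
        return sum(
            (len(subset) / n) * class_entropy(subset)
            for value in set(r[col] for r in rows)
            for subset in [[r for r in rows if r[col] == value]]
        )

    def information_gain(rows, col):
        """I_A = E(D) - E(A), equation (2.2)."""
        return class_entropy(rows) - expected_entropy(rows, col)

    for name, col in ATTRS.items():   # ID3 splits on the attribute with the largest gain
        print(name, round(information_gain(DATA, col), 3))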


Once a decision tree has been built, some type of pruning is then usually carried out. Pruning is the term given to replacing one or more sub-trees with leaf nodes. There are three main reasons for pruning. One is that it helps to reduce the complexity of a decision tree, which would otherwise make it very difficult to understand (Quinlan 1987), resulting in a faster, possibly less costly classification. Another reason is to help prevent the problem of over-fitting the data.

Figure 1 Decision tree after ID3 has been applied to the dataset in Table 1

The third reason is that noisy, sparse or incomplete datasets can cause very complex decision trees, so pruning is a good way to simplify them (Quinlan 1987). There are several ways to calculate whether a sub-tree should be pruned or not. Quinlan (1987), Knoll et al. (1994) and Bradford et al. (1998a, 1998b) have discussed different methods to do this, for instance aiming to minimize loss (Bradford et al. 1998a, 1998b), or using misclassification costs to prune a decision tree (Knoll et al. 1994). A comprehensive review of pruning methods has been carried out by Frank and Witten (1998).
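As a purely illustrative sketch of cost-based pruning (not any specific published method), the snippet below replaces a sub-tree with a leaf whenever labelling the node with its cheapest single class is estimated to cost no more than keeping the sub-tree; the counts, costs and sub-tree estimate are invented for the example.

    def leaf_misclassification_cost(class_counts, cost_of_error):
        """Cheapest single-class labelling of a node, given per-class example
        counts and cost_of_error(true_class, predicted_class)."""
        best = float("inf")
        for label in class_counts:
            cost = sum(n * cost_of_error(true, label)
                       for true, n in class_counts.items() if true != label)
            best = min(best, cost)
        return best

    # hypothetical node: 2 faulty and 6 not-faulty examples reach it
    counts = {"faulty": 2, "not faulty": 6}
    cost = lambda true, pred: 50 if true == "faulty" else 200   # assumed error costs
    subtree_cost = 90.0                                         # assumed sub-tree estimate
    leaf_cost = leaf_misclassification_cost(counts, cost)
    print("prune" if leaf_cost <= subtree_cost else "keep", leaf_cost)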


2.2 Game Theory

Game Theory is a discipline which deals in trade-off. It deals with types of decision making where there may be more than one decision-maker. Game Theory (Davis 1983; Osborne 2004) concerns decision making with two or more decision-makers, which it denotes as ‘players’. At least two players choose a strategy (make a decision) and as a result a reward or pay-off occurs. Each player must worry about what the other is doing. Pseudo-players have actions taken in a mechanical way; Nature is an example of a pseudo-player (Rasmusen 2001). Game Theory is the theory of many games, not just one (Davis 1983).

Game Theory differs from other types of decision-making problems because, while the decision-makers are manipulating the environment, e.g. deciding how much advertising space to purchase, the environment, i.e. the other decision-makers, is trying to do the same (Davis 1983).

Game Theory has been used in disciplines such as economics, social science, political science and biology and applied to tasks such as price fixing, advertising, and strategies used in competitive business.

Game Theory aims to help understand situations where decision-makers interact with each other according to a set of rules and consists of a collection of models which need to be simple with assumptions capturing the essence of the situation (Osborne 2004). Many problems can be understood without special technical background (Davis 1983). Real games are very complicated and toy games are often used instead (Binmore 2007). Applications which can be reduced to a single problem, for example a shop keeper reducing prices of his stock in response to a competitor doing likewise, are all situations for which Game Theory can be applied.

12

Decisions are linked to goals and the consequences of each option must be known in order to make the solution easy. The best strategy is chosen in order to reach the goal. If chance plays a role, decisions are harder to make (Davis 1983). Pay-off functions are assigned to strategies in order to help make the decisions. Picking strategies which maximize pay-off is the desired outcome, with a trend towards simplicity; finding the simplest assumption needed is the ideal outcome (Rasmusen 2001). Pay-offs are shown using a matrix and strategies can be illustrated using decision-tree like structures.

Models are neither right nor wrong, but useful or not, depending on the purpose for which they are used. The models are examined in order to analyze their implications, to either confirm an idea or suggest it is wrong. This analysis should help understand why it is wrong.

Time is absent from the model. Each player chooses their actions “simultaneously” in that no player is informed when an action is chosen or what action another player has chosen (Osborne 2004). The assumption is that actions are chosen once and for all. It is assumed that all players will try to do their best. A Nash Equilibrium is a pair of strategies from which neither player wishes to change, given the strategy chosen by the other. It occurs when all players make the best reply to the strategy choices of the others (Nash 1950a, 1950b). For example, if a player knew that the other player would always choose a particular strategy, they could maximize their pay-off by choosing the best reply to that strategy.

There are three main categories of games in Game Theory. These are (i) the two-person zero-sum game, (ii) the two-person non-zero-sum game and (iii) the n-person game, first defined by von Neumann and Morgenstern (1953), which involves forming coalitions. In a zero-sum game one person’s gain is the other’s loss, whereas in a non-zero-sum game this is not the case. An example of a zero-sum game would be Matching Pennies, where each player has two strategies: heads or tails. The Prisoner’s Dilemma is an example of a non-zero-sum game.

The Prisoner’s Dilemma Game is one of the most well-known ‘games’ in Game Theory (Binmore 2007). Two suspects have been arrested for a minor crime, for instance handling stolen goods, for which there is ample evidence. However, they are suspected by the police of the greater crime of burglary, for which there is only circumstantial evidence and no proof. The suspects are both offered the same deal:

• If one confesses and turns Queen’s evidence, and the other does not, he will go free and the other goes to jail for a maximum prison term.

• If both suspects confess they both go to jail for a minimum prison term for burglary.

• If both suspects remain silent they both go to jail for a year for the handling charge as there is no evidence for any other wrong-doing.

The assumption is that each prisoner will try their utmost to do what is best for themselves.

The pay-offs are displayed in a matrix presented in Table 2.

                                suspect 2
                                confess       do not confess
suspect 1   confess             min, min      0, max
            do not confess      max, 0        1, 1

Table 2 Pay-off matrix for Prisoner’s Dilemma


Each player has two basic choices; they can act co-operatively or un-cooperatively. For any fixed strategy of the other players, a player always does better by playing un-cooperatively than by playing co-operatively (Binmore 2007).

Another well-known game is the Hawk-Dove Game, where two birds, for example pheasants, may contest a resource such as food. The two birds can either act passively or aggressively in this kind of situation (Binmore 2007). A passive bird would surrender the food to an aggressive bird. Two passive birds would share the food, but two aggressive birds would fight. Passive birds are usually referred to as ‘doves’ and aggressive birds as ‘hawks’ (Maynard Smith 1984). Each prefers to be aggressive if the other is passive and passive if the other is aggressive (Osborne 2004). Pay-off values which would identify the Hawk-Dove Game with the Prisoner’s Dilemma Game are not realistic here, as injury to either bird would be a serious handicap (Binmore 2007).

An example of the application of Game Theory is in the advertising sector (Davis 1983). Suppose that there are two companies which make a similar product, for example washing powder². The first company, A, has enough money set aside to buy two blocks of television advertising time and the other, B, three blocks of time. Each block is one hour long.

The television company splits the day into three time periods: morning (m), afternoon (a) and evening (e). The purchases of the advertising slots must be made in advance and are confidential. Statistics provided by the television company state that 50% of the audience watches TV in the evening, 30% in the afternoon and 20% in the morning. It is assumed in this example that no-one watches more than one period in a day. If a company buys more time during any time period than the other company, it will capture the entire audience during that period; if both companies buy the same number of hours during any one period, or neither company buys any time at all during any one period, each gets half the audience.

² This illustration of an application is courtesy of Davis (1983)

If each member of the TV audience buys the product of just one of the companies, how then should the companies allocate their TV time and what percent of the market might each get?

There are 6 strategies for the company buying two blocks and 10 for the company buying three blocks. One solution would be that:

• Company B plays each of the strategies (e,e,e), (e,e,a) and (e,a,m), where each letter is a time slot representing one of the blocks of time this company will have purchased, one third of the time

• Company A plays each of the strategies (e,e) 6/15ths of the time, (a,a) 5/15ths of the time and (a,m) 4/15ths of the time.

If Company B uses these recommended strategies it can be sure of winning on average 63.33% of the time and if Company A uses its recommended strategy, Company B will not win any more than this (Davis 1983).

The Multi-Armed Bandit Game, first proposed by Robbins (1952), is a scenario where a gambler must choose which slot machine, from a selection of slot machines, to play. He pulls the lever of one of the machines and receives a pay-off. The gambler’s purpose is to maximize his return, i.e. the sum of the pay-offs obtained over a random number of lever pulls. There is a trade-off here between exploration and exploitation: if the gambler plays only the one machine which he thinks is best, he may miss out on another machine about to pay out; on the other hand, too much time spent trying out all the slot machines may not actually return a high enough reward (Auer et al. 2001, 2003).
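To make the exploration/exploitation trade-off concrete, the following minimal epsilon-greedy sketch (a standard bandit strategy, not the strategy developed in this thesis) mostly plays the machine with the best average pay-off so far but occasionally tries another one; the pay-out probabilities are invented for the example.

    import random

    def pull(payout_probability):
        """Pull one lever: pay out 1 unit with the machine's (unknown) probability."""
        return 1.0 if random.random() < payout_probability else 0.0

    def epsilon_greedy(payout_probabilities, pulls=1000, epsilon=0.1):
        totals = [0.0] * len(payout_probabilities)   # reward collected per machine
        counts = [0] * len(payout_probabilities)     # times each machine was played
        reward = 0.0
        for _ in range(pulls):
            if random.random() < epsilon:            # explore: try any machine
                arm = random.randrange(len(payout_probabilities))
            else:                                    # exploit: best average so far
                arm = max(range(len(payout_probabilities)),
                          key=lambda i: totals[i] / counts[i] if counts[i] else 0.0)
            r = pull(payout_probabilities[arm])
            totals[arm] += r
            counts[arm] += 1
            reward += r
        return reward, counts

    total, counts = epsilon_greedy([0.2, 0.5, 0.7])  # three hypothetical machines
    print("total reward:", total, "pulls per machine:", counts)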

Cost-sensitive decision tree learning involves building a model in a cost-effective way. Other games do not offer this ability as they cannot easily be adapted to include models. Based on the needs of cost-sensitive decision trees, in that models need to be induced, the Multi-Armed Bandit Game looks promising: its lever pulls could be viewed as generating models, and could be mapped to paths contained in a decision tree model. How this could be achieved is discussed further in Chapter 4.


CHAPTER 3: SURVEY OF EXISTING COST-SENSITIVE DECISION TREE ALGORITHMS

Chapter 2 summarizes the main idea behind decision tree induction algorithms that aim to maximize accuracy. How can we induce decision trees that minimize costs? The survey reveals several different approaches. First, some of the algorithms aim to minimize just costs of misclassification, some aim to minimize just the cost of obtaining the information and others aim to minimize both costs of misclassification as well as costs of obtaining the data.

Secondly, the algorithms vary in the approach they adopt. Figure 2 summarizes the main categories that cover all the algorithms found in this survey. There are two major approaches: methods that adopt a greedy approach that aims to induce a single tree, and non-greedy approaches that generate multiple trees. Methods that generate single trees include early algorithms, such as CS-ID3 (Tan and Schlimmer 1989), that adapt entropy-based selection methods to include costs, and post-construction methods, such as AUCSplit (Ferri et al. 2002), that aim to utilize costs after a tree is constructed. Algorithms that utilize non-greedy methods include those that provide a wrapper around existing accuracy-based methods, such as MetaCost (Domingos 1999), genetic algorithms, such as ICET (Turney 1995), and algorithms that adopt tentative searching methods.

Table 3 categorizes the algorithms identified in the literature with respect to the taxonomy shown in Figure 2 and shows the significant volume of work in this field in each of the classes. The table also indicates whether the algorithms incorporate test costs, misclassification costs or both. The time line of algorithms, shown as Figure 3, is also interesting. The first mention of the importance of costs dates back to Hunt’s (1966) Concept Learning System framework (CLS), which aimed to develop decision trees and recognized that tests and misclassifications could have an economic impact on human decision making.


Figure 2 Taxonomy of Cost-Sensitive Decision Tree Induction Algorithms

Although ID3 adopts some of the ideas of CLS, a significant difference in the development was ID3’s use of an information theoretic measure for attribute selection (Quinlan 1979). The use of an information theoretic top-down approach in ID3 influenced much of the early work, which focused on methods for adapting existing accuracy-based algorithms to take account of costs. These early approaches were evaluated empirically by Pazzani et al. (1994), who observed little difference in performance between algorithms that used cost-based measures and ones that used information gain. This, together with the publication of the results of the ICET system (Turney 1995), which used genetic algorithms, led to significant interest in developing more novel algorithms, including intense research on the use of boosting and bagging (Ting and Zheng 1998a, 1998b; Ting 2000a, 2000b; Domingos 1999; Zadrozny et al. 2003a, 2003b; Lozano and Abe 2008) and, more recently, on the use of stochastic approaches (Esmeir and Markovitch 2007, 2008, 2010, 2011). Table 4 contains notation used in the rest of this chapter.


Table 3 Cost-sensitive decision tree induction algorithms categorized with respect to taxonomy by time


Figure 3 A timeline of algorithms


Symbol      Definition
N           Number of examples in current training set/node
Ni          Number of examples in training set belonging to class i
x           Refers to an example in the training set
node(x)     Leaf node to which the example belongs
k           Number of classes and indicates looping through each class in turn
w           Weights
A           Indicates an attribute
a           Indicates attribute values belonging to an attribute
Cij         Misclassification cost of classifying a class i example as a class j example
CA          Test cost for attribute A
cost(x,y)   Cost of classifying example x into class y
hi          The i-th hypothesis

Table 4 Definitions of equations

3.1 Single tree, greedy cost-sensitive decision tree induction algorithms

As described in Chapter 2.1, historically, the earliest approaches developed top-down greedy algorithms for inducing decision trees. The primary advantage of such greedy algorithms is efficiency, though a potential disadvantage is that they may not explore the search space adequately to obtain good results. This section presents a survey of greedy algorithms. The survey identified two major strands of research: Section 3.1.1 describes algorithms that utilize costs during tree construction and Section 3.1.2 describes post-construction methods that are useful when costs may change frequently.

3.1.1 Use of costs during construction

3.1.1.1 The extension of statistical measures. As outlined in the previous section, top-down decision tree induction algorithms use a measure, such as information gain, to select an attribute upon which the dataset will be partitioned during the tree induction process. A reasonable extension, which was taken by a number of early algorithms, was to adapt these information theoretic measures by including costs. These early algorithms retained the top-down induction process and the only differences between them are the selection measures and whether they take account of costs of attributes as well as costs of misclassification.


Five of the algorithms, CS-ID3 (Tan and Schlimmer 1989), IDX (Norton 1989), EG2 (Núnez 1991), CSGain (Davis et al. 2006) and CS-C4.5 (Freitas et al. 2007), focus on minimizing the cost of attributes and adapt the information theoretic measure to develop a cost-based attribute selection measure, called the Information Cost Function for an attribute A (ICF_A):

EG2:      ICF_A = (2^{InfoGain_A} − 1) / (C_A + 1)^ω    (3.1)

CS-ID3:   ICF_A = (InfoGain_A)^2 / C_A    (3.2)

IDX:      ICF_A = InfoGain_A / C_A    (3.3)

CS-C4.5:  ICF_A = InfoGain_A / (C_A φ_A)^ω    (3.4)

CSGain:   ICF_A = (N_a / N) * InfoGain_A − ω * C_A    (3.5)

These measures are broadly similar in that they all include the cost of an attribute (C_A) to bias the measure towards selecting attributes that cost less but still take some account of the information gained. The only difference between the measures is the extent of weight given to the cost of an attribute, with EG2 and CS-C4.5 adopting a user-provided parameter ω that varies the extent of the bias. CS-C4.5 also includes φ_A, a risk factor used to penalize a particular type of test, known as delayed tests, which are tests, such as blood tests, where there is a time lag between requesting and receiving the information. The authors of CSGain also experiment with a variation, called CSGainRatio, where they use the gain ratio instead of the information gain.

Figure 4 presents a cost-sensitive decision tree induced by applying the EG2 algorithm to the data in Table 1. For illustration purposes, the attributes picture quality, sound quality and age are assigned random test costs of 30, 15 and 1 units respectively. These costs are used in selecting an attribute using the ICF measure resulting in a tree that takes account of the costs of the tests.


Figure 4 Decision tree after EG2 has been applied to the dataset in Table 1
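As an illustration of how these measures trade information gain against test cost, the sketch below implements equations (3.1)–(3.3); the test costs of 30, 15 and 1 units are those used above, while the information-gain values are placeholders invented for the example.

    def icf_eg2(gain, cost, omega=1.0):   # equation (3.1)
        return (2 ** gain - 1) / ((cost + 1) ** omega)

    def icf_cs_id3(gain, cost):           # equation (3.2)
        return gain ** 2 / cost

    def icf_idx(gain, cost):              # equation (3.3)
        return gain / cost

    # (attribute, placeholder information gain, test cost in units)
    attributes = [("picture quality", 0.13, 30.0),
                  ("sound quality", 0.09, 15.0),
                  ("age", 0.20, 1.0)]

    for name, gain, cost in attributes:
        print(f"{name}: EG2={icf_eg2(gain, cost):.4f}, "
              f"CS-ID3={icf_cs_id3(gain, cost):.4f}, IDX={icf_idx(gain, cost):.4f}")
    # with these placeholder gains all three measures favour the cheap 'age' test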

Algorithms that continue this adaptation of information theoretic measures but also take account of the misclassification cost as well as the test costs include approaches by Ni et al. (2005), Zhang et al. (2007), Zhang (2010) and Liu (2007). Although the detailed measures differ, they all aim to capture the trade-off between the cost of acquiring the data and its contribution to reducing misclassification cost. Ni et al. (2005), for example, utilize the following attribute selection measure:

Performance: Performance_A = ((2^{GainRatio_A} − 1) * ώ_A / (C_A + 1)) * DMC_A    (3.6)

where ώ_A is the bias of experts for attribute A and DMC_A is the improvement in misclassification cost if the attribute A is used.

As well as using both types of cost, this algorithm makes use of domain experts who assign a value of importance to each of the attributes. If an expert has no knowledge of the importance of an attribute, this bias is set to the default value of 1. If some attributes produce the same value for equation (3.6), preference is given to those attributes with the largest reduction in misclassification costs (DMC_A). If this fails to find an attribute then the attribute with the largest test cost (C_A) is chosen, as the aim is to reduce misclassification costs.

Liu (2007) identifies some weaknesses of equation (3.6), noting that several default values have been used, and so develops the PM algorithm. Liu (2007) notes that if the gain ratios of attributes are small, the values returned by the original measure, equation (3.6), will be small, resulting in the costs of attributes being ignored; if attributes have large total costs, the information contained in those attributes will be ignored. Another issue is the conflict that arises when applying resource constraints. The overall aim of this algorithm is to allow for user resource constraints: users with larger test resources are not concerned so much about the cost of attributes as about the reduction of misclassification costs, whereas those with limited test resources are more concerned with the cost of the tests in order to reduce the overall costs, rather than only reducing the misclassification costs.

In order to trade off between these needs, a solution offered by Liu (2007) is to normalize the gain ratio values and to employ a harmonic mean to weigh between concerns with test costs (low test resources) and reduction in misclassification costs (when test resources are not an issue); additionally, a parameter α is used to balance the requirements of different test examples with different test resources.

Zhang et al. (2007) take a different approach when adapting the Performance algorithm. They focus on the fact that the test costs and misclassification costs are possibly not on the same scale; test costs would be considered on a cost scale of currency, whilst misclassification costs, particularly in terms of medical diagnosis, are, as Zhang et al. (2007) state, a social issue: what monetary value could be assigned for potential loss of life? The adaptation attempts to achieve maximal reduction in misclassification costs from lower test costs. The only difference from equation (3.6) to produce CTS (Cost-Time Sensitive Decision Tree) is to remove the expert bias parameter, preferring to address issues such as waiting costs (also referred to in other studies as delayed cost) at the testing stage by developing appropriate test strategies.

The above measures all utilize the information gain as part of a selection measure. An alternative approach, taken by Breiman et al. (1984), is to alter the class probabilities P(i) used in the information gain measure. That is, instead of estimating P(i) by N_i/N, it is weighted by the relative cost, leading to an altered probability (Breiman et al. 1984, p. 114):

Altered Probability_i = cost(i) * (N_i / N) / ∑_j cost(j) * (N_j / N)    (3.7)

In general, the cost of misclassifying an example of class j may also depend on the class i that it is classified into, so Breiman et al. (1984) suggest adopting the sum of costs of misclassification:

cost(j) = ∑_i C_ij    (3.8)

Although these altered probabilities can then be used in the Information Gain measure, the method was tried by Pazzani et al. (1994) using the GINI index:


Altered GINI = 1 − ∑_{y=1}^{k} (Altered Probability_y)^2    (3.9)
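The following sketch (illustrative only) applies equations (3.7)–(3.9): class frequencies are re-weighted by each class's summed misclassification cost before the GINI index is computed; the example counts and costs are assumed for the purpose of the illustration.

    def altered_probabilities(counts, costs):
        """Equation (3.7): cost-weighted class probabilities."""
        n = sum(counts.values())
        weights = {c: costs[c] * counts[c] / n for c in counts}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}

    def altered_gini(counts, costs):
        """Equation (3.9): GINI index computed from the altered probabilities."""
        return 1.0 - sum(p ** 2 for p in altered_probabilities(counts, costs).values())

    # hypothetical node: 8 'faulty' and 7 'not faulty' examples, with summed
    # misclassification costs, cost(j) of equation (3.8), of 50 and 200
    counts = {"faulty": 8, "not faulty": 7}
    costs = {"faulty": 50.0, "not faulty": 200.0}
    print(altered_probabilities(counts, costs))
    print(round(altered_gini(counts, costs), 3))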

C4.5 allows the use of weights for examples, where the weights alter the Information Gain measure by using sums of weights instead of counts of examples. So instead of counting the number of examples with attribute value a and class k, the weights assigned to these examples would be summed and used in equation (2.1).

C4.5’s use of weights has been utilized to incorporate misclassification costs, by overriding the weight initialization method. For example if the cost to misclassify a faulty example from the example dataset in Table 1 is 5, those examples belonging to class ‘faulty’ could be allocated the weight of 5, and examples belonging to class ‘not faulty’ could have the weight of 1, so that more weight is given to those examples with the higher misclassification cost.

C4.5CS is one such algorithm which utilizes this use of weights.

The method of computing initial weights by C4.5CS is similar to that of the GINIAlteredPriors algorithm developed by Breiman et al. (1984) and Pazzani et al. (1994). When presented with the same dataset, both methods would produce the same decision tree. However, Ting (1998) observes that the method which alters the priors would perform poorly, as pruning would be carried out in a cost-insensitive way, whereas the C4.5CS algorithm uses the same weights in its pruning stage. In his experiments with a version which replicates Breiman et al. (1984)'s method, C4.5(π′) performs worse than the C4.5CS algorithm. He explains this result as owing to different weights being used in the tree growing stage and the pruning stage.


The sum of all the weights for class j in the C4.5CS algorithm will be equal to N. The aim of C4.5CS is to reduce high-cost errors by allocating the highest weights to the most costly errors, so that C4.5 concentrates on reducing these errors.

C4.5CS (Ting 1998, 2002):

weight(j) = cost(j) * N / ∑_i cost(i) * N_i    (3.10)

where cost(j) and cost(i) are as defined by equation (3.8).

MaxCost (Margineantu and Dietterich 2003): weight(j) = max_i C_ij    (3.11)

AvgCost (Margineantu and Dietterich 2003): weight(j) = ∑_{i≠j} C_ij / (k − 1)    (3.12)

These latter two algorithms have been designed to solve multi-class problems, so the cost matrices involved are not the usual 2 x 2 grids presented when solving two-class problems. Instead a k x k matrix is used, the diagonal cells containing the cost of correctly classifying an example, usually zero, although for some domains it could well be greater than zero.

Table 5 presents an example of a cost matrix of a dataset where k = 4. The diagonal cells have been assigned zero, therefore a correct classification results in zero cost. Two algorithms developed by Margineantu and Dietterich (2003) use this cost matrix directly to compute initial weights. MaxCost uses the worst-case cost of misclassifying an example: the maximum value within a column is considered to be the worst-case cost of misclassifying an example. For instance, the weight of all class 1 examples will be assigned 100, as that is the maximum misclassification cost in the column corresponding to class 1. AvgCost calculates the average cost of misclassifying an example for its weight: each weight is computed as the mean of the off-diagonal cells in the corresponding column. Using this algorithm, class 1 examples are assigned 35.6. These two algorithms are considered more efficient than others of this type (Margineantu and Dietterich 2003).

                    Predicted class
Correct Class       1       2       3       4
1                   0       10      2       5
2                   100     0       5       2
3                   5       2       0       50
4                   2       5       25      0

Table 5 Example of a cost matrix of a four class problem
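The weight assignments of equations (3.11) and (3.12) can be checked against Table 5 with a few lines of code; this sketch is illustrative only and simply reads the maximum and the off-diagonal mean of each column (the thesis's convention for the weight of each class).

    # Table 5 cost matrix: rows are the correct class, columns the predicted class
    COST = [
        [0, 10, 2, 5],
        [100, 0, 5, 2],
        [5, 2, 0, 50],
        [2, 5, 25, 0],
    ]
    k = len(COST)

    def max_cost_weight(j):   # equation (3.11): maximum value in column j
        return max(COST[i][j] for i in range(k))

    def avg_cost_weight(j):   # equation (3.12): mean of the off-diagonal cells in column j
        return sum(COST[i][j] for i in range(k) if i != j) / (k - 1)

    for j in range(k):
        print(f"class {j + 1}: MaxCost={max_cost_weight(j)}, AvgCost={avg_cost_weight(j):.1f}")
    # class 1 receives a MaxCost weight of 100 and an AvgCost weight of about 35.7,
    # matching the values quoted in the text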

Margineantu and Dietterich (2003) also suggest an alternative way of setting the weights, called EvalCount, where an accuracy-based decision tree is first induced and then used to obtain the weights. The training data is subdivided into a sub-training set and a validation set. The sub-training set is then used to grow an accuracy-based decision tree. Using this decision tree, the cost of misclassification for each class on the validation set is then measured using the cost matrix. The weight allocated to a training example is then set to the total cost of misclassifying an example of that class.

3.1.1.2 Direct use of costs. Instead of adapting the information gain to include costs, a number of algorithms utilize the cost of misclassification directly as the selection criterion. These algorithms can be subdivided into two groups: those that only use misclassification costs and those which also include test costs.

The central idea with these algorithms is to calculate the expected cost if an attribute is used to divide the examples, compared with the expected cost if there is no further division (i.e. a leaf is assumed). The attribute that results in the most reduction is then selected to divide the examples. Of course, if none of the attributes results in a reduction, then a leaf node is created.

Cost-Minimization (Pazzani et al. 1994), Decision Trees with Minimal Cost (Ling et al. 2004) and two adaptations, Decision Trees with Minimal Cost under Resource Constraints (Qin et al. 2004) and CSTree (Ling et al. 2006a), use either misclassification costs or a combination of misclassification costs and test costs to partition the data. Cost-Minimization, the simplest of these, chooses the attribute which results in the lowest misclassification costs.

One of the main algorithms to use costs directly in order to find the attribute on which to partition data is Decision Trees with Minimal Cost, developed by Ling et al. (2004), which has spawned other adaptations. Expected cost is calculated using both misclassification costs and test costs, aiming to minimize the total cost. An attribute with zero or the smallest test cost is most likely to be the root of the tree, thus attempting to reduce the total cost. This algorithm has been developed firstly to minimize costs and secondly to deal with missing values in both the training and testing data. In training, examples with missing values remain at the node representing the attribute with missing values. In a study comparing techniques by Zhang et al. (2005), it was concluded that this was the best way to deal with missing values in training examples. How and whether to obtain values during testing is solved by constructing testing strategies, which are discussed further in Ling et al. (2006b).

To illustrate what happens when only the costs (i.e., no information gain) are used to select attributes, consider the application of the DT with MC algorithm to the examples in Table 1, where in addition to the test costs we assume the misclassification costs of 50 and 200 for the faulty and not faulty class respectively.


Figure 5 Decision tree when DT with MC has been applied to the dataset in Table 1: (a) tree from DT with MC; (b) tree if the left branch is expanded

Figure 5(a) shows the tree induced by the DT with MC algorithm, which is very different from the cost-sensitive tree produced by EG2 (Figure 4) and from the tree produced by ID3 (Figure 1). This algorithm employs pre-pruning, that is, it stops splitting as soon as there is no improvement. Figure 5(b) shows a partial tree obtained if the left branch was expanded further. The additional attribute that would lead to the least cost is sound quality, with a total cost of 220 units, since there are still two faulty examples misclassified but there is the extra cost of 120 units for testing sound quality (i.e., 8 examples each costing 15 units). However, the cost without splitting is 100 units (i.e., 2 faulty examples misclassified, with a misclassification cost of 50) and hence, in this case, the extra test is not worthwhile.
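The pre-pruning decision in this example amounts to a simple comparison, sketched below with the numbers given in the text (8 examples at the node, a test cost of 15 units for sound quality, and two faulty examples still misclassified at 50 units each).

    def cost_of_splitting(n_examples, test_cost, remaining_misclassification_cost):
        # every example at the node pays for the test, plus the errors that remain
        return n_examples * test_cost + remaining_misclassification_cost

    cost_without_split = 2 * 50                         # two faulty examples misclassified = 100
    cost_with_split = cost_of_splitting(8, 15, 2 * 50)  # 120 units of tests + 100 = 220
    print(cost_without_split, cost_with_split)          # 100 vs 220, so do not split (pre-prune)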

Ling et al. (2006b) use the algorithm developed in Ling et al. (2004) in a lazy learning framework in order to use different test strategies to obtain missing values on test data and to address problems of delayed tests. Using expected total cost, a tree is induced for each test example using altered test costs, whereby test costs are reduced to zero for attributes whose values are already known, thus making them a more desirable choice.


Ling et al. (2004)’s algorithm is further adapted into CSTree, which does not take into account test costs, using only misclassification costs (Ling et al. 2006a). CSTree deals with two-class problems and estimates the probability of the positive class using the relative cost of both classes, using this to calculate expected cost.

A different and perhaps more extensive idea is by Qin et al. (2004), who develop an adaptation of the Ling et al. (2004) algorithm, Decision Trees with Minimal Cost under Resource Constraints. Its purpose is to trade off between target costs (test costs and misclassification costs) and resources. Qin et al. (2004) argue that it is hard to minimize two performance metrics and it is not realistic to minimize both of them at the same time, so they aim to minimize one kind of cost and control the other within a given budget. Each attribute has two costs, a test cost and a constraint; likewise each type of misclassification has a cost and a constraint value. Both these values are used in the splitting criterion to produce a target-resource cost decision tree (Qin et al. 2004), used in tasks involving target cost minimization (test cost) and resource consumption for obtaining missing data.

Decision Tree with Minimal Costs under Resource Constraints:

ICF_A = (T − T_A) / R_A    (3.13)

R_A = (p + n − o) * r_A + p * C_ij(r) + n * C_ji(r) + o * r_A    (3.14)

where T is the misclassification cost before splitting, T_A is the expected cost if attribute A is chosen, r_A is the resource cost of the test, C_ij(r) and C_ji(r) are the resource costs for false negatives and false positives respectively, p is the number of positive examples, n the number of negative examples and o the number of examples with missing attribute values.

A different approach from simply using the decision tree produced with direct costs is suggested by Sheng and Ling (2005): a hybrid cost-sensitive decision tree. They develop a hybrid between decision trees and Naïve Bayes, DTNB (Decision Tree with Naïve Bayes).

Decision trees have a structure which is used to collect the best tests but, when classifying, they ignore originally known attribute values that do not appear in the path taken by a test example. Sheng and Ling (2005) argue that any value is available at a cost; if values are available at the testing stage, these might be useful in reducing misclassification costs, and to ignore them would be to waste available information. Naïve Bayes can use all known attribute values for classification, but has no structure to determine which tests to perform and in what order they should be carried out in order to obtain unknown attribute values. The DTNB algorithm aims to combine the advantages of both techniques.

A decision tree is built using expected cost reduction, with the sum of test costs and expected misclassification costs determining whether to further split the data and on what attribute.

Simultaneously, a cost-sensitive Naïve Bayes model, using the Laplace correction and misclassification costs, is hidden at every node, including the leaves, and is used only for classifying the test examples. The decision tree supplies the sets of tests used in the various test strategies, while the Naïve Bayes model, built on all the training data, classifies the test examples. This overcomes problems caused by segmentation of the data (that is, the reduction of data at lower leaves) and makes use of all attributes with known values which have not been selected during induction, so that no information, once obtained, is wasted. In experiments, this hybrid method proved to be better than the individual techniques (Sheng and Ling 2005).

3.1.1.3 Linear and non-linear decision nodes. Most of the early algorithms handle numeric

attributes by finding alternative thresholds, resulting in univariate or axis-parallel splits. A

number of authors have suggested that this is not sufficiently expressive and adopted more

sophisticated multivariate splits. These methods still adopt the top-down decision tree

induction process, and the primary difference between them, which we summarize below, is whether they adopt linear or non-linear splits and how they obtain the splits.

The LMDT algorithm (Draper et al. 1994) was one of the first to go beyond axis-parallel splits. This algorithm aims to develop a decision tree whose nodes consist of Nilsson’s (1965) linear machines. A linear machine aims to learn the weights of linear discriminants. Before looking at the LMDT algorithm, it is worth understanding the concept of a linear machine, which is central to the LMDT algorithm. The following figure summarizes the structure of a

linear machine.

Each function g_i(x) aims to represent a class i in a winner-takes-all fashion. A weight w_ij represents the coefficient of x_j for the i-th linear discriminant function. The training procedure involves presenting an example x that belongs to a class i. If the example is misclassified, say into class j, then the weights of the j-th machine can be decreased and those of the i-th machine increased, i.e.:

W_i = W_i + c·x,   W_j = W_j − c·x    (3.15)

where c is a correction factor, and W_i and W_j are the weight vectors for the i-th and j-th linear discriminants.

Figure 6 Linear Machine


When the classes are linearly separable, the use of a constant correction rate (i.e. as in a perceptron) is sufficient to determine a suitable discriminant and this simple procedure converges. However, in general, the classes may not be linearly separable and the above procedure may not converge. Draper et al. (1994) overcame this problem by utilizing a thermal training procedure developed by Frean (1990). This involved using an annealing parameter β to determine the correction factor c as follows:

c = β^2 / (β + k),  where k = (W_j − W_i)^T x / (2 x^T x)    (3.16)

where W_i is the weight vector of the i-th discriminant function that represents the true class of the example, and W_j is the weight vector of the j-th discriminant function that represents the class in which the example is misclassified.
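To illustrate how the thermal training rule drives the linear machine's weight updates, the following minimal Python sketch (an illustrative reading of the procedure above, not the LMDT implementation; the annealing schedule and parameter values are assumptions) trains one weight vector per class and applies equation (3.16) whenever an example is misclassified:

import numpy as np

def train_linear_machine(X, y, n_classes, beta=2.0, anneal=0.999, epochs=50):
    n_features = X.shape[1]
    W = np.zeros((n_classes, n_features))           # one weight vector (discriminant) per class
    for _ in range(epochs):
        for x, true_cls in zip(X, y):
            scores = W @ x                           # g_i(x) for every class i
            pred_cls = int(np.argmax(scores))        # winner-takes-all
            if pred_cls != true_cls:
                # thermal correction factor: c = beta^2 / (beta + k),
                # with k = (W_j - W_i)^T x / (2 x^T x)
                k = (W[pred_cls] - W[true_cls]) @ x / (2 * x @ x)
                c = beta ** 2 / (beta + k)
                W[true_cls] += c * x                 # strengthen the true class
                W[pred_cls] -= c * x                 # weaken the wrongly predicted class
            beta *= anneal                           # cool the annealing parameter
    return W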

LMDT is altered to make it cost-sensitive by changing its weight-learning procedure, with the aim of reducing total misclassification costs. In the modified version, it samples the examples based on the cost of the misclassifications made by the current classifier. The training procedure is initialized using a variable 'proportion_i' for each class i. Then, if the stopping criterion is not met, the thermal training rule trains the linear machine and, if examples have been misclassified, the misclassification cost is used to compute a new value for each 'proportion_i'.

An alternative approach to obtaining linear splits, taken in the LDT system (Vadera 2005b), is to take advantage of discriminant analysis which enables the identification of linear discriminants of the form (Morrison 1976; Afifi and Clark 1996):

x^T Σ^{-1} (μ_1 − μ_2) − ½ (μ_1 − μ_2)^T Σ^{-1} (μ_1 + μ_2) ≤ ln( p(C_2) / p(C_1) )    (3.17)

where x is a vector representing the new example to be classified, μ_1 and μ_2 are the mean vectors for the two classes, Σ is the pooled covariance matrix, and p(C_i) is the probability of an example being in class C_i.

Theoretically, it can be shown that equation (3.17) minimizes the misclassification cost when x has a multivariate normal distribution and when the covariance matrices for the two groups are equal.

This trend of moving towards more expressive divisions is continued in the CSNL system

(Vadera 2010) that adopts non-linear decision nodes. The approach also utilizes discriminant analysis, and adopts the following split, which minimizes cost provided the class distributions are multivariate normal:

−½ x^T (Σ_1^{-1} − Σ_2^{-1}) x + (μ_1^T Σ_1^{-1} − μ_2^T Σ_2^{-1}) x − k ≥ ln( p(C_2) / p(C_1) )    (3.18)

where k = ½ ln( |Σ_1| / |Σ_2| ) + ½ (μ_1^T Σ_1^{-1} μ_1 − μ_2^T Σ_2^{-1} μ_2), x is a vector representing the example to be classified, μ_1, μ_2 are the mean vectors for the two classes, Σ_1, Σ_2 are the covariance matrices for the classes and Σ_1^{-1}, Σ_2^{-1} the inverses of the covariance matrices.

Given that the multivariate assumption may not hold in practice, it may be that utilization of a subset of variables could lead to more cost-effective splits, and hence several strategies for subset selection are explored. One strategy, explored in Vadera (2005a), is to attempt all possible combinations and select the subset that minimizes cost. However, this strategy is not particularly scalable and results in trees that are difficult to visualize. An alternative strategy, explored in (Vadera 2010), selects two of the most informative features, as measured by information gain, and uses the above equation (3.18) to obtain non-linear divisions.

3.1.2 Post construction

If costs are unknown at training time they cannot be used for inducing a tree. Additionally, if costs are likely to change, this would mean inducing a tree for every different combination of costs. Hence, various authors have explored how misclassification costs can be applied after a tree has been constructed.

One of the simplest ways is to change how the label of a leaf of the decision tree is determined. I-gain Cost-Laplace Probability (Pazzani et al. 1994) uses a Laplace estimate of the probability of a class given a leaf, shown in equation (3.19). If there are N_i examples of

class i at a leaf and k classes then the Laplace probability of an example being of class i is:

p(i) = (N_i + 1) / (Σ_j N_j + k)    (3.19)

When considering accuracy only, an example is assigned to the class with the lowest

expected error. To incorporate costs, the class which minimizes the expected cost of

misclassifying an example into class j is selected, where the expected cost is defined by:

EC_j = Σ_i p(i) · C_ij    (3.20)

where C_ij is the cost of misclassifying an example of class i as class j.
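A small Python sketch makes the relabelling rule concrete; it applies the Laplace estimate of equation (3.19) followed by the expected-cost rule of equation (3.20) to the class counts at a leaf (the counts and costs in the usage line echo the Television Repair example, and the function name is illustrative):

def label_leaf(class_counts, cost):
    # class_counts[i]: number of training examples of class i at the leaf;
    # cost[i][j]: cost of classifying a class-i example as class j
    k = len(class_counts)
    total = sum(class_counts)
    p = [(n_i + 1) / (total + k) for n_i in class_counts]        # equation (3.19)
    expected = [sum(p[i] * cost[i][j] for i in range(k))         # equation (3.20)
                for j in range(k)]
    return min(range(k), key=lambda j: expected[j])              # label with least expected cost

# e.g. a leaf with 2 faulty and 8 not-faulty examples, misclassification costs of 50 and 200
print(label_leaf([2, 8], [[0, 50], [200, 0]]))                   # prints 1 (the 'not faulty' class)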

Ferri et al. (2002), propose a post construction method based on Receiver Operating

Characteristics (ROC) (Swets et al. 2000). ROC facilitates comparison of alternative classifiers by plotting their true positive rate (on the y axis) against their false positive rate

(on the x axis). Figure 7 shows an example ROC, where the true and false rates of four classifiers are plotted. The closer a classifier is to the top left hand corner, the more accurate it is (since the true positive rate is higher and the false positive rate smaller).


The convex hull created from the points (0,0), the four classifiers and (1,1) represents an optimal front. That is, for any classifier below this convex hull, there is a classifier on the front that is less costly.

Figure 7 Example ROC

The idea behind Ferri et al. (2002)'s approach is to generate the alternative classifiers by considering all possible labellings for the leaf nodes of a tree. For a tree with m leaf nodes and a two-class problem, there are 2^m alternative labellings, which could be computationally expensive. However, Ferri et al. (2002) show that, for a two-class problem, if the leaves are ordered by the accuracy of one of the classes, then only m+1 alternative labellings are needed to define the convex hull, where the j-th node of the i-th labelling, L_{i,j}, is defined by:

L_{i,j} = −  if j < i,   L_{i,j} = +  if j ≥ i    (3.21)

The convex hull formed by these labellings can then be used to determine the optimal classifier once the costs of misclassification are known.
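Once the misclassification costs are known, the cheapest of the m+1 labellings can also be found by direct enumeration; the short sketch below (an illustrative equivalent of reading the optimal point off the convex hull, not Ferri et al.'s geometric construction) scores each labelling of equation (3.21) against given false positive and false negative costs:

def best_labelling(leaves, c_fp, c_fn):
    # leaves: list of (n_pos, n_neg) counts per leaf, ordered by positive accuracy;
    # c_fp / c_fn: costs of a false positive / false negative
    m = len(leaves)
    best = None
    for i in range(m + 1):                         # the m+1 labellings of equation (3.21)
        cost = 0.0
        for j, (n_pos, n_neg) in enumerate(leaves):
            if j < i:
                cost += n_pos * c_fn               # leaf labelled negative: its positives become errors
            else:
                cost += n_neg * c_fp               # leaf labelled positive: its negatives become errors
        if best is None or cost < best[1]:
            best = (i, cost)
    return best                                    # (index of labelling, total misclassification cost)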


3.2 Multiple tree, non-greedy methods for cost-sensitive decision tree induction

Greedy algorithms have the potential to suffer from local optima, and hence an alternative direction of research has been to develop algorithms that generate and utilize alternative trees.

There are three common strands of work: Section 3.2.1 describes the use of genetic algorithms, Section 3.2.2 describes methods for boosting and bagging, and Section 3.2.3 describes the use of stochastic sampling for developing anytime and anycost frameworks.

3.2.1 Use of Genetic Evolution for Cost-Sensitive Tree Induction

Several authors have proposed the use of genetic algorithms to evolve cost-effective decision trees (Turney 1995). Just as evolution in nature uses survival of the fittest to produce the next generation, a pool of decision trees is evaluated using a fitness function, with the fittest retained and combined to produce the next generation, repeatedly, until a cost-effective tree is obtained. This section describes the algorithms that utilize evolution, which vary in the way they represent, generate, and measure the fitness of the trees.

One of the first systems to utilize GAs was Turney's (1995) ICET system (Inexpensive Classification with Expensive Tests). ICET uses C4.5, but with EG2's cost function described in Section 3.1.1.1, to produce decision trees.

Its population consists of individuals with the parameters C_Ai, ω, and CF, where C_Ai and ω are biases utilized in equation (3.1) and CF is a parameter used by C4.5 for determining the aggressiveness of pruning.


ICET begins by dividing the training set of examples into two random but equal parts: a sub-training set and a sub-testing set. An initial population is created consisting of individuals with random values of C_Ai, ω, and CF. C4.5, with EG2's cost function, is then used to generate a decision tree for each individual. These decision trees are then passed to a fitness function to determine fitness. This is measured by calculating the average cost of classification on the sub-testing set.

The next generation is then obtained by using the roulette wheel selection scheme, which selects individuals with a probability proportional to their fitness. Mutation and crossover are used on the new generation and passed through the whole procedure again. After a fixed number of generations (cycles) the best decision tree is selected. ICET uses the GENEtic

Search Implementation System (GENESIS, Grefenstette (1990)) with its default parameters

including a population size of 50 individuals, 1000 trials and 20 generations.

More recently, Kretowski and Grześ (2007) describe GDT-MC (Genetic Decision Tree with

Misclassification Costs) , an evolutionary algorithm in which the initial population consists of

decision trees that are generated using the usual top down procedure, except that the nodes

are obtained using a dipolar algorithm. That is, to determine the test for a node, two examples from the current dataset are first randomly chosen such that they belong to different classes. A test is then created by randomly selecting an attribute that distinguishes

the two examples. Once a tree is constructed, it is pruned using a fitness function. The fitness

function used in GDT-MC aims to take account of the expected misclassification cost as well

as the size of trees and takes the form (Kretowski and Grześ, 2007):


Fitness = 1 − (EC / MC) · (1 + γ · TS)    (3.22)

where EC is the misclassification cost per example, MC is the maximal possible cost per example, TS is the number of nodes in the tree and γ is a user-provided parameter that determines the extent to which the genetic algorithm should minimize the size of the tree to aid generalization.
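Interpreted directly, the fitness function rewards trees that are both cheap and small; a minimal sketch of equation (3.22) as reconstructed above (the gamma value is an illustrative choice, not one specified by the authors) is:

def gdt_mc_fitness(ec, mc, tree_size, gamma=0.001):
    # ec: misclassification cost per example; mc: maximal possible cost per example;
    # tree_size: number of nodes in the tree; gamma: user-provided size penalty
    return 1.0 - (ec / mc) * (1.0 + gamma * tree_size)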

The genetic operators are similar in principle to the cross-over and mutation operators, except that they operate on trees. Three cross-over like operators are utilized on two randomly selected nodes from two trees:

• exchange the sub-trees at the two selected nodes.

• if the types of tests allow, then exchange just the tests.

• exchange all sub-trees of the selected nodes, randomly selecting the ones to be

exchanged.

The mutation operators adopted allow a number of possible modifications of nodes, including replacing a test with an alternative dipolar test, swapping a test with a descendent node's test, replacing a non-leaf node by a leaf node, and developing a leaf node into a sub-tree. A linear ranking scheme, coupled with an elitist selection strategy, is utilized to obtain the next generation (Michalewicz 1996). 3

The ECCO (Evolutionary Classifier with Cost Optimisation) system (Omelian 2005) adopts

a more direct use of genetic algorithms by mapping decision trees to binary strings and then

adopting the standard cross-over and mutation operators over binary strings. Attributes are

represented by a fixed size binary string, so for example 8 attributes are coded with 3 bits.

Numeric attributes are handled by seeking an axis parallel threshold value that maximizes

3 The elitist strategy ensures that a few of the fittest individuals are copied to the new generation, and the linear ranking strategy ensures some diversity and prevents the fittest from dominating the evolution too early.

information gain, thereby resulting in a binary split. The mapping between a tree and its binary string is achieved by assuming a fixed-size maximal tree where each node is capable of hosting the attribute which has the most features. 4 Figure 8 illustrates the mapping for a

problem where the attributes have two features only. Such a maximal tree is then interpreted

by mapping the nodes to attributes, assuming that the branches are ordered in terms of the

features. In addition, mutation may result in some nodes with non-existent attributes, which

are also translated to decision nodes.

A tree is then populated with the examples in a training set and each leaf node labelled with a

class that minimizes the cost of misclassification. A version of the minimum error pruning

algorithm that minimizes cost instead of error is used for pruning. The fitness measure used

is the expected cost of classification, taking account of both the cost of misclassification and

the cost of the tests. Once genes are mapped to decision trees and pruned, and their fitness obtained, the standard mutation and cross-over operators are applied, a new generation of the fittest is evolved and the process is repeated for a fixed number of cycles. Like ICET, ECCO adopts the GENESIS GA system with its default parameters.

Figure 8 Illustration of mapping

4 The approach works in general for an attribute with more than two features


Li et al. (2005) take advantage of the capabilities of Genetic Programming (GP), which enable representation of trees as programs instead of bit strings, to develop a cost-sensitive decision tree induction algorithm. They use the following representation of binary decision trees as programs, defined using BNF (Li et al. 2005):

:: “if-then-else” | Class

:: “And” | “Or”

| Not | VariableThreshold

RelationOperation ::= “>” | “<” | “=”

Unlike GDT-MC, which utilizes specialized mutation and crossover operators, Li et al.

(2005) adopt the standard mutation and crossover operators of genetic programming. A tournament selection scheme is used, in which four individuals, selected randomly with a probability proportional to their fitness, compete to move to the next generation. The fittest of the four is copied to the pool for the next generation, and this tournament process is repeated to produce the complete mating pool. The fitness function employed is also different from those of ICET, ECCO and GDT-MC. Unlike these methods, which utilize expected cost, Li et al. (2005) propose the following fitness function, based on the principle that a cost-effective classifier will maximize accuracy (RC) but minimize the false positive rate (RFP):

Constraint Fitness Function = W_rc · RC − W_rfp · RFP    (3.23)


Experimentation with this function leads them to the following additional constraint, to ensure that the accuracy of one of the classes is not compromised when the costs of misclassification are significantly imbalanced:

W_rc = 1 if C^+ ∈ (P_min, P_max), 0 otherwise    (3.24)

where C^+ is the proportion of examples predicted to be positive, and P_min and P_max define the expected range for C^+, provided by a user.

3.2.2 Wrapper methods for cost-sensitive tree induction

A significant amount of research has been done on accuracy based classifiers, and instead of developing new cost-sensitive classifiers or adapting them as described above, an alternative strategy is to develop wrappers over accuracy based algorithms.

This section describes two approaches for utilizing existing accuracy based algorithms.

Section 3.2.2.1 describes methods based on boosting, where an accuracy based learner is used to generate an improving sequence of hypotheses and Section 3.2.2.2 describes methods based on bagging that are based on generating and combining independent hypotheses.

Section 3.2.2.3 describes a method which implicitly includes alternative hypotheses but in one structure.

3.2.2.1 Cost-Sensitive Boosting. Boosting involves creating a number of hypotheses ht and then

combining them to form a more accurate composite hypothesis of the form (Schapire 1999;

Meir and Rätsch 2003):

H(x) = Σ_t α_t h_t(x)    (3.25)

where α_t indicates the extent of weight that should be given to h_t(x).


One of the first practical boosting methods, AdaBoost (Adaptive Boosting), works by generating h_i(x) in sequential trials, using a learner on weighted examples that reflect their

importance (Freund and Schapire 1996). It begins by assigning weights of 1/N to each

example. At the end of each sequential trial, these weights are adjusted so that the weights of

misclassified examples are increased, but the weights of correct examples decreased. After a

fixed number of cycles, a sequence of trees or hypotheses hi is available and can be combined

to perform classification. The final classification is based on selecting the class that results in

the maximum weighted vote as defined by equation (3.25). There are different versions of

AdaBoost with specific weight update rules (e.g., Freund and Schapire (1997), Bauer and

Kohavi (1999), Schapire and Singer 1998, 1999)). For example, one version that is based on

a weak learner capable of producing hypotheses ht that return a confidence rating in the range

(-1,1) uses the following update rule (Schapire 1999):

α_t = ½ ln( (1 − ε_t) / ε_t )

w_{t+1}(x) = w_t(x) exp( −α_t y h_t(x) ) / Z_t    (3.26)

where ε_t is the weighted error of h_t, y ∈ {−1, +1} is the true class of x, and Z_t is used to normalize the weights so that they add up to 1.

Thus, AdaBoost consists of three key steps: the initialization, the weight update equations, and the final weighted combination of the hypotheses. The literature contains a number of algorithms that adapt these three steps of AdaBoost to develop cost-sensitive boosting algorithms.

In particular, Ting and Zheng (1998a), which was one of the first studies to utilize boosting for cost-sensitive induction, proposed two adaptations: an algorithm called UBoost (Boosting

with Unequal Instance Weights) and another called Cost-UBoost (UBoost with Cost-Sensitive adaptation).

UBoost utilizes AdaBoost, except that the weights for each example x, of class j are initialized to the cost of misclassifying an example of class j, and normalized 5 :

w(x) = cost(j)    (3.27)

The cost of misclassifying an example of class i, denoted by cost(i) is defined by Ting and

Zheng (1998a) as in equation (3.8). Below, we also use the notation cost(x) to denote the cost of misclassifying an example x.

In addition, the composite classification rule of equation (3.25) is adapted to first work out the expected cost of classifying an example EC j(x) , into class j using the combined hypotheses:

EC_j(x) = Σ_t EC(x, j, h_t)    (3.28)

where EC(x, j, h_t) is the expected cost if the example x is classified in class j, based on the distribution of examples in the leaf node of the tree h_t that leads to the classification h_t(x).

UBoost then selects the class j that results in the minimum expected cost EC_j(x).
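The combination rule of equation (3.28) can be sketched in a few lines; the snippet below is an illustrative reading of UBoost's classification step, with leaf_distribution standing in for whatever routine returns the class distribution of the leaf an example falls into (an assumed helper, not part of the original algorithm):

def classify_min_expected_cost(x, trees, cost, leaf_distribution):
    # cost[i][j]: cost of labelling a class-i example as class j
    k = len(cost)
    total = [0.0] * k
    for tree in trees:
        p = leaf_distribution(tree, x)             # p[i]: proportion of class i in x's leaf
        for j in range(k):                         # accumulate EC(x, j, h_t) over the trees
            total[j] += sum(p[i] * cost[i][j] for i in range(k))
    return min(range(k), key=lambda j: total[j])   # the class with minimum EC_j(x)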

Ting and Zheng (1998a) also propose a method Cost-UBoost that extends UBoost by also

amending the weight update procedure to take account of costs, so that:

5 The presentation here assumes that the normalisation of the weights by a factor Z_t is done at the end of a trial, therefore simplifying the equations.


w(x) = w(x) · β(y, y′)    (3.29)

where y is the actual class and y′ is the predicted class for an example x, and β is defined by:

β(y, y′) = cost(y) when y′ ≠ y,   β(y, y′) = 1 when y′ = y    (3.30)

The empirical trials conducted by Ting and Zheng (1998a) suggest that Cost-UBoost performs better than UBoost in terms of minimizing costs of misclassification for two class problems. However, they note that this advantage reduces for multi-class problems and suggest that this is owing to the mapping of different costs of misclassification into a single misclassification cost by equation (3.8). Later in this section, we describe the more recent work of Abe et al. (2004), and Lozano and Abe (2008) that develops theoretical foundations for multi-class cost-sensitive boosting problems.

In a follow up study, Ting (2000b) propose further variations, named CSB0 , CSB1 , CSB2 and compare their performance to another variation of AdaBoost, known as AdaCost (Fan et al. 1999). CSB0 is essentially the Cost-UBoost algorithm described above, while CSB1 ,

CSB2 and AdaCost utilize increasingly sophisticated weight update functions for weak learners that produce a confidence in their prediction h_t(x) ∈ (0, 1) (Ting 2000b):

CSB1:  w_{t+1}(x) = β(y, y′) · w_t(x) · exp( −δ h_t(x) )    (3.31)

CSB2:  w_{t+1}(x) = β(y, y′) · w_t(x) · exp( −δ α_t h_t(x) )    (3.32)

AdaCost:  w_{t+1}(x) = w_t(x) · exp( −δ α_t h_t(x) β′(y, y′) )    (3.33)

where δ is −1 if the example is misclassified and +1 if it is classified correctly, and α_t is defined as derived in (Schapire and Singer 1999):

α_t = ½ ln( (1 + r_t) / (1 − r_t) )    (3.34)


and with rt defined as follows for the CSB family:

r_t = Σ_x w(x) δ h_t(x)    (3.35)

As well as the update equation, the r_t and the cost adjustment function β′ are defined differently for AdaCost:

r_t = Σ_x w(x) δ h_t(x) β′(y, h_t(x))    (3.36)

β′(y, y′) = 0.5 cost(y) + 0.5 when y′ ≠ y,   β′(y, y′) = −0.5 cost(y) + 0.5 when y′ = y    (3.37)

Ting (2000b) evaluates these methods empirically and concludes that the introduction of the

αt in CSB2 does not lead to a significant improvement and the additional parameters used in

AdaCost are not particularly effective either. CSB1 produces more cost-effective results than

AdaCost in 30 runs while AdaCost performs better in 11 runs. Surprisingly, the evaluations also suggest that AdaBoost produces better results than its cost-sensitive version AdaCost , which Ting (2000b) attributes to the particular definition of β’ that allocates a relatively low

reward (penalty) when high cost examples are correctly (incorrectly) classified. This is in

contrast to the results presented in (Fan et al. 1999), where AdaCost produces better results

than AdaBoost when the Ripper learner is used instead of C4.5 as the base learner.

The above adaptations of boosting presume that costs are well-defined in advance. Merler et

al. (2003) argue that in medical applications, the costs of false positives or false negatives can

only be approximate, and further that during the classification process there are two separate phases. In the first phase, the aim is to ensure that the classifier is sensitive and the true positives are maximized, whilst the specificity of the classifier is retained within acceptable bounds. In the second phase, a specialist medical consultant would examine the identified positives more carefully, filtering out the false positives. Hence, for this type of application, they develop a boosting algorithm, SSTBoost (Sensitivity-Specificity tuning Boosting), that

adapts AdaBoost so that the error for the ith example is defined in terms of measures of

sensitivity and specificity:

ε_t = π_{+1} c_{+1} (1 − sensitivity) + π_{−1} c_{−1} (1 − specificity)    (3.38)

where π_{+1}, π_{−1} are the class priors and c_{+1}, c_{−1} are the costs of misclassification of the two classes. Sensitivity is the true positive rate and specificity is the true negative rate.

With this definition of error, they use equation (3.26) for αt :

α_t = ½ ln( (1 − ε_t) / ε_t )    (3.39)

The weight update equation takes the form:

w(x) = w(x) exp( −α_t (2 − c(x)) )  if x is classified correctly,
w(x) = w(x) exp( α_t c(x) )  if x is misclassified    (3.40)

Given specific costs for misclassification, this adaptation of AdaBoost results in a classifier

with a particular sensitivity and specificity. To enable a search for a classifier in a target

region of sensitivity and specificity, they relate the costs of misclassifying a positive

example, c+, and cost of misclassifying negative example, c-, in terms of a single parameter

ω:


c_+ = ω,   c_− = 2 − ω    (3.41)

This then enables a search over ω by using the method of bisection to find a classifier that

aims to be within a user specified region of sensitivity and specificity, meeting their

application goals.

The above adaptations of AdaBoost amend the procedure to take account of costs. In contrast,

as part of a study that aims to utilize boosting for estimating conditional class probabilities,

Mease et al. (2007) describe how AdaBoost can be used directly to develop a procedure

called JOUS-Boost to perform cost-sensitive boosting. They use a result due to Elkan

(2001) that shows that it is possible to change the distribution of the data to reflect the ratio of

costs and such that applying boosting on this changed distribution results in minimization of

cost. More specifically, given the cost of misclassification and number of examples of class 1

and class 2 are N 1, N 2 respectively, the distribution of the data is changed so that the number of examples N 1’, N 2’ of class 1 and 2, satisfy:

N_1′ / N_2′ = c · ( N_1 / N_2 )    (3.42)

where c is the ratio of the cost of misclassifying a class 1 example to the cost of misclassifying a class 2 example.

This change of distribution can be achieved by sampling the original data in a way that results

in a smaller dataset (under-sampling) or a larger dataset (over-sampling). The sampling itself

can be done with replacement, where a selected example is returned, or without replacement.

Under-sampling can result in loss of data and over-sampling leads to duplication of examples.

Mease et al. (2007) carry out experiments on both artificial and real data showing that the

duplication owing to over-sampling leads to over-fitting when boosting. They then propose

a variation, called JOUS-Boost (Over/Under Sampling and Jittering ), that amends the

sampling process by adding noise to the features of any duplicated data, and provide empirical evidence to show that this helps to reduce over-fitting when AdaBoost is used.
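The effect of cost-proportionate over-sampling with jittering can be sketched as follows (an illustrative reading of the idea rather than the JOUS-Boost implementation; the replication rule, the noise scale and the random seed are assumptions):

import numpy as np

def jous_resample(X, y, class_cost, noise=0.01, seed=0):
    # class_cost[c]: cost of misclassifying an example whose true class is c
    rng = np.random.default_rng(seed)
    min_cost = min(class_cost.values())
    rows, labels = [], []
    for x, c in zip(X, y):
        copies = int(round(class_cost[c] / min_cost))    # replicate in proportion to cost
        for i in range(copies):
            if i == 0:
                rows.append(x)                           # keep one exact copy
            else:
                rows.append(x + rng.normal(scale=noise, size=x.shape))   # jitter the duplicates
            labels.append(c)
    return np.array(rows), np.array(labels)

Boosting (e.g. AdaBoost) is then run on the returned sample, so that minimizing error on the jittered, cost-proportionate data approximates minimizing cost on the original data.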

Most of the above algorithms are based on using boosting on two class problems. In two class problems, algorithms such as UBoost are able to set the weight of an example in

proportion to the cost of misclassifying an example. However, for multi-class problems, an

example could be misclassified into several classes, so determining the weight is less

obvious. Several authors have proposed methods such as utilizing the sum or average of the costs of misclassification into the other classes (e.g., Breiman et al. (1984); Margineantu (2001));

though as noted above, Ting and Zheng (1998a) suggest that use of these methods may

explain a reduction in the advantage gained by Cost-UBoost over UBoost in their empirical

evaluations. Abe et al. (2004) also argue that these methods do not have a theoretical basis.

Hence, they propose an alternative way of utilizing boosting, called Gradient Boosting with Stochastic Ensembles (GBSE), for multi-class problems. GBSE is motivated by first defining

a stochastic hypothesis H(y|x) for a class y for an example x based on the individual

hypotheses ht(x) generated by (Abe et al. 2004):

H(y|x) = (1/T) Σ_t I( h_t(x) = y )    (3.43)

If Ht(y|x) is the composite hypothesis after round t of boosting, then it is formed by combining the previous composite hypothesis Ht-1 with the new hypothesis, ht, obtained in

round t, weighted by αt:

H_t(y|x) = (1 − α_t) H_{t−1}(y|x) + α_t I( h_t(x) = y )    (3.44)

where I(E) returns 1 if the expression E is true and 0 if E is false, α_t = 1/t, and initially H_0(y|x) = 1/k for a k-class problem.


This enables the definition of the expected cost of misclassification over the examples, and using gradient descent, Abe et al. (2004) then derive the following weight update rule, where wx,y is the weight associated with classifying example x in class y:

w_{x,y} = cost_{H_{t−1}}(x) − c(x, y)    (3.45)

where cost_{H_{t−1}}(x) is the expected cost of classifying example x by the composite hypothesis H_{t−1}, and c(x, y) is the cost of classifying x into class y.

However, existing boosting methods assume a single weight of importance per example, and not multiple weights, wx,y . Hence, to utilize existing boosting methods, there needs to be a

mapping to and from multiple weights to single weights per example. Abe et al. (2004) show

that this mapping can be achieved by expanding an example (x, y, (c 1, c 2, …c k)) , that has features x, class y, and costs of classification into class i defined by ci, into k examples :

S′ = { (x, ℓ, max_{i∈{1,…,k}} c_i − c_ℓ) | ℓ ∈ {1,…,k} }    (3.46)

Abe et al. (2004) prove that minimizing cost over this expanded dataset is equivalent to minimizing the cost over the original multi-class data. They note that equation (3.45) can lead to negative weights wx,y which makes it difficult to utilize existing relational weak learners. They therefore transform the examples of equation (3.46) to the following form:

{ ((x, y), I(w_{x,y} ≥ 0), |w_{x,y}|) | x ∈ S, y ∈ Y }    (3.47)

A weak learner can be applied to these data and used to induce a relational hypothesis ht(x,y)

and the composite hypothesis revised:


H_t(y|x) = (1 − α) H_{t−1}(y|x) + α f_t(y|x)    (3.48)

where f_t(y|x) converts the relational hypothesis to a stochastic form:

f_t(y|x) = I( h_t(x, y) = 1 ) / |{ y′ ∈ Y | h_t(x, y′) = 1 }|  when h_t(x, y′) = 1 for some y′,
f_t(y|x) = H_{t−1}(y|x)  otherwise    (3.49)

This formulation defines the mapping needed for GBSE to use single weight boosting

methods for multi-class problems.

A desirable property of any boosting algorithm is that it should converge and lead to the

optimization of its objective. Although this has not been shown for GBSE , Abe et al. (2004)

show that a variant, called GBSE-T, with a fixed α, and the following amendment of the

GBSE weight update equation (3.45) converges exponentially:

(3.50) () , = − (,)

In a follow up study, Lozano and Abe (2008) develop stronger theoretical foundations for

cost-sensitive boosting in which they derive update equations for a family of methods, called

Cost-Sensitive Boosting with p-norm Loss (Lp-CSB ) for which they prove convergence. Like the above study, they use stochastic gradient boosting as the methodology, however, instead of aiming to minimize the expected cost of misclassification at each boosting round, they aim to minimize its approximation using the p-norm (Lozano and Abe 2008):


Σ_{(x,y,c)∈S} c(x, y) ( H(y|x) )^p ,   p ≥ 1    (3.51)

where S takes the expanded form defined in equation (3.46).

They use methodology similar to the derivation of GBSE and use gradient descent to show

that optimizing this p-norm based objective leads to finding a hypothesis ht at each round that minimizes (Lozano and Abe 2008, p507):

Σ_{x∈S} Σ_{y∈Y} w_{x,y} I( h_t(x) = y )    (3.52)

where

w_{x,y} = ( H_{t−1}(y|x) )^{p−1} c(x, y)    (3.53)

To facilitate the use of a relational weak learner, the examples are translated to the following

form, similar to equation (3.47) in GBSE (p. 508):

S″ = { ((x, y), ℓ_{x,y}, w′_{x,y}) | x ∈ S, y ∈ Y }    (3.54)

However, the weights are different from those of GBSE and are defined by:

, = 0, (, )ℎ ℎ 2 ′, = ∑∈ , = 1, ℎ ℎ ℎ ℎ 2 (3.55)

Once a weak learner is applied and a new hypothesis h_t(x, y) obtained, the revised composite hypothesis is defined as in equation (3.48) with f_t(y|x) = h_t(x, y). Lozano and Abe (2008) prove that this scheme and its related family of schemes are guaranteed to minimize the p-norm based objective, providing a significant theoretical result and understanding of cost-sensitive boosting methods.

3.2.2.2 Cost-Sensitive Bagging. The main principle of bagging is that producing n re-samples

of the dataset (with replacement), applying a learning procedure to each resample and

aggregating the answers leads to better classifiers, particularly for learners that are not stable

(Breiman 1996). This principle is used in MetaCost (Domingos 1999), which is one of the

earliest systems to utilize cost-sensitive bagging. Thus, MetaCost re-samples the data several

times and applies a base learner to each sample to generate alternative decision trees. The

decisions made on each example by the alternative trees are combined to predict, for each example, the class that minimizes the expected cost, and the examples are relabelled accordingly. The relabelled examples are then processed by the base learner, resulting in a cost-sensitive decision tree.
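A compact sketch of this procedure (assuming a scikit-learn-style base learner with fit and predict_proba methods and numpy arrays for X and y; the names and the number of bags are illustrative) is:

import numpy as np

def metacost(X, y, cost, DecisionTree, n_bags=10, seed=0):
    # cost[i][j]: cost of labelling a class-i example as class j
    rng = np.random.default_rng(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_bags):                              # bagging: resample with replacement
        idx = rng.integers(0, n, size=n)
        ensemble.append(DecisionTree().fit(X[idx], y[idx]))
    probs = np.mean([m.predict_proba(X) for m in ensemble], axis=0)
    expected = probs @ np.asarray(cost)                  # expected cost of each candidate label
    y_relabelled = expected.argmin(axis=1)               # cheapest class for every example
    return DecisionTree().fit(X, y_relabelled)           # single tree learned from relabelled data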

Zadrozny et al. (2003a, 2003b) describe a method called Costing that, like MetaCost, applies a base learner to samples of the data to generate alternative classifiers. However, the sampling process is significantly different from MetaCost. Each resample aims to change the distribution of the data so that minimizing error on the changed distribution is equivalent to minimizing cost on the original distribution (i.e., as described for JOUS-Boost above).

Zadrozny et al. (2003a) argue that using sampling with replacement can lead to overtraining because of the potential for duplication, and sampling without replacement is also problematic since we can no longer assume that the examples selected are independent.

Hence, to overcome these shortcomings, Zadrozny et al. (2003a) utilize rejection sampling

(Von Neumann 1951) in which an example with cost c has a probability of c/Z of being accepted, where Z is chosen as the maximum cost of misclassifying an example. Once these

samples, which are proportional to the cost, are generated using rejection sampling, Costing

applies a base learner to generate m classifiers whose outcomes can be aggregated to classify

an example. Notice that, unlike MetaCost , there is no relabeling of the data in order to

generate a single decision tree.
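Rejection sampling itself is straightforward to express; the sketch below (illustrative names, numpy arrays assumed) accepts each example with probability c/Z and would be repeated m times, once per classifier in the ensemble:

import numpy as np

def rejection_sample(X, y, example_cost, seed=0):
    # example_cost[i]: cost of misclassifying example i; Z: the maximum such cost
    rng = np.random.default_rng(seed)
    Z = max(example_cost)
    keep = [i for i, c in enumerate(example_cost) if rng.random() < c / Z]
    return X[keep], y[keep]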

Lin and McLean (2000) develop an approach, in which they use different learners on the

same training sample to generate alternative classifiers. As with MetaCost , they use the

different classifiers to predict the class of each example. However, the labelling of an

example x, by a classifier j is based on the risk of classification into two classes:

Risk_{1,j} = π_2 · P(2|x, j) · C_12
Risk_{2,j} = π_1 · P(1|x, j) · C_21    (3.56)

The risk of classification of an example x into a class c is then defined as a weighted sum of the m classifiers:

Risk_{x,c} = Σ_j w_j · Risk_{c,j}    (3.57)

where w_j, the weight associated with classifier j, is the accuracy of the classifier on the training set. The class c that minimizes Risk_{x,c} is used to label the example x.
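The weighted risk combination of equations (3.56) and (3.57) reduces to a short calculation per example; the following sketch (illustrative names; prob[j] holds the two class-membership probabilities produced by classifier j) labels an example with the class of minimum combined risk:

def combined_risk_label(prob, acc, pi1, pi2, c12, c21):
    # prob[j] = (P(1|x,j), P(2|x,j)); acc[j] = w_j, the training-set accuracy of classifier j
    risk1 = sum(w * pi2 * p2 * c12 for w, (p1, p2) in zip(acc, prob))   # risk of labelling x as class 1
    risk2 = sum(w * pi1 * p1 * c21 for w, (p1, p2) in zip(acc, prob))   # risk of labelling x as class 2
    return 1 if risk1 <= risk2 else 2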

Moret et al. (2006) describe a similar method to Lin and McLean (2000), called Bagged

Probability Estimation Trees (B-PETS ), but do not utilize the prior class probabilities, πi, in equation (3.56) and also estimate the P(i|x,j) using the distribution of examples in the leaf nodes and Laplace’s correction (equation 3.19) which is known to produce better estimates.


Moret et al. (2006) also propose an alternative way of estimating P(c|x), the probability of an example x being in a particular class c that makes use of lazy option trees. A lazy option tree

(LOT) is constructed for an example x, using the usual top-down process except there are two

significant differences. First, since the example is known, only tests that are consistent with

the example are considered at each node. Secondly, instead of selecting a single best test for

each node, the first k best tests are also stored as alternative tests, leading to k sub-trees. An estimate of P(c|x) is then based on the k leaf nodes that the example x falls into. In addition, they also use re-sampling and bagging over lazy option trees ( B-LOT s), to produce estimates of P(c|x). These estimates of P(c|x) can then be used to select the class that minimizes the risk based on the cost of misclassification.

Given that MetaCost and AdaBoost each result in improved performance, it seems plausible that exploring a combination of the two methods could lead to further improvements. Ting (2000a) investigates this possibility by carrying out an empirical evaluation of two adaptations of MetaCost: one, called MetaCost_A, where the base learner is AdaBoost, and a second, called MetaCost_CSB, where the base learner is CSB0. The results of the empirical evaluations suggest that there is little to be gained by embedding AdaBoost or CSB0 within MetaCost. Bagging is known to be particularly effective at reducing variance owing to an unstable base learner (Bauer and Kohavi 1998), which may provide an explanation of why using bagging over AdaBoost or CSB does not result in further improvements.

The comparison in Ting (2000a) also shows that using a cost-sensitive base learner for MetaCost does result in improvements over using a cost-insensitive learner, which is also apparent in the empirical results presented in Vadera (2010).


3.2.2.3 Multiple Structures. Estruch et al. (2002) argue that generating alternative trees such as in boosting and bagging can consume significant space and therefore propose a structure, called Multi-Tree , which aims to implicitly include alternative trees. The central idea is to follow the usual top-down decision induction process, but instead of discarding alternative choices, these are stored as suspended nodes that could be expanded in the future (Ferri-

Ramírez 2002; Rissanen 1978). Figure 9 shows a multi-tree for the example given in Table 1, the 'Television Repair' dataset. The attributes that have been selected are presented in rectangles, and the suspended nodes, which are in circles, are linked by the dashed lines. The figure also includes the class distribution in each node, given in the format (number of examples in the 'faulty' class, number of examples in the 'not faulty' class). A multi-tree can be expanded to include an additional tree by selecting a suspended node and developing it into a tree using the top-down process, but retaining potential attributes as suspended nodes. Estruch et al. (2002) consider alternative methods of selecting which suspended node to expand and adopt a random selection scheme. Thus a multi-tree will implicitly include several trees, each of which can be used for classification and whose outcomes can be combined to produce a weighted classification in the same manner as bagging.

Estruch et al. (2002) experiment with different ways of producing this weighted classification by taking advantage of the fact that different decision trees may share the same part of a multi-tree. A multi-tree is not as comprehensible as a single tree, and hence a method for extracting a single tree is developed.

In contrast to MetaCost, where a single tree is obtained by applying a base learner to the re-classified examples, here a single tree is extracted by traversing a multi-tree bottom-up and selecting those suspended nodes that agree the most with the outcomes of the multi-tree on a randomly created dataset. They then utilize ROC, as described in Section 3.1.2, to take account of costs. Estruch et al. (2002) include an empirical comparison which concludes that the multi-tree is more efficient in comparison to bagging and boosting. The results for accuracy suggest that bagging produces better results at lower numbers of iterations, while the use of the multi-tree produces slightly better results beyond 200 iterations.

Figure 9 Multi-tree using the example dataset

3.2.3 Stochastic Approach

The greedy methods of induction of trees, described in Section 3.1, select an attribute after considering its immediate effect on the examples. Several authors have investigated the potential for utilizing a k-look-ahead strategy to select attributes by considering their effect

deeper down a tree (e.g., Murthy and Salzberg (1995), Dong and Kothari (2001)). That is, for

each attribute, a sub-tree of depth k is developed and the attribute that results in the best sub-

tree is selected. Although increasing the look-ahead depth k has the potential for increasing

the quality of a tree, as Esmeir and Markovitch (2004) point out, this also leads to an

exponential increase in the time required for induction.


Hence, in their work they explore the use of stochastic sampling methods to assess the attributes and develop ACT (Esmeir and Markovitch 2007, 2008), a framework for anytime induction of cost-sensitive trees, and TATA (Esmeir and Markovitch 2010, 2011), an anycost framework for learning under limited budgets.

ACT (Anytime Cost-sensitive trees ) (Esmeir and Markovitch 2007, 2008) uses a stochastic

tree induction algorithm to generate r samples of sub-trees for each value of an attribute. The

cost of each of these sub-trees is calculated using the training examples, and the minimum

cost utilized as an estimate for the attribute value. The costs of the sub-trees for the attribute

values are aggregated to estimate the cost of selecting an attribute, and the minimum cost

attribute selected. The first of the r samples is generated using the EG2 algorithm and the

remaining samples are generated using a greedy top-down induction process except that the

probability of selecting an attribute is proportional to its information cost measure as defined

in EG2 equation (3.1).

In experiments, ACT returned better results than ICET (Section 3.2.1) and Decision Trees

with Minimal Cost (Section 3.1.1.2) (Esmeir and Markovitch 2008, p26).

In a more recent study, Esmeir and Markovitch (2011) note that minimizing the sum of test

and misclassification costs implies that the costs should be on the same scale. 6 They argue

that a more realistic goal would be to develop trees that minimise misclassification costs but

subject to a constraint that the total cost of the tests utilised is no more than a specified cost. 7

The limit on test costs may be available prior to learning, after learning but before

classification, or may be unavailable, leading to algorithms they term as pre-contract, contract

and interruptible classifiers. They develop a framework for algorithms for such situations, called TATA (Tree classification AT Anycost), that is capable of reducing misclassification costs as the budget for using tests increases.

6 As mentioned in Section 4.1.1, Zhang et al. (2007) also make the same observation, though they adapt the measure used in the Performance algorithm to represent the trade-off between costs of tests and costs of misclassification. 7 Greiner et al. (2002) provide some theoretical results for active learning under such budgets.

They develop this framework by first noting that existing top-down tree induction algorithms can be adapted so that the total test cost for any example will be no more than a pre-specified cost. This can be achieved during the tree induction process by only considering those attributes whose cost is below the current available budget, where the current budget is the initial budget less the cost of the attributes used from the root to the current node. Then, they adopt an approach similar to ACT , except that the r samples are obtained using an adapted version of C4.5 in which attributes that cost more than the available budget are excluded and attributes are selected stochastically with a probability proportional to their information gain.

The samples for each available attribute are used to estimate the misclassification cost and the one with minimal misclassification cost selected. Given a maximum budget available for testing and a suitable sample size r, this achieves the requirements for a pre-contract

algorithm.
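The budget-constrained attribute filtering at the heart of the pre-contract approach can be sketched as follows (an illustrative reading of the step described above, not the TATA implementation; info_gain stands in for whatever routine scores an attribute on the current examples):

import random

def choose_attribute(attributes, test_cost, spent_so_far, budget, info_gain):
    remaining = budget - spent_so_far                       # budget left along this path
    affordable = [a for a in attributes if test_cost[a] <= remaining]
    if not affordable:
        return None                                         # no affordable test: make the node a leaf
    gains = [max(info_gain(a), 1e-9) for a in affordable]
    # stochastic choice with probability proportional to information gain
    return random.choices(affordable, weights=gains, k=1)[0]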

For a contract algorithm, the budget for test costs is not available until the classification

stage. To handle such applications, Esmeir and Markovitch (2011) propose inducing a

sequence of trees, t 1, … t k, which they term a repertoire, with respective budgets c 1,…,c k, where c 1 is set to the cost of the cheapest test, and c k is set to the maximum cost, where all the tests are used. The number of trees, k, that are used, depends on the amount of time and memory available but also impacts on the time available for the number of stochastic samples, r, that are possible. The k trees could be obtained by discretizing the interval from c1 to c k into k-1 uniformly spread intervals or in a more sophisticated manner by repeatedly

using hill-climbing to subdivide the interval that has the largest gap in terms of expected errors and test cost budgets. 8

To achieve the goals of an interruptible algorithm, where neither the budget for learning nor the total test cost is available, they propose developing a repertoire of trees and then starting classification using the tree with the minimum possible budget, repeatedly moving on to the tree with the next higher budget until interrupted or the final tree is reached.

An empirical evaluation of TATA shows that misclassification costs reduce more rapidly with increasing budget when compared to EG2, to an adapted version of EG2 where only attributes within budget are considered, and to C4.5. The misclassification cost also reduces as the number

of stochastic samples, r, increases, with the most significant improvement occurring when

one, two or three samples are used, but minimal improvement after three samples.

3.3 Summary and analysis of the results of the survey

There has been significant interest in the introduction of costs into decision tree induction.

Many ways of introducing costs within the decision tree process have been developed. Whilst there have been accounts of different types of costs, there has been no synthesis of the wide range of studies on cost-sensitive algorithms. An extensive survey of the field has been carried out with a view to providing an appreciation of the different approaches and algorithms that have been proposed. Additionally it provides a guideline to the type of framework which may prove effective.

8 More formally, they select an i for which (E_i − E_{i+1})(C_{i+1} − C_i) is maximum, where E_i and C_i are the expected error and budget for the i-th tree.


A new taxonomy of cost-sensitive algorithms has been developed, organizing the algorithms into classes representing the way cost-sensitivity has been introduced. The survey revealed two major approaches: greedy, which induces a single tree, making decisions with no backtracking; and non-greedy, which uses multiple trees and multiple choices to induce trees.

Seven classes are defined:

1. Use of costs during construction , whereby attribute selection measures are adapted to

include costs. The main differences between the algorithms in this class are the selection

measures used and whether costs of tests, misclassification costs or both are incorporated.

2. Post construction , developed when costs are unknown at training time or if the costs are

subject to many changes. Differences between these algorithms arise from how the labels

for leaf nodes are chosen.

3. GA methods , which utilize evolution, producing populations of decision trees which are

evaluated with regard to costs with the fittest being retained and combined. The algorithms

vary in the way trees are generated or represented and how the fitness is measured.

4. Boosting , which generates a number of decision trees in sequence using instance weights.

The algorithms differ in the way that these weights are initialized, and updated. Other

differences between algorithms include how the sampling is done, and how error rates or

confidence rates have been calculated in order to give the trees with least error more

importance in composite voting methods.

5. Bagging, which generates a number of independent decision trees using re-samples from the training set; the trees, unlike those generated by boosting, are independent of each other, similarly to those in the GA methods. Generally these algorithms are wrapper methods, using the decision tree as a sub-routine and wrapping the incorporation of costs


around it. Differences between these algorithms lie in how the sampling occurs and in the composite voting method used.

6. Multiple structures , which expands the ideas of generating alternative trees and combining

the outcome by having alternative trees in one structure. This shows all possible

alternative choices of attribute selection in one decision tree so that alternative choices are

not discarded as in the usual decision tree process but are stored and can be expanded in

the future.

7. Stochastic Approach, which induces decision trees created by generating r stochastic

samples of trees rooted at each potential attribute and selecting the attribute that results in

the best tree. Varying the number of r samples results in the anytime behaviour where

quality can improve with more time. As well as anytime behaviour, this approach has

been used to produce a framework for anycost behaviour, where misclassification costs

reduce as the available total cost for testing increases.

The survey also includes a timeline showing how the field has developed from early algorithms that simply amend selection measures to take account of costs, to the more recent and sophisticated stochastic algorithms that use sampling to induce anycost trees.

If one were to decide which of the existing algorithms may be appropriate for use in a particular domain, this would depend on various factors, including whether an application needs to minimize misclassification costs alone or the costs of both tests and misclassifications, whether there is a fixed budget for test costs, and whether there is a need for anytime or anycost learning.

Although, the particular experimental methods, datasets utilized (see Appendix A1) and related systems compared vary, it is possible to form a general view from the empirical evaluations presented in the studies.


A number of the non-greedy algorithms show the benefits of generating multiple trees. Based on the original study by Turney (1995) and the independent comparisons in (Lomax and

Vadera 2009, 2011), ICET performs well when aiming to minimize the sum of costs of

misclassification and tests, especially when costs of misclassification are uniform 9.

ACT , a system based on stochastic sampling, improves upon the results from ICET for both

uniform and non-uniform misclassification costs (Esmeir and Markovitch 2008).

The use of boosting has developed from the pioneering work on systems such as Cost-UBoost

(Ting and Zheng 1998a) and AdaCost (Fan et. al, 1999) to JOUS-Boost (Mease et al. 2007)

that shows the benefits of adding noise to the sampling process to reduce over-fitting.

Lozano and Abe (2008) have advanced our understanding of cost-sensitive boosting by

deriving methods such as Cost-Sensitive Boosting with p-norm Loss (Lp-CSB ) that are

guaranteed to converge. The recent work of Esmeir and Markovitch (2011) on TATA provides a novel framework for applications where the maximum cost for testing is available in advance, at the classification stage or even later.

Given the relative success of non-greedy algorithms for cost-sensitive tree induction, a fair question is:

“Is it worth using or even pursuing future research on greedy cost-sensitive decision

tree induction algorithms?”

The primary advantage of the greedy algorithms is that they are very efficient and therefore represent a good starting point for applications, and where performance with respect to costs

9 Costs of misclassification are said to be uniform when they are the same for all the classes.

is very good, there may be little benefit in using the more computationally expensive multi-tree methods. Producing similar results to multi-tree methods using single tree methods does represent a major research challenge, but as the work on non-linear decision trees shows

(Vadera 2010), it is possible to produce results comparable to MetaCost and ICET for minimisation of misclassification costs at a fraction of the computational time. Whether it is possible to extend this to applications that need to take account of costs of tests or budgeted learning remains an open question.

In conclusion, the field of cost-sensitive decision tree learning has a rich and diverse history, providing a strong base for future research, which would include building upon recent advances to develop new algorithms that improve performance or meet new requirements.


CHAPTER 4: THE DEVELOPMENT OF A NEW MULTI-ARMED BANDIT FRAMEWORK FOR COST-SENSITIVE DECISION TREE LEARNING

This chapter develops the framework for cost-sensitive decision tree induction which uses the principles of the multi-armed bandit algorithm. Section 4.1 presents a summary of the performance of previous algorithms which motivates the rationale, Section 4.2 develops the core algorithm and Section 4.3 identifies potential problems and some refinements. Section

4.4 summarizes the development stage.

4.1 Analysis of previous cost-sensitive decision tree algorithms

One of the first comparisons of cost-sensitive decision trees evaluated the Genetic Algorithm based system ICET , the use of Linear Machines ( LMDT ) and C4.5 (Vadera and Ventura

2001). A more comprehensive evaluation was carried out as part of an MSc (Lomax 2006)

and later additional datasets were considered and the results published in Lomax and Vadera

(2011).

Experiments were carried out over a range of cost matrices for the following algorithms:

EG2, CS-ID3, IDX, MetaCost, MetaCost_A, MetaCost_CSB, AdaCost, SSTBoost, CS-

AdaBoost, CSB, LS-ID3, CS-LSID3, ICET . A typical two-class cost matrix has a different

value for misclassifying each class as the other, as below in Table 6. As another example,

Table 5, given on page 28 is a cost matrix for a four class problem.

For example, to misclassify a class x example as class y, the misclassification cost would be 1, whereas to misclassify a class y example as class x incurs a misclassification cost of 100.


                   Predicted class
Actual class         x        y
     x               0        1
     y             100        0

Table 6 Typical cost matrix for two-class problems

Eleven datasets were used in the comparison, some of which are available from the Machine Learning Repository; they come from differing domains, including medical diagnosis, construction, insurance and games, and were chosen for their differing characteristics. They are presented in Table 7 (Lomax 2006; Lomax and Vadera 2009, 2011).

Table 7 Main characteristics of datasets used in the comparison

The result of this evaluation showed that to make a decision tree truly cost-sensitive, it is best

to construct it using all the costs involved i.e., cost of the tests and the misclassification costs.

Applying only misclassification costs after the decision tree has been constructed, for

instance as in the wrapper method algorithm MetaCost, to a certain extent does make the decision tree cost-sensitive, but can result in high classification costs when the costs of tests are included in this calculation, owing to the fact that MetaCost, as described, uses a weak learner which does not allow for this type of cost. Although the tree would be constructed

with misclassification costs in mind, it may be that the attributes chosen to construct the tree

have high test costs associated with them.

Likewise, using only misclassification costs in instance weighting, as used in boosting methods, makes the decision tree cost-sensitive in some ways, but does not always have the right effect on some of the classes in the dataset. One of the main observations from the empirical evaluation was that incorporating misclassification costs in this way seems to have the effect of sacrificing overall accuracy rates in order to produce lower classification costs (Lomax and Vadera 2009, 2011). Using only the cost of the tests means that even though the accuracy rate may not be high, the cost of classification remains relatively low; however, no allowance can be made for examples which have high misclassification costs. High accuracy rates do not always mean low classification costs; likewise, an inexpensive decision tree, where the cost to classify an example is not high, is not automatically an accurate decision tree.

It was noted in the conclusions to these experiments that the algorithm using the genetic method of incorporating costs, namely ICET, performed better overall. However, studying the results of this ‘best’ algorithm highlights certain persistent problems in cost-sensitive decision tree learning which still have not been addressed.

The experimental methodology adopted by Lomax (2006) used a randomly generated training/testing split with 70% of the data for training and 30% of the data for testing. Ten training and testing pairs were created and all measurements are averaged over these ten pairs. The measurements used to examine the performance are:

1. Average cost to classify an example (cost)

The average cost of classifying an example is calculated using the test costs and misclassification costs as described by Turney (1995). An attribute (test) has a cost and may belong to a group. The first test is charged at full price and additional tests in the same group are charged at a discount rate; each time a new group is tested it is charged at full price. A sketch of this charging scheme is given after the list.

2. Accuracy rates (accuracy)

The accuracy of an algorithm is measured by assessing how many of the examples in the testing set have been classified correctly, as a percentage of the total number of examples in the testing set.
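To make the charging scheme concrete, the following is a minimal sketch in Java (the language of the WEKA-based implementation used in this thesis) of how the test costs along a path might be charged. The class and method names, and the example tests in main, are illustrative and are not taken from the thesis implementation.

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of Turney-style test charging: the first test in a group is charged
// at full price, further tests in the same group at a discount rate.
class TestCharging {

    // Hypothetical description of a single test (attribute).
    static class Test {
        final String name;
        final double fullCost;
        final double discountCost; // cost once its group has already been charged
        final String group;        // null if the test belongs to no group

        Test(String name, double fullCost, double discountCost, String group) {
            this.name = name;
            this.fullCost = fullCost;
            this.discountCost = discountCost;
            this.group = group;
        }
    }

    // Total test cost charged for one example travelling down a path of tests.
    static double chargeTests(List<Test> path) {
        Set<String> chargedGroups = new HashSet<>();
        double total = 0.0;
        for (Test t : path) {
            if (t.group != null && chargedGroups.contains(t.group)) {
                total += t.discountCost;     // same group already tested: discount rate
            } else {
                total += t.fullCost;         // new test or new group: full price
                if (t.group != null) {
                    chargedGroups.add(t.group);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical example: two blood tests in the same group plus one ungrouped test.
        Test t1 = new Test("blood_glucose", 7.0, 2.0, "blood");
        Test t2 = new Test("blood_calcium", 7.0, 2.0, "blood");
        Test t3 = new Test("xray", 50.0, 50.0, null);
        // 7.0 (full) + 2.0 (discount, same group) + 50.0 (new test) = 59.0
        System.out.println(chargeTests(java.util.Arrays.asList(t1, t2, t3)));
    }
}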

To find out the average cost to classify an example by each algorithm in each dataset, the test costs are used along with the relevant misclassification costs. Turney (1995) found that averaging costs for comparison was not suitable as test costs varied between datasets so he normalized the average cost by dividing by a standard cost using equation (4.1):

standard cost = TC + (1 − f_i) C_ij        (4.1)

where TC is the total cost of all tests, f_i is the number of class i examples divided by the total number of examples and C_ij is the highest cost in the current cost matrix.
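As a small illustration, the normalization could be computed as follows. This is a sketch only, assuming the standard cost has the form reconstructed in equation (4.1); the class and method names are illustrative, not those of the thesis implementation.

// Sketch of Turney's normalization: average cost divided by a standard cost,
// assumed here to be TC + (1 - fi) * Cij as in equation (4.1).
class CostNormalization {

    static double standardCost(double totalTestCost,    // TC: total cost of all tests
                               double classFrequency,   // fi: fraction of class i examples
                               double maxMisclassCost)  // Cij: highest cost in the cost matrix
    {
        return totalTestCost + (1.0 - classFrequency) * maxMisclassCost;
    }

    static double normalizedCost(double averageCost, double totalTestCost,
                                 double classFrequency, double maxMisclassCost) {
        return averageCost / standardCost(totalTestCost, classFrequency, maxMisclassCost);
    }
}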

This methodology has been used for every subsequent experiment described in this thesis.


[Figure 10: two bar charts of ICET's results, "Overall accuracy rate per data set" and "Overall normalized cost per data set", for the Mushroom, Coil, Bridge, Soybean, Heart, House, Hepatitis, MRI, Chess, KDD and TicTacToe datasets; each bar is annotated with the number of examples in the dataset and its main characteristics (e.g. minimal test costs, continuous attributes, many attribute values, high test costs)]

Figure 10 Using ICET’s results to demonstrate weaknesses of cost-sensitive decision tree algorithms


Figure 10 shows the results of the ICET algorithm from the empirical evaluation; the accuracy rate and cost to classify are averaged over the cost matrices used, with particular attention to the structure of the datasets themselves. The number shown against each bar is the number of examples in the whole of the dataset. The datasets are ordered from left to right, best to worst results. It can be seen that for some datasets ICET does not return as high an accuracy rate as on others; likewise, the cost of classification can vary in the same way. It might be expected that the same performance would be returned for every dataset processed and that the datasets would appear in the same order in both graphs, but this is not the case. For example, the coil dataset returns the second best result for cost to classify but only the sixth best accuracy rate.

The best results for this algorithm are found on the datasets which are two-class and balanced. There are four datasets which are unbalanced; the worst accuracy result belongs to the MRI dataset, but it does not return the worst cost to classify of these unbalanced datasets, and some of the unbalanced datasets produce a lower cost to classify than more balanced datasets. Two datasets which should either both present a problem to ICET or, if not, return similar results are those with minimal test costs. The chess dataset presents no problems to ICET, which returns an accuracy rate of over 90% with a cost of 0.08, but the other dataset with similar test costs, tic-tac-toe, only returns an accuracy rate of 70% with a cost of 0.187.

Based on these results and the literature review, the following weaknesses in cost-sensitive decision tree algorithms have been identified:


• Problems arise from an imbalance in the class distribution, with most decision tree learners biasing outcomes towards the dominant class; if this class is not the most costly, this explains the reduction in accuracy rates.
• Multi-class datasets cause problems because the frequency of examples in each class may not be high, making it difficult to distinguish between the classes; the classes themselves may also be similar. Additionally, multi-class datasets have the characteristics of imbalanced datasets.
• Extreme misclassification costs are difficult to handle since they result in bias and can result in no model being built; Lomax (2006) found that MetaCost returned no models when the misclassification cost range was high because the training set had all been labelled as the most costly class, thus meeting the stopping criterion of decision tree algorithms of having all examples in one class.
• Trade-offs between high misclassification costs usually result in the accuracy rate being sacrificed; the higher the misclassification costs and the more unbalanced the class distribution, the lower the accuracy rate.

Even the ‘best’ algorithm (ICET) struggled with these aspects; for those algorithms which did not perform as well, these weaknesses were even more apparent. After studying the results presented in Figure 10 it is concluded that there is an additional factor, beyond the class distribution, test costs and misclassification costs, which contributes to the way the datasets are processed.

The nature of the datasets could account for these discrepancies and inconsistent performances, although to what extent has not been investigated in the area of cost-sensitive induction. It has been investigated in the STATLOG project (King et al. 1995), which concluded that the dataset being investigated contributed to which algorithm produced the better results. In order to determine which algorithm should be used, they developed a way to describe datasets and recommend the best algorithm based on this description. The descriptions include whether there is a skewed distribution, correlation and the types of attributes in the dataset. Figure 10 includes descriptions of the datasets to indicate the main characteristic of each dataset. In addition to the observations in the STATLOG project, there are indications that the structure of the data, that is, the number of attributes, the number of discrete values each attribute has and the number of classes, could influence how easy it is for an algorithm to classify examples. Large test costs, minimal test costs and large misclassification costs also influence how well an algorithm processes the datasets. These observations are taken into account when choosing the datasets and allocating test costs and misclassification costs.

4.2 A new algorithm for cost-sensitive decision tree learning using multi-armed bandits

Use of costs within the decision tree learning process has introduced many interesting problems involving the trade-off required between accuracy and costs. It is clear that, whilst there are existing cost-sensitive decision tree algorithms which can solve two-class balanced problems well, other types of problems cause difficulties. It is suggested that there is still research to be done with regard to cost-sensitive decision tree learning in order to overcome more difficult and challenging problems of the kind identified in Section 4.1.

Evidence suggests that cost-sensitive learning needs to take account of the trade-off between decisions based on accuracy and decisions based on costs. For instance, an algorithm using some sort of statistical measure might decide to split the data on a particular attribute because it is the most desirable one according to an information-theoretic measure; however, this attribute may have an expensive test cost associated with it. An algorithm employing only costs may simply choose the least costly attribute, resulting in a different decision. If the aim is to minimize cost, a greedy algorithm will choose the cheapest attribute at every level. The ideal solution may be co-operation between these two viewpoints. A compromise could be reached whereby an attribute slightly more expensive than the cheapest is chosen so that a smaller tree is developed; this may reduce misclassification costs more than simply using attributes in order of cost, which may produce a large tree that does not reduce the misclassification costs as much. High cost attributes may hide good splits (Esmeir and Markovitch 2008); however, the reverse is also true: low cost attributes may hide bad splits.

Cost-sensitive learning could be thought of as involving two decision-makers, because there is an algorithm and there are costs, which sometimes work well together and sometimes do not. For example, player 1, the ‘accuracy-based player’, chooses strategies concentrating only on accuracy, and player 2, the ‘cost-based player’, chooses strategies which consider costs in some way; each produces a different set of strategies. Conflict between decisions based on accuracy and decisions based on costs may produce good strategies, but if it does not, a trade-off between these strategies must be found, so a technique which deals in trade-offs should be utilized in a framework for cost-sensitive learning. What can be surmised at this stage is that the pay-offs matter when deciding strategies.

Multi-Armed Bandit problems are those which can be solved by trading off exploration (trying out solutions or strategies to find the best one) against exploitation (using the solutions or strategies which are believed to give the better pay-off). The Multi-Armed Bandit Game, as described in Chapter 2, has been used for a variety of problems such as selecting routes for packages and allocating money to different projects where the outcome is not known (Berry 1985; Gittings 1989; Auer et al. 1995; Auer et al. 2001, 2002; Vermorel and Mohri 2005; Dorard et al. 2009; Grünewälder et al. 2010; Dorard and Shawe-Taylor 2010). In these applications the trade-off occurs so that the total cost of sending a set of packets on selected routes is not larger than sending the packets all together on a single route, or between potential research projects which may prove profitable but where this information is only obtained over time. Given the trade-off between accuracy and costs, this thesis explores the use of multi-armed bandits for cost-sensitive decision tree learning.

The multi-armed bandit game is based on a gambler in a casino deciding which slot machine, from a selection of slot machines, is likely to pay out the most. The objective is to maximize the sum of rewards earned through a sequence of lever pulls. The player randomly chooses, from a given number of bandits, a lever to pull and will either get a reward or nothing. The player decides how many times to explore the bandits by pulling levers a certain number of times; some bandits may be chosen more than once, others not at all. The bandit which rewards the most will then be exploited by the player.

The bandit to be exploited is usually decided based on a measure of regret, the one minimizing this value being chosen. Regret is calculated as the difference between the reward sum associated with an optimal strategy and the sum of the collected rewards (Auer et al. 2002; Auer 2002; Shawe-Taylor 2010).
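For a fixed number of lever pulls P this is commonly written as follows (a standard formulation added here for reference; the symbols μ* and r_t are not used elsewhere in this thesis):

regret(P) = P · μ* − Σ_{t=1}^{P} r_t

where μ* is the expected reward of the best bandit and r_t is the reward collected on the t-th pull.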

As it is suggested that cost-sensitive learning involves a trade-off between decisions based on accuracy and decisions based on costs, and as there is most definitely a trade-off between accuracy and lowering costs, the principles of the multi-armed bandit game could be used within decision tree learning to produce an algorithm which may address the problems indicated in Section 4.1.


A key step in decision tree learning is the selection of the attribute upon which to split the data. As detailed in Section 3.1.1.1, typically a statistical measure such as Information Gain is used; there are very few algorithms found in the survey which use only the costs to determine which attribute to use, as detailed in Section 3.1.1.2. In this new approach, the multi-armed bandit algorithm will use both misclassification costs and test costs when selecting the attribute to exploit. The misclassification costs and test costs can be used as rewards, or rather the reduction of them is the reward. In order to gain as much information during the exploration stage as possible, a look-ahead methodology will be employed. A single bandit lever pull represents a test on an attribute. Figure 11 illustrates the idea. Given a set of attributes A, an attribute a is chosen at random; a value v belonging to attribute a is chosen at random, followed by additional attributes and their values until the depth to look ahead is reached.

Figure 11 Illustration of the single pull and look-ahead bandit in the algorithm


The reward, or pay-off, is the reduction in costs; that is, the best ‘bandit’ (root_attribute) is the one which reduces cost the most. As the multi-armed bandit algorithm sums up the rewards for each bandit, the average cost to classify an example is accumulated each time the same bandit is chosen. The average cost to classify is calculated using both misclassification costs and test costs.

Figure 12 illustrates P bandits which have been generated and the cost calculated at the end of each path. In the illustration there are five different bandits chosen: odor, sr, hab, gc and bruises. Each time one of them has been randomly selected, one of its attribute values has been chosen. In this example the look-ahead level has been set to 1, which means that there are 2 attributes in the path. The second attribute, along with one of its values, has been chosen at random and the cost has been calculated. The cost is calculated by assigning a label to the end of the path, as described later in this section. The misclassification costs of those examples which would then be misclassified are summed. For every example at the end of the path, the test cost is added to the total misclassification costs, and the total is then divided by the number of examples at the end of the path.
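The following sketch illustrates how the cost at the end of a generated path could be computed as just described: the misclassification costs of the wrongly labelled examples are summed, the path's test cost is charged once per example, and the total is divided by the number of examples. The types and names are illustrative and are not those of the WEKA implementation.

import java.util.List;

// Sketch: average cost to classify an example at the end of one bandit path.
class PathCost {

    static class Example {
        final int actualClass;
        Example(int actualClass) { this.actualClass = actualClass; }
    }

    // misclassCost[actual][predicted] is the cost matrix;
    // pathTestCost is the (group-discounted) test cost of the attributes in the path.
    // Assumes the leaf is non-empty.
    static double averageCost(List<Example> examplesAtLeaf, int labelAssigned,
                              double[][] misclassCost, double pathTestCost) {
        double totalMisclassification = 0.0;
        for (Example e : examplesAtLeaf) {
            if (e.actualClass != labelAssigned) {
                totalMisclassification += misclassCost[e.actualClass][labelAssigned];
            }
        }
        // every example at the leaf is charged the test cost of the path
        double totalTestCost = pathTestCost * examplesAtLeaf.size();
        return (totalMisclassification + totalTestCost) / examplesAtLeaf.size();
    }
}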


Figure 12 Generate P bandits and calculate cost at the end of each path

For example, the bandit ‘odor’ was chosen four times and the cost values obtained were 12.0, 7.0, 3.0 and 4.0; the mean value calculated is 6.5. Table 8 presents the summed cost values and the mean for each bandit. As the requirement is to reduce costs and to exploit the bandit which reduces costs the most, the optimal strategy will always be the lowest cost obtained after P lever pulls. The process can be simplified by examining the mean of the costs returned by the bandits and then choosing the bandit which obtains the lowest mean value. The selection process by the multi-armed bandit algorithm continues until it is not worthwhile to continue. 10

10 The issue of when to stop is discussed in Section 4.3


bandit        summed up cost     mean of costs
odor              26.0                6.5
sr                27.33              13.665
hab              218.69              72.89
gc                29.1               14.55
bruises          127.48              42.49

Table 8 Example of the multi-armed bandit algorithm choosing an attribute

Figure 13 shows the pseudo-code of the suggested framework. The multi-armed bandit algorithm is applied in the function calculate_mean_cost_for_A. The cost R_i is calculated and accumulated for attribute a_i. After the for-loop is executed, the attribute minimizing cost, by obtaining the lowest mean cost value, is chosen for exploitation. Only those attributes which were selected during exploration are considered for exploitation. The function pull_lever_generate_path represents the lever pull of the multi-armed bandit. To exploit the attribute, the data is split on the chosen attribute.

To create a leaf, most existing algorithms either use the majority class or select the class that minimizes classification costs. However, in this algorithm a hybrid cost-sensitive labelling method, not seen in the literature, has been employed. This labels with the majority class regardless of cost, but considers the cost when labelling those nodes where there is an even or almost even class distribution. This is a combination of the labelling system of an ‘accuracy-based player’ algorithm and that of a ‘cost-based player’ algorithm; as the concept of this algorithm is to find a compromise between the two, a labelling system which is a compromise between them might help.


Tree MA_CSDT(Examples, P, k)
/* Examples is the current dataset, P is the number of lever pulls and k is the depth to look ahead; subTree holds the child nodes, leaf indicates that a leaf node has been created, classOfLeaf will contain the class of the leaf node */

    Initialize Tree subTree            // an array of sub-trees
    Initialize Boolean leaf as false
    Initialize classOfLeaf as null

    If not worthwhile to continue or no data left to explore
        assign classOfLeaf the most appropriate class
        assign leaf as true
        return
    else
        attribute_to_exploit = exploreAttributes(Examples, P, k)
        subset = Split_Data(Examples, attribute_to_exploit)
        For each subset_i
            subTree_i = MA_CSDT(subset_i, P, k)
        End For

    return subTree                     // the sub-trees
END

Attribute exploreAttributes(Examples, P, k)
/* R is an array corresponding to the set of attributes A and will contain the sum of costs obtained for an attribute over the lever pulls; N is an array corresponding to the set of attributes A and will contain the number of times that attribute was chosen as the root of a path; means is an array corresponding to the set of attributes A and will contain the mean of the costs calculated for each attribute */

    Initialize R to zero
    Initialize N to zero
    Initialize means to zero

    For j = 1 to P
    Begin
        choose a_i at random from the set of attributes A
        Path_j = pull_lever_generate_path(a_i, A, k)
        R_i += cost for a_i (Path_j, Examples)
        N_i += 1
    End

    means = calculate_mean_cost_for_A(R, N)

    return attribute(minIndex(means))
END

Figure 13 Multi-Armed Cost-Sensitive Decision Tree Algorithm (MA_CSDT)


As mentioned above, most cost-sensitive algorithms label leaf nodes by selecting the class that minimizes the cost of misclassification, whilst accuracy-based algorithms typically select the majority class. However, when the costs of misclassifying one class are significantly higher than another, there can be a tendency to label many leaf nodes with this class, effectively ignoring other classes. The hybrid labelling system mentioned above has been developed to try to avoid this. As an example, suppose there were 10 examples in the subset at the end of the generated path, with 8 in class 1 and 2 in class 2, with misclassification costs of 1.0 and 10.0 respectively, and with each attribute in the path having a test cost of 5.0. Using the hybrid labelling, the subset would be labelled as class 1. The examples in class 2 would incur 20.0 misclassification cost; additionally there would be 100.0 test costs, so the expected cost would be calculated as (20.0 + 100.0)/10, which is 12.0. If the cost-sensitive labelling were used, the subset would be labelled as class 2, because getting the 8 class 1 examples wrong costs less, and the expected cost would be calculated as (8.0 + 100.0)/10, which is 10.8. Although the aim in cost-sensitive learning is to reduce costs, the aim of this algorithm is to minimize the sacrifice of the accuracy rate. To do this, the majority class should not be ignored completely: ignoring the information in a subset, i.e. the evidence that these examples belong to the majority class regardless of cost, leads to the sacrifice of the accuracy rate. By compromising between labelling systems, it is thought that this will help find a compromise between the two opposing viewpoints.
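A minimal sketch of the hybrid labelling decision and the expected-cost calculation for the worked example above (10 examples, 8 in class 1 at misclassification cost 1.0 and 2 in class 2 at cost 10.0, with a path test cost of 10.0 per example). The threshold used to decide whether a distribution is almost even is an assumed parameter, as its value is not fixed here in the thesis; the class and method names are illustrative.

// Sketch of the hybrid labelling rule: use the majority class unless the class
// distribution at the node is (almost) even, in which case use the class that
// minimizes misclassification cost. The evenness threshold is an assumption.
class HybridLabelling {

    // counts[c] = number of examples of class c at the node;
    // misclassCost[actual][predicted] = cost matrix.
    static int chooseLabel(int[] counts, double[][] misclassCost, double evenThreshold) {
        int majority = 0, total = 0;
        for (int c = 0; c < counts.length; c++) {
            total += counts[c];
            if (counts[c] > counts[majority]) majority = c;
        }
        double majorityProportion = (double) counts[majority] / total;
        if (majorityProportion > evenThreshold) {
            return majority;                        // accuracy-based choice
        }
        return cheapestLabel(counts, misclassCost); // cost-based choice
    }

    // Class that minimizes the total misclassification cost at the node.
    static int cheapestLabel(int[] counts, double[][] misclassCost) {
        int best = 0;
        double bestCost = Double.MAX_VALUE;
        for (int predicted = 0; predicted < counts.length; predicted++) {
            double cost = 0.0;
            for (int actual = 0; actual < counts.length; actual++) {
                if (actual != predicted) cost += counts[actual] * misclassCost[actual][predicted];
            }
            if (cost < bestCost) { bestCost = cost; best = predicted; }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] counts = {8, 2};                          // class 1 (index 0) and class 2 (index 1)
        double[][] mc = {{0.0, 1.0}, {10.0, 0.0}};
        int label = chooseLabel(counts, mc, 0.6);       // 0.8 > 0.6, so the majority class (class 1) is used
        double expected = (2 * 10.0 + 10 * 10.0) / 10;  // (20.0 + 100.0) / 10 = 12.0
        System.out.println("label index=" + label + ", expected cost=" + expected);
    }
}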

The data mining software WEKA has been chosen to help in the development and implementation of the new algorithm. WEKA has been developed by Hall et al. (2009) using Java and has been used by a number of authors both to implement algorithms and in experiments.


4.3 Potential problems with the MA_CSDT algorithm

The algorithm developed in Section 4.2 has a number of open questions that need further consideration, including:

• What is a suitable choice for the number of lever pulls P and the look-ahead value k?
• What is the best way to stabilize fluctuating results obtained when executing the algorithm repeatedly?
• Is it worth carrying on when there are only a few attributes or examples left in the dataset?
• Can the hybrid labelling system allow for the trade-off between accuracy and costs?

To investigate these questions, two artificial datasets were created using datasets from the Machine Learning Repository. Artificial datasets have been used so that the tree build can be anticipated and the outcomes controlled in order to examine development. The mushroom dataset was used to supply examples for 2-class problems, first with a balanced class distribution and then an imbalanced one, and the glass dataset was used to supply examples for multi-class problems. A varying number of attributes chosen from the datasets were assigned test costs which would dictate choice in the tree induction.

The following sections address the above questions. Section 4.3.1 examines the problems of setting the value of P, Section 4.3.2 explores two techniques which could be used to stabilize fluctuating results, Section 4.3.3 reports the problems and solutions with regard to when to stop the process and Section 4.3.4 investigates the potential for using a hybrid labelling system. Finally, Section 4.3.5 draws conclusions from the experiments carried out during the developmental stage.


4.3.1 Problems setting the number of lever pulls and depth to look ahead

If there is not enough exploration, potentially not enough different bandits are chosen, which can result in an expensive attribute being chosen too early. An attribute with a high test cost can be chosen as a root attribute, thus returning a higher cost to classify than anticipated, simply because the dataset space has not been explored enough. To make sure this scenario does not arise there must be a sufficient number of lever pulls carried out: if the number of lever pulls is high enough the algorithm has the opportunity to explore many more potentially good attributes. A way must be found to determine what a ‘sufficient number’ is. To determine what value is best for the look-ahead parameter k, a dataset was processed with increasing values of k. On examining the results obtained, it was found that there was no improvement in the results at each increase of depth. It was decided that further experimentation would be required, as this result may be unique to this dataset. This parameter will therefore be set to the value 1 for all datasets and set to 2 for a selection of datasets, to determine whether a deeper look-ahead improves results or not and whether this is dataset related.

It is evident that just guessing at the ideal value for the number of lever pulls for a given dataset is not good enough. Perhaps there is some way to calculate it? If the maximum number of unique bandits in a dataset could be estimated in some way and used as a guide in setting the value of P, more stable results may be produced.

To illustrate the problem of calculating the potential number of unique bandit paths which could be used in the multi-armed bandit algorithm, suppose there is a dataset with three attributes and each of these attributes has two values. If the depth has been assigned to 1, there will be two attributes in the path, and these attributes must be different. The solution to this problem can be drawn out as presented in Figure 14, where all potential unique bandits for the dataset are illustrated.


Figure 14 To calculate the number of potential unique bandit paths in a dataset

It is now simply a case of counting the leaves, so in this problem the solution is that there are 24 potential unique bandit paths which could appear in a decision tree. This very simple problem quickly becomes complicated; for a typical dataset, it is more problematic to draw out. The mushroom dataset, for example, has 21 attributes, and these attributes have different numbers of values ranging from 2 to 12. As any of the attributes can be a potential root, with any of its values in the path, it is not just a simple calculation, and in order to calculate the potential unique bandit paths it is necessary to enumerate them. To emulate the method illustrated in Figure 14, which is the easiest way to calculate the value required, a computer program which generates in turn every potential unique bandit path down to the required look-ahead depth has been used; each time the end of a path is reached, a count is incremented, and the maximum value is then returned. This has been performed on the entire dataset, but it could potentially be performed on a training set as it does not require knowledge of the number of examples, simply the attributes and their values.
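A sketch of this enumeration: every ordered sequence of k+1 distinct attributes is counted, with each attribute in the sequence able to take any of its values. For the three-attribute, two-value example above this returns 24; the code is illustrative and is not the program used in the thesis.

// Sketch: count the potential unique bandit paths for a look-ahead depth k,
// i.e. ordered sequences of k+1 distinct attributes, each taking any of its values.
class BanditPathCounter {

    // numValues[a] = number of discrete values of attribute a
    static long countPaths(int[] numValues, int k) {
        return count(numValues, new boolean[numValues.length], k + 1);
    }

    private static long count(int[] numValues, boolean[] used, int attributesLeft) {
        if (attributesLeft == 0) return 1;           // end of a path reached
        long total = 0;
        for (int a = 0; a < numValues.length; a++) {
            if (used[a]) continue;                   // attributes in a path must be different
            used[a] = true;
            total += numValues[a] * count(numValues, used, attributesLeft - 1);
            used[a] = false;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three attributes with two values each, depth k = 1: 24 unique paths (Figure 14).
        System.out.println(countPaths(new int[]{2, 2, 2}, 1));
    }
}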

The potential unique bandit path value for the mushroom dataset has been calculated as 12,624. Experiments carried out with the mushroom dataset then used a more realistic value of P of 10,000, which produced more stable results. Thus the value of P will be unique to each dataset and will be assigned using the potential unique bandit path value for that dataset as a guideline.

4.3.2 How to stabilize fluctuating results obtained during the developmental stage

As experiments on the artificial datasets were repeated, the results fluctuated: sometimes the cost to classify obtained would be small, but on repeating the experiment the cost to classify obtained would be higher. So how could the results be stabilized? When a training and testing pair was subjected to repeated processing, the results obtained showed that there were differing trees, resulting in high costs as well as low costs.

Figure 15 illustrates the results obtained when running the algorithm on the same dataset with the same costs ten times for each value of lever pulls used. This graph is representative of all variations of the artificial datasets used whilst developing the algorithm. In these experiments the number of lever pulls had been set to values between 50 and 300. It was concluded that this value was not high enough to guarantee sufficient exploration in that the number of different bandits chosen was too limited.

There are two potential ways to produce stable results. It may be that, as described above, simply finding the ideal value of P will correct this problem. However, as noted in the literature study, constructing multiple trees has been a way to improve on the results obtained by using single trees; therefore inducing multiple trees and then choosing the ‘best’ one might be a better solution. Game Theory involves a collection of models, where the one meeting the strategy is used. The main strategy of this algorithm is to lower costs whilst maintaining the accuracy produced by the ‘accuracy-based player’ algorithm. Therefore a number of trees can be generated using the algorithm described above: if one tree is generated the single version of the MA_CSDT algorithm can be used; if more than one tree is generated a multiple-model version of MA_CSDT can be used.

[Figure 15: plot for the artificial 2-class balanced dataset (8 attributes plus noise) of the average total cost to classify over ten runs, for lever pull values of 50, 100, 150, 200, 250 and 300; an annotation marks the value of the average total test cost over the 10 runs and notes that J48 returns an average total cost to classify of 2791.3936 (pruned) and 2888.9121 (un-pruned)]

Figure 15 Results illustrating the fluctuating values obtained using artificial datasets

What should be the criteria for choosing the ‘best’ tree? Choosing the tree with the lowest cost seems the obvious choice; however, this may not necessarily be an accurate tree, as observed in previous comparisons. Choosing the tree with the highest accuracy is the other alternative, but again this may not necessarily be a low cost tree if the test costs are higher than the misclassification costs. The relationship between test costs and misclassification costs can be explored if both strategies are implemented and compared. Therefore the strategy can be set to choose the model which obtains the lowest cost, or set to choose the model which obtains the highest accuracy. Given the strategy and the number of trees to be induced, the trees are evaluated on the training set to find the accuracy and the average total cost of each tree. Then, based on the strategy chosen, whichever tree meets the strategy is returned to be evaluated on the test set. This idea is illustrated in Figure 16. The number of trees to induce has been set to a default of 10, which is the default number chosen by MetaCost; however, this can be changed as desired.

[Figure 16: plot of accuracy (y-axis, 0 to 100) against cost (x-axis, 0 to 700) for the candidate trees, marking the tree chosen by the accuracy-based strategy and the tree chosen by the cost-based strategy]

Figure 16 Desired tree chosen by a given strategy

For simple datasets it is possible that any strategy would result in the same model but for difficult datasets it would most likely result in differing ones and this is another way in which improved results can be obtained.

4.3.3 To examine when to stop the decision tree build process

Is it worth carrying on when there are few attributes left in the dataset, all of which may be expensive? At some point, particularly in the latter stages of decision tree build, the number of attributes is sufficiently reduced that the random choice is limited to attributes all of which may have high test costs. A pre-pruning technique is indicated given the nature of the multi-armed bandit algorithm: exploring and gathering information in order to find the correct bandit to exploit. Stopping when it is no longer worthwhile and there is no advantage to be gained by continuing is a logical step in the development of the algorithm. It also became clear that pre-pruning techniques would be necessary when exploring the relationship between test costs and misclassification costs, and that they would be useful in order to find a trade-off between the two types of costs. With regard to the expenditure of test costs, a solution would be to implement a test to determine whether the attribute chosen should be used. To check whether the tree building process is worth continuing, the same test can be used at the start of the process. This test determines whether the reduction of misclassification costs is less than the test cost expenditure. 11 To investigate which part of the tree build process benefits more from this kind of test, the algorithm can employ these tests in combination.

11 Ling et al. (2004) also use a similar approach, except that they only perform the test at the beginning of the process, testing all attributes to determine whether to create a leaf node or continue with the tree build.

Additionally, is it worth carrying on with processing if only a small proportion of examples is left in the dataset, and how much information can be obtained in this situation? In the decision tree literature there have been various studies regarding when to stop the procedure. These include stopping when the number of examples falls below a given number and then assigning the majority class as the label (Weiss and Kulikowski 1991), or measuring a split in some way and stopping the process if this falls below a specified threshold (Han and Kamber 2001). These measures aim to avoid the decision tree containing nodes with few examples, which results in insignificant statistical measures, with the aim of improving the accuracy of the decision tree. How should these values be determined? For larger datasets ‘a few examples’ could number in the hundreds, but for a smaller dataset this value may only be five or ten. Using proportional measurements resolves the issue of what ‘a few’ is. The proportion of examples at a node in relation to the size of the original training set can be calculated and, in anticipation that it would not be worth processing further, the process stops when this proportion falls below a desired level; the default value has been set at 5% or less. This value was chosen because it is low enough for the information to be exhausted from the dataset but not so low that it would have no significant effect; it is necessary to be able to observe any effects. Also, the proportion of the majority class at a node can be calculated and, given a desired proportion, used to determine whether to make a leaf node; the default value has been set at 90%. This value has been chosen since it is considered to be evidence enough that the class with this proportion is most likely to be the class for that particular node.
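The stopping and pre-pruning checks described above can be summarized in a short sketch. The default thresholds (5% of the original training set, 90% majority class) are those stated above, and the cost-reduction check follows the description that splitting is only worthwhile if the expected reduction in misclassification cost covers the test cost expenditure; the class and method names are illustrative.

// Sketch of the 'is it worthwhile to continue?' checks used before exploiting an attribute.
class StoppingChecks {

    static final double MIN_EXAMPLE_PROPORTION = 0.05;  // default: stop at 5% of the training set
    static final double LEAF_CLASS_PROPORTION  = 0.90;  // default: make a leaf at 90% majority

    // Proportional stopping: too few examples remain relative to the original training set.
    static boolean tooFewExamples(int examplesAtNode, int originalTrainingSize) {
        return (double) examplesAtNode / originalTrainingSize <= MIN_EXAMPLE_PROPORTION;
    }

    // Proportional stopping: the majority class is dominant enough to label a leaf.
    static boolean dominantClass(int majorityCount, int examplesAtNode) {
        return (double) majorityCount / examplesAtNode >= LEAF_CLASS_PROPORTION;
    }

    // Pre-pruning test: splitting is only worthwhile if the reduction in
    // misclassification cost exceeds the test cost that would be spent.
    static boolean worthSplitting(double misclassCostBeforeSplit,
                                  double misclassCostAfterSplit,
                                  double testCostOfSplit) {
        return (misclassCostBeforeSplit - misclassCostAfterSplit) > testCostOfSplit;
    }
}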

As the attribute chosen to be exploited is selected based on cost, it is necessary to make sure that this attribute has more than one value, since using only costs will not give an indication of an attribute having only one value in the way a statistical measure would; hence a parameter is introduced to control what happens if this situation arises. If the chosen attribute has only one value, there are three options: (i) stop the process, (ii) ignore this and continue with the chosen attribute, and (iii) choose the next best attribute, i.e. the best one with more than one value. It is clear that ignoring the situation and continuing to use a redundant attribute is not cost-effective, but it is the intention to examine all possibilities to demonstrate this intuition.

4.3.4 Can the hybrid labelling system allow for the trade-off between accuracy and costs?

Does the hybrid labelling system allow for the trade-off between accuracy and costs as it is thought it will? In past comparisons (Lomax and Vadera 2009, 2011), it has been noted that in order to achieve low costs, the accuracy rate is often sacrificed. The aim is to include information obtained from the majority class as well as the minority classes in order to minimize the sacrifice of the accuracy rate but still lower costs. In order to test this hypothesis, and to determine whether the approach is successful, experiments will be carried out using both the hybrid labelling system and a cost-sensitive labelling system so that what happens to the accuracy rate can be examined.

4.3.5 Conclusions drawn from the developmental stage

Following the experiments carried out whilst developing the algorithm, which aided in identifying problems and possible solutions, small scale trials were carried out in order to gauge the progress of the MA_CSDT algorithm. It became clear that ‘one size does not fit all’, in that it could be difficult to process a dataset with differing cost matrices, but having parameters which can change behaviour and adjust to the differing characteristics of datasets can help with the process. For example, some parameter settings may produce good results for a particular cost matrix but not for a different cost matrix; other parameter settings can then be examined so that, if required, different parameter settings are used for each cost matrix. Both the hybrid labelling system and a cost-sensitive labelling system, where the class which minimizes cost is allocated, were used in these small trials, with the hybrid labelling system producing more high accuracy results than the cost-sensitive system.

These trials in the developmental stages used both versions of the algorithm, both of which could produce good results. Setting the value of P high enough helps, along with choosing from multiple models. But which is better over different datasets? It may be the case that some datasets are easier to process and do not require multiple models, whereas others require multiple models with a careful decision about the strategy required. Can guidelines be produced so that parameter settings can be identified, including whether multiple models are needed or not, and which is the best strategy given the costs for a particular dataset? These questions are explored in the next chapter.

4.4 Summary of the development of the MA_CSDT algorithm

In the first instance, experiments from previous studies were examined in order to identify possible weaknesses in cost-sensitive decision tree learning. Analysis showed that to make a decision tree truly cost-sensitive it is better to construct it using all costs involved i.e. misclassification costs and test costs. Studying the results of the ICET algorithm revealed persistent problems in cost-sensitive learning which have not been addressed. These are (i) problems arising from class imbalance, (ii) multi-class problems, (iii) extreme misclassification costs and (iv) trade-offs between high misclassification costs resulting in the accuracy rate being sacrificed.

It is suggested that cost-sensitive learning involves a trade-off between decisions based on accuracy and decisions based on costs, and between accuracy and lowering costs, so the multi-armed bandit game, which explores possible strategies before exploiting the best one, can be used with decision tree learning in order to produce an algorithm which may address the above issues.

To represent the lever pulls of the multi-armed bandit, a path containing two or more attributes chosen at random will be generated and the average cost to classify an example travelling down that path will be calculated. A hybrid labelling system has been developed which will take into account the information obtained from the class distribution at the node before deciding which class to choose.


The algorithm has a number of parameters, including the number of lever pulls to generate bandits, P, and the depth to look ahead, k, which could affect performance on datasets in that these would increase or decrease exploration. In addition, based on the literature and past studies, additional parameters were introduced to explore (i) when the learning process should stop, (ii) whether multiple models can be generated in order to obtain better results, and (iii) whether to choose a model returning the lowest cost (cost-based strategy) or a model returning the highest accuracy (accuracy-based strategy). It is thought that by using different combinations of these parameter settings, the behaviour of the algorithm can be changed to suit the dataset being processed, the test costs assigned to it and the misclassification costs.

Two small scale trials were conducted to gauge progress in the development of the algorithm, using the full versions of the datasets which had been used to create the artificial datasets. The conclusion of these trials is that ‘one size does not fit all’, in that choosing the most appropriate parameter settings for an individual cost matrix can produce good results and improve upon the overall results obtained when using the same parameter settings for all cost matrices for a dataset. It was determined that the hybrid labelling system did increase accuracy rates and should therefore be investigated further to see if it helps with the trade-off between accuracy and costs.

In the following chapter, an extensive investigation has been carried out using 15 different datasets, which have been allocated varying values of test costs and misclassification costs in order to further address these issues.


CHAPTER 5: INVESTIGATING PARAMETER SETTINGS FOR MA_CSDT

An investigation has been carried out using 15 real-world datasets, which have a variety of different values of test costs and misclassification costs. Experiments have been devised in order to provide statistics with which to examine performance, using both versions of the algorithm: the single-model version and the multiple-model version, the latter with a cost-based strategy and an accuracy-based strategy, with all possible parameter settings and a variety of differing lever pull values for each dataset. Four of the datasets, which have a relatively small potential unique bandit path value, were chosen to be processed with a look-ahead depth of 2.

The main aim of the investigation is to help decide:

• Which parameters allow the algorithm to continue only when it is worthwhile to do so, and what effect this has on cost and accuracy
• How many lever pulls should be used for each dataset, which version of the algorithm is better (if any), and which strategy for the multiple-model version works best for each dataset
• How to take advantage of these different parameter settings and strategies in order to achieve the aim of the algorithm
• Whether guidelines can be devised for a dataset with regard to the combination of these parameters

The datasets have been chosen to cover a range of characteristics, i.e. combinations of different numbers of attributes and values, numbers of classes and different class distributions. Some of these datasets have been used with test costs in previous studies, so the same test costs have been used, making sure they have a good contrast of costs. The datasets are all available from the Machine Learning Repository, although some dataset files were obtained from previous studies. Full details of these datasets are in Appendix A2, along with test costs, group information and discounted costs if any. The same experimental methodology detailed in Section 4.1 has been carried out on the 15 datasets.

A range of misclassification costs was used in order to examine the trade-off between the two types of cost. For example, misclassification costs were assigned to the classes in a dataset to be either higher than the test costs, lower than the test costs, or a mixture of high and low values in relation to the test costs. Details of all cost matrices used in the experiments are listed in Appendix A3; an identification number has been allocated to each one so that they can be easily identified in tables and graphs.

Potential unique bandit paths have been calculated for each dataset and used as a guideline to allocate values of P, so that there are 5 values for each dataset. These values are (1) lower than the potential unique bandit path value, (2) rounded down from the potential unique bandit path value, (3) the potential unique bandit path value, (4) higher than the potential unique bandit path value and (5) a much lower value than any of the previous values. The identifiers 6 and 7 were included specially for soybean and used in one smaller experiment; this was done as the soybean dataset has a longer processing time than all other datasets. The values for each dataset are presented in Table 9.


Identification of the value of P for each dataset

depth (k) = 1
dataset        1        2        3        4        5        6       7
annealing      7000     8000     8312     10000    1000
breast         1500     2000     2152     5000     500
car            200      300      366      500      50
diabetes       150      180      184      300      30
flare          700      800      898      2000     100
glass          200      300      338      1000     50
heart          500      600      658      1000     100
hepatitis      800      1000     1082     1500     100
iris           75       100      108      500      30
krk            1000     1300     1312     1500     100
mushroom       10000    12000    12624    15000    1000
nursery        500      600      632      1000     100
soybean        1000     3000     9104     5000     50       8000    9000
tic-tac-toe    500      600      648      1000     100
wine           1000     1200     1256     1500     100

depth (k) = 2
dataset        1        2        3        4
car            3000     4000     5082     6000
diabetes       500      1000     1776     3000
glass          3000     4000     4692     6000
iris           500      600      648      1000

Table 9 Values of P lever pulls for each dataset

Experiments were designed so that every possible combination of parameter settings was executed for all training and testing pairs and for each of the cost matrices described. As a result of the experiments executing all possible parameter settings, there is a large amount of data to be processed: after averaging over the ten training and testing pairs for each dataset, there are 158,664 individual results to examine. To make it easier to find patterns and useful information, all the results have been compiled into a dataset and used as input to the statistical software package SPSS so that analysis can be performed on it. The table in Appendix Section A4 presents all the information contained in this results dataset and Appendix D9 is the comma separated value file containing all examples.

Each example in the results dataset has been classified by the value of the cost and accuracy returned by that experiment. If the cost is less than 0.3, the category allocated is ‘good’; less than 0.5 is ‘medium’; and ‘poor’ otherwise. If the accuracy is greater than 68.0, the category allocated is ‘good’; between 45.0 and 68.0 is ‘medium’; and ‘poor’ otherwise. The values of these two attributes determine the class, based on accuracy/cost: (1) good/good, (2) good/medium, (3) good/poor, (4) medium/good, (5) medium/medium, (6) medium/poor, (7) poor/good, (8) poor/medium, (9) poor/poor. As combinations of accuracy and cost would result in a large number of classes, it was decided that three categories for each measurement would be more advisable than four, as the resulting combination would mean too many classes. Splitting each value into equal thirds was rejected as this would produce a ‘good’ accuracy of 66% or above; this was not considered high enough, so the boundary was raised to 68% or above. A medium result of 33% and above was also considered too low to be a potentially reasonable result, so this was raised to 45% and above. Similar consideration was given to the boundaries on the cost measurement; thus less than 0.3 can be considered ‘good’ and higher than 0.5 is considered ‘poor’. Analysis performed on the dataset included cross-tabulation, frequency information and mean calculations for cost and accuracy on attributes and their values in relation to the class allocated to examples.
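As a sketch, the class allocation just described can be expressed directly from the stated thresholds (cost < 0.3 good, < 0.5 medium, otherwise poor; accuracy > 68 good, 45 to 68 medium, otherwise poor); the class and method names are illustrative.

// Sketch: allocate each result to one of the nine accuracy/cost classes.
class ResultCategories {

    static String costCategory(double cost) {
        if (cost < 0.3) return "good";
        if (cost < 0.5) return "medium";
        return "poor";
    }

    static String accuracyCategory(double accuracy) {
        if (accuracy > 68.0) return "good";
        if (accuracy >= 45.0) return "medium";
        return "poor";
    }

    // e.g. accuracy 70, cost 0.25 -> "good/good" (class 1)
    static String resultClass(double accuracy, double cost) {
        return accuracyCategory(accuracy) + "/" + costCategory(cost);
    }
}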

The details of the experiments with their parameter settings are presented in Table 10. The table gives information regarding the strategy used: either no strategy, because a single model is induced; the cost-based strategy, where from multiple models the one returning the lowest cost is chosen; or the accuracy-based strategy, where the one returning the highest accuracy is chosen. The number of models, the depth and the labelling system employed in each experiment are also shown in the table. Additionally, the frequencies of best and worst results as percentages, which are calculated using the previously mentioned class allocations, and the frequency with which a model was produced by each experiment are given in the table. For example, the information obtained from row 1 indicates that Experiment 0 has no strategy as only 1 model is induced; the depth to look ahead is set to 1 and the hybrid labelling system is used. For the annealing dataset, for example, four different values of P were assigned. All the remaining parameter settings are used in combination, which results in 96 batches of settings to be executed on the ten training and testing pairs for each cost matrix and dataset.

expId   strategy   no of models   depth (k)   label system   best all examples (trees)   worst all examples (trees)   frequency tree produced   batches used per dataset
0       none       1              1           HYBRID         37.8 (38.4)                 1.51 (2.4)                   50.7                      96
0l      none       1              1           COST           29 (36.9)                   1.36 (2.8)                   38.7                      96
0T5     none       1              1           HYBRID         34.8 (34.4)                 1.58 (2.4)                   51.8                      24
0lT5    none       1              1           COST           26.9 (33.3)                 1.34 (2.7)                   40                        24
1       accuracy   10             1           HYBRID         44.5 (52.4)                 1.51 (2.4)                   52                        96
2       cost       10             1           HYBRID         39.4 (43.1)                 1.49 (2.5)                   48.3                      96
3       accuracy   10             1           COST           34.8 (47.9)                 1.45 (2.8)                   41.7                      96
4       cost       10             1           COST           29.9 (37.6)                 1.42 (3.1)                   37                        96
5       accuracy   10             2           HYBRID         32 (26.8)                   0 (0)                        57.4                      96
6       cost       10             2           HYBRID         28.2 (20.2)                 0 (0)                        57.4                      96
7       accuracy   10             2           COST           19.1 (18.7)                 0 (0)                        43                        96
8       cost       10             2           COST           18 (16.2)                   0.019 (0.044)                43                        96
9       accuracy   20             1           HYBRID         22.2 (37.5)                 0.27 (0.46)                  59.1                      24
10      accuracy   10             1           HYBRID         0 (0)                       7.87 (9.6)                   45                        96
13      accuracy   10             1           COST           77.5 (77.5)                 0 (0)                        100                       8
S1T5    accuracy   10             1           HYBRID         39.9 (44.4)                 1.52 (2.43)                  52.8                      24
S1lT5   accuracy   10             1           COST           30 (38.7)                   1.36 (2.6)                   43.4                      24

Every combination of parameters, as described in Section 4.3, is executed on each of the datasets and their cost matrices.

Table 10 Details of each experiment, settings and frequency (%) of best and worst results from the analysis file

Table 11 presents a summary of the main information obtained by running analysis on the results dataset. For each dataset, the number of attributes is recorded, along with how many values the attribute with the highest number of values has. The potential unique bandit paths have been calculated to a depth of 1. A ratio of the highest misclassification cost divided by the highest test cost has been calculated in the results dataset for each dataset and each cost matrix, and the mean of this ratio for each dataset is recorded. These values help to describe the type of dataset. Also included are the identifiers of the best experiments and the best parameter settings producing the majority of best results, in the order of (i) the option chosen if the ‘best’ attribute has only one value, (ii) an indicator of how many lever pulls were carried out, (iii) the pre-pruning combination used and (iv) whether proportional stopping was used: ‘t’ if used, ‘f’ otherwise. If an ‘x’ is present in the group of parameter settings, this indicates that there is no overall parameter setting producing the best results.

For example, for the annealing dataset, examining the best results obtained using the cost-based strategy, the experiment which produced the majority of the lowest costs was Experiment 4, which uses the cost-based strategy in order to choose a model, a look-ahead depth of 1 and the cost-based labelling system. The parameter settings which produced the majority of the lowest costs were: the value 0, indicating what to do when the attribute has only one value, in this case stopping the process if this occurs; PId 4, which indicates that the lever pull value used was higher than the potential unique bandit path value, in this case 10,000 as shown in Table 9; the pre-pruning combination ‘d’, which does not use any checks to see if it is worthwhile to continue; and ‘t’, which indicates that the proportion of examples at a node is calculated and the process stops when this falls below the default level. For the accuracy-based strategy, there was no overall experiment producing the majority of best results; there would have been no checks for whether the ‘best’ attribute had only one value and no overall clear majority for a value of P. However, the pre-pruning combination ‘d’ was again most often the better one, but in this case there were no tests to carry out proportional stopping.


(values listed in dataset order: annealing, breast, car, diabetes, flare, glass, heart, hepatitis)
area: physical, medical, social, medical, physical, forensic, medical, medical
no of attributes: 24, 9, 6, 6, 10, 7, 11, 16
highest no. of values for an attribute: 10, 13, 4, 4, 6, 4, 4, 3
potPaths: 8312, 2152, 366, 184, 898, 338, 658, 1082
class distribution: unbalanced unbalanced unbalanced 65/35 unbalanced part even unbalanced
multiclass: TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE
group cost: TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE
mean ratio mc/tc: 2.55, 23.88, 2.77, 97.51, 1.24, 6.26, 21.58, 267.63
max(P): 10000, 5000.00, 6000.00, 3000.00, 2000.00, 6000.00, 1000.00, 1500.00
min(P): 1000, 500.00, 50.00, 30.00, 100.00, 50.00, 100.00, 100.00
dn_cost mean: 0.027, 0.08, 0.09, 0.15, 0.02, 0.14, 0.00, 0.14
cost (examples) mean: 0.024, 0.14, 0.16, 0.12, 0.04, 0.19, 0.04, 0.10
cost (trees) mean: 0.0278, 0.38, 0.33, 0.27, 0.11, 0.25, 0.15, 0.23
dn_accuracy mean: 73.99, 61.07, 62.05, 58.05, 79.60, 32.37, 50.25, 63.63
accuracy (examples) mean: 82.07, 60.17, 63.21, 60.35, 82.81, 44.99, 55.78, 69.78
accuracy (trees) mean: 86.74, 63.16, 61.40, 66.57, 85.39, 51.61, 72.23, 79.70
% of best results (examples): 97.7, 48.90, 68.50, 11.00, 89.10, 2.38, 19.40, 70.30
% of best results (trees): 96.5, 4.22, 21.50, 25.90, 90.00, 3.33, 77.60, 71.20
% of worst results (examples): 0, 0.00, 0.00, 0.02, 0.00, 0.00, 0.00, 0.00
% of worst results (trees): 0, 0.00, 0.00, 0.04, 0.00, 0.00, 0.00, 0.00
% whether a tree was grown or not: 57, 34.90, 28.10, 42.50, 25.00, 71.40, 25.00, 42.30
best experiment (strategy0): 4, 4, 4, x, x, x, 4, 4
best experiment (strategy1): x, x, x, 5, x, 1, 1, 1
best parameter settings (strategy0): 0,4,d,t; 0,1,d,t; 0,4,d,t; 0,1,d,t; 0,1,d,t; 2,4,b,t; 0,4,d,t; 0,4,d,t
best parameter settings (strategy1): 1,x,d,f; 0,x,d,t; 1,x,d,f; 2,x,d,t; 0,2,d,t; 0,1,d,f; 2,x,d,f; x,x,d,t
average time in seconds (single): 3.47, 0.61, 0.15, 0.06, 0.21, 0.09, 0.21, 0.31
average time in seconds (ensemble): 31.56, 4.69, 1.33, 0.35, 1.76, 0.64, 1.89, 2.60

(values listed in dataset order: iris, krk, mushroom, nursery, soybean, tictactoe, wine; where an eighth value is given it is the averages / overall highest, lowest etc. column)
area: botanical, game, botanical, social, botanical, game, physical
no of attributes: 4, 6, 21, 8, 35, 9, 13
highest no. of values for an attribute: 3, 8, 10, 5, 7, 3, 4
potPaths: 108, 1312, 12624, 632, 9104, 648, 1256
class distribution: even almost even even almost 65/35 even
multiclass: TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE
group costs: FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, TRUE
mean ratio mc/tc: 1.22, 9.51, 317.34, 4.73, 70.00, 2221.40, 1.21, 195.86
max(P): 1000.00, 13000.00, 15000.00, 1000.00, 9104.00, 1000.00, 1500.00, 15000.00
min(P): 30.00, 100.00, 1000.00, 100.00, 50.00, 100.00, 100.00, 30.00
dn_cost mean: 0.11, 0.36, 0.00, 0.19, 0.32, 0.20, 0.10, 0.14
cost (examples) mean: 0.15, 0.39, 0.01, 0.19, 0.14, 0.11, 0.08, 0.14
cost (trees) mean: 0.29, 0.43, 0.02, 0.26, 0.14, 0.24, 0.15, 0.23
dn_accuracy mean: 33.33, 15.22, 50.11, 38.36, 16.31, 57.44, 34.76, 47.46
accuracy (examples) mean: 60.42, 16.60, 62.33, 49.90, 65.87, 62.19, 49.99, 56.85
accuracy (trees) mean: 86.29, 18.76, 99.01, 61.75, 65.88, 69.73, 82.85, 64.86
% of best results (examples): 29.50, 0.00, 25.00, 9.90, 55.30, 18.60, 29.10, 33.70
% of best results (trees): 57.80, 0.00, 100.00, 19.50, 55.30, 43.10, 87.70, 38.90
% of worst results (examples): 0.00, 14.90, 0.00, 0.46, 0.00, 0.00, 0.00, 1.30
% of worst results (trees): 0.00, 28.50, 0.00, 0.92, 0.00, 0.00, 0.00, 2.30
% whether a tree was grown or not: 51.10, 41.70, 25.00, 50.80, 99.90, 43.10, 33.10, 46.00
best experiment (strategy0): s1lT5, 4, 0, 4, 1, 4, 4
best experiment (strategy1): x, 1, s1T5, x, 1, x, x
best parameter settings (strategy0): 0,5,a,t; 0,1,b,t; 2,1,d,t; x,4,d,t; x,3,c,f; 0,2,d,t; 0,4,d,t
best parameter settings (strategy1): 2,x,d,f; 1,5,d,t; 2,5,d,f; 0,5,d,f; 1,3,c,f; 2,4,d,f; 2,5,d,f
average time in seconds (single): 0.03, 3.39, 5.21, 1.51, 6.71, 0.38, 0.18, 1.50
average time in seconds (ensemble): 0.10, 40.71, 51.06, 13.46, 64.15, 2.90, 1.59, 14.59

Total no of experiments: 158664
Total no of experiments producing trees: 73024
Range of attribute numbers: 4 - 35
Range of maximum number of attribute values: 3 - 13
Range of potential paths: 108 - 12624

Table 11 Summary of dataset information including mean values obtained from the analysis file


The annealing dataset has the highest frequency of best results and the krk dataset has the lowest frequency of best results when considering all examples. When considering only those examples when a tree is grown, the krk still remains the one with the lowest frequency of best results but the mushroom dataset now has the highest frequency of best results at 99%. The dataset with the most trees grown is soybean and with the least grown are flare, heart and mushroom datasets.

The following sections describe the findings of the investigation. Section 5.1 examines the parameters which ultimately control tree growth by permitting the process to continue only when it is worthwhile to do so, and examines what happens to the cost and accuracy rate when each of the parameters is in control. Section 5.2 investigates the parameter settings which indicate how many lever pulls are being carried out, which version of the algorithm is better for certain datasets and which strategy may be better, given the test costs and the misclassification costs assigned to each dataset. Section 5.3 investigates how choosing the best parameter settings and strategy for a dataset and a cost matrix individually can improve upon the results obtained when using the same parameter settings for every cost matrix allocated to a dataset. Finally, Section 5.4 attempts to find guidelines which will determine the best combination of parameter settings for a dataset and its allocated costs.

5.1 Parameters allowing continuation of process when it is worthwhile

The table in Appendix Section A5 gathers statistics for each of the parameter settings to determine which ones produce the highest frequency of best and worst results. Out of all the parameter settings, the pre-pruning options which carry out tests to see whether it is worthwhile carrying on with the process predictably have the greatest control when it comes to decreasing costs and increasing the accuracy rates. However, the control for stopping early when the information in the dataset is reduced, i.e. proportional stopping based on both the number of examples and the class proportion, does not have as much impact as the other pre-pruning option. This is interesting, as it was expected that proportional stopping would have more impact than it does.

Lower costs are produced more often by the proportional stopping option than by allowing full tree growth, but this includes the effect of the pre-pruning combinations. The four pre-pruning combinations are: a: true, true; b: true, false; c: false, true; and d: false, false; where the first Boolean value dictates whether the method described by Ling et al. (2004) is carried out and the second Boolean value dictates whether the attribute chosen is tested to see if it is worthwhile or not. The combination ‘c’ produces the majority of the better low-cost results, followed by ‘b’. Each of these carries out only one test: the combination ‘c’ does not carry out Ling et al. (2004)’s method, only the best attribute is tested, whilst the other combination ‘b’ is the reverse. For accuracy, no proportional stopping along with the combination ‘d’ produces the majority of the better results. It is thought that trees are quite often not built when the pre-pruning combinations ‘a’, ‘b’ and ‘c’ all return the same low cost and low accuracy. Evidence suggests that the pre-pruning combination options control tree build rather than the proportional stopping option, as there are many results which are the same regardless of whether the proportional stopping option has been used. The pre-pruning combination option ‘d’ allows the proportional stopping options to help raise accuracy and/or lower costs. For example, the accuracy is raised when the pre-pruning combination ‘d’ is used; with the addition of proportional stopping, the accuracy is slightly reduced but with a lower cost. By examining the table in Appendix Section A5, it is clear that the pre-pruning combination has the most effect on whether a tree is grown or not. The combinations a, b or c check whether it is worthwhile or not; the pre-pruning combination option ‘a’ produces fewer trees than the other combinations, which is to be expected as it performs more checks. A minimal sketch of how these combinations could be encoded is given below.
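As an illustration only, the following minimal Python sketch (not the thesis implementation) shows one way the four pre-pruning combinations could be encoded as a pair of Boolean flags; the two check functions are hypothetical stand-ins for the tests described above.

# Hypothetical encoding of the four pre-pruning combinations described above.
PRE_PRUNE_COMBOS = {
    "a": (True, True),    # carry out the Ling et al. (2004) test AND test the chosen attribute
    "b": (True, False),   # carry out only the Ling et al. (2004) test
    "c": (False, True),   # test only whether the best attribute is worthwhile
    "d": (False, False),  # no pre-pruning tests: the algorithm is forced to grow the tree
}

def should_continue(combo, ling_test_passes, best_attribute_worthwhile):
    """Return True if tree growth should continue under the given combination.
    `ling_test_passes` and `best_attribute_worthwhile` stand in for the two checks
    described in the text; their definitions belong to the MA_CSDT algorithm itself."""
    use_ling, use_attribute = PRE_PRUNE_COMBOS[combo]
    if use_ling and not ling_test_passes:
        return False
    if use_attribute and not best_attribute_worthwhile:
        return False
    return True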

The pre-pruning combination option ‘d’ is where the algorithm is forced to grow a tree. For some datasets, namely breast, diabetes and hepatitis, employing the proportional stopping option increases the accuracy rates. When these datasets are allowed to be processed fully, the extra information used does not help to improve upon the accuracy rates, unlike the other datasets.

As mentioned earlier, when the algorithm considers it not worthwhile to continue, this can sometimes result in no model being built. In order to investigate how building a model can improve results, a series of graphs has been compiled which (i) plot the ‘do nothing’ cost from all examples and from those examples producing a model, (ii) plot the ‘do nothing’ accuracy from all examples and from those examples producing a model, (iii) plot the mean cost obtained from all examples and from those examples producing a model, and (iv) plot the mean accuracy obtained from all examples and from those examples producing a model.

Figure 17 presents the graphs for costs, whilst the accuracy graphs are in Appendix D1. A black solid line indicates the ‘do nothing’ mean cost for all examples, a black dotted line indicates the ‘do nothing’ mean cost for only those examples producing a tree, a green solid line indicates the mean cost value obtained from all examples, whilst a red dotted line indicates the mean cost value obtained for only those examples producing a tree. The same style of lines is used for the corresponding accuracy values.

It is hoped that by using different combinations of parameter settings, a combination can be found that will increase accuracy and lower cost at the same time. The relationship between test costs and misclassification costs is significant. Based on earlier observations, it can be expected that if the test costs are lower than the misclassification costs, it makes sense to grow a model in order to reduce these higher costs. If the reverse is the case, it is only worthwhile growing a tree if the misclassification costs were to be greatly reduced, but not at the expense of spending test costs. When the test costs and misclassification costs are a mixture of high and low amounts, it is harder to say whether trees should be grown or not; it is likely that smaller trees would be required, and this would need exploration to find the correct combination of attributes in the tree.

In general the red line shows a higher cost, except in the 2-class graphs, where the cost returned from the tree examples only is lower than the cost returned from all examples for some cost matrices, namely those where the misclassification costs are low. In the 3-class graph, the cost returned from the tree examples only is also lower for the group of misclassification costs which is higher than the test costs (7, 8, 9). This is repeated for some cost matrices in the 5-class graph. For the 15-class graph, there is no red line, as all examples were produced using trees and so it would be identical. In this case, growing a tree proved better for all cost matrices than doing nothing.

As was suggested, when the misclassification costs are much lower than the test costs, there is a reluctance on the part of the algorithm to produce a tree; this is evident in that the values for the costs obtained (green line) are very close to the ‘do nothing’ costs for the misclassification costs where this is the case.


Legend: dn_cost(examples), dn_cost(trees), cost(examples), cost(trees)

Figure 17 Graphs showing do nothing costs versus cost obtained for each of the types of classes


The interaction between the higher and lower misclassification and test costs is easier to see when the misclassification costs are averaged over their 3 groups; the binary-class datasets have to have their misclassification costs averaged over the whole of the matrices. Graphs have been compiled for each of the multi-class datasets for the same series of values, but with the misclassification costs grouped into their types in relation to the test costs: mixed, low and high.
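As a rough illustration of how a cost matrix might be assigned to one of the three groups relative to the test costs, the following Python sketch uses a simple comparison against the highest test cost; the thresholds are assumptions, not the grouping actually used in the thesis.

def cost_group(misclassification_costs, test_costs):
    # Assumed grouping: compare the off-diagonal misclassification costs with the
    # highest test cost; 'low' if all lie below it, 'high' if all lie above it.
    highest_test = max(test_costs)
    off_diagonal = [c for c in misclassification_costs if c > 0]
    if all(c <= highest_test for c in off_diagonal):
        return "low"
    if all(c >= highest_test for c in off_diagonal):
        return "high"
    return "mixed"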

Most of the datasets follow the pattern where the costs obtained are higher than the ‘do nothing’ costs and the accuracy obtained is higher than the ‘do nothing’ accuracy. Figure 18 presents the exceptions to this, which are the annealing, glass, iris and wine datasets for the group of misclassification costs which are higher than the test costs. The annealing and nursery datasets show the same pattern for the group of misclassification costs which have values both higher and lower than the test costs. The remainder of the graphs are presented in the Appendix D2.

For those datasets where the misclassification costs can only be averaged to one group, which is a group with mixed high and low costs in relation to the test costs, graphs of the ‘do nothing’ values and the values obtained are also presented in Appendix D2. All the krk results are high in relation to the other costs and the mushroom results are low, and their accuracy rates are low and high respectively. The cost value for the breast dataset is lower than the ‘do nothing’ cost value; however, the corresponding accuracy returned is lower than the ‘do nothing’ accuracy, which is unusual.


Legend: dn_cost(examples), dn_cost(trees), cost(examples), cost(trees)

Figure 18 Graphs showing multi-class datasets in their 3 groups of misclassification costs: mixed, low, high

To obtain higher accuracy rates, it is necessary to grow a tree. Although this more often than not results in a higher cost, the aim of the algorithm is to minimize sacrificing the accuracy rate. In order to do so a tree must be grown. The more exploration that takes place, the more likely it is to produce a model which will meet this aim and still reduce costs. One must remember that in order to classify something it will be necessary to pay some cost. Being able to alter the algorithm’s behaviour is necessary to achieve this aim. It is acknowledged though, that the needs of the user will depend on the domain, and not producing a model is an indication of the fact that it may not be worthwhile to do so. This is vital information so that costs are not wasted, and other techniques can be used instead.


5.2 Determining how many lever pulls, which version and strategy is desirable

The multiple-model version of the algorithm using the cost-based strategy produces the majority of the best results over all datasets. The preferred value of P lever pulls is either the same as or greater than the potential unique bandit path value as a general rule.

Although the identifier of the P lever pulls refers to values which are personal to each dataset, they are all in relation to the potential unique bandit paths of the dataset. So, for example, PId 3 is the potential unique bandit path value, and PId 4 is a value which is higher than the potential unique bandit paths. Six of the datasets need a higher amount than the potential unique bandit paths to produce good results; these are breast, car, diabetes, glass, krk and soybean. The others can produce good results with a lower value. For mushroom it does not matter, although all values of P are very high anyway for this dataset. With regard to the look-ahead depth, of the four datasets chosen to look to a depth of 2, good results were obtained for the iris and diabetes datasets, but these results were also obtained using only a depth of 1; the glass dataset showed no improvement when looking to a lower depth, and there was no improvement of results for the car dataset either. The conclusion is that, as these datasets required only a relatively small value of P as their potential unique bandit path values, perhaps the datasets did not benefit from looking ahead to a lower depth. It is possible, however, that datasets with a larger value of P as their potential unique bandit path values might benefit from looking ahead to a lower depth.
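The following hypothetical Python sketch illustrates the relationship described above between the P identifiers and the potential unique bandit paths; only PId 3 (equal to the potential path count) and the fact that PId 4 lies above it come from the text, while the remaining scalings are invented purely for illustration.

def lever_pulls(p_id, pot_paths):
    # PId 3 corresponds to the potential unique bandit path count itself;
    # PId 4 (and above) to values greater than it. Other scalings are assumed.
    if p_id <= 2:
        return max(1, pot_paths // (2 ** (3 - p_id)))
    if p_id == 3:
        return pot_paths
    return pot_paths * 2 ** (p_id - 3)

# e.g. for the breast dataset (potPaths = 2152), PId 4 would request more pulls
# than there are potential unique bandit paths, consistent with the observation above.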

With regard to the attribute-with-one-value strategies, the ‘ignore’ and ‘find next best’ options produce the majority of good results; however, it only makes a difference if the criterion is actually met. It is possible that this criterion is seldom invoked for the majority of datasets. Perhaps it is better to examine the dataset to determine whether it is likely to produce this scenario, and then remove attributes which are similar in many ways, duplicating information and not adding anything new.

There is a higher frequency of Class 1 examples using the hybrid labelling system than using the cost-sensitive labelling system. Using the cost-sensitive labelling always reduces the accuracy in the unbalanced datasets; the more balanced the dataset, the smaller the sacrifice between the two labelling systems because, of course, when the class distribution is balanced, the cost-sensitive side of the hybrid labelling system will be used anyway.

5.3 Investigate taking advantage of different parameter settings to achieve aim

It has been observed that some parameter settings produce good results for some cost matrices, but the same settings do not produce results that are as good for others. Section 4.3 has discussed the concept of treating each cost matrix for each dataset as an individual entity and finding the most suitable settings producing the best results for it.

It is possible, therefore, to improve on the results of a dataset by considering separately the best parameter settings. So for example instead of thinking about a dataset and finding parameters for it, it is much better to think of it as different datasets, one for each of the cost matrices it has been allocated. For example, the annealing dataset is one dataset, with 15 cost matrices. Better results can be obtained if it is thought of as 15 datasets. Therefore in these experiments 210 datasets are processed each as individual entities, each with their own best combination of parameter settings in order to achieve the best results possible.


In addition to considering parameter setting combinations, the strategy used to choose the parameters needs to be considered. Should the parameter settings which produce the lowest cost be chosen and the corresponding accuracy used, or should the parameter settings which produce the highest accuracy be chosen and the corresponding cost used? Maybe, depending on the domain, the needs of the user and the relationship between test costs and misclassification costs, a mixture of these should be chosen in order to try to achieve the aim of cost-sensitive learning: that of maximizing accuracy at the lowest cost possible. A full set of tables giving the best values achieved for each cost matrix for each dataset is given in Appendix D3. For each dataset and cost matrix, the values obtained by each experiment were examined. The example with the lowest cost is selected, and the example with the highest accuracy is also selected, along with their corresponding accuracy/cost. If either of these examples did not produce a tree, then an alternative example is also chosen which has the next lowest cost obtained from producing a tree, or the next highest accuracy producing a tree, along with the corresponding values. Examples which are chosen by both strategies are indicated in the tables. The parameter settings used to produce each of these results are also detailed in the tables.
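A minimal Python sketch of this selection step is given below; the record layout (dicts with ‘cost’, ‘accuracy’ and ‘tree_grown’ keys) is assumed for illustration and is not the actual results-file format.

def select_best(examples):
    """Select, for one dataset/cost-matrix pair, the lowest-cost and highest-accuracy
    examples, plus the best alternatives among examples that actually produced a tree."""
    by_cost = min(examples, key=lambda e: e["cost"])
    by_accuracy = max(examples, key=lambda e: e["accuracy"])
    with_trees = [e for e in examples if e["tree_grown"]]
    by_cost_tree = min(with_trees, key=lambda e: e["cost"]) if with_trees else None
    by_accuracy_tree = max(with_trees, key=lambda e: e["accuracy"]) if with_trees else None
    return {
        "cost_based": by_cost,
        "accuracy_based": by_accuracy,
        "cost_based_with_tree": by_cost_tree,
        "accuracy_based_with_tree": by_accuracy_tree,
    }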

Figure 19 illustrates that different cost matrices may require different strategies. Mean values have been calculated for the ‘do nothing’ scenario, that is, if no model has been induced, and for the overall value obtained if no strategy is chosen, i.e. simply the mean value obtained from all examples. Additionally, the values obtained when using the cost-based strategy and when using the accuracy-based strategy are plotted.

In order to attempt to anticipate what a user might require, ‘best choice’ illustrates the differing needs of the cost matrices; the costs and accuracy rates from both strategies are examined, trying to allow for user requirements and to achieve low costs whilst minimizing the sacrifice to the accuracy rate.

(Figure 19 panels: ‘mushroom means obtained versus actual accuracy obtained by a given strategy’ and ‘soybean means obtained versus actual accuracy obtained by a given strategy’, plotted over cost matrices 1-15 and their average.)
Legend: dn_cost(mean), cost(mean), cost_strat:cost, cost_strat:costwithTree, cost_strat:accuracy, cost_strat:accuracywithTree, cost:bestChoice, dn_accuracy(mean), accuracy(mean), accuracy_strat:cost, accuracy_strat:costwithTree, accuracy_strat:accuracy, accuracy_strat:accuracywithTree, accuracy:bestChoice. For example, cost_strat:cost is the lowest cost obtained and cost_strat:accuracy is the corresponding accuracy; accuracy_strat:accuracy is the highest accuracy obtained and accuracy_strat:cost is the corresponding cost; accuracy:bestChoice is the accuracy rate from the chosen strategy and cost:bestChoice is the corresponding cost.

Figure 19 Comparing mean values obtained with values obtained by a given strategy

The solid black line represents those values which are the result of doing nothing, that is inducing no tree at all; the green line represents the mean values which have been calculated using all of the examples in the results dataset; and the red line represents the values for the ‘best choice’. The accuracy rate which corresponds to the choice of the cost-based strategy is compared with the accuracy rate obtained from the accuracy-based strategy.

If the difference between these is greater than 3%, the sacrifice is considered too great for a cost matrix, so the accuracy-based strategy value is chosen for that cost matrix and the corresponding cost used. In order to achieve lower costs whilst maintaining accuracy or minimizing the sacrifice, a compromise is sought between the two strategies; if this is difficult to achieve, the flexibility of the algorithm means that users have more choice than with traditional cost-sensitive algorithms. It may be that the user prefers a more cautious approach to cost expenditure.
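A minimal sketch of this ‘best choice’ rule, assuming each strategy is summarised as a (cost, accuracy) pair for one cost matrix, is as follows.

def best_choice(cost_strategy, accuracy_strategy, max_sacrifice=3.0):
    # cost_strategy and accuracy_strategy are (cost, accuracy) pairs for one cost matrix.
    cost_c, acc_c = cost_strategy
    cost_a, acc_a = accuracy_strategy
    if acc_a - acc_c > max_sacrifice:
        # A sacrifice of more than 3% is considered too great: keep the accuracy-based result.
        return cost_a, acc_a
    # Otherwise prefer the lower-cost result and its corresponding accuracy.
    return cost_c, acc_c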

The best illustrations of the idea behind choosing the appropriate parameter settings for each individual run of a dataset, as described, are the mushroom and soybean datasets in Figure 19.

First, consider the mushroom dataset, which maintains an accuracy rate of 100% over the majority of the cost matrices and achieves this at a lower cost. For cost matrices 7 to 11, the cost-based strategy with tree and the accuracy-based strategy both produce very high accuracy rates, but the cost-based strategy returns these at a lower cost. As the difference between the accuracy rates for these two strategies is only 1.48% but the cost is lower by 0.014 to 0.021, the ‘best’ choice for these cost matrices is the lowest cost and corresponding accuracy, i.e. the cost-based strategy.

For the soybean dataset the pattern is the ideal pattern: if nothing is done, i.e. no model induced, cost is high because the accuracy is low and there are misclassification costs. When a model is built, the mean cost value returned for each of the cost matrices is much lower and the corresponding accuracy is higher. When values are chosen which meet the cost-based strategy, the cost is lower and the accuracy increases. When values are chosen which meet the accuracy-based strategy, the accuracy is high and the cost is even lower. Appendix Section D4 presents the rest of the results for each dataset.


Examining the patterns for the remaining datasets, for the car, glass, heart, iris and krk datasets, the red line showing the ‘best’ choice is higher for costs than the other options, but the accuracy is improved. The annealing, breast, diabetes, flare, hepatitis, nursery, tic-tac-toe and wine datasets show that the ‘best’ choice can reduce costs and still keep the accuracy high for at least some cost matrices.

5.4 Developing guidelines for datasets to determine the best combinations of parameter settings

As there are a large number of potential combinations of parameter settings which could be used when processing a dataset, it would be useful to see if the best settings for a dataset could be predicted as a guideline for processing future datasets. In order to do this, two approaches have been used. The first looks at the parameter settings to see which of their values resulted in the highest frequency of Class 1 examples. The interest is only in Class 1 examples, as these are the best results and it is only for these that it is important to predict the settings required to achieve them. There must be some way to describe the dataset and the costs in order to determine the parameter settings which would be best. Two attributes in the results dataset which can help with this are ‘grpPotPaths’, which groups together potential unique bandit path values, and ‘grpRatio’, which groups the relationship between the highest misclassification cost and the highest test cost used by an experiment. By examining each of the attribute values and determining which parameter settings lead to the highest frequency of Class 1 results, the information in Table 12 has been compiled.


Columns: description of dataset; strategy; no. of models; label system; depth (k); one value strategy; id of P value; pre-prune option; prop stopping; class prop stopping.

if potPaths <= 500: accuracy; 10; HYBRID; 1; 2; 4; c; TRUE; TRUE
if potPaths <= 1000: cost; 10; HYBRID; 2; 2; 4; d; FALSE; FALSE
if potPaths <= 2000: none (0); 1; HYBRID; 1; 0; 5; d; TRUE; TRUE
if potPaths <= 3000: none; 1; HYBRID (COST); 1; 2; 3/4; a (d); TRUE; TRUE
if potPaths <= 5500: cost; 10; HYBRID; 2; 2; 2/3; a (d); TRUE; TRUE
if potPaths <= 9200: cost; 10; HYBRID; 1; 2; 3; d; FALSE; FALSE
if potPaths = all other values: any; 1 or 10; either; 1; any; any; d; either; either
grpRatio <= 0.1: none; 1; HYBRID; 1; 2; 5; d; FALSE; FALSE
grpRatio <= 0.99: none; 1; HYBRID; 1; 2; 5; d; TRUE; TRUE
grpRatio is 1.0: accuracy; 20; LABEL; 1; 2; 4; d; TRUE; TRUE
grpRatio <= 10.0: cost (none); 10 (1); HYBRID (COST); 1; 2; 5; b (c); TRUE; TRUE
grpRatio <= 100.0: accuracy; 20; HYBRID; 1; 2; 4; d; FALSE; FALSE
grpRatio <= 500.0: accuracy; 20; HYBRID; 1 (2); 0; 4; d; FALSE; FALSE
grpRatio <= 1000.0: cost; 10; HYBRID (COST); 1; 0; 2; d; FALSE; FALSE
grpRatio > 1000: cost; 10; HYBRID (COST); 1; 0; 2; d; FALSE; FALSE

Table 12 Manually obtained parameter settings based on description of a dataset and frequency of class 1 examples

The second way is to use a technique known as ‘meta-mining’. Attributes representing the parameter values, class distribution, grouped potential unique bandit paths and the ratio of misclassification cost to test cost, along with the class to which the example has been assigned, were selected and, following the experimental methodology described earlier in Section 4.1, were processed by the accuracy-based algorithm J48 in WEKA.


The WEKA J48 was run on three kinds of files: (i) all examples, (ii) examples from the multiple-model version, and (iii) examples from the single-model version. Each kind of file was executed with different pruning levels in order to produce a more compact tree but one with good accuracy.

The confidence levels entered were the default value, 0.15 and 0.05. The accuracy rates obtained are presented in Table 13.

Out of the 9 options, c0.05 – all examples has the highest accuracy of 82.95%. Using the most extreme version of pruning produced the highest accuracy for each type of file. The lowest accuracy returned was from the single-version examples using the default pruning level.

J48 with confidence level: accuracy obtained
c0.05 – all examples: 82.95423
c0.05 – ensemble: 82.92961
c0.05 – single: 82.45773
c0.15 – all examples: 82.89284
c0.15 – ensemble: 82.79647
c0.15 – single: 82.38293
default – all examples: 82.53402
default – ensemble: 82.43935
default – single: 82.16518

Table 13 Accuracy rates obtained from J48 predicting classes from parameter settings

No matter what file is examined, the root attribute is always ‘distribution’, which describes the class distribution of the dataset. The ten files producing the better results have been averaged by class, i.e. each individual class has had its accuracy rate calculated. Out of all Class 1 examples, 91% of them were predicted correctly from their parameter settings using descriptions of the dataset. Of the ten files, the one which contains the highest accuracy has been used to extract rules to decide on parameter settings as guidelines. This file has the highest overall accuracy of 83.12% and an accuracy of 94.3% on predicting Class 1 examples. Leaves not labelled Class 1 (i.e. good cost, good accuracy) were removed, leaving around 106 leaves from 557 leaves. If the attribute ‘tree’ has been chosen for a node, the branch indicating no tree was grown has been ignored in favour of following the path where a tree has been grown, as it is necessary to concentrate on parameter settings where a tree has been grown. Even with harsh pruning there are still leaves with few examples. Any leaf having only 100 or fewer examples is also removed, leaving the best and most confident leaves to be used. Only leaves with a high percentage accuracy are kept; in order that all root-node branches have relevant leaves, this has been set at 65%. There are 33 leaves remaining and these are used to extract rules. The rules are shown in Figure 20.
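A small Python sketch of this leaf-filtering step is given below, assuming each leaf is summarised as a dict; in the thesis the filtering was applied to the J48 output rather than to such records.

def filter_leaves(leaves, min_examples=100, min_accuracy=65.0):
    # Keep only Class 1 leaves covering more than 100 examples with at least 65% accuracy.
    return [leaf for leaf in leaves
            if leaf["predicted_class"] == 1
            and leaf["num_examples"] > min_examples
            and leaf["accuracy"] >= min_accuracy]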

When comparing these two approaches it is observed that there are differences between them.

Although both attributes in the manual approach are used in the ‘meta-mining’ approach, two of the attributes chosen to help determine parameter settings have one thing in common: they both describe the nature or structure of the dataset. The attribute ‘grpPotPaths’ has grouped potential unique bandit paths into value ranges. Potential unique bandit paths are calculated from the attributes and their values, so could be a good indication of how attributes and their different values interact. The attribute ‘balanced’ describes the class distribution, so between these two attributes the dataset’s structure is described. The other attribute, the ratio between misclassification costs and test costs, describes the other important aspect with regard to the dataset: the costs which are to be used.

Using the results dataset, the rules obtained from WEKA as described and the manually obtained information have been examined. The rules from WEKA prove the most accurate.

However as this produces combinations of parameter settings and the manual version does not, this could be anticipated. The manual selection of ways to describe datasets should not be dismissed so easily however, as it could be used in combination with the other technique.


IF distribution is even IF distribution is 65/35 IF multi-class is FALSE IF ratio > 7.0 & <= 71.0 IF grpPotPaths is >500 & <= 1000 Set –strat accuracy Set –prop FALSE Set –pre-pruning combination b Set –label COST IF ratio > 71.0 & <= 219.0 OTHERWISE Set –prop TRUE Set –prop TRUE Set –pre-pruning combination d Set –label HYBRID Set –label HYBRID IF multi-class is TRUE IF ratio > 219.0 IF grpPotPaths <= 500 Set –prop FALSE IF ratio >1.4 Set –pre-pruning combination d Set –pre-pruning combination a Set –label HYBRID OTHERWISE OTHERWISE Set –pre-pruning combination d Set –strat cost Set –prop TRUE IF grpPotPaths >500 & <= 1000 IF ratio < 6.02 IF distribution is almost even Set –pre-pruning combination a IF grpPotPaths is >= 6000 & <=9200 Set –depth 2 Set –strat accuracy OTHERWISE OTHERWISE Set –pre-pruning combination d Set –strat cost Set –depth 1 IF grpPotPaths >1000 & <= 2000 Set –pre-pruning combination b Set –model 10 OTHERWISE IF distribution is unbalanced Set –depth 1 IF grpPotPaths <= 500 Set –pre-pruning combination d Set –label HYBRID Set –prop FALSE Set –prop TRUE IF grpPotPaths >500 & <= 1000 IF distribution is part unbalanced Set –oneVal stop IF ratio > 7.14 Set –model 10 Set –label HYBRID Set –prop FALSE Set –prop FALSE IF grpPotPaths >1000 & <= 2000 Set –pre-pruning combination d IF ratio <= 21.94 OTHERWISE Set –oneVal stop Set –label COST Set –prop TRUE Set –prop TRUE IF grpPotPaths > 3000 & <= 5500 Set –pre-pruning combination d Set –label HYBRID Set –pre-pruning combination d Set –prop TRUE

OTHERWISE Set –label COST Set –oneVal nextBest Set –prop TRUE
If a value has not been found for a parameter, this is an indication that it is not statistically important and therefore it is suggested that the default value be used.

Figure 20 Rules extracted using J48 accuracy-based algorithm on examples in the results analysis file


The guidelines produced may prove useful to determine how to set the parameter values. Any of the three identified attributes could be used to obtain the parameter values but it would be important to conduct some experiments in order to see if they are as successful as was indicated.

So, for example, an unseen dataset can be chosen, test costs and misclassification costs assigned, and the potential unique bandit paths calculated. Then each of the three attributes could be used in turn to set the parameter values, and the appropriate algorithm executed using the recommended settings. All possible combinations of parameter settings can also be executed and the results compared. This would be a good test of whether guidelines can indeed be recommended.
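Hypothetically, applying the guidelines to an unseen dataset could look like the Python sketch below; the default values and the two example branches are placeholders standing in for the rules in Figure 20, not the actual extracted rules.

DEFAULT_SETTINGS = {"strat": "cost", "models": 1, "label": "HYBRID", "depth": 1,
                    "oneVal": "ignore", "prePrune": "d", "prop": False}  # assumed defaults

def recommend(distribution, grp_pot_paths, ratio):
    """Map a dataset description (class distribution, grouped potential unique bandit
    paths, misclassification/test cost ratio) to recommended parameter settings."""
    settings = dict(DEFAULT_SETTINGS)
    # Placeholder branches; the real conditions come from the J48 leaves behind Figure 20.
    if distribution == "even" and grp_pot_paths <= 500:
        settings.update({"prop": False, "prePrune": "d", "label": "HYBRID"})
    elif distribution == "65/35" and 7.0 < ratio <= 71.0:
        settings.update({"strat": "accuracy", "prop": False, "prePrune": "b", "label": "COST"})
    # ...remaining branches would be encoded in the same way...
    return settings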

5.5 Summary of findings from the investigation

An extensive investigation, carried out on the 15 real-world datasets, resulted in 158664 experiment results which have been collated into a results dataset and analyzed using a statistical software package (SPSS). Each experiment has been carried out using different strategies and all possible combinations of parameter settings. Each example in the results dataset has been allocated a class based on the combination of the cost and accuracy rate returned by that particular example. Both versions of the algorithm were used: the single version producing one model and the multiple-model version producing n models.

The aim has been to determine which parameters allow the algorithm to only continue when it is worthwhile to do so and the effect this has on cost and accuracy, to determine how many lever pulls should be used for each dataset, and which version of the algorithm and which strategy works best. Another aim is to investigate how to take advantage of the different parameter settings and strategies and to see if guidelines for the setting of these parameters can be found.

Sometimes, when the algorithm considered it not worthwhile to continue, no model was built. This led to an investigation into the scenario of a model or no model being grown. Graphs show that the accuracy rate is improved when a model is generated. Obviously costs tend to be higher when a model is grown, but in some cases, particularly in multi-class datasets like soybean, growing a model helps to reduce costs when the misclassification costs are greater than the test costs.

Two techniques were explored to see which may be able to help find guidelines to parameter settings for unseen datasets. Using the class representing the best results, parameter settings have been examined to see which ones produced the highest frequency, in order to determine whether these parameter settings are best for a dataset which has a certain number of potential unique bandit paths or a particular ratio of misclassification costs to test costs.

The other technique used, called ‘meta-mining’, finds rules using the accuracy-based algorithm J48, which could be used in order to find guidelines to parameter settings. Each technique used an attribute which was an indication of the structure of the dataset, for example class balance and potential unique bandit path value; additionally, the ‘meta-mining’ technique used an attribute based on the ratio of the misclassification costs to the test costs. It is thought that a combination of the two techniques would be helpful.


From the investigation of strategy and parameter settings choice, if each dataset/test cost/misclassification cost combination is treated as a separate dataset and the best parameter settings and strategy for each one are found, this makes the algorithm under development very flexible in comparison with existing cost-sensitive algorithms because it can behave differently when given different parameter settings. This flexibility can be used to improve results.

From each experiment, the lowest cost along with its settings for each cost matrix has been examined to find the best one overall (cost-based strategy) and the same for the highest accuracy (accuracy-based strategy). These values are carried forward into an evaluation of existing algorithms. It is necessary to check the development using these 15 real-world datasets and compare performance measures against existing well-known cost-sensitive algorithms.


CHAPTER 6: AN EMPIRICAL COMPARISON OF THE NEW ALGORITHM WITH EXISTING COST-SENSITIVE DECISION TREE ALGORITHMS

In order to examine the Multi-Armed Cost-Sensitive Decision Tree algorithm (MA_CSDT) and compare its performance with well-known existing cost-sensitive algorithms, the 15 datasets, along with the test costs and misclassification costs used in the investigation (see Chapter 5), are used in a comparison that also includes J48 (C4.5 version 8) in WEKA, in order to see if the aim of the algorithm can be achieved over the wide variety of datasets and costs.

The algorithms chosen for comparison, along with J48, are EG2 (Núnez 1991), MetaCost (Domingos 1999), AdaCostM1 (Fan et al. 1999), ICET (Turney 1995) and ACT (Esmeir and Markovitch 2008). The algorithm J48 is chosen to provide a benchmark for accuracy, and the cost-sensitive algorithms selected represent five classes of cost-sensitive decision tree algorithms as described in Chapter 3 and in Lomax and Vadera (2013). Four of the algorithms have been implemented in the open source data mining software package WEKA (Hall et al. 2009) and they have been adapted to include test costs in their evaluations.

• EG2, described in Section 3.1.1.1, has been implemented using the description given in Turney (1995), in which the J48 implementation has been adapted to include equation (3.1) (see the sketch after this list).

• MetaCost is an algorithm which is included in the WEKA package, described earlier in Section 3.2.2.2.

• AdaCostM1 is an adaptation of the algorithm AdaBoostM1 (Freund and Schapire 1996) which is included in the WEKA package. The adaptations developed by Fan et al. (1999) for the algorithm AdaCost, described in Section 3.2.2.1, have been added to AdaBoostM1 in order that the algorithm AdaCost can process multi-class datasets and be included in the evaluation.

• ICET, as described in Section 3.2.1, has been previously implemented and has been tested by comparing experiments in Turney (1995) in order to check the implementation (Vadera and Ventura 2001). This implementation has been used in several previous studies.

• ACT, as described in Section 3.2.3, has been implemented using the description given in Esmeir and Markovitch (2008) and has been tested by comparing experimental results detailed in their paper.
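For reference, a small Python sketch of the cost-sensitive selection measure usually attributed to EG2, the Information Cost Function as stated in Turney (1995), is given below; whether this corresponds exactly to equation (3.1) in this thesis should be checked against Chapter 3, and the parameter names are illustrative.

def icf(information_gain, test_cost, w=1.0):
    """EG2's Information Cost Function as usually stated: attributes with high
    information gain and low test cost score highest; w in [0, 1] controls how
    strongly the test cost is taken into account."""
    return (2 ** information_gain - 1) / ((test_cost + 1) ** w)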

The J48 algorithm produces the target accuracy rates, in order to determine whether a compromise between accuracy-based decisions and cost-based decisions can be found so that it is possible to induce decision trees that maintain accuracy but also minimize costs. The results obtained from the experiments investigated in Chapter 5 have been examined in order to determine whether the hypothesis can be confirmed. As there are two different approaches to take while examining the results, both of these approaches have been used to produce values to compare. Over all the examples in the results dataset, the lowest cost for a dataset and cost matrix is chosen along with its corresponding accuracy rate and is designated as COST-BASED(ALL). Additionally the highest accuracy along with its corresponding cost is also chosen and is designated as ACCURACY-BASED(ALL). This is repeated using only those examples where the proportional stopping parameters are set to TRUE and using only those examples where the parameters are set to FALSE. These are labelled COST-BASED<> and ACCURACY-BASED<>. The MIXMATCH strategy examines the described strategies and picks one for an individual cost matrix which may help improve upon the results obtained by each of the cost-sensitive algorithms. For example, cost matrix 1 for a dataset uses COST-BASED<> as it is a better choice, whereas for cost matrix 5 for the same dataset, ACCURACY-BASED<> might be the better choice rather than staying with COST-BASED<> for all of the cost matrices. The MIXMATCH strategy attempts to combine the strategies in a flexible way in order to achieve the aim of the algorithm.
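One plausible way to realise the MIXMATCH selection in code is sketched below in Python; the rule used here (the cheapest strategy among those within a small accuracy tolerance of the best) is an assumption for illustration, not the exact selection rule used in the evaluation.

def mixmatch(per_matrix_results, tolerance=3.0):
    """`per_matrix_results` maps a strategy name, e.g. 'COST-BASED(ALL)' or
    'ACCURACY-BASED<>', to a (cost, accuracy) pair for a single cost matrix."""
    best_accuracy = max(acc for _, acc in per_matrix_results.values())
    # Keep strategies whose accuracy is within the tolerance of the best accuracy...
    acceptable = {name: (cost, acc) for name, (cost, acc) in per_matrix_results.items()
                  if best_accuracy - acc <= tolerance}
    # ...then pick the cheapest of those for this cost matrix.
    return min(acceptable.items(), key=lambda item: item[1][0])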

To examine cost and accuracy from the five cost-sensitive algorithms, the datasets have been processed using both pruned and un-pruned versions of EG2, MetaCost, AdaCostM1 and ACT, with ICET using only a pruned version, as the un-pruned version never produces better results than the pruned version.

As the MA_CSDT algorithm, using some combinations of parameter settings, resulted in no trees being induced, in the first instance the output of the six algorithms has been examined to see whether trees have been produced. It is likely that, in some cases, either pruning the tree results in no tree being left or no tree has been grown in the first instance. Table 14 presents the percent of trees that were not grown during the processing of the datasets using pruned versions of the algorithms. Additionally it gives the misclassification cost identifiers which produced the fewest trees. For example, in the breast dataset, all of the misclassification cost matrices processed by the EG2 algorithm had an equal rate of not producing trees. For the ICET algorithm, cost matrices 1 to 12 did not produce trees. The algorithms MetaCost and AdaCostM1 only produce trees for misclassification cost identifier 8. Comparing these to the processing of the datasets using un-pruned versions, there are still some algorithms which do not produce a tree for some of the cost matrices: these are MetaCost and AdaCostM1.


Percentage of trees not grown per algorithm (J48, EG2, MetaCost, AdaCostM1, ICET, ACT), the total % over all six, and the cost matrix (CM) identifiers producing the least trees; the number of files processed per dataset is given in brackets.

annealing (150): J48 0, EG2 0, MetaCost 0, AdaCostM1 0, ICET 0, ACT 33.33%; total 5.55%. CM with least trees: ACT 6-10.
breast (150): J48 0, EG2 30.00%, MetaCost 93.33%, AdaCostM1 93.33%, ICET 78.66%, ACT 20.00%; total 52.55%. CM with least trees: EG2 all equal, MetaCost not 8, AdaCostM1 not 8, ICET 1-12, ACT 7-9.
car (120): J48 0, EG2 0, MetaCost 0, AdaCostM1 10, ICET 87.50%, ACT 58.33%; total 24.31%. CM with least trees: AdaCostM1 4, ICET 1,3,5-8,11, ACT 5-8,10-12.
diabetes (150): J48 0, EG2 0, MetaCost 86.66%, AdaCostM1 86.66%, ICET 18.66%, ACT 6.66%; total 33.11%. CM with least trees: MetaCost 1-6,9-15, AdaCostM1 1-6,9-15, ICET 4,6-8, ACT 8.
flare (90): J48 70.00%, EG2 100.00%, MetaCost 63.33%, AdaCostM1 52.22%, ICET 84.44%, ACT 68.89%; total 73.15%. CM with least trees: J48 all equal, EG2 all equal, MetaCost 2, AdaCostM1 2,5, ICET 4,7,8, ACT 4-9.
glass (180): J48 0, EG2 0, MetaCost 0, AdaCostM1 0, ICET 1.11%, ACT 60.56%; total 10.28%. CM with least trees: ICET 5,11, ACT 8-11,15-18.
heart (150): J48 0, EG2 0, MetaCost 80.00%, AdaCostM1 82.67%, ICET 0, ACT 6.67%; total 28.22%. CM with least trees: MetaCost 1-5,10-15, AdaCostM1 1-6,10-15, ACT 8.
hepatitis (150): J48 0, EG2 0, MetaCost 80.00%, AdaCostM1 82.00%, ICET 3.33%, ACT 6.67%; total 28.67%. CM with least trees: MetaCost 1-5,11-15, AdaCostM1 1-7,12-15, ACT 8.
iris (90): J48 0, EG2 0, MetaCost 0, AdaCostM1 0, ICET 14.44%, ACT 35.56%; total 8.33%. CM with least trees: ICET 1-6, ACT 3-6.
krk (180): J48 0, EG2 0, MetaCost 0, AdaCostM1 0, ICET 2.22%, ACT 22.22%; total 4.07%. CM with least trees: ICET 15, ACT 1,3-5,16-18.
mushroom (150): J48 0, EG2 0, MetaCost 0, AdaCostM1 40.00%, ICET 0, ACT 6.67%; total 7.78%. CM with least trees: AdaCostM1 1-4,14-15, ACT 8.
nursery (150): J48 0, EG2 0, MetaCost 0, AdaCostM1 0, ICET 15.33%, ACT 40.67%; total 9.33%. CM with least trees: ICET 1,2, ACT 8-11,13.
soybean (150): J48 0, EG2 0, MetaCost 0, AdaCostM1 0, ICET 0, ACT 0; total 0. CM with least trees: n/a.
tictactoe (150): J48 0, EG2 0, MetaCost 32.67%, AdaCostM1 86.67%, ICET 22.00%, ACT 6.67%; total 24.67%. CM with least trees: MetaCost 1-4, AdaCostM1 1-7,10-15, ICET 13, ACT 8.
wine (90): J48 0, EG2 0, MetaCost 0, AdaCostM1 0, ICET 0, ACT 33.33%; total 5.56%. CM with least trees: ACT 4-6.

Table 14 Percent of trees not grown by dataset for the six algorithms in the evaluation with the cost matrices producing the least trees

It is concluded that the other algorithms did not produce a tree owing to pruning; however, for MetaCost and AdaCostM1 the reason that no trees have been produced on some occasions is that the process stops because all the examples belong to one class. In the case of AdaCostM1 this will be owing to the initial weight procedure, and in the case of MetaCost, owing to the way that this algorithm re-labels the training examples with the class that minimizes the cost, as described in Section 3.2.2.2.

The ACT algorithm is the only one of these six which fails to grow trees for the majority of the datasets. In fact the only dataset where this algorithm grew a tree for each training and testing pair over each cost matrix is the soybean dataset. For this particular algorithm the most likely explanation is its strict pruning policy. The other algorithms use error-based pruning, which involves testing whether having a sub-tree results in more errors than if it were to be pruned to a leaf node. The ACT algorithm uses an extension of this method to include the misclassification costs and the test costs: only if the cost is reduced by having a sub-tree is the sub-tree retained. Therefore, in some cases the ACT algorithm considers it not worthwhile to retain the sub-trees, reducing the model to the original subset.
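The contrast between the two pruning policies can be expressed very simply; the following Python sketch is illustrative only and assumes the relevant error and cost totals have already been computed for the node in question.

def prune_decision_error_based(errors_with_subtree, errors_as_leaf):
    """Error-based pruning: keep the sub-tree only if it makes fewer errors than a leaf."""
    return "keep" if errors_with_subtree < errors_as_leaf else "prune"

def prune_decision_cost_based(misclassification_cost_subtree, test_costs_subtree,
                              misclassification_cost_leaf):
    """ACT-style pruning as described above: the sub-tree is retained only if its total
    cost, including the test costs paid below the node, is lower than that of a leaf."""
    cost_with_subtree = misclassification_cost_subtree + test_costs_subtree
    return "keep" if cost_with_subtree < misclassification_cost_leaf else "prune"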

Two-class datasets and 3-class datasets have the fewest trees grown over all the processed files. The fact that trees are not always grown will be taken into account when looking at the costs in the evaluation. Section 6.1 presents the results of the evaluation and Section 6.2 presents a discussion of the outcome of the evaluation.

6.1 Empirical comparison results

Table 15 presents the percentage that each cost-sensitive algorithm achieved the lowest cost for a cost matrix or the highest accuracy for a cost matrix for each of the datasets examining both pruned and un-pruned versions of the algorithms.


Percentage of cost matrices for which each algorithm achieves the lowest cost / the highest accuracy:
J48: 1.38% / 32.08%
EG2: 9.67% / 11.77%
MetaCost: 6.45% / 11.09%
AdaCost: 3.92% / 3.92%
ICET: 5.07% / 3.41%
ACT: 5.30% / 2.73%
MA_CSDT: 68.20% / 34.98%

Table 15 Percent that each cost-sensitive algorithm achieves the lowest cost or highest accuracy for a cost matrix for both pruned and un-pruned versions

To count as returning the highest accuracy, an algorithm must also have a lower cost than the corresponding J48 cost. If more than one algorithm qualifies for either the lowest cost or the highest accuracy, each one is included in the percentage shown.
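A Python sketch of how such percentages could be tallied is shown below, with ties credited to every qualifying algorithm; the additional condition that a highest-accuracy winner must also undercut the J48 cost is omitted here for brevity, and the input layout is an assumption.

from collections import Counter

def tally(results):
    """`results` maps (dataset, cost_matrix) to a dict {algorithm: (cost, accuracy)}."""
    low_cost, high_acc = Counter(), Counter()
    for algs in results.values():
        best_cost = min(c for c, _ in algs.values())
        best_acc = max(a for _, a in algs.values())
        for name, (c, a) in algs.items():
            if c == best_cost:
                low_cost[name] += 1   # every algorithm tying on the lowest cost is credited
            if a == best_acc:
                high_acc[name] += 1   # likewise for the highest accuracy
    n = len(results)
    return ({k: 100 * v / n for k, v in low_cost.items()},
            {k: 100 * v / n for k, v in high_acc.items()})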

Table 16 presents a summary for each dataset showing whether the MA_CSDT algorithm has met its main aims. The main aims are: (i) the MA_CSDT returned the same accuracy rate as J48 or higher; (ii) in order to obtain the same or higher accuracy rate, a lower cost has also been returned; and (iii) it produces a lower cost than all other algorithms. For the majority of the datasets these aims have been met. There are a number of datasets where a simple strategy of choosing the highest accuracy and corresponding cost for all cost matrices produces good results which meet the aims of the algorithm by returning a higher accuracy rate than J48 at a lower cost, and a number of datasets where choosing a mixture of different strategies for the cost matrices produces good results. The exceptions are the car, krk and nursery datasets, where no strategy of MA_CSDT has been able to produce an accuracy rate comparable with the accuracy-based algorithm J48.


Columns: accuracy is same or higher than J48 (if no, % sacrifice); this produces a cost lower than J48; a lower cost overall (if yes, % sacrifice); comments regarding the evaluation.

annealing (*can combine strategies to improve results):
pruned*: yes; yes; no. EG2 gets lower cost but lower accuracy.
un-pruned*: yes; yes; yes (10.88). Never gets lower cost than EG2 (-0.007), accuracy always lower.
breast (*can combine strategies to improve results):
pruned*: no (3.74); yes; yes (11.57). EG2 gets lower cost but lower accuracy.
un-pruned: no (0.65); yes; yes (8.48). Does not require combination of strategies to achieve its aim.
car (unable to meet aims regardless of combinations of strategies):
pruned: no (14.01); no; yes (33.56).
un-pruned: no (16.28); no; yes (35.83).
diabetes (does not need any combination of strategies to meet aims):
pruned: yes; yes; yes (13.83).
un-pruned: no (0.08); yes; yes (14.08).
flare (does not need any combination of strategies to meet aims):
pruned: yes; yes; yes (4.6).
un-pruned: yes; yes; yes (2.04).
glass (*can combine strategies to improve results):
pruned*: yes; yes; yes (21.9). EG2 has low cost but accuracy not as high.
un-pruned*: yes; yes; yes (24.01). Combinations of strategies required.
heart (does not need any combination of strategies to meet aims):
pruned: yes; yes; yes (10.2).
un-pruned: yes; yes; yes (11.42).
hepatitis (does not need any combination of strategies to meet aims):
pruned: no (0.06); yes; yes (27.31).
un-pruned: yes; yes; yes (27.25).
iris (*can combine strategies to improve results):
pruned*: yes; no; yes (46.94). Combinations of strategies help, but there is some sacrifice.
un-pruned*: yes; no; yes (46.94). Combination of strategies helps meet the aims.
krk (unable to meet aims regardless of combinations of strategies):
pruned: no (23.53); yes; no.
un-pruned: no (25.95); yes; yes (39.9).
mushroom (lowest cost and highest accuracy already used, no combination possible):
pruned: yes; yes; no. All aims achieved, EG2 reduces cost by 0.003.
un-pruned: yes; yes; no. All aims achieved, EG2 reduces cost by 0.002.
nursery (unable to meet aims regardless of combinations of strategies):
pruned: no (20.85); no; yes (47.2).
un-pruned: no (22.05); no; yes (48.4).
soybean (lowest cost and highest accuracy already used, no combination possible; STRATEGY0 similar results to STRATEGY1):
pruned: no (0.73); no; no. Difference in costs only on average 0.0076.
un-pruned: no (0.96); no; no. Difference in costs only on average 0.0032.
tictactoe (lack of cost information for some CMs is hard to overcome):
pruned: yes; yes; yes (22.34). Can meet aims, performs better than all except ACT.
un-pruned: no (1.59); yes; yes (24.22). As above.
wine (*can combine strategies to improve results):
pruned*: no (0.01); yes; yes (14.7). All algorithms sacrifice accuracy for lower costs.
un-pruned*: no (0.56); yes; yes (16.74). As above.

Table 16 Summary of whether MA_CSDT has met its aims for each dataset


The first aim has been met for the majority of datasets. For all other datasets where a sacrifice of accuracy rate has been required in order to return a lower cost than J48, this sacrifice has been kept to a minimum. The lowest cost over all algorithms has been returned for all datasets except annealing (pruned), krk (pruned), soybean and mushroom; where these costs are lower there is a large amount of sacrifice of accuracy rate. For these latter two datasets, all algorithms have returned low costs and the differences between all these costs are very small and all strategies produce very similar values.

The costs for each dataset returned by each algorithm, both pruned and un-pruned versions, have been averaged over all the cost matrices and, for the multi-class datasets, also over the 3 groups of misclassification costs: mixed, low and high. Graphs have been produced which show the average cost returned by each algorithm and by each MA_CSDT strategy, annotated with the corresponding average accuracy rate obtained. The graphs have been examined and divided into three categories: those where the MA_CSDT algorithm has achieved its aim of returning a high accuracy rate, or minimal sacrifice to the accuracy rate, in a more cost-effective way than J48 and the other cost-sensitive algorithms; those where it has not achieved its aim; and those where the aim could be achieved if an increase in the sacrifice to the accuracy rate is allowed. A representative dataset has been chosen to illustrate the findings of the evaluation, with the rest of the graphs presented in Appendix Sections D5 and D6 for pruned and un-pruned versions respectively, and Appendix Sections D7 and D8 containing tables showing the results obtained for each cost matrix by each strategy for costs and accuracy for pruned and un-pruned versions respectively.

The following sections describe the results in each of the three scenarios. Section 6.1.1 presents the results where the aim of the algorithm has been met using a consistent strategy throughout the cost matrices for a dataset; Section 6.1.2 presents the results from those datasets where the aim of the algorithm has not been met regardless of how flexible the algorithm can be; and Section 6.1.3 presents the results where the aim can be met by choosing the most appropriate strategy for each cost matrix for a dataset.

6.1.1 Results from the evaluation where the aim has been achieved by MA_CSDT

The algorithm MA_CSDT has achieved its aim on datasets heart, flare, hepatitis, diabetes, tic-tac-toe (pruned) and breast (un-pruned). Figure 21 presents the results for the heart dataset, showing the cost averaged over all cost matrices.

The algorithm J48 returns an accuracy rate of 75.75% and obtains this accuracy rate at a cost of 0.296. All other algorithms return a lower cost than this. The highest accuracy rate returned is 78.42%, and the algorithm MA_CSDT using ACCURACY-BASED(ALL) produces this value. The accuracy rate has been increased by 2.67% above that returned by J48 at a reduction in cost of 0.066. Using ACCURACY-BASED, although the increase in accuracy is lower at 1.21%, the reduction in cost is higher at 0.087. As shown in Table 14, AdaCostM1 and MetaCost produce the fewest trees for this dataset and, as a consequence, the accuracy rates returned by these algorithms are much lower. ACT grows 93% of the trees; whilst this is enough to increase its accuracy rate above AdaCostM1 and MetaCost, the cost returned is increased owing to the increased misclassification costs.

The algorithm MA_CSDT using ACCURACY-BASED returns a higher accuracy rate than EG2 and a lower cost. For the un-pruned versions of the algorithms, J48 returns an accuracy rate of 76.97% at a cost of 0.343.


(Chart: average cost, annotated with the corresponding accuracy rate, for the heart dataset using the pruned algorithms J48p, EG2p, MetaCostp, AdaCostM1p, ICETp and ACTp together with the MA_CSDT strategies COST-BASED(ALL), ACCURACY-BASED(ALL), COST-BASED and ACCURACY-BASED; and for the flare dataset using the un-pruned algorithms and strategies, including MIXMATCH.)

Figure 21 Heart dataset processed using pruned versions of the cost-sensitive algorithms and flare dataset using un-pruned versions of the cost-sensitive algorithms


In the case of the flare dataset, each of the other algorithms fails to grow trees for some of the cost matrices. In particular EG2 does not grow any trees at all, ICET does not grow trees 84% of the time, and even J48 does not grow any trees 70% of the time. After careful examination of the output files produced, it has been concluded that the lack of trees is a result of the class distribution of the dataset, which causes the majority of trees to be pruned back to nothing. The majority class in the whole dataset has 88.9% of the examples in it. Pruning techniques would most likely determine that sub-trees were not able to improve on the results of the original dataset and so would be converted into leaves, resulting in a large percentage of no models being produced. With this in mind, the results of the flare dataset using un-pruned versions of the algorithms have been presented in Figure 21.

The MA_CSDT algorithm using all strategies can be forced to induce trees using its parameter settings, by setting the pre-pruning option to ‘d’ and by allowing no proportional stopping. The ACCURACY-BASED(ALL) and ACCURACY-BASED strategies return an accuracy rate increase of 0.07% and 0.01% respectively and manage to return a lower cost than J48.

The accuracy rate for J48 is 86.88%, for a cost of 0.209. Of the cost-sensitive algorithms, AdaCostM1 and ACT do not return as low a cost as J48, as the accuracy rates obtained by these two algorithms are very much reduced, an indication of why no model was produced using pruned versions of these algorithms, as the leaf would have had fewer errors than the sub-tree. Of the five existing cost-sensitive algorithms, EG2 returns the lowest cost and increases accuracy by 1.78%. This is a good result; however, MA_CSDT using ACCURACY-BASED returns a similar increase but for 0.067 less cost. The ACCURACY-BASED(ALL) accuracy rate is an increase of 2.63% over J48 and 0.85% over EG2. As illustrated in Figure 21, each strategy of MA_CSDT returns a lower cost than the other six algorithms.

6.1.2 Results from the evaluation where the aim has not been achieved by MA_CSDT

The algorithm MA_CSDT has not achieved its aim on datasets car, nursery and krk. Figure 22 presents the results for the krk dataset, which is representative of this group of datasets.

No algorithm returns a higher accuracy rate than J48, although J48 accomplishes this at a greater cost than each of the other algorithms, including all strategies of MA_CSDT.

MetaCost and AdaCostM1 get closer to the accuracy rate of J48 than any other algorithm and return a lower cost than J48. MA_CSDT does not get anywhere near this accuracy rate; the highest accuracy rate is produced by ACCURACY-BASED, which does return a low cost, but this accuracy rate is 23.53% less than that of J48.

Using un-pruned versions, the results for EG2, ACT and all strategies of MA_CSDT show no improvement and in most cases are worse with regard to the cost value. J48 has a higher accuracy and slightly lower cost, and MetaCost and AdaCostM1 remain the better algorithms for returning the highest accuracy rates at a slight reduction in cost. The ACT algorithm returns the highest cost and the lowest accuracy of all the algorithms. The MA_CSDT using ACCURACY-BASED in the un-pruned version has a higher cost than in the pruned version and the accuracy rate is reduced.


(Chart: average cost, annotated with the corresponding accuracy rate, for the krk dataset, comparing J48, EG2, MetaCost, AdaCostM1, ICET, ACT and the MA_CSDT strategies including MIXMATCH; top panel using pruned versions, bottom panel using un-pruned versions.)

Figure 22 The krk dataset: top processed using pruned version; bottom processed using un-pruned version of cost-sensitive algorithms


6.1.3 Results from the evaluation where the aim can be achieved by MA_CSDT

Although the algorithm MA_CSDT has not quite achieved its aim on some datasets, it has almost achieved it, with perhaps one cost-sensitive algorithm producing slightly better results or perhaps not quite reaching a high accuracy. Figure 23 presents the results for the iris dataset, with the cost obtained from each algorithm and each MA_CSDT strategy. This dataset is representative of the datasets where using the four strategies mentioned, or a mixture of them, does not always improve on the results: annealing, breast, glass, tic-tac-toe and wine.

(Chart: average cost, annotated with the corresponding accuracy rate, for the iris dataset using the un-pruned versions of J48, EG2, MetaCost, AdaCostM1 and ACT together with the MA_CSDT strategies, including MIXMATCH*.)

Figure 23 Iris dataset processed using the un-pruned version of the cost-sensitive algorithms

An examination of the cost and accuracy rates obtained from the described strategies shows that the algorithm MA_CSDT using COST-BASED produces the lowest cost of all the algorithms and strategies; however, this results in a huge sacrifice of the accuracy rate, which is reduced by 46%. The ACCURACY-BASED(ALL) and ACCURACY-BASED strategies return the highest accuracy rates and improve over J48 by 1.3%, but the costs returned by these two strategies are far too high and are an increase on the cost returned by J48 and the cost-sensitive algorithms. The cost-sensitive algorithms return a cost that is in each case higher than J48.

Owing to the fact that the MA_CSDT algorithm has multiple parameter settings which can be used in different combinations, it is possible to examine results obtained for this dataset and each cost matrix for all possible parameter settings in order to find a result which improves upon the results obtained by the five cost-sensitive algorithms.

The MA_CSDT MIXMATCH strategy, as described, involves looking at the four other strategies and choosing from them the best for each individual cost matrix.

For the iris dataset, there is no variety in the four strategies; both the cost-based values and the accuracy-based values are the same. However, there are other experimental results which could potentially be better than those examined, which were the highest accuracy or lowest cost.

As mentioned, the accuracy rates returned by the two accuracy-based strategies are 94.97% and 94.95%. This is higher than the J48 accuracy rate of 93.66%. Therefore all experimental results for the iris dataset have been examined for an accuracy rate which is not the highest but is close to the highest, whilst returning a cost which is lower than the other cost-sensitive algorithms.

Table 17 presents the cost and accuracy returned for individual cost matrices for the iris dataset for each algorithm, and for each strategy of MA_CSDT. The entry MIXMATCH* shows values returned when all experiments for the iris dataset are examined, looking for the highest accuracy obtained when the cost is lower than that obtained by the other algorithms.

For each cost matrix, it is possible to find a combination of parameter settings which can obtain a good accuracy rate but does so in a more cost-effective way.

Cost (cost matrices 1–9):
J48: 0.342 ± 0.016, 0.329 ± 0.016, 0.336 ± 0.017, 0.421 ± 0.022, 0.42 ± 0.021, 0.421 ± 0.022, 0.277 ± 0.013, 0.27 ± 0.013, 0.273 ± 0.014
EG2: 0.342 ± 0.008, 0.337 ± 0.008, 0.342 ± 0.009, 0.419 ± 0.011, 0.421 ± 0.011, 0.419 ± 0.011, 0.293 ± 0.008, 0.291 ± 0.008, 0.292 ± 0.008
MetaCost: 0.358 ± 0.005, 0.353 ± 0.007, 0.328 ± 0.01, 0.448 ± 0.023, 0.472 ± 0.018, 0.397 ± 0.023, 0.279 ± 0.013, 0.263 ± 0.009, 0.246 ± 0.01
AdaCostM1: 0.37 ± 0.001, 0.36 ± 0, 0.29 ± 0.001, 0.47 ± 0.017, 0.461 ± 0, 0.373 ± 0.001, 0.325 ± 0.01, 0.274 ± 0.011, 0.257 ± 0.015
ACT: 0.42 ± 0.01, 0.413 ± 0.011, 0.42 ± 0.01, 0.526 ± 0.013, 0.524 ± 0.013, 0.526 ± 0.013, 0.36 ± 0.01, 0.351 ± 0.009, 0.35 ± 0.01
COST-BASED(ALL): 0.014 ± 0, 0.014 ± 0.001, 0.013 ± 0.001, 0.01 ± 0, 0.01 ± 0, 0.009 ± 0, 0.178 ± 0.012, 0.149 ± 0.009, 0.175 ± 0.012
ACCURACY-BASED(ALL): 0.5 ± 0.014, 0.428 ± 0.011, 0.45 ± 0.017, 0.504 ± 0.014, 0.5 ± 0.017, 0.614 ± 0.027, 0.305 ± 0.008, 0.38 ± 0.009, 0.245 ± 0.01
COST-BASED: 0.014 ± 0, 0.014 ± 0.001, 0.013 ± 0.001, 0.01 ± 0, 0.01 ± 0, 0.009 ± 0, 0.179 ± 0.012, 0.149 ± 0.009, 0.176 ± 0.012
ACCURACY-BASED: 0.5 ± 0.014, 0.428 ± 0.011, 0.45 ± 0.017, 0.504 ± 0.014, 0.5 ± 0.017, 0.614 ± 0.027, 0.305 ± 0.008, 0.38 ± 0.009, 0.245 ± 0.01
MIXMATCH*: 0.3 ± 0.009, 0.28 ± 0.008, 0.289 ± 0.01, 0.365 ± 0.007, 0.369 ± 0.011, 0.357 ± 0.011, 0.243 ± 0.005, 0.236 ± 0.004, 0.239 ± 0.008

Accuracy (cost matrices 1–9):
J48: 93.66 ± 0.767 for all nine cost matrices
EG2: 90.02 ± 1.456 for all nine cost matrices
MetaCost: 94.42 ± 0.931, 94.42 ± 0.931, 78.53 ± 4.015, 93.11 ± 0.785, 94.61 ± 0.948, 93.65 ± 0.792, 93.42 ± 0.751, 93.66 ± 0.767, 93.9 ± 0.845
AdaCostM1: 90.78 ± 1.217, 94.67 ± 0.846, 78.53 ± 4.015, 93.95 ± 0.928, 94.67 ± 0.846, 90.7 ± 1.468, 94.14 ± 0.885, 93.66 ± 0.767, 94.11 ± 0.749
ACT: 87.12 ± 1.664, 87.59 ± 1.47, 89.41 ± 1.228, 90.52 ± 1.267, 91 ± 1.256, 89.41 ± 1.228, 92.95 ± 0.763, 92.95 ± 0.763, 94.11 ± 0.75
COST-BASED(ALL): 33.81 ± 1.45, 31.38 ± 1.452, 34.81 ± 1.306, 33.81 ± 1.45, 31.38 ± 1.452, 34.81 ± 1.306, 73.03 ± 2.228, 71.86 ± 2.038, 75.62 ± 2.046
ACCURACY-BASED(ALL): 94.62 ± 0.9, 95.33 ± 0.66, 94.75 ± 0.771, 95.12 ± 0.791, 94.7 ± 1.076, 94.7 ± 0.89, 95.26 ± 0.904, 95.04 ± 0.79, 95.22 ± 0.537
COST-BASED: 33.81 ± 1.45, 31.38 ± 1.452, 34.81 ± 1.306, 33.81 ± 1.45, 31.38 ± 1.452, 34.81 ± 1.306, 72.39 ± 2.208, 71.86 ± 2.038, 75.86 ± 1.893
ACCURACY-BASED: 94.62 ± 0.9, 95.33 ± 0.66, 94.75 ± 0.771, 95.12 ± 0.791, 94.7 ± 1.076, 94.7 ± 0.89, 95.26 ± 0.904, 95.04 ± 0.79, 95.22 ± 0.537
MIXMATCH*: 92.44 ± 1.126, 92.73 ± 1.023, 93.61 ± 1.054, 93.34 ± 1.07, 93.18 ± 1.045, 92.26 ± 1.4, 94.75 ± 0.771, 94.75 ± 0.771, 94.99 ± 0.654

Table 17 Iris dataset cost and accuracy returned for each cost matrix for un-pruned versions of cost-sensitive algorithms
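To make the selection rules behind these strategies concrete, the sketch below is illustrative only: the names results and competitor_costs are hypothetical and the code is not the thesis's implementation. It shows, in Python, how a cost-based choice, an accuracy-based choice and a MIXMATCH*-style choice could be made for one cost matrix, using the first cost matrix of Table 17 as example data.

```python
# Illustrative sketch only: 'results' holds the (cost, accuracy) outcomes of
# different MA_CSDT parameter-setting combinations for one cost matrix, and
# 'competitor_costs' the costs returned by the other algorithms.

def cost_based(results):
    """Choose the lowest cost and accept the corresponding accuracy."""
    return min(results, key=lambda r: r["cost"])

def accuracy_based(results):
    """Choose the highest accuracy and accept the corresponding cost."""
    return max(results, key=lambda r: r["accuracy"])

def mixmatch_star(results, competitor_costs):
    """Highest accuracy among results that undercut every other algorithm's cost."""
    cheaper = [r for r in results if r["cost"] < min(competitor_costs)]
    return max(cheaper, key=lambda r: r["accuracy"]) if cheaper else None

# Example data taken from cost matrix 1 of Table 17 (iris dataset).
results = [
    {"cost": 0.014, "accuracy": 33.81},   # the COST-BASED outcome
    {"cost": 0.500, "accuracy": 94.62},   # the ACCURACY-BASED outcome
    {"cost": 0.300, "accuracy": 92.44},   # a compromise combination (MIXMATCH*)
]
competitor_costs = [0.342, 0.342, 0.358, 0.370, 0.420]  # J48, EG2, MetaCost, AdaCostM1, ACT

print(mixmatch_star(results, competitor_costs))  # -> {'cost': 0.3, 'accuracy': 92.44}
```

The point of the sketch is simply that the three strategies differ only in the rule used to pick one experimental outcome per cost matrix; the search over parameter settings that generates the outcomes is unchanged.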

The MIXMATCH* bar in Figure 23 shows the improvement possible when examining parameter settings other than those producing the highest accuracy overall. By accepting a reduction in the accuracy rate of 0.01%, it is possible to reduce the cost by 0.045. The EG2 algorithm returned a higher cost than J48 for a much lower accuracy, so the improvement obtained by MA_CSDT using the MIXMATCH* strategy is a cost 0.053 lower for an increase in accuracy of 3.54%.


6.2 Discussion of the outcome of the empirical evaluation

The MA_CSDT algorithm can return the lowest cost for a cost matrix 68.2% of the time and return the highest accuracy for a cost matrix 34.98% of the time. Each time the highest accuracy is achieved, its corresponding cost is lower than that of J48. The main aim, to achieve the same or a higher accuracy rate than the accuracy-based algorithm J48 in a more cost-effective way, has been met for the datasets annealing, flare, glass, iris, heart and mushroom.

For the datasets diabetes, hepatitis, tic-tac-toe and wine a sacrifice of less than 1% of the accuracy rate returned by J48 results in a lower cost. The un-pruned version processing the breast dataset also returns a lower cost for the same sacrifice but for the pruned version the sacrifice was greater at 3.74%. For the remaining datasets, car, krk, nursery and soybean, this aim has not been met.

By looking at the two strategies, cost-based and accuracy-based, it is apparent that a trade-off is required between cost-based decisions (the cost-based strategy choosing the parameter setting combination which produces the lowest cost for a cost matrix and accepting the corresponding accuracy rate) and accuracy-based decisions (the accuracy-based strategy choosing the parameter setting combination which produces the highest accuracy rate for a cost matrix and accepting the corresponding cost). As discovered in Chapter 5, the same parameter settings seldom produce both the lowest cost and the highest accuracy, so it is evident that the whole process of classification is a trade-off: attribute selection, and the decision of which model to choose in order to produce the result required by the user and the domain.

The heart dataset is representative of six datasets which produce good results and achieve the aim of the algorithm by obtaining a high accuracy rate in a more cost-effective way. For each of the datasets where the aim of the MA_CSDT algorithm has been met, these good results are produced by using the accuracy-based strategy. The cost-based strategy has the lowest cost returned from the induced trees; however, the corresponding accuracy rates are always lower than those of the J48 algorithm. In the datasets where the aim has been met, choosing the same strategy for each of the cost matrices has been sufficient to meet the aim of MA_CSDT, and thus a compromise between accuracy-based decisions and cost-based decisions has been found.

The krk dataset is representative of poor results, where the MA_CSDT algorithm has not been able to achieve its aim. The accuracy-based algorithm J48 achieved the highest accuracy overall, with only two of the cost-sensitive algorithms, MetaCost and AdaCostM1, achieving a similar accuracy rate. They achieved almost the same rate with a small reduction in costs. EG2, ICET and ACT also failed to achieve a comparable accuracy rate but do manage to at least reduce costs.

On examination of the trees induced, the most likely cause of this failure to meet its aim is that the MA_CSDT algorithm either grows trees that are too small in comparison with the size of the dataset, which has a large number of examples in the training set, or grows a tree which is far too large, with over 20,000 leaves. The paths in these large trees comprise six attributes in order to reach the leaves, whereas the smaller trees have paths to the leaves which consist of only two or three attributes. The conclusion reached after studying the files produced is that the large number of examples contributes to the problems of processing this dataset. In each training file there are approximately 20,000 examples. Pruning or stopping the tree build early results in many examples at the leaf nodes, which results in low accuracy. Allowing the tree to grow fully results in overgrown trees with few examples at each node, which results in higher test costs but still does not improve accuracy. The ACT algorithm also produces trees which follow this pattern of either too small or too large and is also not very successful on this dataset.

The three algorithms which produce the better results for this dataset (J48, MetaCost, AdaCostM1) all produce the same tree which, when pruned, has 3948 leaves and, un-pruned, 8268 leaves. They all choose the same root, ‘wkr’, which happens to be the attribute with the highest cost. However, this is statistically the better attribute and, in combination with less costly attributes further down the tree, results in a medium sized tree which produces an accuracy rate of around 50%. EG2 produces similar trees but chooses one of the less costly attributes for its root, and uses the more costly one later in the tree build. This produces a similar sized tree, but the less costly attribute is not as good a root attribute as the more costly one in this dataset. ACT chooses the same root as the better algorithms; however, in its pruned version it reduces the tree produced too much, and in the un-pruned version it grows a tree with over 25,000 leaves, increasing the test costs needed as a result of later choosing many less costly attributes.

This dataset demonstrates that higher accuracy does not always mean lower costs if, to achieve it, the tree grows uncontrollably. However, keeping test costs low does not always work either, as money saved on tests may end up being spent on misclassification costs.

On examination of the results of the soybean dataset, all of the algorithms except ACT produce very similar results. All the accuracy rates are slightly lower than J48's, the costs returned by all strategies are around the same value, and all algorithms return higher costs than J48. This dataset is one of the few where the cost-based strategy and the accuracy-based strategy produce very similar results. It is likely that this is a result of low test costs, with many attributes having the same test cost. It is one of the datasets where there are parameter settings which produce both the lowest cost and the highest accuracy, as investigated in Chapter 5.

The tic-tac-toe dataset is a difficult dataset for a cost-based algorithm to process in that, as the test costs for every attribute are the same, the misclassification costs become the only source of information for MA_CSDT to use in order to determine the attributes to use in the tree build. When processing particular cost matrices where the misclassification costs are low, or are the same as the test costs, there is little information that can be used in order to separate the attributes into useful ones.

Examining the results of the mushroom dataset demonstrates a successful trade-off between the test costs and misclassification costs. Each algorithm, for either all cost matrices or some cost matrices, can achieve 100% accuracy. The costs obtained are the test costs used in the tree induction. Any variation between algorithms regarding the cost they return is solely the result of the attributes which have been chosen during tree induction, and therefore, when different trees are induced, this can be inferred by looking at the cost obtained. The objective of the algorithm is therefore to discover the best attributes which, in combination, can achieve the 100% accuracy whilst incurring the smallest amount of test costs. Of all the algorithms in the evaluation, EG2 returns the lowest overall cost with a corresponding accuracy of 100%. MA_CSDT returns a slightly higher cost, around 0.002 to 0.003 higher than EG2 (which is helped by its formula in this situation but not for other datasets with similar ranges of test costs), but achieves one aim of returning a lower cost than the accuracy-based algorithm and so has successfully found a trade-off between the two viewpoints.


As demonstrated in Chapter 5, the parameter settings chosen by the cost-based strategy and the accuracy-based strategy are, more often than not, not the same for the same cost matrix. The user is therefore given a choice between the two viewpoints, which will be determined by the nature of the domain. For example, some domains may prefer to err on the side of caution: they prefer to reduce costs and are not concerned with a reduction in the accuracy rate. An example would be checking for fraudulent bank transactions. If a transaction is checked and turns out not to be fraudulent, the cost may simply be that of a telephone call; if the transaction is actually fraudulent, then the cost saved could potentially be quite high. The bank would therefore not be overly concerned by false positive errors, as they do not incur high misclassification costs.
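Purely as an illustration of this point (the monetary figures are hypothetical and are not taken from the thesis or its experiments), such a domain might work with a misclassification cost matrix of the form

\[
C \;=\;
\begin{pmatrix}
0 & 1 \\
500 & 0
\end{pmatrix}
\]

where the rows are the actual class (legitimate, fraudulent) and the columns the predicted class: a false positive (checking a legitimate transaction) costs only the price of a telephone call, whereas a false negative (a missed fraud) costs far more, so a classifier biased towards false positives is acceptable in this domain.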

Other domains are the opposite: they do not want high cost errors, but they do not want the low cost errors either. Medical domains are examples of this. They do not want to subject the patient to unwanted treatment, so they are more likely to want a classifier which is accurate; if that accuracy can be obtained at a lower cost, this is obviously much more desirable.

With some datasets, choosing a straightforward accuracy-based strategy for each cost matrix is enough to obtain a higher accuracy rate for a lower cost. For some datasets, choosing between different strategies for each cost matrix improves on the results overall. For others this is not always possible, as the four strategies produce only two different results. Quite often the accuracy rate using the cost-based strategy is too low and there is too much sacrifice. However, it may be possible to examine the results and, rather than choosing the highest accuracy, obtain a lower cost. In some cases other parameter settings may produce high accuracy rates which, although not the highest, are not lower than J48's, so the aim of the algorithm is still met, thus taking advantage of the flexibility of the MA_CSDT algorithm. In other cases this may result in a sacrifice of the accuracy rate, but in acceptable amounts. For example, when AdaCostM1 or MetaCost process the heart dataset, the accuracy rate they produce is over 30% less than J48's; this sacrifice is far too much. A reduction of around 3% for a significant reduction in costs is acceptable.

In the iris dataset, the cost-based strategy returns a low cost but the corresponding accuracy rate is far too low; the accuracy-based strategy returns a higher accuracy than J48 but for a higher cost. The cost-sensitive algorithms return a lower cost with a more comparable accuracy rate. The likely reason for this scenario is that, where pruned versions are used, a sub-tree was removed which reduced the errors on the training set but resulted in an accuracy rate on the testing set which could potentially have been higher had the tree not been pruned, although it is still comparable. As the tree was smaller, there was a reduction in the test costs used, which ultimately resulted in a lower cost. In the un-pruned version J48 can take advantage of its statistical measure. As there are only four attributes in the dataset, two of which have much lower test costs than the other two, it is likely that growing the tree is the ideal way to increase accuracy, but it is harder to control the tree build with regard to cost-effectiveness owing to the small number of attributes in the dataset.

However, the purpose of MA_CSDT is to try to find a compromise between such situations as these, so it makes sense to examine other parameter settings so that the accuracy rate can be increased in a more cost-effective way. The MIXMATCH* strategy is designed to replicate the user’s decisions, taking into account the nature of the domain, so that it can be demonstrated that MA_CSDT’s flexible nature can be exploited. In the case of the iris dataset, the MIXMATCH strategy has been extended to include all results obtained by all combinations of parameter settings, in order that MA_CSDT’s best and comparable performance can be utilized in the comparison.

As demonstrated in the representative dataset krk, the overall accuracy rate is nowhere near as high as that of the J48 algorithm, so no amount of strategy mixing is going to help. It could be that simply going for a lower cost is the only alternative. This may be acceptable in some domains, as mentioned, so if the algorithm returns the lowest cost of all the algorithms then this may be an alternative for the user. In this dataset, it might be the case that low costs are hiding bad splits, and the only way to help is to try to gain additional information by some other means. If there is a lot of activity regarding cost, i.e. high test costs, high misclassification costs or a mixture, the cost information is enough to construct the tree. If the reverse is the case, as in the tic-tac-toe dataset, or in krk where there are 50% very low cost attributes and 50% very high cost attributes, maybe a statistical measure is required to add that little bit of extra knowledge. What may be useful is the reverse of the algorithms described in Section 3.1.1.1, where the statistical measure has been extended to include cost: the cost calculation used in MA_CSDT could be extended to include a statistical measure when confronted by these combinations of test costs and misclassification costs in a dataset. When the test cost information is minimal, as in tic-tac-toe, this might also be a good addition because there is only the misclassification cost to use. It may take a while to reach a good misclassification cost reduction, so tree induction results in a large tree. Even though the cost of the tests may not be too much, it all adds up, and an overlarge tree tends not to be helpful in most cases, hence the introduction of the pruning techniques discussed by Frank and Witten (1998). Balancing the reduction in misclassification costs against a small outlay of test costs does not usually work all that well, but testing whether the attribute is good for splitting using another measure would in this case probably be best.


As mentioned with the soybean dataset, because many of the combinations of parameter settings result in both low costs and high accuracy, there is also not much room for flexibility by mixing the strategies. In order to increase accuracy to that of the accuracy-based algorithm J48, some additional information, provided by a statistical measure, might also be required. The mushroom dataset also has a range of low test costs. Although MA_CSDT can produce the second lowest costs, and the difference between the two algorithms’ low costs is very small, there are several attributes with the same test cost, some of which are more useful than others. In order to find which one is the better split when the cost is identical, the information provided by a statistical measure may be useful, and the algorithm which does produce the lower costs for the mushroom dataset, EG2, does use a statistical measure along with the test costs.

The EG2 equation balances the amount of information gained against the test cost spent to achieve that information. If the cost is worth it in terms of the information gained, this is reflected in the value returned, and the highest value is chosen, indicating a good information-for-cost ratio. Although EG2 does nothing if no attribute is worth its cost, simply choosing the best of what is on offer, something like this could be used together with the idea described in Section 4.3, where the ‘worth’ of the choice of split is determined. It could be included with the test costs and misclassification costs, in the datasets where it is needed, to determine whether a low cost is hiding a bad split or is simply a low cost attribute that will produce a good split.
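Equation (3.1), referred to above, is not reproduced in this chapter; in the standard formulation of EG2 (Núñez 1991) the criterion being described is the Information Cost Function which, for an attribute $i$ with information gain $\Delta I_i$ and test cost $C_i$, is

\[
\mathrm{ICF}_i \;=\; \frac{2^{\Delta I_i} - 1}{\left(C_i + 1\right)^{\omega}}, \qquad 0 \le \omega \le 1,
\]

so the attribute with the largest ICF is chosen: a high information gain is only favoured when it is judged worth its test cost, with $\omega$ controlling how strongly the cost influences the choice.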

Although not successful in every dataset, there is sufficient evidence to suggest that it is possible to find a compromise between accuracy-based decisions and cost-based decisions in order to both maintain accuracy and return lower costs or to minimize the sacrifice of the accuracy rate whilst still returning lower costs.


6.3 Summary of the findings of the evaluation

In order to examine the Multi-Armed Cost-Sensitive Decision Tree algorithm (MA_CSDT) and compare its performance with well-known existing cost-sensitive algorithms, 15 datasets along with test costs and misclassification costs have been used in a comparison, along with J48. The aim of the algorithm is to minimize costs whilst maximizing the accuracy rate. The J48 algorithm provides the target accuracy rates. The existing cost-sensitive algorithms are EG2, MetaCost, AdaCostM1, ICET and ACT.

The main aims are that the MA_CSDT algorithm returns the same accuracy rate as, or a higher one than, J48, that this is obtained at a lower cost, and that it produces the lowest cost of all the algorithms. For the majority of datasets these aims have been met.

For some datasets, using a single strategy for all cost matrices obtains good results; other datasets obtain good results when using a mixture of strategies for the cost matrices; and there are three datasets, car, krk and nursery, where no strategy has been able to produce accuracy rates comparable with the accuracy-based algorithm J48.

If there is a lot of activity regarding cost, i.e. high test costs, high misclassification costs or a mixture of high and low costs, the cost information is enough to construct the tree. If the reverse is the case it is difficult to construct the tree, as there would only be one cost element contributing to the information needed. If, as in the case of the tic-tac-toe dataset, for one of the cost matrices there is minimal cost information, as the test costs are all minimal and so is the misclassification cost, maybe a statistical measure is required in order to supplement the information provided by the costs.


The algorithm can use its parameter settings and differing strategies to overcome the problems caused by difficult datasets. There is sufficient evidence to suggest that it is possible to find a balance between the cost-based decisions and accuracy-based decisions in order to meet the aim of minimizing cost and maximizing the accuracy or to minimize sacrifice of the accuracy rate.


CHAPTER 7: CONCLUSIONS AND FUTURE WORK

Cost-sensitive decision tree learning is an important research area as it considers the costs involved when inducing decision trees. In the real world costs are involved when obtaining data and when classification errors occur. Recent comparisons have evaluated algorithms which have incorporated these costs by various methods such as extensions to statistical measures, genetic algorithms or boosting and bagging techniques. Examining the results of the better performing algorithm revealed some weaknesses.

This thesis has suggested that cost-sensitive decision tree learning involves a trade-off between decisions based on accuracy and decisions based on costs and that Game Theory can be utilized to develop a framework which can find a compromise between these two points of view. The aim of the thesis is to demonstrate the trade-off between the accuracy-based decisions and the cost-based decisions and use the trade-off effectively to achieve the aim of cost-sensitive learning which is to minimize costs whilst maximizing accuracy. The nature of the domain dictates the importance of this aim. Whilst some domains may err on the side of caution and prefer to sacrifice the accuracy rate rather than incur high misclassification costs, there are many domains where this is not acceptable. Medical domains would prefer to have no low cost errors either, not wanting to subject the patient to unwanted treatment. In these domains, if a classifier can be found which will minimize costs, but at the same time be as accurate as an accuracy-based classifier, this is more desirable.

A number of methodologies were examined and the category of Nomothetic methods was chosen as the more appropriate for this research. A method named GQM (Goal, Question, Metric) (Basili and Weiss 1984) has been used for this research. In this thesis, the goal was to develop a framework which uses the trade-off required between accuracy and costs in order to achieve the low costs and high accuracy required in cost-sensitive learning. This goal was defined in order that the hypothesis might be proven. The questions raised by this were (i) how well do existing cost-sensitive decision tree algorithms perform? (ii) what are the weaknesses of existing cost-sensitive decision tree algorithms? (iii) is it possible to minimize costs and minimize the sacrifice of the accuracy rate which occurs in cost-sensitive decision tree learning? and (iv) will using a technique which has been developed to deal with trade-offs by using pay-offs help in achieving the aim of cost-sensitive decision tree learning?

These questions helped to devise the objectives needed to focus the research, with a literature review carried out to find answers. Metrics were also devised: cost, to determine whether costs are minimized, and accuracy, to determine whether it is maximized or whether its sacrifice is minimized.

It was felt that, as the GQM methodology is primarily aimed at software development, and as a main part of this thesis is to develop a new framework for cost-sensitive decision tree learning, it would be the better methodology to follow. It has been concluded that this methodology was helpful in providing a good interface between theory and practice and that, of all the methodologies investigated, it was the most suited to the task. It is helpful to define a goal which can be related to the research hypothesis, using questions which give rise to good objectives and allowing a practical approach to the research.

The following objectives were devised in order to achieve the answers to the questions devised using the GQM methodology and to meet the aims of the thesis using the metrics given:


1. To survey and review existing cost-sensitive decision tree algorithms in order to investigate ways in which costs have been introduced into the decision tree learning process and at which stages they have been introduced

2. To evaluate existing cost-sensitive decision tree algorithms in order to discover whether these algorithms are successful over many types of problems or are only effective for some types of problems, for example binary class datasets or balanced datasets

3. To develop a new cost-sensitive decision tree algorithm which is based on Game Theory

4. To investigate and evaluate the new algorithm against existing algorithms and measure performance in terms of cost to classify and accuracy, in order to test the research hypothesis

Some recent comparisons show that although many cost-sensitive algorithms can process balanced, two class datasets well, many of them produce lower accuracy rates in order to achieve lower costs to classify when the dataset is less balanced or has multiple classes.

One comparison has been examined further, and the results of the algorithm which proved the better one have been analyzed to determine the weaknesses of cost-sensitive decision tree algorithms. The main weaknesses discovered are (i) problems arise from an imbalance in the class distribution, (ii) multi-class datasets cause problems, (iii) extreme misclassification costs are difficult to handle and (iv) trade-offs involving high misclassification costs usually result in the accuracy rate being sacrificed: the higher the misclassification costs and the more unbalanced the class distribution, the lower the accuracy rate tends to be.


As part of an extensive literature review, a survey of cost-sensitive decision tree algorithms has been carried out revealing over 50 different algorithms. A timeline of these algorithms is presented. The algorithms have been organized by the way costs have been introduced into the process and a taxonomy with seven classes has been developed. These are (i) use of costs in the construction, (ii) post construction, (iii) GA methods, (iv) boosting, (v) bagging, (vi) multiple structures and (vii) stochastic methods. The algorithms have been described, the differences highlighted and a summary of observations made by the authors discussed.

Additionally, the datasets used in all the different studies have been collated with the idea of guiding researchers in determining which of the existing algorithms may suit their purpose best.

In order to test the hypothesis and see if the aim can be achieved, a framework has been developed which builds on a specific Game Theory problem: the multi-armed bandit. Solving this problem involves balancing exploration and exploitation. Concepts from the multi-armed bandit game have been utilized in order to develop a new algorithm, viewing the pay-offs as a reduction in costs so that a compromise between decisions based on accuracy and decisions based on cost can be found. The multi-armed bandit game has been adapted to select the attributes in the decision tree induction. The result is an algorithm with many parameters which can alter its behaviour, in the hope that the correct behaviour can be found to suit each dataset and the costs belonging to that dataset.
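As a highly simplified sketch of this idea only (the actual MA_CSDT algorithm, as given in Figure 13, uses look-ahead bandits, path-cost evaluation and several further parameters not shown here, and the helper estimate_cost_after_split is hypothetical), attribute selection framed as a multi-armed bandit whose pay-off is a reduction in cost might look as follows.

```python
import random

def choose_attribute(attributes, examples, lever_pulls,
                     estimate_cost_after_split, baseline_cost):
    """Treat each attribute as a lever; the pay-off of a pull is the estimated
    reduction in cost relative to not splitting at all. This is a simplification
    of the MA_CSDT idea, not the thesis's actual procedure."""
    payoff = {a: 0.0 for a in attributes}
    pulls = {a: 0 for a in attributes}
    for _ in range(lever_pulls):
        a = random.choice(attributes)                   # explore: pull a lever
        cost = estimate_cost_after_split(a, examples)   # sampled cost of splitting on a
        payoff[a] += baseline_cost - cost               # reward = cost reduction
        pulls[a] += 1
    # exploit: the lever with the best average pay-off so far
    scores = {a: payoff[a] / pulls[a] for a in attributes if pulls[a] > 0}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # only split when doing so is worthwhile, mirroring the 'continue only when
    # worthwhile' pre-pruning options investigated in Chapter 5
    return best if scores[best] > 0 else None
```

In this simplified view, increasing lever_pulls corresponds to allowing more exploration, giving a more reliable estimate of each attribute's pay-off before one is exploited.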

Using 15 real-world datasets from the Machine Learning Repository, with varying values of test costs and misclassification costs, the algorithm has been investigated with regard to its performance over the different parameter settings. Experiments were devised using the single-model induction version and the multiple-model induction version of the algorithm, with all possible parameter settings, for five varying values of lever pulls which are unique to each dataset. For each dataset a range of misclassification costs was used which were higher than the test costs, lower than the test costs, or a mixture of high and low values in relation to the test costs. The results of these experiments were compiled into a dataset which could then be analyzed and investigated. Each example is an outcome of a particular experiment, for a dataset, cost matrix and combination of parameter settings, and has been classified into one of nine classes which reflect the cost and accuracy rate achieved by the experiment. The aim of the investigation was to summarize the findings and decide (i) which parameters allow the algorithm to continue only when it is worthwhile to do so, and what effect this has on cost and accuracy, (ii) how many lever pulls, which version of the algorithm and which strategy are better for a dataset given its test costs and misclassification costs, (iii) how choosing the best parameter settings and strategy for each dataset and cost matrix individually can improve on the results, and (iv) whether guidelines for parameter settings can be found.

The main findings from the investigation in Chapter 5 were that:

• Of the four pre-pruning options testing whether it is worthwhile to continue or not, the combination which tests the chosen attribute only produces the majority of the better low cost results

• Using the proportional stopping option along with class proportional stopping does not have as much impact on tree build as testing to see whether it is worthwhile to continue or not

• To produce the best accuracy, allowing the tree to be fully grown without forcing it to be stopped is the best option

• A higher value of lever pulls proved the better option in the majority of situations

• The multiple-model version proved to be the better option over the majority of datasets

• Investigations regarding the depth of look-ahead produced results indicating that the datasets where the depth was increased to 2 did not show any improvement. The same results were obtained when the depth was 1 for two datasets, iris and diabetes, and no improvement was seen for car and glass. This could be dataset related.

When running the algorithm, some combinations of parameters resulted in no model being built. It has been determined that to obtain higher accuracy rates it is necessary to grow a tree. Although this does result in a higher cost, the aim of the algorithm is to maximize accuracy and minimize cost, and in order to do so a tree must be grown. The more exploration that takes place, the more likely it is that a model will be produced which meets this aim. Better results are obtained for the majority of datasets by increasing the value of the parameter which allows more exploration: the number of lever pulls.

Treating each dataset/cost matrix combination as a separate entity allows the flexibility of the algorithm to work to its full effect, by choosing the most appropriate strategy for each individual rather than the same strategy for every cost matrix of a dataset. This is demonstrated in that choosing the parameter settings which produced the lowest cost, and using the corresponding accuracy, produces very good results for some cost matrices but not for others, whilst choosing the parameter settings which produced the highest accuracy, and using the corresponding cost, repeats this pattern. It is seldom that the same parameters produce both the lowest cost and the highest accuracy. Using the best strategy for an individual cost matrix, rather than the same strategy for every cost matrix, results in a flexible algorithm which gives the user more choice than traditional cost-sensitive algorithms.


Guidelines have been produced which attempt to predict the parameter settings for processing future datasets. Two approaches have been used to see whether it is possible to set parameter settings as a guideline: one was to examine all parameter settings producing good results, and the other was to process the results dataset using the accuracy-based algorithm J48. The outcome of these two approaches is that attributes which describe the dataset structure, that is the class distribution or groups dictating a combination of attributes and values, need to be used to determine what parameter settings should be used.

In order to examine the performance of the MA_CSDT algorithm compared to well-known existing algorithms, an evaluation has been carried out using the 15 datasets with the misclassification costs and test costs which are also used in the investigation. The existing cost-sensitive algorithms are EG2, from the class of costs used in the construction; MetaCost, from the bagging class; AdaCost, from the boosting class; ICET, from the GA methods class; and ACT, from the stochastic approach class. Additionally, the accuracy-based algorithm J48 was used in order to provide a ‘target’ accuracy rate. The aim of the algorithm is to achieve the same accuracy rate as the accuracy-based algorithm, but to do this at a lower cost than the other cost-sensitive algorithms. Each experiment and each combination of parameter settings has been examined in order to find the best option for each individual cost matrix of each dataset given two strategies: a cost-based strategy, which chooses the lowest cost obtained for a cost matrix and uses the corresponding accuracy rate, and an accuracy-based strategy, which chooses the highest accuracy rate obtained for a cost matrix and uses the corresponding cost. These have then been compared to the cost and accuracy returned by each of the other algorithms.


The MA_CSDT algorithm can return the lowest cost for a cost matrix 68.2% of the time and return the highest accuracy for a cost matrix 34.98% of the time. Each time the highest accuracy is achieved, its corresponding cost is lower than that of J48. For six of the datasets, the MA_CSDT algorithm was able to achieve its aim of minimizing the costs whilst maximizing the accuracy rates. For an additional four datasets, the aim can be achieved by applying a strategy which is tailored to the individual dataset and its costs, thereby finding a compromise between accuracy-based decisions and cost-based decisions. For example, whilst it is desirable to obtain as high an accuracy rate as possible, this can be allowed to reduce slightly, by one or two percent, in order to achieve a lower cost than the existing cost-sensitive algorithms. By examining each cost matrix individually and by considering the accuracy rate obtained by the accuracy-based algorithm J48, in those datasets where the aim was not achieved by using one strategy for each cost matrix, a combination of both strategies was used, thus achieving the aim. For some of these datasets, a higher accuracy rate than J48's had been achieved, and in these cases, by allowing a small sacrifice of the accuracy rate, a lower corresponding cost was found.

Using different strategies for individual cost matrices meant that the aim of the algorithm was achieved; however, there were two datasets which this technique did not help: those where the information obtained from the test costs was insufficient, particularly those with low test costs, namely mushroom and soybean. In this case it is suggested that adding extra information, which can help decide whether a low test cost is hiding a bad split or is associated with a good attribute, would be a solution to this problem.

For the remaining three datasets, car, krk and nursery, MA_CSDT was unable to achieve its aim, producing only a low accuracy rate and being unable to minimize cost in all cases. It should be noted, however, that on one of these datasets, krk, three of the existing cost-sensitive algorithms were also unable to achieve good results.

In conclusion, this thesis has developed a new algorithm and framework for cost-sensitive decision tree induction based on the principles of the multi-armed bandit problem. The algorithm has helped explore and confirm a research hypothesis: that cost-sensitive learning involves a trade-off between decisions based on accuracy and decisions based on cost. By using a framework which explores strategies based on cost, a compromise between these viewpoints can be reached in the majority of cost-sensitive problems; for those problems where the new algorithm is not as successful, a version containing extra information could be used so that these problems can also be improved upon.

Future work

Although the algorithm can achieve its aim for some problems, there are some limitations at this time. The algorithm relies on the cost information available in order to determine the best attribute upon which to split the data. When this is limited for some reason, for example if the test costs are low or the misclassification costs are low, the algorithm has a problem in that there is not enough information to determine whether an attribute with a low test cost is hiding a bad split or is a good split. Even though the algorithm is designed to look ahead, when all attributes have a low test cost it must rely on the misclassification costs only, and in some cases, particularly when these are also low, the algorithm can find itself unable to backtrack from a bad decision.

In order to overcome this limitation, it is suggested that a more complicated version of the multi-armed bandit algorithm can be used, one that has, as an add-on, a statistical measure in order to combat attributes with lower test costs which may have been chosen simply because of the low test cost.

As noted in the discussion of the evaluation, the algorithm EG2 was able to achieve lower costs for some of the cost matrices. This algorithm does not include misclassification costs in the statistical measure it uses to find the attribute upon which to split the data. However, it does use the test costs in such a way that the value returned indicates whether the split is ‘worth’ the cost or not. Although the procedure does not react to a worthless split, simply choosing the best of what is on offer, it does at least make the best choice it can, and, indirectly and in a limited way, the accuracy of the tree is taken into account.

The measure this algorithm uses, presented in equation (3.1), may be of use in supplying additional information regarding the ‘worth’ of an attribute. In the first instance it is suggested that equation (3.1) be incorporated into the cost calculation used in the adaptation of the Multi-Armed Bandit problem, as the simplest way of including extra information.
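One possible, and purely hypothetical, way of doing this, assuming the ICF form given earlier for equation (3.1), would be to blend the bandit's cost-based pay-off with the information-for-cost ratio; the weighting scheme below is an assumption for illustration, not a design taken from the thesis.

```python
def combined_payoff(cost_reduction, info_gain, test_cost, omega=0.5, alpha=0.5):
    """Blend the bandit's pay-off (cost reduction) with an EG2-style worth measure.
    'alpha' and 'omega' are illustrative weights, not values from the thesis."""
    icf = (2 ** info_gain - 1) / ((test_cost + 1) ** omega)  # EG2's ICF, as assumed above
    return alpha * cost_reduction + (1 - alpha) * icf
```

A combination of this kind would allow a low-cost attribute with poor information gain to be penalized even when the cost information alone cannot distinguish it from a genuinely good low-cost split.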

The experiments using the krk dataset, which produced overly large trees resulting in poor accuracy rates and high costs, could then be repeated using the extended version of the algorithm which includes the statistical measure equation (3.1). Different ways of incorporating this measure can be tried out. If not successful, research into more complicated multi-armed bandit scenarios can be undertaken to find a suitable algorithm which can be adapted for use by MA_CSDT for an improved measurement in order to find an attribute to exploit.


In Chapter 5, it has been suggested that sometimes the costs are such that it may not be worthwhile to induce a cost-sensitive decision tree. In these cases it was suggested that other techniques could be used instead. It may be worth investigating when this state is likely to be reached. In the case of the krk dataset, for example, the accuracy-based algorithm produced a good accuracy rate, and those cost-sensitive algorithms which used the same accuracy-based algorithm as a weak learner managed to reduce cost with only a small reduction in accuracy. Perhaps when costs are too extreme, pseudo-costs could be used so that it would always be worthwhile in these cases. Other techniques could be used and compared to see whether the answer lies in knowing when not to use cost-sensitive trees but simply to use the accuracy-based algorithm.

The result of the investigation and experimentation where the look-ahead depth was increased to 2 was inconclusive. It might be that the datasets selected for these experiments included datasets which would not require extensive look-ahead searching; however, determining this required initial experimentation. It could also be the case that generating paths to a greater depth does not work for any dataset, as this involves looking too many moves ahead; it has been noted in past literature that looking ahead too deeply can produce worse results (Murthy and Salzberg 1995). It is therefore suggested that, using the information discovered in the investigation, one of the datasets where the results could be improved is chosen and used to investigate this further.

As discussed in Section 4.1, it was thought that the nature of the dataset could influence how the algorithm processed the dataset. It was noted that, when examining the results from a previous comparison, there were inconsistencies. Even allowing for datasets which had the same test cost allocated to each of their attributes, inconsistent results were obtained. The conclusion is that characteristics of the datasets, for example the number of attributes and how many different values these have, in addition to costs, have some influence on the ability of an algorithm to process the dataset. Although not examined in detail in this thesis, it is thought that an investigation using the results obtained in these experiments could be carried out in the first instance to determine if this conclusion is correct.

In conclusion, this thesis suggests the hypothesis that cost-sensitive learning is a trade-off between two alternative viewpoints: decisions based on accuracy and decisions based on cost. This thesis has presented an algorithm which tries to allow for different user and domain needs, so that the user has more choice than with a straightforward cost-sensitive algorithm, which simply aims to reduce costs and can thus have a detrimental effect on the accuracy rate, or with an accuracy-based algorithm, which can have a detrimental effect on the cost. It is shown that allowing for the trade-off between decisions based on accuracy and those based on costs can achieve the aim of cost-sensitive learning, which is to minimize costs whilst maximizing accuracy.


REFERENCES

ABE, N., ZADROZNY, B., LANGFORD, J. 2004. An iterative method for multi-class cost- sensitive learning. Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '04, 3, August 22-25, Seattle, WA, USA, Kim W., Kohavi R., Gehrke J., DuMouchel W. (Eds).

AFIFI, A. A., CLARK, V. 1996. Computer-aided multivariate analysis. 3rd Edition, Chapman & Hall, London.

AUER, P. 2002. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3, 397-422.

AUER, P., CESA-BIANCHI, N., FREUND, Y., SCHAPIRE, R.E. 1995. Gambling in a rigged casino: the adversarial multi-armed bandit problem. Internal Report 223-98, DSI, Università di Milano, Italy; Foundations of Computer Science, Proceedings of the 36th Annual Symposium, October 23-25, Milwaukee, USA, 322 – 331.

AUER, P., CESA-BIANCHI, N., FREUND, Y., SCHAPIRE, R.E. 2001. The non-stochastic multi-armed bandit problem. http://cseweb.ucsd.edu/~yfreund/papers/bandits.pdf , viewed on line 23 April 2013.

AUER, P., CESA-BIANCHI, N., FREUND, Y., SCHAPIRE, R.E. 2003. The non-stochastic multi-armed bandit problem. SIAM Journal on Computing, Vol. 32, Issue 1, 48 – 77.

AUER, P., CESA-BIANCHI, N., FISCHER, P. 2002. Finite-time analysis of the multi-armed bandit problem. Machine Learning 47, 235-256.

BASILI, V. AND WEISS, D. A. 1984. Methodology for Collecting Valid Software Engineering Data. IEEE Trans. Software Engineering . SE-10, 6, 728-738.

BAUER, E., KOHAVI, R. 1999. An empirical comparison of voting classification algorithms: bagging, boosting and variants. Machine Learning Vol. 36, Issue 1-2, 105-139.

BERRY, D. A., FRISTEDT, B. 1985. Bandit problems: sequential allocation of experiments. Monographs on Statistics and Applied Probability, London: Chapman & Hall, ISBN 0-412- 24810-7.

BINMORE, K. 2007. Game theory a very short introduction. Oxford University Press Inc, New York.

BRADFORD, J. P., KUNZ, C., KOHAVI, R., BRUNK, C., BRODLEY, C. E. 1998a. Pruning decision trees with misclassification costs. 10th European Conference on Machine Learning (ECML-98), April 21-23 1998, Chemnitz, Germany, 131-136.

BRADFORD, J. P., KUNZ, C, KOHAVI R, BRUNK, C, BRODLEY, C. E. 1998b. Pruning decision trees with misclassification costs. Long version: online at http://robotics.stanford.edu/~ronnyk/prune-long.ps.gz (Accessed 8 April 2011).


BREIMAN, L., FRIEDMAN J. H., OLSEN R. A., STONE C. J. 1984. Classification and regression trees. Chapman and Hall/CRC, London.

CESA-BIANCHI, N., LUGOSI, G. 2006. Prediction, learning and games. Cambridge University Press, New York.

DAVIS, M. D., 1983. Game theory a nontechnical introduction. Dover Publications Inc, Mineola, NY, USA.

DAVIS, J. V., JUNGWOO, H., ROSSBACH, C. J. 2006. Cost-sensitive decision tree learning for forensic classification. In Proceedings of 17th European Conference on Machine Learning (ECML), LNCS 4212, Springer, 622-629.

DOMINGOS, P. 1999. MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM New York, NY, USA, 155–164.

DONG, M., KOTHARI, R. 2001. Look-ahead based fuzzy decision tree induction. IEEE Transactions on Fuzzy Systems , Vol. 9, No 3, 461–468.

DORARD, L., SHAWE-TAYLOR, J. 2010. Gaussian process bandits for tree search. University College London, available on-line at http://ucl.academia.edu/LouisDorard/Papers/273653/Gaussian_Process_Bandits_for_Tree_Search, viewed 8/2/2011.

DORARD, L., GLOWACKA, D., SHAWE-TAYLOR, J. 2009. Gaussian process modelling of dependencies in multi-armed bandit problems. Proceedings of 10 th International Symposium on Operational Research (SOR), September 23-25, Nova Gorica, Slovenia.

DRAPER, B. A., BRODLEY, C. E., UTGOFF, P. E. 1994. Goal-directed classification using linear machine decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (9) September, 888 – 893.

ELKAN, C. 2001. The foundations of cost-sensitive learning. In Proceedings of 17th International Joint Conference on Artificial Intelligence (IJCAI’01), Morgan Kaufmann, San Francisco, USA, Vol. 2, 973 – 978.

ESMEIR, S. AND MARKOVITCH, S. 2004. Lookahead-based algorithms for anytime induction of decision trees. Twenty-first international conference on Machine learning - ICML '04, 33, Brodley C.E. (Ed), 257-264.

ESMEIR, S. AND MARKOVITCH, S. 2007. Anytime induction of cost-sensitive trees. In Proceedings of The 21st Annual Conference on Neural Information Processing Systems (NIPS-2007), Vancouver, BC, Canada, 1-8.

ESMEIR, S. AND MARKOVITCH, S. 2008. Anytime induction of low-cost, low-error classifiers: a sampling-based approach. Journal of Artificial Intelligence Research 33, 1-31.


ESMEIR, S. AND MARKOVITCH, S. 2010. Anytime algorithms for learning resource- bounded classifiers. In Proceedings of the Budgeted Learning Workshop, ICML2010, Haifa, Israel, June 25.

ESMEIR, S. AND MARKOVITCH, S. 2011. Anytime learning of anycost classifiers. Machine Learning, Vol. 82, No 3, 445–473.

ESTRUCH, V., FERRI, C., HERNÁNDEZ-ORALLO, J., RAMÍREZ-QUINTANA, M. J. 2002. Re-designing cost-sensitive decision tree learning. In Workshop de mineria de datos y Aprendizaje, Iberamia, Seville, November 2002, 33–42.

FAN, W., STOLFO, S. J., ZHANG, J. CHAN, P. K. 1999. AdaCost: misclassification cost- sensitive boosting. 16th International Conference on Machine Learning, June 27-30 1999, Bled, Slovenia, 97-105.

FERRI, C., FLACH, P., HERNÁNDEZ-ORALLO, J. 2002. Learning decision trees using the area under the ROC curve. In 19th Machine Learning International workshop then conference, University of New South Wales, Sydney, Australia, 139–146.

FERRI-RAMÍREZ, C., HERNÁNDEZ, J., RAMIREZ, M. J. 2002. Induction of decision multi-trees using levin search. In International Conference on Computational Science, ICCS02, April 21 – 24, Amsterdam, The Netherlands, LNCS, Vol. 2329, 166-175.

FRANK, E., WITTEN, I. 1998. Reduced-error pruning with significance tests. Available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.46.2272 (Accessed 8 April 2011).

FREAN, M. 1990. Small nets and short paths: optimizing neural computation. Doctoral thesis, Centre for Cognitive Science, University of Edinburgh.

FREITAS, A., COSTA-PEREIRA, A., BRAZDIL, P. 2007. Cost-sensitive decision trees applied to medical data. DaWak, September 3-7, Regensburg, Germany, LNCS 4654, 303- 312.

FREUND, Y., SCHAPIRE, R.E. 1996. Experiments with a new boosting algorithm. 13th International Machine Learning workshop then conference, July 3-6, Bari, Italy, 148-156.

FREUND, Y., SCHAPIRE, R.E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55 (1), Academic Press, Orlando, Florida, USA, 119-139.

GITTINS, J. C. 1989. Multi-armed bandit allocation indices. Wiley-Interscience Series in Systems and Optimization, Chichester: John Wiley & Sons, Ltd, ISBN 0-471-92059-2.

GREFENSTETTE, J. 1990. GENESIS: GENEtic Search Implementation System. A user’s guide to GENESIS. http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/genetic/ga/systems/genesis/0.html and http://www.genetic-programming.com/c2003genesisgrefenstette.txt

GREINER, R, GROVE, A.J. and ROTH, D. 2002. Learning cost-sensitive active classifiers, Artificial Intelligence, Vol. 139, No 2, 137-174.


GRÜNEWÄLDER, S., AUDIBERT, J-Y., OPPER, M., SHAWE-TAYLOR, J. 2010. Regret bounds for Gaussian process bandit problems. Proceedings of the 13 th International Conference on Artificial Intelligence and Statistics, Vol. 9 of JMLR , Chia Laguna Resort, Sardinia, Italy.

HALL, M., FRANK, E., HOLMES, G., PFAHRINGER, B., REUTEMANN, P., WITTEN, I.H. 2009. The WEKA data mining software: an update. SIGKDD Explorations, Vol. 11, Issue 1.

HAN, J., KAMBER, M. 2001. Data Mining: Concepts and techniques. Academic Press, London, Morgan Kaufman, CA, USA, ISBN 1-55860-489-8.

HART, A. E. 1985. Experience in the use of an inductive system in knowledge engineering. In Research and development in expert systems. M. A. Bramer (Ed.), Cambridge University Press.

HUNT, E. B., MARIN, J., STONE, P. J. 1966. Experiments in induction. New York, Academic Press.

KING, R. D., FENG, C., SUTHERLAND, A. 1995. STATLOG: Comparison of classification algorithms on large real-world problems. Applied Artificial Intelligence, 01/1995, 9, 289 – 333.

KNOLL, U., NAKHAEIZADEH, G., TAUSEND, B. 1994. Cost-sensitive pruning of decision trees. Proceedings of the 8th European Conference on Machine Learning ECML-94. Berlin, Germany, Springer-Verlag, 383-386.

KRETOWSKI, M. AND GRZEŚ, M. 2007. Evolutionary induction of decision trees for misclassification cost minimization. In Proceedings of 8th International Conference on Adaptive and Natural Computing Algorithms, April 11-14, Warsaw, Poland, ICANNGA, Part 1, LNCS 4431, Springer.

LI, J., LI, X., YAO, X. 2005. Cost-sensitive classification with genetic programming. IEEE Congress on Evolutionary Computation, 2114-2121.

LIN, F. Y. AND MCCLEAN, S. 2000. The prediction of financial distress using a cost- sensitive approach and prior probabilities. Proceedings of 17th International Conference on Machine Learning, ICML-2000, June 29-July 2, Stanford University, California, USA.

LING, C. X., YANG, Q., WANG, J., ZHANG, S. 2004. Decision trees with minimal costs. ACM International Conference Proceeding Series 21st international conference on Machine learning, Banff, Alberta, Canada, Article No. 69, ISBN: 1-58113-828-5. ACM Press New York, NY, USA.

LING, C. X., SHENG, V. S., BRUCKHAUS, T., MADHAVJI, N. H. 2006a. Maximum profit mining and its application in software development. Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06, August 20-23, Philadelphia, USA, 929.


LING, C., SHENG, V., YANG, Q. 2006b. Test strategies for cost-sensitive decision trees. IEEE Transactions on Knowledge and Data Engineering, 18(8), 1055-1067.

LIU, X. 2007. A new cost-sensitive decision tree with missing values. Asian Journal of Information Technology, 6(11), 1083–1090.

LIVARI, J., HIRSCHHEIM, R. A., KLEIN, H. K. 1998. A Paradigmatic analysis contrasting information systems development approaches and methodologies. Information Systems Research 9 (2). June 1998, 164 – 193.

LOMAX, S. 2006. An empirical comparison of cost-sensitive decision tree induction algorithms. MSc Thesis. University of Salford.

LOMAX, S., VADERA, S. 2009. An empirical comparison of cost-sensitive decision tree algorithms. In Proceedings from the fourth Conference on Intelligent Management Systems in Operations, OR, IMSIO4, 7-8 July 2009, University of Salford, Prof K.A.H. Kobbacy, Prof S. Vadera (Eds), published by OR Society, 35-47.

LOMAX, S., VADERA, S. 2011. An empirical comparison of cost-sensitive decision tree induction algorithms. Expert Systems The Journal of Knowledge Engineering, July, Vol. 28, No 3, 227 – 268.

LOMAX, S., VADERA, S. 2013. A survey of cost-sensitive decision tree induction algorithms. ACM Computing Surveys, Vol. 45, No. 2, Article 16. http://doi.acm.org/10.1145/2431211.2431215

LOMAX, S., VADERA, S. SARAEE, M. 2012. A multi-armed bandit approach to cost- sensitive decision tree learning. In Data Mining Workshops (ICDMW), 2012 IEEE 12 th International Conference, 10-13 December 2012, Brussels, Belgium, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6406437 , 162 – 168.

LOZANO, A.C., ABE, N. 2008. Multi-class cost-sensitive boosting with p-norm loss functions. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '08, August 24-24, Las Vegas, USA, 506.

MARGINEANTU, D., DIETTERICH, T. 2003. A wrapper method for cost-sensitive learning via stratification. Available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.27.1102 (accessed 7 April 2011).

MARGINEANTU, D. 2001. Methods for cost-sensitive learning. Doctoral Thesis, Oregon State University.

MAYNARD SMITH, J. 1984. Evolution and the theory of games. Cambridge University Press, Cambridge, UK.

MEASE, D., WYNER, A. J., BUJA, A. 2007. Boosted classification trees and class probability/quantile estimation. Journal of Machine Learning Research, 8, 409-439.

MEIR, R. AND RÄTSCH, G. 2003. An introduction to boosting and leveraging. Advanced Lectures on Machine Learning, Mendelson, S., Smola, A. (Eds), Springer, 119-184.


MERLER, S., FURLANELLO, C., LARCHER, B., SBONER, A. 2003. Automatic model selection in cost-sensitive boosting. Information Fusion, 4(1), 3-10.

MICHALEWICZ, Z. 1996. Genetic algorithms + data structures = evolution programs. Third Edition. Springer.

MINGERS, J. 1989. An empirical comparison of selection measures for decision-tree induction. Machine Learning, 3, 319-342.

MORET, S., LANGFORD, W. MARGINEANTU, D. 2006. Learning to predict channel stability using bio geomorphic features. Ecological Modelling, 191(1), 47-57.

MORRISON, D. 1976. Multivariate statistical methods. 2nd Edition. McGraw-Hill, New York.

MURTHY, S., KASIF, S., SALZBERG, S. 1994. A System for induction of oblique decision trees. Journal of Artificial Intelligence Research, 2, 1-32.

MURTHY, S., SALZBERG, S. 1995. Lookahead and pathology in decision tree induction. Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, Canada, 1025-1033.

NASH, J. F. 1950a. Equilibrium points in N-person games. Proceedings of the National Academy of Sciences of the United States of America, 36, 48 – 49.

NASH, J. F. 1950b. Non-cooperative games. Doctoral dissertation, Princeton University, reprinted in The essential John Nash (Harold W Kuhn and Sylvia Nasar, Eds), 53 – 84. Princeton University Press 2002.

VON NEUMANN, J. 1951. Various techniques used in connection with random digits. Monte Carlo methods. National Bureau of Standards, 12, 36-38.

VON NEUMANN, J., MORGENSTERN, O. 1953. The theory of games and economic behaviour. Third Edition, Princeton University Press, Princeton, USA.

NI, A., ZHANG, S., YANG, S., ZHU, X. 2005. Learning classification rules under multiple costs. Asian Journal of Information Technology 4, 1080-1085.

NILSSON, N. J. 1965. Learning machines. McGraw-Hill, New York.

NORTON, S. W. 1989. Generating better decision trees. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, IJCAI-89, August, Detroit, Michigan, USA, 800-805.

NÚÑEZ, M. 1991. The use of background knowledge in decision tree induction. Machine Learning 6, Kluwer Academic Publishers, Boston, 231-250.

OMIELAN, A. 2005. Evaluation of a cost-sensitive genetic classifier. MPhil Thesis, University of Salford.


OSBORNE, M. J. 2004. An introduction to game theory. Oxford University Press, New York, USA.

PAZZANI, M., MERZ, C., MURPHY, P., ALI, K., HUME, T., BRUNK, C. 1994. Reducing misclassification costs. In Proceedings of the 11th International Conference on Machine Learning, New Brunswick, New Jersey, USA, 217–225. Available at: http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Reducing+misclassification+costs#0 (accessed 7 April 2011).

QIN, Z., ZHANG, S., ZHANG, C. 2004. Cost-sensitive decision trees with multiple cost scales. Proceedings of the 17th Australian Joint Conference on Artificial Intelligence, December 4-6th Cairns, LNAI 3339, Springer-Verlag, Berlin, G. I. Webb and X Yu (Eds), 380-390.

QUINLAN, J. R. 1979. Discovering rules by induction from large collections of examples. Expert Systems in the microelectronic age, D Michie (Ed.), Edinburgh University Press, 168-201.

QUINLAN, J. R. 1983. Learning efficient classification procedures and their application to chess endgames. Machine Learning: an artificial intelligence approach, Michalski, Carbonell and Mitchell (Eds.), Tioga Publishing Company, Palo Alto.

QUINLAN, J. R. 1986. Induction of decision trees. Machine Learning 1, Kluwer Academic Publishers, Boston, 81-106.

QUINLAN, J. R. 1987. Simplifying decision trees. International Journal of Man-Machine Studies, 27, 221-234.

QUINLAN, J. R. 1993. C4.5: Programs for machine learning. Morgan Kaufman, San Mateo, CA.

QUINLAN, J. R., COMPTON, P. J., HORN, K. A., LAZARUS, L. 1987. Inductive knowledge acquisition: a case study. Applications of expert systems, Chapter 9, J Ross Quinlan (Ed), Based on the proceedings of the second Australian Conference. Turing Institute Press with Addison-Wesley Publishing Co. ISBN 0-201-17449-9, 137-156.

RASMUSEN, E. 2001. Games and information an introduction to game theory. Third Edition, Blackwell Publishers Ltd, Oxford, UK.

RISSANEN, J. 1978. Modelling by shortest data description. Automatica, 14, 465-471.

ROBBINS, H. 1952. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58, 527–535.

SCHAPIRE, R. E., FREUND, Y., BARTLETT, P., LEE, W. S. 1997. Boosting the margin: a new explanation for the effectiveness of voting methods. In Proceedings of the 14th International Conference on Machine Learning, Nashville, Tennessee, USA, 322-330.


SCHAPIRE, R., SINGER, Y. 1998. Improved boosting algorithms using confidence-rated predictions. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, July 24-26, Madison, Wisconsin, USA, 80-91.

SCHAPIRE, R. E., SINGER, Y. 1999. Improved boosting algorithms using confidence-rated predictions. Machine Learning 37, (3), 297-336.

SCHAPIRE, R. E. 1999. A brief introduction to boosting. In Proceedings of the 16th International Joint Conference on Artificial Intelligence, IJCAI99, July 31 – August 6, City Conference Centre, Stockholm, Sweden, Vol. 2, 1401-1406.

SHANNON, C. E. 1948. A mathematical theory of communication. The Bell System Technical Journal, Vol. 27, 379-423.

SHAWE-TAYLOR, J. 2010. Multivariate bandits and their applications. In Intelligent Information Processing, IIP, Presentation Slides, 15 October, held at Lowry Centre, Salford.

SHENG, S., LING, C. 2005. Hybrid cost-sensitive decision tree. 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, October 3-7, Porto, Portugal, LNCS, 3721, 274-284.

SHENG, S., LING, C., YANG, Q. 2005. Simple test strategies for cost-sensitive decision trees. In 16th European Conference on Machine Learning, ECML 2005, October 3-7, Porto, Portugal, LNCS 3720, 365-376.

SHENG, V. S., LING, C. X., NI, A., ZHANG, S. 2006. Cost-sensitive test strategies. Proceedings of 21st National Conference of Artificial Intelligence, July 16-20, Boston, Massachusetts, USA, AAAI Press.

VAN SOLINGEN, R., BASILI, V., CALDIERA, G., ROMBACH, H. D. 2002. Goal Question Metric (GQM) Approach. Encyclopaedia of Software Engineering. John Wiley & Sons, Inc.

SWETS, J., DAWES, R., MONAHAN, J. 2000. Better decisions through science. Scientific American, October, 82-87.

TAN, M., SCHLIMMER, J. 1989. Cost-sensitive concept learning of sensor use in approach and recognition. Proceedings of the 6th International Workshop on Machine Learning, ML-89, Ithaca, New York, 392-395.

TAN, M., SCHLIMMER, J. 1990. CSL: a cost-sensitive learning system for sensing and grasping objects. IEEE International Conference on Robotics and Automation. Cincinnati, Ohio.

TAN, M. 1993. Cost-sensitive learning of classification knowledge and its applications in robotics. Machine Learning, 13, 7-33.

TING, K. 1998. Inducing cost-sensitive decision trees via instance weighting. Proceedings of the 2nd European Symposium on Principles of Data Mining and Knowledge Discovery, Berlin: Springer Verlag, 139-147.


TING, K., ZHENG, Z. 1998a. Boosting cost-sensitive trees. Proceedings of 1st International Conference on Discovery Science, Springer-Verlag, London, LNCS, 1532, 244–255.

TING, K. M., ZHENG, Z. 1998b. Boosting trees for cost-sensitive classifications. In Machine Learning: ECML-98 10th European Conference on Machine Learning, Chemnitz, Germany. Springer, 190-195.

TING, K. 2000a. An empirical study of Metacost using boosting algorithms. Proceedings of the 11th European Conference on Machine Learning, Springer-Verlag, London, LNCS 1810, 413–425.

TING, K. 2000b. A comparative study of cost-sensitive boosting algorithms. In Proceedings of the 17th International Conference on Machine Learning, June 29 - July 2, Stanford University, San Francisco, USA, 983-990.

TING, K. M. 2002. An instance-weighting method to induce cost-sensitive decision trees. IEEE Transactions on Knowledge and Data Engineering, 14, (3), May/June, 659-665.

TURNEY, P.D. 1995. Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research, 2, 369-409.

TURNEY, P. D. 2000. Types of cost in inductive concept learning. Workshop on Cost-Sensitive Learning at the 17th International Conference on Machine Learning (WCSL at ICML-2000), Stanford University, California, 15-21.

VADERA, S., VENTURA, D. 2001. A comparison of cost-sensitive decision tree learning algorithms. Second European Conference in Intelligent Management Systems in Operations, 3–4 July, University of Salford, Operational Research Society, Birmingham, UK, 79-86.

VADERA, S. 2005a. Inducing cost-sensitive non-linear decision trees. Technical report. Available at: http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Inducing+cost-sensitive+non-linear+decision+trees#0 (accessed 8 April 2011).

VADERA, S. 2005b. Inducing safer oblique trees without costs. Expert Systems: The International Journal of Knowledge Engineering and Neural Networks, Vol. 22, No 4, 206-221.

VADERA, S. 2010. CSNL: A cost-sensitive non-linear decision tree algorithm. ACM Transactions on Knowledge Discovery from Data (TKDD), Vol. 4, Issue 2, May 2010, Article 6, 1-25.

VERMOREL, J., MOHRI, M. 2005. Multi-armed bandit algorithms and empirical evaluation. In 16th European Conference on Machine Learning (ECML), October 3-7, Porto, Portugal, Springer, Vol. 3720, 437-448.

WEISS, S. M., KULIKOWSKI, C. A. 1991. Computer systems that learn: classification and prediction methods from statistics, neural nets, machine learning and expert systems. Morgan Kaufmann Publishers Inc, California, USA, ISBN 1-55860-065-5.


WINSTON, P. H. 1993. Artificial intelligence. 3rd ed. Addison Wesley, USA, ISBN 0-201-53377-4.

ZADROZNY, B., LANGFORD, J., ABE, N. 2003a. A simple method for cost-sensitive learning. Technical Report RC22666, IBM, 2003. http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.7.7947

ZADROZNY, B., LANGFORD, J., ABE, N. 2003b. Cost-sensitive learning by cost-proportionate example weighting. Third IEEE International Conference on Data Mining, November 19-22, 2003, Melbourne, Florida, USA, 435.

ZHANG, S., QIN, Z., LING, C., SHENG, S. 2005. “Missing is useful”: missing values in cost-sensitive decision trees. IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No 12, 1689-1693.

ZHANG, S., ZHU, X., ZHANG, J., ZHANG, C. 2007. Cost-time sensitive decision tree with missing values. Knowledge Science, Engineering and Management 4798, 447-459.

ZHANG, S. 2010. Cost-sensitive classification with respect to waiting cost. Knowledge-Based Systems, Vol. 23, No 5, 369-378.


APPENDIX

A1 Analysis of datasets used by studies from the survey

Table A1, given overleaf, shows the top 20 datasets used by the studies in Chapter 3. These datasets have been divided into three groups: two-class datasets, multi-class datasets, and those which have been used as both two-class and multi-class datasets. The table gives details of how many datasets each study used, the average number of datasets used and how many of the datasets are in the top 20. All datasets in the top 20 are available from the Machine Learning Repository 12. Some of the studies have used private datasets or datasets from other sources.

The table also indicates whether the test costs and misclassification costs are provided. Other datasets which are not in the top 20 listing but are still useful for measuring performance include: (i) the Soybean dataset, which has 19 classes; (ii) Thyroid (NN), used by Turney (1995) and others, which is a larger version of the hypothyroid dataset and has 3 classes; and (iii) Statlog Shuttle, a large dataset with 58,000 examples, useful for examining how an algorithm performs with a larger number of examples.

12 http://archive.ics.uci.edu/ml/index.html


A2 Details of the datasets used in the main evaluation

A2.1 Annealing

Area: Physical

Steel annealing data from the Machine Learning Repository. It is multi-class and imbalanced, with high test costs and a high number of potential unique bandit paths. Costs were obtained in consultation with a domain expert. Missing values were indicated as '–' but actually mean 'not applicable' and have not been removed. Attributes have been removed if all examples had only one value, or if there were fewer than 10 examples for the other values. All continuous attributes have been discretized using WEKA.

attribute      no. of values  test cost  discount  group
family         3              50.0       50.0      x
steel          8              50.0       50.0      x
carbon         10             50.0       50.0      x
hardness       7              50.0       50.0      x
tempRolling    2              50.0       50.0      x
condition      3              50.0       50.0      x
nonAgeing      2              50.0       50.0      x
surfQual       5              50.0       50.0      x
lustre         2              50.0       50.0      x
shape          2              50.0       50.0      x
oil            3              50.0       50.0      x
bore           3              50.0       50.0      x
thick          9              250.0      250.0     x
width          4              250.0      250.0     x
len            2              250.0      250.0     x
bf*            2              1500.0     1500.0    a
bt*            2              1500.0     1500.0    a
bwme*          3              1500.0     1500.0    a
bl*            2              1500.0     1500.0    a
chrom‡         2              1500.0     250.0     a
cbond‡         2              1500.0     250.0     a
ferro‡         2              1500.0     250.0     a
formability    6              2000.0     2000.0    x
strength       8              2000.0     2000.0    x
Total test costs               15850.0


* These tests are individual tests. Each one is independent and cannot provide answers for the other attributes. Each of these tests can, however, provide answers for all of the attributes marked ‡.

‡ Values for these attributes are obtained by carrying out one of the tests marked *. So if an attribute from this group is chosen first, one of the tests marked * must be carried out and the full test cost is charged. If one of the tests marked * or ‡ has already been carried out, the value of a ‡ attribute is easily obtainable and the discount applies.
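The group and discount columns in these tables can be read as a simple charging rule: the first test taken from a shared group is charged at its full cost, while later tests whose group has already been paid for are charged at the (lower) discounted cost. The following is a minimal sketch of that charging convention, assuming this reading of the tables; it is only an illustration, not the MA_CSDT implementation itself.

```python
# Illustrative sketch of the test-cost charging convention used in the
# appendix tables: full cost for the first test in a group, discounted
# cost once any test from that group has already been carried out.
# (Assumption: group 'x' means the attribute shares costs with no other attribute.)

def sequence_cost(tests, cost_table):
    """tests: attribute names in the order they are requested.
    cost_table: {attr: (full_cost, discount_cost, group)}."""
    charged_groups = set()
    total = 0.0
    for attr in tests:
        full, discounted, group = cost_table[attr]
        if group != 'x' and group in charged_groups:
            total += discounted      # group already paid for: discount applies
        else:
            total += full            # first test from this group: full price
            if group != 'x':
                charged_groups.add(group)
    return total

# Example using three of the annealing attributes (values from the table above):
anneal_costs = {'bf': (1500.0, 1500.0, 'a'),
                'chrom': (1500.0, 250.0, 'a'),
                'thick': (250.0, 250.0, 'x')}
print(sequence_cost(['bf', 'chrom', 'thick'], anneal_costs))  # 1500 + 250 + 250 = 2000.0
```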

Dr K. J. Abrams, Technical Consultant, University of Salford, provided an insight into this dataset along with advice regarding the allocation of the test costs and the groupings of the attributes.

Details of the class distribution of the annealing dataset are as follows:

Class   Frequency  Percent  Cumulative %
1       8          0.9      0.9
2       99         11.0     11.9
3       684        76.2     88.1
5       67         7.5      95.5
U       40         4.5      100.0
Total   898        100.0

A2.2 Breast Cancer

Area: Medical

This version was obtained from Esmeir and Markovitch (2008) and is originally from the Machine Learning Repository; the costs were also allocated by Esmeir and Markovitch (2008). It is one of three domains originally provided by the Oncology Institute. Nine examples which had missing values were removed by Esmeir and Markovitch.


attribute     no. of values  test cost  discount  group
breast        2              5.02877    5.02877   x
menopause     3              9.2464     1.84928   a
breastquad    5              18.4302    18.4302   x
age           9              24.2124    16.8152   a
irradiat      2              40.8691    40.8691   x
tumorsize     12             41.8689    41.8689   x
nodecaps      2              61.1264    53.7293   a
degmalig      3              75.1722    75.1722   x
invnodes      13             93.0188    93.0188   x
Total test costs              368.9732

It is an imbalanced two-class dataset, the classes being no recurrence events and recurrence events. The class distribution is as follows:

Class                  Frequency  Percent  Cumulative %
no recurrence events   196        70.8     70.8
recurrence events      81         29.2     100.0
Total                  277        100.0

A2.3 Car Evaluation

Area: Social

Originally from the Machine Learning Repository, this file is the one used by Esmeir and Markovitch (2008), along with the costs. It evaluates cars based on buying price, maintenance, overall price, technical characteristics, and comfort and capacity. Structural information has been removed from the dataset, leaving the attributes shown in the table. There were no missing values.


attribute   no. of values  test cost  discount  group
maint       4              5.58562    5.58562   x
lugboot     3              6.84251    6.84251   x
doors       4              20.6264    20.6264   x
buying      4              35.4139    7.08277   A
persons     3              91.6271    63.296    A
safety      3              98.6441    70.313    A
Total test costs            258.7396

This dataset is a multi-class dataset with an imbalanced distribution.

Class   Frequency  Percent  Cumulative %
acc     384        22.2     22.2
good    69         4.0      26.2
unacc   1210       70.0     96.2
vgood   65         3.8      100.0
Total   1728       100.0

A2.4 Pima Indian Diabetes

Area: Medical

Originally from the Machine Learning Repository, this version is from Esmeir and Markovitch (2008), with costs from Turney (1995). The class indicates whether an example shows signs of diabetes, with class value 1 indicating tested positive. The population lives near Phoenix, Arizona. All continuous attributes were discretized using WEKA.


attribute    no. of values  test cost  discount  group
timespreg    2              1.00       1.00      x
massindex    2              1.00       1.00      x
pedigree     2              1.00       1.00      x
age          2              1.00       1.00      x
glucosetol   4              17.61      15.51     A
insulin      3              22.78      20.68     A
Total test costs             44.39

The class distribution, which is approximately 65/35, is as follows:

Class   Frequency  Percent  Cumulative %
0       500        65.1     65.1
1       268        34.9     100.0
Total   768        100.0

A2.5 Solar Flare

Area: Physical

The original dataset contains three potential class attributes, each recording the number of times a certain type of solar flare occurred in a 24-hour period. Each instance represents captured features for one active region of the sun. Esmeir and Markovitch (2008) used flare1.data, and the class being used is C-class flares. There are no missing values.

attribute         no. of values  test cost  discount  group
codeforclass      6              4.36544    4.36544   x
codeforspotsize   6              6.60975    1.32195   B
codeforspotdist   4              7.69262    7.54576   B
activity          2              8.21288    8.21288   x
evolution         3              10.0288    2.00576   B
previous          3              39.8118    11.0989   B
complex           2              51.9166    26.275    B
region            2              69.6765    64.3887   B
area              2              87.3879    87.3879   x
soptarea          2              96.5457    96.5457   x
Total test costs                  382.248


There is an imbalanced class distribution.

Class   Frequency  Percent  Cumulative %
0       287        88.9     88.9
1       29         9.0      97.8
2       7          2.2      100.0
Total   323        100.0

A2.6 Glass Identification

Area: Forensic Science

This dataset deals with the identification of types of glass left at crime scenes. Esmeir and Markovitch's file was used, along with their costs. The attributes have been discretized by WEKA. There are no missing values.

attribute   no. of values  test cost  discount  group
a6          4              13.3987    13.3987   x
a0          3              40.2861    32.9151   a
a1          2              49.5373    42.1663   a
a3          3              67.0591    59.6881   a
a7          2              70.2323    62.8613   a
a5          4              75.0775    75.0775   x
a2          2              78.2133    70.8423   a
Total test costs            393.8043

The classes are listed as follows:

1. Building windows float

2. Building windows non-float

3. Vehicle windows float

4. Vehicle windows non float processed (not in dataset)


5. Containers

6. Tableware

7. Headlamps

This results in a 6 class dataset with the following class distribution:

Class   Frequency  Percent  Cumulative %
1       70         32.7     32.7
2       76         35.5     68.2
3       17         7.9      76.2
5       13         6.1      82.2
6       9          4.2      86.4
7       29         13.6     100.0
Total   214        100.0

A2.7 Heart Cleveland

Area: Medical

This dataset is from the Machine Learning Repository, but the actual file used is the one prepared by Esmeir and Markovitch (2008). The costs used are those provided by Turney (1995), although the group data has been altered slightly because the ICET implementation used in the evaluation can only process datasets with a single group. There were some missing values, but these have been removed by Esmeir and Markovitch. The dataset indicates the absence or presence of heart disease.


attribute   no. of values  test cost  discount  group
age         2              1.0        1.0       x
sex         2              1.0        1.0       x
pain        4              1.0        1.0       x
fbs         2              5.2        5.2       x
restecg     3              15.5       15.5      x
exang       2              87.3       87.3      x
oldpeak     2              87.3       87.3      x
slope       3              87.3       87.3      x
fca         2              100.9      100.9     x
thal        3              102.9      1.0       b
thalach     2              102.9      1.0       b
Total test costs            592.3

It is a two-class dataset with a fairly even class distribution:

Class   Frequency  Percent  Cumulative %
0       160        53.9     53.9
1       137        46.1     100.0
Total   297        100.0

A2.8 Hepatitis

Area: Medical

Originally from the Machine Learning Repository, this dataset contains missing values, but the version used was obtained from Esmeir and Markovitch (2008). There is no indication of what they did with the missing values; however, their version has one fewer example. The continuous attributes were discretized using WEKA, and the costs are those used by Turney (1995).


attribute   no. of values  test cost  discount  group
sex         2              1.00       1.00      x
steroid     2              1.00       1.00      x
antiviral   2              1.00       1.00      x
fatigue     2              1.00       1.00      x
malaise     2              1.00       1.00      x
anorexia    2              1.00       1.00      x
liverbig    2              1.00       1.00      x
liverfirm   2              1.00       1.00      x
spleen      2              1.00       1.00      x
spiders     2              1.00       1.00      x
ascites     2              1.00       1.00      x
varices     2              1.00       1.00      x
histology   2              1.00       1.00      x
bilirubin   2              7.27       5.17      A
albumin     3              7.27       5.17      A
protime     3              8.30       6.20      A
Total test costs            35.84

The class distribution is imbalanced, as follows:

Class   Frequency  Percent  Cumulative %
1       32         20.8     20.8
2       122        79.2     100.0
Total   154        100.0

A2.9 Iris Plants

Area: Botanical

This dataset contains 3 classes of iris plant; the aim is to predict the type of plant. There are no missing values, and the attributes have been discretized by WEKA. This version, along with the costs, is from Esmeir and Markovitch (2008).


attribute   no. of values  test cost  discount  group
sl          3              7.65056    7.65056   x
sw          3              22.5831    22.5831   x
pl          3              78.2133    78.2133   x
pw          3              98.2458    98.2458   x
Total test costs            206.6928

The class distribution is even:

Class             Frequency  Percent  Cumulative %
Iris-setosa       50         33.3     33.3
Iris-versicolor   50         33.3     66.7
Iris-virginica    50         33.3     100.0
Total             150        100.0

A2.10 Chess Endgame White King and Rook against Black King (krk)

Area: Game

This dataset stores the game-theoretic values for the enumerated board positions of this endgame. With Black to move, positions are either drawn or lost for Black in N moves; the class is the optimal depth of win for White in 0–16 moves, otherwise drawn. The dataset is available from the Machine Learning Repository, but this version, along with the costs, was obtained from Esmeir and Markovitch (2008).


attribute   no. of values  test cost  discount  group
wrr         8              7.11229    7.11229   x
wrf         8              7.3094     7.3094    x
wkf         4              7.57644    7.57644   x
bkr         8              70.0909    70.0909   x
bkf         8              78.6536    78.6536   x
wkr         4              89.3119    89.3119   x
Total test costs            260.0522

The class distribution is as follows:

Class   Frequency  Percent  Cumulative %
0       27         0.1      0.1
1       78         0.3      0.4
2       246        0.9      1.3
3       81         0.3      1.5
4       198        0.7      2.2
5       471        1.7      3.9
6       592        2.1      6.0
7       683        2.4      8.5
8       1433       5.1      13.6
9       1712       6.1      19.7
10      1985       7.1      26.8
11      2854       10.2     36.9
12      3597       12.8     49.7
13      4194       14.9     64.7
14      4553       16.2     80.9
15      2166       7.7      88.6
16      390        1.4      90.0
d       2796       10.0     100.0
Total   28056      100.0

A2.11 Mushroom

Area: Botanical

This dataset is from the Machine Learning Repository, with the costs used by Lomax and Vadera (2011). It has a high potential unique bandit paths value with low test costs. It is a two-class dataset with an even distribution and contains only nominal values. The missing values are all in the attribute stalk-root; they are coded as value x and left in the dataset. The attribute veil-type was removed as it only has one value. The aim of the dataset is to classify edible and poisonous mushrooms. The class distribution is given after the table of costs.

attribute   no. of values  test cost  discount  group
bruises     2              1.0        1.0       x
odor        9              1.0        1.0       x
pop         6              1.0        1.0       x
hab         7              1.0        1.0       x
ss          2              2.0        2.0       x
sr          5              2.0        2.0       x
ssar        4              2.0        2.0       x
ssbr        4              2.0        2.0       x
scar        9              2.0        2.0       x
scbr        9              2.0        2.0       x
ga          2              3.0        3.0       x
gs          2              3.0        3.0       x
gsize       2              3.0        3.0       x
gc          12             3.0        3.0       x
cs          6              4.0        4.0       x
csur        5              4.0        4.0       x
cc          10             4.0        4.0       x
rn          3              5.0        5.0       x
rt          5              5.0        5.0       x
spc         9              6.0        6.0       x
vc          4              7.0        7.0       x
Total test costs            63.0

Class   Frequency  Percent  Cumulative %
0       4208       51.8     51.8
1       3916       48.2     100.0
Total   8124       100.0

A2.12 Nursery

Area: Social

This dataset concerns the evaluation of applications for nursery schools, with the final decision resting on the occupation of the parents, family structure, financial standing, and social and health conditions. The dataset is available from the Machine Learning Repository, but this version, along with the costs, is from Esmeir and Markovitch (2008). Their version has fewer examples than the Machine Learning Repository one (which has 12,960); however, there is no explanation regarding their processing of the dataset.

attribute   no. of values  test cost  discount  group
finance     2              8.21288    1.64258   A
children    4              9.86595    3.29564   A
housing     3              10.004     1.0       A
parents     3              11.0199    4.44957   A
hn          5              11.4081    11.4081   x
social      3              15.8234    9.25308   A
form        4              20.949     20.949    x
health      3              98.6441    98.6441   x
Total test costs            185.9273

The class distribution (apart from 2 classes with very few examples) is balanced, as follows:

Class        Frequency  Percent  Cumulative %
not recom    2900       33.3     33.3
priority     3644       41.9     75.2
recommend    2          0.0      75.2
spec prior   1829       21.0     96.2
very recom   328        3.8      100.0
Total        8703       100.0

A2.13 Soybean

Area: Botanical

This dataset diagnoses the diseases of the soybean plant. It is from the Machine Learning Repository, with the costs as used by Lomax and Vadera (2011). Only 15 of the classes are traditionally used. All examples with missing values have been removed; most of them had multiple values missing. Removing these examples means that some attribute values no longer occur, as noted after the table.


attribute         no. of values  test cost  discount  group
plant stand       2              5.00       5.00      x
hail              2              5.00       5.00      x
plant growth      2              5.00       5.00      x
leaves            2              5.00       5.00      x
leafshread        2              5.00       5.00      x
leafmalf          2              5.00       5.00      x
stem              2              5.00       5.00      x
lodging           2              5.00       5.00      x
fruiting bodies   2              5.00       5.00      x
mycelium          2              5.00       5.00      x
sclerotia         2              5.00       5.00      x
seed              2              5.00       5.00      x
mold growth       2              5.00       5.00      x
seed discolour    2              5.00       5.00      x
seed size         2              5.00       5.00      x
shrivelling       2              5.00       5.00      x
date              7              10.00      2.0       a
precip            3              10.00      2.0       a
temp              3              10.00      2.0       a
crop hist         4              10.00      2.0       a
area damaged      4              10.00      2.0       a
severity          3              10.00      2.0       a
seed tmt          3              10.00      2.0       a
germination       3              10.00      2.0       a
leafspotshalo     3              10.00      2.0       a
leafspotsmarg     3              10.00      2.0       a
leafspotsize      3              10.00      2.0       a
leaf mild         3              10.00      2.0       a
stem cankers      4              10.00      2.0       a
canker lesion     4              10.00      2.0       a
external decay    2              10.00      2.0       a
int discolour     3              10.00      2.0       a
fruit pods        3              10.00      2.0       a
fruit spots       4              10.00      2.0       a
roots             3              10.00      2.0       a
Total test costs                  270.0

• external decay no longer has value 2 – watery

• fruit pods no longer has value 2 – few present

• fruit spots has no value 3 – distort (it did not have one anyway).


This makes no overall difference to the class distribution, which is fairly evenly spread over the 15 classes:

Class                    Frequency  Percent  Cumulative %
diaporthe-stem-canker    20         3.6      3.6
charcoal-rot             20         3.6      7.1
rhizoctonia-root-rot     20         3.6      10.7
phytophthora-rot         20         3.6      14.2
brown-stem-rot           44         7.8      22.1
powdery-mildew           20         3.6      25.6
downy-mildew             20         3.6      29.2
brown-spot               92         16.4     45.6
bacterial-blight         20         3.6      49.1
bacterial-pustule        20         3.6      52.7
purple-seed-stain        20         3.6      56.2
anthracnose              44         7.8      64.1
phyllosticta-leaf-spot   20         3.6      67.6
alternarialeaf-spot      91         16.2     83.8
frog-eye-leaf-spot       91         16.2     100.0
Total                    562        100.0

A2.14 Tic-tac-toe Endgame

Area: Game

This dataset encodes the complete set of possible board configurations at the end of tic-tac-toe games, where x is assumed to have played first. The target concept is 'win for x', which is true when x has achieved one of the 8 possible ways of creating 3 in a row. The attributes correspond to the squares of the board. The costs used are those in Lomax and Vadera (2011).
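Since the target concept is simply whether x occupies one of the eight winning lines (three rows, three columns, two diagonals), it can be expressed directly. The sketch below assumes a board encoded as a list of nine squares in row-major order with values 'x', 'o' or 'b' (blank); this encoding is an assumption for illustration only.

```python
# Sketch of the 'win for x' target concept: true when x holds any of the
# eight lines. Board is assumed to be a list of 9 squares in row-major
# order with values 'x', 'o' or 'b'.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def win_for_x(board):
    return any(all(board[i] == 'x' for i in line) for line in WIN_LINES)

print(win_for_x(['x', 'x', 'x', 'o', 'o', 'b', 'b', 'b', 'b']))  # True (top row)
```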


attribute   no. of values  test cost  discount  group
a0          3              1.0        1.0       x
a1          3              1.0        1.0       x
a2          3              1.0        1.0       x
a10         3              1.0        1.0       x
a11         3              1.0        1.0       x
a12         3              1.0        1.0       x
a20         3              1.0        1.0       x
a21         3              1.0        1.0       x
a22         3              1.0        1.0       x
Total test costs            9.0

It is a 2 class dataset with a 65/35 distribution:

Class      Frequency  Percent  Cumulative %
negative   332        34.7     34.7
positive   626        65.3     100.0
Total      958        100.0

A2.15 Wine Recognition

Area: Physical

The dataset comprises the results of a chemical analysis of wines grown in the same region in Italy but derived from 3 different cultivars. The analysis determines the quantities of 13 constituents found in each of the 3 types of wine. It is available from the Machine Learning Repository, but this file, along with the costs, is from Esmeir and Markovitch (2008). All attributes are continuous and so were discretized by WEKA.


attribute   no. of values  test cost  discount  group
a3          2              5.86631    5.86631   x
a8          2              14.6291    14.6291   x
a4          2              30.1854    6.03708   b
a6          3              39.8244    39.8244   x
a1          3              42.3829    18.2345   b
a5          2              47.2174    47.2174   x
a2          3              48.6759    48.6759   x
a9          1              52.5906    52.5906   x
a12         3              73.7248    49.5765   b
a10         3              74.856     74.856    x
a11         4              83.4669    83.4669   x
a13         4              84.78      84.78     x
a7          4              98.5054    98.5054   x
Total test costs            697.0051

This is a 3 class dataset with an even distribution, as shown:

Class   Frequency  Percent  Cumulative %
1       59         33.1     33.1
2       71         39.9     73.0
3       48         27.0     100.0
Total   178        100.0


A3 Details of the misclassification costs used in all experiments

A3.1 2-class datasets

cost matrix   1      2     3     4    5    6   7   8  9   10  11   12   13    14    15
class 1       1      1     1     1    1    1   1   1  10  50  100  500  1000  5000  10000
class 2       10000  5000  1000  500  100  50  10  1  1   1   1    1    1     1     1
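Each column of these tables defines one experimental cost matrix by giving the cost of misclassifying an example of each true class; for instance, matrix 1 above charges 1 for misclassifying a class-1 example and 10,000 for misclassifying a class-2 example. The sketch below shows one way such a column can be applied to a set of predictions, under the assumption (made here for illustration only) that correct classifications incur no misclassification cost, so that only test costs would apply to them.

```python
# Sketch: total misclassification cost of a set of predictions under one of
# the cost-matrix columns above. Assumes correct predictions cost 0; only the
# per-class misclassification costs listed in the table are used.
def misclassification_cost(true_labels, predicted_labels, class_cost):
    """class_cost: {class_label: cost of misclassifying an example of that class}."""
    return sum(class_cost[t] for t, p in zip(true_labels, predicted_labels) if t != p)

# Cost matrix 1 from A3.1: class 1 costs 1, class 2 costs 10000.
matrix_1 = {1: 1, 2: 10000}
print(misclassification_cost([1, 2, 2, 1], [1, 1, 2, 2], matrix_1))  # 10000 + 1 = 10001
```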

A3.2 3-class datasets

cost matrix   1    2    3    4   5   6   7    8    9
class 1       1    100  10   1   10  5   150  250  200
class 2       10   1    100  5   1   10  200  150  250
class 3       100  10   1    10  5   1   250  200  150

A3.3 4-class datasets

cost matrix   1    2    3    4    5   6   7   8   9    10   11   12
class 1       1    500  100  10   1   20  10  5   150  300  250  200
class 2       10   1    500  100  5   1   20  10  200  150  300  250
class 3       100  10   1    500  10  5   1   20  250  200  150  300
class 4       500  100  10   1    20  10  5   1   300  250  200  150

A3.4 5-class dataset: annealing

cost matrix   1      2      3      4      5      6    7    8    9    10   11    12    13    14    15
class 1       100    10000  5000   1000   500    150  350  300  250  200  1000  5000  4000  3000  2000
class 2       500    100    10000  5000   1000   200  150  350  300  250  2000  1000  5000  4000  3000
class 3       1000   500    100    10000  5000   250  200  150  350  300  3000  2000  1000  5000  4000
class 4       5000   1000   500    100    10000  300  250  200  150  350  4000  3000  2000  1000  5000
class 5       10000  5000   1000   500    100    350  300  250  200  150  5000  4000  3000  2000  1000


A3.5 5-class dataset: nursery

cost matrix   1     2     3     4     5     6   7   8   9   10  11   12   13   14   15
class 1       1     1000  500   100   10    1   50  20  10  5   150  350  300  250  200
class 2       10    1     1000  500   100   5   1   50  20  10  200  150  350  300  250
class 3       100   10    1     1000  500   10  5   1   50  20  250  200  150  350  300
class 4       500   100   10    1     1000  20  10  5   1   50  300  250  200  150  350
class 5       1000  500   100   10    1     50  20  10  5   1   350  300  250  200  150

A3.6 6-class dataset: glass

cost matrix   1     2     3     4     5     6     7   8   9   10  11  12  13   14   15   16   17   18
class 1       1     1000  500   100   50    10    1   70  50  20  10  5   150  400  350  300  250  200
class 2       10    1     1000  500   100   50    5   1   70  50  20  10  200  150  400  350  300  250
class 3       50    10    1     1000  500   100   10  5   1   70  50  20  250  200  150  400  350  300
class 4       100   50    10    1     1000  500   20  10  5   1   70  50  300  250  200  150  400  350
class 5       500   100   50    10    1     1000  50  20  10  5   1   70  350  300  250  200  150  400
class 6       1000  500   100   50    10    1     70  50  20  10  5   1   400  350  300  250  200  150

A3.7 15-class dataset: soybean

cost matrix   1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
class 1       1    700  650  600  550  500  450  400  350  300  250  200  150  100  50
class 2       50   1    700  650  600  550  500  450  400  350  300  250  200  150  100
class 3       100  50   1    700  650  600  550  500  450  400  350  300  250  200  150
class 4       150  100  50   1    700  650  600  550  500  450  400  350  300  250  200
class 5       200  150  100  50   1    700  650  600  550  500  450  400  350  300  250
class 6       250  200  150  100  50   1    700  650  600  550  500  450  400  350  300
class 7       300  250  200  150  100  50   1    700  650  600  550  500  450  400  350
class 8       350  300  250  200  150  100  50   1    700  650  600  550  500  450  400
class 9       400  350  300  250  200  150  100  50   1    700  650  600  550  500  450
class 10      450  400  350  300  250  200  150  100  50   1    700  650  600  550  500
class 11      500  450  400  350  300  250  200  150  100  50   1    700  650  600  550
class 12      550  500  450  400  350  300  250  200  150  100  50   1    700  650  600
class 13      600  550  500  450  400  350  300  250  200  150  100  50   1    700  650
class 14      650  600  550  500  450  400  350  300  250  200  150  100  50   1    700
class 15      700  650  600  550  500  450  400  350  300  250  200  150  100  50   1


A3.8 18-class dataset: krk

cost matrix   1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18
class 1       1    850  800  750  700  650  600  550  500  450  400  350  300  250  200  150  100  50
class 2       50   1    850  800  750  700  650  600  550  500  450  400  350  300  250  200  150  100
class 3       100  50   1    850  800  750  700  650  600  550  500  450  400  350  300  250  200  150
class 4       150  100  50   1    850  800  750  700  650  600  550  500  450  400  350  300  250  200
class 5       200  150  100  50   1    850  800  750  700  650  600  550  500  450  400  350  300  250
class 6       250  200  150  100  50   1    850  800  750  700  650  600  550  500  450  400  350  300
class 7       300  250  200  150  100  50   1    850  800  750  700  650  600  550  500  450  400  350
class 8       350  300  250  200  150  100  50   1    850  800  750  700  650  600  550  500  450  400
class 9       400  350  300  250  200  150  100  50   1    850  800  750  700  650  600  550  500  450
class 10      450  400  350  300  250  200  150  100  50   1    850  800  750  700  650  600  550  500
class 11      500  450  400  350  300  250  200  150  100  50   1    850  800  750  700  650  600  550
class 12      550  500  450  400  350  300  250  200  150  100  50   1    850  800  750  700  650  600
class 13      600  550  500  450  400  350  300  250  200  150  100  50   1    850  800  750  700  650
class 14      650  600  550  500  450  400  350  300  250  200  150  100  50   1    850  800  750  700
class 15      700  650  600  550  500  450  400  350  300  250  200  150  100  50   1    850  800  750
class 16      750  700  650  600  550  500  450  400  350  300  250  200  150  100  50   1    850  800
class 17      800  750  700  650  600  550  500  450  400  350  300  250  200  150  100  50   1    850
class 18      850  800  750  700  650  600  550  500  450  400  350  300  250  200  150  100  50   1


A4 Summary of the attributes in the results dataset

attribute            description
id                   numerical identifier
dataset              dataset name
ExpId                experiment identifier
attributes           number of attributes in dataset
maxValue             highest number of attribute values in a dataset
potPaths             potential unique bandit paths calculated for dataset and given depth
grpPotPaths          potential unique bandit paths values, grouped
discretize           whether discretized in WEKA
distribution         class distribution of dataset
multiclass           whether multi-class dataset or not
mcost                misclassification cost identifier
groupMiscost         which group the misclassification cost is in
high_mc              the highest misclassification cost in the matrix
descMcost            group which identifies relationship of misclassification to test costs
tot_mcost            total of misclassification costs in the matrix
tot_tcost            total of test costs for the dataset
highest_tc           the highest test cost for the dataset
mcosttcost           ratio of highest misclassification cost to highest test cost
grpRatio             ratio value, grouped
group                whether the test costs have groups or not
strat                what strategy has been used
model                how many models generated
label                which labelling system used
depth                what look-ahead depth used
oneVal               what strategy to use for an attribute with one value
PId                  what P category used (unique to dataset), as detailed in Table 11
P                    the actual value of P used, as detailed in Table 11
worthPruneCombo      what pre-pruning combination used
prop                 whether proportional stopping used
cprop                whether class proportional stopping used
dn_cost              'do nothing' cost to classify
dn_accuracy          'do nothing' accuracy rate
cost                 actual cost returned
accuracy             actual accuracy returned
trees                whether a tree was grown or not
costCat              category the cost value belongs to
accuracyCat          category the accuracy value belongs to
adjAccCat            category when allowing for n-class accuracy
bestCost             if this was the lowest cost for the matrix
bestAccuracy         if this was the highest accuracy for the matrix
altCost              if no model grown, next lowest cost with a model
altAcc               if no model grown, next highest accuracy with a model
costAccBest          if example is bestCost and bestAccuracy
altCostAccBest       if example is altCost and bestAccuracy
altCAcc              if example is altCost and altAcc
bestCostaltAccBest   if example is bestCost and altAcc
goodResults          combined class groups
class                class to which example has been assigned
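Two of the derived attributes above, dn_cost and mcosttcost, can be computed directly from a dataset's class distribution and cost information. The sketch below shows one plausible reading of these definitions, taking the 'do nothing' cost to be the average per-example cost of always predicting the single class that minimises total misclassification cost, without buying any tests; this reading is an assumption for illustration, not the exact implementation used to build the results file.

```python
# Sketch of two derived attributes in the results file (assumptions noted in comments):
# dn_cost    - 'do nothing' cost per example: classify everything as the one class
#              that minimises total misclassification cost, buying no tests.
# mcosttcost - ratio of highest misclassification cost to highest test cost.
def do_nothing_cost(class_counts, class_cost):
    """class_counts: {class: frequency}; class_cost: {class: misclassification cost}."""
    n = sum(class_counts.values())
    best = min(class_counts,
               key=lambda c: sum(class_cost[t] * class_counts[t]
                                 for t in class_counts if t != c))
    total = sum(class_cost[t] * class_counts[t] for t in class_counts if t != best)
    return total / n   # average cost per example of always predicting 'best'

def cost_ratio(class_cost, test_costs):
    return max(class_cost.values()) / max(test_costs.values())

# Breast cancer (A2.2) with 2-class cost matrix 1 (class costs 1 and 10000, hypothetical mapping):
counts = {'no-recurrence': 196, 'recurrence': 81}
costs = {'no-recurrence': 1, 'recurrence': 10000}
print(do_nothing_cost(counts, costs))            # always predict 'recurrence': 196/277 ≈ 0.71
print(cost_ratio(costs, {'invnodes': 93.0188}))  # 10000 / 93.0188 ≈ 107.5
```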


A5 Parameter settings: % frequencies of best and worst results (all examples, and trees only), plus frequency of trees not grown or grown

Parameter setting       best (examples)  best (trees)  worst (examples)  worst (trees)  NO TREE  TREE

strategy – the strategy used: lowest cost desired (cost), highest accuracy desired (accuracy), or single version (none)
  cost                  32.2             35.2          1.13              2.0            55.4     44.5
  accuracy              35.4             42.9          1.39              2.29           52       47.9
  none                  32.9             37.0          1.44              2.6            55       44.9

model – the number of models which will be generated (1 if single version)
  1                     32.9             37.0          1.44              2.6            55       44.9
  10                    34.1             39.8          1.28              2.2            53.5     46.4
  20                    22.2             37.5          0.27              0.46           40.8     59.1

label – HYBRID if the hybrid labelling system is used, COST if cost-sensitive labelling is used
  HYBRID                37.8             40.2          1.44              2.2            48.5     51.4
  COST                  29.2             37.1          1.21              2.4            59.8     40.1

oneVal – what to do if the chosen attribute has only one value
  stop                  33.0             37.4          1.27              2.2            53.8     46.1
  ignore                33.8             39.1          1.37              2.4            54       45.9
  nextBest              34.2             40.1          1.35              2.3            54       45.9

P lever pulls – using the potential unique bandit paths calculation as a guideline; id 3 is the actual value
  1                     33.3             38.5          1.29              2.2            53.8     46.2
  2                     33.8             38.9          1.33              2.3            54.2     45.8
  3                     33.9             39.2          1.33              2.3            54.3     45.7
  4                     34.0             39.2          1.31              2.2            54.4     21.5
  5                     32.9             38.0          1.45              2.5            53       47
  6                     78.3             78.3          0                 0              0        100
  7                     76.6             76.6          0                 0              0        100

pre-pruning option – implementing the Ling et al. (2004) method used before the start of the process / used when the attribute has been chosen; a combination of these two methods
  a: true/true          29.2             29.6          0.48              0              74.6     25.3
  b: true/false         29.9             26.8          0.46              1.2            68.7     31.2
  c: false/true         32.3             44.7          0.48              0.009          72.5     27.4
  d: false/false        43.4             43.4          3.9               3.9            0        100

prop – stop if the number of examples is below 6% of the original proportion of the training dataset, or the value as stated
  FALSE                 33.6             38.3          1.8               3.5            54       45.9
  TRUE                  34.5             40.3          0.65              0.92           53.9     46
  0.15                  0                0             8.1               10.2           54.8     45.1
  0.2                   0                0             7.6               8.9            54.9     45

cprop – stop if the number of examples in the majority class is 90% of the examples at the node, or the value as stated
  FALSE                 33.6             38.3          1.87              3.5            54       45.9
  TRUE                  34.5             40.3          0.65              0.92           53.9     46
  0.8                   0                0             7.87              9.6            54.9     45
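The prop and cprop rows refer to two simple stopping checks applied while a path is being grown: prop stops when the number of examples at a node falls below a stated proportion (6% by default, or the value shown) of the original training set, and cprop stops when a stated proportion (90% by default, or the value shown) of the examples at the node already belong to one class. The following is a minimal sketch of these two checks under those assumptions; thresholds and behaviour are taken from the row descriptions above, not from the implementation itself.

```python
# Sketch of the two proportional stopping checks referred to in the prop and
# cprop rows above (thresholds assumed from the row descriptions: 6% of the
# original training set for prop, 90% majority class for cprop).
from collections import Counter

def prop_stop(node_examples, n_training, threshold=0.06):
    """Stop if the node holds fewer than `threshold` of the original training examples."""
    return len(node_examples) < threshold * n_training

def cprop_stop(node_labels, threshold=0.90):
    """Stop if the majority class accounts for at least `threshold` of the node's examples."""
    if not node_labels:
        return True
    majority = Counter(node_labels).most_common(1)[0][1]
    return majority / len(node_labels) >= threshold

labels = ['a'] * 95 + ['b'] * 5
print(prop_stop(labels, n_training=1000))  # False: 100 examples is 10% of 1000
print(cprop_stop(labels))                  # True: 95% of examples are class 'a'
```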


APPENDIX D

The following documents are on the attached DVD and are named as indicated.

D1 Do nothing accuracy versus accuracy obtained by a number of classes

D2 Graphs showing datasets with misclassification costs reduced into groups

D3 Best results obtained for each dataset

D4 Comparing mean values obtained with values obtained by a given strategy

D5 Annotated graphs showing datasets processed using pruned versions of the cost-sensitive algorithms

D6 Annotated graphs showing datasets processed using un-pruned versions of the cost-sensitive algorithms

D7 Tables showing datasets processed using pruned versions of the cost-sensitive algorithms

D8 Tables showing datasets processed using un-pruned versions of the cost-sensitive algorithms

D9 AnalysisOfResultsFile.csv which contains the results from all the experiments
