Empirical Investigation of Decision Tree Extraction From

EMPIRICAL INVESTIGATION OF DECISION TREE EXTRACTION FROM NEURAL NETWORKS

A thesis presented to the faculty of the Fritz J. and Dolores H. Russ College of Engineering and Technology of Ohio University

In partial fulfillment of the requirements for the degree Master of Science

Maimuna H. Rangwala
June 2006

This thesis entitled EMPIRICAL INVESTIGATION OF DECISION TREE EXTRACTION FROM NEURAL NETWORKS by MAIMUNA H. RANGWALA has been approved for the Department of Industrial and Manufacturing Systems Engineering and the Russ College of Engineering and Technology by

Gary R. Weckman
Associate Professor of Industrial & Manufacturing Systems Engineering

R. Dennis Irwin
Dean, Fritz J. and Dolores H. Russ College of Engineering and Technology

ABSTRACT

RANGWALA, MAIMUNA H., M.S., June 2006. Industrial and Manufacturing Systems Engineering
EMPIRICAL INVESTIGATION OF DECISION TREE EXTRACTION FROM NEURAL NETWORKS (201 pp.)
Director of Thesis: Gary R. Weckman

The purpose of this thesis is to develop heuristics for employing Trepan, an algorithm for extracting decision trees from neural networks. Typically, several parameters need to be chosen to obtain satisfactory performance from the algorithm. The interactions between these parameters are currently not well understood. By empirically evaluating the performance of the algorithm on a test set of databases chosen from benchmark machine learning and real-world problems, several heuristics are proposed to explain and improve the performance of the algorithm. The experimentation is further validated by statistical performance measures. The algorithm is extended to work for multi-class regression problems, and its ability to comprehend generalized feedforward networks is investigated. This work thus provides improvements to the algorithm, an increased understanding of its behavior, and heuristics for choosing parameters that yield better performance.

Approved: Gary R. Weckman
Associate Professor of Industrial and Manufacturing Systems Engineering

Dedicated to my Father Prof. H. T. RANGWALA (1942-2006) and my Sister Fatema Rangwala (1988-2006)

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. INTRODUCTION
  1.1 MACHINE LEARNING
  1.2 CLASSIFICATION ALGORITHMS
  1.3 RESEARCH OBJECTIVES
  1.4 THESIS OVERVIEW
CHAPTER 2. BACKGROUND AND LITERATURE REVIEW
  2.1 ARTIFICIAL NEURAL NETWORKS
    2.1.1 Neural Network Architecture
    2.1.2 Neural Network Training
    2.1.3 Neural Networks for Classification and Regression
    2.1.4 Rule Extraction from Neural Networks
  2.2 DECISION TREES
    2.2.1 Decision Tree Classification
    2.2.2 Decision Tree Applications
  2.3 C4.5 ALGORITHM
    2.3.1 Information Gain, Entropy Measure and Gain Ratio
  2.4 TREPAN ALGORITHM
    2.4.1 M-of-N Splitting Tests
    2.4.2 Single Test TREPAN and Disjunctive TREPAN
CHAPTER 3. METHODOLOGY
  3.1 PHASE 1
  3.2 PHASE 2
    3.2.1 Datasets
    3.2.2 Neural Network Modeling
  3.3 PHASE 3
  3.4 PHASE 4
  3.5 PERFORMANCE MEASURES
    3.5.1 Classification Accuracy
    3.5.2 Comprehensibility
CHAPTER 4. RESULTS AND DISCUSSION
  4.1 INVESTIGATE AND EXTEND TREPAN
  4.2 DATASET ANALYSIS
    4.2.1 Corrosion
    4.2.2 Outages
    4.2.3 Iris
    4.2.4 Body Fat
    4.2.5 Saginaw Bay
    4.2.6 Admissions
CHAPTER 5. CONCLUSIONS AND FUTURE RESEARCH
  5.1 SUMMARY AND DISCUSSION
    5.1.1 Accuracy
    5.1.2 Comprehensibility
  5.2 HEURISTICS
  5.3 CONCLUSIONS
  5.4 FUTURE RESEARCH
REFERENCES
APPENDIX A: WEIGHTS AND NETWORK FILE FORMATS (GFF)
APPENDIX B: CORROSION RESULTS
APPENDIX C: OUTAGES RESULTS
APPENDIX D: IRIS RESULTS
APPENDIX E: BODY FAT RESULTS
APPENDIX F: SAGINAW BAY RESULTS
APPENDIX G: ADMISSIONS RESULTS

LIST OF TABLES

Table 2.1: Activation Functions Used in Neural Networks
Table 2.2: Play Tennis Example Dataset
Table 3.1: Dataset Summary
Table 3.2: Iris -- Sample Dataset
Table 3.3: Body Fat -- Sample Dataset
Table 3.4: Body Fat Class Labels
Table 3.5: Saginaw Bay -- Sample Dataset
Table 3.6: Chlorophyll Level Class Labels
Table 3.7: Corrosion -- Sample Dataset
Table 3.8: Corrosion Class Labels
