UTILIZING DISTRIBUTIONS OF VARIABLE INFLUENCE FOR FEATURE SELECTION IN HYPERSPECTRAL IMAGES

by

Neil Stewart Walton

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science

MONTANA STATE UNIVERSITY
Bozeman, Montana

April 2019

© COPYRIGHT by Neil Stewart Walton 2019. All Rights Reserved.

DEDICATION

To my wonderful, supportive parents for encouraging me every step of the way, to my fantastic friends for keeping me sane and smiling through it all, and to my enigmatic brother for reasons that are as of yet still unclear to me.

ACKNOWLEDGEMENTS

I would like to thank the members of the Numerical Intelligent Systems Laboratory at Montana State University, in particular Jacob Senecal, with whom I collaborated on much of the research included herein, and Amy Peerlinck and Jordan Schupbach for many enlightening conversations; the members of my thesis committee, Dr. David L. Millman and Dr. Joseph A. Shaw; the members of the Optical Technology Center at Montana State University, especially Bryan Scherrer and Riley Logan; and, above all, my advisor and committee chair, Dr. John W. Sheppard. I would also like to thank Intel Corporation for providing funding for much of this research.

TABLE OF CONTENTS

1. INTRODUCTION
2. BACKGROUND
   2.1 Feature Selection
       2.1.1 Applications to Hyperspectral Data
   2.2 Hyperspectral Imaging
       2.2.1 Overview
       2.2.2 Multispectral Imaging
       2.2.3 Hyperspectral Produce Monitoring
   2.3 Genetic Algorithm
       2.3.1 General Formulation
       2.3.2 Fitness Function
       2.3.3 Selection
       2.3.4 Crossover
       2.3.5 Mutation
       2.3.6 Replacement
   2.4 Artificial Neural Networks
   2.5 Decision Trees
       2.5.1 Random Forests
3. GENETIC ALGORITHM BASED FEATURE SELECTION
   3.1 Related Work
   3.2 Genetic Algorithm Formulation
       3.2.1 Representation
       3.2.2 Fitness Function
   3.3 Histogram-based Approach
       3.3.1 Overview
       3.3.2 Hierarchical Agglomerative Clustering
       3.3.3 Expectation-Maximization
       3.3.4 Selecting Features
       3.3.5 The HAGRID Algorithm
   3.4 Experimental Setup and Methods
       3.4.1 Hyperspectral Imaging and Staging
       3.4.2 Data
           3.4.2.1 Avocado and Tomato Dataset
           3.4.2.2 Kochia Dataset
           3.4.2.3 Indian Pines Dataset
       3.4.3 Experimental Design
       3.4.4 Parameter Settings
   3.5 Experimental Results
   3.6 Conclusions and Future Work
4. BANDWIDTH DETERMINATION
   4.1 Specifying Bandwidths with HAGRID
       4.1.1 Transforming Data
   4.2 Experimental Setup
   4.3 Experimental Results
   4.4 Conclusions and Future Work
5. VARIABLE INFLUENCE ALGORITHMS
   5.1 Distribution-based Feature Selection using Variable Importance
       5.1.1 Mean Decrease Accuracy
       5.1.2 Mean Decrease Gini
       5.1.3 Related Work
       5.1.4 Multicollinearity in Hyperspectral Data
       5.1.5 VI-DFS
   5.2 Automated Determination of Feature Subset Size
   5.3 Layer-wise Relevance Propagation for Feature Selection
       5.3.1 Gradient-based Explanation
       5.3.2 Layer-wise Relevance Propagation
       5.3.3 Applications
       5.3.4 LRP-DFS
   5.4 Experimental Setup
   5.5 Experimental Results
   5.6 Conclusions and Future Work
6. CONCLUSION
REFERENCES CITED
APPENDIX: Wavelength and Bandwidth Settings
LIST OF TABLES

Table 3.1  Sample sizes for the produce datasets, showing the number of each fruit and the number of samples per fruit.

Table 3.2  Classification accuracy using RGB, RGB+NIR, all wavelengths, and feature selection for the avocado and tomato datasets. The best accuracy for each dataset is shown in bold.

Table 3.3  Classification accuracies for the kochia dataset. The best results are shown in bold.

Table 3.4  Classification accuracies for the Indian Pines dataset. The best results (excluding all wavelengths) are shown in bold.

Table 4.1  Wavelength centers and bandwidths for the red, blue, green, and near-infrared filters used in experiments.

Table 4.2  Classification accuracy for the avocado and tomato datasets using various wavelength centers and simulated filter bandwidths. The best results for each dataset are shown in bold.

Table 4.3  Classification accuracy for the kochia dataset using various wavelength centers and simulated filter bandwidths. The best results are shown in bold.

Table 5.1  Wavelengths selected by VI-DFS, rank-based feature selection (VI-R), forward selection (VI-FS), and backward elimination (VI-BE) for the avocado and tomato datasets. The number of features for VI-DFS is selected automatically using BIC, and the other methods were made to select that same number of features.

Table 5.2  Results for the VI-DFS, LRPZ+-DFS, and LRPZB-DFS algorithms for the avocado and tomato datasets. The highest accuracy attained for each dataset is shown in bold.