Inductive Logic Programming with Gradient Descent for Supervised Binary Classification

by Nicholas Wu

B.S., Massachusetts Institute of Technology (2019)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, February 2020.

© Massachusetts Institute of Technology 2020. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, February 2020

Certified by: Andrew W. Lo, Charles E. and Susan T. Harris Professor, Sloan School of Management, Thesis Supervisor

Accepted by: Katrina LaCurts, Chairman, Master of Engineering Thesis Committee

Submitted to the Department of Electrical Engineering and Computer Science in February 2020, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

As machine learning techniques have become more advanced, interpretability has become a major concern for models making important decisions. In contrast to Local Interpretable Model-Agnostic Explanations (LIME), this thesis seeks to develop an interpretable model using logical rules, rather than explaining existing blackbox models. We extend recent inductive logic programming methods developed by Evans and Grefenstette [3] to develop a gradient descent-based inductive logic programming technique for supervised binary classification. We start by developing our methodology for binary input data, and then extend the approach to numerical data using a threshold-gate based binarization technique. We test our implementations on datasets with varying pattern structures and noise levels, and select our best performing implementation. We then present an example where our method generates an accurate and interpretable rule set, whereas the LIME technique fails to generate a reasonable model. Further, we test our original methodology on the FICO Home Equity Line of Credit dataset. We run a hyperparameter search over differing numbers of rules and rule sizes. Our best performing model achieves a 71.7% accuracy, which is comparable to multilayer perceptron and random forest models. We conclude by suggesting directions for future applications and potential improvements.

Thesis Supervisor: Andrew W. Lo Title: Charles E. and Susan T. Harris Professor, Sloan School of Management

Acknowledgments

My journey through MIT hasn’t been linear, and there have been so many twists and turns along the way. As such, there are so many people I have to thank for their guidance, advice, and support. To start, I extend my heartfelt gratitude to Professor Andrew Lo for all his help through this thesis. I am incredibly grateful for all the guidance he has provided for me along the way. Whenever I had questions, whenever I seemed to get stuck or confused, Professor Lo would always offer a different angle or another way forward. His mentorship throughout the entire research process helped make this thesis possible. I also recognize the help I received from the staff at the Laboratory for Financial Engineering. I sincerely thank Jayna Cummings, Crystal Myler, and Mavanee Nealon, for all their behind-the-scenes work coordinating meetings and making sure my research process went smoothly. Additionally, I want to thank some of my close friends and classmates who have always been listening and helping me along my way throughout MIT. I’ve been helped by so many people, whether it be inspiring me, listening to my ideas, providing general advice, or teaching me things I don’t know. To that end, I want to thank Alap Sahoo, Luis Sandoval, Haris Brkic, Evan Tey, Justin Yu, and Henry La Soya for all they’ve done to help me through this thesis, and especially for being good friends to me throughout this time. I also want to thank Jenny Shi for all her support over the past two years. From late-night proofreading, motivating me to work, and getting me past mental stumbling blocks on my research path, she has done so much to help me get here. Lastly, I have to thank my parents, Daniel and Li, and my sister, Jackie, for their continual support throughout everything in the past four and a half years. Throughout this entire time, my family has stood by me, helping me through all my struggles and celebrating my successes. Without them, I likely would never have even gotten the opportunity to come to MIT, and I extend my greatest thanks for their constant support.

Contents

1 Introduction
  1.1 Research Goals
  1.2 Thesis Structure and Result Summary
    1.2.1 Chapter 2
    1.2.2 Chapter 3
    1.2.3 Chapter 4
    1.2.4 Chapter 5
    1.2.5 Chapter 6
    1.2.6 Chapter 7

2 Inductive Logic Programming
  2.1 Notation and Definition
  2.2 Early methods for Inductive Logic Programming
    2.2.1 RLGG refinements
    2.2.2 Top-down approaches
    2.2.3 Inverse Entailment
  2.3 Recent Approaches
    2.3.1 Boolean Satisfiability Reduction
    2.3.2 Learning Process
    2.3.3 Approach Results

3 Explainable Artificial Intelligence
  3.1 Why Explainability?
  3.2 Blackbox Interpretability Methods
    3.2.1 LIME
    3.2.2 Gradient Approximation
    3.2.3 General Approach Shortcomings
  3.3 Inductive Methods versus Regression

4 Model Development
  4.1 Approximating Logical Structures
    4.1.1 Parametrization
    4.1.2 Implementing AND
    4.1.3 General Learning Procedure
  4.2 Experiments
    4.2.1 Data Construction
    4.2.2 Model Comparisons
  4.3 Discussion

5 Inducing Rules on Numerical Data
  5.1 Approach Details
    5.1.1 Feature Expansion
    5.1.2 Interpretability
  5.2 Experiments
    5.2.1 Experimental Results
  5.3 Discussion
    5.3.1 Comparison to LIME

6 FICO Home Equity Line of Credit Tests
  6.1 Dataset
    6.1.1 Weight-of-Evidence Encoding
  6.2 Binary Data Rule Learning
    6.2.1 Experiments
  6.3 Numerical Data Rule Learning
    6.3.1 Results and Analysis
  6.4 Discussion
    6.4.1 Comparisons to Related Work
    6.4.2 Generalizability
    6.4.3 Approach Shortcomings

7 Conclusion and Next Steps
  7.1 Key Ideas
  7.2 Future Work

A Tables

List of Figures

4-1 Relation between number of features and model performance
4-2 Relation between number of rules and model performance
4-3 Relation between rule size and model performance
4-4 Relation between noise level and model performance

5-1 Relation between error rate and numerical model performance
5-2 Relation between number of rules and numerical model performance
5-3 Relation between rule size and numerical model performance

6-1 Example Training Run, Plotting Accuracy and Loss

List of Tables

4.1 List of all constructed dataset configurations for binary data rule learning
4.2 Model Run Results, Constructed Dataset 1

5.1 List of all constructed dataset configurations for numerical rule learning
5.2 LIME Coefficients around test point

6.1 Rules for learning non-creditworthiness
6.2 Rules for learning creditworthiness
6.3 Rules for learning non-creditworthiness, numerical data
6.4 Rules for learning creditworthiness, numerical data
6.5 Comparison between our descent-based inductive logic programming and other models

A.1 Model Run Results, Varying Number of Features
A.2 Model Run Results, Varying Number of Rules
A.3 Model Run Results, Varying Rule Size
A.4 Model Run Results, Varying Noise Level
A.5 Numerical Model Run Results, Varying Noise Level
A.6 Numerical Model Run Results, Varying Number of Rules
A.7 Numerical Model Run Results, Varying Rule Size
A.8 Hyperparameter search, predicting non-creditworthiness with HELOC Binarized Data
A.9 Hyperparameter search, predicting creditworthiness using HELOC Binarized Data

A.10 Hyperparameter search, predicting non-creditworthiness using HELOC Numerical Data
A.11 Hyperparameter search, predicting creditworthiness using HELOC Numerical Data

Chapter 1

Introduction

In the past decade, the proliferation of machine learning and artificial intelligence techniques has allowed such models to outperform many traditional methods for a myriad of applications, from image processing to natural language processing. However, many successful machine learning models function as blackboxes, largely because these models frequently compute extremely intricate functions in which none of the model’s parameters has an intuitive meaning. Further, as models become more complex, the number of parameters in a model such as a deep neural network can exceed several million, making it difficult to understand model behavior. As such, there has been interest in producing machine learning models that are explainable to humans without sacrificing accuracy. One of the interesting approaches to developing explainable artificial intelligence comes from the field of inductive logic programming. Inductive logic programming deals with the development of a hypothesis that logically entails a set of background examples. This approach addresses the explainability issue explicitly in that any logic program consists of a set of Horn clauses, which are explainable as logical rules.

1.1 Research Goals

This research seeks to extend recent developments in the field of inductive logic programming in order to adapt inductive logic programming methods to the general task of supervised binary classification. That is, given some target binary label and some input features, we wish to develop a method to learn a logic program that predicts the label from the given inputs, such that the resulting logic program has the following properties:

1. Accuracy: the logic program should fit the training data relatively accurately. That is, the logic program should predict the correct class for a high fraction of the training examples.

2. Generalizability: the logic program should accurately classify examples not seen before.

3. Interpretability: the rules of the logic program should be easily understandable to a human observer.

1.2 Thesis Structure and Result Summary

We outline the content in the chapters of this thesis, and summarize key results.

1.2.1 Chapter 2

In this chapter, we provide a brief overview of the field of inductive logic programming and explain the different inductive logic approaches. We specifically highlight the 2018 paper by Evans and Grefenstette [3] in order to set up our extension of their methodology.

1.2.2 Chapter 3

In this chapter, we discuss the importance of interpretability in artificial intelligence. We highlight important motivations for interpretability, and cite examples where traditional machine learning methods fail to achieve these goals. We then present a literature survey in explainable artificial intelligence, specifically regarding local explanations for blackbox models. We present the LIME methodology, and then discuss

a recent extension of the LIME method that utilizes inductive logic programming. We also briefly discuss gradient interpretation as a method of analyzing blackbox models, before closing with a discussion of the relative flaws in interpreting a blackbox model through approximation.

1.2.3 Chapter 4

We begin this section by presenting the formalization of our task. The subsequent parts of this section discuss our original methodologies for performing inductive rule learning. We describe our general approach for forming logical rules, the various implementation strategies, and then provide experimental data over constructed datasets to examine the practicality of the methods given various levels of rule size, rule quantity, and noise.

1.2.4 Chapter 5

In this section, we describe how to extend the methods developed in Chapter 4 to handle non-binary input data. Specifically, we describe how to potentially learn logical thresholds for numerical data. We analyze the efficacy of these methods for extending inductive logic onto continuous numerical data by training models on constructed numerical datasets, and report on these results. We finish this chapter by discussing an example where our inductive logic programming model is able to generate reasonable explanations, but the LIME technique fails to generate a feasible global approximation.

1.2.5 Chapter 6

For this section, we apply the cumulative methods developed in the two previous chapters to a real-world dataset. We briefly describe the Home Equity Line of Credit (HELoC) dataset. We first explore utilizing a binarized approach over the Weight-of-Evidence (WoE) based cutoffs, and then explore applying the methods from Chapter 5 to the original numerical data to manually learn cutoffs. We lastly discuss the

approximation breakdown phenomenon, and suggest potential future directions for addressing the issue.

1.2.6 Chapter 7

In the final chapter, we discuss concluding remarks and recapitulate the main results of the thesis. We provide some ideas for future directions, including potential methodology expansions, alternative implementations, and different applications of the methods discussed.

Chapter 2

Inductive Logic Programming

We will start by defining logical induction and inductive logic programming, and discuss current background in this area. We also introduce the important precursor work in inductive logic programming which we will extend in later chapters. We will not expand upon every inductive logic programming development in detail; instead, we seek to provide a higher-level overview of the field. Induction refers to the general ability to learn a conclusion from some examples; for example, we might observe many different material compositions and colors, and eventually inductively reason that certain pigments induce certain colors on the materials. In the framework of logical induction, we intend to learn a conclusion in the form of some logical statements. Thus, we begin by introducing exactly what we mean by logical statements.

2.1 Notation and Definition

First, we will utilize the following symbols:

∙ ∨ to denote logical OR

∙ ∧ to denote logical AND

∙ ¬푎 to denote the logical inverse of 푎 (i.e. NOT 푎)

In general, the standard form of a logical rule takes the form of a Horn clause, which we define as:

Definition 2.1. A Horn clause is a logical rule of the form:

$$a_1 \land a_2 \land \cdots \land a_i \rightarrow k$$

where each of the $a_j$ is a positive or negative literal.

Note that atomic facts can also be expressed as Horn clauses. For example,

푖푠퐺푟푒푒푛(grass)

also can be expressed as → 푖푠퐺푟푒푒푛(grass)

Finally, we present our entailment notation. Let ⊨ denote entailment. That is, for two sets of Horn clauses 퐴 and 퐵, 퐴 ⊨ 퐵 implies that every rule or fact in 퐵 can be logically derived from the elements of 퐴. Having presented our background and formalism, we are now ready to formally define the inductive logic programming problem.

Definition 2.2. Inductive Logic Programming (ILP): Given a set of background knowledge 퐵 consisting of logical rules and facts, and a set 퐸 of logical facts, compute a set of Horn clauses 퐻 such that 퐻 ∧ 퐵 ⊨ 퐸. We call 퐻 the hypothesis.

2.2 Early methods for Inductive Logic Programming

Many of the earlier methods for performing inductive logic programming involved generating a search space of possible hypotheses, and processing through these hypotheses in some order (i.e. from most general to most specific). In general, all of these methods take this underlying approach, with refinements in managing the search space of hypotheses.

The early foundation for inductive logic programming was laid by Plotkin in his PhD thesis in 1971 [12]. Plotkin’s method of relative least general generalization (RLGG) proceeds as follows:

1. For every positive example, generate a rule by treating all the background knowledge as the body of the rule, and the example as the target of the rule. This step is called relativization.

2. Replace all specific objects with variables, and unify similar rules. This step is referred to as anti-unification.

3. Delete extraneous literals that do not involve the given variables.

As proposed, the relative least general generalization (RLGG) approach has many flaws. For example, RLGG fails to learn rules that involve additional internal variables. To see this, consider the grandparent relation 푔(푥, 푦). In terms of a parent relation 푝, grandparent can be defined as 푔(푥, 푦) = 푝(푥, 푧) ∧ 푝(푧, 푦). Plotkin’s method fails to learn this since it discards the internal variable 푧 as irrelevant. Further, the size of the relative least general generalization in Plotkin’s method can grow exponentially in the size of the background knowledge, quickly making this method intractable for large datasets.

2.2.1 RLGG refinements

In 1990, Muggleton and Feng extended Plotkin’s method by creating Golem to address some of these issues. By imposing some determinacy restrictions on background knowledge and the hypothesis space, the Golem method for ILP polynomially bounds the size of the RLGG, and adapts the learning methodology for larger datasets [10]. In 2009, Muggleton et al. extended Golem further to address some of its shortcomings, producing ProGolem [11]. The determinacy restrictions, being essential to the polynomial bound on the size of the RLGG in Golem, made Golem inapplicable to several key datasets. ProGolem utilizes the Asymmetric Relative Minimal Generalization (ARMG) rather than the determinate relative least general generalization (RLGG) to produce hypotheses.

2.2.2 Top-down approaches

Another class of ILP methods searches the hypothesis space from the top down; they generate a rule that is too broad, explaining all the positive examples, and then refine the rule to cut out negative examples. One major class of approaches follows the first-order inductive learner (FOIL) method developed by Quinlan in 1990 [13]. To learn a rule, the Horn clause starts with an empty body, and literals are appended to the body in an order computed according to some score, refining the rule until it no longer covers any negative examples. As with the earlier generalization-based methods, FOIL becomes intractable for larger datasets. To address this limitation, in 2014, Zeng et al. developed an extension called QuickFOIL by implementing a different scoring function for adding literals to the body of a rule, and a refined pruning strategy for evaluating candidate rules [18].
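To make the refinement loop concrete, here is a minimal propositional sketch of FOIL-style greedy rule learning. It is an illustration under simplifying assumptions, not Quinlan's algorithm: features are binary, and literals are scored by simple positive-example purity rather than FOIL's information gain.

```python
def learn_rule(examples):
    """Greedily add literals until the rule covers no negative examples.

    examples: list of (features dict, label) pairs with binary values.
    Returns the rule body as a list of (feature, required value) literals.
    """
    body, covered = [], examples
    features = list(examples[0][0])
    while any(label == 0 for _, label in covered):
        best = None
        for f in features:
            for v in (0, 1):
                if (f, v) in body:
                    continue
                sub = [(x, t) for x, t in covered if x[f] == v]
                if not sub:
                    continue
                purity = sum(t for _, t in sub) / len(sub)
                if best is None or purity > best[0]:
                    best = (purity, (f, v), sub)
        if best is None:
            break  # no literal can be added; give up on a pure rule
        body.append(best[1])
        covered = best[2]
    return body

examples = [({"a": 1, "b": 1}, 1), ({"a": 1, "b": 0}, 0),
            ({"a": 0, "b": 1}, 0), ({"a": 0, "b": 0}, 0)]
print(learn_rule(examples))  # [('a', 1), ('b', 1)], i.e. a AND b -> target
```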

2.2.3 Inverse Entailment

In contrast to approaches that search the hypothesis space from most general to most specific, there are other bottom-up approaches that try hypotheses from most specific to most general; specifically, Cigol (“logic” spelled backwards) uses this methodology [9]. However, the more successful refinement of this approach evolved in 1995 from a different observation about logical induction [8]. The learning system Progol, developed by Muggleton, relies on the following observation: given that we wish to derive a hypothesis 퐻 such that 퐵 ∧ 퐻 ⊨ 퐸 for the background knowledge 퐵 and the examples 퐸, it follows that 퐵 ∧ ¬퐸 ⊨ ¬퐻 (where ¬퐴 denotes the logical complement of 퐴). Practically, this results in the following algorithm:

1. Pick a positive example as a seed.

2. Start with the most specific possible rule that could explain this example, and generate the space of possible generalizations.

3. Search this space. In the case of Progol, an 퐴* search is used.

4. Repeat until all positive examples are covered.

2.3 Recent Approaches

While all of these approaches have remained subjects of recent research, a fundamental hindrance to many of them lies in their sensitivity to error. Although some adaptations of the previously mentioned approaches manually allow for some amount of error tolerance, a DeepMind publication in 2018 produced a more organic method of handling error tolerance by utilizing a gradient descent-based approach to learning rules. Termed 휕ILP, this approach treats the rule-learning process as a loss minimization problem solvable through gradient descent. The main part of this thesis extends ideas from this approach rather than the earlier inductive logic programming approaches.

2.3.1 Boolean Satisfiability Reduction

The first key note is that we can interpret the inductive logic programming task as a boolean satisfiability problem. That is, if all the possible rules for defining a predicate are enumerated, we simply have to learn a true/false indicator for each of these rules as to whether the rule defines the given predicate. Specifically, to get a complete description, [3] shows that any predicate can be expressed in terms of two rules with at most two predicates each,

푎 ∧ 푏 → 푐 where one or more of the literals in the body of the rule may be “invented”, and the rule definition of such an “invented” predicate can also be learned. Hence, the approach used by 휕ILP enumerates all pairs of possible rules, and learns a probability distribution over which pair of rules generates the best theory.

2.3.2 Learning Process

In order to learn which rules are correct, 휕ILP performs a fixed number 푁 of steps of fuzzy logical reasoning from the background knowledge to obtain a truth assessment of the relevant target predicate for the given examples. By fuzzy logic, we refer to the extension of logical operators to values in the continuous [0, 1] interval. Hence, the overall approach starts with an example, attempts to deduce the target predicate from the other features, and forms a probabilistic estimate, which is used to compute a binary cross entropy loss to minimize. Specifically, for every potential pair of defining rules, a fixed logical reasoning function is generated to approximate the process of logical reasoning with those rules. One step of forward reasoning then takes the non-target information as input, applies each pair of rules using the corresponding reasoning function, and unifies the resulting valuations according to the probability distribution over the rule pairs.

2.3.3 Approach Results

This approach achieves performance improvement over other statistical reasoning-based methods and comparable performance to inductive logic methods, but particularly provides the benefit of being fault-tolerant to data mislabeling. The approach developed in the paper additionally outperformed multilayer perceptron models at inductive tasks; in general, the multilayer perceptron could not generalize as well as 휕ILP [3]. This improved generalizability and explainability motivates the extension of 휕ILP as the basis for our research.

Chapter 3

Explainable Artificial Intelligence

In general, the notion of explainability or interpretability is subject to many different ideas on exactly how it should be defined. As [2] notes, there are two broad approaches across the field. The first approach is to demonstrate usefulness under a specific application; for example, previous work evaluated the practicality of the Local Interpretable Model-Agnostic Explanation (LIME) technique on the Home Equity Line of Credit (HELoC) dataset [7]. The alternative assumes that a certain type of model satisfies interpretability, and describes how to optimize such a model.

3.1 Why Explainability?

Before we review some recent approaches and literature around interpretability in machine learning, we first explore why interpretability provides any value. Specifically, interpretability provides value by allowing humans to sanity-check models for:

1. Security: Without any understanding of the internal mechanism of a machine learning model, the model can easily fall prey to specifically engineered attacks. As noted in [16], intricate deep neural networks for computer vision are extremely susceptible to one-pixel attacks; that is, modifying the value of a single pixel can induce an algorithm to falsely label an image. With some measure of interpretability, such a security vulnerability can be protected against.

2. Ethics: Often, the human notion of fairness is hard to judge with a machine learning model. For example, in the HELoC case, part of the interest in interpretability stems from the fairness of rejecting any loan application; specifically, a loan provider would have to explain any rejected loan application. In other cases, biased datasets may induce implicit biases in the models trained on them, which might negatively affect certain groups of people.

3. Multiple objectives: Typically, machine learning models are trained to minimize a loss function. However, loss functions cannot always account for phenomena in practice; although a machine learning algorithm may optimize for one objective, it may not account for others. For example, using FICO credit data, Hardt et al. demonstrated that optimizing for nondiscrimination and accuracy are not always compatible [5].

3.2 Blackbox Interpretability Methods

In this section, we introduce various methods for generating interpretable explanations from an existing blackbox model. These explanations all approximate the blackbox model in the local vicinity of some point of interest 푥.

3.2.1 LIME

One of the main approaches to interpretability is the method of Local Interpretable Model-Agnostic Explanations, proposed in 2016 as a method for understanding existing blackbox models [14]. The general LIME algorithm proceeds as follows: given a blackbox function 푓 and a focal point 푥,

1. Randomly generate 푆, a set of perturbed points sampled in some neighborhood of 푥.

2. Compute 푓(푆) = {푓(푠) | 푠 ∈ 푆}.

3. Perform a linear regression of 푓(푆) on the points in 푆; i.e. minimize the mean squared error between a linear function of the inputs and 푓(푆).

As [7] states, LIME does not require the underlying blackbox model to be differentiable, but it inherits the shortcomings of linear regression, especially susceptibility to correlations between input features.
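As an illustration of the three steps above, here is a minimal sketch of the local-regression procedure, assuming a blackbox callable f that maps a batch of points to scores; note that the full LIME method also weights samples by proximity to 푥 and fits a sparse model, both of which this sketch omits.

```python
import numpy as np

def lime_explain(f, x, n_samples=1000, radius=0.1, seed=0):
    """f: blackbox mapping an (m, n) array to (m,) scores; x: (n,) point."""
    rng = np.random.default_rng(seed)
    # 1. Sample perturbed points in a neighborhood of x.
    S = x + radius * rng.standard_normal((n_samples, x.shape[0]))
    # 2. Evaluate the blackbox on the perturbed sample.
    fS = f(S)
    # 3. Fit a least-squares linear model to (S, f(S)).
    A = np.hstack([S, np.ones((n_samples, 1))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, fS, rcond=None)
    return coef[:-1], coef[-1]  # (feature weights, intercept)
```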

LIME-FOLD

In an extension of the original LIME method proposed in [14], another approximation-based approach performs an inductive logic programming induction over the perturbed sample, rather than the linear regression of LIME [15]. Specifically, given a function 푓 and focal point 푥, LIME-FOLD proceeds as follows:

1. Randomly generate 푆, a set of perturbed points sampled in some neighborhood of 푥.

2. Compute 푓(푆) = {푓(푠)|푠 ∈ 푆}.

3. Utilize the FOLD inductive logic programming algorithm (derived from FOIL [13]) to approximate rules governing 푓 around 푥.

While this retains many of the benefits of LIME in that it is model-agnostic, it also avoids some of the pitfalls of linear regression. However, this approach shares the overall shortcomings of blackbox approximation approaches, such as the inability to discover global structure.

3.2.2 Gradient Approximation

Another blackbox function approximation approach computes the gradient of the model at a point of interest, and uses the gradient vector to guide explanation [1]. Termed the explanation vector in 2010 by Baehrens et al., the gradient of the model at a given point near a decision boundary points towards a direction that induces a classification change. However, this method does require the underlying blackbox model to be differentiable everywhere. Further, it is possible that gradients may be deceptively small near certain inputs [17], and thus the gradient vector may not provide a good explanation.

3.2.3 General Approach Shortcomings

All of the blackbox interpretability methods discussed rely on approximating the blackbox function near a focal point. While this does generate potentially reasonable explanations near the decision boundary, these methods struggle to interpret global structure in the blackbox functions. To avoid this flaw, we take a different approach to interpretability; specifically, we impose an inductive bias on the model itself in order to maintain interpretability.

3.3 Inductive Methods versus Regression

Logical rules have an advantage over regression models in that they formalize feature interactions; that is, relationships between inputs. Consider a theoretical example, where we have binary input variables 푥1 and 푥2 determining the behavior of a binary output variable 푦. Suppose the presence of either input determines 푦; we can easily express this as 푥1 ∨ 푥2 → 푦. However, assuming we have an approximately equal number of observations of (푥1, 푥2, 푦) for all possible pairs (푥1, 푥2), the linear regression of 푦 on

푥1 and 푥2 yields coefficients of approximately 0.5 each for 푥1 and 푥2, which remains oblivious to the interaction between 푥1 and 푥2. We could, of course, include 푥1푥2 as an interaction term in our regression; the resulting regression would yield the model 푥1 + 푥2 − 푥1푥2. However, this implicitly requires an “a priori” intuition that there exists a feature interaction between 푥1 and 푥2; in models with many parameters, there are an exponential number of potential feature interactions, and including them all in the regression would result in an exponential number of regression variables, which quickly becomes infeasible. The ability to perform logical

induction, and generate rules programmatically, provides a supplement to regression techniques, especially in terms of teasing out feature interactions, without sacrificing anything in terms of interpretability.

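The claim about the coefficients is easy to verify numerically; here is a short sketch with a balanced design matrix, where the values printed are assumptions of this example rather than reported results from the thesis.

```python
import numpy as np

# y = x1 OR x2 over all four balanced (x1, x2) pairs.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 1.])

A = np.hstack([X, np.ones((4, 1))])            # plain regression + intercept
print(np.linalg.lstsq(A, y, rcond=None)[0])    # ~[0.5, 0.5, 0.25]

A_int = np.hstack([X, X[:, :1] * X[:, 1:], np.ones((4, 1))])
print(np.linalg.lstsq(A_int, y, rcond=None)[0])  # ~[1, 1, -1, 0]
```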
Chapter 4

Model Development

In this chapter, we present our investigation into original methodologies for performing logical induction-based supervised learning on binary data and analyze the accuracy and applicability of these approaches. Formally, we will suppose that we have an input collection 푆 consisting of 푛-dimensional vectors 푥 ∈ {0, 1}푛, possibly repeated, and a label 푦 ∈ {0, 1} for each vector in our collection. Under this supervised learning context, we suppose that we have access to these labels, and we wish to learn a set of consistent rules 푅* such that logical deduction from these rules recovers the labels with as high accuracy as possible.

For the sake of clarity throughout this chapter, we reiterate the definition of the softmax function. For some vector $X \in \mathbb{R}^n$,

$$\mathrm{softmax}(X)_i = \frac{e^{X_i}}{\sum_{j \in Z(n)} e^{X_j}}$$

Additionally, we denote the sigmoid function $\sigma$:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

4.1 Approximating Logical Structures

Recall that a Horn clause takes the form

푎1 ∧ 푎2 ∧ ...푎푖 → 푡

Generally, this form is interpretable to us as a logical rule: if the given conditions are met, the result holds. However, a single Horn clause is insufficient to express an arbitrary Boolean function; for example, consider 푐 governed by inputs 푎 and 푏 as 푐 = (푎 ∧ 푏) ∨ (¬푎 ∧ ¬푏). No single Horn clause can capture this relationship in 푐; hence, for most learning settings, we need to learn sets of Horn clauses. Typically, the collective OR of all the Horn clauses is used to characterize a variable. Hence, we can interpret clause learning as characterizing a disjunctive normal form expression for target value 푡 of the form:

$$t = \bigvee_{j} \bigwedge_{i} a_{i,j}$$

where 푎푖,푗 denotes the 푖th literal in the 푗th Horn clause. We would like to be able to determine the 푎푖,푗 terms; that is, which literals belong to which rules.

4.1.1 Parametrization

We consider two main ways to develop a parametrized representation for these rules. In both cases, our parameters represent some probability distribution characterizing the rules.

Fixed-Size Parametrization

The first parametrization requires a given number of rules 푁 and a given maximum rule size 푅. We construct Π, an 푁 × 푅 × 2푛 tensor, and interpret

softmax(Π[푖, 푗])

as a probability distribution for the 푗th term in the 푖th rule; that is, we get a discrete probability distribution of size 2푛 over the 푛 features and their negations.

From this parametrization and extensions ∨̂ and ∧̂ of OR and AND to [0, 1], we can compute an approximated label 푦̂ for an input vector 푥 by concatenating 푥 with its complement 1 − 푥 to get a 2푛-vector 푥*:

$$\hat{y} = \hat{\bigvee_i} \hat{\bigwedge_j} \left( x^{*} \cdot \mathrm{softmax}(\Pi[i, j]) \right)$$

where 푖 ranges over the number of rules and 푗 ranges over the size of a rule.

We can also resolve the representation Π to a set of logical rules: for each rule index $r \in Z(N)$, we take the rule

$$\text{target} \leftarrow \bigwedge_{c=1}^{R} \operatorname{argmax}_k\big(\Pi[r, c, k]\big)$$

The set of rules is also expressible as a single expression in disjunctive normal form, as

$$\text{target} \leftarrow \bigvee_{r=1}^{N} \bigwedge_{c=1}^{R} \operatorname{argmax}_k\big(\Pi[r, c, k]\big)$$
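To make the forward computation concrete, here is a minimal sketch of the fixed-size parametrization in PyTorch, assuming the product-based AND and De Morgan OR discussed later in this chapter; the shapes and names are illustrative, not the thesis's actual implementation.

```python
import torch

# Pi: (N_rules, R_size, 2n) parameters; x: (B, n) batch of binary inputs.
def predict_fixed(Pi, x):
    x_star = torch.cat([x, 1 - x], dim=-1)              # (B, 2n): features and negations
    dist = torch.softmax(Pi, dim=-1)                    # distribution over 2n literals
    terms = torch.einsum("bf,nrf->bnr", x_star, dist)   # soft truth value of each slot
    rules = terms.prod(dim=-1)                          # product-based AND within a rule
    return 1 - (1 - rules).prod(dim=-1)                 # OR across rules via De Morgan

n, N, R = 25, 3, 3
Pi = torch.randn(N, R, 2 * n, requires_grad=True)
x = torch.randint(0, 2, (8, n)).float()
y_hat = predict_fixed(Pi, x)  # (8,) soft labels in [0, 1]
```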

Variable-Size Parametrization

Under a different parametrization, we can learn rules of variable sizes. That is, given a hyperparameter 푁 for the number of rules, we can construct a different parameter Π′ as an 푁 × 푛 × 3 tensor and interpret

softmax(Π′[푖, 푗]) as the discrete probability distribution over whether the 푗th feature appears in the 푖th rule, whether its negation appears in the 푖th rule, or whether it does not appear at all.

Given our parametrization and soft logic extensions ∨̂ and ∧̂ for OR and AND on [0, 1] values, we can compute an estimated label 푦̂ as follows:

$$\hat{y} = \hat{\bigvee_i} \hat{\bigwedge_j} \Big( x \cdot \mathrm{softmax}(\Pi'[i, j])[0] + (1 - x) \cdot \mathrm{softmax}(\Pi'[i, j])[1] + \mathrm{softmax}(\Pi'[i, j])[2] \Big)$$

where 푖 ranges over the number of rules and 푗 ranges over the number of indicators in a rule.

We can similarly resolve the representation Π′ to a set of logical rules by taking rule 푖 to be

$$\{\, k \mid \operatorname{argmax}(\Pi'[i, k]) = 0 \,\} \cup \{\, \neg k \mid \operatorname{argmax}(\Pi'[i, k]) = 1 \,\}$$

That is, rule 푖 contains a positive literal for each indicator 푘 to which the probability distribution from Π′ assigns the highest probability of appearing in the rule, and a negated literal for each indicator 푘 to which it assigns the highest probability of appearing in the rule negated.
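Analogously, here is a minimal sketch of the variable-size parametrization under the same assumptions (product-based AND, De Morgan OR; names illustrative):

```python
import torch

# Pi_v: (N_rules, n, 3) parameters; channel 0 = feature appears, channel 1 =
# negation appears, channel 2 = feature absent from the rule.
def predict_variable(Pi_v, x):
    dist = torch.softmax(Pi_v, dim=-1)                  # (N, n, 3)
    # Soft indicator that feature j is satisfied in rule i for input x:
    passes = (x.unsqueeze(1) * dist[..., 0]
              + (1 - x).unsqueeze(1) * dist[..., 1]
              + dist[..., 2])                           # (B, N, n)
    rules = passes.prod(dim=-1)                         # AND over all indicators
    return 1 - (1 - rules).prod(dim=-1)                 # OR across rules

N, n = 3, 25
Pi_v = torch.randn(N, n, 3, requires_grad=True)
x = torch.randint(0, 2, (8, n)).float()
print(predict_variable(Pi_v, x).shape)  # torch.Size([8])
```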

4.1.2 Implementing AND

In implementing such a learning procedure, we require the continuous extensions of

AND and OR given by ∧̂ and ∨̂. In implementation, we will make use of De Morgan’s Law; that is,

$$\hat{\bigvee_i} x_i = \neg\, \hat{\bigwedge_i} \neg x_i$$

Hence, we need only develop a continuous extension 퐴 of the AND function, and the corresponding OR extension is simply given by

$$\hat{\bigvee}(x) = 1 - A(1 - x)$$

Minimum Function AND

Given a set of binary variables $a_1, a_2, \dots, a_n$, one continuous extension of $a_1 \land a_2 \land \cdots \land a_n$ is given by

$$\min_i a_i$$

It is obvious that for binary 푎푖, min푖 푎푖 = 1 only when all the 푎푖 are 1, and hence this is consistent with the binary logic AND.

Product AND

Another proposed AND function is given by

$$\hat{\bigwedge_i} a_i = \prod_i a_i$$

We can confirm that for binary 푎푖, this is 1 if and only if all of the 푎푖 are 1.

Sigmoid-based AND

Next, we note that another approximation to the AND stems from the fact that

$$\bigwedge_{i=1}^{n} a_i = \left( \sum_{i=1}^{n} a_i \geq n \right)$$

Given that the sigmoid function can be considered an approximation to the step function ($\sigma(x) \approx (x \geq 0)$), we can obtain the following approximation:

$$\hat{\bigwedge_{i=1}^{n}} a_i = \sigma\left( \sum_{i=1}^{n} a_i - n + \epsilon \right)$$

for some $\epsilon \in (0, 1)$. Note that this approach has the drawback that

$$\sigma\left( \left( \sum_{i=1}^{n} 1 \right) - n + \epsilon \right) = \sigma(\epsilon) \neq 1$$

so this function only approximates the AND of the inputs, and in fact is not an exact AND.

ReLU-based AND

We recall the rectified linear unit given by:

$$\mathrm{ReLU}(x) = \max(0, x)$$

Using the ReLU function, we can create an exact extension of the AND function from

$$\hat{\bigwedge_{i=1}^{n}} a_i = \mathrm{ReLU}\left( \sum_{i=1}^{n} a_i - (n - 1) \right)$$

We can easily check that for binary 푎푖, this is 1 if the 푎푖 are all 1, and zero otherwise. However, this has the disadvantage of constant behavior outside of the range where the sum $\sum_i a_i$ is between 푛 − 1 and 푛, which prevents learning on most initializations due to the zero gradient in the constant region of the rectified linear unit. As noted in [6], a parametric rectified linear unit replaces the constant 0 in the ReLU function with a slightly sloping line, which prevents the gradient from completely disappearing at negative values. Specifically, for some given learnable parameter 푐 ≥ 1, we can design a parametrized ReLU:

$$\mathrm{PReLU}(x) = \max\left( \frac{1}{2c(n-1)}\, x + \frac{1}{2c},\; \frac{2c-1}{2c}\, x + \frac{1}{2c} \right)$$

This was deliberately engineered such that for 푥 = −(푛 − 1), the function evaluates to 0, and for 푥 = 1, the function evaluates to 1. We can thus use our extension as

$$\hat{\bigwedge_{i=1}^{n}} a_i = \mathrm{PReLU}\left( \sum_{i=1}^{n} a_i - (n - 1) \right)$$
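For reference, here are minimal sketches of the four continuous AND extensions discussed in this section, each reducing a tensor of [0, 1] values along a dimension; the helper names are illustrative. The OR extension follows from any of them by De Morgan's law.

```python
import torch

def and_min(a, dim=-1):
    return a.min(dim=dim).values            # minimum-function AND

def and_product(a, dim=-1):
    return a.prod(dim=dim)                  # product AND

def and_sigmoid(a, dim=-1, eps=0.5):
    n = a.shape[dim]
    return torch.sigmoid(a.sum(dim=dim) - n + eps)  # approximate; never exactly 1

def and_prelu(a, dim=-1, c=2.0):
    n = a.shape[dim]                        # assumes n >= 2
    s = a.sum(dim=dim) - (n - 1)
    # Engineered so s = -(n - 1) maps to 0 and s = 1 maps to 1.
    return torch.maximum(s / (2 * c * (n - 1)) + 1 / (2 * c),
                         (2 * c - 1) / (2 * c) * s + 1 / (2 * c))

def soft_or(and_fn, a, dim=-1):
    return 1 - and_fn(1 - a, dim=dim)       # De Morgan's law
```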

4.1.3 General Learning Procedure

Now, supposing we have a parametrization of rule structure, we discuss the procedure for learning a rule structure to fit the input data. As discussed in the previous sections, every parametrization has a method for developing an estimated label 푦ˆ from an input

vector 푥. We then use gradient descent to minimize the binary cross entropy loss between our estimates 푦̂ and the true labels 푦 with respect to our parametrization.

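Here is a minimal sketch of this training loop, assuming the predict_fixed function from the earlier sketch and float tensors X and y; the hyperparameters mirror those used in the experiments below (Adam optimizer, 0.01 learning rate, batch size 25).

```python
import torch

def train(Pi, X, y, epochs=100, batch_size=25, lr=0.01):
    opt = torch.optim.Adam([Pi], lr=lr)
    loss_fn = torch.nn.BCELoss()
    for _ in range(epochs):
        perm = torch.randperm(len(X))
        for i in range(0, len(X), batch_size):
            idx = perm[i:i + batch_size]
            # Clamp to avoid log(0) in the binary cross entropy.
            y_hat = predict_fixed(Pi, X[idx]).clamp(1e-6, 1 - 1e-6)
            loss = loss_fn(y_hat, y[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return Pi
```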
4.2 Experiments

Experiments were run with the discussed implementations of the extended AND function. In all cases, we tested the approach on constructed noisy datasets, where each dataset was generated by randomly generating binary vectors in {0, 1}푛, applying an underlying set of logical rules to generate a target label 푦, and randomly perturbing each target label with some small noise probability 푝. We explore the abilities of each approach on these constructions with varying rule complexities, feature set sizes, and noise levels. We test each approach over a grid of hyperparameters controlling for rule size and number of rules.

4.2.1 Data Construction

We first describe how we construct examples for testing our methods. Each constructed dataset is characterized by the number of features 푛, the number of rules used to generate the target 푁, the maximum size of a rule 푅, and an error rate 훿. For each configuration of features, rules, and rule sizes, we randomly select a set of rules to try to learn. We then generate 퐼 random binary vectors, and apply the selected rule set to generate a label for each vector. Finally, we randomly switch each label with probability 훿 to add noise to the data. In table 4.1, we have listed the different configurations of 푛, 푁, 푅, 훿, and 퐼 that we tested.
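A minimal sketch of this construction follows; the generator draws each rule's literals uniformly at random, which is one simple reading of "select a set of rules randomly" rather than the thesis's exact sampling scheme.

```python
import numpy as np

def make_dataset(n=25, N=3, R=3, I=10000, delta=0.01, seed=0):
    rng = np.random.default_rng(seed)
    # Each rule is R literals: (feature index, negated?) pairs.
    rules = [[(int(rng.integers(n)), int(rng.integers(2))) for _ in range(R)]
             for _ in range(N)]
    X = rng.integers(0, 2, size=(I, n))

    def fires(x, rule):
        return all(x[f] == (1 - neg) for f, neg in rule)

    y = np.array([int(any(fires(x, r) for r in rules)) for x in X])
    flip = rng.random(I) < delta              # label noise with probability delta
    return X, np.where(flip, 1 - y, y), rules
```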

4.2.2 Model Comparisons

For the first set of experiments, we run each combination of parametrization and AND function discussed on the initial configuration 1 (25 indicators, 3 rules of size 3, 10000 data points, and a 0.01 error rate). For every combination, we train a model to predict 3 rules (and, if using a fixed-size parametrization, rules of size at most 3). For

Configuration   n    N   R   I        δ
1               25   3   3   10000    0.01
2               15   3   3   10000    0.01
3               50   3   3   100000   0.01
4               100  3   3   1000000  0.01
5               25   2   3   10000    0.01
6               25   4   3   10000    0.01
7               25   5   3   10000    0.01
8               25   6   3   10000    0.01
9               25   3   2   10000    0.01
10              25   3   4   10000    0.01
11              25   3   5   10000    0.01
12              25   3   6   10000    0.01
13              25   3   3   10000    0.001
14              25   3   3   10000    0.005
15              25   3   3   10000    0.05
16              25   3   3   10000    0.1

Table 4.1: List of all constructed dataset configurations for binary data rule learning

the learning process, we train the model using an Adam optimizer with a 0.01 learning rate, using a batch size of 25 examples for 100 epochs to ensure convergence. Each model is trained with five different random parameter initializations, since different initializations may yield convergence to different minima. After confirming each model attains convergence through examination of the loss curves, we report performance using two different metrics. The first metric is the average rule set accuracy over all runs; that is, how accurately each learned rule set predicts the patterns in the given data. The other metric is a rough measure of precision: the fraction of learned rules that are relevant, i.e. the fraction of the learned rules that were actually used to generate the dataset. The results can be found in table 4.2. From these results, we see that the most accurate models for recovering these patterns are those that use either a PReLU-based AND or a product-based AND. A possible explanation is that the sigmoid AND function shrinks the magnitude of the gradient and fails to learn any patterns, while the min-function AND does not propagate gradients back to all of the parameter entries.

Model   Parametrization   AND       Rule Accuracy   Rule Precision
0       Fixed             Min       0.868           0.533
1       Fixed             PReLU     0.990           1.000
2       Fixed             Sigmoid   0.646           0.000
3       Fixed             Product   0.990           1.000
4       Variable          Min       0.753           0.133
5       Variable          PReLU     0.990           1.000
6       Variable          Sigmoid   0.644           0.000
7       Variable          Product   0.990           1.000

Table 4.2: Model Run Results, Constructed Dataset 1

[Figure 4-1: Relation between number of features and model performance. Two panels plot rule accuracy and rule precision against the number of features 푛 for the Fixed/PReLU, Fixed/Product, Variable/PReLU, and Variable/Product models.]

To explore the behavior of these methods on varying numbers of features (i.e. 푛 in the mathematical formulation), we tested the most successful four models from the previous experiment on inputs with varying numbers of features (15, 25, 50, 100) and report how each model type performs in terms of the same metrics (rule set accuracy, rule precision). Again, we use the same learning parameters (Adam optimizer with learning rate 0.01). We train long enough to ensure convergence: 100 epochs for smaller datasets, and 10-20 epochs for the larger datasets. The performances for each model are shown in Figure 4-1, and the table of results can be found in Appendix A in table A.1.

[Figure 4-2: Relation between number of rules and model performance. Two panels plot rule accuracy and rule precision against the number of rules 푁 for the Fixed/PReLU, Fixed/Product, Variable/PReLU, and Variable/Product models.]

From these results, we see that the PReLU-based AND function generally fares slightly worse than the product-based AND, even across varying numbers of input features to discriminate from. However, the performances of the different models are fairly close, with the note that the product-based AND performs markedly better on smaller numbers of features. For the next set of experimental runs, we train the models on dataset configurations 5 through 12, where we vary the number of rules used to generate the pattern and the rule sizes in the pattern. Again, we use the same Adam optimizer under a 0.01 learning rate, training each model for 100 epochs. As before, we report rule set accuracy and rule precision as our metrics of performance. The results under varying numbers of rules are in table A.2 and the results under varying rule size are in table A.3. The results have also been graphed in figures 4-2 and 4-3. As the figures demonstrate, the product-based AND models perform best as the number of rules and the rule size vary, regardless of parametrization. Finally, we explore the performances of different implementations under varying levels of noise. As before, we test each combination of both parametrizations and the PReLU-based and product-based AND functions. We use an Adam optimizer with a learning rate of 0.01, for 100 epochs, on our constructed datasets 13-16, where we

[Figure 4-3: Relation between rule size and model performance. Two panels plot rule accuracy and rule precision against the rule size 푅 for the Fixed/PReLU, Fixed/Product, Variable/PReLU, and Variable/Product models.]

have varied the noise rate, with our original 0.01 error rate for reference. We again report average accuracy and average rule precision over five runs. The results can be found in table A.4, and we have graphed them in figure 4-4. We note that the models, with the exception of the combination of a fixed-length rule parametrization and PReLU-based AND, recover all of the learnable pattern.

4.3 Discussion

Overall, it appears that the best models for predicting rule structure on binary data use either parametrization in combination with a product-based AND gate. Across experiments varying the number of features, rule sizes, number of rules, and noise level, the product-based AND gate models consistently perform best. In terms of overall model performance, we note that across most experiments, the models with the product-based AND gate averaged at least a 90% accuracy. We note that [4] proves a complexity bound for the general inductive logic programming problem; that is, learning Horn clauses to fit data points. Gottlob [4] shows that the complexity class of inductive logic programming is actually harder

than NP-complete; in fact, it is $\Sigma_2^P$-complete.

[Figure 4-4: Relation between noise level and model performance. Two panels plot rule accuracy and rule precision against the noise level 훿 for the Fixed/PReLU, Fixed/Product, Variable/PReLU, and Variable/Product models.]

Thus, we cannot reasonably expect a polynomial-time deterministic algorithm here that always recovers the original pattern. However, we have shown that our approach, using random parameter initializations, recovers nearly 90% of the pattern with the fixed-length rule parametrization and product-based AND gate. Further, one of the most promising aspects of the developed approach is evident through our experiments varying the noise level. We particularly note that for higher error rates, the rule set never fits the noise; that is, the training set accuracy of our model never exceeds 1 − 훿, the maximum learnable pattern. With reasonable hyperparameter controls restricting rule size and rule number, we can be confident that the approach developed suffers less from the overfitting issues of models like deep neural networks due to its complexity restrictions.

Chapter 5

Inducing Rules on Numerical Data

In the previous chapter, we discussed how to induce logical rules explaining a target attribute from binary input data using a gradient-descent based procedure. Here, we discuss how we learn logical rules from non-binary, numerical data. We explore the ability to learn interpretable binary features from these numerical data in conjunction with logical rule induction, and discuss the merits of this approach relative to the LIME technique.

5.1 Approach Details

Extending the methods in chapter 4 to numerical data requires a mapping of numerical values into [0, 1]. To do this, we use the sigmoid function to map values into the desired range. Specifically, we transform our numerical inputs by a function

푇 : R → [0, 1] defined as:

푇 (푥) = 휎(푎(푥 − 푏))

We note that as 푎 → ∞, 푇 (푥) becomes a threshold gate, which is 1 for 푥 > 푏 and 0 for 푥 < 푏. Our procedure is then as follows: we maintain parameters 푎, 푏 of the same dimension as the inputs 푥, and we compute the transformation

푥푇 = 푇 (푥) as our updated “binarized” inputs in [0, 1]. We then use 푥푇 as our inputs

to the model described in Chapter 4. During the gradient descent optimization step, we update the parameters 푎, 푏 along with the inductive logic parameters.
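A minimal sketch of this transformation as a learnable module, assuming per-feature parameters 푎 and 푏 (the initial values here are illustrative):

```python
import torch

class Binarizer(torch.nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.a = torch.nn.Parameter(5.0 * torch.ones(n_features))  # sharpness
        self.b = torch.nn.Parameter(torch.zeros(n_features))       # threshold

    def forward(self, x):
        # T(x) = sigmoid(a * (x - b)); as a grows, this approaches a hard
        # threshold gate at b.
        return torch.sigmoid(self.a * (x - self.b))
```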

5.1.1 Feature Expansion

It is possible that a model could require multiple binary features for a single continuous feature; i.e. one such theory might be whether 푥 > 8 or 푥 < 3. As such, for an

푛-length input vector 푥, we construct the 푘푛-length transformed inputs 푥푇

$$x_T[i] = \sigma\big(a[i]\,(x[\lfloor i/k \rfloor] - b[i])\big)$$

We will refer to the hyperparameter 푘 as the feature expansion constant.
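A minimal sketch of the expansion, reading the index in the formula above as the output index 푖, so that each of the 푘 copies of a feature carries its own sharpness and threshold:

```python
import torch

def expand(x, a, b, k):
    # x: (B, n); a, b: (k * n,) learnable parameters, one pair per copy.
    x_rep = x.repeat_interleave(k, dim=-1)   # x_rep[i] = x[floor(i / k)]
    return torch.sigmoid(a * (x_rep - b))    # (B, k * n) soft binary features
```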

5.1.2 Interpretability

As before, our main interest in rule learning stems from the necessity of learning interpretable machine learning models. In order to maintain interpretability, we utilize the fact that for large 푎, 푇 (푥) approaches a threshold gate. Therefore, we can interpret the resulting features 푥푇 as simply thresholded features for whether 푥 is larger or smaller than the cutoff 푏, which is easily understandable.

5.2 Experiments

As in chapter 4, we construct several datasets on which to test our methodology. For these experiments, our constructed datasets are characterized by the following parameters:

∙ 훿: the error rate of the target. These are taken in the set {0.001, 0.005, 0.01, 0.05, 0.1}

∙ 푁: the number of rules. We vary this on integers from 1 to 5.

∙ 푅: the maximum rule size. We vary this on integers from 1 to 5.

Each dataset is then constructed by generating 푛-length vectors with values in the range [0, 1]. We then randomly generate a rule pattern with 푁 rules of size at most 푅,

and assign a label to each vector based on the rule pattern. We finally perturb these labels with independent probability 훿 for each label. The parameter configurations governing our constructed datasets are listed in table 5.1.

Configuration   N   R   δ
1               2   2   0.01
2               2   2   0.001
3               2   2   0.005
4               2   2   0.05
5               2   2   0.1
6               1   2   0.01
7               3   2   0.01
8               4   2   0.01
9               5   2   0.01
10              2   1   0.01
11              2   3   0.01
12              2   4   0.01
13              2   5   0.01

Table 5.1: List of all constructed dataset configurations for numerical rule learning

5.2.1 Experimental Results

We first test the best methods from chapter 4 with the numerical binarization technique described in section 5.1. That is, we train both parametrizations introduced in chapter 4 with a product-based AND function and a feature expansion constant of 2. In all cases, we report the accuracy of the learned rule set as our metric of performance. First, we vary the error rate 훿 over the set {0.001, 0.005, 0.01, 0.05, 0.1}. For each parametrization, we run the training process with five different random parameter initializations, and we report the average accuracy of the learned rule set over all initializations. We have graphed our results in figure 5-1, and the results are also tabulated in the appendix in table A.5. In general, we can see that the variable-length parametrization fares slightly worse than the fixed-length parametrization for rule structure.

[Figure 5-1: Relation between error rate and numerical model performance. Rule set accuracy is plotted against the error rate 훿 for the fixed and variable parametrizations.]

For the next set of experiments, we varied the number of rules used to generate the label pattern and the size of the rules used to create the label pattern. Again, for each parametrization, we run the training process with five different random parameter initializations, and we report the average accuracy of the learned rule set. We have graphed the average accuracy as we vary the number of rules in figure 5-2 and as we vary the rule size in figure 5-3. The results for varying the number of rules are in table A.6, and the results for varying rule size are in table A.7. Overall, across both experiments, the variable-length rule parametrization fares slightly worse than the fixed-length parametrization, similar to the earlier experiments varying the error rate.

[Figure 5-2: Relation between number of rules and numerical model performance. Rule set accuracy is plotted against the number of rules for the fixed and variable parametrizations.]

[Figure 5-3: Relation between rule size and numerical model performance. Rule set accuracy is plotted against the rule size for the fixed and variable parametrizations.]

5.3 Discussion

The best model for our gradient descent-based rule learning procedure uses the fixed-length parametrization with a product-based AND function. We also note that, in general, the models we trained achieve very high performance; they are able to recover most, if not all, of the pattern used to generate the target. Both of the models tested recovered over 90% of the pattern present, and the fixed-length parametrization recovered around 95% of the pattern even as rule size and rule number varied.

5.3.1 Comparison to LIME

In this section, we show how the rules learned by logical induction produce a more interpretable model than LIME applied to a more sophisticated model. We utilize our constructed dataset configuration 1, which has 10 input features, 10,000 data points, and a label generated as follows:

푇 ← (푋1 < 0.8) ∧ (푋2 > 0.3)

푇 ← (푋0 < 0.4)

We first confirm that our inductive model is able to recover these rules. Indeed, it recovered the following rules with an overall 98% accuracy.

푇̂ ← (푋1 < 0.7969) ∧ (푋2 > 0.3135)

푇̂ ← (푋0 < 0.4042)

For comparison, we also trained a multilayer perceptron model on this dataset. We then ran the LIME technique around the following randomly chosen point of interest, and we report the regression coefficients in table 5.2. We note that the regression coefficient with the largest magnitude actually belongs to feature 6, which has no correlation with the target variable. In fact, the magnitudes of all coefficients are less than 0.05, suggesting that changing none of the features would change the target label. Clearly, the LIME regression around the point fails to account for global behavior, especially the feature interaction between features 1 and 2. We note that our rule set provides an interpretable model that captures this interaction; specifically, our model suggests that simultaneously decreasing feature 1 and increasing feature 2 changes the classification, whereas the LIME method predicts no feasible way to change the classification. Thus, it is clear that there are points of interest and scenarios for which the LIME method fails to produce viable explanations, but for which the model structure is learnable and interpretable under our inductive logic programming technique.

Feature   Point of Interest Value   LIME Coefficient
0         0.965                     0.0285
1         0.201                     -0.0312
2         0.888                     0.0197
3         0.735                     0.0002
4         0.135                     0.0142
5         0.489                     -0.0115
6         0.963                     -0.0405
7         0.152                     -0.0068
8         0.777                     0.0141
9         0.912                     -0.0285

Table 5.2: LIME Coefficients around test point

Chapter 6

FICO Home Equity Line of Credit Tests

Finally, we test our methodology for rule learning on the FICO Home Equity Line of Credit (HELoC) dataset. We start by describing the dataset and providing brief context for the target variable being predicted. We then discuss applying the methods of chapter 4 to binary data obtained by transforming the original HELoC dataset using the weight-of-evidence bins. We then discuss applying the methods of chapter 5 for rule learning over numerical data to the HELoC dataset, before concluding with a discussion of the results.

6.1 Dataset

We begin by describing the FICO Home Equity Line of Credit (HELoC) dataset. A home equity line of credit is a loan in which the collateral is the loanee’s equity in their house. This dataset consists of 10,459 loanees who took out a home equity line of credit between March 2000 and March 2002. A year later, a label of “creditworthy” or “non-creditworthy” was assigned to each loanee based on their performance, where delinquent or charged-off loanees were labeled “non-creditworthy”. We seek to predict this creditworthiness rating using the 23 other features in the dataset, which are:

1. ExternalRiskEstimate: A metric of the loanee’s credit risk computed by FICO

51 2. MSinceFirstLOC: Months since the first line of credit was opened

3. MSinceNewestLOC: Months since the newest line of credit was opened

4. AvgAgeOfLOC: Average age of all existing lines of credit

5. NumLOCNotDelq: Number of lines of credit not currently delinquent

6. NumLOC60PlusDaysDelq: Number of lines of credit that have been 60 or more days delinquent at some point in time

7. NumLOC90PlusDaysDelq: Number of lines of credit that have been 90 or more days delinquent at some point in time

8. PercentLOCNeverDelq: Percentage of lines of credit that have never been delin- quent

9. MSinceMRecentDelq: Months since the most recent delinquency

10. MaxDelqLast12M: Maximum delinquency in days in the past year

11. MaxDelqEver: Maximum delinquency in days

12. NumTotalLOC: Number of total lines of credit opened

13. NumLOCInLast12M: Number of lines of credit opened in the last year

14. PercentInstLOC: Percentage of lines of credit that are installment lines of credit

15. MSinceNewLOCReqExPastWeek: Months since the most recent request for a line of credit, excluding the past week

16. NumLOCReqLast6M: Number of lines of credit requested in the past six months

17. NumLOCReqLast6MExPastWeek: Number of lines of credit requested in the past six months, excluding the past week

18. FracRevLOCLimitUse: Fraction of revolving credit limits in use

19. FracInstLOCUse: Fraction of installment lines of credit in use

20. NumRevLOCWBalance: Number of revolving lines of credit with positive balance

21. NumInstLOCWBalance: Number of installment lines of credit with positive balance

22. NumBankOrNatlLoansWHighUtil: Number of bank loans and national loans with high utilization

23. PercentLOCWBalance: Percentage of lines of credit with a positive balance

A few features take on special or unique values; a full description of these features is given in [7]. To prepare the dataset for training, we drop examples with missing data; we use the remaining 9,861 examples for our experiments.

6.1.1 Weight-of-Evidence Encoding

The weight-of-evidence (WoE) encoding is a technique for binning continuous or categorical values and assigning a value to each bin. Given a bin $b$, let $P_b$ be the fraction of individuals in bin $b$ with a "creditworthy" rating, and let $N_b$ be the fraction of individuals in bin $b$ with a "non-creditworthy" rating. Then the weight-of-evidence value for bin $b$ is given by:

$$W_b = 100 \ln\left(\frac{P_b}{N_b}\right)$$

The weight-of-evidence value of a bin intuitively represents the ability of the bin to separate creditworthy and non-creditworthy applicants [7]. The weight-of-evidence bins provide a natural mechanism for introducing binary variables to encapsulate the data; specifically, by one-hot encoding the bins given by the weight-of-evidence technique, we obtain a natural set of binary indicators for each input feature.
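As a concrete illustration, the weight-of-evidence value of each bin can be computed directly from this definition; a sketch assuming pandas and hypothetical column names:

    import numpy as np
    import pandas as pd

    def woe_per_bin(df: pd.DataFrame, bin_col: str, label_col: str) -> pd.Series:
        """Compute W_b = 100 * ln(P_b / N_b) for each bin of `bin_col`.

        `label_col` is assumed to hold 1 for "creditworthy" and 0 for
        "non-creditworthy"; P_b and N_b are the within-bin fractions of
        each rating, following the definition in the text.
        """
        counts = df.groupby([bin_col, label_col]).size().unstack(fill_value=0)
        P_b = counts[1] / counts.sum(axis=1)  # fraction creditworthy in bin b
        N_b = counts[0] / counts.sum(axis=1)  # fraction non-creditworthy in bin b
        return 100 * np.log(P_b / N_b)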

6.2 Binary Data Rule Learning

For our initial set of experiments, we train the models developed in chapter 4 on a one-hot encoding of the weight-of-evidence bins as our inputs. We describe our hyperparameter configurations, and report our results in terms of rule set accuracy.

6.2.1 Experiments

We run a set of experiments testing the best model developed in the previous chapters (i.e., a fixed-length parametrization and a product-based AND gate) on the binarized features for the categories determined by the weight-of-evidence encoding. To test our models, we randomly select 60% of the examples for training, and reserve the remaining out-of-sample examples for validation and testing. We first learn rules that predict whether a loanee will not be creditworthy. We run a hyperparameter grid search varying the rule size and the number of rules, each over the range 2-5, training each hyperparameter configuration 5 times and taking the best of those five runs in terms of training accuracy; a minimal sketch of this search appears below. The results for each hyperparameter configuration are reported in table A.8, where we report both the accuracy of the learned rule set on the training data and the accuracy on an out-of-sample validation set. For the sake of completeness, we also train a set of models over the same hyperparameter grid to predict the inverse of our original target label; that is, we attempt to predict whether a loanee will be creditworthy. The results are reported in table A.9. As before, we report the accuracy of the learned rule set on training data and out-of-sample validation data.
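The search itself is a simple best-of-five grid; in the sketch below, train_model is a hypothetical handle to the chapter-4 training routine, assumed to return a fitted model exposing a train_accuracy attribute:

    import itertools

    def grid_search(X_train, y_train, train_model, runs_per_config=5):
        """Best-of-five grid search over rule count and rule size (both 2-5)."""
        best = {}
        for num_rules, rule_size in itertools.product(range(2, 6), range(2, 6)):
            models = [
                train_model(X_train, y_train,
                            num_rules=num_rules, rule_size=rule_size)
                for _ in range(runs_per_config)
            ]
            # Keep the best of the five runs by training accuracy.
            best[(num_rules, rule_size)] = max(models,
                                               key=lambda m: m.train_accuracy)
        return best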

Results and Analysis

Overall, the best models trained to predict non-creditworthiness achieve training rule set accuracies in the range [0.66, 0.72], and the best models trained to predict creditworthiness achieve training accuracies in the range [0.64, 0.74]. Further, for nearly all of the models, the difference between training set accuracy and validation set accuracy is very small, under 3%; it seems likely that the strong restrictions imposed by our inductive model help safeguard against overfitting.

We evaluate the hyperparameter combinations by selecting for the best performance on the validation set. For the rules trained to predict non-creditworthiness, the best combination of hyperparameters uses five rules, each having at most three conditions; the rules are given in table 6.1 and have a combined accuracy of 71.7% on the test set. For the rules predicting creditworthiness, the best combination of hyperparameters uses three rules of size at most two; the rules have a combined accuracy of 71.1%. We remark that after eliminating rule redundancies, this rule set can actually be expressed using two rules, which we show in table 6.2 along with the individual rule accuracies.

Rule                                                                 Accuracy
(NumLOCInLast12M < 4) and (FracRevLOCLimitUse ≥ 77)
  and (PercentLOCNeverDelq ≥ 89)                                     80.98%
(PercentLOCWBalance ≥ 50) and (ExternalRiskEstimate < 64)            82.63%
(AvgAgeOfLOC < 75) and (MSinceNewLOCReqExPastWeek < 1)
  and (MSinceNewLOCReqExPastWeek has usable trades)                  70.7%
(PercentLOCNeverDelq < 82) and (MSinceMRecentDelq < 16)              82.17%
(ExternalRiskEstimate < 71) and (MSinceNewLOCReqExPastWeek < 1)
  and (MSinceNewLOCReqExPastWeek has usable trades)                  79.06%

Table 6.1: Rules for learning non-creditworthiness

Rule                                                                 Accuracy
(ExternalRiskEstimate ≥ 68) and (MSinceNewLOCReqExPastWeek ≥ 1)      73.75%
(ExternalRiskEstimate ≥ 75) and (PercentLOCNeverDelq ≥ 96)           75.33%

Table 6.2: Rules for learning creditworthiness
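Each learned rule translates directly into a checkable predicate over the original features. A sketch for the second rule of table 6.1, assuming a pandas DataFrame with the dataset's column names and a hypothetical 0/1 label column non_creditworthy:

    import pandas as pd

    def rule_fires(df: pd.DataFrame) -> pd.Series:
        """Second rule of table 6.1: True where it predicts non-creditworthiness."""
        return (df["PercentLOCWBalance"] >= 50) & (df["ExternalRiskEstimate"] < 64)

    # One plausible way to score an individual rule: the fraction of examples
    # on which the rule's prediction agrees with the true label.
    # accuracy = (rule_fires(df).astype(int) == df["non_creditworthy"]).mean()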

6.3 Numerical Data Rule Learning

In this section, we apply the methods developed in chapter 5 to the HELoC data without binarization via the weight-of-evidence bins. Here, we start with the original numeric features of the HELoC dataset. Note that the binarization technique developed in chapter 5 learns a threshold gate; we develop binary features based on comparisons between numeric features and cutoffs. Hence, any

monotonic transformation of our numeric data will preserve learnable cutoffs. To facilitate learning by normalizing all features to the same scale, we use the percentile rank of each feature instead of the original numeric values, since the percentile rank is a monotonic transformation. As in the binary learning section, we run two hyperparameter searches, training two separate sets of models to predict creditworthiness and non-creditworthiness. We train our numerical data learning model with a feature expansion constant of 3, varying the number of rules and the rule size from 2 to 5. We report our results for learning non-creditworthiness in table A.10, and our results for learning creditworthiness in table A.11. As our performance metrics, we report the accuracies on the training data and on an out-of-sample validation set, where we use only 60% of the data for training, reserving the rest for validation and testing.
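The percentile-rank transformation above takes only a few lines; a sketch using scipy (one of several equivalent implementations):

    import numpy as np
    from scipy.stats import rankdata

    def percentile_rank(column: np.ndarray) -> np.ndarray:
        """Map each value to its percentile rank in [0, 1]; ties share a rank."""
        ranks = rankdata(column, method="average")  # ranks run from 1 to n
        return (ranks - 1) / (len(column) - 1)

    # The transformation is monotonic, so any cutoff b learned on the ranks
    # corresponds to a cutoff on the original feature values.
    print(percentile_rank(np.array([3.2, 7.7, 1.0, 7.7, 5.5])))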

6.3.1 Results and Analysis

We note that across all the hyperparameter configurations, the numerical data learning performs more consistently than the binary data learning using the weight-of-evidence encoding; with only a few exceptions, nearly all of the hyperparameter configurations achieve at least 70% accuracy on the training and validation data. Once again, we evaluate the performance of a hyperparameter configuration by the validation set performance of its best run. The best hyperparameter configuration for learning non-creditworthiness uses four rules of size four (given in table 6.3), and the best hyperparameter configuration for learning creditworthiness uses two rules of size five (given in table 6.4). The best configuration for learning non-creditworthiness has a test set accuracy of 71.0%, and the best configuration for learning creditworthiness has a test set accuracy of 71.3%. Neither of these models reaches the 71.7% accuracy of our binary data approach, but both are very close to that threshold. We also note that the numerical methods are able to recover distinct numerical cutoffs; for example, the rule set for learning creditworthiness learns a useful cutoff for the ExternalRiskEstimate feature at 54, which is not one of the original weight-of-evidence bin thresholds. Moreover, some of the weight-of-evidence bin thresholds were recovered organically by the model (for example, the cutoff near 60 for the AvgAgeOfLOC feature), suggesting that our numerical model is able to learn reasonable thresholds.

Rule                                                                 Accuracy
(ExternalRiskEstimate ≤ 69)
  and (NumInstLOCWBalance has usable trades)                         75.35%
(FracRevLOCLimitUse > 54) and (MSinceMRecentDelq ≤ 14)               82.84%
(AvgAgeOfLOC ≤ 61) and (No NewLOCReqExPastWeek)
  and (PercentLOCWBalance has usable trades)                         76.46%
(FracRevLOCLimitUse ≤ 74) and (PercentLOCNeverDelq ≤ 86)
  and (MSinceNewLOCReqExPastWeek has usable trades)
  and (PercentLOCWBalance has usable trades)                         74.53%

Table 6.3: Rules for learning non-creditworthiness, numerical data

Rule                                                                 Accuracy
(FracRevLOCLimitUse ≤ 63) and (MaxDelqLast12M > 6)
  and (ExternalRiskEstimate > 71) and (PercentLOCNeverDelq > 92)
  and (FracRevLOCLimitUse has usable trades)                         74.51%
(ExternalRiskEstimate > 54) and (PercentLOCNeverDelq ≥ 96)
  and (MSinceNewLOCReqExPastWeek has no usable trades)
  and (MSinceMRecentDelq has usable trades)                          87.62%

Table 6.4: Rules for learning creditworthiness, numerical data


6.4 Discussion

We conclude with a discussion of the methods we have developed and their applicability to the FICO Home Equity Line of Credit dataset. We compare the performance of our methods to other machine learning models, analyze the generalizability of the learned models, and discuss some shortcomings of our approach.

Model                                         Test Accuracy
Multilayer Perceptron                         74.7%
Randomized Forests                            72.3%
Descent-Based Inductive Logic Programming     71.7%

Table 6.5: Comparison between our descent-based inductive logic programming and other models

6.4.1 Comparisons to Related Work

For comparison, we have included results from related papers for other models on this dataset in table 6.5; the related model results are from [7]. The rules learned by our best inductive logic programming method are comparable in performance to randomized forests and not far from the performance of multilayer perceptron models: the best multilayer perceptron is 3% better than our inductive logic programming methodology. Given that even neural networks fail to achieve a significantly better accuracy, it appears that the maximum learnable accuracy from the given inputs is not much higher than the performance of our descent-based model. Overall, our model is fairly successful at discovering a significant fraction of the learnable pattern, and in an interpretable format; it is quite remarkable that most of the learnable pattern can be expressed with as few as five rules.

6.4.2 Generalizability

From the results in tables A.8 to A.11, we note that the greatest discrepancy between validation set accuracy and training set accuracy was less than 4%, indicating that the rules fit to the training data generalized to the out-of-sample validation data with little change in performance.

However, beyond our learned rule sets yielding comparable performance on the training and validation datasets, the interpretability of our methodology allows for human control over generalizability. The fixed-size parametrization limits the number and size of the learned rules, which helps prevent overfitting. Finally, because our model is interpretable, we can analyze each individual rule for sensibility, discarding rules that are nonsensical.

Figure 6-1: Example Training Run, Plotting Rule Set Accuracy on Training Data and Average Loss During Training over 100 Epochs


6.4.3 Approach Shortcomings

After observing model training performance during the hyperparameter searches, it became apparent that the models we trained were able to learn most of the pattern very quickly, jumping close to the final rule accuracy in the early epochs. In figure 6-1, we plot the rule set accuracy on the training data and the average value of our loss function after each of one hundred training epochs. Examining the plots in figure 6-1, we first remark that our model converges to some minimum; we observe this from the loss function flatlining near 0.53 at the end of training. We also note that the rule set accuracy seems to peak around halfway through the training process. Examining the underlying values, the rule set accuracy reaches a maximum of 0.7219 after epoch 48 and then decreases slightly, ending at 0.7099. This decay results in a net accuracy decrease of 1.2%, which is fairly small compared to the overall accuracy

magnitude. Examining many of the 160 runs in our hyperparameter search, we notice this behavior, in varying degrees, across many of the training runs. Specifically, we concern ourselves with the phenomenon where a period of normal training behavior (increasing accuracy, decreasing loss) is followed by a small decrease in training accuracy even though the loss function indicates the model is still converging. We term this phenomenon approximation breakdown. We note that approximation breakdown is distinct from overfitting, since the measured accuracy is taken over the same data used for training rather than out-of-sample data; one would expect the decrease in the loss function to yield higher accuracy.

To explain this phenomenon, we recall some details of our model implementation. Recall that when we transform our numerical indicators into the $[0, 1]$ range, we utilize the following transformation:

$$T(x) = \sigma(a(x - b))$$

As noted earlier, as $a \to \infty$, the binarized input $T(x)$ approaches the ideal threshold gate at value $b$. However, we cannot guarantee that the model will take a learned parameter $a$ to be arbitrarily large.
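The effect is easy to see numerically; a small sketch evaluating the soft gate at a few scales (the values of a and b are chosen purely for illustration):

    import numpy as np

    def T(x, a, b):
        """Sigmoid approximation of a threshold gate at cutoff b with scale a."""
        return 1.0 / (1.0 + np.exp(-a * (x - b)))

    x = np.linspace(0.0, 1.0, 5)
    for a in (1, 10, 100):
        # As a grows, T(x) approaches a hard step at b = 0.5; at a moderate
        # learned a, the gate stays soft, contributing to approximation breakdown.
        print(a, np.round(T(x, a, 0.5), 3))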

Similarly, recall the details of our fixed-length rule parametrization. We maintain an $N \times R \times 2n$ tensor $\Pi$, and interpret $\mathrm{softmax}(\Pi[i, j])$ as a probability distribution over the potential features for the $j$th term of the $i$th rule. Model convergence does not require this probability distribution to approach a $\delta$-distribution; that is, the optimal probability distribution for our model design may not assign all of the probability mass to one feature.

In summary, we can attribute approximation breakdown to the discrepancy between our model structure and perfect logical rules. We recognize that optimal

parameter values for our model may not yield perfect logical values, analogous to how relaxations of integer linear programs can yield non-integral optimal solutions. For future work, one could explore potential ways to control the fidelity of the approximation. For example, to manually control the fidelity of the numerical approximation (i.e., how close the sigmoid function is to a true threshold gate), we can directly regulate the scaling constant.
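One hypothetical way to regulate the scaling constant is to multiply the learned scale by a schedule that grows over training, pushing each gate toward a hard threshold; in the sketch below, the linear schedule and the max_boost constant are our own illustrative choices:

    import torch

    a = torch.nn.Parameter(torch.tensor(1.0))  # learned scale
    b = torch.nn.Parameter(torch.tensor(0.5))  # learned cutoff

    def binarize(x: torch.Tensor, epoch: int, total_epochs: int,
                 max_boost: float = 50.0) -> torch.Tensor:
        # The boost factor grows linearly from 1 to 1 + max_boost, so the
        # sigmoid sharpens toward a true threshold gate late in training.
        boost = 1.0 + max_boost * epoch / total_epochs
        return torch.sigmoid(boost * a * (x - b))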

Chapter 7

Conclusion and Next Steps

This thesis sought to formulate a method for learning logical rules for arbitrary supervised binary classification problems that satisfies the goals of accuracy, generalizability, and interpretability. We conclude by summarizing our key ideas and results, and indicating directions for future work.

7.1 Key Ideas

Our general framework for learning rules is as follows (a minimal sketch of the training loop appears after the list):

1. Create a parametrization for rule structure, and select an extension of the logical AND function for inputs on [0, 1].

2. For each training example:

(a) Use the parametrization given to generate an estimated target label.

(b) Compute the binary cross-entropy loss between the estimated target label and the true label for this example.

(c) Utilize backpropagation to estimate the gradient for the model parameters, and minimize the binary cross-entropy loss using gradient descent.

3. Repeat the loop for the desired number of training epochs.
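As a concrete illustration of this loop, here is a minimal PyTorch sketch using the fixed-size parametrization and product-based AND described below; the toy pattern, tensor sizes, and the probabilistic OR used to combine rules are illustrative assumptions rather than the exact thesis implementation:

    import torch

    # Illustrative sizes: N rules, R terms per rule, n binary input features.
    N, R, n = 3, 2, 8
    Pi = torch.nn.Parameter(torch.randn(N, R, 2 * n))  # fixed-size parametrization
    opt = torch.optim.Adam([Pi], lr=0.05)

    def predict(x):
        # x*: concatenation of x and 1 - x, so literals include negations.
        x_star = torch.cat([x, 1 - x], dim=1)
        # Each term is a softmax-weighted mixture over the 2n literals.
        term = torch.einsum("bf,nrf->bnr", x_star, Pi.softmax(dim=-1))
        rule = term.prod(dim=2)            # product-based AND over a rule's terms
        return 1 - (1 - rule).prod(dim=1)  # soft OR combining the N rules

    # Toy training data: the target pattern is (x0 AND NOT x1).
    x = torch.randint(0, 2, (512, n)).float()
    y = x[:, 0] * (1 - x[:, 1])

    for epoch in range(200):
        opt.zero_grad()
        y_hat = predict(x).clamp(1e-6, 1 - 1e-6)  # estimated target labels
        loss = torch.nn.functional.binary_cross_entropy(y_hat, y)
        loss.backward()                            # backpropagate
        opt.step()                                 # gradient descent step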

We discussed two ways to parametrize rule structure, and four ways to extend the logical AND function to [0, 1]. After experimenting with different methods in chapters 4 and 5, we concluded that the best performing model structure utilizes a fixed-size parametrization and a product-based AND.

In our fixed-size rule parametrization, we use a parameter $\Pi$ of size $N \times R \times 2n$, where $N$ is a hyperparameter governing the number of rules, $R$ is a hyperparameter governing the rule size, and $n$ is the number of input features. The product-based AND extends $a \wedge b$ with $a \mathbin{\hat{\wedge}} b = a \cdot b$. Under this modeling methodology, the estimated target label for input $x$ is given by

$$\hat{y} = \widehat{\bigvee_i} \, \widehat{\bigwedge_j} \left( x^* \cdot \mathrm{softmax}(\Pi[i, j]) \right)$$

where $x^*$ denotes the concatenation of $x$ and $1 - x$.

To extend the input to numerical data, we binarize numerical features by approximating a threshold gate $\sigma(a(x - b))$, where $a$ and $b$ are learnable parameters. We can then feed the result into our model from chapter 4 and repeat the same learning process.
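A sketch of this binarization step as a learnable PyTorch layer; the class name is ours, and the expansion parameter mirrors the feature expansion constant of 3 used in chapter 6:

    import torch

    class ThresholdBinarizer(torch.nn.Module):
        """Learnable soft threshold gates sigma(a * (x - b)), several per feature."""

        def __init__(self, n_numeric: int, expansion: int = 3):
            super().__init__()
            # One (a, b) pair per gate, with `expansion` gates per feature.
            self.a = torch.nn.Parameter(torch.ones(n_numeric, expansion))
            self.b = torch.nn.Parameter(torch.rand(n_numeric, expansion))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, n_numeric) -> soft binary features in [0, 1],
            # shape (batch, n_numeric * expansion), ready for the chapter-4 model.
            z = self.a * (x.unsqueeze(-1) - self.b)
            return torch.sigmoid(z).flatten(start_dim=1)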

We tested our model on constructed datasets with varying pattern structures and noise levels, finding that it can recover rules very accurately. We noted that in particular scenarios, our model learns logical rules that accurately explain the global structure of a pattern, whereas the LIME method applied to a multilayer perceptron does not generate reasonable explanations of model behavior at a global scale.

Finally, we tested our methods on the FICO Home Equity Line of Credit dataset. We tested our binary data approach on features binarized with the weight-of-evidence bins, and our numerical approach on the feature percentile ranks. Our best models achieve accuracies between 71% and 72%, performing comparably to multilayer perceptrons (74.7%) and randomized forests (72.3%).

7.2 Future Work

In the last section of chapter 6, we introduced the approximation breakdown phenomenon of our model; that is, the optimal parameter values minimizing the binary cross-entropy loss may not converge to exact logical equivalents. One future direction involves addressing this discrepancy; there are various potential ways to minimize approximation breakdown, including adding terms to the loss function and controlling the scaling parameters.

Another potential direction is to explore alternative binarization techniques for numerical input data. In this thesis, the only binarization technique we explored involved learning a threshold gate via a sigmoid approximation. Future work could explore alternatives; for example, one could use logit-based or probit-based approaches to generate more intricate binary features. Additionally, one advantage of using gradient descent to learn logical rules is that it allows optimization of all parameters along the computation path for the estimated labels; therefore, we can chain more complicated machine learning models with our inductive logic method to perform more advanced feature binarization.

Throughout this thesis, we used inductive logic programming to independently develop interpretable models. However, we can also use our inductive logic programming methods to approximate and explain deep learning models. Specifically, we can approximate intermediate features within more complex neural network structures by training rules to predict those intermediate features. Thus, another potential application of our methods is to generate interpretable rule-based explanations for existing blackbox models.

Lastly, future research can extend descent-based inductive logic programming to tasks beyond binary classification. Potential directions include multilabel classification, learning rules for different labels, and learning numerical output labels.

Appendix A

Tables

Model   Parametrization   AND       n     Rule Set Accuracy   Rule Precision
2       Fixed             PReLU     15    0.945               0.667
2       Fixed             PReLU     25    0.990               1.000
2       Fixed             PReLU     50    0.879               0.600
2       Fixed             PReLU     100   0.958               1.000
4       Fixed             Product   15    0.992               1.000
4       Fixed             Product   25    0.990               1.000
4       Fixed             Product   50    0.971               0.933
4       Fixed             Product   100   0.990               1.000
6       Variable          PReLU     15    0.905               0.600
6       Variable          PReLU     25    0.990               1.000
6       Variable          PReLU     50    0.894               0.733
6       Variable          PReLU     100   0.878               0.933
8       Variable          Product   15    0.992               1.000
8       Variable          Product   25    0.990               1.000
8       Variable          Product   50    0.898               0.733
8       Variable          Product   100   0.958               0.867

Table A.1: Model Run Results, Varying Number of Features

Model   Parametrization   AND       N    Rule Set Accuracy   Rule Precision
2       Fixed             PReLU     2    0.991               1.000
2       Fixed             PReLU     3    0.990               1.000
2       Fixed             PReLU     4    0.882               0.600
2       Fixed             PReLU     5    0.943               0.880
2       Fixed             PReLU     6    0.807               0.267
4       Fixed             Product   2    0.991               1.000
4       Fixed             Product   3    0.990               1.000
4       Fixed             Product   4    0.990               1.000
4       Fixed             Product   5    0.990               1.000
4       Fixed             Product   6    0.990               1.000
6       Variable          PReLU     2    0.991               1.000
6       Variable          PReLU     3    0.990               1.000
6       Variable          PReLU     4    0.990               1.000
6       Variable          PReLU     5    0.911               0.840
6       Variable          PReLU     6    0.718               0.167
8       Variable          Product   2    0.991               1.000
8       Variable          Product   3    0.990               1.000
8       Variable          Product   4    0.990               1.000
8       Variable          Product   5    0.979               0.960
8       Variable          Product   6    0.983               0.967

Table A.2: Model Run Results, Varying Number of Rules

Model   Parametrization   AND       R    Rule Set Accuracy   Rule Precision
2       Fixed             PReLU     2    0.990               1.000
2       Fixed             PReLU     3    0.990               1.000
2       Fixed             PReLU     4    0.990               1.000
2       Fixed             PReLU     5    0.970               0.933
2       Fixed             PReLU     6    0.974               0.933
4       Fixed             Product   2    0.990               1.000
4       Fixed             Product   3    0.990               1.000
4       Fixed             Product   4    0.990               1.000
4       Fixed             Product   5    0.991               1.000
4       Fixed             Product   6    0.990               1.000
6       Variable          PReLU     2    0.659               0.333
6       Variable          PReLU     3    0.990               1.000
6       Variable          PReLU     4    0.850               0.733
6       Variable          PReLU     5    0.779               0.400
6       Variable          PReLU     6    0.969               0.733
8       Variable          Product   2    0.990               1.000
8       Variable          Product   3    0.990               1.000
8       Variable          Product   4    0.990               1.000
8       Variable          Product   5    0.991               1.000
8       Variable          Product   6    0.990               1.000

Table A.3: Model Run Results, Varying Rule Size

Model   Parametrization   AND       δ       Rule Set Accuracy   Rule Precision
2       Fixed             PReLU     0.001   0.998               1.000
2       Fixed             PReLU     0.005   0.938               1.000
2       Fixed             PReLU     0.01    0.990               1.000
2       Fixed             PReLU     0.05    0.924               0.800
2       Fixed             PReLU     0.1     0.899               1.000
4       Fixed             Product   0.001   0.998               1.000
4       Fixed             Product   0.005   0.995               1.000
4       Fixed             Product   0.01    0.990               1.000
4       Fixed             Product   0.05    0.949               1.000
4       Fixed             Product   0.1     0.899               1.000
6       Variable          PReLU     0.001   0.998               1.000
6       Variable          PReLU     0.005   0.995               1.000
6       Variable          PReLU     0.01    0.990               1.000
6       Variable          PReLU     0.05    0.949               1.000
6       Variable          PReLU     0.1     0.899               1.000
8       Variable          Product   0.001   0.998               1.000
8       Variable          Product   0.005   0.995               1.000
8       Variable          Product   0.01    0.990               1.000
8       Variable          Product   0.05    0.949               1.000
8       Variable          Product   0.1     0.899               1.000

Table A.4: Model Run Results, Varying Noise Level

Parametrization   δ       Rule Set Accuracy
Fixed             0.001   0.988
Fixed             0.005   0.917
Fixed             0.01    0.966
Fixed             0.05    0.929
Fixed             0.1     0.880
Variable          0.001   0.849
Variable          0.005   0.94
Variable          0.01    0.835
Variable          0.05    0.930
Variable          0.1     0.831

Table A.5: Numerical Model Run Results, Varying Noise Level

Parametrization   Number of Rules   Rule Set Accuracy
Fixed             1                 0.987
Fixed             2                 0.966
Fixed             3                 0.979
Fixed             4                 0.976
Fixed             5                 0.979
Variable          1                 0.948
Variable          2                 0.835
Variable          3                 0.897
Variable          4                 0.946
Variable          5                 0.966

Table A.6: Numerical Model Run Results, Varying Number of Rules

Parametrization   Rule Size   Rule Set Accuracy
Fixed             1           0.982
Fixed             2           0.966
Fixed             3           0.944
Fixed             4           0.965
Fixed             5           0.952
Variable          1           0.925
Variable          2           0.835
Variable          3           0.907
Variable          4           0.903
Variable          5           0.919

Table A.7: Numerical Model Run Results, Varying Rule Size

Number of Rules   Rule Size   Best Rule Set Accuracy   Validation Accuracy
2                 2           0.7126                   0.7116
2                 3           0.7064                   0.7082
2                 4           0.6964                   0.7093
2                 5           0.6976                   0.6788
3                 2           0.6604                   0.6464
3                 3           0.7150                   0.7082
3                 4           0.7045                   0.7082
3                 5           0.6866                   0.6650
4                 2           0.7121                   0.7113
4                 3           0.7052                   0.7052
4                 4           0.7030                   0.7042
4                 5           0.7137                   0.7164
5                 2           0.7118                   0.7147
5                 3           0.7162                   0.7184
5                 4           0.7099                   0.7099
5                 5           0.7133                   0.693

Table A.8: Hyperparameter search, predicting non-creditworthiness with HELOC Binarized Data

Number of Rules   Rule Size   Best Rule Set Accuracy   Validation Accuracy
2                 2           0.6624                   0.6389
2                 3           0.7159                   0.7045
2                 4           0.6545                   0.6525
2                 5           0.6545                   0.6474
3                 2           0.7017                   0.7214
3                 3           0.7263                   0.7092
3                 4           0.7270                   0.7031
3                 5           0.7121                   0.7133
4                 2           0.6895                   0.7062
4                 3           0.7267                   0.7204
4                 4           0.7240                   0.6960
4                 5           0.7230                   0.7082
5                 2           0.6978                   0.6839
5                 3           0.7088                   0.6960
5                 4           0.7306                   0.7062
5                 5           0.7344                   0.7133

Table A.9: Hyperparameter search, predicting creditworthiness using HELOC Bina- rized Data

Number of Rules   Rule Size   Best Rule Set Accuracy   Validation Accuracy
2                 2           0.7001                   0.7076
2                 3           0.7125                   0.6978
2                 4           0.7101                   0.7049
2                 5           0.7037                   0.7099
3                 2           0.7108                   0.6941
3                 3           0.7191                   0.7157
3                 4           0.7132                   0.7018
3                 5           0.7101                   0.7096
4                 2           0.7037                   0.6805
4                 3           0.7150                   0.6927
4                 4           0.7206                   0.7181
4                 5           0.7150                   0.6917
5                 2           0.7132                   0.6937
5                 3           0.7196                   0.6988
5                 4           0.7160                   0.7120
5                 5           0.7155                   0.7039

Table A.10: Hyperparameter search, predicting non-creditworthiness using HELOC Numerical Data

Number of Rules   Rule Size   Best Rule Set Accuracy   Validation Accuracy
2                 2           0.7084                   0.7082
2                 3           0.7133                   0.7163
2                 4           0.7224                   0.7234
2                 5           0.7273                   0.7254
3                 2           0.7160                   0.7204
3                 3           0.7197                   0.7194
3                 4           0.7263                   0.7194
3                 5           0.7258                   0.7092
4                 2           0.7110                   0.6981
4                 3           0.7069                   0.7102
4                 4           0.7268                   0.7102
4                 5           0.7328                   0.7102
5                 2           0.7099                   0.7123
5                 3           0.7294                   0.7082
5                 4           0.7179                   0.6809
5                 5           0.7304                   0.6910

Table A.11: Hyperparameter search, predicting creditworthiness using HELOC Nu- merical Data

Bibliography

[1] David Baehrens, Timon Schroeter, Stefan Harmeling, Motoaki Kawanabe, Katja Hansen, and Klaus-Robert Müller. How to explain individual classification decisions. J. Mach. Learn. Res., 11:1803–1831, August 2010.

[2] Finale Doshi-Velez and Been Kim. Towards a rigorous science of interpretable machine learning. 2017.

[3] Richard Evans and Edward Grefenstette. Learning explanatory rules from noisy data. J. Artif. Int. Res., 61(1):1–64, January 2018.

[4] Georg Gottlob, Nicola Leone, and Francesco Scarcello. On the complexity of some inductive logic programming problems. In Nada Lavrač and Sašo Džeroski, editors, Inductive Logic Programming, pages 17–32, Berlin, Heidelberg, 1997. Springer Berlin Heidelberg.

[5] Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. CoRR, abs/1610.02413, 2016.

[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. CoRR, abs/1502.01852, 2015.

[7] Sudhanshu Nath Mishra. Explaining Machine Learning Predictions: Rationales and Effective Modifications. Master's thesis, Massachusetts Institute of Technology, Cambridge (MA), 2018.

[8] Stephen Muggleton. Inverse entailment and progol. New Generation Computing, 13(3):245–286, Dec 1995.

[9] Stephen Muggleton and Wray Buntine. Machine invention of first-order predicates by inverting resolution. In Machine Learning Proceedings 1988, pages 339–352. Kaufmann, San Francisco (CA), 1988.

[10] Stephen Muggleton and Cao Feng. Efficient induction of logic programs. In New Generation Computing. Academic Press, 1990.

[11] Stephen Muggleton, Jose Santos, and Alireza Tamaddoni-Nezhad. Progolem: A system based on relative minimal generalisation. In ILP, 2009.

[12] Gordon Plotkin. Automatic methods of inductive inference. PhD thesis, University of Edinburgh, 1971.

[13] J.R. Quinlan. Learning logical definitions from relations. Machine Learning, 5(3):239–266, Aug 1990.

[14] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. CoRR, abs/1602.04938, 2016.

[15] Farhad Shakerin and Gopal Gupta. Induction of non-monotonic logic programs to explain boosted tree models using LIME. CoRR, abs/1808.00629, 2018.

[16] Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. CoRR, abs/1710.08864, 2017.

[17] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. CoRR, abs/1703.01365, 2017.

[18] Qiang Zeng, Jignesh M. Patel, and David Page. Quickfoil: Scalable inductive logic programming. Proc. VLDB Endow., 8(3):197–208, November 2014.
