<<

Cumulative Learning in the Lambda Calculus

Robert John Henderson

Thesis submitted for the degree of

Doctor of Philosophy

Department of Computing

Imperial College London

May 2014

Copyright Declaration

The copyright of this thesis rests with the author and is made available under a Creative Commons Attribution Non-Commercial No Derivatives licence. Researchers are free to copy, distribute or transmit the thesis on the condition that they attribute it, that they do not use it for commercial purposes and that they do not alter, transform or build upon it. For any reuse or redistribution, researchers must make clear to others the licence terms of this work.

Abstract

I design a machine learning system capable of ‘cumulative learning’, which means that it automatically acquires the knowledge necessary for solving harder problems through experience of solving easier ones. Working within the learning framework of inductive programming, I propose that the technique of abstraction, familiar from software engineering, is a suitable mechanism for accumulating knowledge. In abstraction, syntactic patterns in solutions to past problems are isolated as re-usable units and added to background knowledge.

For the system’s knowledge representation language, I argue that lambda calculus is a more suitable choice than first-order logic because lambda calculus supports abstraction readily. However, more mature and theoretically well-founded base inference techniques are available within first-order Inductive Logic Programming (ILP). Therefore, my approach is to adapt ILP inference techniques to lambda calculus. Central to ILP is the idea of ‘generality’, and I show that a suitable concept of generality in lambda calculus arises from its standard denotational semantics. Consequently, notions of entailment, subsumption, refinement, and inverse deduction have direct analogues in the lambda calculus setting. I argue that the conventional ‘compression’ measure used in ILP is inflexible in capturing prior assumptions, particularly in the context of an expanding background knowledge. Instead I introduce a non-parametric Bayesian prior over hypotheses and background knowledge. I then design an inductive inference algorithm for the lambda calculus setting based on refinement and proof-directed search. I give a formal proof of correctness of this algorithm.

To enable automatic invention of abstractions, I design two algorithms. The first is a heuristic search that uses anti-unification to discover opportunities for abstraction within a corpus of knowledge. I give a formal characterisation of its search space. The second algorithm performs inverse deduction in order to refactor knowledge in terms of an abstraction. I prove that this refactoring process is semantics-preserving.

Acknowledgements

I would like to thank my supervisor, Stephen Muggleton, without whose guidance and advice this project would not have been possible. I would also like to thank Dianhuan Lin, José Santos, Tristan Allwood, and Antonia Mayer for many helpful discussions. This project was supported by an EPSRC studentship.

Declaration of Originality

I declare that the work presented in this thesis is my own, except where appropriately stated and referenced.

Contents

1 Introduction 4
  1.1 The RUFINSKY System ...... 6
  1.2 First-Order Logic vs. Lambda Calculus ...... 8
  1.3 Contributions of this Project ...... 12

2 Background 13
  2.1 Cumulative Learning ...... 13
  2.2 Inductive Logic Programming ...... 18
    2.2.1 Main Principles ...... 19
    2.2.2 Inverse Deduction ...... 21
    2.2.3 Refinement ...... 25
    2.2.4 Multiple Refinement Points and Proof-Directed Search ...... 27
  2.3 Other Inductive Programming Techniques ...... 31
    2.3.1 Higher-order Background Knowledge as Declarative Bias ...... 31
    2.3.2 Non-termination and Levin Search ...... 33
  2.4 Abstraction Invention ...... 35
    2.4.1 Anti-unification as a Means of Abstraction ...... 35
    2.4.2 Predicate Invention in Inductive Logic Programming ...... 37
    2.4.3 Limitations of Predicate Invention due to First-Order Logic ...... 40

3 The Lambda Calculus Setting for Learning 43
  3.1 Syntax ...... 43
    3.1.1 Types ...... 43
    3.1.2 Terms ...... 44
    3.1.3 Declarations ...... 46
  3.2 Denotational Semantics ...... 47
    3.2.1 Types ...... 47
    3.2.2 Terms ...... 48
  3.3 Generality Orders ...... 50
  3.4 Setting for Learning ...... 52

    3.4.1 Partial terms represent sets of hypotheses ...... 53

4 Prior Probability Distributions 54
  4.1 The Base Prior ...... 56
    4.1.1 Some examples ...... 63
  4.2 The Nonparametric Prior ...... 65
    4.2.1 Worked example ...... 75
  4.3 The Cumulative Learning Prior ...... 77
    4.3.1 Worked example ...... 80

5 Refinement and RUFUS 84
  5.1 Refinement ...... 84
    5.1.1 Refinement operators and their properties ...... 85
    5.1.2 The RUFUS refinement operator ...... 86
  5.2 Proof-Directed Search ...... 88
    5.2.1 The evaluator ...... 90
    5.2.2 The runThreads algorithm ...... 91
    5.2.3 The RUFUS search tree ...... 96
    5.2.4 Guiding the search ...... 100
    5.2.5 Worked example ...... 103

6 Anti-unification Search 105
  6.1 Tree-Structured Terms ...... 106
  6.2 The auSearch Specification ...... 108
  6.3 The auSearch Algorithm ...... 111
  6.4 The mnos Algorithm ...... 121
    6.4.1 Gavril’s maximum independent set algorithm ...... 121
    6.4.2 Using Gavril’s algorithm to find maximum non-overlapping subsets ...... 122

7 Abstraction Invention and KANDINSKY 125
  7.1 The absInv Algorithm ...... 125
    7.1.1 Worked example ...... 135
  7.2 Proof of Correctness of absInv ...... 136
    7.2.1 Preliminaries ...... 136
    7.2.2 Eta-expansion ...... 139
    7.2.3 Refactoring the master term ...... 139

8 Discussion 145
  8.1 A ‘Proof-of-Concept’ Demonstration of RUFINSKY ...... 145
  8.2 How RUFINSKY could be Experimentally Evaluated ...... 151

  8.3 Limitations of RUFINSKY’s Cumulative Learning Mechanism ...... 154
  8.4 Issues Raised about the Design of Inductive Programming Systems ...... 155
    8.4.1 First order logic and lambda calculus ...... 155
    8.4.2 Bayesian inference ...... 157
  8.5 Related Work ...... 158
    8.5.1 Search strategies ...... 158
    8.5.2 Hybrid functional logic languages ...... 159
    8.5.3 Program transformation ...... 159
    8.5.4 Analogical reasoning ...... 160

9 Conclusions and Future Work 161
  9.1 Summary of Thesis and Main Results ...... 161
  9.2 Directions for Future Work ...... 162
  9.3 Final Thoughts ...... 164

A Mathematical Conventions 165
  A.1 Set Comprehension Notation ...... 165

B Proof of Lemma 6.53 167

Bibliography 170

Chapter 1

Introduction

The aim of this PhD project is to explore one approach to the problem of cumulative learning [Michelucci and Oblinger, 2010] in artificial intelligence. In other words: how might a computer system accumulate knowledge or expertise over time, and thus incrementally improve its own ability to learn and solve problems? To illustrate what is meant by the term ‘cumulative learning’, let us consider two examples. The first is that of a human learning a skill, for example, a child learning to speak and to understand their native language. The second is that of the scientific method, the process by which the human race as a whole increases its understanding of the nature of the universe and the laws governing it, by performing experiments so as to guide the development of increasingly accurate theories. Both of these processes, though they occur at very different scales, have two features in common:

1. Inductive inference is used to solve problems.

2. Progression to more difficult problems is made possible via the incremental accumulation of knowledge.

For the purpose of this thesis, I define ‘cumulative learning’ as any process that possesses these two features.

In this PhD, the goal is to design and implement a machine learning algorithm capable of cumulative learning as defined by the two features above. Why pursue such a goal? There are two main reasons, one philosophical and the other practical. The philosophical motivation is that by implementing an algorithm that performs a process, we can improve our understanding of the nature of that process itself. In other words, if human learning and the scientific method are both instances of cumulative learning, then constructing a demonstrably effective algorithm that is also an instance of cumulative learning should provide some insight into how and why processes such as human and scientific learning are able to operate.

A more practical motivation for studying cumulative learning relates to the field of machine learning and data mining [Bishop, 2006; Kononenko and Kukar, 2007], the sub-field of artificial intelligence which may be defined as ‘the use of computer algorithms to discover patterns in data’. Machine learning currently has many industrial applications, to name a few:

- analysis of scientific data (particularly in biology)

- analysis of financial or stock market data

- medical diagnosis

- robot control (e.g. robots to help in disaster situations; autonomous vehicles; robots for exploring other planets)

- information retrieval (e.g. internet search engines)

- automated translation

- handwriting and speech recognition

- artificial intelligence in computer games

Despite these many applications, modern machine learning algorithms (for example: feedforward neural networks, Bayesian networks, or support vector machines) are typically not capable of cumulative learning as defined by the two criteria given above. Most would only satisfy the first criterion, i.e. they use inductive inference to solve problems, often within a Bayesian framework. On the other hand, almost all machine learning algorithms currently in practical use lack any significant ability to ‘progress to more difficult problems via the incremental accumulation of knowledge’.

This lack of an ability for the machine to learn cumulatively is a significant drawback, and it means that applying machine learning systems to real-world problems is currently a relatively labour-intensive and expensive activity. Custom pattern-recognition algorithms are often implemented on a per-application basis, or otherwise data has to be heavily pre-processed in order to be suitable for input into stock algorithms. The standard mechanism for ‘progressing to more difficult problems’ is the input of human expertise. On the other hand, if we can develop machine learning systems that accumulate their own expertise and domain-specific knowledge through problem-solving experience, then such systems will be more flexible, and they will be applicable in a much wider variety of situations without the need for reprogramming or tuning by a human operator.

Lastly, it would be disingenuous not to say that a significant source of motivation for this PhD project has been to contribute towards the dream of ‘strong AI’ or ‘artificial general intelligence’. Strong AI refers to machines that possess intelligence at or above the level of humans, where intelligence is defined as ‘an agent’s ability to achieve goals in a wide range of environments’ [Legg and Hutter, 2006]. Since the 1970s, strong AI has been an emotive, almost taboo topic among artificial intelligence researchers. This is largely due to over-optimistic claims made by the early AI pioneers about the speed with which AI research would progress, and a subsequent backlash against strong AI research in general [Russell and Norvig, 2010]. However, in recent years some prominent AI scientists have begun to talk publicly about strong AI as a serious research topic again [Bengio et al., 2013; Hutter, 2012; Schmidhuber et al., 2011; Maei and Sutton, 2010]. At the same time, others within and outside of the AI research community are expressing concern about the potential dangers of developing strong AI [Yudkowsky, 2008; Bostrom, 2012; Yampolskiy, 2013; Muehlhauser and Salamon, 2012; Fox and Shulman, 2010]. It is my opinion that a scientist who is consciously playing a part, even if small, in a research effort whose ultimate goal is strong AI has a responsibility to say so openly. We must take any long-term risks seriously as the field advances and balance them against potential benefits.

1.1 The RUFINSKY System

A design for a cumulative learning system must feature two things: a mechanism for performing inductive inference, and a mechanism for progressing to more difficult problems by incrementally accumulating knowledge. For the inductive inference mechanism, a natural place to start is the framework of Solomonoff [1964]. Solomonoff showed that if one uses a Turing-complete representation for inductive hypotheses, then it is possible (and indeed relatively straightforward) to express all inductive inference problems in a single, unified form, and hence conceive of a completely general mechanism for inductive inference. The closest thing to a concrete realisation of Solomonoff’s framework is the modern field of inductive programming [Muggleton and De Raedt, 1994; Kitzelmann, 2009a], which studies algorithms for doing inductive inference using a Turing-complete programming language (often Prolog or Haskell) as a unified representation for knowledge. The field of inductive programming has produced some powerful theory and practical inference techniques. Examples of such techniques include refinement, the use of a partial order over a Turing-complete space of inductive hypotheses to make it amenable to guided search, and inverse deduction, a technique of using deduction procedures run in reverse as a principled means of performing induction. Modern inductive programming has a solid grounding in other areas of computer science and mathematics such as logic [Nienhuys-Cheng and de Wolf, 1997], programming language theory [Pierce, 2002], and Bayesian machine learning [Bishop, 2006]. In this project, I have chosen to use techniques from inductive programming in order to realise a suitable generic inductive inference mechanism.

The second necessary feature of a cumulative learning system is a method for automatically accumulating knowledge, and for making use of that knowledge so as to enable progression to successively harder problems. For this, let us turn for inspiration to what may seem initially like an unexpected source: software engineering. Software engineering would not (usually) be regarded as an inductive inference process, so it does not fit my earlier definition of cumulative learning. However, there is much in the practice of software engineering that fits the second criterion of ‘progression to more difficult problems via the incremental accumulation of knowledge’.

Software engineering is the discipline or craft of computer programming on the medium-to-large scale. Through this process, humans are able to collaborate on the construction of complex computer programs, despite the fact that the design of such a program (a modern operating system such as GNU/Linux, for example) can sometimes be so large and so rich in detail that no individual human could ever hope to understand all of it even in a lifetime. Thus, at the heart of software engineering is the idea of managing complexity, and the primary means for doing this is enshrined in something called the principle of abstraction. This may be stated as follows: ideas, designs, or techniques that need to be used more than once should be separated out from the specific contexts in which they appear, and encapsulated as re-usable units.

As an example of the principle of abstraction, consider the technique of sorting a list. It is likely that in a large computer program, there may be many distinct contexts in which it is necessary to sort lists of numbers or other items. As a result, it makes sense to encapsulate the process of sorting as a re-usable component in a library, so that the effort of implementing a sorting algorithm need only be performed once and by one person only. Other programmers can then use the sorting algorithm many times, in different situations, without having to re-write it or even understand how it is implemented.

In fact, any kind of repetitive structure or pattern in a computer program is usually a sign that there is a single idea there that is being used more than once. A good software engineer will always be on the lookout for such patterns, because each one provides an opportunity for abstracting out a potentially time-saving re-usable program unit. Indeed, putting the principle of abstraction into practice has at least three benefits:

- It makes the program smaller (because repetitive code is eliminated), and hence easier to maintain and debug.

- It makes the program more modular and hence much easier to comprehend, because distinct ideas are separated out from one another rather than being intertwined as ‘spaghetti code’.

- A library of re-usable components is accumulated which guides the construction of new programs.

As a result of this discipline of abstraction, software engineers are able to ‘stand on the shoulders of giants’ when constructing large systems. As more and more re-usable program units are created, it becomes possible for more advanced programs to be constructed with less effort, and hence for new problems to be solved that would have been much more difficult, or even impossible, otherwise. Indeed, this is a clear example of ‘progression to more difficult problems via the incremental accumulation of knowledge’.

There is a crucial difference between the knowledge accumulation mechanism in software engineering and that in a process such as human language learning or the scientific method. In the latter processes, the representation of knowledge, and the algorithms for creating new knowledge, are only vaguely understood and to a large extent hidden away in the workings of the human brain. Hence it is very difficult to directly formalise or emulate these processes. By contrast, in the case of software engineering the ‘knowledge’ takes the form of an explicit computer program, and the mechanisms used by software engineers to abstract out re-usable program units are amenable to formalisation. In this project, I take this idea of abstraction from software engineering and apply it to inductive programming. The product of this shall be an abstraction invention system: an algorithm that emulates the process of abstraction as carried out by human software engineers.

To summarise this section, the main contribution of this PhD project is the design and implementation of an inductive programming system that uses the mechanism of abstraction, borrowed from software engineering, in order to accumulate useful knowledge and hence improve its own problem-solving ability with experience. The system’s name is RUFINSKY; for a brief overview of its structure see Fig. 1.1.

1.2 First-Order Logic vs. Lambda Calculus

Figure 1.1: Overview of RUFINSKY: an inductive programming system capable of cumulative learning. It consists of two modules: RUFUS and KANDINSKY. RUFUS performs inductive inference over input data, subject to Background Knowledge (BK). The BK is a library of re-usable program units which RUFUS uses to construct the output: hypotheses consistent with the data. KANDINSKY performs abstraction invention over hypotheses and BK, potentially generating new BK which can be passed back to RUFUS.

One of the most important issues in the design of RUFINSKY is the choice of which language to use to represent knowledge. As explained in the previous section, RUFINSKY’s inference mechanism shall be based on techniques from inductive programming. However, the field of inductive programming is somewhat split into two schools of thought. On the one hand, there is Inductive Logic Programming (ILP) [Muggleton and De Raedt, 1994], in which the standard practice is to use a Prolog-like language, i.e. first-order logic, to represent knowledge. On the other hand, there is Inductive Functional Programming (IFP) [Kitzelmann, 2009a], in which the standard practice is to use a Haskell-like language, i.e. typed lambda calculus, to represent knowledge.

Since the most significant part of this project is the design of an abstraction invention procedure, the most important consideration when choosing the knowledge representation language had to be this: ‘how well does the language support the software engineering principle of abstraction?’. Now, Prolog [1] and Haskell are both expressive, declarative languages (both have a denotational semantics), which is what makes both suitable for generic knowledge representation in inductive programming. Prolog’s particular strengths lie in its support for powerful methods of deductive reasoning, in particular unification of terms, and multimodal predicates. Prolog and Haskell both support non-determinism and representation of grammars [Matsushita, 1998, Chap. 3]. However, when it comes to support for abstraction, Haskell wins hands down. This is because functions have first-class status in Haskell (and in other languages based on lambda calculus such as ML or Lisp). This means that, in Haskell, functions can be passed as arguments to other functions and returned as values. First-class functions enable abstraction of patterns in programs in ways that are impossible, or at best extremely convoluted to express, in a language such as Prolog in which the equivalent of functions (i.e. predicates) are not first-class. The following quote from Abelson and Sussman [1996, Sect. 1.3] illustrates this point (read ‘procedure’ as a synonym for ‘function’):

Even in numerical processing we will be severely limited in our abil- ity to create abstractions if we are restricted to procedures whose parameters must be numbers. Often the same programming pattern will be used with a number of different procedures. To express such patterns as concepts, we will need to construct procedures that can accept procedures as arguments or return procedures as values.

[1] By Prolog in this context I mean the pure subset of Prolog typically used to represent knowledge in ILP, not the larger, impure language used for actual software engineering with its metalogical predicates, etc.

A well-known example of an abstraction that can only be expressed with first-class functions is the map operation, which has the following type signature in Haskell:

map :: (a -> b) -> [a] -> [b]

Map represents the concept of ‘transforming each element of a list with some unary operation’; for example, map (+ 3) is the function which adds three to every element of a list.

A lambda calculus based language may be more suitable for making abstractions, but we must also consider what inductive inference techniques are available in the two sub-fields of inductive programming, ILP and IFP. In IFP, the focus of research has tended to be on recursive program synthesis, or ‘automatic programming’, rather than general inductive inference [Kitzelmann, 2009a]. The state-of-the-art techniques in IFP reflect this, for example the ‘analytical functional approach’ [Kitzelmann, 2009b]. On the other hand, state-of-the-art techniques in ILP are oriented much more towards generic inductive inference of the kind described by Solomonoff [1964]. For example, the technique of refinement and the theory of refinement operators [Nienhuys-Cheng and de Wolf, 1997] provide a principled and effective means for searching through a space of arbitrary inductive hypotheses for one that is consistent with given data. In refinement, the use of a generality ordering over hypotheses enables pruning of large tracts of the hypothesis space in a principled manner, and the use of a compression-based coverage measure (equivalent to a Bayesian posterior) enables positive guidance of the search towards more promising regions of the space.

It is clear that the existing techniques from Prolog-based ILP are more suitable as generic inductive inference mechanisms than those from IFP, and this has tended to produce a belief among the research community that first-order logic is the only knowledge representation language worth taking seriously for generic inductive inference. However, in this thesis I shall argue that this is in fact not the case, and that the most powerful techniques from ILP can be adapted quite readily to work with lambda calculus as the knowledge representation language. Hence, in this project I adapt the ILP technique of refinement to work with lambda calculus, the product being the base inductive inference system RUFUS (Fig. 1.1), which works by means of a refinement-based guided search through a space of lambda calculus programs. Thus, RUFINSKY benefits both from an effective generic inductive inference technique borrowed from Prolog-based ILP, and from all the facility for abstraction provided by first-class functions in lambda calculus.
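To see the abstraction principle of Sect. 1.1 and first-class functions working together, consider the following small Haskell sketch (the function names are my own, invented for illustration). Two recursive definitions share the same syntactic pattern; because functions are first-class, that pattern can be separated out as a re-usable higher-order function:

-- Two definitions that differ only in their base case and combining operator:
sumList :: [Int] -> Int
sumList []     = 0
sumList (x:xs) = x + sumList xs

productList :: [Int] -> Int
productList []     = 1
productList (x:xs) = x * productList xs

-- The shared pattern, abstracted out as a re-usable unit (a simplified
-- version of Haskell's standard foldr):
fold :: (a -> b -> b) -> b -> [a] -> b
fold _ z []     = z
fold f z (x:xs) = f x (fold f z xs)

-- The original definitions, re-expressed in terms of the abstraction:
sumList' :: [Int] -> Int
sumList' = fold (+) 0

productList' :: [Int] -> Int
productList' = fold (*) 1

In a language whose procedures are not first-class, the parameter f could not be passed around, so the recursion pattern would have to remain duplicated at every use site. Repeated structure of exactly this kind is also what the abstraction invention mechanism developed in this thesis is designed to discover automatically.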

1.3 Contributions of this Project

This thesis consists of a detailed design for the RUFINSKY cumulative learning system, justified by formal proofs of the properties of its various parts. The main contributions of the thesis are as follows:

- A review of the literature related to cumulative learning, inductive programming, and abstraction invention (Chap. 2).

- A formal specification of a Bayesian setting for inductive inference in which simply typed lambda calculus is the hypothesis language (Chap. 3).

- The design of a family of suitable prior probability distributions over lambda calculus terms, with formal proofs of their properties (Chap. 4).

- The design of a base inductive inference system, RUFUS, for which I adapted the ILP techniques of refinement and proof-directed search to work in the lambda calculus setting (Chap. 5). I give a formal proof that RUFUS’ search space is sound and complete.

- The design of an efficient search algorithm, ‘anti-unification search’, which can discover opportunities for abstraction within a corpus of lambda calculus terms (Chap. 6). I give a formal characterisation of the search space of this algorithm.

- The design of an abstraction invention system, KANDINSKY (Chap. 7), which incorporates ‘anti-unification search’ as well as an algorithm for refactoring a corpus of lambda calculus terms in order to construct an abstraction. I prove that this refactoring process is semantics-preserving.

- A critical discussion of the RUFINSKY design and of the issues encountered while exploring this design space. I also give a ‘proof of concept’ demonstration of the working RUFINSKY system, and discuss how RUFINSKY could be evaluated experimentally (Chap. 8).

Note that a description of an early incarnation of KANDINSKY was published mid-way through this project as a short paper [Henderson and Muggleton, 2012].

Chapter 2

Background

2.1 Cumulative Learning

At the beginning of Chap. 1 I defined cumulative learning, for the purpose of this thesis, to mean any process in which inductive inference is used to solve problems and in which progression to more difficult problems is made possible via the incremental accumulation of knowledge. In this section I review what work has been done on cumulative learning in artificial intelligence. Now, automated inductive inference is the domain of machine learning; therefore, we shall be looking here at what attempts have been made to design machine learning systems that are able to improve their performance over the course of multiple learning problems by acquiring knowledge.

What do we mean precisely by knowledge? Broadly, we can take this to mean inductive bias, i.e. any information or set of assumptions possessed by a learning system that causes it to prefer one hypothesis over another when both are consistent with observed data [Mitchell, 1980].

In conventional statistical machine learning (neural networks, Bayesian networks, etc.), the field of transfer learning has strong connections with the idea of cumulative learning. It studies how inductive bias can be shifted as a result of experience of one set of learning tasks, in order to improve performance on another, related set of learning tasks.

For example, Raina et al. [2006] looked at transfer learning using a logistic regression model in a text classification domain. In their study, the distribution of successful hypotheses learned on one set of classification problems was used to make informed adjustments to the parameters of a prior probability distribution over all hypotheses, so as to bias it in favour of finding similar hypotheses in future. The adjusted prior was shown to produce an improvement in predictive accuracy when applied to new problems from the same text classification domain.

As another example, Niculescu-Mizil and Caruana [2007] developed a method of transfer learning for Bayesian network structure learning. Here, a prior probability distribution was devised that took into account the relatedness of a set of problems, assigning higher probability to a hypothesised network for one problem if it was similar to networks learned for other problems in the set. Their method required that each problem have the same domain of random variables, and similarity between networks was measured by counting the occurrences of edges that were present or absent in both.

Transfer learning techniques like those of Raina et al. and Niculescu-Mizil and Caruana can yield performance improvements; however, the forms of ‘knowledge’ that these systems can accumulate through experience are limited. The problem is that the statistical learning techniques used here and in other similar studies are restricted, by design, to learning hypotheses from particular narrow classes (be it logistic regression models, Bayesian networks, neural networks, decision trees, etc.). Thus, while these transfer learning methods are able to adjust the bias towards some promising region of parameter space within one of these model classes, the underlying model class itself is unchangeable, and this arguably has a much more significant effect on what the system is usefully able to learn.

As a result, in these kinds of transfer learning studies the shifts in bias tend to be enough to produce improvements in predictive accuracy in the presence of sparse data, but not enough to allow systems to progress to fundamentally more difficult problems, i.e. problems that would be computationally intractable, even with large amounts of data available, before the shift in bias. As an analogy, consider how a human learns to do mathematics. Learning arithmetic gives you useful knowledge that enables you to learn simple algebra. Then, with a knowledge of algebra you are in a position to learn to understand calculus. However, if you were to attempt to learn advanced calculus straight off, without any prior knowledge of arithmetic or algebra, you would likely find it intractably difficult no matter how much ‘data’ you were given, i.e. training examples of calculus problems and their correct answers. Thus, the kind of transfer learning that we see demonstrated with these statistical learning techniques cannot be called true cumulative learning, because it enables no progression to fundamentally new or more difficult problems that would have been intractable for the system to solve before its shift in bias.

There is one area of machine learning, inductive programming, that differs from the more conventional techniques just described in that, rather than a learner being restricted to hypotheses of some particular narrow class, a generic language of hypotheses is used. This provides a far greater degree of flexibility both in what hypotheses can potentially be learned by the system, and also in what kinds of inductive bias may be specified in order to cause the system to favour certain kinds of hypotheses rather than others. Indeed, using such a language one can express inductive bias explicitly as background knowledge: a set of definitions of concepts that the learner is allowed to compose together to construct hypotheses. By varying the background knowledge, one can radically change the system’s inductive bias, as well as control whether the bias is strong, favouring one chosen class of hypotheses, or weak, evenly favouring hypotheses with a variety of structural forms.

The flexibility of background knowledge has proved very useful for applying inductive programming to various applications (see Sect. 2.2), because it allows a detailed domain-specific inductive bias to be specified by a human expert without any need to customise the learning algorithm used to perform inductive inference. However, from the point of view of cumulative learning, background knowledge can also potentially be learned automatically by the system. In particular, because it is expressed in the same language as hypotheses, one can potentially use the same or similar learning techniques for learning background knowledge as one uses for hypotheses.

Perhaps surprisingly, despite the promise of background knowledge as a means for shifting bias, the amount of work that has been done on cumulative learning in inductive programming is still relatively small. I discuss some reasons for this in Sect. 2.4.2. For now, let us review what work has been done in this area.

Khan et al. [1998] investigated a form of transfer learning in ILP under the name ‘repeat learning’. They used a problem domain of learning the definitions of legal moves in chess. Using the ILP system Progol, they showed that a new piece of background knowledge invented while learning the definition of a move for one chess piece (e.g. the knight) could improve the predictive accuracy of the definition learned for a second chess piece (e.g. the king). However, the method they used to invent new background knowledge required, for efficiency reasons, that significant information about the form of the invented piece of knowledge be specified by a human in advance (in the form of a type declaration – see Sect. 2.4.2). Furthermore, they did not demonstrate progression to more difficult problems, instead merely demonstrating an inductive transfer of knowledge between two problems of similar difficulty. As in the studies of transfer learning in logistic regression and Bayesian networks discussed earlier, the only benefit from transfer learning that Khan et al. showed was an improvement in predictive accuracy when the amount of training data was sparse: once the amount of training data was increased significantly, the shift in bias due to inductive transfer no longer provided any advantage.

Davis and Domingos [2009] studied a technique for transfer learning in a probabilistic variant of ILP (known as Markov logic – see [Domingos et al., 2006] for an overview). Their means for shifting inductive bias consisted not of learning new background knowledge directly, but of learning ‘templates’ describing commonly occurring structural motifs in knowledge. These templates were expressed in an ad-hoc form of second-order logic (Markov logic itself is a form of first-order logic). The templates were learned in some source problem domain, and then used as ‘seeds’ for instantiating candidate hypotheses in a target problem domain. However, much like Khan et al. and others, no progression to more difficult problems was demonstrated, only inductive transfer between problems of similar difficulty. Furthermore, Moore and Danyluk [2010] later showed that much of the apparent benefit to predictive accuracy found by Davis and Domingos was not due to a shift in inductive bias from the source domain at all, and that this interpretation had been to some extent an artifact of their experimental method.

Schmidhuber et al. [1997] studied a technique for cumulative learning in inductive programming which they termed ‘adaptive Levin search’. In this approach, inductive bias is controlled by assigning a numerical weight to each primitive in the background knowledge; a high weight indicates that the primitive should be used more frequently when constructing hypothesis programs. As the system solves a succession of problems, these weights are incrementally updated according to how frequently each primitive occurs in successful hypotheses. In this way, the system becomes biased towards re-using elements of background knowledge that occurred in successful hypotheses in the past. Adaptive Levin search was shown to produce a performance improvement on a selection of problem sequences, one involving guiding an agent through a maze, and another involving synthesis of simple mathematical functions. Schmidhuber later followed up this work with a more complicated cumulative learning system called OOPS [Schmidhuber, 2004]. OOPS supported a weight-modification mechanism similar to that of adaptive Levin search, as well as an ability to invoke arbitrary chunks of program code from past successful hypotheses in solutions to new problems. However, in the problem sequence that Schmidhuber tested, which involved solving the ‘towers of Hanoi’ problem, only the weight modification mechanism was shown to provide a performance benefit.

In my MSc project [Henderson, 2010], I studied a simple approach to cumulative learning in IFP, using an inference system based on brute-force search, modelled on MagicHaskeller [Katayama, 2007]. The system was presented with sequences of related but successively more difficult problems. To modify its inductive bias, it incorporated each solution program into its background knowledge as it progressed through a sequence. In this way, solutions to earlier problems were available to be invoked as library procedures by solutions to later ones. I evaluated the system on four sequences of list processing problems; for example, in one of the sequences the system was tasked with learning a sorting algorithm via four intermediate problems, including removing a given element from a list and finding the smallest element in a list. On each problem sequence tested, the system solved the entire sequence with the help of these bias shifts at least thirty times faster than it took to solve just the final problem in the sequence with no bias shifts.

Schmidhuber’s adaptive Levin search and my approach of adding solution programs to the background knowledge can both produce an improvement in learning ability over a sequence of successively more difficult problems. In both cases, as discussed above, a system was shown to solve sequences of problems some orders of magnitude faster with bias-shifting enabled than without, albeit under controlled conditions. However, both techniques still have some quite severe limitations. Adaptive Levin search, though a useful approach, is on its own fundamentally limited because it shifts inductive bias only by modifying how a system chooses to use existing background knowledge; it provides no means for acquiring new knowledge. I believe that a technique like adaptive Levin search would therefore work best in combination with some other bias-shifting technique that does produce new knowledge.

As for the method of adding past solution programs to a system’s background knowledge, the problem here is that the success or failure of bias-shifting is very sensitive to the exact choice and order of problems within a sequence. Indeed, under this approach bias-shifting will only have a useful effect if solutions to later problems can be expressed directly in terms of earlier solutions; it is not enough for the problems simply to be related in some way, for example if their solutions would share some common structure. In this PhD project, the technique of abstraction invention is designed to overcome this limitation: instead of using solution programs themselves as background knowledge, it derives abstractions from common structure in groups of solution programs, and then uses these abstractions as the new knowledge. In this way it bears some resemblance to the technique of Davis and Domingos discussed above, but with the advantage that the re-usable abstractions are represented as regular background knowledge rather than as ad-hoc ‘templates’. In this project I will also use a weight modification mechanism similar to adaptive Levin search in order to complement the abstraction invention technique.

Overall, what is missing from previous work in cumulative learning is a really convincing demonstration of how knowledge acquired on one set of problems can give a system the ability to solve a whole new class of harder problems that it found intractable before. Most of the work in transfer learning, as discussed, is concerned with transferring knowledge between problems of similar difficulty in order to gain some improvement in predictive accuracy in the presence of sparse training data. Only the work of Schmidhuber and my MSc project involve progression through a sequence of problems of increasing difficulty; however, both of these demonstrations were on a very small scale and establish only a basic proof of concept. In both cases the problem sequences were very short (four or five different problems at most), and furthermore the sequences were rather contrived: the choice and ordering of problems was designed by hand in order to be amenable to the particular bias-shift mechanisms being demonstrated.

2.2 Inductive Logic Programming

In the last section I argued that inductive programming is the most promising branch of machine learning within which to investigate cumulative learning, particularly due to its support for flexible specification of inductive bias in the form of background knowledge. In this and in the following section I review the field of inductive programming.

Inductive Logic Programming (ILP) is by far the largest and most mature branch of inductive programming. It can be characterised as the study of algorithms for automating inductive inference using first-order logic as a unified knowledge representation language. The roots of ILP lie in early work by Plotkin [1969, 1971] and Shapiro [1983]; however, the field became firmly established in the early 1990s [Muggleton, 1991]. ILP is unique in that it is the only branch of machine learning that has yet made a substantial practical attempt at automating truly generic inductive inference. In other areas of machine learning, different classes of inductive hypotheses require different learning algorithms (decision trees, feedforward neural networks, or Bayesian networks, for example). On the other hand, the focus in ILP is on developing algorithms that can deal with the general class of all effectively computable hypotheses. This is achieved by using a Turing-complete language, for which first-order logic has been the conventional choice, as a universal medium for representing all hypotheses, data, and other forms of knowledge. Furthermore, ILP systems possess a unique ability to accept detailed domain-specific inductive bias as input in the form of background knowledge, which is also expressed in the language of first-order logic. By providing different background knowledge, it is thus feasible to adapt a single generic ILP inference algorithm to a wide variety of application domains without ever having to modify the internal workings of the learning algorithm itself. ILP has found significant practical application, particularly as a tool aiding scientific discovery in areas such as biochemistry [King et al., 1996; Turcotte et al., 2001], and also in engineering [Dolsak and Muggleton, 1992; Feng, 1992].

Perhaps the most valuable contribution of the field of ILP, above the design of individual inductive inference systems, has been the formulation of an overarching theoretical framework within which such designs can be understood and compared, and which, in partnership with empirical evaluation, guides the design of new, more effective inference systems. In the next few subsections I review the main ideas of this framework that are relevant to this thesis: inverse deduction, refinement, and proof-directed search. One of the main points of this thesis is that these techniques, though they were developed within first-order logic based ILP, can in fact be seen to transcend first-order logic, and are readily transferable intact to other languages such as lambda calculus. In this PhD project I adapt the two techniques of refinement and proof-directed search to lambda calculus in RUFUS, and I adapt the technique of inverse deduction to lambda calculus in KANDINSKY.

One area of ILP that has particular relevance to this project is predicate invention, which concerns the automatic invention of novel background knowledge concepts. I defer a discussion of predicate invention until Sect. 2.4.2.

In the next four subsections I start with an overview of the basic principles of ILP, followed by a review of some main techniques for inferring hypotheses from data. Note that due to the large size of the field of ILP, what follows is not an exhaustive account of the state of the art; rather, I focus in detail on techniques that are relevant to this thesis. In particular I do not cover non-monotonic learning or probabilistic logic representations. For a comprehensive review of the field of ILP including these topics see [Muggleton et al., 2012a].

2.2.1 Main Principles

In the standard setting for learning in ILP [Muggleton and De Raedt, 1994], the goal is to infer a hypothesis that, in combination with background knowledge, correctly predicts some observed data. See Fig. 2.1 for a worked example. Hypotheses, background knowledge, and observed data are all expressed in first-order logic, typically in the form of Horn clauses. The observed data consists of positive and negative examples for a target predicate, specifying instances where the target predicate is true or false respectively. A hypothesis takes the form of a definition for the target predicate, and the background knowledge takes the form of a set of definitions for supplementary predicates. We say that a hypothesis covers an example if it and the background knowledge together logically entail that example.

Positive and negative examples:

reverse([], []).
reverse([1, 2, 3], [3, 2, 1]).
:- reverse([], [5]).
:- reverse([1, 2, 3], [1, 2, 3]).

Background knowledge:

nil([]).
cons(H, T, [H|T]).
append_elem([], X, [X]).
append_elem([H|T], X, [H|Y]) :- append_elem(T, X, Y).

Mode declarations:

modeh(reverse(+list, -list)).
modeb(reverse(+list, -list)).
modeb(nil(+list)).
modeb(cons(-int, -list, +list)).
modeb(append_elem(+list, +int, -list)).

A correct hypothesis:

reverse(X, X) :- nil(X).
reverse(X, Y) :- cons(H, T, X), reverse(T, T2), append_elem(T2, H, Y).

Figure 2.1: An inductive inference problem framed in the ILP setting for learning, in which the aim is to synthesise a logic program that reverses a list of numbers. The examples, background knowledge, and mode declarations constitute the problem specification. The hypothesis shown is one possible solution to the problem. Following ILP notational convention, negative examples are written as headless clauses. In mode declarations: modeh means that the predicate is allowed to appear in the head of a hypothesised clause while modeb means that it may appear in a body; +/- symbols indicate input/output modes respectively; list/int are argument types.

We may speak of the coverage of a hypothesis to refer to the number of positive/negative examples that it covers. The aim in the ILP setting is as follows: given some background knowledge and examples, find a hypothesis that covers all of the positive examples and none of the negatives.

It is standard in ILP to constrain the hypothesis language using static type declarations called mode declarations. These assign types to the arguments of predicates, as well as input/output modes, which give predicates an operational interpretation as functions from inputs to outputs.

To choose between multiple hypotheses that have the same coverage with respect to the examples, it is typical to prefer the simplest hypothesis, i.e. whichever one has the shortest syntactic description. This is a form of the Occam’s razor principle from the natural sciences. It can be understood as equivalent to using a Bayesian prior that assigns higher probability to shorter hypotheses [Muggleton and De Raedt, 1994].

When one wants to take into account both the coverage and size of a hypothesis simultaneously, it is common in ILP to use a compression measure as follows:

compression = (no. of positive examples covered)
            − (no. of negative examples covered)
            − (size of hypothesis)        (2.1)

The size of a hypothesis is usually taken to be the number of literals it contains. Compression can be understood as a Bayesian log-posterior, with the coverage of examples corresponding to log-likelihood and the size of the hypothesis corresponding to log-prior [Muggleton and De Raedt, 1994; Muggleton, 1995]. This compression measure is very important in ILP because it is the standard ‘goodness of fit’ measure for hypotheses, and can be used as an objective function to guide a heuristic search.
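As a concrete reading of Eq. (2.1), the following Haskell sketch (all names invented for illustration; in a real ILP system the coverage test would be an entailment check against the background knowledge) scores a hypothesis against positive and negative examples:

-- Compression measure of Eq. (2.1): positives covered, minus negatives
-- covered, minus the syntactic size of the hypothesis.
compressionScore
  :: (h -> e -> Bool)  -- coverage test: does hypothesis h cover example e?
  -> (h -> Int)        -- hypothesis size, e.g. the number of literals
  -> h                 -- the hypothesis being scored
  -> [e]               -- positive examples
  -> [e]               -- negative examples
  -> Int
compressionScore covers size h pos neg =
  length (filter (covers h) pos)
    - length (filter (covers h) neg)
    - size h

A heuristic search can then use compressionScore directly as its objective function, as described above.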

2.2.2 Inverse Deduction

A fundamental idea to have come out of ILP is the principle of inverse deduction, which may be stated as follows: an inductive inference process can be created from a deductive inference process by running it in reverse. To see why this should be the case, recall that in the ILP setting for learning, the goal is to find a hypothesis that predicts observed data. Now, this ‘prediction’ occurs by some process of logical deduction, i.e. we deduce the predicted observation from the hypothesis. Therefore, it seems reasonable that if we could run deduction backwards, we could ‘anti-deduce’ a suitable hypothesis from some observations. Indeed, it is quite possible to do just that.

A: length(A, B) :- nil(A), zero(B).
B: length(X, Y) :- cons(H, T, X), succ(Z, Y), length(T, Z).

C: length(X, Y) :- cons(H, T, X), succ(Z, Y), nil(T), zero(Z).

Figure 2.2: Example of a single resolution step: A and B resolve to give C. The literals resolved upon are the head of A and the body literal length(T, Z) in B. In the inverse resolution “V” operator absorption, clauses A, C are inputs and clause B is the output. Notice how the literals that make up the body of A are also present in the body of C (hence the name: A is absorbed into B). On the other hand, in identification, B, C are inputs and A is the output. Notice how the body of B contains exactly one literal not found in the body of C.

Deductive inference in first-order logic uses the resolution deduction rule [Nienhuys-Cheng and de Wolf, 1997]. Muggleton and Buntine [1988] proposed the idea of inverse resolution. They introduced two pairs of primitive operators for inverting resolution steps, the so-called “V” and “W” operators. A “V” operator is an inversion of a single binary resolution step. Now, a single deductive resolution step takes two clauses A, B as input, resolves on a positive literal in A and a corresponding negative literal in B, and yields an output clause C that is a logical consequence of A and B:

A, B → C

Each of the “V” operators inverts this transformation with respect to one of the inputs. Thus, for the first “V” operator, known as absorption, clause B becomes the output:

A, C → B

For the second “V” operator, known as identification, clause A becomes the output:

B, C → A

Both of these operators perform generalisation, and the absorption operator in particular is capable of generalising a recursive clause from initially non-recursive rules. See Fig. 2.2 for a concrete example of these operators.

The idea behind the “W” operators is that, to ‘complete the set’ of inverse resolution operators, one would like to invert a resolution step with respect to both inputs:

C → A, B

However, on its own, such a transformation is highly non-deterministic, and what’s more, it is almost never compressive. To achieve a more constrained, compressive inversion with respect to both arguments, Muggleton’s solution was to perform two or more such transformations simultaneously, and to have one of the outputs shared by both steps. When A (in which resolution occurs on a positive literal) is shared, the operator is called inter-construction:

C1 → A, B1

C2 → A, B2

When B (in which resolution occurs on a negative literal) is shared, the operator is called intra-construction:

C1 → A1, B

C2 → A2, B

Due to the sharing, it is now possible for these transformations to result in compression. See Fig. 2.3 for a concrete example. Unlike the “V” operators, the “W” operators do not perform generalisation of existing predicates; rather, they re-express them, without generalisation, in terms of new auxiliary predicates, a process known as predicate invention.

Muggleton and Buntine implemented the absorption and intra-construction operators in a system called CIGOL. CIGOL was capable of learning recursive logic programs from examples by means of successively applying these operators. It did this by means of a greedy search, in which the most compressive transformation that was consistent with user-supplied positive and negative examples was chosen at each step. CIGOL was capable of inventing auxiliary predicates during the learning process. For example, when learning a predicate to reverse a list, it invented the ‘append’ predicate using intra-construction, as an auxiliary step.

However, CIGOL’s implementation of absorption and intra-construction was somewhat constrained. In particular, in order to apply the absorption operator, the input clause A (which contains the positive resolved literal) had to consist of only a single literal. Furthermore, for intra-construction the output clauses Ai (again containing the positive resolved literals) also had to consist of a single literal each.

The motivation for this constraint was to cut down the high amount of non-determinism in these operators, and hence reduce the size of CIGOL’s search space. Rouveirol [1992] later lifted this constraint while still keeping the degree of non-determinism manageable by using a ‘flattened’ representation (clauses are initially pre-processed such that they do not contain any function symbols, as in Figs. 2.2 and 2.3). Neither Muggleton and Buntine nor Rouveirol implemented the inter-construction operator, though it was implemented by Wogulis and Langley [1989] in the RINCON system. RINCON was not an inductive learning system, however; its purpose was to compress a body of knowledge, via inventing auxiliary concepts, into a form that allowed deductive inference to be performed from it more efficiently.

a)

A:  bird(X) :- wings(X), lays_eggs(X).
B1: sparrow(X) :- bird(X), small(X), brown(X).
B2: crow(X) :- bird(X), large(X), black(X).

C1: sparrow(X) :- wings(X), lays_eggs(X), small(X), brown(X).
C2: crow(X) :- wings(X), lays_eggs(X), large(X), black(X).

b)

B:  uncle(X, Y) :- brother(X, Z), parent(Z, Y).
A1: parent(Z, Y) :- mother(Z, Y).
A2: parent(Z, Y) :- father(Z, Y).

C1: uncle(X, Y) :- brother(X, Z), father(Z, Y).
C2: uncle(X, Y) :- brother(X, Z), mother(Z, Y).

Figure 2.3: Examples of the inverse resolution “W” operators: a) inter-construction and b) intra-construction. Inter-construction takes C1, C2 as input and outputs A, B1, B2.

Intra-construction takes C1, C2 as input and outputs A1, A2, B. Both operators result in the invention of a new predicate (bird in a) and parent in b)). Notice that in the case of inter-construction, the clause body of the invented predicate consists of literals that are common to both C1 and C2, whereas in intra-construction the clause bodies of the invented predicate consist of literals that represent the differences between C1 and C2.

Notice also that in the case of intra-construction C1 and C2 must be clauses of the same predicate, whereas for inter-construction this is not necessary. Finally, it is worth pointing out that the English names ‘bird’ and ‘parent’ in this example would of course not be inferred as part of the inverse resolution process; the system would instead either create uninformative predicate names such as ‘concept1’ and ‘concept2’, or alternatively it could request that a human user inspect the invented predicates and assign appropriate names to them.

In inverse resolution, there tends to be a large search space of possibilities even for a single application of a “V” or “W” operator: there are many combinations of clauses, many literals to anti-resolve on, and many anti-substitutions of terms or variables to make. As a result, inverse resolution is usually only computationally feasible when performed greedily, a mode of operation that Muggleton [1995] called local generalisation. In other words, one always picks whichever inverse resolution step has the greatest immediate compression, and never looks ahead more than one step in the tree of all possible sequences of “V”/“W” operator applications. However, as Muggleton pointed out, there are many situations in which this greedy policy will fail. For example, it might be the case that the first three steps in a sequence of operator applications produce no compression, but then a high degree of compression occurs on the fourth step. Greedy search will not find such sequences, yet exhaustive search of all sequences of steps has an intractably large search space, particularly in the presence of a significant amount of background knowledge. Due to these problems, inverse resolution has lost favour since the 1990s as a primary means for inductive generalisation in ILP. Techniques based on refinement, which I discuss in the next section, have become much more popular.

Despite inverse resolution’s problems of tractability outside of a greedy search, it is certainly not a technique to be forgotten about or ignored. The reason for this is that, unlike most other ILP techniques developed since, inverse resolution is capable of predicate invention, i.e. its inter- and intra-construction operators provide a means for a learning system to invent entirely novel auxiliary concepts of its own accord, as part of the inference process. Predicate invention is strongly related to abstraction invention, and is a potential way to learn new background knowledge through experience. In Sect. 2.4 I discuss predicate invention in detail, so I shall come back to inverse resolution in that context then.
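Before moving on, it is worth making the literal-level bookkeeping behind the “W” operators concrete. The following Haskell sketch is a hypothetical illustration only (it is not CIGOL’s algorithm, and it elides the variable renaming that a real implementation must perform): treating clause bodies as lists of literals, inter-construction gives the invented predicate the literals common to the two input bodies, while intra-construction gives it the leftover differences.

-- Literals common to two clause bodies: the body of the predicate
-- invented by inter-construction (cf. bird in Fig. 2.3a).
commonLits :: Eq lit => [lit] -> [lit] -> [lit]
commonLits c1 c2 = [l | l <- c1, l `elem` c2]

-- Literals of one body that are absent from the other: the clause bodies
-- of the predicate invented by intra-construction (cf. parent in Fig. 2.3b).
differenceLits :: Eq lit => [lit] -> [lit] -> [lit]
differenceLits c1 c2 = [l | l <- c1, l `notElem` c2]

For Fig. 2.3a, commonLits applied to the bodies of C1 and C2 yields wings(X) and lays_eggs(X), the body of the invented bird clause; for Fig. 2.3b, differenceLits applied in each direction yields father(Z, Y) and mother(Z, Y), the bodies of the two parent clauses.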

2.2.3 Refinement

In this subsection I describe a popular approach to ILP that is rather different from the inverse deduction technique outlined in the last subsection. This different approach is of a ‘generate-and-test’ nature. Recall that in inverse deduction, hypotheses are derived directly from the examples by means of a reverse proof. Hence, such hypotheses are guaranteed to logically entail the examples by construction. On the other hand, in the ‘generate-and-test’ paradigm the idea is to generate hypotheses in some more arbitrary way, without any ‘by construction’

guarantee that they cover the examples, and then to filter these hypotheses for coverage of the examples by testing the predictions of each one using forward deduction. Due to its support for more direct methods of hypothesis construction, the ‘generate-and-test’ approach often has a much more manageable search complexity than inverse deduction. The theory of refinement and refinement operators [Nienhuys-Cheng and de Wolf, 1997, Chap. 17], originally due to Shapiro [1983], provides a principled framework within which to design search algorithms for ‘generate-and-test’ style inductive programming that have a number of desirable features, in particular:

• Guidance: rather than simply enumerating hypotheses in an order determined a priori, the search algorithm uses information about how well intermediate hypotheses cover the examples in order to guide the search incrementally towards more promising regions of the hypothesis space.

• Pruning: large regions of the hypothesis space can be proven in advance to contain no plausible hypotheses that are consistent with the observed data, hence those regions can be pruned from the search.

The technique of refinement requires two structures to be defined with respect to the hypothesis space of interest: a refinement operator and a generality ordering. A refinement operator is a function mapping each hypothesis to a set of successor hypotheses. Equivalently, one can think of this function as defining a (possibly infinite) directed graph in which the hypotheses are the nodes; hence one often refers to the structure induced by a refinement operator as a refinement graph. A generality ordering is a quasi-order1 over the set of hypotheses, defined such that hypotheses that are ‘more general’ according to the ordering make stronger predictions about what data one will observe.2

Usually in refinement, the generality ordering comes from established theory and therefore imposes a natural and informative structure over the space of hypotheses. In the case of first-order logic the generality ordering of theta-subsumption [Nienhuys-Cheng and de Wolf, 1997] is frequently used in the context of refinement of clauses. The generality of a hypothesis is closely related to its coverage of examples, and hence to its compression. In this way, choosing a refinement graph that closely follows the structure of a generality ordering such as theta-subsumption is often a good design strategy for making the search space amenable to heuristic guidance by compression.

1 A reflexive, transitive binary relation.
2 In essence, this is the meaning of generality: a more general hypothesis makes stronger predictions about what data one will observe. Inductive inference is seen as a process of generalisation from observed data to a hypothesis, because a useful hypothesis not only correctly predicts the data observed so far but it also predicts the outcome of future observations.

It is important for a refinement operator to exhibit a property called soundness with respect to the generality ordering. In the case of a so-called upward refinement operator, soundness means that refining a hypothesis always yields successors that are more or equally general. For a downward refinement operator, successors are always less or equally general. Soundness enables pruning, because it allows reasoning like the following: suppose that a hypothesis is reached during a search through the refinement graph that is found to make incorrect predictions about observed data. If we have, say, an upward refinement operator that we know is sound, then by the definition of generality we also know that all the descendants of this hypothesis (those that can be reached from it by further applications of the refinement operator) must also make the same incorrect predictions. Therefore, all the descendants can immediately be pruned from the search space, avoiding a potentially large tract of fruitless search. Arguments along similar lines can also be made in the case of a downward refinement operator. Two of the most well-known and successful ILP implementations, FOIL [Quinlan, 1990] and Progol [Muggleton, 1995], use the technique of refinement to search a space of first-order clauses. Both use downward refinement operators under the generality ordering of theta-subsumption, and both take advantage of coverage-directed guidance as well as pruning.
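To make the shape of such a search concrete, the following Haskell sketch (my own illustration, not FOIL’s or Progol’s algorithm) implements a generic generate-and-test loop over a refinement graph; refine, coversPos, and consistent are hypothetical parameters standing for a sound downward refinement operator and the two example-coverage tests.

import qualified Data.Set as Set

-- Generic generate-and-test search over a refinement graph.  Assumes a
-- *sound downward* refinement operator: successors are never more general
-- than their parent.
search :: Ord h
       => (h -> [h])      -- refinement operator: successor hypotheses
       -> (h -> Bool)     -- covers all positive examples?
       -> (h -> Bool)     -- consistent with the negative examples?
       -> h               -- most general starting hypothesis
       -> [h]             -- solutions, enumerated lazily
search refine coversPos consistent root = go Set.empty [root]
  where
    go _ [] = []
    go seen (h : rest)
      | h `Set.member` seen = go seen rest
      -- Pruning: by soundness, descendants of h are at most as general as
      -- h, so if h already fails to cover the positives then no descendant
      -- can, and the whole subtree is discarded.
      | not (coversPos h)   = go seen' rest
      | consistent h        = h : go seen' (rest ++ refine h)
      | otherwise           = go seen' (rest ++ refine h)
      where seen' = Set.insert h seen

A hypothesis that covers the positives but still covers negatives is kept in the frontier and specialised further; one that is also consistent is reported, and may still be refined to enumerate alternatives.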

2.2.4 Multiple Refinement Points and Proof-Directed Search

In the context of refinement, it may often be the case that hypotheses consist of multiple components which can in effect be refined independently. In ILP hypotheses typically consist of multiple clauses, each clause being an individual logical fact or rule. However, this poses a significant problem to the tractability of refinement-based search: how does one choose which subset of the clauses in a hypothesis to refine, and in what order? To try all possibilities will usually involve a combinatorially large number of choices. The tractability of refining multiple clauses simultaneously has proved such a significant problem that until recently, most refinement-based ILP systems were so-called single-clause learners, of which prominent examples are FOIL [Quinlan, 1990] and Progol [Muggleton, 1995]. What characterises a single-clause learner is that it will always refine one clause to completion before moving on to the next. Furthermore, each successive clause must of itself make new predictions about observed data in order to be accepted. Though this is an effective technique for keeping the search space down to a manageable complexity, it comes at a price: single-clause learners are incomplete, in that there are

many useful hypotheses that they simply cannot find. Specifically, single-clause learners cannot discover hypotheses that contain dependencies between clauses such that asserting any one of the clauses alone does not make new predictions about observed data, whereas asserting two or more clauses simultaneously does make new predictions. Muggleton et al. [2012b] give a good example of such a hypothesis; a simplified version of their example is given here in Fig. 2.4. Since the late 1990s there has been an effort in ILP to develop multi-clause learners [Bratko, 1999; Inoue, 2004; Corapi et al., 2010]. These systems generate whole groups of clauses at once in order to explain observed data, and hence are not subject to the incompleteness suffered by single-clause learners. However, such systems must employ clever search techniques if they are to remain efficient in the face of the inevitably larger complexity of their search spaces. One recent technique, developed in the TopLog family of ILP systems [Muggleton et al., 2008, 2012b],3 can be characterised under the name proof-directed search. The idea behind proof-directed search is as follows. Starting with an under-specified hypothesis in need of refinement, one attempts to prove, using some mechanical deduction procedure, that this hypothesis correctly predicts observed data. Now, because the hypothesis is under-specified it will be necessary to refine parts of it in order to make progress with the proof. The trick is that the proof procedure itself selects which parts of the hypothesis need to be refined, and in what order. Thus, the choice of which part of the hypothesis to refine at any given moment does not have to be made arbitrarily or non-deterministically, avoiding a good deal of the extra combinatorial search usually associated with multiple refinement points and, crucially, without sacrificing completeness. Let us now look in a bit more detail at the mechanism of the TopLog family of systems (this shall be relevant to the design of the RUFUS system in Chap. 5, which also uses a proof-directed search). In these systems the hypothesis space is defined by means of a logic program called a top theory, which consists of two sets of clauses ⊤non-terminal and ⊤terminal (see Fig. 2.5 for an example). ⊤non-terminal can be viewed as a declarative specification for a refinement operator, whereas

⊤terminal is an initial underspecified hypothesis, ready to be refined. Each non-terminal literal (prefixed by a ‘$’ in Fig. 2.5) in the body of a clause in ⊤terminal represents a point at which refinement can occur. The refinement graph is defined using first-order logic’s resolution deduction rule as follows. To refine a hypothesis (initially ⊤terminal), transform one of its clauses by resolving a non-terminal literal in the body of that clause with the head of a clause in

⊤non-terminal. It is a direct consequence of the soundness of resolution [Nienhuys-Cheng and de Wolf, 1997] that this refinement operator is sound with respect

3 The two systems cited here are respectively TopLog and MC-TopLog. TopLog was the seminal work, though not actually a multi-clause learner. MC-TopLog is a multi-clause learner.

Positive and negative examples:

sentence([a, small, dog, eats, the, tasty, food], []).
:- sentence([small, eats, dog, a, food, tasty], []).
etc...

Background knowledge:

article([a|X], X).
article([the|X], X).
adjective([small|X], X).
adjective([tasty|X], X).
noun([dog|X], X).
noun([food|X], X).
verb([eats|X], X).
etc...

Hypothesis:

sentence(A, D) :- noun_phrase(A, B), verb(B, C), noun_phrase(C, D).
noun_phrase(A, D) :- article(A, B), adjective(B, C), noun(C, D).

Figure 2.4: In this ILP grammar learning task, similar to one described by Muggleton et al. [2012b], the hypothesis that is shown cannot be found by single-clause learners such as FOIL or Progol. The target predicate sentence is a parser for sentences, of which positive and negative examples are given along with background knowledge. The hypothesis consists of two clauses, one for the target predicate and the other for a helper predicate noun_phrase. It is clear from the relationship between these two clauses that neither clause alone is enough to cover any examples, yet both clauses together will cover the given positive example.

⊤non-terminal:

$body(A, C) :- noun_phrase(A, B), $body(B, C).
$body(A, C) :- article(A, B), $body(B, C).
$body(A, C) :- adjective(A, B), $body(B, C).
$body(A, C) :- noun(A, B), $body(B, C).
$body(A, C) :- verb(A, B), $body(B, C).
$body(A, A).

⊤terminal:

sentence(A, B) :- $body(A, B).
noun_phrase(A, B) :- $body(A, B).

Figure 2.5: A ‘top theory’ that can derive the hypothesis shown in Fig. 2.4. The non-terminal predicate $body acts as a specification for a refinement operator, taking the form of a grammar for the allowed refinements of the body of a hypothesised clause. The clauses for sentence and noun_phrase together constitute an initial underspecified hypothesis ready to be refined.

to the generality ordering of logical entailment relative to ⊤non-terminal. TopLog’s proof-directed search procedure itself works like this: an attempt is made to prove one or more examples from the top theory and background knowledge using the mechanical deduction procedure SLD-resolution, the constrained form of resolution commonly used by implementations of the Prolog programming language. A hypothesis is not actually refined explicitly step-by-step; rather, an entire proof of the examples is constructed, and a final hypothesis is then extracted from it by identifying which steps in the proof correspond to refinements (i.e. resolutions of non-terminal literals). The SLD-resolution procedure will backtrack to generate multiple proofs and hence multiple alternative hypotheses. Also, pruning of regions of the search space that are inconsistent with the examples occurs as a natural consequence of the proof process. Note that outside of inductive programming, a technique very similar to proof-directed search has been used in the context of the Haskell programming language to improve the efficiency of automated property testing of programs [Runciman et al., 2008; Allwood et al., 2011]. Here, the object being refined is not itself a program, but rather a data structure serving as a test case for a program.

2.3 Other Inductive Programming Techniques

ILP is quite a unified field, in the sense that there is a strong consensus on its main guiding principles. However, in other areas of inductive programming outside of ILP, such as Inductive Functional Programming (IFP), the range of available techniques is somewhat more of a mish-mash. Rather than give an overview of everything available, in this section I describe certain inductive programming techniques from outside of ILP that are relevant to this thesis. For a discussion of alternative techniques that I do not build on directly in this project, see Sect. 8.5.

2.3.1 Higher-order Background Knowledge as Declarative Bias

It is often useful in inductive programming to be able to specify a strong inductive bias in declarative form, in order to customise a system to some problem domain. Such a ‘declarative bias’ provides a specification of a hypothesis space, and indeed background knowledge and associated type declarations are one form of declarative bias. However, in first-order ILP, background knowledge and type declarations alone are often not flexible enough to provide the kind of strong bias required for some applications. Thus, many ILP systems provide special mechanisms, separate from background knowledge, for specifying declarative bias. For example, in the TopLog family of ILP systems, which was discussed in Sect. 2.2.4, the top theory provides such a means: one can use it to specify a custom refinement operator, and hence achieve much more flexibility in controlling the structure of the hypothesis space than is possible with background knowledge and mode declarations alone. In another ILP technique known as the schema-directed approach [Flener, 1995, 1997], program templates called ‘schemas’ are used to create a strong bias towards hypotheses that conform to particular structural or recursive patterns. Flener gives an example of a ‘divide-and-conquer schema’, which provides a template from which a number of recursive programs including sorting algorithms may be instantiated. Though specifying a strong bias in such ways can be useful for applying inductive programming to specific applications, these domain-specific biases typically must be provided by human experts. On the other hand, for cumulative learning, one would wish an inductive programming system to be able to automatically learn its own strong bias towards a problem domain, starting from only a very weak initial bias. Additional bias-specification mechanisms such as top theories or program schemas are therefore not ideal, because in order to obtain the full benefits of bias shifting in learning from experience, a system

would need a mechanism for learning its own top theory or learning its own program schemas, in addition to whatever mechanism it has for learning new background knowledge. Overall it is rather awkward from a cumulative learning point of view to be representing declarative bias in such a multitude of forms. As an alternative approach, let us ask the following question: is it possible to obtain greater flexibility in the way bias can be specified purely through background knowledge and type declarations? As we shall see, if one is able to switch to a knowledge representation language that supports first-class functions, then the answer to this question is ‘yes’. Katayama [2007] demonstrated the power of higher-order functions in background knowledge as a means for specifying a strong declarative bias. A higher-order function is a function that takes one or more other functions as arguments. Though higher-order functions cannot be represented in first-order logic, they can be represented in any knowledge representation language that supports first-class functions, such as lambda calculus. One can use higher-order functions to encapsulate high-level structural patterns in programs. For example, the well-known operations ‘map’ and ‘fold’ from functional programming are higher-order functions, each representing a particular high-level pattern of recursion. Indeed, higher-order functions can often be viewed as program templates much like in the ‘schema-directed approach’ to ILP; however, now the templates are represented as regular background knowledge rather than requiring an additional ad-hoc formalism. Katayama’s inductive programming system MagicHaskeller uses lambda calculus as its knowledge representation language, and hence supports the use of higher-order background knowledge. MagicHaskeller searches for functions to fit input-output examples using a brute-force generate-and-test search through a space of strongly-typed lambda expressions. MagicHaskeller was demonstrated in a ‘program synthesis’ domain, synthesising list-processing programs such as list reversal, nth element of a list, etc. It made use of higher-order functions known as morphisms [Augusteijn, 1998] in the background knowledge in order to provide an appropriate domain-specific bias. Katayama showed that, by making use of strong declarative bias in the form of this higher-order background knowledge, a system such as MagicHaskeller that generates hypotheses using a simple brute-force search can synthesise a variety of useful programs at a speed competitive with other inductive programming systems that use much more advanced, data-guided search mechanisms. In this thesis, I take Katayama’s work further in two ways. Firstly, I adapt ILP’s technique of refinement to work with lambda calculus (Chap. 5), in order to get the benefits of both an efficient guided search and the flexibility of higher-

order background knowledge. Secondly, I develop an abstraction invention procedure (Chap. 7) capable of learning new background knowledge, higher-order or otherwise, from experience. The aim is thus to enable an inductive programming system to adapt to a problem domain by automatically acquiring an appropriate strong bias, rather than this bias needing to be provided a priori by a human user.
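To illustrate the point in miniature (my own Haskell example, not MagicHaskeller’s code): once a fold is available as background knowledge, target programs that would otherwise need explicit recursion shrink to very small terms, so even a brute-force enumeration of typed terms reaches them quickly.

-- With foldr as higher-order background knowledge, common list-processing
-- targets become one-liners; the hypothesis search only has to find a
-- small term that instantiates the recursion pattern, not the pattern
-- itself.
reverse' :: [a] -> [a]
reverse' = foldr (\x acc -> acc ++ [x]) []

length' :: [a] -> Int
length' = foldr (\_ n -> 1 + n) 0

main :: IO ()
main = print (reverse' "abc", length' [1, 2, 3 :: Int])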

2.3.2 Non-termination and Levin Search

Any inductive programming system that supports a Turing-complete hypothesis language and deductively computes the predictions of candidate hypotheses as part of its inference algorithm must have some policy for dealing with potentially non-terminating hypothesis programs. It is impossible to prove in general whether an arbitrary program will terminate on a given input, due to the undecidability of the halting problem established by Turing. Therefore, in order to avoid incompleteness, i.e. a situation where there exist some computable functions that a system cannot hypothesise, a system must be prepared to test candidate hypotheses for arbitrarily long runtimes. The majority of work in ILP and other areas of inductive programming has tended to side-step this issue of potentially non-terminating programs simply by living with incompleteness. This is usually done in one of two ways. Either the hypothesis space is restricted to one containing only provably terminating programs (for example, MagicHaskeller [Katayama, 2007]). Alternatively, some pre-set limit on the number of computation steps for which any hypothesis may be tested is used, and any hypothesis that exceeds this limit is simply discarded (for example, Progol [Muggleton, 1995] and HYPER [Bratko, 1999]). However, there are big drawbacks to both of these options. Limiting the hypothesis space to some particular class of provably terminating programs inevitably involves making rather strong a priori assumptions about what class of hypotheses are suitable for solving the problem at hand. Such strong assumptions are at odds with the goals of cumulative learning, in which the aim is to start off with only weak initial assumptions, and to have the system itself learn about the nature of the problem domain from its own exploration. On the other hand, using a pre-set step limit is also an inadequate solution when one knows little about the problem domain a priori. This is because it is impossible to choose a suitable value for the step limit without making strong assumptions about the number of computation steps required to test the target hypothesis. If one gets it wrong and sets the step limit too low then the system will fail to find the target hypothesis at all. On the other hand, if one sets the step limit too high then much time will be wasted unnecessarily testing

non-terminating hypotheses. In the cases of Progol and HYPER, this issue is dealt with by relying on a human user knowing something about the nature of the target hypothesis and choosing a step limit on a problem-by-problem basis. However, if one is to make a convincing demonstration of cumulative learning, then such a priori assumptions cannot be made. Fortunately, there does exist a practical technique for overcoming the non-termination problem that neither sacrifices completeness nor requires making strong assumptions about the nature of the target hypothesis. This technique, which appears to be somewhat little-known in the ILP community, is known as Levin search or universal search. It was originally postulated by Levin [1973] as a theoretical device for solving a problem in algorithmic complexity theory; however, it was later recognised [Schmidhuber et al., 1997; Schmidhuber, 2004] for its potential as a practical hypothesis search algorithm in inductive programming. Schmidhuber [2004] gives an example of a Levin search procedure in the form of a brute-force generate-and-test algorithm, which I shall reproduce here (modified slightly for the sake of clarity). The algorithm requires that some prior probability distribution p(x) is defined over one’s hypothesis space in such a way that one can easily enumerate all hypotheses x in descending order of probability p(x).4 Its description is as follows:

For N = 1, 2, 4, 8, 16, ...: generate each hypothesis x for which Np(x) ≥ 1 and test it for at most Np(x) steps.

Notice that the Levin search procedure consists of an iterative deepening search. We can make the following remarks about it:

• The value N, which doubles at each iteration, is an upper bound on the total number of computation steps spent testing hypotheses on that iteration.

• At each iteration, the number of hypotheses generated is finite; indeed it is at most N.

• At each iteration, every possible hypothesis is tested either to termination or for a number of steps proportional to its prior probability (to within one step).

Schmidhuber calls this last property bias-optimality, i.e. the algorithm allocates computation time to each hypothesis in proportion to its prior probability. It is also relatively straightforward to see that Levin search is complete in the following sense: given any terminating hypothesis with non-zero prior probability, the

4 Note that this is not a difficult requirement to satisfy, particularly if one considers that in inductive programming one typically uses a prior on hypotheses that decreases monotonically with their syntactic size.

Levin search algorithm will have tested it to termination (and hence discovered whether or not it is a solution to the inductive inference problem) within a finite amount of time. Schmidhuber demonstrated Levin search in a brute-force form as given above. Of course, any brute-force search algorithm is going to be strongly limited by the combinatorial nature of the hypothesis spaces typical of inductive programming. Therefore, in this thesis (Chap. 5) I shall combine Levin search with the proof-directed refinement technique of ILP discussed earlier, in order to obtain an efficient, informed search without compromising on Turing-completeness, in spite of potentially non-terminating hypotheses.
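For concreteness, here is the iterative-deepening procedure above transcribed into Haskell (a sketch under stated assumptions, not Schmidhuber’s code): hypotheses are supplied lazily in descending order of prior probability, and test h n is assumed to run hypothesis h for at most n steps, returning Just True on success, Just False on failure, and Nothing on timeout.

-- Brute-force Levin search.  On each pass the total testing budget is at
-- most N (since the priors sum to at most 1), and every hypothesis with
-- Np(x) >= 1 is tested for Np(x) steps.  If no solution exists, the loop
-- runs forever: this is unavoidable for a complete search over a
-- Turing-complete hypothesis space.
levinSearch :: [(h, Double)]             -- hypotheses with priors, descending
            -> (h -> Int -> Maybe Bool)  -- step-bounded tester
            -> h
levinSearch hyps test = go (1 :: Int)
  where
    go n =
      let phase     = takeWhile (\(_, p) -> fromIntegral n * p >= 1) hyps
          budget p  = floor (fromIntegral n * p)
          solutions = [ h | (h, p) <- phase, test h (budget p) == Just True ]
      in case solutions of
           (h : _) -> h
           []      -> go (2 * n)   -- double the step budget and repeat

Because the enumeration is in descending order of p(x), the hypotheses satisfying Np(x) ≥ 1 form a finite prefix of the list, which is exactly the set generated on the pass with budget N.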

2.4 Abstraction Invention

In Chap. 1 I introduced abstraction invention as the process of automatically deriving new, re-usable concepts from repetitive patterns in existing knowledge. I develop a form of abstraction invention in this PhD project as a mechanism for learning background knowledge from experience. Abstraction invention mimics the process of abstraction used by software engineers in order to build up a library of re-usable procedures and data structures. In this section I review work that is related to abstraction invention, in particular anti-unification, a well-known mechanical means for abstracting a common template from a group of syntactically similar programs, and predicate invention, an area of ILP that intersects with abstraction invention and concerns the automatic construction of new predicates.

2.4.1 Anti-unification as a Means of Abstraction

Abelson and Sussman [1996, Sect. 1.3.1], and also Felleisen et al. [2001, Sect. 21.1], identify a standard ‘recipe’ which human programmers often use to derive abstractions from syntactic patterns in functional programs. Felleisen et al. describe this recipe explicitly:

“[Given two similar function definitions], we compare them and mark the differences with boxes. . . . Next we replace the contents of corresponding pairs of boxes with new names and add these names to the parameter list.”

As an example, consider the following pair of functional programs, the first of which increments all the elements in a list, and the second of which decrements all the elements in a list:

incElems lst = if (null lst) nil
    (cons (inc (head lst)) (incElems (tail lst)))

decElems lst = if (null lst) nil
    (cons (dec (head lst)) (decElems (tail lst)))

These programs differ at only a single location: the boxed call to inc or dec. Following the recipe, if we replace the contents of the box with a new variable f, and then make f a new function parameter, we obtain the following abstraction:

map f lst = if (null lst) nil
    (cons (f (head lst)) (map f (tail lst)))

incElems = map inc
decElems = map dec

map encapsulates the pattern of applying some function f to each element of a list. Also shown above are the two programs incElems and decElems, now re-expressed concisely in terms of the map abstraction.

The above informal abstraction recipe is recognisable as a form of anti-unification. In anti-unification, one obtains a generalisation or schema from a set of tree-structured terms by pattern matching from the root down, replacing any mismatch points with new variables. Automatic anti-unification of terms in first-order logic is generally regarded as a solved problem: a widely-used algorithm was first discovered by Reynolds [1969] and Plotkin [1969]. On the other hand, anti-unification of lambda calculus terms, as would be needed to automate the above recipe, is not quite so straightforward. The main issue lies in deciding how to correctly handle lambda parameters and bound variables. Some algorithms for anti-unification of lambda terms have been proposed, though with various different design goals in mind. For example, Feng and Muggleton [1992] gave one that applies only to a restricted subset of lambda calculus; they made this restriction in order to guarantee that their algorithm will produce unique ‘least general generalisations’ when terms are interpreted as higher-order logic formulae. A similar approach was also taken by Schmid

et al. [2001]. On the other hand, Bakewell and Runciman [1999] formulated an anti-unification algorithm that derives procedural abstractions from function definitions expressed in a Haskell-like language with a significantly more complicated syntax than pure lambda calculus. Their algorithm is essentially a straightforward automation of the abstraction recipe of Abelson and Sussman/Felleisen et al. given above. From an abstraction invention point of view, the main limitation of anti-unification algorithms such as that of Bakewell and Runciman and others is that they are designed only to anti-unify whole terms. On the other hand, a truly flexible abstraction invention algorithm should be able to make abstractions from commonality between arbitrary subterms of a larger program. To this end, in Chap. 6 of this thesis I design an anti-unification search algorithm, which performs a heuristic search for a common pattern over all possible combinations of subterms of a lambda calculus term.
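As a point of reference for the lambda calculus case developed in Chap. 6, the solved first-order case mentioned above is only a few lines of code. The following Haskell sketch (my own, with invented names) computes a Reynolds/Plotkin-style generalisation of two first-order terms; reusing one variable per repeated mismatched pair of subterms is what makes the result least general.

import Control.Monad (zipWithM)
import Control.Monad.State (evalState, get, put)
import qualified Data.Map as Map

-- First-order terms: variables, or function symbols applied to arguments.
data T = V String | Fn String [T] deriving (Eq, Ord, Show)

antiUnify :: T -> T -> T
antiUnify a b = evalState (go a b) (Map.empty, 0 :: Int)
  where
    go (Fn f ss) (Fn g ts)
      | f == g && length ss == length ts = Fn f <$> zipWithM go ss ts
    go u v
      | u == v    = return u    -- identical subterms: nothing to generalise
      | otherwise = do          -- mismatch: introduce (or reuse) a variable
          (seen, n) <- get
          case Map.lookup (u, v) seen of
            Just x  -> return (V x)
            Nothing -> do
              let x = "X" ++ show n
              put (Map.insert (u, v) x seen, n + 1)
              return (V x)

For example, antiUnify (Fn "f" [Fn "a" [], Fn "b" []]) (Fn "f" [Fn "c" [], Fn "b" []]) yields Fn "f" [V "X0", Fn "b" []], i.e. the schema f(X0, b).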

2.4.2 Predicate Invention in Inductive Logic Programming

Predicate invention [Stahl, 1996] in ILP refers to any process by which novel background knowledge predicates are introduced automatically during the learning process. These are predicates for which no partial specification or set of examples has been given in advance; the learning system simply decides of its own accord that a new concept is required to better express its knowledge. The process is analogous to ‘coining a new word’ in human language. Particularly in the early 1990s, there was significant effort within the ILP community to develop practical techniques for predicate invention. However, different studies of predicate invention had rather different goals in mind. Some approaches focused on restructuring a body of knowledge without generalisation, either for the purpose of making it more ‘meaningful’, i.e. more understandable to humans [Flach, 1993], or for improving the computational efficiency with which deductions could be made from it [Wogulis and Langley, 1989]. In contrast, other studies focused on inventing auxiliary predicates as an intermediate step during the generalisation of an inductive hypothesis from examples. Permitting such auxiliary predicates changes the language bias, which can allow some target hypotheses to be expressed more compactly and hence enable more efficient learning and improve predictive accuracy [Muggleton and Buntine, 1988; Bain and Muggleton, 1991; Wrobel, 1994; Leban et al., 2008; Muggleton et al., 2012c]. Furthermore, in the case of some recursive target concepts, expression of the concepts is not possible at all in first-order logic without the introduction of recursive auxiliary predicates [Flener, 1995]. Perhaps surprisingly, there have been only a small number of studies of

predicate invention as an explicit mechanism for cumulative or transfer learning (i.e. using predicates invented while solving one set of problems to help solve another); one of the few notable investigations along these lines is by Khan et al. [1998]. To try to see why this has been the case, in the rest of this subsection I shall compare the predicate invention techniques mentioned above from the point of view of their suitability for cumulative learning. Later I shall argue that predicate invention as a cumulative learning mechanism in ILP is limited by certain constraints of the language of first-order logic, which may explain why studies of cumulative learning in ILP have been somewhat few and far between. The predicate invention techniques most relevant to cumulative learning are those that are likely to produce re-usable predicates. In such techniques, the decision procedure for choosing which predicates to invent should favour those for which there is some evidence that they will be re-usable. It might seem reasonable that compression can provide such evidence, because compression often occurs as a result of re-use (the re-used predicate then constitutes an abstraction). However, compression per se is not always a strong indicator of re-use or re-usability. To see why, consider the inverse resolution “W” operators described in Sect. 2.2.2: inter-construction and intra-construction. Both of these operators perform compression-guided predicate invention. In the case of inter-construction, the invented predicate is derived from commonality in two or more clauses of a knowledge base, and applying the inter-construction transformation results in the new predicate being used in at least two places. Thus, the compression produced by inter-construction is a direct result of there being multiple uses of the invented predicate, which given reasonable prior assumptions is a good indicator of further re-usability in the future. On the other hand, in the case of intra-construction, the invented predicate is derived from differences in two clauses of the same concept, and applying the intra-construction transformation only results in a single use of the invented predicate. Indeed, it appears that intra-construction is essentially a way of getting round a constraint used in ILP that knowledge must be expressed in conjunctive normal form (i.e. as a conjunction of clauses, where each clause is a disjunction of literals). If it were not for this constraint, then any intra-construction transformation could be expressed with at least as much compression without predicate invention, simply by introducing a disjunction within a clause body. In the ‘uncle’ example of Fig. 2.3b, this would yield the following (a semicolon represents disjunction, as in standard Prolog syntax):

uncle(X, Y) :- brother(X, Z), (father(Z, Y); mother(Z, Y)).

Thus, compression seems to be a good indicator of the re-usability of an invented predicate when it occurs due to the abstraction of a repeated pattern in a

program, as in inter-construction, whereas in other cases such as intra-construction, compression is not a strong indicator of re-usability. Of the numerous predicate invention techniques mentioned earlier, the majority in fact do not choose which predicates to invent based on potential for re-usability. In the technique of ‘closed world specialisation’ [Bain and Muggleton, 1991], as well as the ‘concept formation’ technique of the MOBAL system [Wrobel, 1994], the motivation for inventing a predicate is much the same as for intra-construction: to mimic the effect of a disjunctive sub-expression within the body of a clause while still conforming to conjunctive normal form. Also, in the ‘schema-guided approach’ advocated by Flener [1995], recursive auxiliary predicates are introduced because they are necessary in order to express certain hypotheses within the constraints5 of first-order logic, but again without any particular intention of re-usability in mind. A number of studies have used techniques that we may characterise as ‘brute-force’ predicate invention. Unlike other mechanisms such as the inverse resolution “W” operators, these ‘brute-force’ techniques do not make use of syntactic patterns in existing knowledge (such as commonality or differences between clauses) in order to make an informed choice of which predicate to invent. Instead, they systematically generate candidate auxiliary predicates in a somewhat uninformed, ‘brute-force’ manner, as part of the process of searching for a hypothesis using some standard search method such as refinement. For example, Leban et al. [2008] allowed an arbitrary single-clause auxiliary predicate to be refined as part of a multi-clause target hypothesis during a refinement-based search with the multi-clause ILP system HYPER. Khan et al. [1998] augmented the Progol system with an ability to systematically generate auxiliary predicates. Now, although this brute-force approach is simple and flexible, it can understandably suffer from severe problems of computational tractability due to the unrestricted combinatorial search space of potential new predicates that is being considered. To mitigate this, both Leban et al. and Khan et al. had to specify the arity and argument types of their invented predicates a priori, rather than deal with the larger search space of predicates of arbitrary arity and type. This had the unfortunate consequence of making their demonstrations of predicate invention seem rather contrived, since in order to specify the correct type signature for an invented predicate you really have to know what predicate you want the system to invent before you have started. In contrast, in the case of the inverse resolution “W” operators the decision of what arity

5 Note that in a language that supports first-class functions such as lambda calculus, there is no requirement for recursive auxiliary concepts to be given new, named definitions. Such concepts can be represented simply as subexpressions of a larger concept by invoking a higher-order function called the fixed-point operator.

and argument types to give an invented predicate is made automatically, in a natural way, by analysing the structure of the clauses of the knowledge base that are being transformed by the operator. An interesting variation on the ‘brute-force’ approach to predicate invention is known as ‘meta-interpretive learning’ [Muggleton et al., 2012c], which gets round the problem of the large combinatorial search space of potential new predicates by using a very strong declarative bias. Candidate hypotheses are constrained to being members of some particular domain-specific language, and an interpreter for this language (the ‘meta-interpreter’) is provided a priori as background knowledge. Muggleton et al. demonstrated their technique in a grammar-learning domain, achieving efficient inference of multi-predicate hypotheses that were constrained to be regular or context-free grammars. However, meta-interpretive learning does not seem suitable as a predicate invention mechanism for cumulative learning, for two reasons. Firstly, the invented predicates do not come with any particular evidence for re-usability. Secondly, meta-interpretive learning requires that one already has a strong inductive bias provided a priori, whereas in cumulative learning one assumes that only a weak initial bias is available, and that a strong bias should be learned automatically by means of the cumulative learning process itself. We have seen that of the variety of predicate invention techniques that have been developed in ILP, only the inverse resolution operator inter-construction seems highly suited as a means for inventing re-usable background knowledge in a cumulative learning scenario. The other techniques either do not invent predicates for which there is any particular evidence of future re-usability, or, in the case of the ‘brute-force’ approach, they suffer from severe combinatorial search problems. However, given inter-construction’s apparent promise, why has no one apparently made a serious attempt to actually use it to implement cumulative learning? In the next subsection I argue that first-order logic’s lack of support for first-class functions limits the power of predicate invention mechanisms like inter-construction. However, as we shall see, all is by no means lost, because it is possible to adapt inter-construction to other languages such as lambda calculus and in doing so shed this limitation.

2.4.3 Limitations of Predicate Invention due to First-Order Logic

Because predicates are not first-class in first-order logic, you cannot create any kind of abstraction that would require parametrising over a predicate symbol. In particular:

• You cannot create multi-clause abstractions, i.e. there is no straightforward way to abstract over commonality that spans more than one clause, as in Fig. 2.6.

• You cannot abstract over similar patterns of arguments where predicate symbols differ, as in Fig. 2.7.

First-order logic:

p1(X) :- a(X).
p1(X) :- b(X), c(X).
p2(X) :- a(X).
p2(X) :- b(X), d(X).

↓ ?

Lambda calculus:

p1 = \ x -> or (a x) (and (b x) (c x))
p2 = \ x -> or (a x) (and (b x) (d x))

↓

p1 = g c
p2 = g d
g = \ y x -> or (a x) (and (b x) (y x))

Figure 2.6: Above is a simple example of a logic program containing a syntactic pattern that spans multiple clauses: notice that the definitions of p1 and p2 are nearly identical apart from a difference of one literal in the second clause. Within first-order logic, there is no straightforward way to construct a predicate that abstracts this pattern (hence the ‘?’). Below it is a translation of the logic program into lambda calculus, along with a re-expressed version where the pattern has been abstracted into a function g. Notice how the predicates c and d are passed as arguments to g.

These limitations are quite severe, particularly the inability to make multi-clause abstractions. Multi-clause concepts are ubiquitous in most logic programs, and will frequently contain commonality spanning multiple clauses. Unfortunately, one can only conclude that (pure) first-order logic is simply not a good choice of language if one wishes to make abstractions. It is no wonder that Prolog programs of any significant size written by humans tend to rely heavily on extensions of the language beyond first-order logic: meta-logical, extra-logical, and higher-order features [Sterling and Shapiro, 1994]. The abstractions shown in Figures 2.6 and 2.7 can be made in lambda calculus very straightforwardly using the anti-unification ‘recipe’ described earlier in Sect. 2.4.1. In Chap. 7, I formalise this recipe as an inverse deduction process for lambda calculus, and in fact we shall see that it is very much a lambda-calculus analogue of inter-construction.

First-order logic:

p1(X, Y) :- a(X, Y), a(Y, X).
p2(X, Y) :- b(X, Y), b(Y, X).

↓ ?

Lambda calculus:

p1 = \ x y -> and (a x y) (a y x)
p2 = \ x y -> and (b x y) (b y x)

↓

p1 = g a
p2 = g b
g = \ z x y -> and (z x y) (z y x)

Figure 2.7: The logic program above exhibits a syntactic pattern in which a different predicate is invoked with an identical pattern of arguments. Again, with first-order logic there is no straightforward way to construct a predicate that abstracts this pattern. However, the translation into lambda calculus below it shows that an abstraction can be made.

Chapter 3

The Lambda Calculus Setting for Learning

In this chapter I first describe the syntax and semantics of a formulation of simply typed lambda calculus, originally due to Church [1940]. Secondly I define a notion of generality in lambda calculus, and introduce entailment and subsumption orders over lambda terms. Thirdly I introduce a setting for machine learning in which lambda calculus is the knowledge representation language. The language described in this chapter may be regarded as a minimalist, simply typed version of the Haskell programming language [Peyton Jones, 2003; Hudak, 2000]. I have used Haskell as a model both for its syntactic style and for its non-strict semantics. My overview of the denotational semantics of lambda calculus follows the style of presentation in [Peyton Jones, 1987]. For an account covering the more subtle technical details, see [Stoy, 1981] or [Streicher, 2006]. The lambda calculus of this chapter is explicitly typed as in [Huet, 1975]. Before proceeding, the reader may wish to refer to Appendix A for a description of some notational conventions used in this and the following chapters.

3.1 Syntax

I shall use the convention of writing object-level syntax in a typewriter font, and meta-level syntax in an italic roman font.

3.1.1 Types

Definition 3.1 (data type, function type). A type is either:

• a data type consisting of an alpha-numeric character string;

• or a function type of the form (α -> β), where α and β are types.

The operator -> is taken to be right associative, and I shall usually omit parentheses where possible. So, for example, ((X -> X) -> (X -> X)) may be written (X -> X) -> X -> X.

Definition 3.2 (arity, return type). Observe that any type can be written uniquely in the following form, where n ≥ 0 and R is a data type:

α1 -> ... -> αn -> R

Let us call n the arity and R the return type of the type expression.
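Defns. 3.1 and 3.2 transcribe directly into code. In the following Haskell sketch (constructor names are my own), arity and returnType read off the unique decomposition above.

-- A type is a data type or a function type (Defn. 3.1); every type
-- flattens uniquely as α1 -> ... -> αn -> R with R a data type
-- (Defn. 3.2), where n is the arity and R the return type.
data Type = DataT String
          | Fun Type Type
          deriving (Eq, Show)

arity :: Type -> Int
arity (Fun _ b) = 1 + arity b
arity (DataT _) = 0

returnType :: Type -> String
returnType (Fun _ b) = returnType b
returnType (DataT r) = r

For example, the type (X -> X) -> X -> X has arity 2 and return type X.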

3.1.2 Terms

Definition 3.3 (symbol, name, hole, type annotation). A symbol takes the form s:α where s is an alpha-numeric character string and α is a type. s is known as the name of the symbol. A hole takes the form #:α where α is a type. In both cases the :α part is known as the type annotation.1

To avoid clutter, I shall usually omit writing the type annotations on symbols and holes, both when a type is clear from context, and when knowing the type of a symbol or hole is not important for the discussion. For example, symbols such as a1:Bool and cons:(Int -> List -> List) may be written as a1 and cons, and a hole may be written as #.

Definition 3.4 (term, variable, application, lambda abstraction, parameter, body). Terms are defined as follows:

• if x is a symbol or hole whose type is α, then the variable x is a term of type α;

• if M is a term of type α -> β and N is a term of type α, then the application (MN) is a term of type β;

• if x is a symbol whose type is α and M is a term of type β, then the lambda abstraction (\ x -> M) is a term of type α -> β; x is known as the parameter of the lambda abstraction (or lambda parameter), and M the body.

I shall use the following conventions for making terms easier to read:

• The application operator is taken to be left associative, and we will usually omit parentheses where possible. So for example, ((a b) (c d)) may be written a b (c d).

1 Note that the use of the colon symbol here to indicate a type annotation is distinct from my use of it elsewhere to mean the list ‘cons’ operator.

• We may write (\ x1 -> (\ x2 -> ... (\ xn -> M) ... )), where the xi

are symbols and M is a term, as (\ x1 x2 ... xn -> M). So for example, (\ x -> (\ y -> x)) becomes (\ x y -> x).

Definition 3.5 (bound, free). An occurrence of a variable within a term is known as bound if it is inside the body of some lambda abstraction whose parameter is the same as that variable. If an occurrence of a variable is not bound, we say it is free.

For example, in x (\ x -> x (\ y -> x z)), the first occurrence of the variable x is free, whilst the other two are bound. The occurrence of z is free. Note that occurrences of holes in terms are always free, because a hole cannot be the parameter of a lambda abstraction.

Definition 3.6 (trivial, total, partial). A trivial term is a hole. A total term is a term that contains no holes. A partial term is a term that may or may not contain holes (in other words, ‘partial term’ is just a synonym for ‘term’).

Definition 3.7 (size). The size of a term is equal to the total number of applications, lambda abstractions, and non-hole variables that it contains:

size(x) = 1    if x is a symbol
size(x) = 0    if x is a hole

size((MN)) = 1 + size(M) + size(N)

size((\ x -> M)) = 1 + size(M)
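Continuing the sketch of Sect. 3.1.1, Defns. 3.4 and 3.7 become the following Haskell declarations (again with constructor names of my own choosing; Type is the declaration from the earlier sketch):

-- Terms of Defn. 3.4: variables (symbols or holes), applications, and
-- lambda abstractions; symbols and holes carry their type annotation.
data Term = Sym String Type
          | Hole Type
          | App Term Term
          | Lam String Type Term    -- parameter name and type, then body
          deriving (Eq, Show)

-- Defn. 3.7: size counts applications, lambda abstractions, and non-hole
-- variables; holes contribute nothing.
size :: Term -> Int
size (Sym _ _)   = 1
size (Hole _)    = 0
size (App m n)   = 1 + size m + size n
size (Lam _ _ m) = 1 + size m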

Definition 3.8 (place, subterm). A place is a list of natural numbers marking an occurrence of a subterm within a term. The set of places in a term and their corresponding subterms are defined as follows:

• The subterm at place [] in a term T is equal to T.

• The subterm at place 1 : p in a term (MN) is equal to the subterm at place p in M.

• The subterm at place 2 : p in a term (MN) is equal to the subterm at place p in N.

• The subterm at place 1 : p in a term (\ x -> M) is equal to the subterm at place p in M.
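In the same sketch, Defn. 3.8 is a short recursive lookup; an invalid place yields Nothing.

-- Defn. 3.8: a place is a list of 1s and 2s descending from the root;
-- 1 selects the function part of an application or the body of a lambda
-- abstraction, and 2 selects the argument part of an application.
subterm :: Term -> [Int] -> Maybe Term
subterm t           []      = Just t
subterm (App m _)   (1 : p) = subterm m p
subterm (App _ n)   (2 : p) = subterm n p
subterm (Lam _ _ m) (1 : p) = subterm m p
subterm _           _       = Nothing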

Definition 3.9 (in scope). A lambda parameter x is said to be in scope at a place p in a term T if the subterm at place p in T is inside the body of some lambda abstraction whose parameter is x.

Definition 3.10 (ancestor, descendant). Let p, q be places. p is an improper ancestor of q (q is an improper descendant of p) if p is a prefix of q. p is an ancestor of q (q is a descendant of p) if p is an improper ancestor of q and p ≠ q.

Definition 3.11 (place-substitution). A place-substitution on terms, written

{p1/T1, . . . , pn/Tn}, represents a mapping from distinct places pi to terms Ti. If A is a term and σ is a place-substitution, then Aσ is the term obtained by replacing the subterm at place pi in A with Ti, for all pi/Ti ∈ σ. Note that Aσ is defined only if the following conditions hold:

• for all i: pi is a valid place in A, and Ti has the same type as the subterm at place pi in A;

• for all i, j: pi is not an ancestor of pj.

Application of a place-substitution to a term is taken to be left-associative. So for example, (Aσ)τ may be written Aστ, where A is a term and σ, τ are place-substitutions.

Definition 3.12 (size). The size of a place-substitution {p1/T1, . . . , pn/Tn} is equal to the sum of the sizes of the terms Ti.

Proposition 3.13. For terms A, B, C1 ...Cn and places p, q1 . . . qn it holds that:

A{p/B{q1/C1 . . . qn/Cn}} = A{p/B}{(p ++ q1)/C1 ... (p ++ qn)/Cn}

Proof. The proof follows by structural induction on p. We omit the details.

Definition 3.14 (hole-substitution). Let σ = {p1/T1, . . . , pn/Tn} be a place- substitution and let A be a term. We say that σ is a hole-substitution on A if for each i ∈ {1 . . . n} it holds that the subterm at place pi in A is a hole.

3.1.3 Declarations

Definition 3.15 (data type declaration, constructor). A data type declaration takes the form:

data X = C1 α11 ... α1k1

| ...

| Cn αn1 ... αnkn

where X is a data type, Ci are alpha-numeric character strings called constructors, and αij are types.

Definition 3.16 (value declaration). A value declaration takes the form:

x = M

where x is a symbol and M is a term of the same type as x.

3.2 Denotational Semantics

3.2.1 Types

A type denotes a set known as the domain of the type. Elements of the domain of a type are known as values of that type. A partial order ⊑ called the semantic approximation ordering exists over every domain. One element in each domain, written ⊥, represents an ‘undefined’ value.

Definition 3.17 (domains). The domain of a type is defined as follows:

• Given a data type declaration for a data type X, as in Defn. 3.15, the domain of X contains a value ⊥, as well as all possible constructor application values of the form (Ci ai1 ... aiki), where Ci is a constructor of X with argument types αi1 . . . αiki and each aij is a value in the domain of the corresponding αij. For x, y in the domain of X: x ⊑ y if x = ⊥, or if x is of the form (C a1 ... an) and y is of the form (C b1 ... bn) for the same constructor C, such that aj ⊑ bj for each j.

• The domain of a function type α -> β contains monotonic (with respect to ⊑) functions from the domain of α to the domain of β. The ⊥ element of the domain of a function type is the function that maps all its arguments to ⊥. For f, g in the domain of a function type: f ⊑ g if ∀x.f(x) ⊑ g(x).

Definition 3.18 (non-trivial entries). The set of non-trivial entries of a value f of function type α -> β is equal to:

{a ∈ values of type α | f(a) ≠ ⊥ • ⟨a, f(a)⟩}

Definition 3.19 (size). The size of a value of a data type is given by:

size(⊥) = 1
size((C a1 ... an)) = 1 + size(a1) + ... + size(an)

The size of a value of a function type is given by:

size(f) = 1 + size(b1) + ... + size(bn)

where ⟨a1, b1⟩ ... ⟨an, bn⟩ are the non-trivial entries of f.

Definition 3.20 (value-place). A value-place is a list containing natural num- bers and values. It marks an occurrence of a subvalue within a value. The set of value-places in a value and their corresponding subvalues are defined as follows:

• The subvalue at place [] in a value x is equal to x.

• The subvalue at place i : p in a value (C a1 ... an) is equal to the subvalue at place p in ai.

• The subvalue at place a : p in a value f of function type is equal to the subvalue at place p in f(a).

3.2.2 Terms

We define a structure called an environment, whose purpose is to give meaning to the free variables in terms. An environment is analogous to an interpretation in first-order logic.

Definition 3.21 (environment). An environment is a function that takes a symbol and returns a value of the same type as that symbol. Though environments themselves are not values in the domain of any type, it is convenient to have the set of all environments subject to the semantic approximation ordering.

For environments E1,E2:

E1 ⊑ E2 if ∀x.E1(x) ⊑ E2(x)

Definition 3.22 (substitution). A substitution on environments, written

{x1/a1, . . . , xn/an}, represents a mapping from distinct symbols xi to values ai, where each ai has the same type as xi. If E is an environment and θ is a substitution, then Eθ is a new environment whose application to a symbol x is defined as follows:

• if θ contains an element xi/ai such that x = xi, then Eθ · x = ai;

• otherwise, Eθ · x = E · x.

A term denotes, with respect to a given environment, a value of the same type as that term. We shall express this denotation by means of a semantic function called eval (short for ‘evaluate’).

Definition 3.23 (eval). The function eval takes a term and an environment,

and returns a value of the same type as the term:

eval(x, E) = E(x)    where x is a symbol    (3.24)

eval(x, E) = ⊥    where x is a hole    (3.25)

eval((MN), E) = eval(M, E) · eval(N, E)    (3.26)

eval((\ x -> M), E) · a = eval(M, E{x/a})    where a is a value of the same type as x    (3.27)
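The semantic function translates almost verbatim into the running Haskell sketch, if we let Haskell’s own partiality stand in for ⊥ and represent both environments and function values as Haskell functions. This is an operational approximation only; the actual semantics is the domain-theoretic one defined above.

-- Values: constructor applications or functions; environments map
-- symbol names to values.  error "bottom" plays the role of ⊥.
data Value = Con String [Value] | F (Value -> Value)

type Env = String -> Value

evalTerm :: Term -> Env -> Value
evalTerm (Sym s _)   env = env s                                     -- (3.24)
evalTerm (Hole _)    _   = error "bottom"                            -- (3.25)
evalTerm (App m n)   env = apply (evalTerm m env) (evalTerm n env)   -- (3.26)
  where apply (F f) v = f v
        apply _     _ = error "bottom"
evalTerm (Lam x _ m) env = F (\a -> evalTerm m (extend env x a))     -- (3.27)
  where extend e y v z = if z == y then v else e z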

To complete the definition of our lambda calculus language, we define a toplevel environment with respect to a given set of data type and value decla- rations.

Definition 3.28 (fundamental environment). The fundamental environment

E0 is defined as follows.

• For all types α:

E0(void:α) = ⊥
E0(error:α) = ⊥

• For all types α:

E0(fix:((α -> α) -> α)) · f = the least (under ⊑) fixed point of f

For each declaration of a data type X with constructors Ci and constructor argument types αij, as in Defn. 3.15, it holds that:

• For i ∈ {1 . . . n}:

E0(Ci:(αi1 -> ... -> αiki -> X)) · a1 · ... · aki = (Ci a1 ... aki)

where each aj is an arbitrary value of the appropriate type.

• For all types β and for i ∈ {1 . . . n}:

E0((case ++ X):τcase) · ⊥ = ⊥
E0((case ++ X):τcase) · (Ci a1 ... aki) · f1 · ... · fn = fi · a1 · ... · aki

where each aj and fj is an arbitrary value of the appropriate type.

In the above, ++ is a string concatenation operator and τcase is defined as follows:

τcase = X -> (α11 -> ... -> α1k1 -> β)

-> ...

-> (αn1 -> ... -> αnkn -> β)

-> β

Finally, for any symbol x not accounted for above, E0(x) = ⊥.

Definition 3.29 (toplevel environment). Given value declarations xi = Mi for i ∈ {1 . . . n}, the toplevel environment E is defined as the least (under ⊑) fixed point of the following function f:

f(E) = E0{x1/eval(M1,E), ..., xn/eval(Mn,E)}

where E0 is the fundamental environment.
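In the Haskell sketch, laziness computes this fixed point by the usual knot-tying trick: the toplevel environment is defined in terms of itself.

-- Defn. 3.29: extend the fundamental environment e0 with the declared
-- values; each declaration body is evaluated in the very environment
-- being defined, giving the fixed point by lazy evaluation.
toplevel :: Env -> [(String, Term)] -> Env
toplevel e0 decls = env
  where
    env x = case lookup x decls of
              Just m  -> evalTerm m env   -- env occurs in its own definition
              Nothing -> e0 x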

3.3 Generality Orders

In order to specify a setting for learning, we need to identify a suitable formal notion of generality in lambda calculus. What do we mean by ‘generality’? In inductive inference, one expects that if a hypothesis is more general than some training data, then it will:

1. not contradict the training data;

2. make predictions about unseen test data.

In first-order logic, the idea of generality is closely related to the idea of the truth of a formula: X is more general than Y if X is true in a subset of the interpretations (‘possible worlds’) in which Y is true. It is easy to see how this ‘subset of possible worlds’ definition fits with the two desiderata mentioned above, i.e. X will not contradict Y and X will tend to have more predictive power than Y because it narrows down our knowledge about which ‘possible world’ we are in.

On the other hand, in our lambda calculus language there is no concept of the ‘truth’ of a term (except in the special case of a term of boolean type). Instead we have the concept of ‘definedness’ provided by the semantic approximation ordering. However, it just so happens that definedness meets our desiderata for a good definition of generality, i.e. if a value X is more defined than a value Y, then X does not contradict Y, and furthermore X will tend to contain extra information not contained in Y. Therefore, in the context of lambda calculus, it seems to make good sense to identify ‘more general’ with ‘more defined’. Throughout much of this thesis I shall be exploring the consequences of this choice. I shall now define generality orders over lambda calculus terms that are somewhat analogous to the entailment and subsumption orders over clauses commonly used in first-order ILP. Entailment gives us a fundamental definition of what it means for one lambda term to be more general than another. Subsumption is a logically stronger, more computationally tractable approximation to entailment.

Definition 3.30 (λ-entailment). Let A, B be terms. We say that B λ-entails A if for all environments E it holds that:

eval(B, E) ⊒ eval(A, E)

Definition 3.31 (λ-subsumption). Let A, B be terms. We say that B λ-subsumes A if there exists a hole-substitution σ on A such that Aσ = B.

For the rest of this thesis, ‘entails’ shall be taken to mean ‘λ-entails’ and ‘subsumes’ shall be taken to mean ‘λ-subsumes’, unless otherwise specified.
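For intuition, λ-subsumption admits a simple structural test: B subsumes A exactly when B can be obtained by filling A's holes. Below is a minimal Haskell sketch over a hypothetical Term type; the positional hole-substitutions of the formal definition are elided, and terms are assumed alpha-normalised:

-- Hypothetical term representation; '#' holes are represented by Hole.
data Term = Sym String | Hole | App Term Term | Lam String Term
  deriving (Eq, Show)

-- 'subsumes b a' tests whether B λ-subsumes A (Defn. 3.31): wherever A
-- has a hole, B may contain anything; elsewhere the two must agree.
subsumes :: Term -> Term -> Bool
subsumes _         Hole        = True
subsumes (Sym x)   (Sym y)     = x == y
subsumes (App m n) (App p q)   = subsumes m p && subsumes n q
subsumes (Lam x m) (Lam y n)   = x == y && subsumes m n
subsumes _         _           = False

For example, subsumes (App (Sym "Succ") (Sym "Zero")) (App (Sym "Succ") Hole) evaluates to True: the total term is obtained from the partial one by filling the hole.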

Definition 3.32 (properly subsumes, subsume-equivalent). Let A, B be terms. B properly subsumes A if B subsumes A and A does not subsume B. A and B are subsume-equivalent if B subsumes A and A subsumes B.

It is straightforward to see that entailment and subsumption are both quasi-orders. The next theorem shows us that subsumption is in fact a partial order (modulo alpha-conversion).

Lemma 3.33. Let A, B be terms such that B subsumes A. It holds that either B = A or size(B) > size(A).

Proof. The proof follows by Defn. 3.31, Defn. 3.14, and Defn. 3.7.

Theorem 3.34. Terms A, B are subsume-equivalent if and only if they are equal.

Proof. The forward implication follows by Lemma 3.33. Its converse follows by Defn. 3.31.

The fact that subsumption is logically stronger than entailment is shown in the following theorem:

Theorem 3.35. For terms A, B, if B subsumes A then B entails A.

Proof. Assume B subsumes A. By Defn. 3.31, there exists a hole-substitution σ on A such that Aσ = B. Proceed by structural induction on A:

1. A is a symbol. Since A contains no holes, σ is the empty substitution. Hence B = A, from which the proof follows.

2. A is a hole. The proof follows by Eqn. 3.25 and Defn. 3.17.

3. A = (M N). Then σ can be partitioned as follows, where σM, σN are hole-substitutions on M, N respectively:

σ = {p/T / σM • (1 : p)/T } ∪ {p/T / σN • (2 : p)/T }

Observe that B = (MσM NσN). The proof then proceeds as follows, for all E:

  eval(B,E) ⊒ eval(MσM,E) · eval(NσN,E)   (by 3.26)
            ⊒ eval(M,E) · eval(NσN,E)     (by the inductive hypothesis and 3.17)
            ⊒ eval(M,E) · eval(N,E)       (by the inductive hypothesis and 3.17)
            ⊒ eval(A,E)                   (by 3.26)

4. A = (\ x -> M). Then σ can be written as follows where σM is a hole-substitution on M:

σ = {p/T / σM • (1 : p)/T }

Observe that B = (\ x -> MσM). It then holds for all E, a that:

  eval(B,E) · a ⊒ eval(MσM, E{x/a})   (by 3.27)
                ⊒ eval(M, E{x/a})     (by the inductive hypothesis)
                ⊒ eval(A,E) · a       (by 3.27)

Thus, the proof follows by Defn. 3.17.

3.4 Setting for Learning

I now define a setting for learning which will allow us to perform inductive inference using lambda calculus as the knowledge representation language. In this setting, the learner takes the following information as input:

1. A set of data type and value declarations. Together these define a toplevel environment B which we call the background knowledge.

2. A value D called the target datum (analogous to the ‘examples’ in ILP). The type of D is called the target type.

The hypothesis space considered by the learner is the set of total terms whose type is equal to the target type. Learning proceeds by Bayesian inference, with a prior and a likelihood function defined over the hypothesis space. The likelihood of a total term H is defined as:

  P(D|H,B) = 1 if eval(H,B) ⊒ D, and 0 otherwise   (3.36)

The posterior probability of H is given by Bayes' rule:

P(H|D,B) ∝ P(D|H,B)P(H)

where P(H) is the prior probability of H. Note that P(H) is independent of B. I describe a suitable choice of prior in the next chapter. Given the form of the likelihood function above, note that finding H that maximises the posterior probability is equivalent to finding H that maximises the prior probability subject to the constraint that eval(H,B) ⊒ D.
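The 0/1 likelihood makes the (unnormalised) posterior particularly simple to score. Here is a small sketch in which the type parameters, the decidable ordering geq, and the evaluator evalB are hypothetical stand-ins:

-- Unnormalised posterior of Sect. 3.4: the 0/1 likelihood of Eqn. 3.36
-- times the prior. 'geq' stands in for the semantic ordering ⊒ (assumed
-- decidable on the values of interest); 'evalB' is eval fixed at the
-- background knowledge B.
posterior :: (v -> v -> Bool)  -- (⊒)
          -> (t -> v)          -- \h -> eval(h, B)
          -> (t -> Double)     -- prior P(H)
          -> v                 -- target datum D
          -> t -> Double
posterior geq evalB prior d h =
  (if evalB h `geq` d then 1 else 0) * prior h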

3.4.1 Partial terms represent sets of hypotheses

Since a partial term may contain holes, it can be thought of as a ‘partially defined’ term. One can imagine converting a partial term to a total term by specifying what subterms should go in the holes. Since there may be many possible ways to fill in the holes, we can regard a partial term as representing a set of hypotheses, i.e. the set of total terms that can be obtained from it by filling in the holes. We can express this idea using the λ-subsumption relation defined in the previous section:

Definition 3.37 (interpretation of a partial term as a set of total terms). Given a total term T and a partial term H, we say that T ∈ H if T subsumes H.

Since a partial term represents a set of hypotheses, it can be thought of as an event in the sense of probability theory, where each hypothesis is a possible outcome. Thus, the prior probability of a partial term H is equal to:

  P(H) = Σ_{T∈H} P(T)   (3.38)

Furthermore, it follows by elementary probability theory that the likelihood of a partial term H is equal to:

  P(D|H,B) = Σ_{T∈H} P(D|T,B)P(T) / P(H)   (3.39)

Chapter 4

Prior Probability Distributions

It is standard in ILP to use a compression measure (Eqn. 2.1) to choose between competing hypotheses. This compression measure is understood to represent a Bayesian posterior in the following sense: if hypothesis X has greater compression than hypothesis Y, then hypothesis X has greater posterior probability than hypothesis Y. In other words, posterior probability is some monotonically increasing function of compression. However, usually a precise quantitative relationship between compression and posterior probability is not made explicit.

Relying on the compression measure is attractive because compression is simple to define and easy to compute. However, not being able to calculate explicit probabilities of hypotheses has significant drawbacks. It limits one's mechanism of inference to Bayes MAP (Maximum A Posteriori), in which one chooses a single hypothesis that maximises posterior probability, and then makes predictions based on that one hypothesis. One is unable to do Bayes prediction, in which one makes predictions by taking a weighted average, by posterior probability, over all hypotheses. This is a serious deficiency because there exist situations in which Bayes prediction can be expected to significantly outperform Bayes MAP in terms of predictive accuracy (see Fig. 4.1).

I propose that in inductive programming we should use explicit prior and posterior probability distributions rather than relying on a compression measure. The benefits of doing this are twofold. Firstly, users of ILP systems will then have a choice between Bayes MAP and Bayes prediction as an inference mechanism, rather than being restricted to Bayes MAP. Secondly, by making our prior probability distributions explicit we make them easier to customise, and therefore we can specify priors that are more flexible and more appropriate to particular learning scenarios.

[Figure 4.1: a curve of posterior probability against the value of an observable, marking the Bayes MAP estimate and the Bayes prediction estimate.]

Figure 4.1: An example of a scenario where Bayes MAP (Maximum A Posteriori) makes poor predictions compared with Bayes prediction. In this example, the hypothesis space contains two clusters of hypotheses. The Bayes MAP hypothesis is contained in the left-hand cluster (highest point of the curve). However, the right-hand cluster has much larger total probability mass (area under curve) compared with the left-hand cluster. Therefore hypotheses from the right-hand cluster will make predictions that are more representative of the distribution as a whole. The problem with Bayes MAP is that it is unable to take into account situations like this where a large number of lower-probability hypotheses outweigh one higher-probability hypothesis.
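The contrast in Fig. 4.1 is easy to make concrete. In the toy sketch below (hypothetical, over an explicitly enumerated finite hypothesis set), the MAP estimate commits to one hypothesis while Bayes prediction averages an observable over all of them:

import Data.List (maximumBy)
import Data.Ord (comparing)

-- Bayes MAP: keep only the single highest-posterior hypothesis.
bayesMAP :: [(h, Double)] -> h          -- (hypothesis, posterior mass)
bayesMAP = fst . maximumBy (comparing snd)

-- Bayes prediction: posterior-weighted average of an observable.
bayesPredict :: (h -> Double) -> [(h, Double)] -> Double
bayesPredict f hs = sum [p * f h | (h, p) <- hs] / sum (map snd hs)

When one cluster of hypotheses has small individual masses but large total mass, bayesPredict tracks that cluster while bayesMAP does not.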

With this in mind, in this chapter I define three successively more flexible prior probability distributions over lambda terms. These priors are designed for use within the setting for learning introduced in Chap. 3. Each of the three priors builds on the last, and as we shall see, the second and third add features specifically designed to allow cumulative learning.

4.1 The Base Prior

In this section I specify a prior suitable for base inductive inference, i.e. without cumulative learning. I designed this prior in order to meet the following five criteria:

1. Occam’s Razor. The probability of a hypothesis should tend to decrease exponentially with its syntactic size. This is the same design principle that underlies the standard ILP compression measure. There are information-theoretic reasons why this is a good design principle [Muggleton and De Raedt, 1994].

2. Minimisation of redundancy. By restricting the terms with non-zero probability under the prior to a form known as long normal form, a large class of redundant hypotheses is eliminated. We can think of this as a lambda calculus equivalent of the restriction to definite clause form typically used in first-order ILP.

3. Static typing taken into account. The structure of the prior fits closely with the structure of the hypothesis space as imposed by static typing. This is a significant advantage because it gives one the ability to independently control the form of the prior probability distribution in different subspaces of the hypothesis space merely by introducing appropriate type constraints. This level of control is not possible with the standard ILP compression measure, which ignores type constraints.

4. Customisable weights on background knowledge. Each element of background knowledge is assigned a customisable primitive probability value a priori. This can be used to reflect the fact that certain elements of background knowledge are more useful than others in a given learning scenario. It is an important source of flexibility which helps us to ensure that our prior accurately reflects our assumptions. This is in contrast with the standard ILP compression measure, which enforces a prior assumption that all elements of background knowledge are equally likely.

5. Extensibility to non-parametric form. In the next section we extend this prior to a non-parametric form, in which the weights on elements of background knowledge are inferred automatically as part of the learning process, rather than provided in advance. A number of aspects of this base prior are designed so as to easily facilitate this extension.

Throughout this chapter I shall use a standard convention of Barendregt [Barendregt, 1984] that lambda terms are taken to be syntactically equal if they are identical up to the renaming of bound variables. So, for example, the terms (\ x -> x) and (\ y -> y) are taken to be equal provided that x and y have the same type.

The first step in defining this base prior is to introduce the following constraint on lambda terms known as long normal form [Dowek, 2001]:

Definition 4.1. A term, of type α1 -> ... -> αm -> R where R is a data type, is in long normal form if it is of the form:

  (\ x1 . . . xm -> f T1 ... Tn)

where:

• f is a variable whose type has arity n;

• each Tj is a term in long normal form;

• if f is a hole then n = 0.

It is known [Dowek, 2001, Pg. 1022–1023] that for any lambda term T, there always exists some lambda term T′ in long normal form such that T and T′ are semantically equivalent, i.e. that eval(T,E) = eval(T′,E) for all environments E. Therefore, by excluding all terms from our hypothesis space except those in long normal form, we eliminate redundancy without constraining the set of possible values that can be represented by hypotheses. Katayama [2007] was the first to make use of long normal form in this way to eliminate redundancy in the context of inductive programming. I shall make use of the same technique here.

Now I shall give a collection of definitions leading up to that of the prior itself. In order to calculate the prior of a term, we determine probability values called weights for all of the variables it contains (Defn. 4.18). The final prior probability is then simply the product of these weights (Defn. 4.23). These weight values come from an object called a pool (Defn. 4.9), which stores the types and weights of all variables in scope at some place within a term. A pool consists of multiple frames (Defn. 4.7), reflecting the multiple levels of nesting at which variables may be bound. Finally, even and odd weighted types (Defn. 4.3 and Defn. 4.4) reflect the fact that weights must be associated not just with variables that are in scope, but also with any higher-order elements of the types of those variables: this is necessary because when any variable gets invoked as a higher-order function it introduces new variables, and those new variables need weight values too.

Definition 4.2 (weight). A weight is a probability value.

The following two definitions are mutually recursive:

Definition 4.3 (even weighted type, even weight annotation, primary weight). An even weighted type takes the form:

  α1 -> ... -> αm -> R^w

where m ≥ 0, each αi is an odd weighted type, R is a data type, and w is a weight. We call w an even weight annotation. We also say that w is the primary weight of this even weighted type.

Definition 4.4 (odd weighted type, odd weight annotation). An odd weighted type takes the form α1 -> ... -> αm -> R^X where m ≥ 0, each αi is an even weighted type, R is a data type, and X is a set of pairs of the form {⟨S1, w1⟩ ... ⟨Sn, wn⟩} where the Sj are distinct data types and the wj are weights. We call X an odd weight annotation.

Definition 4.5 (stripWeights). For an even or odd weighted type τ, we define stripWeights(τ) to be the type obtained by stripping τ of all weight annotations.

Definition 4.6. The function zeroWeightsOdd takes a type τ and returns the odd weighted type that is uniquely defined by the following constraints:

• stripWeights(zeroWeightsOdd(τ)) = τ;

• each even weight annotation within zeroWeightsOdd(τ) is equal to zero, and each odd weight annotation is equal to the empty set.

Definition 4.7 (frame). A frame takes the form {⟨x1, α1⟩, ..., ⟨xm, αm⟩}^X where the xi are distinct symbols, each αi is an even weighted type such that stripWeights(αi) is equal to the type of xi, and X is an odd weight annotation as in Defn. 4.4.

Definition 4.8 (frameSymbols). The function frameSymbols takes a frame and returns the set of symbols in that frame:

  frameSymbols(F^X) = {⟨x, α⟩ / F • x}

Definition 4.9 (pool). A pool is a list of frames [F1 ... Fn] such that the following two conditions hold:

• the sets frameSymbols(Fi) are all disjoint from one another;

• none of the sets frameSymbols(Fi) contain the symbol void:α for any type α.

Definition 4.10 (dropThruWeights). The function dropThruWeights takes a data type and an odd weight annotation, and returns a list of weights:

  dropThruWeights(R, X) = [w] if ⟨R, w⟩ ∈ X for some weight w, and [] otherwise

Definition 4.11 (lookupArgs, lookupWeights). The function lookupArgs takes a symbol x and a pool Γ, and returns a list of odd weighted types. The function lookupWeights again takes a symbol x and a pool Γ, and returns a list of weights. Let α1 -> ... -> αm -> R be the type of x where R is a data type. The definitions of lookupArgs and lookupWeights consist of three cases:

1. Γ = []. For each i ∈ {1 . . . m} let α′i = zeroWeightsOdd(αi). It holds that:

  lookupArgs(x, Γ) = [α′1 . . . α′m]   (4.12)
  lookupWeights(x, Γ) = [] if m = 0 and x = void, [0] otherwise   (4.13)

2. Γ is of the form F^X : Γ′ such that ⟨x, τ′⟩ ∈ F for some even weighted type τ′. We can write τ′ in the following form:

  α′1 -> ... -> α′m -> R^w

It then holds that:

  lookupArgs(x, Γ) = [α′1, . . . , α′m]   (4.14)
  lookupWeights(x, Γ) = [w]   (4.15)

3. Γ is of the form F^X : Γ′ and ⟨x, τ′⟩ ∉ F for any even weighted type τ′:

  lookupArgs(x, Γ) = lookupArgs(x, Γ′)   (4.16)
  lookupWeights(x, Γ) = dropThruWeights(R, X) ++ lookupWeights(x, Γ′)   (4.17)
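The drop-through behaviour of Defn. 4.11 can be pictured with a simplified pool. The following sketch uses a hypothetical, flattened frame representation; the arity bookkeeping of Eqn. 4.13 and the weighted-type structure are elided:

type Weight   = Double
type DataType = String

-- A frame pairs in-scope symbols with their primary weights, and keeps
-- one drop-through weight per data type (the odd weight annotation X).
data Frame = Frame { members  :: [(String, Weight)]
                   , dropThru :: [(DataType, Weight)] }

dropThruWeights :: DataType -> [(DataType, Weight)] -> [Weight]
dropThruWeights r x = maybe [] (: []) (lookup r x)        -- cf. Defn. 4.10

lookupWeights :: String -> DataType -> [Frame] -> [Weight]
lookupWeights x _ [] = if x == "void" then [] else [0]    -- cf. Eqn. 4.13
lookupWeights x r (f : rest) =
  case lookup x (members f) of
    Just w  -> [w]                                        -- cf. Eqn. 4.15
    Nothing -> dropThruWeights r (dropThru f)
                 ++ lookupWeights x r rest                -- cf. Eqn. 4.17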

Definition 4.18 (priorWeights). The function priorWeights takes a term T in long normal form, an odd weighted type τ such that stripWeights(τ) is equal to the type of T , and a pool Γ. priorWeights returns a list of weights. Since T is in long normal form it can be written uniquely as follows, where f is a variable:

T = (\ x1 . . . xm -> f Y1 ...Yn)

We can also write τ as follows:

  τ = α1 -> ... -> αm -> R^X

The definition of priorWeights then consists of two cases:

1. f is a hole. Then, priorWeights(T, τ, Γ) = [].

2. f is a symbol. Let Γ′ = {⟨x1, α1⟩ ... ⟨xm, αm⟩}^X : Γ. Let [β1 . . . βn] = lookupArgs(f, Γ′). Then, priorWeights is given by:

  priorWeights(T, τ, Γ) = lookupWeights(f, Γ′)
                          ++ priorWeights(Y1, β1, Γ′)
                          ++ ...
                          ++ priorWeights(Yn, βn, Γ′)
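Stripped of the weighted-type bookkeeping, Defns. 4.18 and 4.23 say that the prior of a long normal term is the product of one weight list per head-variable occurrence. A simplified sketch, with a hypothetical LnfTerm type and with the pool extension Γ′ at binders folded into the supplied lookup function:

-- \ x1..xm -> f Y1..Yn; the head is Nothing when f is a hole.
data LnfTerm = Lnf [String] (Maybe String) [LnfTerm]

priorWeights :: (String -> [Double]) -> LnfTerm -> [Double]
priorWeights _       (Lnf _ Nothing  _)    = []   -- hole: contributes nothing
priorWeights lookupW (Lnf _ (Just f) args) =
  lookupW f ++ concatMap (priorWeights lookupW) args

-- Defn. 4.23: the prior probability is the product of all the weights.
prior :: (String -> [Double]) -> LnfTerm -> Double
prior lookupW = product . priorWeights lookupW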

Definition 4.19 (weightSum). The function weightSum takes a data type and a list of even weighted types, and returns a probability value:

  weightSum(R, [α1 . . . αm]) = Σ_{i=1}^{m} (primary weight of αi, if αi has return type R; 0 otherwise)

Definition 4.20 (normalised). We say that an even weighted type α1 -> ... -> αm -> R^w is normalised if all the αi are normalised. We say that an odd weighted type α1 -> ... -> αm -> R^X is normalised if all the αi are normalised and it holds for all data types S that:

  weightSum(S, [α1 . . . αm]) + product(dropThruWeights(S, X)) = 1

Likewise, we say that a frame {⟨x1, α1⟩ ... ⟨xm, αm⟩}^X is normalised if all the αi are normalised and it holds for all data types S that:

  weightSum(S, [α1 . . . αm]) + product(dropThruWeights(S, X)) = 1

We say that a pool [F1 ...Fn] is normalised if all the Fi are normalised.

Lemma 4.21. Let Γ = [F1 ... Fn] be a pool, and let x be a symbol such that x ∉ frameSymbols(Fi) for all i ∈ {1 . . . n} and x ≠ void. It follows that:

  product(lookupWeights(x, Γ)) = 0

Proof. By induction on n:

1. n = 0. The proof follows by Eqn. 4.13.

2. n ≥ 1. The proof follows by Eqn. 4.17 and the inductive hypothesis.

Lemma 4.22. Let R be a data type, let X be the set of all symbols with return type R, and let Γ be a normalised pool. It follows that:

  Σ_{x∈X} product(lookupWeights(x, Γ)) = 1

Proof. By structural induction on Γ:

1. Γ = []. The proof follows by Eqn. 4.13.

2. Γ = F : Γ′ for some frame F = {⟨y1, β1⟩ ... ⟨yn, βn⟩}^W. Let Y = {y1 . . . yn}, then proceed:

  LHS = Σ_{x∈X∩Y} product(lookupWeights(x, Γ)) + Σ_{x∈X\Y} product(lookupWeights(x, Γ))
      = weightSum(R, [β1 . . . βn]) + Σ_{x∈X\Y} product(lookupWeights(x, Γ))
        (by 4.15 and 4.19)
      = weightSum(R, [β1 . . . βn]) + (1 − weightSum(R, [β1 . . . βn])) × Σ_{x∈X\Y} product(lookupWeights(x, Γ′))
        (by 4.17 and 4.20)
      = RHS   (by 4.9, 4.21, and the inductive hypothesis)

Definition 4.23 (prior). The function prior takes a term T in long normal form, a normalised odd weighted type τ such that stripWeights(τ) is equal to the type of T , and a normalised pool Γ. It returns a probability value:

prior(T, τ, Γ) = product(priorWeights(T, τ, Γ))

Working within the setting for learning of Sect. 3.4, the base prior is a probability distribution over the set of all long normal terms of some target type. It is parametrised by a normalised pool Γ and a normalised odd weighted type τ such that stripWeights(τ) is equal to the target type. Note that the pool Γ is analogous to 'modeb' declarations in ILP. Assuming some fixed values for Γ and τ, the prior probability of a long normal term H is given by:

P (H) = prior(H, τ, Γ)

Having defined the form of our prior, we must check that it represents a valid probability distribution. There are two questions to ask.

stochastic function priorProc(τ, Γ): input: τ, a normalised odd weighted type; Γ, a normalised pool. output: a long normal, total term of type stripWeights(τ).

1. Let x1 . . . xm be fresh symbols where m is the arity of τ.

2. Define R and Γ′ as in Defn. 4.18.

3. Pick any symbol f of return type R, with probability product(lookupWeights(f, Γ′)).

4. Define β1 . . . βn as in Defn. 4.18.

5. For i ∈ {1 . . . n}, pick Yi by calling priorProc(βi, Γ′).

6. Return (\ x1 . . . xm -> f Y1 ...Yn).

Figure 4.2: Algorithm for sampling from the base prior. Note that the validity of line 3 is guaranteed by Lemma 4.22. Furthermore, at line 5, the βi are guaranteed to be normalised by Defn. 4.11.

Firstly, is the prior normalised, i.e. do probabilities sum to one over all long normal, total terms of the target type? Secondly, are partial terms treated appropriately, i.e. does our prior obey Eqn. 3.38? To help us answer these two questions, we need to take an alternative viewpoint of the prior over total terms as a stochastic process. This is given in Fig. 4.2. The following theorem shows that these 'distribution' and 'process' views of the prior are equivalent:
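Reading Fig. 4.2 as code: pick a head symbol of the required return type with probability equal to its weight product, then recurse on the arguments. A sketch reusing the hypothetical LnfTerm type above; the 'candidates' parameter abstracts the pool lookups of lines 2–4, and binders are elided:

import System.Random (randomRIO)

-- Draw from a finite categorical distribution whose weights sum to 1.
categorical :: [(a, Double)] -> IO a
categorical xs = do
  r <- randomRIO (0, 1)
  let go _   [(a, _)]             = pure a
      go acc ((a, w) : rest)
        | acc + w >= r            = pure a
        | otherwise               = go (acc + w) rest
      go _   []                   = error "empty distribution"
  go 0 xs

-- 'candidates ty' yields each head symbol of return type ty together
-- with its weight product and its argument types.
priorProc :: (ty -> [(String, Double, [ty])]) -> ty -> IO LnfTerm
priorProc candidates ty = do
  (f, argTys) <- categorical [((g, ts), w) | (g, w, ts) <- candidates ty]
  args        <- mapM (priorProc candidates) argTys
  pure (Lnf [] (Just f) args)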

Theorem 4.24 (equivalence of distribution and process views for prior). Let T be a total term in long normal form, let τ be a normalised odd weighted type such that stripWeights(τ) is equal to the type of T , and let Γ be a normalised pool. It holds that:

P (priorProc(τ, Γ) returns T ) = prior(T, τ, Γ)

Proof. The proof follows straightforwardly by structural induction on T , using the definition of priorProc (Fig. 4.2) and Defn. 4.18.

The next pair of theorems confirm that our prior is indeed normalised, and that it respects our identification of partial terms with sets of total terms (Eqn. 3.38). Note the condition that priorProc must terminate with probability one, which is not guaranteed for arbitrary Γ; this is a significant caveat which I shall discuss shortly.

Theorem 4.25 (prior is normalised). Let τ be a normalised odd weighted type, let Γ be a normalised pool, and let S be the set of all long normal, total terms

of type stripWeights(τ). Given the precondition that priorProc(τ, Γ) terminates with probability one, it holds that:

  Σ_{T∈S} prior(T, τ, Γ) = 1

Proof.

  LHS = Σ_{T∈S} P(priorProc(τ, Γ) returns T)   (by 4.24)
      = RHS   (by the precondition)

Theorem 4.26 (prior respects partial terms). Let T be a partial term in long normal form, let τ be a normalised odd weighted type such that stripWeights(τ) is equal to the type of T , and let Γ be a normalised pool. Given the precondition that priorProc(τ, Γ) terminates with probability one, it holds that:

  prior(T, τ, Γ) = Σ_{X∈T} prior(X, τ, Γ)

Proof. Define x1 . . . xm, f, and Y1 ...Yn as in Defn. 4.18. Proceed by structural induction on T :

1. f is a hole:

LHS = 1 (by 4.23 and 4.18)

= RHS (by 4.25 and the precondition)

2. f is a symbol. Define Γ′ and β1 . . . βn as in Defn. 4.18:

  LHS = product(lookupWeights(f, Γ′)) × Π_{i=1}^{n} prior(Yi, βi, Γ′)
        (by 4.23 and 4.18)
      = product(lookupWeights(f, Γ′)) × Π_{i=1}^{n} Σ_{Zi∈Yi} prior(Zi, βi, Γ′)
        (by the inductive hypothesis and the precondition)
      = Σ_{Z1∈Y1} ... Σ_{Zn∈Yn} prior((\ x1 . . . xm -> f Z1 ... Zn), τ, Γ)
        (by 4.23 and 4.18)
      = RHS   (by 3.31)

4.1.1 Some examples

Here are some simple examples to illustrate the form of the base prior probability distribution. We shall need the following data type declaration:

data Nat = Zero | Succ Nat

In our first example, consider the following normalised odd weighted type τ and normalised pool Γ:

τ = Nat^{}

Γ = [{⟨zero, Nat^α⟩, ⟨succ, Nat^{} -> Nat^(1−α)⟩}^{{⟨Nat, 0⟩}}]

The following table shows terms T in descending order of probability under prior(T, τ, Γ):

Term                        Probability
Zero                        α
Succ Zero                   α(1 − α)
Succ (Succ Zero)            α(1 − α)²
Succ (Succ (Succ Zero))     α(1 − α)³
etc. . .

Here is another example, this time with the higher order function fix in the pool:

τ = Nat^{}

Γ = [{⟨fix, (Nat^α -> Nat^{{⟨Nat, 1−α⟩}}) -> Nat^1⟩}^{{⟨Nat, 0⟩}}]

Observe how when a higher order function is invoked, it introduces new local variables which affect the form of the probability distribution:

Term                                               Probability
fix (\ a -> a)                                     α
fix (\ a -> fix (\ b -> b))                        α(1 − α)
fix (\ a -> fix (\ b -> a))                        α(1 − α)²
fix (\ a -> fix (\ b -> fix (\ c -> c)))           α(1 − α)³
fix (\ a -> fix (\ b -> fix (\ c -> b)))           α(1 − α)⁴
fix (\ a -> fix (\ b -> fix (\ c -> a)))           α(1 − α)⁵
etc. . .

Finally, let us discuss the caveat mentioned earlier. Under what sort of circumstances might priorProc fail to terminate? Here is such an example:

τ = Nat^{}

Γ = [{⟨succ, Nat^{} -> Nat^1⟩}^{{⟨Nat, 0⟩}}]

In this example, there is no variable of arity zero that may terminate the growth of the term, so priorProc terminates with probability zero. It appears to be easy enough to avoid this kind of pathological situation simply by adding appropriate arity zero elements to the pool. However, it is clear that non-termination is

an issue to watch out for when using this prior. A theoretical question for future work might be: what conditions can be placed on τ and Γ to guarantee termination of priorProc?

4.2 The Nonparametric Prior

The base prior of the previous section features weights attached to elements of background knowledge that allow a degree of customisation. By modifying the weights we can change the inductive bias, expressing an assumption that particular elements of background knowledge are more likely to be useful than others. However, it is not ideal to have to rely on a human to choose these weight values. It would be preferable if the machine learning system were able to infer the best weight values for a particular problem domain automatically. This amounts to a form of cumulative learning similar to the 'adaptive Levin search' of Schmidhuber et al. [1997].

In this section we shall extend the base prior so as to enable automatic learning of weights. To do this, we follow the example of nonparametric Bayesian methods [Gershman and Blei, 2012]. The basic idea behind nonparametric Bayesian methods is as follows. Consider a Bayesian model such as a Gaussian mixture model. The mixture model is parametrised by an integer number of clusters k. How does one go about choosing the most suitable value of k for a particular inference problem? One solution might be to repeatedly fit the mixture model to our data, each time using a different value of k. Then at the end, we could choose whichever value of k produced the best fit to the data according to some goodness-of-fit metric (which might be predictive accuracy on a validation dataset). On the other hand, the nonparametric Bayesian approach is much more elegant. It tells us to design our model so that uncertainty in the value of k is built in, i.e. the model is flexible enough to consider all possibilities for k simultaneously. The nonparametric version of a mixture model is called a Chinese restaurant process model. By analogy, I shall construct a nonparametric version of last section's base prior in which uncertainty in the weight values is built in.

First, some preliminaries. I shall be making use of a standard probability distribution called the Dirichlet distribution [Frigyik et al., 2010]. The Dirichlet distribution is a probability distribution over categorical probability distributions:

Definition 4.27 (Dirichlet distribution). An n-component Dirichlet distribution is a probability distribution whose domain is equal to the following:

  {[w1 . . . wn] / lists of n weights ¦ Σ_{i=1}^{n} wi = 1}

where n ≥ 1. The distribution is parametrised by n positive real numbers denoted below as a1 . . . an. Its probability density function dirichletDensity is as follows:

  dirichletDensity([w1 . . . wn], [a1 . . . an]) = G(A) × Π_{i=1}^{n} ( wi^(ai−1) / G(ai) )

where A = Σ_{i=1}^{n} ai and G is the gamma function.

Note that dirichletDensity normalises to 1 when integrated over its (n − 1)-dimensional domain.

We see that when one samples from a Dirichlet distribution, the result is an n-component categorical distribution. Suppose that, in turn, one were to sample repeatedly from this categorical distribution and observe a sequence of outcomes. Let xi be the number of times that an outcome of the ith category occurs in the sequence. The following identity then gives us the probability of observing such a sequence, marginalised over the (un-observed) weight values of the categorical distribution:

Proposition 4.28. Let a1 . . . an be positive real numbers and let x1 . . . xn be non-negative integers, with n ≥ 1. The following then holds, where the integral is over the domain of the n-component Dirichlet distribution:

  ∫ (Π_{i=1}^{n} wi^xi) dirichletDensity([w1 . . . wn], [a1 . . . an]) dw1 . . . dwn−1
    = (G(A) / G(A + X)) × Π_{i=1}^{n} (G(ai + xi) / G(ai))

where A = Σ_{i=1}^{n} ai, X = Σ_{i=1}^{n} xi, and G is the gamma function.

Proof. See [Frigyik et al., 2010, Sect. 1.4].
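Proposition 4.28 is the standard Dirichlet–multinomial marginal, and in practice it is best computed in log space. A sketch using logGamma from the math-functions package (an assumption about tooling, not the thesis implementation):

import Numeric.SpecFunctions (logGamma)  -- math-functions package

-- log of the marginal in Proposition 4.28, for Dirichlet parameters
-- a1..an and occurrence counts x1..xn.
logMarginal :: [Double] -> [Int] -> Double
logMarginal as xs =
  logGamma a - logGamma (a + x)
    + sum [ logGamma (ai + fromIntegral xi) - logGamma ai
          | (ai, xi) <- zip as xs ]
  where
    a = sum as
    x = fromIntegral (sum xs)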

I shall now describe the nonparametric extension of last section's base prior. Just as for the base prior, there is a 'process' view and a 'distribution' view of the nonparametric prior. The most intuitive way to understand this prior is to start with the process view, so let us discuss that first. Figure 4.5 gives the definition of the non-parametric prior in the form of a stochastic process npPriorProc. Observe that the first action of npPriorProc is to generate some weight values using stochastic functions pickWeightsOdd and pickPoolWeights. Having chosen these weights, npPriorProc then delegates to the base prior process in order to generate a term. pickWeightsOdd (Fig. 4.3)

and pickPoolWeights (Fig. 4.4) both work by randomly sampling weight values from Dirichlet distributions. The Dirichlet distributions depend on two hyperparameters (Defn. 4.29).

Definition 4.29 (npPriorHyperParam1, npPriorHyperParam2). The non-parametric prior has two positive, real-valued hyperparameters: npPriorHyperParam1 and npPriorHyperParam2. Both are concentration parameters for Dirichlet distributions (see Fig. 4.3 and Fig. 4.4).

We have seen that the process view gives us a simple way to understand what the nonparametric prior means: it is essentially the same as the base prior, except that weight values are generated by randomly sampling from Dirichlet distributions rather than being pre-specified. On the other hand, the process view alone does not give us an effective procedure for computing the prior probability of a given term. For this we need to convert to a distribution view, i.e. we need a probability mass function for our prior.

How can we perform this conversion from process to distribution view? Recall that npPriorProc works by first sampling the weight values from Dirichlet distributions, then it delegates to the base prior to sample the lambda term itself. Since we know the probability density function for a Dirichlet distribution (Defn. 4.27), and we know the probability mass function for the base prior (Defn. 4.23), we can write down the joint probability density of a given configuration of weights and a given lambda term. In order to recover the probability mass function for a lambda term we simply need to marginalise this joint distribution over all possible weight values. This yields our distribution view of the nonparametric prior, npPrior, given in Fig. 4.7.

npPrior uses the auxiliary functions pickWeightIndicesOdd and pickPoolWeightIndices (Fig. 4.6) to generate an indirected odd weighted type and an indirected pool (Defn. 4.31). These indirected structures are a way of keeping track of where weights should go without actually generating the weight values themselves. At line 4 of Fig. 4.7, npPrior delegates to priorWeights in order to determine the number of times each weight gets used in calculating the probability of the term. The final expression for the probability of the term at line 10 of Fig. 4.7 comes from marginalisation over the weights by means of Proposition 4.28. For the proof of equivalence between the process and distributional views of the nonparametric prior see Theorem 4.34 below.

Definition 4.30 (weight index). A weight index is either 0, or it is a pair of the form hu, ci where u and c are positive integers. We call u the urn index and c the colour.

Definition 4.31 (indirected weighted type, indirected frame, indirected pool). An indirected even/odd weighted type is the same as an even/odd weighted type

stochastic function pickWeightsOdd(τ): input: τ, a type. output: a normalised odd weighted type.

1. Let α1 -> ... -> αm -> R = τ, where R is a data type.

2. Partition {1 . . . m} into non-empty disjoint subsets, each of the form {i / {1 . . . m} ¦ the return type of αi is equal to S} for some data type S. Let M be the number of such subsets, and for each kth subset {ik1 . . . iknk} corresponding to data type Sk:

3. Let γ1 . . . γnk all be equal to npPriorHyperParam1. Let γnk+1 be equal to npPriorHyperParam2.

4. Pick weights [wk1 . . . wk(nk+1)] by sampling from an (nk + 1)-component Dirichlet distribution with parameter vector [γ1 . . . γnk+1].

5. For j ∈ {1 . . . nk}, pick α′ikj by calling pickWeightsEven(αikj, wkj).

6. Let X, an odd weight annotation, equal {k / {1 . . . M} • ⟨Sk, wk(nk+1)⟩}.

7. Return α′1 -> ... -> α′m -> R^X.

stochastic function pickWeightsEven(τ, w): input: τ, a type; w, a weight.

output: a normalised even weighted type.

1. Let α1 -> ... -> αm -> R = τ, where R is a data type.

2. For i ∈ {1 . . . m}, pick α′i by calling pickWeightsOdd(αi).

3. Return α′1 -> ... -> α′m -> R^w.

Figure 4.3: pickWeightsOdd is a stochastic process for assigning weight annotations to a type. Weights are obtained by sampling from Dirichlet distributions. pickWeightsEven is an auxiliary function. The weighted types output by these functions are guaranteed to be normalised, a fact which follows straightforwardly by structural induction on τ using Defn. 4.20.

stochastic function pickPoolWeights(Γ): input: Γ, a set of symbols not containing void. output: a normalised pool.

1. Let {x1 . . . xm} = Γ, and let αi be the type of xi for each i ∈ {1 . . . m}.

2. Partition {1 . . . m} into non-empty disjoint subsets, each of the form {i / {1 . . . m} ¦ the return type of αi is equal to S} for some data type S. Let M be the number of such subsets, and for each kth subset {ik1 . . . iknk} corresponding to data type Sk:

3. Let γ1 . . . γnk all be equal to npPriorHyperParam1.

4. Pick weights [wk1 . . . wknk] by sampling from an nk-component Dirichlet distribution with parameter vector [γ1 . . . γnk].

5. For j ∈ {1 . . . nk}, pick α′ikj by calling pickWeightsEven(αikj, wkj).

6. Let X, an odd weight annotation, equal {k / {1 . . . M} • ⟨Sk, 0⟩}.

7. Return [{⟨x1, α′1⟩ ... ⟨xm, α′m⟩}^X].

Figure 4.4: Stochastic process for assigning weight annotations to a set of symbols so as to create a pool consisting of a single frame. By Defn. 4.20, this pool is guaranteed to be normalised.

stochastic function npPriorProc(τ, Γ): input: τ, a type; Γ, a set of symbols not containing void. output: a long normal, total term of type τ.

1. Pick τ′ by calling pickWeightsOdd(τ).

2. Pick Γ′ by calling pickPoolWeights(Γ).

3. Return priorProc(τ′, Γ′).

Figure 4.5: Algorithm for sampling from the non-parametric prior.
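The weight-sampling steps in Figs. 4.3–4.5 need draws from a Dirichlet distribution. A standard way to obtain one, sketched here with the mwc-random package (an assumed dependency, not the thesis code), is to normalise independent Gamma variates:

import System.Random.MWC (createSystemRandom)
import System.Random.MWC.Distributions (gamma)

-- Sample [w1..wn] ~ Dirichlet(a1..an): draw gi ~ Gamma(ai, 1) and
-- normalise; used wherever pickWeightsOdd/pickPoolWeights pick weights.
dirichlet :: [Double] -> IO [Double]
dirichlet as = do
  gen <- createSystemRandom
  gs  <- mapM (\a -> gamma a 1 gen) as
  let s = sum gs
  pure (map (/ s) gs)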

except that weight indices are used in all contexts instead of weights, i.e. as even weight annotations and within odd weight annotations. Indirected frame and indirected pool are defined in the same manner. The functions stripWeights, zeroWeightsOdd, frameSymbols, dropThruWeights, lookupArgs, lookupWeights, and priorWeights are extended to apply in the context of indirected weighted types, indirected frames, and indirected pools. Again, this is achieved by dealing with weight indices wherever weights were used before.

Definition 4.32 (urn). An urn is a list of real numbers.

Lemma 4.33. Let T be a partial term in long normal form, let τ be the type of T , and let Γ be a set of symbols not containing void. Given the precondition that npPriorProc(τ, Γ) terminates with probability one, it holds that:

  Σ_{Y∈T} P(npPriorProc(τ, Γ) returns Y) = npPrior(T, Γ)

Proof. Define m and W as in Fig. 4.7, and for each i ∈ {1 . . . m} define ai1 . . . aini, Ai, xi1 . . . xini, and Xi as in Fig. 4.7. If no element of W is equal to 0, then proceed as follows. Note that we use the somewhat informal notation ∫ dτ′dΓ′ to express a multi-dimensional integral over the weights within a weighted type

stateful function pickWeightIndicesOdd(τ): input: τ, a type.

output: an indirected odd weighted type. state: U, a list of urns. The implementation of this function is identical to that of pickWeightsOdd, except that lines 4 and 5 are different as follows:

4. U ← U ++ [[γ1 . . . γnk+1]]. For j ∈ {1 . . . nk+1}, let wkj = ⟨u, j⟩ where u is the new length of U.

5. For j ∈ {1 . . . nk}, call pickWeightIndicesEven(αikj, wkj) and let α′ikj be the result.

stateful function pickWeightIndicesEven(τ, w): input: τ, a type; w, a weight index.

output: an indirected even weighted type. state: U, a list of urns. The implementation of this function is identical to that of pickWeightsEven, except that line 2 is different as follows:

2. For i ∈ {1 . . . m}, call pickWeightIndicesOdd(αi) and let α′i be the result.

stateful function pickPoolWeightIndices(Γ): input: Γ, a set of symbols not containing void. output: an indirected pool.

state: U, a list of urns. The implementation of this function is identical to that of pickPoolWeights, except that lines 4 and 5 are different as follows:

4. U ← U ++ [[γ1 . . . γnk]]. For j ∈ {1 . . . nk}, let wkj = ⟨u, j⟩ where u is the new length of U.

5. For j ∈ {1 . . . nk}, call pickWeightIndicesEven(αikj, wkj) and let α′ikj be the result.

Figure 4.6: Procedures for assigning weight indices to types and pools. These weight indices refer to urns and colours within an accumulated list of urns U.

function npPrior(T, Γ): input: T, a term in long normal form; Γ, a set of symbols not containing void. output: a probability value.

1. Let τ be the type of T .

2. Execute pickWeightIndicesOdd(τ) with initial state []. Let τ′ be the output and let S be the final state.

3. Execute pickPoolWeightIndices(Γ) with initial state S. Let Γ′ be the output and let [u1 . . . um] be the final state.

4. Let W, a list of weight indices, equal priorWeights(T, τ′, Γ′).

5. For each i ∈ {1 . . . m}:

6. Let [ai1 . . . aini] = ui.

7. Let Ai = Σ_{j=1}^{ni} aij.

8. For each j ∈ {1 . . . ni}, let xij equal the number of occurrences of the weight index ⟨i, j⟩ in W.

9. Let Xi = Σ_{j=1}^{ni} xij.

10. If any element of W is equal to 0 then return 0, else return:

  Π_{i=1}^{m} [ (G(Ai) / G(Ai + Xi)) × Π_{j=1}^{ni} (G(aij + xij) / G(aij)) ]

where G is the gamma function.

Figure 4.7: Procedure for calculating the probability of a term under the non-parametric prior.
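Line 10 of Fig. 4.7 is one Dirichlet–multinomial marginal per urn, multiplied together. A small sketch reusing logMarginal from the block after Proposition 4.28; the weight-index bookkeeping that produces the per-urn counts is assumed to have been done already:

-- urns: the Dirichlet parameter lists u1..um accumulated in Fig. 4.6;
-- counts: x_ij, the occurrences of weight index <i,j> in W (Fig. 4.7).
npPriorScore :: [[Double]] -> [[Int]] -> Double
npPriorScore urns counts = exp (sum (zipWith logMarginal urns counts))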

and pool:

  LHS = Σ_{Y∈T} ∫ dτ′dΓ′ × (probability density of pickWeightsOdd(τ) at τ′)
          × (probability density of pickPoolWeights(Γ) at Γ′) × prior(Y, τ′, Γ′)
        (by the definition of npPriorProc and 4.24)

      = ∫ dτ′dΓ′ × (probability density of pickWeightsOdd(τ) at τ′)
          × (probability density of pickPoolWeights(Γ) at Γ′) × prior(T, τ′, Γ′)
        (by 4.26 and the precondition)

      = ∫ (dw11 . . . dw1(n1−1)) ... (dwm1 . . . dwm(nm−1))
          × Π_{i=1}^{m} dirichletDensity([wi1 . . . wini], [ai1 . . . aini])
          × Π_{i=1}^{m} Π_{j=1}^{ni} wij^xij
        (by 4.23 and the definitions of pickWeightsOdd, npPrior, etc.)

      = Π_{i=1}^{m} ∫ dwi1 . . . dwi(ni−1)
          × dirichletDensity([wi1 . . . wini], [ai1 . . . aini]) × Π_{j=1}^{ni} wij^xij

      = RHS   (by 4.28 and the definition of npPrior)

On the other hand, if some element of W is equal to 0, then the first two steps in the proof are the same as above. The remainder of the proof then follows using the fact that when W contains 0 it holds that prior(T, τ′, Γ′) = 0 for all τ′, Γ′ in the domain of the integral.

Theorem 4.34 (equivalence of distribution and process views for npPrior). Let T be a total term in long normal form, let τ be the type of T , and let Γ be a set of symbols not containing void. It holds that:

P (npPriorProc(τ, Γ) returns T ) = npPrior(T, Γ)

Proof. The proof follows in identical fashion to that of Lemma 4.33, except that there is no need for the precondition at the second line of working.

The following two theorems confirm, as was earlier shown for the base prior, that the nonparametric prior is normalised and that it respects partial terms:

Theorem 4.35 (npPrior is normalised). Let τ be a type, let Γ be a set of symbols not containing void, and let S be the set of all long normal, total terms of type

τ. Given the precondition that npPriorProc(τ, Γ) terminates with probability one, it holds that:

  Σ_{T∈S} npPrior(T, Γ) = 1

Proof.

  LHS = Σ_{T∈S} P(npPriorProc(τ, Γ) returns T)   (by 4.34)
      = RHS   (by the precondition)

Theorem 4.36 (npPrior respects partial terms). Let T be a partial term in long normal form, let τ be the type of T , and let Γ be a set of symbols not containing void. Given the precondition that npPriorProc(τ, Γ) terminates with probability one, it holds that:

  npPrior(T, Γ) = Σ_{X∈T} npPrior(X, Γ)

Proof. The proof follows by Theorem 4.34 and Lemma 4.33.

Lastly, let us discuss how the nonparametric prior may be used to do a form of cumulative learning, i.e. to automatically optimise weight values over the course of a sequence of inductive inference problems. Suppose we have a list of target data D1 ...Dn with corresponding target types α1 . . . αn, for n related inductive inference problems. If we introduce the following data type declaration:

data Tuple = Tuple α1 . . . αn

then we can express all of these problems together as a single, monolithic inductive inference problem with the following target datum:

D = Tuple D1 ...Dn

Hypotheses for the monolithic problem take the form:

H = Tuple H1 ...Hn

Suppose we use the nonparametric prior as the prior for this monolithic problem, i.e. P (H) = npPrior(H, Γ) for some appropriate choice of Γ. Also, we ensure that Γ contains precisely one symbol of return type Tuple, namely the symbol Tuple. Under these conditions, we can decompose the prior probability of a hypothesis for the monolithic problem as follows:

  P(H) = Π_{i=1}^{n} P(Hi | H1 ... Hi−1)

Note that these conditional probabilities are straightforward to calculate using the definition of npPrior and the fact that npPrior respects partial terms. We can attempt to solve the monolithic problem by sequentially solving each of the sub-problems, in each case using P(Hi | H1 ... Hi−1) as the prior for that sub-problem. Notice that due to the way we have used the nonparametric prior, the hypotheses H1 ... Hn are not independent random variables. This is due to the sharing of the same weight values among all of the sub-problems. Therefore, conditioning on the hypotheses for earlier problems in this way amounts to a shift in inductive bias as one progresses through the sequence.
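Operationally, the conditional prior for sub-problem i is a ratio of npPrior values on partially filled tuples. A sketch reusing the hypothetical LnfTerm type from earlier and assuming an npPrior function on monolithic tuple hypotheses with holes:

-- P(Hi | H1..H(i-1)) as a ratio of tuple priors; holes stand for the
-- not-yet-solved sub-problems.
conditionalPrior :: ([LnfTerm] -> Double)  -- npPrior on n tuple slots
                 -> Int                    -- n, the number of sub-problems
                 -> [LnfTerm]              -- already-chosen H1..H(i-1)
                 -> LnfTerm                -- candidate Hi
                 -> Double
conditionalPrior npPriorTuple n solved hi =
  npPriorTuple (solved ++ hi : holes (n - length solved - 1))
    / npPriorTuple (solved ++ holes (n - length solved))
  where
    holes k = replicate k (Lnf [] Nothing [])  -- '#' holes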

4.2.1 Worked example

Let us compare the forms of the non-parametric prior and the base prior on a simple hypothesis space. We shall also see in more detail how the non-parametric prior may be used to automatically adapt weight values over the course of multiple problems. In this example we make use of the following data type declaration, which is equivalent to a grammar for words consisting of any number of ‘A’s and ‘B’s followed by a ‘C’:

data Word = A Word | B Word | C

Consider the base prior over long normal terms T of type Word, given by prior(T, τ, Γ) where:

τ = Word^{}

Γ = [{⟨A, Word^{} -> Word^α⟩, ⟨B, Word^{} -> Word^β⟩, ⟨C, Word^γ⟩}^{{⟨Word, 0⟩}}]

Here α, β, γ are weights such that α + β + γ = 1. The following table shows the probability values for the first few terms under this distribution:

Term       Probability   Probability (α = β = γ)
C          γ             1/3
A C        αγ            1/9
B C        βγ            1/9
A (A C)    α²γ           1/27
A (B C)    αβγ           1/27
B (A C)    βαγ           1/27
B (B C)    β²γ           1/27
etc. . .

For the sake of concreteness, the right-hand column shows the probability values in the special case of uniform weights.

Now consider the non-parametric prior over the same hypothesis space, given by npPrior(T, Γ) where:

Γ = {A:(Word -> Word),

B:(Word -> Word),

C:Word}

The probability values for the first few terms under this distribution are as follows, where h = npPriorHyperParam1 (Defn. 4.29). The special case of h = 1 is also shown for concreteness:

Term       Probability                              Probability (h = 1)
C          1/3                                      1/3
A C        1/3 × h/(3h+1)                           1/12
B C        1/3 × h/(3h+1)                           1/12
A (A C)    1/3 × (h+1)/(3h+1) × h/(3h+2)            1/30
A (B C)    1/3 × h/(3h+1) × h/(3h+2)                1/60
B (A C)    1/3 × h/(3h+1) × h/(3h+2)                1/60
B (B C)    1/3 × (h+1)/(3h+1) × h/(3h+2)            1/30
etc. . .

With the base prior, each symbol in a term receives a fixed weight, and the total probability is obtained by multiplying all of these weights together. On the other hand, for the non-parametric prior, we see that the weight of a symbol increases the more frequently that symbol occurs. For example, A (A C) receives a higher probability than A (B C) in the table above, because the first occurrence of A effectively boosts the probability of the second occurrence. We may say that the weights 'adapt' according to how frequently the corresponding symbols actually occur. The hyperparameter h determines how quickly the weights adapt. When h is small the weights will adapt rapidly. When h → ∞, the weights do not change at all, and the non-parametric prior becomes identical to the base prior with uniform weights.

Let us now look at how weight adaptation can shift inductive bias over the course of multiple problems. Following the scheme outlined at the end of the last subsection, consider a tuple data type for a monolithic problem consisting of two sub-problems:

data Tuple = Tuple Word Word

Let us use a non-parametric prior over monolithic hypotheses T of type Tuple, given by npPrior(T, Γ) where:

Γ = {Tuple:(Word -> Word -> Tuple),

A:(Word -> Word),

B:(Word -> Word),

C:Word}

Note that any non-trivial T must be of the form Tuple H1 H2 where H1 and H2 are each of type Word. H1 and H2 are hypotheses for the first and second sub-problems respectively. Now, suppose that the first of these two sub-problems has just been solved (with respect to some target datum) and that the following concrete value has been chosen for H1:

H1 = A (A (A C))

We may write down the conditional probability of H2 given H1, as follows:

  P(H2|H1) = P(H1,H2) / P(H1) = npPrior(Tuple H1 H2, Γ) / npPrior(Tuple H1 #, Γ)

The following table shows the conditional probability values for the first few terms H2, given that H1 = A (A (A C)):

H2         P(H2|H1)                                      P(H2|H1) when h = 1
C          (h+1)/(3h+4)                                  2/7
A C        (h+3)/(3h+4) × (h+1)/(3h+5)                   1/7
B C        h/(3h+4) × (h+1)/(3h+5)                       1/28
A (A C)    (h+3)/(3h+4) × (h+4)/(3h+5) × (h+1)/(3h+6)    5/63
A (B C)    (h+3)/(3h+4) × h/(3h+5) × (h+1)/(3h+6)        1/63
B (A C)    h/(3h+4) × (h+3)/(3h+5) × (h+1)/(3h+6)        1/63
B (B C)    h/(3h+4) × (h+1)/(3h+5) × (h+1)/(3h+6)        1/126
etc. . .

We can see from the above table that conditioning on the known value of

H1 has shifted the bias for the second problem, favouring hypotheses H2 that contain more As than Bs. This has occurred because H1 contains mostly As and no Bs.

4.3 The Cumulative Learning Prior

We have seen that the nonparametric prior can be used for cumulative learning of the probability weights attached to elements of background knowledge. In

this section I introduce an extension of this prior to allow cumulative learning of the elements of background knowledge themselves, as well as their weights. I call this the cumulative learning prior. The cumulative learning prior shall require the following infinite family of data types and value declarations to be included in the definition of the toplevel environment. For all types α1 . . . αm where m ≥ 0, we introduce the following data type declaration:

data Tupleα1...αm = Tuple α1 . . . αm

where each Tupleα1...αm is a unique reserved data type. Then, for all m, n ≥ 0 and for all appropriate type annotations of a symbol fixTuple and a symbol master, we introduce the following value declarations:

fixTuple

= fix (\ r f -> f (caseTuple (r f) (\ a1 . . . am -> a1)) ...

(caseTuple (r f) (\ a1 . . . am -> am)))

master = (\ f -> caseTuple

(fixTuple (\ a1 . . . am b1 . . . bn -> f a1 . . . am))

(\ a1 . . . am b1 . . . bn -> Tuple b1 . . . bn))

fixTuple is a variation of fix. It constructs a tuple whose elements are defined mutually recursively in terms of one another. Each allowed type annotation for fixTuple has the form:

(α1 -> ... -> αm -> Tupleα1...αm ) -> Tupleα1...αm

Each allowed type annotation for master has the form:

(α1 -> ... -> αm -> Tupleα1...αmβ1...βn ) -> Tupleβ1...βn

Here, the αi are types of m invented elements of background knowledge and the

βj are target types for n inductive inference problems. The value declaration for master uses fixTuple to construct a recursive definition of the m invented elements of background knowledge in terms of one another, as well as definitions of hypotheses for the n inference problems in terms of the invented background knowledge. It is this sharing of the same invented background knowledge among all of the hypotheses that allows cumulative learning.

The cumulative learning prior is parametrised by two probability distributions, numAbstractionsDist and abstractionTypeDist (Defn. 4.37). numAbstractionsDist is used to choose the number of invented elements of background knowledge; a suitable choice of distribution might be a geometric or a zeta

stochastic function clPriorProc(tys, Γ): input: tys, a list of types; Γ, a set of symbols not containing void. output: a long normal, total term.

1. Pick m by sampling from numAbstractionsDist.

2. For each i ∈ {1 . . . m}:

3. Pick τi by sampling from abstractionTypeDist.

4. Let gi be a fresh symbol of type τi.

5. Pick Γ′ by calling pickPoolWeights(Γ ∪ {g1 . . . gm}).

6. Let [τm+1 . . . τm+n] = tys.

7. For each i ∈ {1 . . . m + n}:

8. Pick τ′i by calling pickWeightsOdd(τi).

9. Pick Ti by calling priorProc(τ′i, Γ′).

10. Return:

master (\ g1 . . . gm -> Tuple T1 ... Tm+n)

Figure 4.8: Algorithm for sampling from the cumulative learning prior. In lines 1–4, the number and types of abstractions (invented elements of background knowledge) are chosen. In lines 5–9, weights and terms are chosen as in npPriorProc (Fig. 4.5). Finally, in line 10, a monolithic hypothesis is constructed and returned.

distribution. abstractionTypeDist is used to choose the type of each invented element of background knowledge. Here the simplest choice of distribution would be a uniform distribution over some finite set of types. On the other hand, one might define a distribution on types along the same lines as the base prior or nonparametric prior on terms, in order to build the principle of Occam's Razor into the choice of type.

Definition 4.37 (numAbstractionsDist, abstractionTypeDist). numAbstractionsDist is some probability distribution over the non-negative integers. abstractionTypeDist is some probability distribution over types.

The definition of the cumulative learning prior is given in Fig. 4.8 in the form of a stochastic process. It is straightforward to convert this from process form to distribution form using the same technique as for the nonparametric prior of the previous section, so I shall not give the details here.

Let us now look at how to actually perform cumulative learning with the cumulative learning prior. As in the weight-adaptation scheme of the previous section, we work with a monolithic inductive inference problem whose target

datum has the following form:

D = Tuple D1 ...Dn

However, this time a hypothesis for the monolithic problem has this form:

H = master (\ a1 . . . am -> Tuple A1 ... Am H1 ... Hn)

where m may be any non-negative integer. Here, the Ai are abstractions, i.e. invented elements of background knowledge, and the Hi are hypotheses for the individual sub-problems.

In order to cumulatively learn the hypotheses Hi, one would solve the sub-problems sequentially, in each case using P(Hi | A1 ... Am, H1 ... Hi−1) as the prior for the sub-problem. In order to learn the abstractions Ai, one needs abstraction invention, a mechanism I shall describe in Chap. 7.

4.3.1 Worked example

Let us now look at a concrete example of how to use the cumulative learning prior. We shall again make use of the Word data type that was introduced in the example of Sect. 4.2.1:

data Word = A Word | B Word | C

Consider the hypothesis space of master terms T of the following form:

T = master (\ a1 . . . am -> Tuple A1 ...Am H1 H2)

where m ≥ 0 and H1,H2 both have type Word. The type of T is TupleWord,Word. In this master term, hypotheses for two sub-problems H1 and H2 are being defined with respect to a number of abstractions. Let the prior over this hypothesis space be given by the stochastic process clPriorProc([Word, Word], Γ) where:

Γ = {A:(Word -> Word),

B:(Word -> Word),

C:Word}

Let clPrior be the distribution form of clPriorProc, satisfying the following identity:

clPrior(T, Γ) = P (clPriorProc([Word, Word], Γ) returns T )

If we initially assume that the number of abstractions m is zero, then the prior probability of a given hypothesis H1 for the first sub-problem is equal to:

  P(H1 | m = 0) = clPrior(master (Tuple H1 #), Γ) / clPrior(master (Tuple # #), Γ)
                = npPrior(H1, Γ)   (by the definitions of clPrior and npPrior)

Thus we see that when there are no abstractions, the cumulative learning prior effectively reduces to the non-parametric prior of the previous section.

Now, suppose that the following value of H1 happens to be our chosen solution to the first sub-problem:

  H1 = A (B (B (A (B (B (A (B (B (A (B (B (A (B (B (A (B (B C)))))))))))))))))

The prior probability of this solution is as follows, where h = npPriorHyperParam1:

  P(H1 | m = 0) = h³(h+1)²(h+2)² ... (h+5)²(h+6)(h+7) ... (h+11) / [(3h)(3h+1)(3h+2) ... (3h+18)]
                = 1.35 × 10⁻⁸ when h = 1

Given the obvious repeating pattern ‘ABB’ in H1, we might consider introducing the following abstraction so as to express H1 more concisely:

A1 = (\ x -> A (B (B x)))

Binding the abstraction A1 to an arbitrary symbol f, we can re-express H1 as H1′:

  H1′ = f (f (f (f (f (f C)))))

and we rewrite the master term as:

  master (\ f -> Tuple A1 H1′ #)

We can calculate the joint prior probability of this new abstraction A1 and the re-expressed hypothesis H1′, conditioning on the value of m and the type of A1. Defining d = npPriorHyperParam2 we obtain:

  P(A1, H1′ | m = 1 and A1 has type Word -> Word)
    = clPrior(master (\ f -> Tuple A1 H1′ #), Γ) / clPrior(master (\ f -> Tuple # # #), Γ)
    = [h⁴(h+1)²(h+2)(h+3)(h+4)(h+5) / ((4h)(4h+1)(4h+2) ... (4h+9))]
      × [hd(d+1)(d+2) / ((h+d)(h+d+1)(h+d+2)(h+d+3))]
    = 6.94 × 10⁻⁸ when h = d = 1

Let us assume for convenience that numAbstractionsDist and abstractionTypeDist (Defn. 4.37) have been chosen such that:

  P(m = 0) = P(m = 1 and A1 has type Word -> Word)

and furthermore that h = d = 1. Under these conditions we see that introducing the abstraction A1 increases the prior probability of the master term as a whole by a factor of:

  clPrior(master (\ f -> Tuple A1 H1′ #), Γ) / clPrior(master (Tuple H1 #), Γ) = (6.94 × 10⁻⁸) / (1.35 × 10⁻⁸) = 5.10

This increase in probability indicates that the abstraction A1 is likely to be beneficial for future learning and is therefore worth keeping.

Let us now see how the introduction of the abstraction A1 affects the conditional prior probability distribution over hypotheses H2 for the second sub-problem:

  P(H2 | A1, H1′) = clPrior(master (\ f -> Tuple A1 H1′ H2), Γ) / clPrior(master (\ f -> Tuple A1 H1′ #), Γ)

The following table shows these conditional probability values for the first few terms H2:

H2         value(H2)          P(H2 | A1, H1′)                                   P(H2 | A1, H1′) when h = 1
C          C                  (h+1)/(4h+10)                                     1/7
A C        A C                (h+1)/(4h+10) × (h+1)/(4h+11)                     2/105
B C        B C                (h+2)/(4h+10) × (h+1)/(4h+11)                     1/35
f C        A (B (B C))        (h+6)/(4h+10) × (h+1)/(4h+11)                     1/15
A (A C)    A (A C)            (h+1)/(4h+10) × (h+2)/(4h+11) × (h+1)/(4h+12)     1/280
A (B C)    A (B C)            (h+1)/(4h+10) × (h+2)/(4h+11) × (h+1)/(4h+12)     1/280
A (f C)    A (A (B (B C)))    (h+1)/(4h+10) × (h+6)/(4h+11) × (h+1)/(4h+12)     1/120
etc. . .

Note that value(H2) is the value of H2 after expanding out applications of f. Let B be the toplevel environment (Defn. 3.29) defined with respect to data type declarations for Word and Tuple and value declarations for fixTuple and master. Then, value(H2) may be precisely specified as:

value(H2) = X2

  where Tuple X1 X2 = eval(master (\ f -> Tuple A1 H1′ H2), B)

We see from the above table that introducing the abstraction A1 produces a significant bias towards hypotheses for the second sub-problem that contain instances of the pattern 'ABB'. Recall that this abstraction was originally derived from a repeating pattern 'ABB' in the hypothesis H1 found for the first sub-problem. Hence we see how this cumulative learning mechanism biases the

learner to favour hypotheses for new problems that have elements in common with hypotheses already learned on previous problems.

Finally, it is worth discussing to what extent abstractions may be regarded as new background knowledge under this cumulative learning scheme. On the one hand, from the point of view of a monolithic learning problem it is clear that abstractions form part of a hypothesis; they do not affect the background knowledge B as defined in the setting for learning, which remains constant throughout the cumulative learning process. On the other hand, from the point of view of an individual sub-problem such as the second sub-problem in the example above, we see that abstractions behave indistinguishably from elements of background knowledge both in how they determine the meaning of a hypothesis and in how they affect the form of the (conditional) prior. In this sense, i.e. with respect to a sub-problem, it is legitimate to regard abstractions as 'new background knowledge'.

Chapter 5

Refinement and RUFUS

In the last two chapters I introduced a setting for learning and suitable prior probability distributions to go with it. In this chapter I describe the design of a practical learning algorithm for use within this setting, which I have implemented as the inductive programming system RUFUS.

The design of RUFUS incorporates the techniques of refinement (Sect. 2.2.3) and proof-directed search (Sect. 2.2.4) adapted from first-order ILP. Therefore, RUFUS has aspects in common with ILP systems like Progol [Muggleton, 1995] and MC-TopLog [Muggleton et al., 2012b]. In particular, RUFUS' heuristic search through a refinement graph is much like Progol's, though RUFUS' search space is not bounded by anything akin to a 'most-specific clause'. Also, RUFUS uses a coverage-testing proof procedure to determine which parts of a hypothesis to refine, much as MC-TopLog uses its coverage-testing proof procedure to determine which clauses to refine.

RUFUS has some significant novel features not usually seen in first-order ILP. In particular, its lambda calculus representation language allows RUFUS to handle higher-order background knowledge (discussed in Sect. 2.3.1). RUFUS also incorporates a form of Levin search for dealing with potentially non-terminating hypothesis programs (discussed in Sect. 2.3.2).

5.1 Refinement

Having defined generality orders over lambda calculus terms back in Sect. 3.3, we shall see that it is relatively straightforward to adapt the technique of refinement to the lambda calculus setting. The canonical theory of refinement operators is not specific to first-order logic – it applies to arbitrary quasi-ordered sets [Nienhuys-Cheng and de Wolf, 1997, Chap. 17]. Therefore, we can immediately use much of this theory without modification.

On the other hand, the generality orderings of λ-entailment and λ-subsumption (Sect. 3.3) do have somewhat different properties from their counterparts in first-order logic. One consequence of this is that so-called ideal refinement operators exist in the lambda calculus setting. In this section I shall construct an ideal refinement operator that is of practical use.

5.1.1 Refinement operators and their properties

The material in this subsection follows Nienhuys-Cheng and de Wolf [1997, Chap. 17]. Given some space of hypotheses, a refinement operator is a function that maps each hypothesis to a set of successor hypotheses. Associated with a refinement operator is a generality ordering over the same space of hypotheses.

Definition 5.1 (refinement operator, i-step refinements, refinements). A refinement operator ρ over a quasi-ordered set ⟨L, ≤⟩ is a function that takes an element of L and returns a subset of L. For A ∈ L and non-negative integers i, we define the set ρ^i(A) of i-step refinements as follows:

ρ^0(A) = {A}

ρ^i(A) = {B / ρ^(i−1)(A); C / ρ(B) • C} where i ≥ 1

We also define the set ρ∗(A) of refinements¹:

ρ∗(A) = ρ^0(A) ∪ ρ^1(A) ∪ ρ^2(A) ∪ ...
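As a concrete illustration of Defn. 5.1, the following Haskell sketch computes i-step refinements and enumerates ρ∗ level by level. The operator rho here is an invented toy operator over integers, purely for illustration; it is not RUFUS' operator:

import Data.List (nub)

-- A toy refinement operator: each integer refines to two successors.
rho :: Int -> [Int]
rho a = [a + 1, a * 2]

-- i-step refinements (Defn. 5.1): rho^0(a) = {a};
-- rho^i(a) = all one-step refinements of elements of rho^(i-1)(a).
rhoI :: Int -> Int -> [Int]
rhoI 0 a = [a]
rhoI i a = nub [ c | b <- rhoI (i - 1) a, c <- rho b ]

-- rho^*(a), enumerated lazily level by level (includes rho^0(a)).
rhoStar :: Int -> [Int]
rhoStar a = concat [ rhoI i a | i <- [0 ..] ]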

Definition 5.2 (upward/downward, sound, complete, locally finite, proper). Let ρ be a refinement operator over a quasi-ordered set ⟨L, ≤⟩. When ρ is an upward refinement operator, the following properties are defined:

• ρ is sound if for all A ∈ L and B ∈ ρ(A) it holds that B ≥ A.

• ρ is complete if for all A, B ∈ L such that B ≥ A, it holds that there is some B′ ∈ ρ∗(A) such that B and B′ are equivalent under ≤.

• ρ is locally finite if for all A ∈ L it holds that the cardinality of ρ(A) is finite.

• ρ is proper if for all A ∈ L and B ∈ ρ(A) it holds that B > A. Note that properness implies soundness.

• ρ is ideal if it is complete, locally finite, and proper.

We may also define the dual concept of a downward refinement operator by reversing the direction of the inequalities in the above properties.

¹Note that my definition of ρ∗(A) differs slightly from that of Nienhuys-Cheng and de Wolf in that I include ρ^0(A) in ρ∗(A); I find this to be somewhat more convenient.

Soundness of a refinement operator would usually be regarded as necessary for the refinement operator to be of any practical use; soundness makes pruning possible in a heuristic search. On the other hand, the three properties of completeness, local finiteness, and properness might be regarded as desirable but optional. Completeness is desirable because otherwise there will always be some hypotheses that the system is incapable of learning. Local finiteness simplifies the design of search algorithms because one can rely on being able to exhaustively generate the set of one-step refinements of any given hypothesis. Finally, properness implies that the refinement graph is acyclic, which means that no search path will get stuck in a loop visiting the same hypothesis repeatedly.

5.1.2 The RUFUS refinement operator

I shall now specify a hypothesis space and associated refinement operator for the inductive inference system RUFUS. As in the previous chapter, throughout this chapter I shall use the convention of Barendregt that lambda terms are taken to be syntactically equal if they are identical up to the renaming of bound variables.

RUFUS works within the setting for learning described in Sect. 3.4, and uses the nonparametric prior of Sect. 4.2. Its hypothesis space L consists of terms in long normal form of some target type τtarget, and is parametrised by a set of symbols Γ. τtarget and Γ correspond to the arguments τ and Γ of npPriorProc (Fig. 4.5).

Definition 5.3 (τtarget, Γ, L). τtarget is some type. Γ is some finite set of symbols not containing void. L is the set of all long normal terms of type τtarget whose free variables are either holes or elements of Γ.

The RUFUS refinement operator ρ is an upward refinement operator defined over the set of terms L ordered by λ-subsumption:

Definition 5.4 (RUFUS refinement operator). ρ is a function that takes an element of L and returns a subset of L:

ρ(T) = the set of all possible values of refine(T)

where the nondeterministic function refine is given in Fig. 5.1.

Lemma 5.5. ρ is a complete upward refinement operator over L with respect to the λ-subsumption order.

Proof. Consider A, B ∈ L where B subsumes A. By Defn. 3.31, there exists a hole-substitution σ on A such that Aσ = B. By induction on the size of σ:

1. σ has size 0. Then, B = A and hence B ∈ ρ∗(A).

nondeterministic function refine(T):
input: T, a term in long normal form.
output: a term in long normal form.

1. Let p be any place in T at which a hole occurs.

2. Return refineAt(p, T).

nondeterministic function refineAt(p, T):
input: p, a place in T at which a hole occurs; T, a term in long normal form.
output: a term in long normal form.

1. Let R, a data type, be the type of the hole at place p in T, and let Λ be the set of lambda parameters in scope at place p in T.

2. Let f, a symbol, be any element of Λ ∪ Γ whose return type is equal to R. Let α1 -> ... -> αn -> R be the type of f.

3. Let X be the term in long normal form written

   (f (\ x11 ... x1k1 -> #) ... (\ xn1 ... xnkn -> #))

   where the xij are fresh symbols.

4. Return the term obtained from T by replacing the hole at p with X.

Figure 5.1: The refine and refineAt algorithms.
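To make the figure's pseudocode concrete, here is a minimal Haskell sketch of refine/refineAt over an invented untyped term representation with holes. It ignores types, long normal form, and lambda binders, all of which the real algorithm must handle, so it is only a structural illustration:

-- Terms are either holes or applications of a symbol to subterms;
-- types and lambda binders are elided in this sketch.
data Term = Hole | App String [Term] deriving Show

-- All places in a term at which a hole occurs (cf. line 1 of refine).
holePlaces :: Term -> [[Int]]
holePlaces Hole       = [[]]
holePlaces (App _ ts) = [ i : p | (i, t) <- zip [1 ..] ts, p <- holePlaces t ]

-- Replace the hole at place p with an application of symbol f whose
-- arity n introduces n fresh holes (cf. lines 2-4 of refineAt).
refineAt :: [Int] -> (String, Int) -> Term -> Term
refineAt []      (f, n) Hole       = App f (replicate n Hole)
refineAt (i : p) fn     (App g ts) =
  App g [ if j == i then refineAt p fn t else t | (j, t) <- zip [1 ..] ts ]
refineAt _ _ t = t  -- ill-formed place: return the term unchanged

-- All one-step refinements with respect to a signature of symbol/arity
-- pairs, which plays the role of Λ ∪ Γ here.
refine :: [(String, Int)] -> Term -> [Term]
refine sig t = [ refineAt p fn t | p <- holePlaces t, fn <- sig ]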

2. σ has size 1 or more. Then, σ can be written in the form {p/T} ∪ τ where T is non-trivial and p/T ∉ τ. T is in long normal form and the type of T has arity zero, therefore we can write T as:

T = (f (\ x11 ... x1k1 -> A1) ... (\ xn1 ... xnkn -> An))

where f is a symbol, the xij are symbols and the Ai are terms in long normal form. For i ∈ {1 ... n}, let qi be the place in T at which the corresponding Ai occurs. Let T′ = T{q1/# ... qn/#}, from which it follows that T = T′{q1/A1 ... qn/An}. We can now rewrite B as follows:

B = A{p/T}τ

  = A{p/T′{q1/A1 ... qn/An}}τ

  = A{p/T′}{(p ++ q1)/A1 ... (p ++ qn)/An}τ    (by 3.13)

  = A{p/T′}({(p ++ q1)/A1 ... (p ++ qn)/An} ∪ τ)

Observe that the last line above has the form B = A′σ′, where A′ = A{p/T′}. Noting that the size of σ′ is less than the size of σ, it follows by the inductive hypothesis that B ∈ ρ∗(A′). Furthermore, by Defn. 5.4 it holds that A′ ∈ ρ(A), and thus by Defn. 5.1 that B ∈ ρ∗(A).

Theorem 5.6. ρ is an ideal upward refinement operator over L with respect to the subsumption order.

Proof. ρ is complete by Lemma 5.5. ρ is locally finite by Defn. 5.4 and the fact that Γ is finite. ρ is proper by Defn. 5.4.

Thus, RUFUS' refinement operator is an ideal refinement operator. This may seem surprising, particularly as it is known that ideal refinement operators for first-order clausal hypothesis spaces do not exist [Nienhuys-Cheng and de Wolf, 1997, Sect. 17.3]. On the other hand, lambda terms are more similar in structure to first-order atoms than to first-order clauses, and indeed, ideal refinement operators for sets of first-order atoms do exist [Nienhuys-Cheng and de Wolf, 1997, Sect. 17.2]. In fact, the ideal refinement operator for first-order atoms that Nienhuys-Cheng and de Wolf describe in that section has a lot in common with the RUFUS refinement operator for lambda terms. The existence of ideal refinement operators can be seen as one advantage of a term-based language, such as lambda calculus, over a clausal one.

5.2 Proof-Directed Search

In the specification for RUFUS' refinement operator in Fig. 5.1, there are two sources of non-determinism: firstly at line 1 of refine where a hole is chosen, and secondly at line 2 of refineAt where a symbol f is chosen. It is clear that this refinement operator exhibits a significant amount of redundancy: when building up a total term via a sequence of refinements, it does not matter in what order the holes are chosen. As long as the same symbols are chosen at the same places, then the same total term will always be constructed (Fig. 5.2).

Figure 5.2: Part of RUFUS' refinement graph, for Γ containing symbols if:(Bool -> Int -> Int -> Int), even:(Int -> Bool), 1:Int, and 0:Int. [Diagram not reproduced: the partial term (\ x -> if (even #) # 0) at the bottom refines, via (\ x -> if (even x) # 0) on the left and (\ x -> if (even #) 1 0) on the right, to the total term (\ x -> if (even x) 1 0) at the top.] Notice that the total term at the top of the refinement graph can always be reached regardless of the order in which one picks holes for refinement.

Since in the setting for learning one is interested in total terms as final hypotheses, it would make sense to eliminate this redundancy by constraining the order in which holes are chosen for refinement. For example, one could use a policy of always picking the left-most hole. Though this would not affect the completeness of the refinement operator with respect to total terms, such a policy ignores an important fact: some paths through the refinement graph offer more informative guidance to heuristic search than others.

For example, in Fig. 5.2 we see two paths through the refinement graph which both end up at the same total term. Imagine testing each partial hypothesis against some target datum during a search through the refinement graph. It is clear in this particular example that refining the left-hand hole first would offer more informative guidance than refining the right-hand hole first. Why is this? The reason is that, here, the left-hand choice allows the program that is being refined to produce useful output for some input: it will output 0 when the input is odd. On the other hand, if one takes the right-hand choice first, the resulting program produces no useful output regardless of input.

Though in the above example the left-most hole happens to give better guidance, in many cases it could be the right-most hole or some hole in a completely arbitrary position. As a result, we need a more informed approach to choosing which hole to refine, rather than a naive policy like 'always pick the left-most hole'. The same issue exists in first-order ILP with multi-clause hypotheses: one needs an informed approach to choosing which clause to refine. As discussed in Sect. 2.2.4, the ILP system MC-TopLog solved this problem by using proof-directed search, i.e. by attempting a proof that some partial hypothesis covers

examples, and refining a clause whenever it becomes the limiting factor in the progress of the proof. This is a good strategy for achieving informative guidance, because information about coverage is returned by the proof procedure, and this is precisely the information that is necessary to guide and prune the search. In RUFUS, we shall use this technique to choose which hole to refine.

5.2.1 The evaluator

The proof procedure that RUFUS uses to test a hypothesis' coverage is known as an evaluator. It is an implementation of the eval function of Defn. 3.23. RUFUS' evaluator uses a strategy for performing these proofs known as call-by-need evaluation, as described in [Abelson and Sussman, 1996, Sect. 4.2]. Call-by-need evaluation is a standard technique used in interpreters for functional programming languages, so I shall not discuss its details here.

I shall now give a high-level specification for the behaviour of RUFUS' evaluator, in the form of the function evalProc (Defn. 5.9). evalProc returns a kind of computational process called a reader process (Defn. 5.7). The concept of an evaluation process (Defn. 5.8) forms the heart of the specification, linking the behaviour of evalProc to the definition of eval (Defn. 3.23). We can think of evalProc as a computable approximation to the uncomputable function eval.

The specification for evalProc has a number of features that support proof-directed search. In the first condition of Defn. 5.8, we see that when the output of eval is equal to ⊥ due to the presence of a hole in the term being evaluated, then the evaluation process will return the place at which this hole occurs. Also, the fourth and fifth conditions of Defn. 5.8 ensure that it is safe to refine a term mid-way through evaluation.

Definition 5.7 (reader process). A reader process is a computational process that proceeds in steps. At each step it reads in some input, then either a) continues on to the next step or b) halts and returns some final output. We define the following operations on reader processes:

• run(R, n, x) represents the action of running reader process R for n steps or until it halts, repeatedly passing in the value x as input on each step.

• run(R, ∞, x) represents the action of running reader process R forever or until it halts, repeatedly passing in the value x as input on each step.
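A reader process can be modelled directly as a small state machine. The following Haskell sketch is one possible encoding of Defn. 5.7; the names are my own, not RUFUS internals:

-- At each step a reader process consumes an input and either continues
-- as a new process or halts with an output.
newtype Reader i o = Reader { step :: i -> Either (Reader i o) o }

-- run(R, n, x): run for at most n steps, feeding x at every step;
-- Nothing models 'has not halted'.
runN :: Int -> i -> Reader i o -> Maybe o
runN n x r
  | n <= 0    = Nothing
  | otherwise = case step r x of
      Right o  -> Just o
      Left  r' -> runN (n - 1) x r'

-- run(R, ∞, x): run until the process halts (may diverge).
runInf :: i -> Reader i o -> o
runInf x r = either (runInf x) id (step r x)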

Definition 5.8 (evaluation process). An evaluation process is a four-valued tuple ⟨R, p, T, E⟩ where R is a reader process, T is a term, E is a toplevel environment, and p is a value-place in eval(T, E), such that the following conditions hold. Let x be the subvalue at place p in eval(T, E), and let τ be the type of x. If τ is a data type then:

1. If x = ⊥ then one of the following two conditions holds:

   • either run(R, ∞, T) returns ⟨"hole", h, R′⟩ where h is a place in T at which a hole occurs and R′ is a reader process such that ⟨R′, p, T, E⟩ is an evaluation process;

   • or run(R, ∞, T) either returns "error" or does not halt, and for all X ∈ T it holds that the subvalue at place p in eval(X, E) is equal to ⊥.

2. If x is of the form (C a1 ... ak) then run(R, ∞, T) returns

   ⟨"data", C, [R1 ... Rk]⟩

   where each Ri is a reader process such that ⟨Ri, p ++ [i], T, E⟩ is an evaluation process.

If τ is a function type α -> β then:

3. run(R, ∞, T) returns ⟨"function", R′⟩ where R′ is a function that takes a value of type α and returns a reader process, such that for all values a of type α it holds that ⟨R′(a), p ++ [a], T, E⟩ is an evaluation process.

Finally, regardless of τ:

4. For any positive integer n such that run(R, n, T) does not halt, it holds that ⟨R′, p, T, E⟩ is an evaluation process where R′ is the state of reader process R after executing run(R, n, T).

5. For any term T′ that properly subsumes T it holds that ⟨R, p, T′, E⟩ is an evaluation process. Furthermore, if run(R, ∞, T′) halts in a finite number of steps then so does run(R, ∞, T).

Definition 5.9 (RUFUS evaluator). The function evalProc takes a term T and a toplevel environment E, and returns a reader process R such that ⟨R, [], T, E⟩ is an evaluation process.

In the RUFUS system, I have implemented an evaluator that meets the specification of Defn. 5.9. Internally, it uses the call-by-need evaluation strategy mentioned above.

5.2.2 The runThreads algorithm

In the second and third conditions of Defn. 5.8 in the last subsection, we saw that an evaluation process may halt and return multiple new evaluation processes. We can think of these processes as ‘threads of evaluation’, each corresponding to different arguments of a constructor application or to different input-output entries of a function. In order to manage these multiple threads of evaluation during coverage testing, RUFUS shall make use of the following object:

Definition 5.10 (evaluation thread structure). An evaluation thread structure is a five-valued tuple:

⟨[R1 ... Rn], [p1 ... pn], T, E, [x1 ... xn]⟩

where:

• each ⟨Ri, pi, T, E⟩ is an evaluation process;

• each xi is a value of finite size of the same type as the subvalue at pi in eval(T, E).

In the above definition, each xi is intended to be a subvalue of the target datum, against which a corresponding subvalue of eval(T,E) is being compared for coverage. For each i ∈ {1 . . . n} the Ri, pi, and xi together constitute the ith thread:

Definition 5.11 (thread). A thread is a three-element tuple ⟨R, p, x⟩ where R is a reader process, p is a value-place and x is a value of finite size.

RUFUS' coverage testing algorithm, runThreads, is given in Fig. 5.3. It takes a list of threads along with the term being evaluated, and its action is to run each of the threads in turn, performing evaluation and comparing each result against the corresponding subvalue of the target datum. Each time any thread halts and spawns child threads, these child threads are then run too.

In order to handle potential non-termination of an evaluation process, runThreads 'times out' after performing evaluation for evalLimit steps (Defn. 5.12). However, when such a time-out occurs, the return value of runThreads includes the state of all currently running threads, allowing evaluation to be resumed later. Indeed, as we shall see, RUFUS never gives up on any hypothesis no matter how many steps are required in order to evaluate it. This enables RUFUS to achieve completeness despite non-termination, in the manner of Levin search.

Definition 5.12 (evalLimit). The runThreads algorithm is parametrised by a positive integer evalLimit. This sets a limit on the number of steps for which the evaluator may run at a single invocation.

runThreads may return one of four possible results. "covers" indicates that all threads ran to completion and that each result of evaluation covered the corresponding subvalue of the target datum. "contradicts" indicates that at least one result of evaluation contradicted its corresponding subvalue of the target datum. "timeout" indicates that a thread ran for evalLimit steps without halting. "refine" indicates that a thread is currently unable to progress due to the presence of a hole in the term being evaluated, i.e. it constitutes a request for refinement. A formal specification for all of this behaviour is given as Lemma 5.16 below.

The state variables of runThreads have the following meanings. y is the result of evaluation of the term T; it is gradually built up as evaluation progresses. stepCount is the number of steps for which the currently running thread has proceeded so far; each time runThreads finishes executing a thread and moves onto a new one, stepCount is reset to zero.

Definition 5.13 (covers, contradicts). Let T be a term, let E be an environment, let p be a value-place in eval(T, E), and let x be a value of the same type as the subvalue at p in eval(T, E). The predicates covers and contradicts are defined as follows:

covers(p, T, E, x) if (the subvalue at p in eval(T, E)) ⊒ x    (5.14)

contradicts(p, T, E, x) if ∄H ∈ T such that covers(p, H, E, x)    (5.15)

Lemma 5.16. Given an evaluation thread structure:

⟨[R1 ... Rn], [p1 ... pn], T, E, [x1 ... xn]⟩

let Result be the output of:

runThreads([⟨R1, p1, x1⟩ ... ⟨Rn, pn, xn⟩], T)

executed from some arbitrary initial state. The following facts then hold:

1. If Result = "covers" then covers(pi, T, E, xi) for all i ∈ {1 ... n}.

2. If Result = "contradicts" then contradicts(pi, T, E, xi) for some i ∈ {1 ... n}.

3. If Result is of either of the following two forms:

   ⟨"timeout", [⟨R′1, p′1, x′1⟩ ... ⟨R′m, p′m, x′m⟩]⟩

   ⟨"refine", [⟨R′1, p′1, x′1⟩ ... ⟨R′m, p′m, x′m⟩], h⟩

   then:

   ⟨[R′1 ... R′m], [p′1 ... p′m], T, E, [x′1 ... x′m]⟩

   is an evaluation thread structure, and the following holds for all H ∈ T:

   If covers(p′i, H, E, x′i) for all i ∈ {1 ... m} then covers(pi, H, E, xi) for all i ∈ {1 ... n}.

   Furthermore, in the case of the "refine" form of Result it holds that h is a place in T at which a hole occurs.

Proof. By induction on N = Σ_{i=1}^{n} size(xi):

stateful function runThreads(ts, T):
input: ts, a list of threads; T, a term.
output: either "covers"
    or "contradicts"
    or ⟨"timeout", us⟩ where us is a list of threads
    or ⟨"refine", us, h⟩ where us is a list of threads and h is a place in T.
state: y, a value of finite size of the same type as T; stepCount, a non-negative integer.

1. If ts = [] then return "covers".

2. Else if ts is of the form ⟨R, p, x⟩ : ts′ then:

3.   If x = ⊥ then return runThreads(ts′, T).

4.   Else if run(R, evalLimit, T) has not halted then:

5.     stepCount ← stepCount + evalLimit

6.     Return ⟨"timeout", ⟨R′, p, x⟩ : ts′⟩ where R′ is the remainder of the reader process R after running run(R, evalLimit, T).

7.   Else if run(R, evalLimit, T) returns "error" then return "contradicts".

8.   Else if run(R, evalLimit, T) returns ⟨"hole", h, R′⟩ then:

9.     Let n be the number of steps taken for run(R, evalLimit, T) to halt.

10.    stepCount ← stepCount + n.

11.    Return ⟨"refine", ⟨R′, p, x⟩ : ts′, h⟩.

12.  Else if run(R, evalLimit, T) returns ⟨"data", C, [R1 ... Rk]⟩ and x is of the form (C′ a1 ... ak) then:

13.    y ← the value obtained by replacing the subvalue at p in y with (C ⊥ ... ⊥).

14.    If C = C′ then:

15.      stepCount ← 0

16.      For i ∈ {1 ... k}, let ui, a thread, equal ⟨Ri, p ++ [i], ai⟩.

17.      Return runThreads([u1 ... uk] : ts′, T).

18.    Else return "contradicts".

19.  Else if run(R, evalLimit, T) returns ⟨"function", R′⟩ and x is a function then:

20.    stepCount ← 0

21.    Let ⟨a1, b1⟩ ... ⟨ak, bk⟩ be the non-trivial entries of x.

22.    For i ∈ {1 ... k}, let ui, a thread, equal ⟨R′(ai), p ++ [ai], bi⟩.

23.    Return runThreads([u1 ... uk] : ts′, T).

Figure 5.3: The runThreads algorithm.

1. N = 0, and hence n = 0. runThreads returns "covers" at line 1, from which the proof follows.

2. N ≥ 1, and hence n ≥ 1. runThreads proceeds to line 3. There are seven cases to consider:

   (a) runThreads returns at line 3. By Defn. 3.17 it holds that covers(p1, T, E, x1). The proof then follows by the inductive hypothesis.

   (b) runThreads returns at line 6. The proof follows by the fourth condition of Defn. 5.8.

   (c) runThreads returns at line 7. By Defn. 5.8 it holds for all H ∈ T that the subvalue at p1 in eval(H, E) is equal to ⊥. Since runThreads did not return at line 3 we also know that x1 ≠ ⊥. Hence it holds that contradicts(p1, T, E, x1), from which the proof follows.

   (d) runThreads returns at line 11. The proof follows by Defn. 5.8.

   (e) runThreads returns at line 17. The proof follows by Defn. 5.8 and the inductive hypothesis.

   (f) runThreads returns at line 18. By Defn. 5.8 and Defn. 3.17 it holds that contradicts(p1, T, E, x1), from which the proof follows.

   (g) runThreads returns at line 23. The proof follows by Defn. 5.8 and the inductive hypothesis.

5.2.3 The RUFUS search tree

Combining coverage testing together with refinement, I shall now specify the proof-directed search space of RUFUS. This search space takes the form of a tree, whose internal nodes are as follows:

Definition 5.17 (node). A node is a tuple of the form ⟨ts, T, y, stepCount⟩ where ts is a list of threads, T is a term, y is a value of finite size of the same type as T, and stepCount is a non-negative integer.

The RUFUS search tree is defined in Fig. 5.4. The root node of the tree, given by startNode, consists of a thread that evaluates an initial 'empty' hypothesis and tests it for coverage against the target datum. The action of expandNode is to run the threads contained in a node. If a timeout occurs, then the currently running threads are saved and returned as a child node. If evaluation stops due to the presence of a hole in the hypothesis, then all possible one-step refinements are made at that hole, with one child node being returned for each different possible refinement. Each leaf of the RUFUS search tree is known as a 'goal'.

Below I prove that the RUFUS search tree is sound, in that every goal in the tree consists of a hypothesis that covers the target datum (Theorem 5.20). I also prove that the RUFUS search tree is complete, in that every possible hypothesis that covers the target datum occurs as a goal in the tree (Theorem 5.22).

Note that pruning is already incorporated into the definition of the RUFUS search tree. Pruning occurs whenever runThreads returns "contradicts" (lines 7 and 18 of Fig. 5.3). In such cases, as soon as a contradiction is found between the current partial hypothesis and the target datum, then the search path is terminated and expandNode returns an empty set of child nodes. We see that this occurs without the need for all threads to run to completion on that search path.

Definition 5.18 (goal path). A goal path is a pair ⟨[N1 ... Nd], H⟩ where [N1 ... Nd] is a list of nodes with d ≥ 1, H is a term, and the following conditions hold:

• for each i ∈ {1 ... d − 1} it holds that expandNode(Ni) is of the form ⟨"nodes", Ns⟩ and Ni+1 ∈ Ns;

• expandNode(Nd) = ⟨"goal", H⟩.

We say that ⟨[N1 ... Nd], H⟩ is a goal path from N1 to H.

Lemma 5.19. Let the following be an evaluation thread structure:

⟨[R1 ... Rn], [p1 ... pn], T, E, [x1 ... xn]⟩

such that T ∈ L. Let ⟨[N1 ... Nd], H⟩ be a goal path where:

N1 = ⟨[⟨R1, p1, x1⟩ ... ⟨Rn, pn, xn⟩], T, y, stepCount⟩

for some y and stepCount. It holds that H ∈ L and covers(pi, H, E, xi) for all i ∈ {1 ... n}.

Proof. Let ts = [⟨R1, p1, x1⟩ ... ⟨Rn, pn, xn⟩]. By induction on d:

1. d = 1. By Defn. 5.18, expandNode(N1) = ⟨"goal", H⟩. It follows by the definition of expandNode (Fig. 5.4) that H = T and that the output of runThreads(ts, T) is equal to "covers". The proof follows by Lemma 5.16.

2. d > 1. By Defn. 5.18, expandNode(N1) is of the form ⟨"nodes", Ns⟩. Let R be the output of runThreads(ts, T). By the definition of expandNode, there are then two cases to consider:

function startNode(B, D):
input: B, a toplevel environment; D, a value of finite size and of type τtarget.
output: a node.

1. Let T, a long normal term of type τtarget, be equal to (\ x1 ... xn -> #) where the xi are fresh symbols and n is the arity of τtarget.

2. Return ⟨[⟨evalProc(T, B), [], D⟩], T, ⊥, 0⟩.

function expandNode(N):
input: N, a node.
output: either ⟨"goal", H⟩ where H is a term or ⟨"nodes", Ns⟩ where Ns is a set of nodes.

1. Let ⟨ts, T, y, stepCount⟩ = N.

2. Execute runThreads(ts, T) with initial state y, stepCount. Let R be the output and let y′, stepCount′ be the final state.

3. If R = "covers" then return ⟨"goal", T⟩.

4. Else if R = "contradicts" then return ⟨"nodes", {}⟩.

5. Else if R = ⟨"timeout", ts′⟩ then return ⟨"nodes", {⟨ts′, T, y′, stepCount′⟩}⟩.

6. Else if R = ⟨"refine", ts′, h⟩ then return ⟨"nodes", {T′ / possible values of refineAt(h, T) • ⟨ts′, T′, y′, stepCount′⟩}⟩.

Figure 5.4: Specification for RUFUS' search tree. startNode constructs the root node. expandNode constructs the children of a given node, returning either a 'goal' or a new set of nodes.

(a) R is of the form ⟨"timeout", ts′⟩ and N2 = ⟨ts′, T, y′, stepCount′⟩ for some y′ and stepCount′. The proof follows by Lemma 5.16 and the inductive hypothesis.

(b) R is of the form ⟨"refine", ts′, h⟩ and N2 = ⟨ts′, T′, y′, stepCount′⟩ for some possible value T′ of refineAt(h, T), some y′, and some stepCount′. The proof follows by Lemma 5.16 and the inductive hypothesis.

Theorem 5.20 (soundness of RUFUS). Let B be a toplevel environment, let D be a value of finite size and of type τtarget, and let H be a term. If there exists a goal path from startNode(B, D) to H, then it follows that H ∈ L and eval(H, B) ⊒ D.

Proof. Let ⟨[⟨R, [], x⟩], T, y, stepCount⟩ = startNode(B, D). By the definition of startNode (Fig. 5.4) it holds that R = evalProc(T, B) and x = D and T ∈ L. By Defn. 5.9 and Defn. 5.10 it holds that ⟨[R], [[]], T, B, [x]⟩ is an evaluation thread structure. The proof then follows by Lemma 5.19.

Lemma 5.21. Let the following be an evaluation thread structure:

⟨[R1 ... Rn], [p1 ... pn], T, E, [x1 ... xn]⟩

where T ∈ L. Let H be an element of L that subsumes T and such that covers(pi, H, E, xi) for all i ∈ {1 ... n}. Let N1 be the following node:

N1 = ⟨[⟨R1, p1, x1⟩ ... ⟨Rn, pn, xn⟩], T, y, stepCount⟩

for some y and stepCount. It follows that there exist nodes N2 ... Nd with d ≥ 1 and there exists a term H′, such that ⟨[N1 ... Nd], H′⟩ is a goal path and H subsumes H′.

Proof. Let ts = [⟨R1, p1, x1⟩ ... ⟨Rn, pn, xn⟩]. Let R be the output of runThreads(ts, T). The proof proceeds by a triple mathematical induction on three variables: the number of holes in T, the set of places {p1 ... pn}, and the number of steps required for run(R1, ∞, T) to halt. Note that by Defn. 5.8, run(R1, ∞, T) is guaranteed to halt in a finite number of steps provided that x1 ≠ ⊥. There are four cases to consider:

1. R = "covers". By the definition of expandNode (Fig. 5.4) it follows that ⟨[N1], T⟩ is a goal path. We are also given that H subsumes T, from which the proof follows.

2. R = "contradicts". By Lemma 5.16 and Eqn. 5.15 it holds for some i ∈ {1 ... n} that ∄X ∈ T such that covers(pi, X, E, xi). However, this is contradicted by the existence of H, therefore this case can never occur.

3. R is of the form ⟨"timeout", ts′⟩. The proof follows by Lemma 5.16 and the inductive hypothesis.

4. R is of the form ⟨"refine", ts′, h⟩. By the definition of refineAt (Fig. 5.1), there exists a possible value T′ of refineAt(h, T) such that T′ ∈ L and H subsumes T′. The proof follows by Lemma 5.16 and the inductive hypothesis.

Theorem 5.22 (completeness of RUFUS). Let B be a toplevel environment, let D be a value of finite size and of type τtarget, and let H be an element of L such that eval(H, B) ⊒ D. It follows that there exists a goal path from startNode(B, D) to some term H′ such that H subsumes H′.

Proof. The proof follows in identical manner to that of Theorem 5.20 except that the last step follows by Lemma 5.21.

5.2.4 Guiding the search

So far I have defined RUFUS' search tree and proved that it is sound and complete. All that remains now is the question of how to traverse the search tree appropriately in order to perform inference using either Bayes MAP or Bayes prediction. For Bayes MAP, one wishes to perform optimisation, i.e. to search for the goal node in the tree that has the highest prior probability (and hence the highest posterior, since all the goals have the same likelihood of 1). On the other hand, for Bayes prediction one would likely wish to perform Monte Carlo sampling from the set of goal nodes according to their prior probabilities. It is an intended feature of the design of RUFUS that both inference strategies are feasible using the same search tree.

In the rest of this chapter I shall focus on one strategy for Bayes MAP inference. The chosen approach shall be a best-first search, using a heuristic estimate of the posterior probability at each intermediate node in the tree in order to guide the search. This heuristic estimate takes into account any coverage of the target datum by the partial hypothesis at an intermediate node. The resulting search strategy is not unlike that found in FOIL or Progol (see Sect. 2.2.3).

My heuristic estimate of the posterior probability, or score, of a node is given in Defn. 5.28. This score consists of three components. First, the prior probability of the partial hypothesis T is calculated. This is then multiplied by an estimate likelihoodEst of the (unnormalised) likelihood of T. Note that likelihoodEst is a function of y, the part of the target datum so far correctly predicted by T. Finally, this is divided by stepCount, the number of steps for which the node's currently active thread has been running so far without halting.

Note that a small constant offset value stepCountOffset (Defn. 5.27) is added to the step-count. The purpose of stepCountOffset is mainly to avoid a division by zero when stepCount is zero; overall it serves to dampen the influence of very small values of stepCount on the score.

By dividing the posterior estimate by the step-count, we approximate the bias-optimality property of Levin search that was discussed in Sect. 2.3.2. Recall that to achieve bias-optimality, the search process must allocate computation time to each hypothesis in proportion to its probability. Consider two hypotheses T1, T2 with posterior probability estimates p1, p2 respectively, and which have so far been tested for step-counts of n1, n2 respectively. The two hypotheses will have equal scores when the following condition holds (ignoring the effect of stepCountOffset):

p1/n1 = p2/n2 ⟹ n1/n2 = p1/p2

Hence, we see that the ratio of the step-counts is equal to the ratio of the probabilities, precisely the condition for bias-optimality.

The likelihood estimate that I shall use is defined in terms of a probability distribution valProb over values of the target type (Defn. 5.25). valProb is in turn defined in terms of an arbitrary assignment conWeight of probabilities to data type constructors (Defn. 5.24). Suppose that D is the target datum, and that y is the portion of D that is covered by a partial hypothesis T. I shall then use the following as an estimate of the likelihood of D given T:

P(D|T) ≈ valProb(D) / valProb(y)    (5.23)

which is equal to the probability under valProb of the remaining portion of the target datum that is not yet covered by T. Notice that the numerator in Eqn. 5.23 is a constant; we may therefore omit it for convenience, since multiplying all scores by a constant scaling factor will have no effect on the outcome of RUFUS' search. This yields the likelihoodEst function given in Defn. 5.26. Though likelihoodEst is a crude estimate of likelihood, it can still provide useful guidance to RUFUS' search. In fact, this kind of likelihood estimate is very similar to the 'number of examples covered' heuristic typically used in first-order ILP (Eqn. 2.1). Finally, note that one could avoid the need for the arbitrary weights of Defn. 5.24 by extending valProb to a non-parametric form. This could be done in a similar way to how the base prior distribution was extended to non-parametric form back in Chap. 4.

Definition 5.24 (conWeight). The function conWeight takes a constructor and returns a probability value. It is subject to the following constraint for each data type declaration with constructors C1 ... Cn:

Σ_{i=1}^{n} conWeight(Ci) = 1

Definition 5.25 (valProb). The function valProb takes a value and returns a probability. For a value of a data type:

valProb(⊥) = 1

valProb((C a1 ... an)) = conWeight(C) × ∏_{i=1}^{n} valProb(ai)

For a value of a function type:

valProb(f) = ∏_{i=1}^{n} valProb(bi)

where ⟨a1, b1⟩ ... ⟨an, bn⟩ are the non-trivial entries of f.

Definition 5.26 (likelihoodEst). The function likelihoodEst takes a value and returns a real number:

likelihoodEst(y) = 1 / valProb(y)
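The two definitions above are straightforward to transcribe. The following Haskell sketch does so for first-order data values only (function values and their non-trivial entries are omitted), using the example constructor weights that appear later in the worked example of Sect. 5.2.5:

import qualified Data.Map as M

-- Values of data types; Bottom stands for ⊥, which Defn. 5.25 gives
-- probability 1, i.e. the 'not yet predicted' part costs nothing.
data Val = Bottom | Con String [Val]

conWeight :: M.Map String Double
conWeight = M.fromList
  [("Zero", 0.5), ("Succ", 0.5), ("True", 0.5), ("False", 0.5)]

valProb :: Val -> Double
valProb Bottom     = 1
valProb (Con c as) = (conWeight M.! c) * product (map valProb as)

-- likelihoodEst (Defn. 5.26): inverse probability of the covered part y.
likelihoodEst :: Val -> Double
likelihoodEst y = 1 / valProb y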

Definition 5.27 (stepCountOffset). stepCountOffset is some positive real number.

Definition 5.28 (score). The score of a node is equal to:

score(⟨ts, T, y, stepCount⟩) = npPrior(T, Γ) × likelihoodEst(y) / (stepCount + stepCountOffset)

Note that an alternative prior function may be substituted for npPrior(T, Γ) in the above expression if appropriate.

In Fig. 5.5 I give a best-first search algorithm for traversing the RUFUS search tree. Using this search strategy, we now have a self-contained inference algorithm runRufus (Defn. 5.29). Note that due to the heuristic nature of the search, the goal returned by runRufus is not guaranteed to be Bayes MAP optimal. It may therefore be desirable to extend the search so as to obtain multiple goals, and to test them against validation data in order to avoid over-fitting.

Definition 5.29 (runRufus). The function runRufus takes a toplevel environment (background knowledge) and a value of finite size (target datum). It returns either "nogoal" or ⟨"goal", H⟩ where H is a term:

runRufus(B, D) = bestFirstSearch({startNode(B, D)})

function bestFirstSearch(fringe):
input: fringe, a set of nodes.
output: either "nogoal" or ⟨"goal", T⟩ where T is a term.

1. If fringe = {} then return "nogoal".

2. Else:

3.   Let N be the node in fringe with the highest score, and let fringe′ be equal to fringe with the element N removed.

4.   If expandNode(N) is of the form ⟨"goal", T⟩ then return ⟨"goal", T⟩.

5.   Else if expandNode(N) is of the form ⟨"nodes", Ns⟩ then return bestFirstSearch(fringe′ ∪ Ns).

Figure 5.5: The bestFirstSearch algorithm. "nogoal" is returned if the entire search tree has been traversed and no goal has been found. Otherwise, the first goal to be encountered is returned. Note that if two or more nodes are tied for the highest score at line 3, then one may be chosen arbitrarily.
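A generic transcription of Fig. 5.5 is short. In the Haskell sketch below, expand plays the role of expandNode (Left h signalling a goal, Right ns a set of children) and score is the heuristic of Defn. 5.28; the list-based fringe is a simplification, and a priority queue would be the idiomatic choice in practice:

import Data.List (maximumBy)
import Data.Ord (comparing)

bestFirstSearch :: (n -> Either h [n]) -> (n -> Double) -> [n] -> Maybe h
bestFirstSearch _      _     []     = Nothing
bestFirstSearch expand score fringe =
  let i = fst (maximumBy (comparing (score . snd))
                         (zip [0 :: Int ..] fringe))
      (before, n : after) = splitAt i fringe  -- remove the best node
  in case expand n of
       Left  h  -> Just h                     -- a goal was found
       Right ns -> bestFirstSearch expand score (before ++ after ++ ns)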

5.2.5 Worked example

Let us look at a concrete example of RUFUS in action on a simple inductive inference problem. The aim shall be to learn a predicate to test if a natural number is even, from input-output examples. Following the setting for learning (Sect. 3.4), we start by specifying some background knowledge and a target datum. Let the background knowledge B be the toplevel environment (Defn. 3.29) defined with respect to the empty set of value declarations and the following set of data type declarations:

data Nat = Zero | Succ Nat
data Bool = True | False

Let the target type be Nat -> Bool. Let the target datum D be the least value under ⊑ in the domain of the target type such that the following constraints hold:

D(Zero) = True

D(Succ Zero) = False

D(Succ (Succ Zero)) = True

D(Succ (Succ (Succ Zero))) = False

D(Succ (Succ (Succ (Succ Zero)))) = True

For the prior over hypothesis terms, we shall use the following:

P(H) = npPrior(H, Γ)

where Γ = {True:Bool,
           False:Bool,
           caseNat:(Nat -> Bool -> (Nat -> Bool) -> Bool),
           fix:(((Nat -> Bool) -> Nat -> Bool) -> Nat -> Bool)}

We also specify some reasonable values for the following parameters of RUFUS that have so far been left unspecified:

npPriorHyperParam1 = 8
npPriorHyperParam2 = 16
evalLimit = 50
conWeight(x) = 1/2 for x ∈ {Zero, Succ, True, False}
stepCountOffset = 1

Under these conditions, runRufus(B, D) returns ⟨"goal", H⟩ where H is equal to the following term:

(\ a1 -> fix (\ a2 a3 -> caseNat a3 True (\ a4 -> caseNat a4 False (\ a5 -> a2 a5))) a1)

Note that this hypothesis is a correct general implementation of the 'even' predicate. This was tested with a prototype implementation of RUFUS.
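For readers who want to check the learned term by hand, here is a direct Haskell transliteration, under the natural readings of caseNat (case analysis on Nat) and fix (the fixed-point combinator at type (Nat -> Bool) -> Nat -> Bool); these readings are assumptions based only on the types in Γ:

data Nat = Zero | Succ Nat

caseNat :: Nat -> Bool -> (Nat -> Bool) -> Bool
caseNat Zero     z _ = z
caseNat (Succ n) _ s = s n

fix :: ((Nat -> Bool) -> Nat -> Bool) -> Nat -> Bool
fix f = f (fix f)

-- The hypothesis H returned by runRufus, transliterated symbol for symbol.
hyp :: Nat -> Bool
hyp a1 = fix (\ a2 a3 -> caseNat a3 True
               (\ a4 -> caseNat a4 False (\ a5 -> a2 a5))) a1

-- hyp Zero == True, hyp (Succ Zero) == False, and so on.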

Chapter 6

Anti-unification Search

In this chapter and the next I describe the design of a practical abstraction invention method for use within the lambda calculus setting for learning, which I have implemented as the KANDINSKY system. The method consists of two stages. In the first stage, a search for syntactic commonality is performed over a given set of terms. In the second stage, common syntactic structure is abstracted out to form a new piece of background knowledge, by means of inverse deduction. In this chapter I shall describe the first stage, i.e. the search for commonality, and in Chap. 7 I shall describe the second stage.

In order to search for syntactic commonality, I have designed an algorithm called auSearch (anti-unification search). This algorithm is not in fact specific to lambda calculus, and can be used to find syntactic commonality among arbitrary tree-structured terms. Given a set of such terms, auSearch searches the space of all non-trivial anti-instances of two or more of them. An anti-instance is a kind of syntactic generalisation which represents what a set of terms have in common. The total number of possible anti-instances can be exponentially large in the number of terms, therefore auSearch uses a customisable heuristic score function in order to perform an efficient guided beam search through this space. Note that this search process bears some similarity to that used in the propositional inverse resolution system Duce [Muggleton, 1987], although Duce is concerned with propositional clauses rather than tree-structured terms.

The terms fed into auSearch are intended to be subterms of some larger master term. These subterms may overlap one another, and auSearch must take this into account. Indeed, auSearch is designed to discover only non-overlapping syntactic commonality. It does this by making use of a sub-algorithm called mnos (maximum non-overlapping subset), which allows it to detect overlapping syntactic structure early, and avoid it, throughout the search process. This is important because KANDINSKY's inverse deduction algorithm, which I shall describe in the next chapter, is only able to create an abstraction from non-overlapping commonality.

6.1 Tree-Structured Terms

As just mentioned, auSearch works with generic tree-structured syntax; it is not specific to lambda calculus. This is advantageous because it simplifies the design of auSearch, freeing it from dealing with the details of typed lambda calculus. In order to represent generic tree-structured syntax, I shall use untyped first-order terms. These first-order terms have no semantics, and are simply a convenient, plain syntactic representation. In the next chapter I will show how typed lambda calculus terms can be translated to these first-order terms.

Throughout the whole of this chapter, 'term' shall be taken to mean 'first-order term', not 'lambda calculus term' as in the rest of this thesis. I now define (first-order) terms, substitutions, instances, etc. following Nienhuys-Cheng and de Wolf [1997]. The concept of a place comes from Plotkin [1969]. Note that in what follows I state some propositions without proof when their validity is assumed to be obvious to the reader.

Let us assume the existence of a countably-infinite alphabet of variables and another of function symbols. Each function symbol has a non-negative integer, its arity, associated with it.

Definition 6.1 (term, ground, trivial, linear). A term is either a variable, or a syntactic object of the form f(t1, . . . , tn) where f is a function symbol of arity n and t1 . . . tn are terms. A ground term contains no variables. A trivial term is a variable. A linear term is one in which all variables are distinct.

Definition 6.2 (substitution). A substitution, written {x1/t1, . . . , xn/tn}, rep- resents a mapping from distinct variables xi to terms ti. If s is a term and θ is a substitution, then sθ is the term obtained from s by simultaneously replacing each occurrence of xi with ti, for all xi/ti ∈ θ.

Definition 6.3 (instance, anti-instance). Let t1, t2 be terms. If there exists some substitution θ such that t1θ = t2 then we say that t2 is an instance of t1 and conversely that t1 is an anti-instance of t2.

Definition 6.4 (variant). We say that two terms are variants if they are iden- tical up to renaming of variables.

Definition 6.5 (place, subterm). A place is a list of natural numbers marking an occurrence of a subterm within a term. The set of places in a term is defined as follows:

• [] is a place in any term. (6.6)

• A list of natural numbers i : p is a place in a term t if and only if t is of the form f(t1 ... tn) where n ≥ i, and p is a place in ti. (6.7)

Each place in a term marks a particular subterm as follows:

• The subterm at place [] in a term t is equal to t. (6.8)

• The subterm at a place of the form i : p in a term f(t1 ... tn) is equal to the subterm at place p in ti. (6.9)

Proposition 6.10. For terms s, t where s is an anti-instance of t, it holds that every place in s is a place in t.

Proposition 6.11. For a term t of which t′ is a subterm at some place a, it holds for all places p in t′ that:

• a ++ p is a place in t.

• The subterm at place a ++ p in t is equal to the subterm at place p in t′.

Proof. By structural induction on a:

1. a = []. By Eqn. 6.8 it holds that t′ = t, from which the proof then follows.

2. a = i : a′. By Fact 6.7 it holds that t is of the form f(t1 ... tn) where n ≥ i, and a′ is a place in ti. Furthermore by Eqn. 6.9, t′ is equal to the subterm at place a′ in ti. By the inductive hypothesis, a′ ++ p is a place in ti, and also the subterm at place a′ ++ p in ti is equal to the subterm at place p in t′. The two statements that we wish to prove then follow by Fact 6.7 and Eqn. 6.9 respectively.

Definition 6.12 (varPlaces). For a term t, we define varPlaces(t) to be the set of places in t at which variables occur.
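Defns. 6.1, 6.5, and 6.12 translate almost verbatim into Haskell. The following sketch (my own representation, reused by later sketches in this chapter) enumerates places, extracts subterms, and computes varPlaces:

data FOTerm = Var String | Fun String [FOTerm] deriving (Eq, Show)

-- All places in a term (Defn. 6.5); [] is a place in every term.
places :: FOTerm -> [[Int]]
places (Var _)    = [[]]
places (Fun _ ts) = [] : [ i : p | (i, t) <- zip [1 ..] ts, p <- places t ]

-- The subterm at a given place, when the place is valid.
subtermAt :: [Int] -> FOTerm -> Maybe FOTerm
subtermAt []      t          = Just t
subtermAt (i : p) (Fun _ ts)
  | i >= 1 && i <= length ts = subtermAt p (ts !! (i - 1))
subtermAt _       _          = Nothing

-- varPlaces (Defn. 6.12): the places at which variables occur.
varPlaces :: FOTerm -> [[Int]]
varPlaces t = [ p | p <- places t
                  , case subtermAt p t of Just (Var _) -> True
                                          _            -> False ]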

Definition 6.13 (<). For places p, q we write p < q if p comes before q lexico- graphically. We also write p ≤ q to mean (p < q or p = q).

Proposition 6.14. The ≤ relation over places is a total order.

Definition 6.15 (ancestor, descendant). Let p, q be places. p is an improper ancestor of q (q is an improper descendant of p) if p is a prefix of q. p is an ancestor of q (q is a descendant of p) if p is an improper ancestor of q and p ≠ q.

Proposition 6.16. The relation ‘is an improper ancestor of’ over places is a partial order.

The following concept of structural similarity allows one to talk about terms as having the same tree structure, ignoring the details of function symbols and variables:

Definition 6.17 (structurally similar). Terms s, t are structurally similar if the set of places in s is identical to the set of places in t.

6.2 The auSearch Specification

In this section I give a high-level specification for the intended behaviour of the auSearch algorithm. In the next section I shall then give the algorithm itself, and prove that it meets this specification. I start with some preliminary definitions. auSearch is parametrised by a ground term called the master term:

Definition 6.18 (master term). The master term is some ground term.

The terms provided as input to auSearch shall be ground, and furthermore each must be structurally similar to a subterm of the master term. These two properties are summarised in the following definition, LG-term:

Definition 6.19 (LG-term, location). A Located Ground term (LG-term) is a pair ⟨a, t⟩ where a is a place in the master term, and t is a ground term that is structurally similar to the subterm at place a in the master term. We call a the location of the LG-term.

Proposition 6.20. For an LG-term ⟨a, t⟩ and a term s that is an anti-instance of t, it holds for all places p in s that a ++ p is a place in the master term.

Proof.

p is a place in s

=⇒ p is a place in t (by 6.10)

=⇒ p is a place in the subterm at place a in the master term (by 6.19 and 6.17)

=⇒ a ++ p is a place in the master term (by 6.11)

The output of auSearch shall consist of objects called lassos¹. A lasso delineates or 'selects' some piece of tree-structure within the master term. See Fig. 6.1 for an illustration.

¹The name lasso is intended to be suggestive of the 'lasso select' tool sometimes found in image editing software.

Figure 6.1: An example involving an LG-term and a lasso. [Tree diagrams not reproduced here.] The three tree-diagrams represent terms (for example, M = f(g(h, h, h), h)); function symbols are shown enclosed in circles and variables without circles. Suppose M is the master term. It follows that ⟨[1], t⟩ is an LG-term since t is ground and also structurally similar to g(h, h, h), the subterm at place [1] in M. Also, s is an anti-instance of t, and hence the lasso of s on ⟨[1], t⟩ is ⟨[1], {[1, 1], [1, 3]}⟩. This lasso, which contains places [1] and [1, 2], is represented in the diagram by the dashed loop.

Definition 6.21 (lasso, location). A lasso is a pair ⟨a, P⟩ where a is a place in the master term and P is a set of places in the master term, obeying the following conditions:

• Every element of P is an improper descendant of a.

• No element of P is a descendant of any other.

We call a the location of the lasso.

Definition 6.22 (lasso of ... on ...). For an LG-term ⟨a, t⟩ and a term s that is an anti-instance of t, we define the lasso of s on ⟨a, t⟩ to be equal to:

⟨a, {p / varPlaces(s) • a ++ p}⟩

Note that due to Proposition 6.20 this is guaranteed to be a valid lasso.

Definition 6.23 (contains, overlap, empty). A lasso ⟨a, P⟩ contains a place x if x is in the master term, x is an improper descendant of a, and x is not an improper descendant of any element of P. Two lassos overlap if there exists a place that is contained in both of them. A lasso is empty if it contains no places.

Definition 6.24 (non-overlapping, maximum non-overlapping subset). A non-overlapping set of lassos is one such that no two elements overlap. For a set A of lassos and a non-overlapping set B ⊆ A, we say that B is a maximum non-overlapping subset of A if for every non-overlapping set C ⊆ A it holds that |B| ≥ |C|.
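Under a concrete representation of places, the contains and overlap tests of Defns. 6.21–6.24 reduce to simple prefix checks. A Haskell sketch, assuming the set of valid places of the master term is supplied explicitly:

import Data.List (isPrefixOf)

type Place = [Int]
type Lasso = (Place, [Place])  -- (location a, frontier set P)

-- A lasso (a, ps) contains x if x improperly descends from a but from
-- no element of ps (Defn. 6.23); validity of x in the master term is
-- the caller's responsibility here.
contains :: Lasso -> Place -> Bool
contains (a, ps) x =
  a `isPrefixOf` x && not (any (`isPrefixOf` x) ps)

-- Two lassos overlap if some place of the master term is in both.
overlap :: [Place] -> Lasso -> Lasso -> Bool
overlap masterPlaces l1 l2 =
  any (\ x -> contains l1 x && contains l2 x) masterPlaces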

Figure 6.2 gives an input-output specification for auSearch as well as for its sub-algorithm mnos. Below is the specification for the behaviour of mnos. Later on in this chapter I shall give an implementation of mnos and prove that this theorem holds with respect to that implementation.

function auSearch(σ, τ, T):
input: σ, τ, non-negative integers; T, a set of LG-terms whose locations are all distinct.
output: a set of sets of lassos.

function mnos(X):
input: X, a set of non-empty lassos whose locations are all distinct.
output: a set of lassos.

Figure 6.2: Type signatures for auSearch and mnos.

Theorem 6.25 (mnos specification). For a set X of non-empty lassos whose locations are all distinct, it holds that mnos(X) is a maximum non-overlapping subset of X.
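Theorem 6.25 is a pure input-output specification, so it can be pinned down by an (exponential-time) reference implementation: enumerate all subsets and keep a largest pairwise non-overlapping one. KANDINSKY's actual mnos is far more efficient; the sketch below, reusing Place, Lasso, and overlap from the sketch above, only fixes the intended behaviour:

import Data.List (subsequences, tails, maximumBy)
import Data.Ord (comparing)

-- Brute-force reference for the mnos specification; illustration only.
mnosSpec :: [Place] -> [Lasso] -> [Lasso]
mnosSpec masterPlaces =
  maximumBy (comparing length) . filter nonOverlapping . subsequences
  where
    nonOverlapping ls =
      and [ not (overlap masterPlaces l1 l2)
          | (l1 : rest) <- tails ls, l2 <- rest ]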

In order to write down the specification for the behaviour of auSearch, I need to define the following ‘pattern matching’ helper function:

Definition 6.26 (match). The function match takes a term and a set of LG-terms, and returns a set of lassos:

match(s, T) = {⟨a, t⟩ / T | s is an anti-instance of t • the lasso of s on ⟨a, t⟩}
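The anti-instance test inside this comprehension can be implemented as one-sided matching: find a substitution θ with sθ = t, keeping variable bindings consistent across occurrences. A sketch using FOTerm from the earlier sketch:

import Control.Monad (foldM)
import Data.Maybe (isJust)
import qualified Data.Map as M

-- One-sided matching: extend the substitution env so that s maps to t.
matchSub :: FOTerm -> FOTerm -> M.Map String FOTerm
         -> Maybe (M.Map String FOTerm)
matchSub (Var v) t env = case M.lookup v env of
  Nothing           -> Just (M.insert v t env)
  Just t' | t' == t -> Just env
  _                 -> Nothing
matchSub (Fun f ss) (Fun g ts) env
  | f == g && length ss == length ts =
      foldM (\ e (a, b) -> matchSub a b e) env (zip ss ts)
matchSub _ _ _ = Nothing

-- s is an anti-instance of t iff such a substitution exists (Defn. 6.3).
antiInstanceOf :: FOTerm -> FOTerm -> Bool
antiInstanceOf s t = isJust (matchSub s t M.empty)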

Proposition 6.27. For a non-trivial term s and a set T of LG-terms whose locations are all distinct, it holds that match(s, T ) is a set of non-empty lassos whose locations are all distinct.

Proof. First show that the elements of match(s, T ) are non-empty:

s is non-trivial

=⇒ any lasso of s on an LG-term ⟨a, t⟩ contains a (by 6.22 and 6.23)

=⇒ each element of match(s, T ) is non-empty (by 6.26 and 6.23)

Second, show by contradiction that no two elements of match(s, T ) share the same location:

two elements of match(s, T ) share the same location

=⇒ lassos of s on two elements of T share the same location (by 6.26)

=⇒ two elements of T share the same location (by 6.22)

=⇒ false

The specification for the behaviour of auSearch consists of the two theorems below. The parameters σ and τ are called beam size parameters; these are intended to limit the computational complexity of the search at the expense of completeness, and I shall describe their function in more detail in the next section. When the beam size parameters are very large, auSearch is an exhaustive search of the space characterised in Theorem 6.28. When the beam size parameters are small, Theorem 6.29 guarantees that the search is still sound, though it may no longer be complete.

Theorem 6.28 (convergence of auSearch). Let T be a set of LG-terms whose locations are all distinct. There then exist non-negative integers σ, τ such that for all integers σ′ ≥ σ and τ′ ≥ τ, the following holds for all sets R of lassos:

R ∈ auSearch(σ′, τ′, T) ⇐⇒ R = mnos(match(s, T)) for some non-trivial linear term s, and |R| ≥ 2

Note that Proposition 6.27 guarantees that this assertion is well-defined.

The above theorem implies that auSearch(σ, τ, T ) has a well-defined value in the limit as σ, τ → ∞. Let us denote this limiting value as auSearch(∞, ∞,T ).

Theorem 6.29 (soundness of auSearch). Let T be a set of LG-terms whose locations are all distinct, and let σ, τ be non-negative integers. It follows that:

auSearch(σ, τ, T ) ⊆ auSearch(∞, ∞,T )

In the next section I shall give an implementation of auSearch and prove that these theorems hold with respect to that implementation. For a visual example illustrating some of the behaviour of auSearch, see Fig. 6.3.

6.3 The auSearch Algorithm

We saw in Theorem 6.28 that each element of the set of results returned by a call to auSearch represents a non-overlapping set of matches of some 'template' term s over all the input terms. In the context of abstraction invention, this template term corresponds to a pattern that can be abstracted, and the matches correspond to occurrences of this pattern within the corpus of knowledge. We shall see in the next chapter that, when choosing an auSearch result from which to make an abstraction, it is desirable to maximise if possible both the size of the pattern and the number of matches. Indeed, there is often a trade-off to be had between these two desiderata.

In the auSearch algorithm, we allow the user to specify a custom score function in order to guide the search towards results that maximise such desiderata.

Figure 6.3: An example illustrating the behaviour of auSearch. [Tree diagrams not reproduced here.] Suppose M is the master term. Below it in the diagram are some examples si of non-trivial linear terms, along with corresponding sets of lassos given by Ri = mnos(match(si, T)) where T = {p / places in M • ⟨p, the subterm at place p in M⟩}. By Theorem 6.28, R1 and R2 are members of auSearch(∞, ∞, T), whereas R3 is not because it contains only one element. Note that for i = 1, match(si, T) has a unique maximum non-overlapping subset, whereas for i = 2 there are three different maximum non-overlapping subsets to choose from, and two for i = 3. Which subsets actually get chosen depends on the implementation details of mnos; the Ri shown here are those chosen by the mnos implementation to be described later in this chapter.

From Theorem 6.28, it is clear that the total number of results is in the worst case exponentially large in the size of the input. In order to make the search tractable, it is therefore crucial that guidance is effective, in order to efficiently discover results that maximise the score within this large search space.

In order to achieve guidance, the approach shall be to 'grow' template terms incrementally, updating a template's score each time it is enlarged based on its size and its number of matches against the input terms. The beam parameters τ, σ have the following meaning. At each step in the search, all but the top τ such templates by score shall be discarded. Furthermore, for each place in the master term, all but the top σ templates whose location is equal to that place are discarded.

In order to 'grow' the templates, we need the following definitions alongside the definitions of the previous section. An RLG-term is an LG-term that is in the process of being 'grown'. Likewise, a shoot is a lasso that is in the process of being 'grown'.

Definition 6.30 (RLG-term, root, location). A Rooted Located Ground term (RLG-term) is a triple ⟨a, b, t⟩ where a is a place in the master term, ⟨b, t⟩ is an LG-term, and b is an improper descendant of a. We call a the root, and b the location, of the RLG-term.

Definition 6.31 (shoot, root). A shoot is a triple ⟨a, b, P⟩ where a is a place in the master term, ⟨b, P⟩ is a lasso, and b is an improper descendant of a. We call a the root of the shoot.

Definition 6.32 (root-set). The root-set of a set S of shoots is equal to the following set of places: {⟨a, b, P⟩ / S • a}

Definition 6.33 (contains, overlap, etc.). A shoot ⟨a, b, P⟩ contains a place x if x is in the master term and either of the following two conditions hold:

• x is an improper descendant of a and a proper ancestor of b;

• ⟨b, P⟩ contains x.

We define overlap, empty, non-overlapping, and maximum non-overlapping subset for shoots in the same way as for lassos (Defns. 6.23 and 6.24).
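To make these definitions concrete, the following is a minimal Haskell sketch of the containment tests. The representation is my own, for illustration only: a place is taken to be the list of child indices leading from the root of the master term, so that 'improper descendant' becomes a prefix test, and the master term's place set is supplied explicitly.

import Data.List (isPrefixOf)

type Place = [Int]  -- path of child indices from the root

-- A lasso <b, P> contains x iff x is an improper descendant of b and
-- not an improper descendant of any element of P.
lassoContains :: (Place, [Place]) -> Place -> Bool
lassoContains (b, ps) x =
  b `isPrefixOf` x && not (any (`isPrefixOf` x) ps)

-- A shoot <a, b, P> additionally contains the places that are improper
-- descendants of the root a and proper ancestors of the location b.
shootContains :: (Place, Place, [Place]) -> Place -> Bool
shootContains (a, b, ps) x =
  (a `isPrefixOf` x && x `isPrefixOf` b && x /= b)
    || lassoContains (b, ps) x

-- Two shoots overlap iff some place (drawn from the master term's place
-- set, supplied here explicitly) is contained in both.
shootsOverlap :: [Place] -> (Place, Place, [Place]) -> (Place, Place, [Place]) -> Bool
shootsOverlap places s1 s2 =
  any (\x -> shootContains s1 x && shootContains s2 x) places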

An algorithm mnos′ serves the same purpose as mnos, but for shoots rather than lassos. The type signature is given in Fig. 6.4 and the function specification for mnos′ is as follows:

Lemma 6.34 (mnos′ specification). For a set S of non-empty shoots whose roots are all distinct, it holds that mnos′(S) is a maximum non-overlapping subset of S.

function mnos′(S):
input: S, a set of non-empty shoots whose roots are all distinct.
output: a set of shoots.

Figure 6.4: The type signature for mnos′.

The auSearch algorithm is parametrised by some score function mapping every set of shoots to a real number:

Definition 6.35 (score). Every set of shoots has a real number, its score, associated with it.

The auSearch algorithm itself is given in Fig. 6.5 and Fig. 6.6. I shall defer giving the mnos′ algorithm until the next section. I now give a proof of Theorem 6.28 and Theorem 6.29, showing that the auSearch algorithm meets its specification.

Definition 6.36 (match′). The function match′ takes a term and a set of RLG-terms, and returns a set of shoots:

match′(s, T) = {⟨a, b, t⟩ / T ¦ s is an anti-instance of t • ⟨a, b′, P⟩ where ⟨b′, P⟩ is the lasso of s on ⟨b, t⟩}

Proposition 6.37. For a non-trivial term s and a set T of RLG-terms whose roots are all distinct, it holds that match′(s, T) is a set of non-empty shoots whose roots are all distinct.

Proof. First show that the elements of match′(s, T) are non-empty:

s is non-trivial
=⇒ any lasso of s on an LG-term ⟨b, t⟩ contains b   (by 6.22 and 6.23)
=⇒ each element of match′(s, T) is non-empty   (by 6.36 and 6.33)

Second, show by contradiction that no two elements of match′(s, T) share the same root:

two elements of match′(s, T) share the same root
=⇒ two elements of T share the same root   (by 6.36)
=⇒ false

Lemma 6.38. If s is a trivial term and T is a set of RLG-terms then:

match′(s, T) = {⟨a, b, t⟩ / T • ⟨a, b, {b}⟩}

Proof. The proof follows by Defn. 6.36 and Defn. 6.22.

function auSearch(σ, τ, T):
input: σ, τ, non-negative integers; T, a set of LG-terms whose locations are all distinct.
output: a set of sets of lassos.

1. Let T′, a set of RLG-terms, equal {⟨a, t⟩ / T • ⟨a, a, t⟩}.

2. Let Beam, a set of sets of shoots, equal {R / findMany(σ, τ, T′) • mnos′(R)}.

3. Return {R / Beam • {⟨a, b, P⟩ / R • ⟨a, P⟩}}.

function findMany(σ, τ, T):
input: σ, τ, non-negative integers; T, a set of RLG-terms.
output: a set of sets of shoots.

1. Let Beam be the set of all possible values of findOne(σ, τ, T).

2. Partition Beam into non-empty disjoint subsets, each of the form {R / Beam ¦ the root-set of R is equal to X} for some X. From each of these subsets extract the top σ elements by score, and place all the extracted elements into a new set Beam′.

3. Return the top τ elements by score of Beam′.

Figure 6.5: The auSearch function and its helper function findMany. The main part of the auSearch algorithm is actually contained in the findOne function of Fig. 6.6. The auSearch function itself merely translates LG-terms into RLG-terms at the beginning, and then translates shoots into lassos at the end. findMany and findOne are mutually recursive. The findMany function invokes findOne and then discards any intermediate search results that exceed the limits determined by the beam size parameters σ and τ.
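The pruning performed in steps 2 and 3 of findMany can be sketched generically. The following hypothetical Haskell fragment is only an illustration of the beam discipline, not of the full mutually recursive search: it keeps the top σ candidates within each group (here, each root-set) and the top τ candidates overall, ranked by score.

import Data.Function (on)
import Data.List (groupBy, sortBy, sortOn)
import Data.Ord (Down (..), comparing)

-- Keep at most sigma candidates per group and at most tau overall.
prune :: Ord k => Int -> Int -> (cand -> k) -> (cand -> Double) -> [cand] -> [cand]
prune sigma tau key score =
    take tau
  . sortBy (comparing (Down . score))                           -- top tau overall
  . concatMap (take sigma . sortBy (comparing (Down . score)))  -- top sigma per group
  . groupBy ((==) `on` key)                                     -- group candidates...
  . sortOn key                                                  -- ...by their key (root-set)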

nondeterministic function findOne(σ, τ, T):
input: σ, τ, non-negative integers; T, a set of RLG-terms.
output: a set of shoots.

1. Let G, a set of RLG-terms, be a non-empty subset of T of the form {⟨a, b, t⟩ / T ¦ the principal functor of t is equal to f} for some function symbol f. Let n be the arity of f.

2. Let S, a set of shoots, equal {⟨a, b, t⟩ / G • ⟨a, b, {b ++ [1], b ++ [2], ..., b ++ [n]}⟩}.

3. If |mnos′(S)| < 2 then fail.

4. For each i ∈ {1 ... n}:

5.   Let Bi, a set of RLG-terms, equal {⟨a, b, f(t1 ... tn)⟩ / G • ⟨a, b ++ [i], ti⟩}.

6.   Let Ri, a set of shoots, either equal {⟨a, b, t⟩ / Bi • ⟨a, b, {b}⟩} or be an element of findMany(σ, τ, Bi).

7. Let R, a set of shoots, equal {⟨a1, b1, P1⟩ / R1; ⟨a2, b2, P2⟩ / R2; ...; ⟨an, bn, Pn⟩ / Rn ¦ a1 = a2 = ... = an • ⟨a1, init(b1), P1 ∪ P2 ∪ ... ∪ Pn⟩}.

8. If |mnos′(R)| < 2 then fail, otherwise return R.

Figure 6.6: The helper function findOne, which forms the core of the auSearch algorithm. In line 1, the set of RLG-terms is partitioned according to the function symbol at the head of each term. In lines 2 and 3, an early check for overlap is made. Then in lines 4, 5, and 6, findMany is invoked recursively in order to search for commonality among the bodies of the RLG-terms. Line 7 is a kind of 'direct product' operation which grows the shoot by joining together the shoots that were derived from the different branches of the RLG-terms below their head function symbol. Finally at line 8, the main check for overlap is made.

Lemma 6.39. Let s1, s2 be terms and let T be a set of RLG-terms. If s1 and s2 are variants then:

match′(s1, T) = match′(s2, T)

Proof. By Defn. 6.4 and Defn. 6.3, it holds for any term t that s1 is an anti-instance of t if and only if s2 is an anti-instance of t. It follows by Defn. 6.36 and Defn. 6.22 that LHS = RHS.

Lemma 6.40. For a linear term f(s1 ... sn) and a ground term f(t1 ... tn), the following holds:

∀i ∈ {1 ... n}(si is an anti-instance of ti) ⇐⇒ f(s1 ... sn) is an anti-instance of f(t1 ... tn)

Proof. Let s = f(s1 ... sn) and let t = f(t1 ... tn). If LHS holds, then by Defn. 6.3 there exist substitutions θ1 ... θn such that ∀i(siθi = ti). Since s is linear, the θi may be chosen such that siθj = si for all i, j where j ≠ i. Hence sθ1 ... θn = t, from which RHS follows by Defn. 6.3 and the fact that substitutions can be composed [Nienhuys-Cheng and de Wolf, 1997, Sec. 4.3]. Conversely, if RHS holds then by Defn. 6.3 there exists a substitution θ such that sθ = t. Therefore ∀i(siθ = ti), from which LHS follows by Defn. 6.3.

Lemma 6.41. Let σ, τ be non-negative integers and let T be a set of RLG-terms whose roots are all distinct. Suppose that evaluation of findOne(σ, τ, T), using the findOne algorithm of Fig. 6.6, has proceeded successfully along some path as far as the end of line 7. Local variables G, f, n, B1 ... Bn, R1 ... Rn, and R are in scope at this point. Let s1 ... sn be terms such that f(s1 ... sn) is linear and Ri = match′(si, Bi) for each i ∈ {1 ... n}. It follows that:

R = match′(f(s1 ... sn), T)

Proof. By Defn. 6.36 and Defn. 6.22, we derive the following fact A which holds for each i:

Ri = {⟨a, b, t⟩ / Bi ¦ si is an anti-instance of t • ⟨a, b, {p / varPlaces(si) • b ++ p}⟩}

Hence:

LHS = {⟨a1, b1, t1⟩ / B1; ⟨a2, b2, t2⟩ / B2; ...; ⟨an, bn, tn⟩ / Bn ¦ a1 = a2 = ... = an and ∀i(si is an anti-instance of ti) • ⟨a1, init(b1), ⋃i {p / varPlaces(si) • bi ++ p}⟩}   (by fact A and the definition of R)

= {⟨a, b, f(t1 ... tn)⟩ / G ¦ ∀i(si is an anti-instance of ti) • ⟨a, b, ⋃i {p / varPlaces(si) • b ++ (i : p)}⟩}   (by the definition of Bi)

= {⟨a, b, f(t1 ... tn)⟩ / G ¦ f(s1 ... sn) is an anti-instance of f(t1 ... tn) • ⟨a, b, {p / varPlaces(f(s1 ... sn)) • b ++ p}⟩}   (by 6.40 and 6.9)

= match′(f(s1 ... sn), G)   (by 6.36 and 6.22)

= RHS   (by 6.36 and the definition of G)

Lemma 6.42 (convergence of findMany). Let T be a set of RLG-terms whose roots are all distinct. There then exist non-negative integers σ, τ such that for all integers σ′ > σ and τ′ > τ, the following holds for all sets R of shoots:

R ∈ findMany(σ′, τ′, T) ⇐⇒ R = match′(s, T) for some non-trivial linear term s, and |mnos′(R)| ≥ 2

Note that Proposition 6.37 guarantees that the above assertion is well-defined.

Proof. By induction on N(T) = Σ⟨a,b,t⟩∈T size(t), where size(t) is the number of places in t:

1. N(T) = 0, and hence T is empty. If RHS is true then, by Defn. 6.36, R is empty, which by Lemma 6.34 contradicts the fact that |mnos′(R)| ≥ 2. Therefore RHS is false. Furthermore, findOne(σ′, τ′, T) will fail at line 1, therefore LHS is also false.

2. N(T) ≥ 1, and hence T is non-empty. We first prove the forward implication, followed by its converse:

(a) Assume LHS. By lines 1, 2, and 3 of findMany, R is a possible value of findOne(σ′, τ′, T). Therefore, evaluation of findOne(σ′, τ′, T) must be able to proceed successfully along some path right up to the end of line 8 of the findOne algorithm, where local variables f, n, B1 ... Bn, and R1 ... Rn are in scope. By line 8 of findOne, |mnos′(R)| ≥ 2. It now remains to prove that R = match′(s, T) for some non-trivial linear term s. By the definition of Ri, Lemma 6.38, and the inductive hypothesis, it holds for each i ∈ {1 ... n} that Ri = match′(si, Bi) for some linear term si. By Lemma 6.39, the si may be chosen such that f(s1 ... sn) is linear. The proof then follows by Lemma 6.41.

(b) Assume RHS. Hence, R = match′(s, T) where s is some linear term of the form f(s1 ... sn). Also, by Lemma 6.34 it holds that |R| ≥ 2. Now, consider an evaluation path of findOne(σ′, τ′, T). Since T is non-empty, evaluation will proceed beyond line 1 of the findOne algorithm on at least one such path, at which point the local variable G is in scope. By Defn. 6.36 and the definition of G, we may choose this path such that the local variables f, n introduced at line 1 are the same as the f, n defined above, from which it further follows that R = match′(s, G). Evaluation then proceeds up to line 3, at which point the local variable S is in scope. The following working shows that evaluation will not fail at line 3:

|mnos′(S)| ≥ |mnos′({⟨a, b, t⟩ / G ¦ s is an anti-instance of t • ⟨a, b, {p / {[1], [2], ..., [n]} • b ++ p}⟩})|   (by the definition of S)

≥ |mnos′({⟨a, b, t⟩ / G ¦ s is an anti-instance of t • ⟨a, b, {p / varPlaces(s) • b ++ p}⟩})|   (by 6.34 and 6.33)

≥ |mnos′(R)|   (by 6.36 and 6.22)

≥ 2

Next, evaluation proceeds up to the end of line 6, where local variables B1 ... Bn and R1 ... Rn are in scope. We derive the following fact A:

|mnos′(match′(si, Bi))|

≥ |mnos′({⟨a, b, t⟩ / Bi ¦ si is an anti-instance of t • ⟨a, b, {p / varPlaces(si) • b ++ p}⟩})|   (by 6.36 and 6.22)

≥ |mnos′({⟨a, b, f(t1 ... tn)⟩ / G ¦ si is an anti-instance of ti • ⟨a, b ++ [i], {p / varPlaces(si) • b ++ (i : p)}⟩})|   (by the definition of Bi)

≥ |mnos′({⟨a, b, f(t1 ... tn)⟩ / G ¦ f(s1 ... sn) is an anti-instance of f(t1 ... tn) • ⟨a, b, {p / varPlaces(f(s1 ... sn)) • b ++ p}⟩})|   (by 6.34 and 6.33)

≥ |mnos′(R)|   (by 6.36 and 6.22)

≥ 2

By Lemma 6.38, fact A, and the inductive hypothesis, the evaluation path up to the end of line 6 may be chosen such that ∀i(Ri = match′(si, Bi)). On this path, when evaluation reaches the end of line 7, it follows by Lemma 6.41 that the local variable R has the same value as the R already defined above. Failure will not occur at line 8, therefore R is a possible value of findOne(σ′, τ′, T). The proof then follows by lines 1, 2, and 3 of findMany.

Proof of Theorem 6.28. Let T′ be the value of the local variable T′ at line 1 of the auSearch algorithm during the evaluation of auSearch(σ′, τ′, T). We derive the following fact A which holds for all terms s:

match′(s, T′) = {⟨a, t⟩ / T ¦ s is an anti-instance of t • ⟨a, a′, P⟩ where ⟨a′, P⟩ is the lasso of s on ⟨a, t⟩}   (by 6.36 and the definition of T′)

= {⟨a, P⟩ / match(s, T) • ⟨a, a, P⟩}

Using fact A we derive the following fact B which holds for all terms s:

{⟨a, b, P⟩ / mnos′(match′(s, T′)) • ⟨a, P⟩} = mnos(match(s, T))

Then using fact B we derive the following fact C which holds for all terms s:

|mnos′(match′(s, T′))| = |mnos(match(s, T))|

The proof then follows:

LHS ⇐⇒ R = {⟨a, b, P⟩ / mnos′(R′) • ⟨a, P⟩} for some R′ ∈ findMany(σ′, τ′, T′)

⇐⇒ R = {⟨a, b, P⟩ / mnos′(match′(s, T′)) • ⟨a, P⟩} for some non-trivial linear term s, and |mnos′(match′(s, T′))| ≥ 2

⇐⇒ RHS

Proof of Theorem 6.29. By induction on N(T) as in the proof of Lemma 6.42 it holds that:

findMany(σ, τ, T) ⊆ findMany(∞, ∞, T)

The proof then follows straightforwardly by the definition of auSearch (Fig. 6.5).

6.4 The mnos Algorithm

In this section I describe the maximum non-overlapping subset algorithms mnos and mnos′, which work for lassos and shoots respectively. I prove that both algorithms meet their specifications (Theorem 6.25 and Lemma 6.34).

6.4.1 Gavril’s maximum independent set algorithm

The maximum non-overlapping subset algorithms are based on an algorithm by Gavril [1972] for finding a maximum independent set of a chordal graph. I shall state Gavril's algorithm and its properties first, and then explain how it can be used to implement mnos and mnos′.

It is assumed that the reader is familiar with basic graph theory terminology. Graphs are taken to be undirected, finite, and simple (no self-loops or parallel edges) unless otherwise specified. Between two vertices a, b we denote the existence of an undirected edge by a—b, and a directed edge by a → b.

Gavril's algorithm makes use of a particular kind of ordering imposed on the vertices of a chordal graph. In modern graph theory literature, this is usually called a perfect elimination ordering [Rose et al., 1976], however to avoid any confusion I shall stick to the terminology used by Gavril in his 1972 paper. Gavril states the following definition and two results:

Definition 6.43 (R-oriented, R-orientation). A directed graph is R-oriented if the following two conditions hold:

• it has no directed cycles;

• for all vertices a, b, c such that a ≠ b: if a → c and b → c, then either a → b or b → a.

An R-orientation of a graph is an assignment of direction to each of its edges such that the resulting directed graph is R-oriented.

Proposition 6.44. An undirected graph is chordal if and only if it has an R-orientation.

Proposition 6.45. Given an R-oriented directed graph, it is always possible to number the vertices 1, 2 ... n, where n is the total number of vertices, such that all edges are directed from lower to higher numbers.

For our purposes, Proposition 6.44 may suffice as a definition of what it means for a graph to be ‘chordal’. Proposition 6.45 guarantees the existence of the ordering on vertices that will be crucial to Gavril’s algorithm. The following is Gavril’s maximum independent set algorithm:

Definition 6.46 (Gavril's algorithm). Let G be an R-oriented directed graph whose vertices are numbered 1, 2 ... n such that all edges are directed from low to high. The function gavril takes and returns a list of vertices of G:

gavril([1, 2 ... n]) = [nt, nt−1 ... n1]

where:

• n1 = n;   (6.47)

• for 2 ≤ k ≤ t: nk is the highest vertex smaller than nk−1 that is not a direct predecessor of any of n1, n2 ... nk−1;   (6.48)

• all vertices smaller than nt are direct predecessors of at least one of n1, n2 ... nt.   (6.49)

Finally, the following result, which Gavril proves in his paper, confirms that this algorithm behaves as desired:

Proposition 6.50. Let G be a chordal graph of which G0 is an R-orientation, and let 1, 2 . . . n be the vertices of G0 arranged such that all edges are directed from low to high numbers. It follows that the elements of gavril([1, 2 . . . n]) constitute a maximum independent set of G.
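For concreteness, here is a minimal Haskell sketch of Gavril's greedy procedure. It assumes, purely for illustration, that the vertices are already numbered 1 ... n with all edges directed from low to high, and that the edge relation is supplied as a predicate:

-- edge u v is True iff there is a directed edge u -> v.
gavril :: (Int -> Int -> Bool) -> Int -> [Int]
gavril edge n = go [n, n - 1 .. 1] []
  where
    go []       chosen = chosen
    go (v : vs) chosen
      | any (edge v) chosen = go vs chosen        -- v is a direct predecessor of a chosen vertex: skip it
      | otherwise           = go vs (v : chosen)  -- otherwise v joins the independent set

Scanning downwards from n implements condition (6.48) directly: each newly chosen vertex is the highest one below its predecessor that points at none of the vertices chosen so far.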

6.4.2 Using Gavril’s algorithm to find maximum non-overlapping subsets

I now show how to implement mnos′ in terms of gavril. Recall that mnos′ takes, as its input, a set of non-empty shoots whose roots are all distinct (Fig. 6.4 on

function mnos′(S):
input: S, a set of non-empty shoots whose roots are all distinct.
output: a set of shoots.

1. Let G be the directed overlap graph of S.

2. Let Σ be the list containing the elements of S ordered lexicographically by root.

3. Return the set of elements of gavrilG(Σ).

function mnos(X):
input: X, a set of non-empty lassos whose locations are all distinct.
output: a set of lassos.

1. Let S, a set of shoots, equal {⟨a, P⟩ / X • ⟨a, a, P⟩}.

2. Return {⟨a, b, P⟩ / mnos′(S) • ⟨a, P⟩}.

Figure 6.7: The mnos and mnos′ algorithms. Note that the subscript G in gavrilG(Σ) (line 3 of mnos′) clarifies that the elements of Σ are to be interpreted here as vertices of G.

Page 114); we can use the 'overlaps' relation to convert any set of shoots to a graph as follows:

Definition 6.51 (overlap graph). The overlap graph of a set S of shoots is the graph G such that:

• S is the set of vertices of G;

• for a, b ∈ S: a—b iff a overlaps b.

We can then construct an R-orientation of such a graph like so:

Definition 6.52 (directed overlap graph). For a set S of non-empty shoots whose roots are all distinct, the directed overlap graph of S is obtained from its overlap graph by assigning a direction to each edge a—b as follows: if the root of a comes before the root of b lexicographically then a → b, else b → a.

Lemma 6.53. For a set S of non-empty shoots whose roots are all distinct, the directed overlap graph of S is R-oriented.

Proof. Given in Appendix B.

This gives us everything we need. It is clear by Defns. 6.24 and 6.51 that a maximum non-overlapping subset of a set of shoots corresponds to a maximum independent set of its overlap graph. Putting everything together, we obtain the two maximum non-overlapping subset algorithms mnos′ and mnos (Fig. 6.7).
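Under the same illustrative representation as before, mnos′ itself can be sketched on top of the gavril function given after Proposition 6.50; the shoot type, its overlap test, and the root projection are left abstract here:

import Data.List (sortOn)

mnos' :: Ord r => (shoot -> shoot -> Bool) -> (shoot -> r) -> [shoot] -> [shoot]
mnos' overlaps root shoots = [sigma !! (v - 1) | v <- gavril edge n]
  where
    sigma    = sortOn root shoots  -- Sigma: the shoots ordered lexicographically by root
    n        = length sigma
    -- edge i -> j iff the two shoots overlap and i's root precedes j's,
    -- i.e. the directed overlap graph of Defn. 6.52
    edge i j = i < j && overlaps (sigma !! (i - 1)) (sigma !! (j - 1))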

Note that mnos′ is implemented directly in terms of gavril, while mnos is in turn implemented in terms of mnos′. Finally, I now prove that both algorithms meet their earlier specifications:

Proof of Lemma 6.34. Define G and Σ as in the mnos′ algorithm of Fig. 6.7. By Lemma 6.53, G is an R-orientation of the overlap graph of S. If we number the elements of Σ from left to right as 1, 2 ... n, then by Defn. 6.52 and the definition of Σ, all edges of G between these elements are directed from low to high numbers. Hence by Proposition 6.50, mnos′(S) is a maximum independent set of the overlap graph of S, from which the proof follows by Defn. 6.51 and Defn. 6.24.

Proof of Theorem 6.25. Define S as in the mnos algorithm of Fig. 6.7. Observe that the elements of X are related to the elements of S by the following one-one mapping:

f(⟨a, P⟩) = ⟨a, a, P⟩

By Defn. 6.33, the ‘overlaps’ relationship is preserved under this mapping, i.e. for all x1, x2 ∈ X:

x1 overlaps x2 ⇐⇒ f(x1) overlaps f(x2)

By Defn. 6.31 and Defn. 6.33, S is a set of non-empty shoots whose roots are all distinct. The proof then follows:

mnos′(S) is a maximum non-overlapping subset of S   (by 6.34)

=⇒ {x / mnos(X) • f(x)} is a maximum non-overlapping subset of {x / X • f(x)}   (by lines 1 and 2 in the mnos algorithm)

=⇒ mnos(X) is a maximum non-overlapping subset of X   (by the preservation of the 'overlaps' relation under f)

Chapter 7

Abstraction Invention and KANDINSKY

In this chapter I describe the KANDINSKY abstraction invention system. It works within the setting for learning of Sect. 3.4 and uses the toplevel environment and cumulative learning prior of Sect. 4.3. KANDINSKY discovers syntactic commonality within a lambda calculus term by invoking the anti-unification search algorithm of the previous chapter. Once commonality has been found, it transforms the term so as to create an abstraction from that commonality.

When KANDINSKY transforms a term, it does so in such a way that the value of the term does not change with respect to the toplevel environment, and hence the likelihood of the term does not change. However, the transformation does change the term's prior probability. The aim is to find a transformation that increases the prior probability as much as possible. Since the likelihood stays the same, this also increases the posterior probability as much as possible.

The transformation process that KANDINSKY uses to construct an abstraction is a form of inverse deduction. It involves running the deduction rule of lambda calculus known as beta reduction [Stoy, 1981] in reverse, and is very much a lambda calculus analogue of the inter-construction operator of inverse resolution (Sect. 2.2.2). Both KANDINSKY's transformation operator and the inter-construction operator work by deriving a novel concept from commonality in existing knowledge.

7.1 The absInv Algorithm

The input to KANDINSKY is a master term, which is a hypothesis for a monolithic cumulative learning problem as in Sect. 4.3:

Definition 7.1 (master term, body place). A master term is a long normal term of the following form:

T = master (\ a1 ... am -> Tuple A1 ... Am B1 ... Bn)

whose lambda parameters are all distinctly named. A body place in T is a place that points inside one of the subterms Ai or Bj. In other words, a body place in T is a place in T of the form:

2 : replicate(m, 1) : replicate(i, 1) : 2 : p

where 0 ≤ i < m + n and p is a place.

In this chapter, unlike in Chap. 4 and Chap. 5, I do not assume Barendregt’s convention. Hence, the following standard notion of alpha-equivalence [Stoy, 1981] is distinct from syntactic equality:

Definition 7.2 (alpha-equivalent). Two terms are alpha-equivalent if they are equal up to renaming of lambda parameters and bound variables.

The de-Bruijn index of a bound variable occurrence [de Bruijn, 1972] is equal to the 'distance' between that variable occurrence and the lambda abstraction that binds it. This 'distance' is measured by counting the number of lambda abstractions that occur between the two:

Definition 7.3 (de-Bruijn index). The de-Bruijn index d of a bound variable occurrence within a term T is defined as follows. Let p be the place in T of the bound variable occurrence, and let q be the place in T of the lambda abstraction at which it is bound. The de-Bruijn index of the variable occurrence is equal to the number of places r in T that satisfy the following conditions:

• r is both an ancestor of p and an improper descendant of q;

• the subterm at place r in T is a lambda abstraction.

For example, in the term (\ x -> \ y -> x), the occurrence of x has de-Bruijn index 2, whereas in (\ x -> \ y -> y), the occurrence of y has de-Bruijn index 1.

In order to invoke anti-unification search on the master term, KANDINSKY must first encode it in the first-order term representation accepted by the auSearch algorithm. This is done as follows. Note that de-Bruijn indices are used to enable syntactic matching of alpha-equivalent subterms:

Definition 7.4 (encodeTerm). The function encodeTerm takes a (lambda) term T and returns a first-order term T′ such that the following conditions hold:

• The set of places in T′ is equal to the set of places in T (see Defn. 3.8 and Defn. 6.5).

• At each place p in T, let S, S′ be the subterms at p in T, T′ respectively:

  – If S is a variable whose occurrence at p in T is free, then S′ is equal to varFree_S.

function findAbstractions(T):
input: T, a master term.
output: a set of sets of lassos over T.

1. Let {p1 ... pN} be the set of body places in T.

2. Let {X1 ... XN} be disjoint, countably infinite sets of symbol names.

3. For each i ∈ {1 ... N}:

4.   Let Si be the subterm at pi in T.

5.   Let Qi be the set of places q such that a free variable occurs at q in Si and a bound variable occurs at (pi ++ q) in T.

6.   Let S′i be the term obtained by replacing the variable at each place q ∈ Qi in Si with a symbol of appropriate type and a distinct name chosen from Xi.

7.   Let Ai = ⟨pi, encodeTerm(S′i)⟩.

8. Return auSearch(sigmaParam, tauParam, {A1 ... AN}).

Figure 7.1: The findAbstractions algorithm.

  – If S is a variable whose occurrence at p in T is bound, then S′ is equal to varBound_{α,d} where α is the type of S and d is the de-Bruijn index of the variable occurrence at p in T.

  – If S is an application of the form (M N) then the principal functor of S′ is equal to app_{α,β} where α -> β is the type of M.

  – If S is a lambda abstraction then the principal functor of S′ is equal to abs_{α,β} where α -> β is the type of S.
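A minimal Haskell sketch of this encoding may help to clarify the role of the de-Bruijn indices. It is an untyped simplification of my own devising: the functors carry no type indices, and Lam and FO are illustrative representations rather than those of the actual implementation.

data Lam = Var String | App Lam Lam | Abs String Lam

data FO = Sym String [FO]  -- a first-order term: a functor applied to arguments

encodeTerm :: Lam -> FO
encodeTerm = go []
  where
    -- env lists the enclosing lambda parameters, innermost first
    go env (Var x) =
      case lookup x (zip env [1 :: Int ..]) of
        Just d  -> Sym ("varBound" ++ show d) []  -- bound: encode only the de-Bruijn index d
        Nothing -> Sym ("varFree_" ++ x) []       -- free: encode the variable by name
    go env (App m n) = Sym "app" [go env m, go env n]
    go env (Abs x b) = Sym "abs" [go (x : env) b]

Because bound variables are encoded by index alone, two alpha-equivalent subterms are mapped to syntactically equal first-order terms, which is exactly what allows them to match during anti-unification.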

KANDINSKY has two non-negative integers sigmaParam and tauParam as global parameters, which correspond to the beam size parameters σ and τ of auSearch (Fig. 6.2):

Definition 7.5 (sigmaParam, tauParam). sigmaParam and tauParam are non- negative integers.

The full search algorithm used to find syntactic commonality within a master term is findAbstractions (Fig. 7.1). The action of this algorithm is to generate all of the subterms of the master term at body places (line 4), then to encode all of these subterms as LG-terms (line 7). Finally, the LG-terms are passed to auSearch (line 8). Note that the master first-order term of Defn. 6.18 is equal to encodeTerm(T ) where T is the master lambda term. Lines 5 and 6 of findAbstractions deal with something called ‘outside-bound variables’. An outside-bound variable is a variable in a subterm that is bound

stateful function genSymbols(tys):
input: tys, a list of types.
output: a list of symbols.
state: G, a symbol generator.

1. Let [α1 ... αn] = tys and let [s1, s2, ...] = G.

2. G ← [sn+1, sn+2, ...]

3. Return [s1:α1 ... sn:αn].

Figure 7.2: The genSymbols procedure.

within the master term as a whole, but free within that subterm. It is important that outside-bound variables do not match with one another during the search for commonality, because an abstraction of commonality between such variables would have no meaning at global scope. Therefore, all outside-bound variables are replaced with distinctly named dummy variables that cannot match one another.

A result returned by auSearch consists of a set of lassos (Defn. 6.21). Since the encoding process of the master term does not change its tree-structure (Defn. 7.4), these lassos are perfectly meaningful over the master lambda term. For clarity, I give here a definition of a lasso over a lambda term:

Definition 7.6 (lasso, location, contains, overlap, non-overlapping). A lasso over a term T is a pair ⟨a, P⟩ where a is a place in T and P is a set of places in T, obeying the following conditions:

• Every element of P is an improper descendant of a.

• No element of P is a descendant of any other.

We call a the location of the lasso. ⟨a, P⟩ contains a place x if x is in T, x is an improper descendant of a, and x is not an improper descendant of any element of P. Two lassos overlap if there exists a place that is contained in both of them. A non-overlapping set of lassos is one such that no two elements overlap.

Once auSearch has been invoked on the master term and returned some results, the next step is to transform the master term so as to construct an abstraction from the commonality specified by one of these results. This transformation process will involve generating fresh lambda parameters, for which we need a procedure genSymbols (Fig. 7.2) and the following definition:

Definition 7.7 (symbol generator, fresh). A symbol generator is an infinite sequence of alpha-numeric strings. The predicate fresh takes a symbol generator G and a set of terms X. fresh(G, X) is satisfied if no element of G is equal to the name of a symbol occurring within a term in X.
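A symbol generator is naturally modelled as an infinite lazy list of names. The following hypothetical Haskell sketch renders genSymbols (Fig. 7.2) as a split of that list, omitting the pairing of names with types:

symbolGen :: [String]
symbolGen = ["x" ++ show i | i <- [1 :: Integer ..]]

-- Return n fresh names together with the updated generator state.
genSymbols :: Int -> [String] -> ([String], [String])
genSymbols n g = splitAt n g

For example, fst (genSymbols 3 symbolGen) yields ["x1", "x2", "x3"], and the second component of the pair is the generator to be threaded through subsequent calls.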

function reexpressMaster(T, R):
input: T, a master term; R, a set of lassos over T.
output: a master term.

1. Let g be a symbol that does not occur as a lambda parameter in T, and let G be a symbol generator satisfying fresh(G, {T, g}).

2. Execute makeAbstraction(T, R) with initial state G and let X be the result.

3. Execute reexpress(T, R, g) with initial state G and define a1 ... am, A1 ... Am, and B1 ... Bn by writing the result in the following form:

   master (\ a1 ... am -> Tuple A1 ... Am B1 ... Bn)

4. Return a term that is alpha-equivalent to the following, with all lambda parameters distinctly named:

   master (\ a1 ... am g -> Tuple A1 ... Am X B1 ... Bn)

Figure 7.3: The reexpressMaster algorithm.

When the master term is transformed, it is important to preserve long normal form (Defn. 4.1). During transformation we shall manipulate intermediate terms that are subterms of some long normal term, but are not necessarily themselves long normal. Such terms are in quasi long normal form:

Definition 7.8 (quasi long normal form). A term, of type α1 -> ... -> αm -> R where R is a data type, is in quasi long normal form if it is in long normal form, or if it is of the form:

f T1 ... Tn

where:

• f is a variable whose type has arity n + m;

• each Tj is a term in long normal form;

• if f is a hole then n = m = 0.

The transformation algorithm itself is called reexpressMaster (Fig. 7.3). It takes a master term and an auSearch result, and returns a new, transformed master term. reexpressMaster invokes two sub-algorithms, makeAbstraction (Fig. 7.4) and reexpress (Fig. 7.5). makeAbstraction constructs a term that represents the abstraction. On the other hand, reexpress refactors the existing body of the master term so as to invoke the new abstraction by name. The symbol g generated at line 1 of reexpressMaster provides this name.

stateful function makeAbstraction(T, R):
input: T, a term in long normal form; R, a set of lassos over T.
output: a term in long normal form.
state: a symbol generator.

1. Let ⟨a, P⟩ be the element of R whose location occurs last lexicographically.

2. Let p1 ... pm be the elements of P arranged in forward lexicographical order.

3. For each i ∈ {1 ... m}, let αi be the type of the subterm at pi in T.

4. Call genSymbols([α1 ... αm]) and let [x1 ... xm] be the result.

5. Let S0 = T.

6. For each i ∈ {1 ... m}:

7.   Let [yi1 ... yiri] = lostVars(T, R, 1, i).

8.   For each j ∈ {1 ... ri}, call etaExpand(yij) and let Yij be the result.

9.   Call etaExpandReplace(pi, (xi Yi1 ... Yiri), Si−1) and let Si be the result.

10. Let A be the subterm at a in Sm.

11. Call etaExpand(A) and let A′ be the result.

12. Return (\ x1 ... xm -> A′).

Figure 7.4: The makeAbstraction algorithm.

stateful function reexpress(T, R, g):
input: T, a term in long normal form; R, a set of lassos over T; g, a symbol.
output: a term in long normal form.
state: a symbol generator.

1. Let ⟨a1, P1⟩ ... ⟨an, Pn⟩ be the elements of R arranged in reverse lexicographical order of location.

2. Let T0 = T.

3. For each i ∈ {1 ... n}:

4.   Let pi1 ... pim be the elements of Pi arranged in forward lexicographical order.

5.   For each j ∈ {1 ... m}:

6.     Let Mij be the subterm at pij in Ti−1.

7.     Call etaExpand(Mij) and let M′ij be the result.

8.     Let [yij1 ... yijrj] = lostVars(T, R, i, j).

9.     Let M″ij = (\ yij1 ... yijrj -> M′ij).

10.  Call etaExpandReplace(ai, (g M″i1 ... M″im), Ti−1) and let Ti be the result.

11. Return Tn.

Figure 7.5: The reexpress algorithm.

function lostVars(T, R, i0, j0):
input: T, a term in long normal form; R, a set of lassos over T; i0, an integer; j0, an integer.
output: a list of symbols.

1. Let ⟨a1, P1⟩ ... ⟨an, Pn⟩ be the elements of R arranged in reverse lexicographical order of location.

2. For each i ∈ {1 ... n}:

3.   Let pi1 ... pim be the elements of Pi in forward lexicographical order.

4.   For each j ∈ {1 ... m}:

5.     Let Lij be the set of all places q satisfying the following conditions:

       • (ai ++ q) is a place in T and an ancestor of pij;

       • the subterm at (ai ++ q) in T is a lambda abstraction whose parameter binds a variable at some place r in T where r is an improper descendant of pij.

6. For each j ∈ {1 ... m}, let qj1 ... qjrj be the elements of ⋃i∈{1...n} Lij in forward lexicographical order.

7. For all i ∈ {1 ... n}, j ∈ {1 ... m}, k ∈ {1 ... rj}, let yijk equal the parameter of the lambda abstraction at ai ++ qjk in T.

8. Return [yi0j01 ... yi0j0rj0].

Figure 7.6: The lostVars algorithm.

stateful function etaExpand(T):
input: T, a term in quasi long normal form.
output: a term in long normal form.
state: a symbol generator.

1. If T is a lambda abstraction then return T.

2. Else if T is a variable or an application then:

3.   Let α1 -> ... -> αm -> R be the type of T where R is a data type.

4.   Call genSymbols([α1 ... αm]) and let [x1 ... xm] be the result.

5.   For each i ∈ {1 ... m}, call etaExpand(xi) and let Xi be the result.

6.   Return (\ x1 ... xm -> T X1 ... Xm).

stateful function etaExpandReplace(p, X, T):
input: p, a place in T; X, a term in quasi long normal form that is a variable or an application, of the same type as the subterm at p in T; T, a term in long normal form.
output: a term in long normal form.
state: a symbol generator.

1. If the subterm at p in T is a variable or an application then return T{p/X}.

2. Else if the subterm at p in T is a lambda abstraction then:

3.   Call etaExpand(X) and let X′ be the result.

4.   Return T{p/X′}.

Figure 7.7: The etaExpand and etaExpandReplace procedures.

Both makeAbstraction and reexpress invoke further sub-algorithms called lostVars (Fig. 7.6), etaExpand, and etaExpandReplace (Fig. 7.7). The role of lostVars is to detect something called 'lost variables'. These are variables that occur in the master term outside the part that is to be abstracted out, but that are bound by lambda parameters within the part that is to be abstracted out. Performing an abstraction transformation naively would leave these variables stranded without lambda parameters to bind them (hence 'lost') and would therefore invalidate the term as a whole. To solve this problem, extra lambda parameters have to be added into the re-expressed master term at line 9 of reexpress, and corresponding extra variables have to be added into the abstraction at line 9 of makeAbstraction. This connects the lost variables to the lambda parameters within the body of the abstraction that supply them with their meanings.

The role of the procedures etaExpand and etaExpandReplace is to perform operations known as eta expansions, which are necessary in order to ensure that the master term is still in long normal form by the end of the reexpressMaster transformation. Eta expansion is the reverse of the standard deduction rule of lambda calculus called eta-reduction [Stoy, 1981].

Finally, connecting together abstraction discovery and abstraction construction, we obtain the following full specification for KANDINSKY's abstraction invention process, absInv. absInv returns a set of candidate transformations of the input master term, each of which corresponds to a different result of auSearch:

Definition 7.9 (absInv). The function absInv takes a master term and returns a set of master terms:

absInv(T) = {R / findAbstractions(T) • reexpressMaster(T, R)}

We must supply a score function for auSearch (Defn. 6.35) in order to allow heuristic guidance of the search for candidate abstractions. The following score function is suitable. It calculates the syntactic compression that would be produced if an abstraction were to be made from the given commonality, ignoring eta-expansion and lost variables. This is a reasonable first approximation to the change in log probability under the cumulative learning prior. Hence, candidate abstraction transformations that will yield larger increases in the prior probability of the master term will tend to be favoured during the guided search:

Definition 7.10 (score). Let S be a non-empty set of shoots. Let ⟨a, b, P⟩ be an element of S. Let c be the number of places contained in the lasso ⟨b, P⟩. Let m = |P|. Let n = |S|. Then:

score(S) = (n − 1)c − (n + 2)m − n
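The score is a direct transcription of the formula above; in the following Haskell rendering, c, m, and n are as in Defn. 7.10. Intuitively, each of the n matches saves roughly the c symbols of the pattern, while each must instead pay for an invocation of the abstraction with m arguments:

-- c: places contained in the lasso <b, P>, i.e. the pattern size;
-- m: the number of pattern holes |P|; n: the number of matches |S|.
score :: Int -> Int -> Int -> Int
score c m n = (n - 1) * c - (n + 2) * m - n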

Finally, the following algorithm selects the best abstraction out of the generated candidates according to the true change in prior probability:

Definition 7.11 (absInvBest). The function absInvBest takes a master term and returns a master term:

absInvBest(T) = argmax_{T′ ∈ S} clPrior(T′)   where S = {T} ∪ absInv(T)
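Since absInvBest simply maximises the prior over {T} ∪ absInv(T), it admits a one-line rendering; in this sketch clPrior and absInv are assumed as black boxes:

import Data.List (maximumBy)
import Data.Ord (comparing)

absInvBest :: (term -> Double) -> (term -> [term]) -> term -> term
absInvBest clPrior absInv t = maximumBy (comparing clPrior) (t : absInv t)

Note that T itself is always among the candidates, so absInvBest never decreases the prior probability of the master term.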

7.1.1 Worked example

Let us look at a concrete example of KANDINSKY's abstraction invention algorithm in use. Consider the following master term T:

master (Tuple
  (\ a b c -> plus (pow a two) (plus (pow b two) (pow c two)))
  (\ a b c -> pow (plus a (plus b c)) two))

If we take plus and pow to be numerical addition and ‘to the power of’ operators respectively, we see that T contains implementations of two functions, one which calculates the sum of the squares of three numbers, and another which calculates the square of their sum. In order to perform abstraction invention on T , we must first specify an appropriate prior probability distribution. Let this be the cumulative learning prior defined by clPriorProc(tys, Γ) as follows, where the data type Int may be taken to represent the integers:

tys = [Int -> Int -> Int -> Int,
       Int -> Int -> Int -> Int]

Γ = {plus:(Int -> Int -> Int),
     times:(Int -> Int -> Int),
     pow:(Int -> Int -> Int),
     zero:Int,
     one:Int,
     two:Int}

Furthermore:

npPriorHyperParam1 = 8

npPriorHyperParam2 = 16

Also, as we did in the example of Sect. 4.3.1, let us assume for simplicity that numAbstractionsDist and abstractionTypeDist have been chosen such that the following holds for any relevant type τ:

P(m = 0) = P(m = 1 and A1 has type τ) = k

where m is the number of abstractions in a master term and A1 is its first abstraction; k is an arbitrary constant. Under these conditions, the prior probability of T is equal to k × 1.11 × 10⁻¹⁸. We must specify some reasonable values for the auSearch beam size parameters:

sigmaParam = 1
tauParam = 50

absInvBest(T) is then equal to the following:

master (\ f -> Tuple
  (\ a -> pow a two)
  (\ a b c -> plus (f a) (plus (f b) (f c)))
  (\ a b c -> f (plus a (plus b c))))

Observe that T and absInvBest(T) are semantically equivalent. The prior probability of absInvBest(T) is equal to k × 2.77 × 10⁻¹⁷, about 25 times higher than that of T. This increase in probability has been brought about by the introduction of an abstraction bound to the symbol f, identifiable as the 'square' function. This result was tested using a prototype implementation of KANDINSKY.

7.2 Proof of Correctness of absInv

I now prove that KANDINSKY's abstraction invention transformation operator preserves the semantic denotation of the master term with respect to any toplevel environment that includes the declarations of master and Tuple as given in Sect. 4.3. For the main result see Theorem 7.31. This proof takes the form of a deduction process in which one starts with the output of absInv and transforms it using deduction rules including beta- and eta-reduction so as to obtain the input. Hence, absInv is shown to be an inverse deduction process.

7.2.1 Preliminaries

The propositions in this subsection are stated without proof. They are straightforward consequences of the definition of the lambda calculus language.

Definition 7.12 (params, vars, freeVars). For a term T the following functions are defined:

• params(T) is the set of lambda parameters that occur in T;

• vars(T) is the set of variables that occur in T;

• freeVars(T) is the set of variables that occur free in T.

Definition 7.13 (subtermAt). For a term T and a place p in T , we define subtermAt(p, T ) to be the subterm at place p in T .

Proposition 7.14. For terms A, B1 ... Bn and places p, q1 ... qn it holds that:

subtermAt(p, A){q1/B1 ... qn/Bn} = subtermAt(p, A{(p ++ q1)/B1 ... (p ++ qn)/Bn})

The following ‘simple’ substitution operator is different from the substitution operator that is conventionally defined for lambda calculus [Stoy, 1981]. The simple substitution is a straight syntactic substitution, allowing variable capture to occur:

Definition 7.15 (simple substitution). A simple substitution on terms, written [x1/T1, ..., xn/Tn], represents a mapping from distinct symbols xi to terms Ti. If A is a term and σ is a simple substitution, then Aσ is the term obtained from A by simultaneously replacing each occurrence of the variable xi with Ti, for all i ∈ {1 ... n}.

Proposition 7.16. Let T be a term, let p1 ... pn be places in T, let x1 ... xn be symbols, and let M1 ... Mn be terms, such that for each i the types of xi, Mi, and subtermAt(pi, T) are equal. Given the precondition that xi ∉ vars(T) for all i, it holds that:

T{p1/x1 ... pn/xn}[x1/M1 ... xn/Mn] = T{p1/M1 ... pn/Mn}

The following two definitions are respectively lambda calculus analogues of logical equivalence relative to background knowledge, and logical equivalence:

Definition 7.17 (relative λ-equivalence). For terms A, B and an environment

E, we write A ∼E B if:

eval(A, E) = eval(B,E)

Definition 7.18 (λ-equivalence). For terms A, B, we write A ∼ B if for all environments E it holds that A ∼E B.

Proposition 7.19. Let T be a term, let p be a place in T, and let X be a term of the same type as subtermAt(p, T). It holds that:

X ∼ subtermAt(p, T) =⇒ T{p/X} ∼ T

Definition 7.20 (binding). The predicate binding takes two places and a term. binding(p, q, T) is satisfied if all of the following conditions hold:

• p and q are both places in T;

• p is an ancestor of q;

• subtermAt(p, T) is a lambda abstraction with some parameter x;

• subtermAt(q, T) is a variable equal to x;

• there does not exist any place p′ that is both a descendant of p and an ancestor of q such that subtermAt(p′, T) is a lambda abstraction with parameter x.

Proposition 7.21. Let T be a term, let p be a place in T , let X be a term of the same type as subtermAt(p, T ), and let E be an environment. Assert the following precondition: there do not exist places a, b where a is an ancestor of p and b is an improper descendant of p such that either of the following hold:

binding(a, b, T )

binding(a, b, T {p/X})

It then holds that:

X ∼E subtermAt(p, T ) =⇒ T {p/X} ∼E T

Proposition 7.22. If terms A, B are alpha-equivalent then A ∼ B.

The following two propositions respectively describe the beta- and eta-reduction rules of lambda calculus:

Proposition 7.23 (beta-reduction). Let x1 ... xn be distinct symbols and let M and N1 ... Nn be terms such that each xi has the same type as Ni. Given the precondition that xi ∉ params(M) and freeVars(Ni) is disjoint from params(M), for all i, it holds that:

(\ x1 ... xn -> M) N1 ... Nn ∼ M[x1/N1 ... xn/Nn]

Proposition 7.24 (eta-reduction). Let T be a term of type α1 -> ... -> αm -> R where R is a data type. Let x1 ... xm be distinct symbols where the type of each xi is equal to αi. Given the precondition that xi ∉ vars(T) for all i, it holds that:

(\ x1 ... xm -> T x1 ... xm) ∼ T

7.2.2 Eta-expansion

Lemma 7.25. Let T be a term in quasi long normal form, let G be a symbol generator satisfying fresh(G, {T}), and let T′ be the result of executing etaExpand(T) with initial state G. It follows that T′ ∼ T.

Proof. We proceed by structural induction on the type of T. If T is a lambda abstraction then the proof is trivial. Otherwise, define x1 ... xm and X1 ... Xm as in the etaExpand algorithm (Fig. 7.7) and proceed as follows:

T′ ∼ (\ x1 ... xm -> T X1 ... Xm)

∼ (\ x1 ... xm -> T x1 ... xm)   (by the inductive hypothesis and 7.19)

∼ T   (by 7.24)

Lemma 7.26. Let p, X, T be valid input to etaExpandReplace as in Fig. 7.7, let G be a symbol generator satisfying fresh(G, {X}), and let T′ be the result of executing etaExpandReplace(p, X, T) with initial state G. It follows that T′ ∼ T{p/X}.

Proof. If the subterm at p in T is a variable or an application then the proof is trivial. Otherwise, define X′ as in the etaExpandReplace algorithm (Fig. 7.7) and proceed as follows:

T′ ∼ T{p/X′}

∼ T{p/X}   (by 7.25 and 7.19)

7.2.3 Refactoring the master term

Definition 7.27 (abstractionCondition). Let T be a term and let R be a set of lassos over T. The predicate abstractionCondition(T, R) is satisfied when the following three conditions hold:

1. Let {L1 ... Ln} = R, and for each i ∈ {1 ... n} let:

⟨ai, {ai ++ qi1, ..., ai ++ qimi}⟩ = Li

It holds that all terms as follows are alpha-equivalent for i ∈ {1 ... n}:

subtermAt(ai, T){qi1/# ... qimi/#}

2. R is a non-overlapping set of lassos.

3. For all places p, q satisfying binding(p, q, T) and for all L ∈ R, it holds that: L contains q =⇒ L contains p

Lemma 7.28. Let T be a master term. Let R, a set of lassos, be an element of findAbstractions(T ). It follows that abstractionCondition(T,R) is satisfied.

Proof. Let {L1 ... Ln} = R. Define p1 ... pN, S1 ... SN, S′1 ... S′N and A1 ... AN as in the findAbstractions algorithm (Fig. 7.1). By Theorem 6.28 and Theorem 6.29 it holds that:

R = mnos(match(s, {A1 ... AN}))

for some first-order term s. The second condition in Defn. 7.27 then follows by Theorem 6.25. By Theorem 6.25 and Defn. 6.26 it holds for each i ∈ {1 ... n} that Li is the lasso of s on ⟨pki, encodeTerm(S′ki)⟩ for some ki ∈ {1 ... N} where s is an anti-instance of encodeTerm(S′ki). By Defn. 6.22 it also holds that each Li is equal to:

⟨pki, {pki ++ q1, ..., pki ++ qm}⟩

where {q1 ... qm} = varPlaces(s). Hence, if X1 ... Xm are arbitrary first-order variables then all first-order terms as follows are equal for i ∈ {1 ... n}:

encodeTerm(S′ki){q1/X1 ... qm/Xm}

By Defn. 7.4 it then holds that all lambda terms as follows are alpha-equivalent for i ∈ {1 ... n}:

S′ki{q1/# ... qm/#}

Finally, the first and third conditions in Defn. 7.27 follow by the definition of findAbstractions.

Lemma 7.29. Let T, R, g be valid input to reexpress as in Fig. 7.5 such that the lambda parameters of T are all distinctly named, g ∉ params(T), and abstractionCondition(T, R) holds. Let G be a symbol generator satisfying fresh(G, {T, g}), let X be the result of executing makeAbstraction(T, R) with initial state G, let E be an environment such that g ∼E X, let G′ be another symbol generator satisfying fresh(G′, {T}), and define T0 ... Tn as in the reexpress algorithm (Fig. 7.5) during the execution of reexpress(T, R, g) from initial state G′. It then follows for all i ∈ {0 ... n} that:

Ti ∼E T

Proof. For i ∈ {1 ... n}, define ai, pi1 ... pim, Mi1 ... Mim, [yi11 ... yi1r1] ... [yim1 ... yimrm], and M″i1 ... M″im as in Fig. 7.5 during the execution of reexpress(T, R, g) from initial state G′. Define a, p1 ... pm, x1 ... xm, Sm, A, and A′ as in Fig. 7.4 during the execution of makeAbstraction(T, R) from initial state G. By the definition of etaExpandReplace in Fig. 7.7, we can write Sm in the form:

T{p1/X1 ... pm/Xm}

for some terms X1 ... Xm. By Lemma 7.25 and Proposition 7.19 it holds that Xj ∼ (xj y1j1 ... y1jrj) for all j ∈ {1 ... m}. Also, by Defn. 7.6 we can write each pj as (a ++ qj) for some place qj and each pij as (ai ++ qij) for some place qij. By induction on i:

1. i = 0:

LHS ∼E RHS (by line 2 of reexpress)

2. i ∈ {1 . . . n}. First let us prove the following preliminary fact:

A′ ∼ A   (by 7.25)

∼ subtermAt(a, T {(a ++ q1)/X1 ... (a ++ qm)/Xm})

∼ subtermAt(a, T ){q1/X1 . . . qm/Xm} (by 7.14)

∼ subtermAt(a, T ){q1/(x1 y111 . . . y11r1 )

...

qm/(xm y1m1 . . . y1mrm )} (by 7.19)

∼ subtermAt(ai,T ){qi1/(x1 yi11 . . . yi1r1 )

...

qim/(xm yim1 . . . yimrm )} (by the first condition in 7.27)

∼ subtermAt(ai,Ti−1){qi1/(x1 yi11 . . . yi1r1 )

...

qim/(xm yim1 ... yimrm)}   (by the second condition in 7.27)

Then proceed as follows:

LHS ∼E Ti−1{ai/(g M″i1 ... M″im)}   (by 7.26)

∼E Ti−1{ai/(g (\ yi11 ... yi1r1 -> Mi1) ... (\ yim1 ... yimrm -> Mim))}   (by 7.25 and 7.19)

∼E Ti−1{ai/(X (\ yi11 ... yi1r1 -> Mi1) ... (\ yim1 ... yimrm -> Mim))}   (by 7.21 and the third condition in 7.27)

∼E Ti−1{ai/subtermAt(ai, Ti−1){qi1/(x1 yi11 ... yi1r1) ... qim/(xm yim1 ... yimrm)}[x1/(\ yi11 ... yi1r1 -> Mi1) ... xm/(\ yim1 ... yimrm -> Mim)]}   (by the preliminary fact, 7.19, and 7.23)

∼E Ti−1{ai/subtermAt(ai, Ti−1){qi1/((\ yi11 ... yi1r1 -> Mi1) yi11 ... yi1r1) ... qim/((\ yim1 ... yimrm -> Mim) yim1 ... yimrm)}}   (by 7.16)

∼E Ti−1{ai/subtermAt(ai, Ti−1){qi1/Mi1 ... qim/Mim}}   (by 7.23 and 7.19)

∼E Ti−1{pi1/Mi1 ... pim/Mim}

∼E Ti−1

∼E RHS   (by the inductive hypothesis)

Lemma 7.30. Let E be a toplevel environment, and let X, Y, X′, and Y′ be terms defined as follows:

X = (\ t -> caseTuple t (\ a1 ... am b1 ... bn -> Tuple b1 ... bn))

Y = (\ a1 ... am b1 ... bn -> Tuple A1 ... Am B1 ... Bn)

X′ = (\ t -> caseTuple t (\ a1 ... am g b1 ... bn -> Tuple b1 ... bn))

Y′ = (\ a1 ... am g b1 ... bn -> Tuple A1 ... Am G B1 ... Bn)

It follows that:

eval(X (fixTuple Y), E) = eval(X′ (fixTuple Y′), E)

Proof. The proof follows by the definitions of the Tuple data type and the fixTuple value as given in Sect. 4.3.

Theorem 7.31. Let T be a master term, let T′ be an element of absInv(T), and let E be a toplevel environment derived from a set of declarations that includes those for master and Tuple as given in Sect. 4.3. It follows that:

T′ ∼E T

Proof. By Defn. 7.9 it holds that:

T′ = reexpressMaster(T, R)

for some R ∈ findAbstractions(T). Let us then define g, X, a1 ... am, A1 ... Am, and B1 ... Bn as in Fig. 7.3. For each i ∈ {1 ... m}, let A′i = Ai[g/X], and likewise for each j ∈ {1 ... n}, let B′j = Bj[g/X]. Let b1 ... bn be symbols of appropriate type whose names are distinct from each other and from those of all other symbols occurring within this proof. Now derive the following preliminary fact:

(\ g -> master (\ a1 ... am -> Tuple A1 ... Am B1 ... Bn)) X

∼E (master (\ a1 ... am -> Tuple A1 ... Am B1 ... Bn))[g/X]   (by 7.23)

∼E master (\ a1 ... am -> Tuple A′1 ... A′m B′1 ... B′n)

∼E caseTuple (fixTuple (\ a1 ... am b1 ... bn -> Tuple A′1 ... A′m B′1 ... B′n)) (\ a1 ... am b1 ... bn -> Tuple b1 ... bn)   (by the value declaration for master and 7.23)

∼E caseTuple (fixTuple (\ a1 ... am g b1 ... bn -> Tuple A′1 ... A′m X B′1 ... B′n)) (\ a1 ... am g b1 ... bn -> Tuple b1 ... bn)   (by 7.30)

∼E caseTuple (fixTuple (\ a1 ... am g b1 ... bn -> Tuple A1 ... Am X B1 ... Bn)) (\ a1 ... am g b1 ... bn -> Tuple b1 ... bn)   (by the value declaration for fixTuple)

∼E master (\ a1 ... am g -> Tuple A1 ... Am X B1 ... Bn)   (by the value declaration for master and 7.23)

∼E T′

Then proceed as follows:

eval(T′, E)

= eval(((\ g -> master (\ a1 ... am -> Tuple A1 ... Am B1 ... Bn)) X), E)   (by the preliminary fact)

= eval(master (\ a1 ... am -> Tuple A1 ... Am B1 ... Bn), E{g/eval(X, E)})   (by 3.26 and 3.27)

= eval(T, E{g/eval(X, E)})   (by 7.29 and 7.28)

= eval(T, E)

Chapter 8

Discussion

8.1 A ‘Proof-of-Concept’ Demonstration of RUFINSKY

In order to show how all the ideas of the previous chapters fit together, let us look at a worked example of how RUFINSKY behaves in a simple cumulative learning scenario. The chosen scenario demonstrates cumulative learning over a sequence of three program synthesis problems, and it also demonstrates invention of the map abstraction which was mentioned in Sect. 1.2. In this example we work within the setting for learning of Sect. 3.4 and we follow the cumulative learning scheme of Sect. 4.3. First, let us introduce the following data type declarations for natural numbers and lists, as well as value declarations for arithmetic predecessor, addition, and multiplication operators:

data Nat = Zero | Succ Nat
data List = Nil | Cons Nat List

pred = \ a -> caseNat a error (\ a2 -> a2)
plus = \ a b -> caseNat a b (\ a2 -> Succ (plus a2 b))
times = \ a b -> caseNat a Zero (\ a2 -> plus b (times a2 b))

The background knowledge B shall be the toplevel environment (Defn. 3.29) defined with respect to the above declarations plus the declarations for Tuple_{α1...αm}, fixTuple, and master given in Sect. 4.3. The target datum D shall be that shown in Fig. 8.1. It consists of input-output examples for three sub-problems, aggregated together into 'monolithic' form using a Tuple constructor. The type of D is equal to:

Tuple_{α,α,α} where α = List -> List

The hypothesis space for this cumulative learning problem consists of master

D = Tuple incElems decElems cubeElems

where incElems, decElems, and cubeElems are the least values under ⊑ such that the following constraints hold:

incElems([]) = []
incElems([4]) = [5]
incElems([8, 1, 4, 1, 3]) = [9, 2, 5, 2, 4]

decElems([]) = []
decElems([4]) = [3]
decElems([8, 1, 4, 1, 3]) = [7, 0, 3, 0, 2]

cubeElems([]) = []
cubeElems([2]) = [8]
cubeElems([4, 2, 3, 0, 2]) = [64, 8, 27, 0, 8]

Figure 8.1: Target datum in the example cumulative learning scenario. A shorthand notation for values of type List has been used for convenience: [0, 2] represents the value (Cons Zero (Cons (Succ (Succ Zero)) Nil)), etc. The monolithic target datum D contains input-output specifications for three functions: incElems increments the elements of a list, decElems decrements the elements of a list, and cubeElems raises the elements of a list to the third power.

Name       Type
Succ       Nat -> Nat
pred       Nat -> Nat
plus       Nat -> Nat -> Nat
times      Nat -> Nat -> Nat
Nil        List
Cons       Nat -> List -> List
caseList   List -> List -> (Nat -> List -> List) -> List
fix        ((List -> List) -> List -> List) -> List -> List

Figure 8.2: Specification for the 'pool' Γ of symbols in the example cumulative learning scenario. It consists of some primitive arithmetic operators (top group), the constructors and destructor for the List datatype (middle group), and the fixed point operator for recursion (bottom group).

terms T of the following form:

T = master (\ a1 ... am -> Tuple A1 ... Am IncElems DecElems CubeElems)

where IncElems, DecElems, and CubeElems are hypotheses for the three sub-problems, and A1 ... Am are abstractions with m ≥ 0. The prior probability distribution over master terms T shall be the cumulative learning prior (Sect. 4.3) as defined by clPriorProc(tys, Γ) for tys equal to:

[α, α, α] where α = List -> List

and for Γ equal to the set of symbols whose names and types are given in Fig. 8.2. As in the example of Sect. 4.3.1, let clPrior be the distribution form of clPriorProc:

clPrior(T, Γ) = P(clPriorProc(tys, Γ) returns T)

We also again assume for convenience that numAbstractionsDist and abstractionTypeDist (Defn. 4.37) have been chosen such that the following holds for any relevant type τ:

P(m = 0) = P(m = 1 and A1 has type τ)

where m is the number of abstractions in T and A1 is the first abstraction. Finally, throughout this example let us assume the following values for RUFINSKY's global parameters:

npPriorHyperParam1 = 8   (Defn. 4.29)
npPriorHyperParam2 = 16   (Defn. 4.29)
evalLimit = 50   (Defn. 5.12)
conWeight(x) = 1/2 for x ∈ {Zero, Succ, Nil, Cons}   (Defn. 5.24)
stepCountOffset = 1   (Defn. 5.27)
sigmaParam = 1   (Defn. 7.5)
tauParam = 50   (Defn. 7.5)

The policy by which we shall run RUFINSKY in this scenario is as follows. We apply RUFUS to each sub-problem in turn, using the runRufus procedure of Defn. 5.29. In between successive invocations of RUFUS, we invoke KANDINSKY by repeatedly applying the absInvBest procedure of Defn. 7.11 to the monolithic hypothesis for as long as doing so continues to produce an increase in its prior probability. Note that in order to apply RUFUS to a sub-problem, a conditional prior probability distribution for that sub-problem will be constructed using the technique described in Sect. 4.3. We shall see that an appropriate 'local' target datum and background knowledge are also needed within the context of a sub-problem. When a sub-problem is solved, its solution may be substituted back into the monolithic hypothesis. Note that KANDINSKY always gets invoked on the monolithic hypothesis as a whole, so it uses the master prior given by P(T) = clPrior(T, Γ).

We can now walk through the action of RUFINSKY step-by-step. RUFINSKY starts with an initial 'empty' monolithic hypothesis T0 (Fig. 8.3). T0 contains holes as placeholder hypotheses for each of the three sub-problems, and the number of abstractions is set to zero.

In the first step, RUFUS is applied to the first sub-problem, for which the target datum is incElems as in Fig. 8.1. The hypothesis space for this sub-problem is equal to L of Defn. 5.3 defined with respect to the target type List -> List (the type of incElems) and the set of symbols Γ. The prior over hypotheses IncElems for the sub-problem is given by:

SKY starts with an initial ‘empty’ monolithic hypothesis T0 (Fig. 8.3). T0 con- tains holes as placeholder hypotheses for each of the three sub-problems, and the number of abstractions is set to zero. In the first step, RUFUS is applied to the first sub-problem, for which the tar- get datum is incElems as in Fig. 8.1. The hypothesis space for this sub-problem is equal to L of Defn. 5.3 defined with respect to the target type List -> List (the type of incElems) and the set of symbols Γ. The prior over hypotheses IncElems for the sub-problem is given by:

P(IncElems) = clPrior(master (Tuple IncElems # #), Γ) / clPrior(master (Tuple # # #), Γ)

RUFUS is invoked using runRufus(B, incElems). The output of RUFUS under these conditions is ⟨“goal”, IncElems⟩ where IncElems is given in Fig. 8.3. The hypothesis IncElems is substituted in at the appropriate place in the master term T0 to produce a new master term T1 (Fig. 8.3). Next, KANDINSKY is invoked on the master term:

absInvBest(T1) = T1
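As an aside, the invocation policy described above, in which absInvBest is applied repeatedly for as long as the prior increases, amounts to a simple fixpoint iteration: absInvBest returns its input unchanged precisely when no candidate improves on it. A hypothetical Haskell sketch, using term equality to stand in for syntactic identity:

applyKandinsky :: Eq term => (term -> term) -> term -> term
applyKandinsky absInvBest t
  | t' == t   = t            -- no candidate abstraction increased the prior
  | otherwise = applyKandinsky absInvBest t'
  where
    t' = absInvBest t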

We see that KANDINSKY fails to find any opportunity for abstraction in T1 that would result in an increase in its prior probability, so it leaves T1 untransformed. RUFUS is now applied to the second sub-problem. The target datum is decElems (Fig. 8.1), and the hypothesis space L is the same as it was for the first sub-problem. The prior over hypotheses DecElems is given by:

P(DecElems | IncElems) = clPrior(master (Tuple IncElems DecElems #), Γ) / clPrior(master (Tuple IncElems # #), Γ)

RUFUS is invoked using runRufus(B, decElems), which returns ⟨“goal”, DecElems⟩ where DecElems is given in Fig. 8.3. DecElems is then substituted into the master term T1 to produce T2 (Fig. 8.3). KANDINSKY is invoked on the master term once again:

absInvBest(T2) = T3

This time we see that KANDINSKY succeeds in finding a useful opportunity for abstraction. The syntactic similarity between the two hypotheses IncElems and DecElems allows KANDINSKY to abstract out a higher-order function encapsulating what those two hypotheses have in common.

T0 = master (Tuple # # #)

    ↓ refinement (5.0 s)

T1 = master (Tuple IncElems # #)
  where IncElems = \ a1 -> fix (\ a2 a3 -> caseList a3 a3
                      (\ a4 a5 -> Cons (Succ a4) (a2 a5))) a1

    ↓ refinement (6.1 s)

T2 = master (Tuple IncElems DecElems #)
  where IncElems = \ a1 -> fix (\ a2 a3 -> caseList a3 a3
                      (\ a4 a5 -> Cons (Succ a4) (a2 a5))) a1
        DecElems = \ a1 -> fix (\ a2 a3 -> caseList a3 a3
                      (\ a4 a5 -> Cons (pred a4) (a2 a5))) a1

    ↓ abstraction invention (0.01 s)

T3 = master (\ map -> Tuple Map IncElems′ DecElems′ #)
  where Map = \ a1 a2 -> fix (\ a3 a4 -> caseList a4 a4
                      (\ a5 a6 -> Cons (a1 a5) (a3 a6))) a2
        IncElems′ = \ a1 -> map (\ a2 -> Succ a2) a1
        DecElems′ = \ a1 -> map (\ a2 -> pred a2) a1

    ↓ refinement (1.1 s)

T4 = master (\ map -> Tuple Map IncElems′ DecElems′ CubeElems)
  where Map = \ a1 a2 -> fix (\ a3 a4 -> caseList a4 a4
                      (\ a5 a6 -> Cons (a1 a5) (a3 a6))) a2
        IncElems′ = \ a1 -> map (\ a2 -> Succ a2) a1
        DecElems′ = \ a1 -> map (\ a2 -> pred a2) a1
        CubeElems = \ a1 -> map (\ a2 -> times a2 (times a2 a2)) a1

Figure 8.3: Demonstration of RUFINSKY doing cumulative learning. Each refinement stage is performed by the RUFUS subsystem, and each abstraction invention stage is performed by the KANDINSKY subsystem. With each stage is shown the time taken by a prototype implementation of RUFINSKY running on a desktop PC.

I have given this abstraction the name map (Fig. 8.3) because it happens to be an instance of the map operation that is standard in functional programming languages. The type of map here is:

(Nat -> Nat) -> List -> List
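As an aside, the invented Map term can be transliterated directly into ordinary Haskell; the sketch below is mine rather than part of the thesis (the Peano-style Nat and List declarations stand in for the data type declarations behind Fig. 8.2, map' avoids clashing with the Prelude's map, and the truncating behaviour of predN at zero is an assumption):

    data Nat  = Zero | Succ Nat
    data List = Nil | Cons Nat List

    -- The invented Map abstraction of Fig. 8.3: its fix/caseList encoding
    -- unfolds into ordinary pattern-matching recursion.
    map' :: (Nat -> Nat) -> List -> List
    map' _ Nil         = Nil
    map' f (Cons x xs) = Cons (f x) (map' f xs)

    -- The refactored hypotheses IncElems′ and DecElems′ of Fig. 8.3:
    incElems' :: List -> List
    incElems' = map' Succ

    decElems' :: List -> List
    decElems' = map' predN
      where
        predN Zero     = Zero   -- assumption: pred truncates at zero
        predN (Succ n) = n

The shared recursion scheme visible in this transliteration is exactly what KANDINSKY's anti-unification isolates.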

The factor by which the prior probability of the master term increases as a result of introducing this abstraction is equal to:

clPrior(T3, Γ) / clPrior(T2, Γ) = 83.3

KANDINSKY is then invoked once more on T3 in order to check for additional opportunities for abstraction; however, no more are found at this point:

absInvBest(T3) = T3

In the next step, RUFUS is applied to the third sub-problem, whose target datum is cubeElems (Fig. 8.1). The hypothesis space for this sub-problem is affected by the presence of the new map abstraction. Thus, L (Defn. 5.3) is to be defined in this context with respect to the target type List -> List and the set of symbols Γ′ where:

Γ′ = Γ ∪ {map}

A modified ‘local’ background knowledge B′ for the sub-problem is also required due to the presence of map. B′ is the toplevel environment (Defn. 3.29) defined with respect to the same data type and value declarations as B, plus the following additional value declaration:

map = Map

where the term Map is given in Fig. 8.3. The prior over hypotheses CubeElems for this sub-problem is then equal to:

P(CubeElems | Map, IncElems′, DecElems′) =
    clPrior(master (\ map -> Tuple Map IncElems′ DecElems′ CubeElems), Γ)
  / clPrior(master (\ map -> Tuple Map IncElems′ DecElems′ #), Γ)

RUFUS is invoked as runRufus(B′, cubeElems), and returns ⟨“goal”, CubeElems⟩ where CubeElems is given in Fig. 8.3. Substituting CubeElems into the master term yields T4 (Fig. 8.3). KANDINSKY is now invoked one last time, but finds no further opportunity for abstraction:

absInvBest(T4) = T4

At this point all of the sub-problems have been solved. RUFINSKY stops here and outputs T4 as the final hypothesis.

I tested the above process using a prototype implementation of RUFINSKY running on a desktop PC. The times taken to run each stage are shown in Fig. 8.3.

It is illuminating to see what happens when the prototype implementation of RUFINSKY is applied to the same learning scenario without abstraction invention, i.e. just applying RUFUS to each of the sub-problems in turn and not using KANDINSKY. The result on the first two problems is much the same: RUFUS again takes about five or six seconds to solve each. However, for the third problem, RUFUS runs for 150 seconds without finding a solution at all. This reflects the fact that the third problem, cubeElems, is significantly more difficult for RUFUS to solve than the first two when using the initial background knowledge, owing to the extra complexity of expressing the compound ‘cube’ operation rather than the simple primitive ‘succ’ or ‘pred’ operations. Because RUFUS’ search space is exponentially large, a small increase in the complexity of a solution can mean a large increase in the time taken to find it.

With the help of the invented map abstraction, we see that the time taken for RUFUS to solve cubeElems falls to only 1.1 seconds. This illustrates the basic principle of cumulative learning: progression to more difficult problems is made possible by the accumulation of new knowledge.

8.2 How RUFINSKY could be Experimentally Evaluated

The worked example of the previous section serves as a simple demonstration of cumulative learning. In this section I suggest an experiment that would more rigorously evaluate the effectiveness of RUFINSKY’s cumulative learning mechanism. This experiment is designed to test the hypothesis that ‘RUFINSKY’s accumulation of knowledge facilitates its progression to more difficult problems’.

In order to demonstrate progression from easy to hard problems, it is necessary to choose a test problem domain in which there is a ready source of related inductive inference problems with a range of difficulties. A suitable choice is the domain of integer sequence prediction. These are puzzles, commonly found in IQ tests or recreational puzzle books, in which one is asked to spot the pattern in a sequence of numbers and hence predict what comes next, for example:

1, 4, 9, 16 ...

Here, one should spot that this is a sequence of square numbers, and that the next number is 25. Here is a much more difficult example:

1, 1, 1, 2, 2, 4, 3, 8, 5 ...

This is the Fibonacci sequence interleaved with a geometric sequence (the next two numbers are 16 and 8). Integer sequence puzzles are ideal as a test domain because they are easy to obtain, they come in sufficient variety and range of difficulties, and they are also interesting from an AI point of view due to their association with human intelligence tests.

A corpus of integer sequence prediction problems of varying difficulties could be collected from published sources such as puzzle books [Mullins, 2008; Nasser, 2005; Tolley and Thomas, 2006]. A few hundred problems would be a suitable size for such a corpus, large enough so that a variety of different sequence types could be represented. Each integer sequence would need to be partitioned into ‘training’ and ‘test’ parts. The ‘training’ parts would be used to construct a target datum, and the ‘test’ parts would be kept aside to evaluate predictive accuracy.

An outline of the method for this experiment is as follows. Runs of RUFINSKY would be performed over the corpus in four different modes:

1. adaptive weights OFF and abstraction invention OFF;

2. adaptive weights ON and abstraction invention OFF;

3. adaptive weights OFF and abstraction invention ON;

4. adaptive weights ON and abstraction invention ON.

Here, ‘adaptive weights’ refers to the use of the weights learning mechanism of the non-parametric prior (Sect. 4.2). ‘Abstraction invention’ refers to the use of KANDINSKY. Both of these features can be turned on or off independently within RUFINSKY.

RUFINSKY’s initial background knowledge should be as minimal as possible, consisting of only a few primitives such as arithmetic operators, a sequence data type, and the fixed point operator. Thus, RUFINSKY would not start with a strong bias towards the problem domain: it would not be given any prior knowledge about standard classes of sequence such as iterative, arithmetic, geometric, Fibonacci, polynomial, etc. Instead, RUFINSKY would have the opportunity to cumulatively learn such a bias, and if the experiment were successful one might expect it to invent abstractions that corresponded to such sequence classes of its own accord.

In order to achieve a convincing test of autonomous cumulative learning it would be desirable for the corpus of sequence problems to be provided to RUFINSKY in a randomized order, rather than in a pre-specified order of ‘difficulty’. However, this would require RUFINSKY to use a different learning policy at the high level from the one used in the worked example of Sect. 8.1. Recall that in that simple policy, RUFINSKY simply worked through the problems in the order given, alternately applying RUFUS and KANDINSKY. In order to handle randomly ordered problems, better results would likely be achieved with a policy that applied RUFUS to many problems in parallel; a sketch of one such policy is given at the end of this section. This would allow RUFUS to discover solutions to the easiest of the problems without getting stuck for large amounts of time on the hard ones. On the other hand, KANDINSKY could still be invoked in the usual way, performing abstraction invention over the monolithic hypothesis immediately following each successful solution of a problem by RUFUS.

If this experiment were successful I would expect the following pattern of results. With abstraction invention OFF, I would expect RUFINSKY to be capable of correctly solving only a small fraction of the problems within a reasonable time limit (say, less than 20% of the problems after running for 100 CPU-hours). These solved problems would correspond to the ‘easy’ problems of the corpus. On the other hand, with abstraction invention ON I would expect RUFINSKY to correctly solve the majority of the problems within the same time limit (say, more than 80% of the problems). Such an increase in the proportion of problems solved is expected if cumulative learning does indeed facilitate ‘progression to more difficult problems’.

I would also expect that turning adaptive weights ON would have a positive effect on reducing the time taken for many of the problems to be solved, and on increasing the total number of problems solved. The reason for this is that the weights adaptation mechanism allows RUFINSKY to retrospectively adjust the probability weights of learned abstractions according to how useful each one turns out to be for future learning. This can mitigate a potential slowdown due to an increase in the branching factor of RUFUS’ search space as the number of learned abstractions increases.

In order to be suitable for an experiment like this, the prototype RUFINSKY implementation used in the worked example of the previous section would need to be extended somewhat. In particular, a different search policy other than the simple best-first search of Sect. 5.2.4 would have to be devised for RUFUS. The reason for this is a practical one of memory usage. Best-first search, like breadth-first search, uses an amount of memory that increases linearly with the amount of time for which the search process runs. In the prototype implementation, RUFUS typically uses up memory at a rate of about 30 MB/s, which means that of the order of 100 GB of memory would be needed in order to run RUFUS’ search procedure for one hour. A sensible way to solve this problem is to design a better policy than best-first search, which could either be a stochastic search

or some form of iterative deepening policy. A stochastic search would probably be simpler to implement, and might also have other advantages such as being able to cope better with semantic redundancy in the search space. On the other hand, the results produced by a stochastic search might contain structural noise that could reduce the ability of KANDINSKY to perform abstraction invention effectively.
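To make the parallel policy concrete, here is a minimal Haskell sketch of a round-robin scheme of my own devising. It is not part of RUFINSKY’s specification; the step, subst, and invent operations are taken as parameters precisely because the thesis does not fix their interfaces:

    -- step p s    : one bounded unit of RUFUS search on problem p, yielding
    --               either a solution (Left) or an updated search state (Right)
    -- subst p h m : substitute solution h for problem p's hole in master term m
    -- invent m    : KANDINSKY's role, abstraction invention applied to a fixpoint
    roundRobin
      :: (p -> s -> Either h s)
      -> (p -> h -> m -> m)
      -> (m -> m)
      -> m -> [(p, s)] -> m
    roundRobin step subst invent = go
      where
        go master []             = master
        go master ((p, s) : pss) =
          case step p s of
            Left h   -> go (invent (subst p h master)) pss  -- solved: invent, continue
            Right s' -> go master (pss ++ [(p, s')])        -- unsolved: re-queue at back

Re-queueing unsolved problems at the back of the list shares search effort evenly, so easy problems are solved (and their abstractions invented) before hard ones have consumed much time. As for the memory issue, the generic shape of an iterative deepening search is sketched below; for RUFUS the bound would more naturally be placed on prior probability than on raw depth, and non-terminating hypotheses would still need the Levin search machinery, so this too is only an illustration:

    import Control.Monad (msum)

    -- Depth-bounded search restarted with growing bounds: memory use is
    -- proportional to the current bound rather than to elapsed search time.
    -- (Diverges if no goal exists, as iterative deepening must over an
    -- infinite search space.)
    iterDeepen :: (a -> [a]) -> (a -> Bool) -> a -> Maybe a
    iterDeepen refine isGoal root = msum [ bounded d root | d <- [0 ..] ]
      where
        bounded d x
          | isGoal x  = Just x
          | d == 0    = Nothing
          | otherwise = msum [ bounded (d - 1) x' | x' <- refine x ]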

8.3 Limitations of RUFINSKY’s Cumulative Learning Mechanism

RUFINSKY does have some limitations that restrict its capacity for cumulative learning, which I shall now describe. Note that each of these limitations could in principle be overcome by extending the RUFINSKY system: they are not limitations of the lambda calculus language itself.

Firstly, KANDINSKY is restricted in its ability to recognise commonality because it operates at a purely syntactic level: auSearch knows nothing about the semantics of lambda calculus terms. This means that KANDINSKY will miss opportunities for abstraction that would require taking the meaning of background knowledge into account. For example, given appropriate background knowledge, the terms (times 2 x) and (plus x x) have the same meaning. If they were both to be present as subterms in some larger term then they would present an opportunity for abstraction that KANDINSKY would be unable to take advantage of.

Secondly, RUFINSKY has no mechanism for inventing new data types. Though it can invent the equivalent of new value declarations through KANDINSKY’s abstraction invention mechanism, the data type declarations that RUFINSKY has access to are fixed a priori and must be set by a human user. Since the set of available data types is an important component of inductive bias, this is a significant limitation on the extent to which RUFINSKY can adapt its inductive bias through cumulative learning.

Finally, the restriction to simply typed lambda calculus that is made throughout this thesis limits what kinds of abstractions can be made by RUFINSKY. As an example, consider a variation on the demonstration example of Sect. 8.1 in which instead of incElems and decElems, the first two sub-problems are incElems and negateElems with the following types (given in Haskell-style notation):

incElems :: [Int] -> [Int]
negateElems :: [Bool] -> [Bool]

Here, negateElems is the function that applies the boolean ‘not’ operator to each element of a list. Under the constraints of simply typed lambda calculus, it is impossible to abstract out the map operation from commonality in these two programs, because it would require map to have a polymorphic type:

map :: forall a. (a -> a) -> [a] -> [a]

To overcome this limitation it would be necessary to use a formulation of lambda calculus that has a more powerful type system supporting polymorphism, such as a Hindley-Milner type system or System F [Pierce, 2002].
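A small Haskell sketch of my own makes the point concrete: once map is given the polymorphic type above, a single abstraction covers both sub-problems, whereas a simply typed Map would have to be duplicated at each element type (map' is used to avoid shadowing the Prelude's map):

    -- Restricted to endofunctions (a -> a), matching the type in the text.
    map' :: (a -> a) -> [a] -> [a]
    map' _ []       = []
    map' f (x : xs) = f x : map' f xs

    incElems :: [Int] -> [Int]
    incElems = map' (+ 1)

    negateElems :: [Bool] -> [Bool]
    negateElems = map' not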

8.4 Issues Raised about the Design of Inductive Programming Systems

8.4.1 First-order logic and lambda calculus

In this thesis I adapted a number of concepts and techniques from first-order logic based ILP to a lambda calculus setting for learning. In particular, I adapted the idea of semantic and syntactic generality orderings (entailment and subsumption), as well as the techniques of refinement, proof-directed search, and inverse deduction. Actually, I have been surprised at how readily these techniques could be transferred across representation languages. It is evidence that the fundamental ideas of ILP transcend first-order logic, and that they may be regarded as inductive programming techniques in a more general sense. It is likely that other ILP techniques besides those investigated in this thesis, such as inverse entailment [Muggleton, 1995], could also be adapted to lambda calculus.

In this transfer of techniques from first-order logic to lambda calculus, it is interesting to ask what things have changed and what things have stayed the same. In the setting for learning, the concept of background knowledge was retained. On the other hand, positive and negative examples were replaced in the lambda calculus setting by a ‘target datum’. The target datum concept can be seen as a generalisation of the idea of positive and negative examples, providing more flexibility in how we may specify observed data. Indeed, when the target type is a function type with return type Bool as defined by the following data type declaration:

data Bool = True | False

we see that the target datum becomes equivalent to a set of positive and negative examples for a predicate. However, unlike in the first-order logic setting, one may also have a target datum with a non-boolean return type. Thus, in the lambda calculus setting one can directly model a function that returns, say, a three-valued class, or a list of integers. This gives improved flexibility in what constraints may be imposed over the hypothesis space in a learning problem. It eliminates the need for negative examples whose only purpose is to enforce a functional dependency as in Fig. 2.1; instead, such a functional dependency can be an automatic consequence of the choice of target type.

Figure 8.4: An illustration of the concept of generality in terms of ‘possible worlds’, applicable to both the first-order logic and the lambda calculus settings. In each of the Venn diagrams, the large rectangle represents the set of all possible worlds, and each circle represents the set of possible worlds consistent with a particular hypothesis. In a), hypothesis Y is more general than hypothesis X. In b), hypotheses X and Y are neither more nor less general than one another; however, they do possess a common generalisation. In c), hypotheses X and Y are neither more nor less general than one another, and they do not possess a (consistent) common generalisation.

We saw in Sect. 3.3 that one can use the semantic approximation order, i.e. the concept of ‘definedness’, as a suitable definition of generality in lambda calculus. At first sight, this might seem quite different from the standard ILP notion of generality based on implication. However, both notions of generality can be seen to express the same basic idea: that of narrowing the set of ‘possible worlds’ that may be true. The lambda calculus definedness ordering is one of partial information: a hypothesis specifies a partially defined value in the domain of some type. We can think of each fully defined value of that type as a ‘possible world’. When a hypothesis becomes more general, it becomes more defined, and hence is consistent with fewer possible worlds. In the case of first-order logic, a ‘possible world’ corresponds to a Herbrand interpretation, i.e. an assignment of truth or falsity to all possible atoms, where an atom is a predicate symbol applied to ground arguments. When a hypothesis becomes more general, it implies stronger constraints on the allowed truth values of these atoms, and hence also is consistent with fewer possible worlds (Fig. 8.4).

On the other hand, there are some clear differences between the first-order and the lambda calculus versions of generality. The most significant concerns the existence of a ‘most general’ or ‘top’ hypothesis. Now, both settings feature a ‘least general’ / ‘bottom’ hypothesis that is consistent with all possible worlds, and hence contains no information. In the case of first-order logic, a tautology is such a bottom hypothesis, and in the case of lambda calculus it is a term representing a completely undefined value. In first-order logic we also have a top hypothesis, i.e. a logical contradiction, consistent with no possible worlds. By contrast, in the lambda calculus setting it is built into the structure of the ‘definedness’ order that every hypothesis is consistent with at least one possible world (except in the trivial case of an uninhabited data type, i.e. a data type with zero constructors). Therefore, there is no top hypothesis, i.e. no hypothesis exists that is a generalisation of all others. One might see this as a disadvantage, because it restricts us to ‘bottom-up’ learning techniques, i.e. searching the hypothesis space by starting with the least general hypothesis and then attempting to generalise; in the first-order logic setting one has a choice between ‘top-down’ and ‘bottom-up’. However, the nonexistence of contradictory hypotheses in the lambda calculus setting can be seen as an advantage, because it means that hypotheses do not need to be checked for inconsistency: self-consistency is always guaranteed.
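As an informal illustration of the definedness ordering (my analogy using Haskell partiality, with undefined standing for a completely undefined value; this is not the thesis's formal semantics), hypotheses of type Bool -> Bool form a ladder:

    -- Least defined: consistent with every total function Bool -> Bool,
    -- i.e. a 'bottom' hypothesis carrying no information.
    hBottom :: Bool -> Bool
    hBottom _ = undefined

    -- More defined, and hence more general in the sense above: it excludes
    -- every possible world in which the target does not map True to False.
    hPartial :: Bool -> Bool
    hPartial True  = False
    hPartial False = undefined

    -- Fully defined: consistent with exactly one possible world.
    hTotal :: Bool -> Bool
    hTotal True  = False
    hTotal False = True

Every such ladder is capped by fully defined values rather than by a contradiction, which is the structural reason why no top hypothesis exists.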

8.4.2 Bayesian inference

In Sect. 3.4 I formulated a lambda calculus setting for learning explicitly in terms of Bayesian inference. In Chap. 4 I then defined exact, normalised prior probability distributions over a hypothesis space of lambda calculus terms. This manifestly Bayesian approach has a number of advantages. As discussed in Chap. 4, it makes it possible to do inference by Bayes prediction rather than being limited to Bayes MAP. Note that although the best-first search strategy for RUFUS that was outlined in Sect. 5.2.4 is a Bayes MAP strategy, the rest of the design of RUFUS is entirely agnostic as to whether Bayes MAP or Bayes prediction is being used. It would therefore be straightforward to replace this best-first search strategy with, say, a Monte Carlo sampling policy for Bayes prediction.

By making prior probability distributions explicit, it was possible to get significantly more flexibility out of these priors than is possible when one is limited to using an ILP-style compression measure. It enabled the introduction of a ‘soft’ form of declarative bias via weights on elements of background knowledge. It also enabled the introduction of a means for automatically adapting these weights with experience, using the non-parametric extension to the base prior. Finally, it enabled the formulation of cumulative learning scenarios using the ‘monolithic hypothesis’ technique and the cumulative learning prior. Crucially, these extra features were all added without making any change to the setting for learning itself: only the prior needed to be modified.

Overall, a fully Bayesian setting for learning has many advantages and helps to bring inductive programming onto a firmer statistical footing alongside conventional Bayesian machine learning techniques such as Bayesian networks, Gaussian processes, etc.
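For reference, the distinction between the two modes of inference can be stated in standard Bayesian notation (a generic formulation, not specific to the definitions of this thesis), where D is the observed data, H ranges over hypotheses, and x is a quantity to be predicted:

H* = argmax_H P(H) P(D | H)    (Bayes MAP: commit to a single best hypothesis)

P(x | D) = Σ_H P(x | H) P(H | D)    (Bayes prediction: average over all hypotheses)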

8.5 Related Work

In this section I discuss in retrospect some alternative techniques from the literature which are related to the work in this thesis.

8.5.1 Search strategies

For the RUFUS inductive programming system I purposefully chose a simple, systematic search strategy consisting of refinement coupled with best-first search. My reason for not using a more complex or ‘intelligent’ search strategy was that, when doing cumulative learning, it is desirable to avoid where possible having inductive bias implicitly encoded in the search procedure. As explained in Sect. 2.3.1, cumulative learning requires that inductive bias be transparently specified and easy to modify automatically; therefore it is preferable to encode inductive bias in forms such as background knowledge and type declarations. However, it is worth mentioning that many other approaches to inductive programming do use advanced or domain-specific search strategies. Two notable examples are the so-called ‘analytical’ approach and the ‘evolutionary’ approach.

In the ‘analytical’ approach, the idea is to scan the data or input-output examples so as to detect specific kinds of patterns. Once a pattern is detected, a hypothesis describing it can be synthesised directly [Summers, 1977; Biermann, 1978]. The advantage of the approach is that it does not involve a combinatorial search; however, it does entail that a strong domain-specific bias towards certain classes of patterns is built in to the system. The Igor2 system [Kitzelmann, 2009b] is a more recent attempt to improve the flexibility of the analytical approach by combining it with combinatorial search.

Certain deductive program synthesis techniques such as that of Manna and Waldinger [1980] can also be understood as instances of the analytical approach to inductive programming. Manna and Waldinger used theorem-proving in order to produce a constructive proof of the existence of a program satisfying a given specification. ‘Constructive’ means that a proof comes with a procedure for constructing one program satisfying the specification. The specification need not be complete, and different proofs may yield different programs; hence the technique can be used to generate multiple alternative hypotheses.

In the ‘evolutionary’ approach of ADATE [Olsson, 1995], a population of programs is evolved by repeatedly applying various transformation operators to those programs. In between transformations the programs are tested for fitness against a specification. Heuristics are employed in order to choose the most appropriate transformations so as to improve the efficiency of the search.

8.5.2 Hybrid functional logic languages

In this thesis I have compared the first-order logic (Prolog-style) and lambda calculus (Haskell-style) programming paradigms in terms of their suitability for inductive inference and cumulative learning. Although the vast majority of the literature in inductive programming sits within one of these two paradigms, there have been some attempts to merge the two together, i.e. to use a hybrid functional logic programming paradigm. Functional logic programming languages are characterised by their support for first-class functions as well as features typically associated with Prolog such as unification and multimodality. Examples are the experimental languages Curry [Antoy and Hanus, 2010] and Escher [Lloyd, 1999]. The use of such hybrid languages for inductive programming is known as Inductive Functional Logic Programming (IFLP). An IFLP framework has been advocated by Bowers et al. [1997], who defined a setting for learning for the Escher language and outlined a simple IFLP learning algorithm based on heuristic search. Also, Hernandez-Orallo and Ramirez-Quintana [1999] formalised an analogue of inverse resolution called inverse narrowing within an IFLP setting.

The main drawback of the IFLP approach is that it necessitates using a programming paradigm that is itself experimental, and much less well understood than the Prolog or Haskell paradigms. From the relatively small amount of work in IFLP done so far, it is not entirely clear whether the extra expressivity of hybrid languages such as Curry or Escher is an overall benefit or drawback. These are without doubt powerful languages, but with increased power tends to come increased complexity, and this could make it difficult to develop practical inductive programming algorithms.

8.5.3 Program transformation

I showed in Sect. 7.2 that KANDINSKY’s abstraction construction mechanism can be understood as a transformation process involving multiple applications of inverse beta- and eta-reduction rules. Related processes have been studied in work on program transformation such as that of Burstall and Darlington [1977]. Burstall and Darlington implemented a semi-automatic system that could be used to transform a functional program into a more computationally efficient form without changing its meaning (for example by altering its recursion structure to avoid the recomputation of intermediate values). Their system used a set of transformation rules which could be applied in sequence, and indeed one of these rules was an ‘abstract’ rule very similar to inverse beta-reduction. However, in their case the aim in applying this rule was not to abstract out commonality, but rather to enable the subsequent ‘folding’ of a program by matching the abstracted expression against the body of some existing function declaration.
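To make the rule concrete, here is a one-step illustration of my own in the lambda notation of this thesis; inverse beta-reduction rewrites a term so that a chosen subterm (here Succ) becomes a parameter:

    Cons (Succ a4) (a2 a5)    ⇐    (\ a6 -> Cons (a6 a4) (a2 a5)) Succ

Read left to right, the arrow is an ordinary beta-reduction step; applied in reverse, it produces a parameterised body that can be shared between hypotheses or, in Burstall and Darlington’s case, folded against an existing declaration.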

8.5.4 Analogical reasoning

A field related to abstraction invention is analogical reasoning [Bartha, 2013]. This is the branch of philosophy and cognitive science that attempts to characterise what it means to think in terms of analogies, i.e. similarity arguments between objects or domains. Authors such as Weller and Schmid [2006] have argued that abstraction should be seen as the principal basis of analogical reasoning. Weller and Schmid advocate anti-unification as a mechanism for generating abstractions, as others have done (see Sect. 2.4.1) and as I do in this thesis. Additionally, they demonstrate how a process called E-generalisation [Burghardt, 2005] can be used to take background knowledge into account during the abstraction process. At present E-generalisation has only been formalised in the context of first-order logic; however, a lambda calculus version of E-generalisation could potentially be used to solve the first limitation of KANDINSKY that I described in Sect. 8.3.

Chapter 9

Conclusions and Future Work

9.1 Summary of Thesis and Main Results

In this PhD project I designed a software system to perform cumulative learning. Cumulative learning means solving problems by inductive inference, and also incrementally accumulating knowledge so as to enable progression to more difficult problems. I identified the software engineering technique of abstraction as a suitable means for acquiring knowledge. I designed an algorithm to perform abstraction invention, which automates the process of searching for syntactic patterns within a corpus of knowledge and then abstracting out those patterns into re-usable units.

For the system’s knowledge representation language I considered both lambda calculus and first-order logic, and argued that lambda calculus is more suitable in principle because it better supports the technique of abstraction. However, more effective generic inductive inference algorithms were known to exist for first-order logic. To resolve this dilemma I adapted the inference techniques of refinement, proof-directed search, and inverse deduction from first-order logic to lambda calculus.

The main results of this thesis are as follows:

• The semantic approximation ordering is a suitable notion of ‘generality’ for lambda calculus.

• Appropriate prior probability distributions can be defined over hypothesis spaces of lambda calculus terms, allowing the use of a fully Bayesian setting for inductive inference. This also improves the flexibility with which prior assumptions can be specified versus using a more standard ‘compression’ measure.

• Refinement and proof-directed search can readily be adapted from first-order logic to lambda calculus, yielding an inductive inference algorithm that features guided search and pruning. Levin search can also be incorporated into the same algorithm in order to handle non-terminating hypothesis programs with ease.

• The technique of anti-unification can be extended to act over the exponentially large space of all combinations of subterms of a lambda calculus term, and made feasible via a heuristic beam search. This allows for discovery of syntactic commonality between arbitrary combinations of parts of a term.

• A form of inverse deduction, based on inverting beta- and eta-reduction rules, can be used to automate the construction of abstractions in lambda calculus.

9.2 Directions for Future Work

A number of avenues of future work follow from this thesis. The most obvious next step is to do an experimental evaluation of RUFINSKY, as outlined in Sect. 8.2, so as to measure how effective RUFINSKY’s cumulative learning mechanism is in practice. A direct experimental comparison between RUFUS and existing ILP/IFP systems would also be worthwhile. For example, a comparison between RUFUS and MagicHaskeller [Katayama, 2007] would reveal to what extent refinement is an improvement over brute-force search in the lambda calculus setting. Furthermore, a comparison between RUFUS and refinement-based ILP systems such as FOIL [Quinlan, 1990], MC-TopLog [Muggleton et al., 2012b], or HYPER [Bratko, 1999] would shed more light on the relative strengths and weaknesses of first-order logic versus lambda calculus.

On the theoretical side, it would be useful to derive asymptotic time and space complexity results for the algorithms presented in this thesis, particularly auSearch and absInv. Such results would help us to judge how well these algorithms can scale up to large datasets.

A major theme of this thesis was the notion that techniques from first-order ILP can be adapted to the lambda calculus setting. It would be interesting to investigate the transfer of additional techniques other than the ones looked at here. I mentioned in Sect. 8.4.1 that the technique of inverse entailment may well have a useful analogue in the lambda calculus setting. Other techniques worth investigating are relative least-general-generalisation [Muggleton and Feng, 1990] and abduction [Denecker and Kakas, 2002].

In Sect. 8.3 I described three fundamental limitations of RUFINSKY’s existing design. All three limitations could potentially be addressed by extending or generalising the algorithms presented here. KANDINSKY could be extended so as to take semantics and background knowledge into account by incorporating a technique like E-generalisation [Burghardt, 2005]. Regarding invention of new data types, from a theoretical point of view it would be straightforward to extend the cumulative learning prior of this thesis so as to act additionally over a space of data type declarations. It may then be feasible to design a search algorithm, perhaps along analogous lines to the mechanism of KANDINSKY, that can discover opportunities to usefully re-express a program in terms of new data types. Finally, extending RUFINSKY to support a polymorphic type system such as Hindley-Milner polymorphism would be a useful improvement. It would require significant changes mainly to the prior probability distributions and to RUFUS’ refinement operator. Some guidance could be taken from the design of MagicHaskeller [Katayama, 2007], an IFP system which already supports type polymorphism.

As I discussed in Sect. 5.2.4, a natural extension to RUFUS would be to add a new search strategy in place of best-first search so as to support Bayes prediction effectively. Such a search strategy might be based on Monte Carlo sampling. KANDINSKY could also potentially be made Bayes prediction friendly by replacing the greedy absInvBest strategy with an appropriate stochastic function that samples from the results returned by absInv. Some theoretical work would likely need to be done to ensure that hypotheses generated by a Monte Carlo version of RUFINSKY are sampled fairly from the Bayes posterior.

Some of the techniques developed in this thesis may have practical applications outside of inductive programming. A KANDINSKY-like system could potentially be used to help automate the design of software libraries by searching for useful opportunities for abstraction in code written by human programmers. Another application of a KANDINSKY-like system is to automated theorem proving, where abstraction invention could be used to invent re-usable lemmas from commonality in mathematical proofs. The fact that modern theorem provers typically use lambda calculus as a representation language for proofs makes this a natural extension. Finally, prior probability distributions over programs such as those introduced in this thesis may be relevant to the design of predictive software development aids such as autocompletion or a ‘Dasher’-like system [Ward et al., 2000] for programming.

9.3 Final Thoughts

What are the future possibilities for cumulative learning? In the introduction at the very beginning of this thesis I gave three sources of motivation for this research:

• improving our understanding of the nature of human learning;

• developing better practical machine learning systems;

• making progress towards the development of strong AI.

Such goals are of course long-term aims of the field of artificial intelligence as a whole, and cumulative learning is one approach complementary to other ideas. However, perhaps this thesis has convinced you that research into cumulative learning shows promise as a route towards such goals. In its favour, cumulative learning is clearly a relatively untapped field in which there are many interesting avenues still to explore.

Appendix A

Mathematical Conventions

I use the following miscellaneous conventions in this thesis:

• A natural number is an integer greater than or equal to one.

• |X| denotes the cardinality of a set X.

• The operator · applies a function to an argument, i.e. f · x ≡ f(x).

• The operator : is the ‘cons’ operator for lists. It prepends an element to the front of a list.

• The operator ++ appends two lists together.

• product(x) is the product of the elements of a list x.

• replicate(n, x) is the list of n instances of x.
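For readers who know Haskell, the list conventions above coincide with Prelude functions; the correspondence is noted here only for orientation (application f · x is written f x in Haskell):

    -- Each component echoes one convention above; evaluating 'examples' in
    -- GHCi yields ([1,2,3], [1,2,3,4], 24, "aaa").
    examples :: ([Int], [Int], Int, String)
    examples = ( 1 : [2, 3]          -- cons
               , [1, 2] ++ [3, 4]    -- append
               , product [2, 3, 4]   -- product of the elements
               , replicate 3 'a' )   -- list of n instances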

A.1 Set Comprehension Notation

Throughout this thesis I use set comprehension notation in the style of the formal language Z [Woodcock and Davies, 1996, Sect. 5.2]. Traditional mathematical set-builder notation is often difficult to use in a way that is both concise and unambiguous at the same time, and I have found Z’s set comprehension notation to be a vast improvement in this respect. Other authors such as Fokkinga [2006] have also advocated this use of Z-style set comprehension notation for mathematical writing.

The set comprehension notation in this thesis is the same as Z’s apart from a minor difference in use of symbols. I use / instead of : in order to declare a variable, and I use ¦ instead of | to indicate the start of the ‘predicate part’ of an expression. This is because the symbols : and | are already in use in this thesis as the list ‘cons’ operator and the set cardinality operator respectively.

165 The basic form of a set comprehension expression is:

{x / X ¦ p(x) • f(x)}

The broken vertical bar and the bullet divide the expression into three parts, a declaration part on the left, a predicate part in the middle, and a term part on the right. In the declaration part, the / symbol declares the variable x to be a member of the set X. The whole expression has the following meaning:

“The smallest set such that, for each x ∈ X where p(x) holds, f(x) is a member of the set.”

As an abbreviation, either the predicate part or the term part may be omitted, in which case the meaning is as follows:

{x / X • f(x)} = {x / X ¦ true • f(x)}

{x / X ¦ p(x)} = {x / X ¦ p(x) • x}

One may use pattern matching, or multiple declarations separated by semicolons, inside the declaration part. Here are a few examples:

{⟨x, y⟩ / {⟨1, 3⟩, ⟨2, 4⟩, ⟨5, 2⟩} • x + y} = {4, 6, 7}

{x / N; y / N ¦ x + y = 4 • ⟨x, y⟩} = {⟨1, 3⟩, ⟨2, 2⟩, ⟨3, 1⟩}
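These examples have near-direct analogues as Haskell list comprehensions (lists rather than sets, so order and duplicates matter; the bound [1 .. 3] is my finite stand-in for N):

    -- Analogue of {⟨x, y⟩ / {⟨1, 3⟩, ⟨2, 4⟩, ⟨5, 2⟩} • x + y}:
    sums :: [Int]
    sums = [ x + y | (x, y) <- [(1, 3), (2, 4), (5, 2)] ]          -- [4,6,7]

    -- Analogue of {x / N; y / N ¦ x + y = 4 • ⟨x, y⟩}:
    pairs :: [(Int, Int)]
    pairs = [ (x, y) | x <- [1 .. 3], y <- [1 .. 3], x + y == 4 ]  -- [(1,3),(2,2),(3,1)]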

Appendix B

Proof of Lemma 6.53

Lemma B.1. Let a, b be places and let t be a term such that b is a place in t and a is an improper ancestor of b. It follows that a is a place in t.

Proof. By structural induction on a:

1. a = []. The proof follows from Fact 6.6.

2. a = i : a′. By Defn. 6.15, b is of the form i : b′ where a′ is an improper ancestor of b′. By Fact 6.7 it holds that t is of the form f(t1 … tn) where n ≥ i, and b′ is a place in ti. By the inductive hypothesis, a′ is a place in ti, and hence the proof follows by Fact 6.7.

Lemma B.2. For places a, b, c such that a and b are both improper ancestors of c, it holds that:

a ≤ b =⇒ a is an improper ancestor of b

Proof. Assume LHS. By Defn. 6.15, a and b are both prefixes of c. Hence by Defn. 6.13, a is a prefix of b. RHS follows by Defn. 6.15.

Lemma B.3. If a shoot ⟨a, b, P⟩ contains a place x, then x is an improper descendant of a.

Proof. Consider each of the two cases of Defn. 6.33:

1. It is given that x is an improper descendant of a.

2. By Defn. 6.23, x is an improper descendant of b. It follows by Defn. 6.31 and Proposition 6.16 that x is also an improper descendant of a.

Lemma B.4. A non-empty shoot contains its own root.

Proof. Let ⟨a, b, P⟩ be a non-empty shoot. By Defn. 6.31, there are two cases to consider:

1. a = b. Since the shoot is non-empty, it must contain some place x which, by Defn. 6.33, is contained in the lasso ⟨a, P⟩. By Defn. 6.23, x is not an improper descendant of any element of P. Hence, by Lemma B.3 and Proposition 6.16, a is not an improper descendant of any element of P, and the proof then follows from Defn. 6.23 and Defn. 6.33.

2. a is a proper ancestor of b. By Proposition 6.16, a is an improper descendant of itself. Hence the proof follows from Defn. 6.33.

Lemma B.5. Given a shoot ψ = ⟨a, b, P⟩ and places x, y such that ψ contains y, a is an improper ancestor of x, and x is an improper ancestor of y, it follows that ψ contains x.

Proof. By Defn. 6.33 and Defn. 6.23, there are two cases to consider:

1. y is a proper ancestor of b. Therefore, by Proposition 6.16, x is a proper ancestor of b. Hence the proof follows by Defn. 6.33 and Lemma B.1.

2. y is an improper descendant of b and not an improper descendant of any element of P . By Proposition 6.16, it holds that x also is not an improper descendant of any element of P . Furthermore, by Lemma B.2, x is either an ancestor or an improper descendant of b. Therefore the proof follows from Defn. 6.33, Lemma B.1, and Defn. 6.23.

Lemma B.6. Given two shoots ψ1, ψ2 that overlap one another but do not share the same root, it follows that the root of ψ1 is either an ancestor or a descendant of the root of ψ2.

Proof. By Defn. 6.33, there exists some node x contained in both ψ1 and ψ2. Furthermore, by Lemma B.3, the roots of ψ1 and ψ2 are both improper ancestors of x. Hence by Lemma B.2, either the root of ψ1 is an ancestor of the root of ψ2 or vice-versa, depending on which comes first lexicographically.

Lemma B.7. Given two non-empty shoots ψ1, ψ2 that do not share the same root, it holds that ψ1 contains the root of ψ2 if and only if ψ1 overlaps ψ2 and the root of ψ1 is an ancestor of the root of ψ2.

Proof. We shall consider the forward implication followed by its converse. Let a be the root of ψ1 and let x be the root of ψ2, then proceed:

1. Suppose that ψ1 contains x. By Lemma B.4, ψ2 also contains x. By Defn. 6.33, ψ1 and ψ2 therefore overlap. Also, given that a ≠ x, it follows from Lemma B.3 that a is an ancestor of x.

2. Suppose that ψ1 and ψ2 overlap and a is an ancestor of x. By Defn. 6.33 there must exist some place y that is contained in both ψ1 and ψ2. By Lemma B.3, x is an improper ancestor of y. Given also that a is an ancestor of x, it follows by Lemma B.5 that ψ1 contains x.

Lemma B.8. For non-empty shoots ψ1, ψ2, ψ3 such that the root of ψ1 is an ancestor of the root of ψ2, the root of ψ2 is an ancestor of the root of ψ3, and ψ1 overlaps ψ3, it follows that ψ1 overlaps ψ2.

Proof. By Proposition 6.16 and Lemma B.7, ψ1 contains the root of ψ3. Therefore by Lemma B.5, ψ1 contains the root of ψ2. It follows by Lemma B.7 that ψ1 overlaps ψ2.

Proof of Lemma 6.53. Letting G be the directed overlap graph of S, consider in turn the two conditions of Defn. 6.43:

1. By Defn. 6.52, it holds for every vertex x in G that, lexicographically, the root of x comes strictly before the root of every successor of x. If a directed cycle existed in G, then there would exist a vertex that is its own successor, and hence whose root comes strictly before itself lexicographically. This is impossible; therefore G has no directed cycles.

2. Suppose that there exist vertices a, b, c in G such that a ≠ b, a → c, and b → c. Without loss of generality, assume that the root of a comes before the root of b lexicographically. By Defn. 6.52 and Lemma B.6, the roots of a and b are both ancestors of the root of c. Therefore, by Lemma B.2, the root of a is an ancestor of the root of b. Also, by Lemma B.7, a contains the root of c, and furthermore by Lemma B.5, a contains the root of b. Therefore, by Lemma B.7, a overlaps b, so it follows from Defn. 6.52 that a → b.

Bibliography

H. Abelson and G. J. Sussman. Structure and Interpretation of Computer Programs. MIT Press, Cambridge, Massachusetts, 2nd edition, 1996.

T. Allwood, C. Cadar, and S. Eisenbach. High coverage testing of Haskell programs. In Proceedings of the 2011 International Symposium on Software Testing and Analysis, pages 375–385. ACM, 2011.

S. Antoy and M. Hanus. Functional logic programming. Communications of the ACM, 53(4):74–85, 2010.

L. Augusteijn. Sorting morphisms. In 3rd International Summer School on Advanced Functional Programming, volume 1608 of LNCS, pages 1–27. Springer-Verlag, 1998.

M. Bain and S. Muggleton. Non-monotonic learning. In D. Michie, editor, Machine Intelligence, volume 12, pages 105–120. Oxford University Press, 1991.

A. Bakewell and C. Runciman. Automated generalisation of function definitions. In 4th Fuji International Symposium on Functional and Logic Programming (FLOPS ’99), volume 1722 of LNCS, pages 225–240. Springer-Verlag, 1999.

H. P. Barendregt. The Lambda Calculus: Its Syntax and Semantics. Elsevier, revised edition, 1984.

P. Bartha. Analogy and analogical reasoning. In E. N. Zalta, editor, The Stanford Encyclopedia of Philosophy. The Metaphysics Research Lab, Stanford University, fall 2013 edition, 2013.

Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.

A. W. Biermann. The inference of regular LISP programs from examples. IEEE Transactions on Systems, Man, and Cybernetics, 8(8):585–600, 1978.

C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

N. Bostrom. The superintelligent will: Motivation and instrumental rationality in advanced artificial agents. Minds and Machines, 22(2):71–85, 2012.

A. F. Bowers, C. Giraud-Carrier, C. Kennedy, J. W. Lloyd, and R. MacKinney-Romero. A framework for higher-order inductive machine learning. In Proceedings of the CompulogNet Area Meeting on Representation Issues in Reasoning and Learning, 1997.

I. Bratko. Refining complete hypotheses in ILP. In Inductive Logic Programming, volume 1634 of Lecture Notes in Computer Science, pages 44–55. Springer, 1999.

J. Burghardt. E-generalization using grammars. Artificial Intelligence, 165(1): 1–35, 2005.

R. M. Burstall and J. Darlington. A transformation system for developing recursive programs. Journal of the ACM, 24(1):44–67, 1977.

A. Church. A formulation of the simple theory of types. Journal of Symbolic Logic, 5(2):56–68, 1940.

D. Corapi, A. Russo, and E. Lupu. Inductive logic programming as abductive search. In Technical Communications of the 26th International Conference on Logic Programming, 2010, pages 54–63, 2010.

J. Davis and P. Domingos. Deep transfer via second-order Markov logic. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 217–224. ACM, 2009.

N. G. de Bruijn. Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the Church-Rosser theorem. Indagationes Mathematicae, 34(5):381–392, 1972.

M. Denecker and A. Kakas. Abduction in logic programming. In Computational Logic: Logic Programming and Beyond, volume 2407 of LNAI, pages 402–436. Springer, 2002.

B. Dolsak and S. H. Muggleton. The application of inductive logic programming to finite element mesh design. In S. H. Muggleton, editor, Inductive Logic Programming, pages 453–472. Academic Press, London, 1992.

P. Domingos, S. Kok, H. Poon, M. Richardson, and P. Singla. Unifying logical and statistical AI. In Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1, pages 2–7. AAAI Press, 2006.

G. Dowek. Higher-order unification and matching. In Handbook of Automated Reasoning, pages 1009–1062. Elsevier, 2001.

M. Felleisen, R. B. Findler, M. Flatt, and S. Krishnamurthi. How to Design Programs. MIT Press, 2001.

C. Feng. Inducing temporal fault diagnostic rules from a qualitative model. In S. H. Muggleton, editor, Inductive Logic Programming, pages 473–493. Academic Press, London, 1992.

C. Feng and S. H. Muggleton. Towards inductive generalization in higher order logic. In Proceedings of the Ninth International Workshop on Machine Learning, pages 154–162. Morgan Kaufmann, 1992.

P. A. Flach. Predicate invention in inductive data engineering. In P. B. Brazdil, editor, Machine Learning: ECML-93, volume 667 of Lecture Notes in Computer Science, pages 83–94. Springer, 1993.

P. Flener. Predicate invention in inductive program synthesis. Technical report, Dept of CEIS, Bilkent University, Turkey, 1995.

P. Flener. Inductive logic program synthesis with DIALOGS. In S. Muggleton, editor, Inductive Logic Programming, volume 1314 of Lecture Notes in Computer Science, pages 175–198. Springer, 1997.

M. M. Fokkinga. Z-style notation for probabilities. In Second Twente Data Management Workshop, TDM: Uncertainty in Databases, pages 19–24. Centre for Telematics and Information Technology, University of Twente, 2006.

J. Fox and C. Shulman. Superintelligence does not imply benevolence. In K. Mainzer, editor, ECAP10: VIII European Conference on Computing and Philosophy, 2010.

B. A. Frigyik, A. Kapila, and M. R. Gupta. Introduction to the Dirichlet distribution and related processes. Technical Report UWEETR-2010-0006, Dept of Electrical Engineering, University of Washington, Seattle, WA, 2010.

F. Gavril. Algorithms for minimum coloring, maximum clique, minimum covering by cliques, and maximum independent set of a chordal graph. SIAM J. Comput., 1(2):180–187, 1972.

S. J. Gershman and D. M. Blei. A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56:1–12, 2012.

R. Henderson. Incremental learning in inductive programming. In Approaches and Applications of Inductive Programming, volume 5812 of LNCS, pages 74–92. Springer, 2010.

R. J. Henderson and S. H. Muggleton. Automatic invention of functional abstractions. In Latest Advances in Inductive Logic Programming. Imperial College Press, 2012. To appear.

J. Hernandez-Orallo and M. J. Ramirez-Quintana. A strong complete schema for inductive functional logic programming. In Inductive Logic Programming, volume 1634 of LNAI, pages 116–127. Springer-Verlag, 1999.

P. Hudak. The Haskell School of Expression: Learning Functional Programming through Multimedia. Cambridge University Press, 2000.

G. P. Huet. A unification algorithm for typed λ-calculus. Theoretical Computer Science, 1(1):27–57, 1975.

M. Hutter. One decade of universal artificial intelligence. In Theoretical Foundations of Artificial General Intelligence, pages 67–88. Atlantis Press, 2012.

K. Inoue. Induction as consequence finding. Machine Learning, 55:109–135, 2004.

S. Katayama. Systematic search for lambda expressions. In Revised Selected Papers from the Sixth Symposium on Trends in Functional Programming, TFP 2005, volume 6, pages 111–126. Intellect, 2007.

K. Khan, S. Muggleton, and R. Parson. Repeat learning using predicate invention. In Inductive Logic Programming, volume 1446 of LNCS, pages 165–174. Springer, 1998.

R. D. King, S. H. Muggleton, A. Srinivasan, and M. J. E. Sternberg. Structure-activity relationships derived by machine learning: The use of atoms and their bond connectivities to predict mutagenicity by inductive logic programming. Proceedings of the National Academy of Sciences, 93:438–442, 1996.

E. Kitzelmann. Inductive programming: A survey of program synthesis techniques. In Proceedings of the ACM SIGPLAN Workshop on Approaches and Applications of Inductive Programming (AAIP 2009), pages 17–28, 2009a.

E. Kitzelmann. Analytical inductive functional programming. In Logic-Based Program Synthesis and Transformation, volume 5438 of LNCS, pages 87–102. Springer, 2009b.

I. Kononenko and M. Kukar. Machine Learning and Data Mining. Horwood Publishing Limited, 2007.

G. Leban, J. Zabkar, and I. Bratko. An experiment in robot discovery with ILP. In Inductive Logic Programming, volume 5194 of Lecture Notes in Computer Science, pages 77–90. Springer, 2008.

S. Legg and M. Hutter. A formal measure of machine intelligence. In Proc. 15th Annual Machine Learning Conference of Belgium and The Netherlands, pages 73–80, 2006.

L. A. Levin. Universal sequential search problems. Problems of Information Transmission, 9(3):265–266, 1973.

J. W. Lloyd. Programming in an integrated functional and logic language. Journal of Functional and Logic Programming, 1999(3), 1999.

H. R. Maei and R. S. Sutton. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Proceedings of the Third Conference on Artificial General Intelligence, 2010.

Z. Manna and R. Waldinger. A deductive approach to program synthesis. ACM Transactions on Programming Languages and Systems, 2(1):90–121, 1980.

T. Matsushita. Expressive Power of Declarative Programming Languages. PhD thesis, University of York, 1998.

P. Michelucci and D. Oblinger. Cumulative learning. In C. Sammut and G. Webb, editors, Encyclopedia of Machine Learning, pages 249–257. Springer, 2010.

T. M. Mitchell. The need for biases in learning generalizations. Technical Report CBM-TR-117, Rutgers University, New Brunswick, NJ., 1980.

D. Moore and A. Danyluk. Deep transfer as structure learning in Markov logic networks. In Workshops at the Twenty-Fourth AAAI Conference on Artificial Intelligence. AAAI Press, 2010.

L. Muehlhauser and A. Salamon. Intelligence explosion: Evidence and import. In Singularity Hypotheses, The Frontiers Collection, pages 15–42. Springer, 2012.

S. Muggleton. Duce, an oracle based approach to constructive induction. In IJCAI-87, pages 287–292, 1987.

S. Muggleton and W. Buntine. Machine invention of first-order predicates by inverting resolution. In Proceedings of the 5th International Conference on Machine Learning, pages 339–352. Morgan Kaufmann, 1988.

S. Muggleton, L. de Raedt, D. Poole, I. Bratko, P. Flach, K. Inoue, and A. Srinivasan. ILP turns 20. Machine Learning, 86(1):3–23, 2012a.

S. Muggleton, D. Lin, and A. Tamaddoni-Nezhad. MC-TopLog: Complete multi-clause learning guided by a top theory. In Proceedings of the 21st International Conference on Inductive Logic Programming, volume 7207 of LNAI, pages 238–254. Springer, 2012b.

S. H. Muggleton. Inductive logic programming. New Generation Computing, 8(4):295–318, 1991.

S. H. Muggleton. Inverse entailment and Progol. New Generation Computing, 13:245–286, 1995.

S. H. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 19,20:629–679, 1994.

S. H. Muggleton and C. Feng. Efficient induction of logic programs. In Proceed- ings of the First Conference on Algorithmic Learning Theory, pages 368–381. Ohmsha, 1990.

S. H. Muggleton, J. C. A. Santos, and A. Tamaddoni-Nezhad. TopLog: ILP using a logic program declarative bias. In Proceedings of the International Conference on Logic Programming 2008, volume 5366 of LNCS, pages 687–692. Springer, 2008.

S. H. Muggleton, D. Lin, N. Pahlavi, and A. Tamaddoni-Nezhad. Meta-interpretive learning: application to grammatical inference. In Proceedings of the 22nd International Conference on Inductive Logic Programming, 2012c.

R. Mullins. The Number Puzzler. Tarquin Publications, 2008.

V. Nasser. Sudoku and Number Sequences. Lulu.com, 2005.

A. Niculescu-Mizil and R. Caruana. Inductive transfer for Bayesian network structure learning. JMLR Workshop and Conference Proceedings, 2:339–346, 2007.

S. Nienhuys-Cheng and R. de Wolf. Foundations of Inductive Logic Programming, volume 1228 of LNAI. Springer-Verlag, 1997.

J. R. Olsson. Inductive functional programming using incremental program transformation. Artificial Intelligence, 74(1):55–83, 1995.

S. L. Peyton Jones. The Implementation of Functional Programming Languages. Prentice Hall, 1987.

S. L. Peyton Jones. Haskell 98 Language and Libraries: The Revised Report. Cambridge University Press, 2003.

B. C. Pierce. Types and Programming Languages. MIT Press, Cambridge, Massachusetts, 2002.

G. D. Plotkin. A note on inductive generalization. In Machine Intelligence, volume 5, pages 153–163. Edinburgh University Press, 1969.

G. D. Plotkin. A further note on inductive generalization. In Machine Intelligence, volume 6, pages 101–124. Edinburgh University Press, 1971.

J. R. Quinlan. Learning logical definitions from relations. Machine Learning, 5:239–266, 1990.

R. Raina, A. Y. Ng, and D. Koller. Constructing informative priors using transfer learning. In Proceedings of the 23rd International Conference on Machine Learning, pages 713–720. ACM, 2006.

J. C. Reynolds. Transformational systems and the algebraic structure of atomic formulas. In Machine Intelligence, volume 5, pages 135–151. Edinburgh University Press, 1969.

D. J. Rose, R. E. Tarjan, and G. S. Lueker. Algorithmic aspects of vertex elimination on graphs. SIAM Journal on Computing, 5(2):266–283, 1976.

C. Rouveirol. Extensions of inversion of resolution applied to theory completion. In S. H. Muggleton, editor, Inductive Logic Programming, pages 63–92. Academic Press, London, 1992.

C. Runciman, M. Naylor, and F. Lindblad. SmallCheck and Lazy SmallCheck: automatic exhaustive testing for small values. In Haskell '08: Proceedings of the First ACM SIGPLAN Symposium on Haskell, pages 37–48. ACM, 2008.

S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 3rd edition, 2010.

U. Schmid, U. Sinha, and F. Wysotzki. Program reuse and abstraction by anti-unification. In Professionelles Wissensmanagement – Erfahrungen und Visionen, pages 183–185. Shaker, 2001.

J. Schmidhuber. Optimal ordered problem solver. Machine Learning, 54(3):211–254, 2004.

J. Schmidhuber, J. Zhao, and M. Wiering. Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning, 28(1):105–130, 1997.

J. Schmidhuber, D. Ciresan, U. Meier, J. Masci, and A. Graves. On fast deep nets for AGI vision. In Proceedings of the Fourth Conference on Artificial General Intelligence (AGI-11), 2011.

E. Y. Shapiro. Algorithmic Program Debugging. MIT Press, 1983.

R. J. Solomonoff. A formal theory of inductive inference, parts 1 and 2. Information and Control, 7:1–22 and 224–254, 1964.

I. Stahl. Predicate invention in inductive logic programming. In L. De Raedt, editor, Advances in Inductive Logic Programming, pages 34–47. IOS Press, 1996.

L. Sterling and E. Shapiro. The Art of Prolog. MIT Press, Cambridge, Massachusetts, 2nd edition, 1994.

J. E. Stoy. Denotational Semantics. MIT Press, 1981.

T. Streicher. Domain-Theoretic Foundations of Functional Programming. World Scientific, 2006.

P. D. Summers. A methodology for LISP program construction from examples. Journal of the ACM, 24(1):161–175, 1977.

H. Tolley and K. Thomas. How To Pass Numeracy Tests. Kogan Page, 3rd edition, 2006.

M. Turcotte, S. H. Muggleton, and M. J. E. Sternberg. Automated discovery of structural signatures of protein fold and function. Journal of Molecular Biology, 306:591–605, 2001.

D. J. Ward, A. F. Blackwell, and D. J. C. MacKay. Dasher – a data entry interface using continuous gestures and language models. In Proceedings of the 13th Annual ACM Symposium on User Interface Software and Technology, pages 129–137. ACM Press, 2000.

S. Weller and U. Schmid. Analogy by abstraction. In Proceedings of the 7th International Conference on Cognitive Modeling, pages 334–339, 2006.

J. Wogulis and P. Langley. Improving efficiency by learning intermediate concepts. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, pages 657–662. Morgan Kaufmann, 1989.

J. Woodcock and J. Davies. Using Z: Specification, Refinement, and Proof. Prentice Hall, 1996.

S. Wrobel. Concept formation during interactive theory revision. Machine Learning, 14(2):169–191, 1994.

R. V. Yampolskiy. Artificial intelligence safety engineering: Why machine ethics is a wrong approach. In Philosophy and Theory of Artificial Intelligence, volume 5 of Studies in Applied Philosophy, Epistemology and Rational Ethics, pages 389–396. Springer, 2013.

E. Yudkowsky. Artificial intelligence as a positive and negative factor in global risk. In N. Bostrom and M. M. Cirkovic, editors, Global Catastrophic Risks, pages 308–345. Oxford University Press, 2008.