Logic-Based Machine Learning
Chapter

LOGIC-BASED MACHINE LEARNING

Stephen Muggleton and Flaviu Marginean
Department of Computer Science, University of York, Heslington, York, United Kingdom

Abstract: The last three decades have seen the development of Computational Logic techniques within Artificial Intelligence. This has led to the development of the subject of Logic Programming (LP), which can be viewed as a key part of Logic-Based Artificial Intelligence. The subtopic of LP concerned with Machine Learning is known as Inductive Logic Programming (ILP), which again can be broadened to Logic-Based Machine Learning by dropping Horn clause restrictions. ILP has its roots in the groundbreaking research of Gordon Plotkin and Ehud Shapiro. This paper provides a brief survey of the state of ILP applications, theory and techniques.

Keywords: Inductive logic programming, machine learning, scientific discovery, protein prediction, learning of natural language

INTRODUCTION

As envisaged by John McCarthy (McCarthy), Logic has turned out to be one of the key knowledge representations for Artificial Intelligence research. Logic, in the form of the First-Order Predicate Calculus, provides a representation for knowledge which has a clear semantics together with well-studied, sound rules of inference. Interestingly, Turing (Turing) and McCarthy (McCarthy) viewed a combination of Logic and Learning as being central to the development of Artificial Intelligence research. This article is concerned with the topic of Logic-Based Machine Learning (LBML). The modern study of LBML has largely been involved with the study of the learning of logic programs, or Inductive Logic Programming (ILP) (Muggleton). Many of the foundational results of both Gordon Plotkin (Plotkin) and Ehud Shapiro (Shapiro) are still central to the study and conceptual framework of ILP. This is true despite the fact that Plotkin did not employ Horn clause restrictions. ILP is a general form of Machine Learning which involves the construction of logic programs from examples and background
knowledge. The subject has inherited its logical tradition from Logic Programming and its empirical orientation from Machine Learning.

Robert Kowalski (Kowalski) famously described Logic Programming (LP) with the following equation:

    LP = Logic + Control

The equation emphasizes the role of the programmer in providing sequencing control when writing Prolog programs. In a similar spirit we might describe ILP as follows:

    ILP = Logic + Statistics + Computational Control

The logical part of ILP is related to the formation of hypotheses, while the statistical part is related to evaluating their degree of belief. As in LP, the Computational Control part is related to the sequencing of the search carried out when exploring the space of hypotheses.

The structure of the paper is as follows. The next section introduces the key elements of ILP, provides a formal framework for the later discussion, and discusses how Bayesian inference is used as a preference mechanism. A subsequent section describes applications of ILP to problems related to the discovery of biological function. ILP also has strong potential for application to Natural Language Processing (NLP); the resultant research area of Learning Language in Logic (LLL) is then described, together with some encouraging preliminary results. Conclusions concerning ongoing ILP research are given in the final section.

ILP

ILP algorithms take examples E of a concept (such as a protein family), together with background knowledge B (such as a definition of molecular dynamics), and construct a hypothesis h which explains E in terms of B. For example, in the protein fold domain, E might consist of descriptions of molecules separated into positive and negative examples of a particular fold (overall protein shape). This is exemplified in Figure 1 for the fold "four-helical up-and-down bundle". A possible hypothesis h describing this class of proteins is shown in Figure 2. The hypothesis is a definite clause consisting of a head (fold) and a body (the conjunction length ∧ … ∧
helix). In this case fold is the predicate involved in the examples and hypothesis, while length, position, etc. are defined by the background knowledge. A logic program is simply a set of such definite clauses. Each of E, B and h is a logic program.

In the context of knowledge discovery, a distinct advantage of ILP over black-box techniques such as neural networks is that a hypothesis such as that shown in Figure 2 can, in a straightforward manner, be made readable by translating it into the following piece of English text:

    The protein P has fold class "Four-helical up-and-down bundle" if it contains a long helix H1 at a secondary structure position between … and …, and H1 is followed by a second helix H2.

Such explicit hypotheses are required for experts to be able to use them within the familiar human scientific-discovery cycle of debate, criticism and refutation.

Figure 1. A positive and a negative example of the protein fold "four-helical up-and-down bundle" (2mhr, a four-helical up-and-down bundle, and 1omd, an EF-hand). The arrangement of secondary structure units is shown for helices (cylinders) and sheets (arrows). Each secondary structure unit is labelled according to the index of its first and last amino acid residue.

Figure 2. An hypothesised definite clause for four-helical up-and-down bundles:

    fold('Four-helical up-and-down bundle', P) ←
        helix(P, H1) ∧ length(H1, hi) ∧ position(P, H1, Pos) ∧
        interval(… ≤ Pos ≤ …) ∧ adjacent(P, H1, H2) ∧ helix(P, H2)

FORMAL FRAMEWORK FOR ILP

The normal framework for ILP (Muggleton and De Raedt; Nienhuys-Cheng and de Wolf) is as follows. As exemplified by the protein folds problem described in the last subsection, the learning system is provided with background knowledge B, positive examples E⁺ and negative examples E⁻, and constructs an hypothesis h. B, E⁺, E⁻ and h are each logic programs. A logic program (Lloyd) is a set of
definite clauses, each having the form

    h ← b₁, …, bₙ

where h is an atom and b₁, …, bₙ are atoms. Usually E⁺ and E⁻ consist of ground clauses, those for E⁺ being definite clauses with empty bodies and those for E⁻ being clauses with head false and a single ground atom in the body. In the text below the logical symbols used are: ∧ (logical and), ∨ (logical or), ⊨ (logical entailment) and □ (falsity).

The conditions for the construction of h are as follows:

    Necessity:           B ⊭ E⁺
    Sufficiency:         B ∧ h ⊨ E⁺
    Weak consistency:    B ∧ h ⊭ □
    Strong consistency:  B ∧ h ∧ E⁻ ⊭ □

Note that neither sufficiency nor strong consistency is required for systems that deal with noise. The four conditions above capture all the logical requirements of an ILP system. However, for any B and E there will generally be many h's which satisfy these conditions. Statistical preference is often used to distinguish between these hypotheses (see the Bayesian framework section below). Both necessity and consistency can be checked using a theorem prover. Given that all formulae involved are definite, the theorem prover used need be nothing more than a Prolog interpreter, with some minor alterations such as iterative deepening to ensure logical completeness.

DERIVING ALGORITHMS FROM THE SPECIFICATION OF ILP

The sufficiency condition captures the notion of generalizing examples relative to background knowledge. A theorem prover cannot be directly applied to derive h from B and E⁺. However, by a simple application of the Deduction Theorem, the sufficiency condition can be rewritten as follows:

    Sufficiency′:  B ∧ ¬E⁺ ⊨ ¬h

This simple alteration has a profound effect. The negation of the hypothesis can now be deductively derived from the negation of the examples together with the background knowledge. This is true no matter what form the examples take and what form the hypothesis takes. This approach of turning an inductive problem into one of deduction is called inverse entailment (Muggleton). Methods for ensuring completeness of inverse entailment have been a subject of debate recently (Yamamoto; Muggleton).
Figure 3. Prior over hypotheses. Hypotheses are enumerated in order of descending prior probability p(H) along the X-axis; the Y-axis is the probability of individual hypotheses. Vertical bars represent hypotheses consistent with E.

BAYESIAN FRAMEWORK

It is not sufficient to specify the ILP framework purely in terms of the logical relationships which must hold between E, B and h. For any given E and B there will be many, possibly infinitely many, choices for h. Thus a technique is needed for defining a preference over the various choices for h. One approach involves defining a Bayesian prior probability distribution over the learner's hypothesis space (Muggleton). This is illustrated in Figure 3. According to Bayes' theorem, the hypothesis h_MAP with maximum posterior probability in hypothesis space H is as follows:

    h_MAP = argmax_{h ∈ H} P(h | E)
          = argmax_{h ∈ H} P(E | h) P(h) / P(E)
          = argmax_{h ∈ H} P(E | h) P(h)

Within ILP, Bayesian approaches (Muggleton) have been used to investigate the problem of learning from positive examples only (Muggleton) and issues related to predicate invention (Khan et al.), a relational form of feature construction. Learning from positive examples and predicate invention are important both in natural language domains and in problems involving scientific discovery, as discussed in later sections.

DISCOVERY OF BIOLOGICAL FUNCTION

Understanding of a variety of metabolic processes is at the center of drug development within the pharmaceutical industry. Each new drug costs hundreds of millions of pounds to develop
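The Bayesian MAP selection rule described above can be made concrete with a small sketch. The following toy Python example is illustrative only, not taken from the chapter: the hypothesis names, prior values and the even-number domain are all assumptions. It treats P(E|h) as a 0/1 logical-consistency score (no noise), so the chosen hypothesis is the most probable one under the prior among those consistent with the examples, matching the picture in the prior-over-hypotheses figure.

```python
# Minimal sketch (assumed names and data, not from the chapter) of
# MAP hypothesis selection: pick h maximising P(E|h) * P(h).

def map_hypothesis(hypotheses, priors, examples):
    """Return the hypothesis with maximum P(E|h) * P(h).

    P(E|h) is taken to be 1 if h agrees with every labelled example
    (covers all positives, rejects all negatives) and 0 otherwise.
    """
    def likelihood(h):
        consistent = all(h(x) == label for x, label in examples)
        return 1.0 if consistent else 0.0

    return max(hypotheses, key=lambda h: likelihood(h) * priors[h])

# Toy domain: classify integers; the positive examples are even numbers.
examples = [(2, True), (4, True), (3, False)]

def h_any(x):   return True          # over-general: accepts everything
def h_even(x):  return x % 2 == 0    # fits the data exactly
def h_mult4(x): return x % 4 == 0    # over-specific: rejects the positive 2

# Prior decreasing with assumed hypothesis complexity/generality,
# as in the prior-over-hypotheses figure.
priors = {h_any: 0.5, h_even: 0.3, h_mult4: 0.2}

best = map_hypothesis([h_any, h_even, h_mult4], priors, examples)
print(best.__name__)  # prints h_even: the only consistent hypothesis here
```

Here the 0/1 likelihood enforces the logical sufficiency and consistency conditions, while the prior breaks ties among the surviving hypotheses; a noise-tolerant system would instead use a graded P(E|h).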