Probabilistic Logic Learning
Luc De Raedt and Kristian Kersting
Institut für Informatik, Albert-Ludwigs-University,
Georges-Koehler-Allee, Building 079, D-79110 Freiburg, Germany

ABSTRACT
The past few years have witnessed a significant interest in probabilistic logic learning, i.e. in research lying at the intersection of probabilistic reasoning, logical representations, and machine learning. A rich variety of different formalisms and learning techniques have been developed. This paper provides an introductory survey and overview of the state-of-the-art in probabilistic logic learning through the identification of a number of important probabilistic, logical and learning concepts.

Figure 1: Probabilistic Logic Learning as the intersection of Probability, Logic, and Learning.

Keywords
data mining, machine learning, inductive logic programming, multi-relational data mining, uncertainty, probabilistic reasoning

1. INTRODUCTION
One of the central open questions in data mining and artificial intelligence concerns probabilistic logic learning, i.e. the integration of relational or logical representations and probabilistic reasoning mechanisms with machine learning and data mining principles. In the past few years, this question has received a lot of attention and various approaches have been developed, cf. [82; 41; 88; 66; 73; 45; 60; 61; 57; 81; 52; 77; 2; 55; 87]. This paper provides an introductory survey and overview of those developments that lie at the intersection of logical (or relational) representations, probabilistic reasoning and learning, cf. Figure 1. While doing so, we identify some of the key concepts and techniques underlying probabilistic logic learning. This, in turn, we hope, should allow the reader to appreciate the differences and commonalities between the various approaches and at the same time provide insight into some of the remaining challenges in probabilistic logic learning.

Before starting our survey, let us specify what we mean by probabilistic logic learning. The term probabilistic in our context refers to the use of probabilistic representations and reasoning mechanisms grounded in probability theory, such as Bayesian networks [78; 10; 47], hidden Markov models [85; 84; 22] and stochastic grammars [22; 64]. Such representations have been successfully used across a wide range of applications and have resulted in a number of robust models for reasoning about uncertainty. Application areas include genetics, computer vision, speech recognition and understanding, diagnosis and troubleshooting, information retrieval, software debugging, data mining and user modelling.

The term logic in the present overview refers to first order logical and relational representations such as those studied within the field of computational logic. The primary advantage of using first order logic or relational representations is that they allow one to elegantly represent complex situations involving a variety of objects as well as relations between the objects. Consider e.g. the problem of building a logical troubleshooting system for a set of computers. Computers are complex objects which are themselves composed of several complex objects. Furthermore, assume that even though the overall structure of the computers can be the same, their individual components can be highly different. Such situations can be elegantly modelled using relational or first order logic representations. E.g., we could use the relation ram(r1, c1) to specify that computer c1 has a RAM component of type r1, and cpu(p2, c1) and cpu(p3, c1) to denote that c1 is a multi-processor machine with two cpu's p2 and p3. In addition to these base (or extensional) relations, one can also add virtual (or intensional) relations to describe generic knowledge about the domain. E.g., consider the rule

multiprocessor(C) :- cpu(P1, C), cpu(P2, C), P1 ≠ P2

which defines the concept of a multi-processor machine (as a machine that possesses two different cpu's). Representing the same information in a propositional logic would require one to specify a separate model for each machine.
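Written out as a small logic program, the example looks as follows. This is a minimal sketch: the facts for a second, single-processor machine c2 are hypothetical additions made purely for illustration.

% Extensional (base) relations: facts about concrete machines.
ram(r1, c1).    % computer c1 has a RAM component of type r1
cpu(p2, c1).    % c1 has cpu p2
cpu(p3, c1).    % c1 has cpu p3
cpu(p4, c2).    % hypothetical single-processor machine c2

% Intensional (virtual) relation: generic domain knowledge.
% \= is Prolog's inequality test, written ≠ in the rule above.
multiprocessor(C) :- cpu(P1, C), cpu(P2, C), P1 \= P2.

Loaded into a Prolog system, the query ?- multiprocessor(c1). succeeds, whereas ?- multiprocessor(c2). fails, since c2 has only one cpu.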
The term learning in the context of probabilistic logic refers to deriving the different aspects of the probabilistic logic on the basis of data. Typically, one distinguishes various learning algorithms on the basis of the given data (fully or partially observable variables) or on the aspect being learned (the parameters of the probabilistic representation or its logical structure). The motivation for learning is that it is often easier to obtain data for a given application domain and learn the model than to build the model using traditional knowledge engineering techniques.

So, probabilistic logic learning aims at combining its three underlying constituents: learning and probabilistic reasoning within first order logic representations. The recent interest in probabilistic logic learning may be explained by the growing body of work that has addressed pairwise intersections of these domains:

- Various techniques for probabilistic learning, such as gradient-based methods, the family of EM algorithms and Markov chain Monte Carlo methods, have been developed and exhaustively investigated in different communities, e.g. in the Uncertainty in AI community for Bayesian networks and in the Computational Linguistics community for hidden Markov models. These techniques are not only theoretically sound; they have also resulted in entirely new technologies for, and revolutionary novel products in, computer vision, speech recognition, medical diagnostics, troubleshooting systems, etc. Overviews of and introductions to probabilistic learning can be found in [20; 84; 44; 8; 48; 22; 64].

- Inductive Logic Programming and multi-relational data mining studied logic learning, i.e. learning and data mining within first order logical or relational representations. Inductive Logic Programming has significantly broadened the application domain of data mining, especially in bio- and chemo-informatics (cf. the other papers in this volume), and now provides some of the best-known examples of scientific discovery by AI systems in the literature. Overviews of inductive logic learning and multi-relational data mining can be found in this volume and in [69; 23].

- Early work on incorporating probabilities into logic programs was done by Clark and McCabe [9] and by Shapiro [97]. Subsequently, researchers such as Nilsson [75], Halpern [43], Ng and Subrahmanian [71], and Poole [82] have studied probabilistic logics from a knowledge representational perspective. The aim of this research is more a probabilistic characterization of logic than suitable representations for learning.

Finally, in the past few years there have been some important attempts that address the full problem of probabilistic logic learning [59; 30; 67; 31; 14; 53; 51; 92; 68].

The goal of this paper is to provide an introduction and overview to these works. In doing so, we restrict our attention to those developments that address the intersection of probabilistic logic learning, cf. Figure 1. Consequently, we use first order logic rather than the relational model: the relational model is not expressive enough to model the logical component of many of the probabilistic logical representations (e.g., [88; 66; 90; 52]). Nevertheless, we have attempted to be self-contained and to reduce the required logical machinery and background as much as possible.

The paper is organized as follows: Sections 2 and 3 introduce the key concepts underlying computational logic and probabilistic reasoning formalisms, respectively; Section 4 then studies first-order probabilistic logics, i.e., how probabilistic reasoning and computational logic can be combined; afterwards, Section 5 surveys various approaches to learning within those probabilistic logics; finally, in Section 6, we conclude.

2. INTRODUCTION TO LOGIC
As logic programs will be used throughout this paper as the underlying knowledge representation framework, we now briefly introduce their key concepts. Afterwards, we show how state-of-the-art probabilistic representations employ these concepts.

2.1 Logic
The choice of logic programs as the underlying representation is motivated by their expressive power and generality. Indeed, logic programs allow us to represent relational databases, grammars and also programs. This will turn out to be useful when discussing commonalities and differences among the probabilistic relational or logical representations later on. In addition, logic programs are well-understood, have a clear semantics and are widespread.

Three example logic programs are given in Figures 2, 3 and 4.b. The first one specifies some regularities in the alarm program (inspired by Judea Pearl's famous example for Bayesian networks):

alarm :- burglary, earthquake.
johncalls :- alarm.
marycalls :- alarm.

Figure 2: The alarm program.

The genetic program is a simple model of Mendel's law applied to the color of plants. Finally, the grammar example encodes a context-free grammar. The corresponding type of logic program is known under the term "definite clause grammar".
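To give a flavour of these two programs, here is a minimal sketch using the predicates introduced below; the concrete clauses and the plants m, f and c are illustrative assumptions of ours, not necessarily the exact contents of Figures 3 and 4.b.

% Mendelian inheritance of plant color: mc/2 and pc/2 give the
% color gene a plant receives from its mother and father, and
% pt/2 the observable color; red (r) is assumed dominant over
% green (g).
mother(m, c).  father(f, c).             % hypothetical plants
mc(m, g). pc(m, r). mc(f, g). pc(f, g).  % founder genes
mc(P, X) :- mother(M, P), mc(M, X).
mc(P, X) :- mother(M, P), pc(M, X).
pc(P, X) :- father(F, P), mc(F, X).
pc(P, X) :- father(F, P), pc(F, X).
pt(P, green) :- mc(P, g), pc(P, g).
pt(P, red) :- mc(P, r).
pt(P, red) :- pc(P, r).

% A definite clause grammar encoding the context-free rules
% S -> NP VP, NP -> john, VP -> sleeps; the two arguments of
% each predicate are difference lists of words.
s(S1, S3) :- np(S1, S2), vp(S2, S3).
np([john|S], S).
vp([sleeps|S], S).

Here ?- s([john, sleeps], []). succeeds exactly when the word list is a sentence of the grammar, and both ?- pt(c, red). and ?- pt(c, green). are derivable for the heterozygous plant c; it is exactly this kind of non-determinism that the probabilistic extensions surveyed later attach probabilities to.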
Let us now introduce some terminology. The predicates in the above examples include alarm/0, burglary/0, earthquake/0, ..., pc/2, mc/2, pt/2, mother/2, father/2, s/2, vp/2, ... Their arity, i.e. the number of arguments they take, is listed behind the predicate name.