Interpreting Bayesian Logic Programs
From: AAAI Technical Report WS-00-06. Compilation copyright © 2000, AAAI (www.aaai.org). All rights reserved.

Kristian Kersting and Luc De Raedt and Stefan Kramer
Institute for Computer Science, Machine Learning Lab
University of Freiburg, Am Flughafen 17, D-79110 Freiburg i. Brg., Germany
{kersting, deraedt, skramer}@informatik.uni-freiburg.de

Abstract

Various proposals for combining first order logic with Bayesian nets exist. We introduce the formalism of Bayesian logic programs, which is basically a simplification and reformulation of Ngo and Haddawy's probabilistic logic programs. However, Bayesian logic programs are sufficiently powerful to represent essentially the same knowledge in a more elegant manner. The elegance is illustrated by the fact that they can represent both Bayesian nets and definite clause programs (as in "pure" Prolog) and that their kernel in Prolog is actually an adaptation of a usual Prolog meta-interpreter.

Introduction

A Bayesian net (Pearl 1991) specifies a probability distribution over a fixed set of random variables. As such, Bayesian nets essentially provide an elegant probabilistic extension of propositional logic. However, the limitations of propositional logic, which Bayesian nets inherit, are well known. These limitations motivated the development of knowledge representation mechanisms employing first order logic, such as logic programming and Prolog. In this context, it is no surprise that various researchers have proposed first order extensions of Bayesian nets, e.g. probabilistic logic programs (Ngo & Haddawy 1997), relational Bayesian nets (Jaeger 1997) and probabilistic relational models (Koller 1999). Many of these techniques employ the notion of Knowledge-Based Model Construction (KBMC) (Haddawy 1999), where first-order rules with associated uncertainty parameters are used as a basis for generating Bayesian nets for particular queries.

We tried to identify a formalism that is as simple as possible. While introducing Bayesian logic programming we employed one key design principle: the resulting formalism should be as close as possible both to Bayesian nets and to some well-founded first order logic knowledge representation mechanism, in our case "pure" Prolog programs. Any formalism designed according to this principle should be easily accessible and usable by researchers in both communities.

This paper is laid out as follows. The next three sections present our solution: Bayesian logic programs. Section 5 shows a kernel implementation of them in Prolog. In the following section we give suggestions for learning Bayesian logic programs.

We assume some familiarity with Prolog or logic programming (see e.g. (Sterling & Shapiro 1986)) as well as with Bayesian nets (see e.g. (Russell & Norvig 1995)).

Bayesian logic programs

Bayesian logic programs consist of two components. The first component is the logical one: a set of Bayesian clauses (cf. below) which captures the qualitative structure of the domain and is based on "pure" Prolog. The second component is the quantitative one: it encodes the quantitative information about the domain and employs, as in Bayesian nets, the notions of a conditional probability table (CPT) and a combining rule.

A Bayesian predicate is a predicate r to which a finite domain D_r is associated. We define a Bayesian definite clause as an expression of the form A | A1, ..., An, where A, A1, ..., An are atoms and all variables are (implicitly) universally quantified. When writing down Bayesian definite clauses, we closely follow Prolog notation (with the exception that Prolog's :- is replaced by |). So, variables start with a capital, and constant and functor symbols start with a lower-case character. The main difference between Bayesian and classical clauses is that Bayesian atoms represent classes of similar random variables. More precisely, each ground atom in a Bayesian logic program represents a random variable. Each random variable can take on various possible values from the (finite) domain D_r of the corresponding Bayesian predicate r. In any state of the world, a random variable takes exactly one value. E.g., we paraphrase that James' house is not burglarized with burglary(james) = false. Therefore, a logical predicate r is a special case of a Bayesian one with D_r = {true, false}. An example of a Bayesian definite clause inspired by (Ngo & Haddawy 1997) is burglary(X) | neighbourhood(X), where the domains are D_burglary = {true, false} and D_neighbourhood = {bad, average, good}. Roughly speaking, a Bayesian definite clause specifies that for each substitution θ that grounds the clause, the random variable Aθ depends on A1θ, ..., Anθ. For instance, let θ = {X ← james}; then the random variable burglary(james) depends on neighbourhood(james).

As for Bayesian nets, there is a table of conditional probabilities associated with each Bayesian definite clause:^1

    neighbourhood(X)    burglary(X) = true    burglary(X) = false
    bad                 0.6                   0.4
    average             0.4                   0.6
    good                0.3                   0.7

The CPT specifies our knowledge about the conditional probability distribution^2 P(Aθ | A1θ, ..., Anθ) for every ground instance θ of the clause. We assume total CPTs, i.e. for each tuple of values u ∈ D_A1 × ... × D_An the CPT specifies a distribution P(D_A | u). For this reason we write P(A | A1, ..., An) to denote the CPT associated with the Bayesian clause A | A1, ..., An. For instance, the above Bayesian definite clause and CPT together imply that P(burglary(james) = true | neighbourhood(james) = bad) = 0.6. Each Bayesian predicate is defined by a set of Bayesian definite clauses, e.g.

    alarm(X) | burglary(X).
    alarm(X) | tornado(X).
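To make the representation concrete, here is one possible Prolog-style encoding of such clauses and their CPTs. It is only an illustrative sketch, not the paper's kernel (which appears later, in Section 5); the predicate names bayesian_clause/2, domain/2 and cpt/3 are hypothetical.

    % Hypothetical encoding: one fact per Bayesian definite clause Head | Body.
    bayesian_clause(burglary(X), [neighbourhood(X)]).
    bayesian_clause(alarm(X),    [burglary(X)]).
    bayesian_clause(alarm(X),    [tornado(X)]).

    % domain(Predicate, Values): the finite domain D_r of a Bayesian predicate r.
    domain(neighbourhood, [bad, average, good]).
    domain(burglary,      [true, false]).

    % cpt(Head, BodyValues, Distribution): one CPT row, giving a distribution
    % over the head's domain as Value-Probability pairs.
    cpt(burglary(_), [bad],     [true-0.6, false-0.4]).
    cpt(burglary(_), [average], [true-0.4, false-0.6]).
    cpt(burglary(_), [good],    [true-0.3, false-0.7]).

Under such an encoding, a ground Bayesian fact such as neighbourhood(james) would simply be stored with an empty body, e.g. bayesian_clause(neighbourhood(james), []).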
If a ground atom A is influenced directly (see below) only by the ground atoms of the body of one ground instance of one clause, then the associated CPT specifies a conditional probability distribution over A given the atoms of the body. But if there is more than one ground instance of a rule with A as head, we have multiple conditional probability distributions over A; in particular, this is the normal situation if a Bayesian atom is defined by several clauses. E.g., given the clauses for alarm, the random variable alarm(james) depends on both burglary(james) and tornado(james). However, the CPTs for alarm do not specify P(alarm(james) | burglary(james), tornado(james)). The standard solution to obtain the desired probability distribution from the given CPTs is to use a so-called combining rule.

Theoretically speaking, a combining rule is any algorithm which maps every finite set of CPTs {P(A | A_i1, ..., A_in_i) | 1 ≤ i ≤ m, n_i ≥ 0} over ground atoms onto one CPT, called the combined CPT, P(A | B1, ..., Bn) with {B1, ..., Bn} ⊆ ∪_{i=1..m} {A_i1, ..., A_in_i}. The output is empty iff the input is empty. Our definition of a combining rule is basically a reformulation of the definition given in (Ngo & Haddawy 1997).^3 As an example we consider the combining rule max. Its functional formulation is

    P(A | ∪_{i=1..m} {A_i1, ..., A_in_i}) = max_{i=1..m} { P(A | A_i1, ..., A_in_i) }.

It is remarkable that a combining rule has full knowledge about the input, i.e., it knows all the appearing ground atoms, or rather random variables, and the associated domains of the random variables.

We assume that for each Bayesian predicate there is a corresponding combining rule and that the combined CPT still specifies a conditional probability distribution. From a practical perspective, the combining rules used in Bayesian logic programs will be those commonly employed in Bayesian nets, such as noisy-or and max.
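As a rough illustration of the max combining rule under the hypothetical encoding sketched above: for a fixed assignment to all parent variables, the per-clause CPT rows can be combined by element-wise maximum. The sketch assumes each row is a list of Value-Probability pairs over the same head domain, in the same order; combine_max/2 and pairwise_max/3 are illustrative names, not part of the paper.

    % combine_max(+Distributions, -Combined): combine a non-empty list of
    % distributions over the head's domain by taking the element-wise maximum.
    combine_max([D], D).
    combine_max([D1, D2 | Rest], Combined) :-
        pairwise_max(D1, D2, D12),
        combine_max([D12 | Rest], Combined).

    % pairwise_max(+Dist1, +Dist2, -MaxDist)
    pairwise_max([], [], []).
    pairwise_max([V-P1 | T1], [V-P2 | T2], [V-P | T]) :-
        P is max(P1, P2),
        pairwise_max(T1, T2, T).

For example, combine_max([[true-0.9, false-0.1], [true-0.7, false-0.3]], C) yields C = [true-0.9, false-0.3]; as the text notes, one then has to assume (or enforce, e.g. by renormalizing) that the combined table still specifies a probability distribution.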
Semantics of Bayesian logic programs

Following the principles of KBMC, each Bayesian logic program essentially specifies a propositional Bayesian net that can be queried using the usual Bayesian net inference engines. This view implicitly assumes that all knowledge about the domain of discourse is encoded in the Bayesian logic program (e.g. the persons belonging to a family). If the domain of discourse changes (e.g. the family under consideration), then part of the Bayesian logic program has to be changed. Usually, these modifications will only concern ground facts (e.g. the Bayesian predicates "person", "parent" and "sex"). The structure of the corresponding Bayesian net follows from the semantics of the logic program, whereas the quantitative aspects are encoded in the CPTs and combining rules.

The set of random variables specified by a Bayesian logic program is the least Herbrand model of the program.^4 The least Herbrand model LH(L) of a definite clause program L contains the set of all ground atoms that are logically entailed by the program;^5 it represents the intended meaning of the program. By varying the evidence (some of the ground facts) one also modifies the set of random variables. Inference for logic programs has been well studied (see e.g. (Lloyd 1989)) and various methods exist to answer queries or to compute the least Herbrand model. All of these methods can essentially be adapted to our context. Here, we assume that the computation of LH(L) relies on the use ...

^1 In the examples, we use a naive representation as a table, because it is the simplest representation. We stress, however, that other representations are possible and known ...
^3 It differs mainly in the restriction of the input set to be finite. We make this assumption in order to keep things simple.
^4 Formally it is the least Herbrand model of the logical program L', which one gets from L by omitting the associated CPTs and combining rules as well as interpreting all predicates as classical, logical predicates. For the benefit of greater readability, in the sequel we do not distinguish between ...
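The excerpt breaks off before the computation of LH(L) is described. Purely as an illustration of the standard bottom-up fixpoint approach mentioned above, and assuming the hypothetical bayesian_clause/2 encoding from the earlier sketch with a finite grounding, the least Herbrand model of the underlying definite clause program could be computed roughly as follows; lh_model/1, lh_fixpoint/2 and all_known/2 are illustrative names, and the standard list predicates member/2, append/3 and sort/2 are used.

    % lh_model(-Model): all ground atoms entailed by the stored clauses,
    % computed by naive forward chaining (repeated immediate consequences).
    lh_model(Model) :-
        lh_fixpoint([], Model).

    lh_fixpoint(Known, Model) :-
        findall(Head,
                ( bayesian_clause(Head, Body),
                  all_known(Body, Known),
                  ground(Head),
                  \+ member(Head, Known) ),
                New),
        (   New == []
        ->  Model = Known
        ;   append(Known, New, Known1),
            sort(Known1, Known2),        % also removes duplicate derivations
            lh_fixpoint(Known2, Model)
        ).

    % all_known(+Body, +Known): every body atom unifies with a known ground atom.
    all_known([], _).
    all_known([A | As], Known) :-
        member(A, Known),
        all_known(As, Known).

With the burglary/alarm clauses and a ground fact stored as bayesian_clause(neighbourhood(james), []), the query lh_model(M) returns M = [alarm(james), burglary(james), neighbourhood(james)], which is exactly the set of random variables of the induced Bayesian net.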