
Probabilistic Logic Programming and Bayesian Networks

Liem Ngo and Peter Haddawy

Department of Electrical Engineering and Computer Science

University of Wisconsin-Milwaukee

Milwaukee, WI

{liem, haddawy}@cs.uwm.edu

Abstract

We present a probabilistic logic programming framework that allows the representation of conditional probabilities. While conditional probabilities are the most commonly used method for representing uncertainty in probabilistic expert systems, they have been largely neglected by work in quantitative logic programming. We define a fixpoint theory, declarative semantics, and proof procedure for the new class of probabilistic logic programs. Compared to other approaches to quantitative logic programming, we provide a true probabilistic framework with potential applications in probabilistic expert systems and decision support systems. We also discuss the relationship between such programs and Bayesian networks, thus moving toward a unification of two major approaches to automated reasoning.

To appear in Proceedings of the Asian Computing Science Conference, Pathumthani, Thailand, December.

This work was partially supported by NSF grant IRI

Introduction

Reasoning under uncertainty is a topic of great importance to many areas of Computer Science. Of all approaches to reasoning under uncertainty, probability theory has the strongest theoretical foundations. In the quest to extend the framework of logic programming to represent and reason with uncertain knowledge, there have been several attempts to add numeric representations of uncertainty to logic programming languages.

Of these attempts, the only one to use probability is the work of Ng and Subrahmanian [ ]. In their framework, a probabilistic logic program is an annotated Horn program. A typical example clause in a probabilistic logic program, taken from [ ], is path(X, Y) : [...] $\leftarrow$ a(X, Y) : [...], which says that if the probability that a type A connection is used lies in the interval [...], then the reliability of the path is between ... and .... As this example illustrates, their framework does not employ conditional probability, which is the most common way to quantify degrees of influence in probabilistic reasoning and probabilistic expert systems. In [ ] the authors allow clauses to be interpreted as conditional probability statements, but they consider only the consistency of such programs and do not provide a query answering procedure.

Bayesian networks have become the most popular method for representing and reasoning with probabilistic information. An extended form of Bayesian networks, influence diagrams, is widely used in decision analysis. The strengths of causal relationships in Bayesian networks and influence diagrams are specified with conditional probabilities. A prominent feature of Bayesian networks is that they allow computation of posterior probabilities and performance of systematic sensitivity analysis, which is important when the exact probability values are hard to obtain. Bayesian networks are used as the main representation and reasoning device in probabilistic diagnostic systems and expert systems.

Bayesian networks were originally presented as static graphical models: for a problem domain, the relevant random variables are identified, a directed acyclic graph representing the relationships between the random variables is sketched, and probability values are assessed. Inference is then performed using the entire domain model even if only a portion is relevant to a given inference problem. Recently the approach known as knowledge-based model construction has attempted to address this limitation by representing probabilistic information in a knowledge base using schematic variables and indexing schemes and constructing a network model tailored to each specific problem. The constructed network is a subset of the domain model represented by the collection of sentences in the knowledge base. Approaches to this area of research have either focused on practical model construction algorithms, neglecting formal aspects of the problem, or focused on formal aspects of the knowledge base representation language, without presenting practical algorithms for constructing networks. In [ ] we propose both a theoretical framework and a procedure for constructing Bayesian networks from a set of conditional probabilistic sentences.

The purpose of this paper is twofold. First, we propose an extension of logic programming which allows the representation of conditional probabilities and hence can be used to write probabilistic expert systems. Second, we investigate the relationship between probabilistic logic programs and Bayesian networks. While Poole shows how to represent a discrete Bayesian network in his Probabilistic Horn Abduction framework, in this paper we address both sides of this relationship. First, we show how Bayesian networks can be represented easily and intuitively by our probabilistic logic programs. Second, we present a method for answering queries on the probabilistic logic programs by constructing Bayesian networks and then propagating probabilities on the networks. We provide a declarative semantics for probabilistic logic programs and prove that the constructed Bayesian networks faithfully reflect the declarative semantics.

Syntax

Throughout this paper we use $\Pr$, and sometimes $\Pr_P$ and $\Pr_P^C$, to denote a probability distribution; $A$, $B$ with possible subscripts to denote atoms; names with leading capital characters to denote domain variables; names with leading small characters to denote constants; and $p$, $q$ with possible subscripts to denote predicates. We use a first order language containing infinitely many variable symbols and finitely many constant, function and predicate symbols. We use HB to denote the Herbrand base of the language, which can be infinite. For convenience we use a comma instead of logical AND and semicolons to separate the sentences in a list of sentences.

Each predicate represents a class of similar random variables. In the probability models we consider, each random variable can take values from a finite set, and in each possible realization of the world that variable can have one and only one value. For example, the variable neighborhood of a person X can have value bad, average, or good, and in each possible realization of the world one and only one of these three values can be true; the others must be false. We capture this property by requiring that each predicate have at least one attribute representing the value of the corresponding random variable. By convention we take this to be the last attribute. For example, the variable neighborhood of a person X can be represented by a two-position predicate neighborhood(X, V): the first position indicates the person and the second indicates the type of that person's neighborhood (bad, average, or good). We associate with each predicate a value integrity constraint statement.

Definition. The value integrity constraint statement associated with an $m$-ary predicate $p$ consists of the following first order sentences: (1) $p(X_1,\dots,X_{m-1},V) \rightarrow V = v_1 \vee \dots \vee V = v_n$; (2) $\leftarrow p(X_1,\dots,X_{m-1},v_i),\ p(X_1,\dots,X_{m-1},v_j)$, $\forall i,j: 1 \le i \ne j \le n$; where $n$, $m$ are two integers, $v_1,\dots,v_n$ are different constants, called the value constants, denoting the possible values of the random variables corresponding to $p$, $X_1,\dots,X_{m-1}$ are different variable names, and each sentence is universally quantified over the entire sentence. For convenience we use $EXCLUSIVE(p, v_1, \dots, v_n)$ to denote the above set of sentences.

We use $=$ as the identity relation on HB and always assume our theories include Clark's Equality Theory. We denote by $VAL(p)$ the set $\{v_1,\dots,v_n\}$. If $A$ is an atom of predicate $p$, we also use $VAL(A)$ as equivalent to $VAL(p)$. If $A$ is the ground atom $p(t_1,\dots,t_{m-1},v)$, then $val(A)$ denotes the value $v$ and $obj(A)$ denotes the random variable corresponding to $(p, t_1, \dots, t_{m-1})$.

We require such a value integrity constraint for each predicate. The set of all the integrity constraints is denoted by IC.

Example. The value integrity constraint for the predicate neighborhood is

EXCLUSIVE(neighborhood, bad, average, good) =
  { $\leftarrow$ neighborhood(X, bad), neighborhood(X, average);
    $\leftarrow$ neighborhood(X, bad), neighborhood(X, good);
    $\leftarrow$ neighborhood(X, average), neighborhood(X, good);
    neighborhood(X, V) $\rightarrow$ V = bad $\vee$ V = average $\vee$ V = good }

For a person, say named John, neighborhood(john, good) means the random variable neighborhood of John, indicated in the language by obj(neighborhood(john, good)), is good, indicated in the language by val(neighborhood(john, good)) = good. In any possible world, one and only one of the following atoms is true: neighborhood(john, bad), neighborhood(john, average), or neighborhood(john, good). VAL(neighborhood), or VAL(neighborhood(john, bad)), is the set {bad, average, good}.

We have two kinds of constants. The value constants are declared by EXCLUSIVE clauses and used as the last arguments of predicates. The non-value constants are used for the other predicate arguments.

Definition. Let $A$ be the ground atom $p(t_1,\dots,t_{m-1},v)$. We define $Ext(A)$, the extension of $A$, to be the set $\{p(t_1,\dots,t_{m-1},v') \mid v' \in VAL(p)\}$.

Example. In the burglary example, Ext(neighborhood(john, bad)) = {neighborhood(john, bad), neighborhood(john, average), neighborhood(john, good)}.

Let $A$ be an atom. We define $ground(A)$ to be the set of all ground instances of $A$. A set of ground atoms $\{A_i \mid 1 \le i \le n\}$ is called coherent if there do not exist any $A_j$ and $A_{j'}$ such that $j \ne j'$, $obj(A_j) = obj(A_{j'})$ and $val(A_j) \ne val(A_{j'})$.
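The notions of val, obj, Ext and coherence can be sketched in a few lines of Python. This is only an illustration under an assumed encoding that is not part of the paper: a ground atom p(t1, ..., tk, v) is stored as the tuple (p, t1, ..., tk, v), and the table `VAL` is a hypothetical stand-in for the EXCLUSIVE declarations.

```python
from itertools import combinations

# Hypothetical stand-in for the EXCLUSIVE declarations: predicate -> value constants.
VAL = {
    "neighborhood": ("bad", "average", "good"),
    "burglary": ("yes", "no"),
}

def val(atom):
    """Value constant of a ground atom (its last argument)."""
    return atom[-1]

def obj(atom):
    """Random variable of a ground atom: predicate plus non-value arguments."""
    return atom[:-1]

def ext(atom):
    """Extension of a ground atom: the same random variable with every possible value."""
    return {obj(atom) + (v,) for v in VAL[atom[0]]}

def coherent(atoms):
    """True if no random variable is given two different values."""
    return all(obj(a) != obj(b) or val(a) == val(b)
               for a, b in combinations(atoms, 2))
```

For instance, `coherent` rejects a set that contains both neighborhood(john, bad) and neighborhood(john, good), while atoms about different random variables never conflict.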

Definition. A probabilistic sentence has the form $\Pr(A_0 \mid A_1,\dots,A_n) = \alpha$, where $n \ge 0$, $0 \le \alpha \le 1$, and the $A_i$ are atoms. The sentence can have free variables, and each free variable is universally quantified over its entire scope. The meaning of such a sentence is: if $\Pr(B_0 \mid B_1,\dots,B_n) = \alpha$ is a ground instance of it, then the conditional probability of $obj(B_0)$ achieving the value $val(B_0)$, given $obj(B_i)$ having value $val(B_i)$, $\forall i: 1 \le i \le n$, is $\alpha$.

Let $S$ be the sentence $\Pr(A_0 \mid A_1,\dots,A_n) = \alpha$. We use $ante(S)$, the antecedent of $S$, to denote the conjunction $A_1 \wedge \dots \wedge A_n$, and $cons(S)$, the consequent of $S$, to denote $A_0$. Sometimes we use $ante(S)$ as the set of conjuncts.

An alternative representation of the probability sentence $\Pr(A_0 \mid A_1,\dots,A_n) = \alpha$ is $A_0 \xleftarrow{\alpha} A_1,\dots,A_n$, where $\alpha$ is a value associated with the entire sentence. We will stick with the form in the definition but mention the alternative representation to highlight the resemblance to quantitative logic program clauses. Notice that by using predicates with a value attribute and integrity constraints, we can explicitly represent negative facts.

IC = { EXCLUSIVE(neighborhood, bad, average, good);
       EXCLUSIVE(burglary, yes, no); EXCLUSIVE(alarm, yes, no);
       EXCLUSIVE(tornado, yes, no) }

PB = { Pr(neighborhood(john, average)) = ...;
       Pr(neighborhood(john, bad)) = ...; Pr(neighborhood(john, good)) = ...;
       Pr(burglary(X, yes) | neighborhood(X, average)) = ...;
       Pr(burglary(X, yes) | neighborhood(X, good)) = ...;
       Pr(burglary(X, yes) | neighborhood(X, bad)) = ...;
       Pr(alarm(X, yes) | burglary(X, yes)) = ...; Pr(alarm(X, yes) | burglary(X, no)) = ...;
       Pr(alarm(X, yes) | tornado(X, yes)) = ...; Pr(alarm(X, yes) | tornado(X, no)) = ... }

Figure 1: A Basic Probabilistic Logic Program

Basic Probabilistic Logic Programs

Definition. A basic probabilistic logic program consists of two parts: the probabilistic base PB, which is a finite set of probabilistic sentences, and the set IC of value integrity constraints for the predicates in the language.

Consider the following motivating example, which will be referred to throughout the remainder of the paper. A burglary alarm could be triggered by a burglary or a tornado. The likelihood of a burglary is influenced by the type of neighborhood one resides in. Figure 1 shows a possible basic probabilistic logic program for representing this example. We have the following predicates: neighborhood, burglary, alarm, and tornado. The interpretation of statements in IC is similar to that of EXCLUSIVE(neighborhood, bad, average, good) shown in a previous example.

Acyclic Probabilistic Bases

In this paper, major results are achieved for a class of programs characterized by acyclicity. A probabilistic base PB is called acyclic if there is a mapping $ord_{PB}$ from the set of ground instances of atoms into the set of natural numbers such that (1) for any ground instance $\Pr(A_0 \mid A_1,\dots,A_n) = \alpha$ of some sentence in PB, $ord_{PB}(A_0) > ord_{PB}(A_i)$, $\forall i: 1 \le i \le n$; and (2) if $A$ and $A'$ are two ground atoms such that $A \in Ext(A')$, then $ord_{PB}(A) = ord_{PB}(A')$.

The expressiveness of acyclic logic programs is demonstrated in [ ]. We expect that probabilistic logic programs with acyclic probabilistic bases will prove to have equal importance. To the best of our knowledge, knowledge bases of conditional probabilities containing loops are considered problematic, and all those considered in the literature are acyclic.

Fixpoint Semantics

The Relevant Atom Set

In this section we consider the implications of the structure of basic probabilistic logic programs, ignoring the probability values associated with the sentences. We view the probabilistic sentence $\Pr(A_0 \mid A_1,\dots,A_n) = \alpha$ as the Horn clause $A_0 \leftarrow A_1,\dots,A_n$. Our purpose is to determine the set of relevant atoms implied by a program. For normal logic programs, fixpoint theory characterizes the semantics of a program by a minimal set of literals which is the fixpoint of a transformation constructed from its syntactic structure. That set consists of ground atoms that are considered true (and their negations false) and ground atoms that are considered false (and their negations true). Usually there are other atoms whose truth values are undefined. Similarly, from a basic probabilistic logic program we can obtain (sometimes partial) probabilistic information about some ground atoms.

Example. Consider the following basic program:

IC = { EXCLUSIVE(p, true, false); EXCLUSIVE(q, bad, average, good);
       EXCLUSIVE(r, true, false); EXCLUSIVE(s, true, false) }

PB = { Pr(p(true)) = ...;
       Pr(q(good) | p(true)) = ...; Pr(q(good) | p(false)) = ...;
       Pr(r(true) | s(true)) = ...; Pr(r(true) | r(true)) = ... }

Using Bayes' rule we can derive Pr(q(good)) = Pr(q(good), p(true)) + Pr(q(good), p(false)) = Pr(q(good) | p(true)) $\times$ Pr(p(true)) + Pr(q(good) | p(false)) $\times$ Pr(p(false)) = .... We know partial information about Pr(q(bad)) and Pr(q(average)) because Pr(q(bad)) + Pr(q(average)) = 1 $-$ Pr(q(good)) = .... But we do not know any probabilistic information about r(true), r(false), s(true) and s(false) independently.
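The derivation of Pr(q(good)) is an application of the law of total probability, which can be checked mechanically. The sketch below uses made-up probability values (they are assumptions chosen for illustration, not the values of the example) and exact fractions to avoid rounding noise.

```python
from fractions import Fraction

# Hypothetical values, chosen only to make the arithmetic concrete.
pr_p = {"true": Fraction(2, 5), "false": Fraction(3, 5)}
pr_q_good_given_p = {"true": Fraction(7, 10), "false": Fraction(1, 5)}

def marginal(prior, conditional):
    """Law of total probability: sum over the values of the conditioning variable."""
    return sum(conditional[v] * prior[v] for v in prior)

pr_q_good = marginal(pr_p, pr_q_good_given_p)

# Only the total mass of the remaining values of q is determined by the program.
pr_q_bad_or_average = 1 - pr_q_good
```

With these assumed inputs the program fixes Pr(q(good)) exactly but only constrains the sum Pr(q(bad)) + Pr(q(average)), which is the sense in which the information about those two atoms is partial.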

Definition. Given a basic program $P$, the fixpoint operator $T_P$ is defined as a mapping from $2^{HB}$ into $2^{HB}$ such that for all $I \in 2^{HB}$, $T_P(I)$ is the smallest set in $2^{HB}$ satisfying the following properties: (1) if $S$ is a ground instance of a sentence in $P$ such that $ante(S)$ is a subset of $I$, then $cons(S) \in T_P(I)$; (2) if $A \in T_P(I)$, then $Ext(A) \subseteq T_P(I)$.

The transformation $T_P$ produces only reflexive subsets of HB. Such subsets are important to us because when we know partial probabilistic information about an atom $A$, we also know partial probabilistic information about each other atom in $Ext(A)$. A subset $I$ of HB is a reflexive subset if $\forall A \in I$, $Ext(A) \subseteq I$. We consider the space of reflexive subsets of HB, denoted by RHB.

Proposition. (1) RHB is a complete lattice with respect to the normal $\subseteq$ relation. (2) $T_P$ is monotonic on RHB, i.e., $\forall I, I' \in RHB$, whenever $I \subseteq I'$, $T_P(I) \subseteq T_P(I')$.

We define a simple iterative process for applying $T_P$.

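Definition. Let $\beta$ range over the set of all countable ordinals. The upward sequence $\{I_\beta\}$ and $I^*$ are defined recursively by: (1) $I_0 = \{\}$. (2) If $\beta$ is a limit ordinal, $I_\beta = \bigcup_{\gamma < \beta} I_\gamma$. (3) If $\beta = \gamma + 1$, $I_\beta = T_P(I_\gamma)$. (4) Finally, $I^* = \bigcup_\beta T_P(I_\beta)$.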

Example. Continuing the previous example, the upward sequence $\{I_\beta\}$ is $I_0 = \{\}$, $I_1$ = {p(true), p(false)}, $I_2 = I_1 \cup$ {q(bad), q(average), q(good)}, and $I^* = I_2$. $I^*$ is the set of all ground atoms whose partial probability information can be obtained from the program.

The upward sequence $\{I_\beta\}$ is a monotonic sequence of elements in RHB. It follows by classical results of Tarski that the upward sequence converges to the least fixpoint.

Theorem. The upward sequence $\{I_\beta\}$ converges to $lfp(T_P) = I^*$, the least fixpoint in RHB. Furthermore, if there are no function symbols in the language, then the convergence occurs after a finite number of steps.

We call $lfp(T_P)$ the relevant set of atoms (RAS). RAS plays a similar role to well-founded partial models. We use RAS to formalize the concept of possible worlds implied by a program. Let $\beta$ be a countable ordinal. A $\beta$-macro-world of the logic program $P$ is a maximal coherent subset of $I_\beta$. A possible world is a maximal coherent subset of RAS. We use PW to denote the set of possible worlds. We can see that there always exist possible worlds for a program $P$.

Example. Continuing the previous example, there are two 1-macro-worlds: {p(true)} and {p(false)}. The possible worlds (and also 2-macro-worlds) are

{p(true), q(good)}, {p(true), q(average)}, {p(true), q(bad)},
{p(false), q(good)}, {p(false), q(average)}, {p(false), q(bad)}.

 

Let $W$ be a possible world and $A \in W$. Then $W \cup IC$ derives $\neg A'$, $\forall A' \in Ext(A)$ with $A' \ne A$. So $W \cup IC$ represents a coherent assignment of values to the relevant random variables.
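A maximal coherent subset picks exactly one value for every random variable that occurs in the atom set, so possible worlds of a finite set of ground atoms can be enumerated as a Cartesian product. The sketch below assumes a finite atom set and the hypothetical tuple encoding (predicate, arguments..., value); neither is prescribed by the paper.

```python
from itertools import product

def possible_worlds(atoms):
    """Enumerate the maximal coherent subsets of a finite set of ground atoms."""
    by_variable = {}
    for atom in sorted(atoms):
        # atom[:-1] identifies the random variable, atom[-1] is its value
        by_variable.setdefault(atom[:-1], []).append(atom)
    # choose one atom per random variable, in every possible way
    return [frozenset(choice) for choice in product(*by_variable.values())]

I1 = {("p", "true"), ("p", "false")}
RAS = I1 | {("q", "bad"), ("q", "average"), ("q", "good")}

macro_worlds = possible_worlds(I1)
worlds = possible_worlds(RAS)
```

The number of worlds is the product of the sizes of the value sets, here 2 for the first stage and 2 x 3 = 6 for the full relevant atom set.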

Combining Rules and Probabilistic Logic Programs

A basic probabilistic logic program will typically not be a complete specification of a probability distribution over the random variables represented by the atoms. One type of information which may be lacking is the specification of the probability of a variable given combinations of values of two or more variables which influence it. For real-world applications, this type of information can be difficult to obtain. For example, for two diseases $D_1$ and $D_2$ and a symptom $S$, we may know $\Pr(S \mid D_1)$ and $\Pr(S \mid D_2)$ but not $\Pr(S \mid D_1, D_2)$. Combining rules such as generalized noisy-OR are commonly used to construct such combined influences.

We define a combining rule as any algorithm that takes as input a (possibly infinite) set of ground probabilistic sentences with the same consequent, $\{\Pr(A_0 \mid A_{i1},\dots,A_{in_i}) = \alpha_i \mid 1 \le i \le m\}$ ($m$ may be infinite), such that $\bigcup_{i=1}^{m} \{A_{i1},\dots,A_{in_i}\}$ is coherent, and produces as output $\Pr(A_0 \mid A_1,\dots,A_n) = \alpha$, where $A_0, A_1, \dots, A_n$ are all different and $n$ is a finite integer. In addition to the standard purpose of combining rules, we also use them as one kind of default rule to augment missing causes (a cause is an atom in the antecedent). In this case, the antecedents of the output contain atoms not in the input sentences. The set of output causes can be a proper subset of the set of input causes, in which case the combining rule is performing a filtering and summarizing task.

Example. Assume two diseases $D_1$, $D_2$ and one symptom $S$, which are represented by predicates $d_1$, $d_2$ and $s$, respectively. Also assume $D_1$, $D_2$ and $S$ have values normal and abnormal. A program might contain only the following sentences: $\Pr(s(abnormal) \mid d_1(abnormal)) = \ldots$, $\Pr(s(abnormal) \mid d_1(normal)) = \ldots$, and $\Pr(s(abnormal) \mid d_2(normal)) = \ldots$. We can provide combining rules to construct from the first and third sentences a new sentence of the form $\Pr(s(abnormal) \mid d_1(abnormal), d_2(normal)) = \alpha_1$, and from the second and third another new sentence of the form $\Pr(s(abnormal) \mid d_1(normal), d_2(normal)) = \alpha_2$, where $\alpha_1$ and $\alpha_2$ are two numbers determined by the combining rule. The combining rules may also act as default rules in augmenting the first and second sentences to achieve $\Pr(s(abnormal) \mid d_1(abnormal), d_2(abnormal)) = \alpha'_1$ and $\Pr(s(abnormal) \mid d_1(normal), d_2(abnormal)) = \alpha'_2$ for some values $\alpha'_1$ and $\alpha'_2$.
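To make the idea of a combining rule concrete, here is a minimal sketch of the textbook binary noisy-OR, which is a simpler relative of the generalized noisy-OR mentioned above and not necessarily the rule the authors use. It assumes that each disease, when abnormal, independently causes the symptom with its own link probability, and that normal diseases contribute nothing; the link probabilities are invented for illustration.

```python
from fractions import Fraction

def noisy_or(link_prob, abnormal):
    """Pr(symptom abnormal | diseases in `abnormal` are abnormal, all others normal).

    link_prob maps each disease to Pr(symptom abnormal | only that disease abnormal).
    """
    p_no_effect = Fraction(1)
    for disease in abnormal:
        # every abnormal disease independently fails to trigger the symptom
        p_no_effect *= 1 - link_prob[disease]
    return 1 - p_no_effect

# Hypothetical link probabilities for the two diseases.
link_prob = {"d1": Fraction(4, 5), "d2": Fraction(1, 2)}

both_abnormal = noisy_or(link_prob, ["d1", "d2"])
only_d1 = noisy_or(link_prob, ["d1"])
none_abnormal = noisy_or(link_prob, [])
```

The rule takes single-cause sentences as input and produces the value of a sentence whose antecedent mentions both diseases, which is the combined-influence role described in the example.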

Definition. A probabilistic logic program is a triple $\langle IC, PB, CR \rangle$, where $\langle IC, PB \rangle$ is a basic probabilistic logic program and CR is a set of combining rules. We assume that for each predicate, there exists one corresponding combining rule in CR.

The combining rules usually depend on the meaning of the program. In [ ] we discuss the combining rules for interaction between effects of actions and persistence rules in planning problems.

The Combined Relevant Probabilistic Base

With the addition of combining rules, the real structure of a program changes. In this section we consider the effect of combining rules on the relationships prescribed by the program.

Definition. Given a program $P$, let $\beta$ be a countable ordinal. The set of $\beta$-relevant probabilistic sentences ($\beta$-RPB) is defined as the set of all ground instances $S$ of some probabilistic sentence in PB such that all atoms in $S$ are in $I_\beta$.

The $\beta$-RPB contains the basic relationships between atoms in $I_\beta$. In the case of multiple influences represented by multiple sentences, we need combining rules to construct the combined probabilistic influence.

Definition. Given a program $P$, let $\beta$ be a countable ordinal. The combined $\beta$-RPB ($\beta$-CRPB) is constructed by applying the appropriate combining rules to each maximal set of sentences $\{S_i \mid i \in I\}$ ($I$ may be an infinite index set) in $\beta$-RPB which have the same consequent and such that $\bigcup_{i \in I} ante(S_i)$ is coherent.

Combined $\beta$-RPBs play a similar role to completed logic programs. We assume that each sentence in $\beta$-CRPB describes all random variables which directly influence the random variable in the consequent. We define a syntactic property of $\beta$-CRPB which characterizes the completeness of probability specification.

Definition. A $\beta$-CRPB is completely quantified if

(1) for all ground atoms $A$ in $I_\beta$, there exists at least one sentence in $\beta$-CRPB with $A$ in the consequent; and

(2) for all ground sentences $S$ in $\beta$-CRPB we have the following property: let $S$ have the form $\Pr(A_0 \mid A_1,\dots,A_n) = \alpha$; then for all $i = 0,\dots,n$, if $val(A_i) = v$ and $v' \in VAL(A_i)$, $v' \ne v$, there exists another ground sentence $S'$ in $\beta$-CRPB such that $S'$ can be constructed from $S$ by replacing $val(A_i)$ by $v'$ and $\alpha$ by some $\alpha'$.

The preceding definition says that for each ground atom $A$ we have a complete specification of the probability of all possible values $val(A)$ given all possible combinations of values of the atoms that directly influence $A$. If we think of each $obj(A)$ as representing a random variable in a Bayesian network model, then the definition implies that we can construct a link matrix for each random variable in the model.

We call the RPB built from $I^*$ the Relevant Probabilistic Base (RPB), and we call the corresponding CRPB the Combined Relevant Probabilistic Base (CRPB).

Example. Consider our burglary example and assume that the language contains only one non-value constant, john. Then

RAS = { neighborhood(john, bad), neighborhood(john, average), neighborhood(john, good),
        burglary(john, yes), burglary(john, no), alarm(john, yes), alarm(john, no),
        tornado(john, yes), tornado(john, no) }

and

RPB = { Pr(neighborhood(john, average)) = ...; Pr(neighborhood(john, bad)) = ...;
        Pr(neighborhood(john, good)) = ...;
        Pr(burglary(john, yes) | neighborhood(john, average)) = ...;
        Pr(burglary(john, yes) | neighborhood(john, good)) = ...;
        Pr(burglary(john, yes) | neighborhood(john, bad)) = ...;
        Pr(alarm(john, yes) | burglary(john, yes)) = ...; Pr(alarm(john, yes) | tornado(john, yes)) = ...;
        Pr(alarm(john, yes) | burglary(john, no)) = ...; Pr(alarm(john, yes) | tornado(john, no)) = ... }

In the CRPB, the sentences in RPB with alarm as the consequent are transformed into sentences specifying the probability of alarm conditioned on both burglary and tornado. The other sentences in RPB remain the same in CRPB.

In conjunction with the acyclicity property of probabilistic bases, we are interested in a class of combining rules which is capable of transferring the acyclicity property of a PB to the corresponding CRPB. Given a program $P$, we say a combining rule in CR is self-contained if the generated sentence $\Pr(A \mid A_1,\dots,A_n) = \alpha$ from the input set $\{\Pr(A \mid A_{i1},\dots,A_{in_i}) = \alpha_i \mid 1 \le i \le m\}$ ($m$ may be infinite) satisfies one additional property:

$$\{A_1,\dots,A_n\} \subseteq \bigcup_{A' \in Ext(A)} \bigcup \{\{B_{i1},\dots,B_{in_i}\} \mid \Pr(A' \mid B_{i1},\dots,B_{in_i}) = \alpha_i \text{ is in RPB}\}.$$

Self-containedness seems to be a reasonable assumption on a combining rule: it does not allow the generation of new atoms in the antecedent which are not related to any atom in the extension of the consequent. In order to generate a sentence with consequent $A$, a self-contained combining rule may need to collect all the sentences which have an atom in $Ext(A)$ as consequent.

Example. The combining rule in the disease example is not self-contained because the sentence $\Pr(s(abnormal) \mid d_1(abnormal), d_2(abnormal)) = \ldots$ is constructed from a set of sentences which do not contain the atom $d_2(abnormal)$. For this kind of diagnosis problem, the generalized noisy-OR rule always assumes that if a disease is in the abnormal state then there is a probability ... that the symptom is abnormal; that means $\Pr(s(abnormal) \mid d_2(abnormal)) = \ldots$. In order to use self-contained combining rules, we need to write those sentences explicitly.

Model Theory

The semantics of a probabilistic program is characterized by the probability weights assigned to the ground atoms. That annotated approach is widely used in the related work. Because of space limitations, we only show how to assign weights to ground atoms. Details on annotated models can be found in the full paper.

Probabilistic Independence Assumption

In addition to the probabilistic quantities given in the program, we assume some probabilistic independence relationships specified by the structure of probabilistic sentences. Probabilistic independence assumptions are used in all probability model construction work as the main device to construct a joint probability distribution from local conditional probabilities. Unlike Poole, who assumes independence on the set of consistent assumable atoms, we formulate the independence assumption in our framework by using the structure of the sentences in CRPB. We find this approach more natural since the structure of the CRPB tends to reflect the causal structure of the domain, and independencies are naturally thought of causally.

Definition. Given a set $P$ of ground probabilistic sentences, let $A$ and $B$ be two ground atoms. We say $A$ is influenced by $B$ in $P$ if there exist a sentence $S$, an atom $A'$ in $Ext(A)$ and an atom $B'$ in $Ext(B)$ such that $A' = cons(S)$ and $B' \in ante(S)$, or there exists another ground atom $C$ such that $A$ is influenced by $C$ in $P$ and $C$ is influenced by $B$ in $P$.
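The influence relation is the transitive closure of the direct "appears in the antecedent of" relation, taken at the level of random variables since the definition ranges over extensions. The sketch below assumes a finite set of ground sentences given as (consequent, antecedents) pairs in a hypothetical tuple encoding; the helper names are illustrative only.

```python
def influencers(sentences):
    """Map each random variable to the set of random variables that influence it."""
    closure = {}
    for cons, ante in sentences:
        # atom[:-1] drops the value, leaving the random variable
        closure.setdefault(cons[:-1], set()).update(a[:-1] for a in ante)
    changed = True
    while changed:  # naive transitive closure; fine for small examples
        changed = False
        for parents in closure.values():
            extra = set().union(*(closure.get(p, set()) for p in parents)) - parents
            if extra:
                parents |= extra
                changed = True
    return closure

def influenced_by(a, b, closure):
    """True if ground atom a is influenced by ground atom b."""
    return b[:-1] in closure.get(a[:-1], set())

# A CRPB-like sentence set for the burglary domain (probability values omitted).
crpb = [
    (("neighborhood", "john", "good"), ()),
    (("tornado", "john", "yes"), ()),
    (("burglary", "john", "yes"), (("neighborhood", "john", "good"),)),
    (("alarm", "john", "yes"), (("burglary", "john", "yes"), ("tornado", "john", "no"))),
]
closure = influencers(crpb)
```

Here alarm is influenced by neighborhood only through burglary, which is the transitive case of the definition.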

Assumption. We assume that if $\Pr(A \mid A_1,\dots,A_n) = \alpha$ is in CRPB, then for all ground atoms $B$ which are not in $Ext(A)$ and not influenced by $A$ in CRPB, $A$ and $B$ are probabilistically independent given $A_1,\dots,A_n$.

Example. Continuing the burglary example, alarm(john, yes) is probabilistically independent of neighborhood(john, good) and neighborhood(john, bad) given burglary(john, yes) and tornado(john, no).

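Definition (Consistent CRPB). A completely quantified CRPB is consistent if

(1) there is no atom in $I^*$ which is influenced by itself in CRPB; and

(2) for all $\Pr(A_0 \mid A_1,\dots,A_n) = \alpha$ in CRPB, $\sum \{\alpha_i \mid \Pr(A'_0 \mid A_1,\dots,A_n) = \alpha_i \in \text{CRPB and } obj(A'_0) = obj(A_0)\} = 1$.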





Possible World Semantics

In this section we allow the language to contain function symbols. There are, in general, infinitely many possible worlds and infinitely many macro-worlds. We use an approach similar to that of Poole by assigning weights to only certain subsets of worlds.

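Definition (Rank of an atom). Let $A$ be a ground atom in RAS. We define $rank(A)$, the rank of $A$, recursively by: (1) if $A$ is not influenced in CRPB by any atom, then $rank(A) = \ldots$; (2) otherwise $rank(A) = \sup\{rank(B) \mid \Pr(A \mid \dots, B, \dots) = \alpha \text{ is in CRPB}\} + 1$.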

Example. In the burglary example, rank(tornado(...)) = rank(neighborhood(...)) = ..., rank(burglary(...)) = ... and rank(alarm(...)) = ....

The program with the following CRPB has an atom which cannot be assigned a finite rank: CRPB = { Pr(q(true) | p(X, true)) = ...; Pr(p(X+1, true) | p(X, true)) = ... }. We cannot assign any finite rank to q(true) because rank(q(true)) > rank(p(X, true)), $\forall X$.
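For a finite acyclic set of ground sentences the rank can be computed by memoized recursion. The sketch below makes two assumptions that the text above does not settle: atoms with no influencers get rank 0, and each level adds 1. It also works on random variables rather than individual atoms, using a hypothetical tuple encoding.

```python
from functools import lru_cache

def make_rank(sentences):
    """Build a rank function from (consequent, antecedents) pairs of ground atoms."""
    parents = {}
    for cons, ante in sentences:
        # atom[:-1] drops the value, leaving the random variable
        parents.setdefault(cons[:-1], set()).update(a[:-1] for a in ante)

    @lru_cache(maxsize=None)
    def rank(variable):
        direct = parents.get(variable, set())
        if not direct:
            return 0  # assumed base rank
        return 1 + max(rank(p) for p in direct)

    return rank

crpb = [
    (("neighborhood", "john", "good"), ()),
    (("tornado", "john", "yes"), ()),
    (("burglary", "john", "yes"), (("neighborhood", "john", "good"),)),
    (("alarm", "john", "yes"), (("burglary", "john", "yes"), ("tornado", "john", "no"))),
]
rank = make_rank(crpb)
```

On a cyclic sentence set this recursion would not terminate normally (Python raises `RecursionError`), which mirrors the requirement that the rank function be well-defined only in the absence of cycles.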

We can see that if CRPB has no cycles, then rank is a well-defined mapping. The following lemma will be useful in working with acyclic probabilistic bases.

Lemma. Given a program $P$ with an acyclic probabilistic base, if the combining rules are self-contained, then the rank function is well-defined.

In defining the sample space, we will not consider individual possible worlds but sets of possible worlds characterized by formulae of specific forms.

Definition. Given a program $P$, we can determine the set of all possible worlds PW. Assume that the rank function is well-defined. Let $A$ be a ground atom in RAS. We denote the set of all possible worlds containing $A$ by $W(A)$. We define the sample space $\Omega_P$ to be the smallest set consisting of: (1) $PW \in \Omega_P$; (2) $\forall A \in RAS$ such that $rank(A)$ is finite, $W(A) \in \Omega_P$; (3) if $W \in \Omega_P$, then $PW - W \in \Omega_P$; (4) if $W_1$, $W_2$ are in $\Omega_P$, then $W_1 \cap W_2$ is in $\Omega_P$.

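We consider the probability functions on the sample space $\Omega_P$. Let $\Pr$ be a probability function on the sample space; we define $\Pr(A_1,\dots,A_n)$, where $A_1,\dots,A_n$ are atoms in RAS with finite ranks, as $\Pr(\bigcap_{i=1}^{n} W(A_i))$. We take a sentence of the form $\Pr(A_0 \mid A_1,\dots,A_n) = \alpha$ as shorthand for $\Pr(A_0, A_1,\dots,A_n) = \alpha \times \Pr(A_1,\dots,A_n)$. We say $\Pr$ satisfies a sentence $\Pr(A_0 \mid A_1,\dots,A_n) = \alpha$ if $\Pr(A_0, A_1,\dots,A_n) = \alpha \times \Pr(A_1,\dots,A_n)$, and $\Pr$ satisfies CRPB if it satisfies every sentence in CRPB.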
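When the set of possible worlds is finite, the satisfaction condition can be checked directly by summing world weights. The sketch below assumes such a finite setting, represents a probability function as a dictionary from worlds (frozensets of ground atoms) to weights, and uses invented numbers; it is an illustration of the product form of the condition, not a procedure from the paper.

```python
from fractions import Fraction as F

def pr(dist, atoms):
    """Weight of the set of worlds that contain every atom in `atoms`."""
    return sum(w for world, w in dist.items() if set(atoms) <= world)

def satisfies(dist, cons, ante, alpha):
    """Check Pr(cons, ante) = alpha * Pr(ante), the shorthand for Pr(cons | ante) = alpha."""
    return pr(dist, [cons, *ante]) == alpha * pr(dist, ante)

pt, pf = ("p", "true"), ("p", "false")
good, avg, bad = ("q", "good"), ("q", "average"), ("q", "bad")

# Hypothetical weights over the six possible worlds of the p/q example.
dist = {
    frozenset({pt, good}): F(7, 25), frozenset({pt, avg}): F(2, 25),
    frozenset({pt, bad}): F(1, 25), frozenset({pf, good}): F(3, 25),
    frozenset({pf, avg}): F(6, 25), frozenset({pf, bad}): F(6, 25),
}
```

Writing the condition as a product rather than a quotient means that sentences whose antecedent has probability zero are satisfied trivially and no division by zero can occur.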

Definition. A probability distribution induced by a program $P$ is a probability distribution on $\Omega_P$ satisfying CRPB and the independence assumption implied by CRPB.

Example. Consider the following program:

IC = { EXCLUSIVE(p, true, false); EXCLUSIVE(q, bad, average, good) }

PB = { Pr(p(..., true)) = ...; Pr(p(..., false)) = ...;
       Pr(q(good) | p(T, true)) = ...; Pr(q(good) | p(T, false)) = ...;
       Pr(p(T+1, true) | p(T, true)) = ...; Pr(p(T+1, false) | p(T, true)) = ...;
       Pr(p(T+1, true) | p(T, false)) = ...; Pr(p(T+1, false) | p(T, false)) = ... }

CR = { Generalized-Noisy-OR }

We can imagine that p is a timed predicate with the first attribute indicating time. The last four sentences represent persistence rules. We have Pr(W(p(..., true))) = ..., Pr(W(p(..., false))) = ..., Pr(W(p(..., true))) = ....

Theorem. Given a program $P$, if the CRPB is completely quantified and consistent, then there exists one and only one induced probability distribution.

The following theorem allows us to handle the probability of conjunctions and disjunctions in our framework.

Theorem. Given a program $P$, any probability function on $\Omega_P$ satisfying CRPB assigns a weight to any formula of the form $\bigvee_{i=1}^{n} \bigwedge_{j=1}^{m} A_{ij}$, where $n$ and $m$ are finite integers and $rank(A_{ij})$ is finite, $\forall i, j$.

Fixpoint Theory Revisited

We now extend the fixpoint theory to include the quantitative information given in a program. We have constructed in a previous section the transformation $T_P$ and the upward sequence $\{I_\beta\}$. We associate with each $I_\beta$ a sample space and a probability distribution.

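Definition. Given a program $P$, we can determine the set of possible worlds PW. Assume that the rank function is well-defined and $\beta$ is a finite ordinal. We define the sample space $\Omega_P^\beta$ to be the smallest set consisting of: (1) $PW \in \Omega_P^\beta$; (2) $\forall A \in I_\beta$ such that $rank(A) \ldots$, $W(A) \in \Omega_P^\beta$; (3) if $W \in \Omega_P^\beta$, then $PW - W \in \Omega_P^\beta$; (4) if $W_1$, $W_2$ are in $\Omega_P^\beta$, then $W_1 \cap W_2$ is in $\Omega_P^\beta$.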

Proposition. If $\beta < \beta'$ are two finite ordinals, then $\Omega_P^\beta \subseteq \Omega_P^{\beta'} \subseteq \Omega_P$.

We define the probability functions on the sample space $\Omega_P^\beta$ induced by a program $P$ by replacing RAS by $I_\beta$ and CRPB by $\beta$-CRPB in the definitions of the previous section. We call the corresponding induced probability function $\Pr_P^\beta$.

Theorem. If $\beta < \beta'$ are two finite ordinals and $W \in \Omega_P^\beta$, then $\Pr_P^\beta(W) = \Pr_P^{\beta'}(W) = \Pr_P(W)$, where $\Pr_P$ is the probability distribution induced by $P$ and RAS.

So as the upward sequence $\{I_\beta\}$ converges to $lfp(T_P)$, $\{\Omega_P^\beta\}$ converges to $\Omega_P$ and $\Pr_P^\beta$ converges to $\Pr_P$. Here we use a loose definition of convergence: for any finite first order formula $F$ of ground finite-rank atoms, there exists an integer $n$ such that for all $\beta > n$, the set $W(F)$ of possible worlds satisfying $F$ is an element of $\Omega_P^\beta$ and $\Pr_P^\beta(W(F)) = \Pr_P(W(F))$.

Proof Theory

In this section we define a proof theory which can be used to derive the probability of a ground atom given a program. We will use $G$, with possible subscripts, to denote a goal atom. We will use a process similar to the SLD proof procedure, with the only real difference being in the handling of combining rules. We call this proof procedure probabilistic SLD (pSLD).

A query is a sentence of the form $\Pr(G_1,\dots,G_n)$, where the $G_i$ are atoms. The query is a request to find all ground instances $G'_1,\dots,G'_n$ of $G_1,\dots,G_n$ such that $\Pr(G'_1,\dots,G'_n)$ can be determined from the program $P$, and to return those probability values.

Definition. Suppose $\Pr(G_1,\dots,G_n)$ is a query, called $Q$, and $S$ is the sentence $\Pr(A_0 \mid A_1,\dots,A_m) = \alpha$ in the program, and that the variables in $Q$ and $S$ are standardized apart. Let $G_i$ be the selected atom in $Q$. Assume that $\theta$ is the most general unifier of $A_0$ and $G_i$. The resolvent of $Q$ and $S$ using mgu $\theta$ on $G_i$ is the sentence $\Pr((G_1,\dots,G_{i-1}, A_1,\dots,A_m, G_{i+1},\dots,G_n)\theta)$.
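The resolvent step can be sketched for the function-free case. The code below is an assumption-laden illustration rather than the authors' procedure: atoms are tuples (predicate, arguments...), variables are strings starting with a capital letter, the sentence is assumed to be already renamed apart from the query, and probability values are ignored.

```python
def is_var(term):
    """Variables are strings with a leading capital, as in the paper's convention."""
    return isinstance(term, str) and term[:1].isupper()

def walk(term, theta):
    """Follow variable bindings in the substitution until a non-bound term is reached."""
    while is_var(term) and term in theta:
        term = theta[term]
    return term

def unify(a, b):
    """Most general unifier of two function-free atoms, or None if they do not unify."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    theta = {}
    for s, t in zip(a[1:], b[1:]):
        s, t = walk(s, theta), walk(t, theta)
        if s == t:
            continue
        if is_var(s):
            theta[s] = t
        elif is_var(t):
            theta[t] = s
        else:
            return None  # two different constants
    return theta

def substitute(atom, theta):
    return (atom[0],) + tuple(walk(t, theta) for t in atom[1:])

def resolvent(query, i, sentence):
    """Replace the selected goal query[i] by the sentence's antecedent, then apply the mgu."""
    head, body = sentence
    theta = unify(head, query[i])
    if theta is None:
        return None
    goals = query[:i] + list(body) + query[i + 1:]
    return [substitute(g, theta) for g in goals], theta
```

For example, resolving the query Pr(alarm(john, yes)) against Pr(alarm(X, yes) | burglary(X, yes)) yields the new query Pr(burglary(john, yes)) with substitution {X | john}; resolving against a sentence with an empty antecedent yields the empty query.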

A pSLD-derivation of the initial query $Q_1$ from a program $P$ is a sequence $\langle Q_1, S_1, G_1, \theta_1 \rangle, \dots, \langle Q_r, S_r, G_r, \theta_r \rangle, \dots$, where $\forall i \ge 1$, $S_i$ is a renamed version of a sentence in $P$ and $Q_{i+1}$ is the resolvent of $Q_i$ and $S_i$ using $\theta_i$ on the selected atom $G_i$.

A pSLD-refutation of the query $Q$ is an $n$-step pSLD-derivation of the initial query $Q$ such that the resolvent of $Q_n$ and $S_n$ using $\theta_n$ is the empty query. The combined substitution $\theta_1 \cdots \theta_n$ is called the computed answer substitution.

The pSLD-refutation tree of the query $Q$ is the set of all pSLD-refutations of the initial query $Q$.

We need the concept of a pSLD-refutation tree because before we can use the combining rules to construct the sentences with an atom $A$ as consequent in CRPB, all sentences with consequent matching $A$ in $P$ need to be collected. Furthermore, we need to instantiate those sentences to ground before applying the combining rules.

Example. For the sentence Pr(q(true) | r(X, true)) = ..., a combining rule needs to consider all ground sentences Pr(q(true) | r($a_1$, true)) = ..., Pr(q(true) | r($a_2$, true)) = ..., ..., where $a_1, a_2, \dots$ are all constants in the language.



Definition. The ground pSLD-refutation tree of the query $Q$ is the set of all ground pSLD-refutations of the initial query $Q$. A ground pSLD-refutation is obtained from a pSLD-refutation by first applying the associated computed answer substitution to each formula in the derivation and finally instantiating it to a possible ground instance.

Let $Q$ be the query $\Pr(G)$. The ground pSLD-refutation tree of $Q$ contains all the necessary ground probabilistic sentences to construct the combined sentences in CRPB whose consequents are ground instances of $G$ or of the selected atoms in the original refutation trees. We apply the combining rules to it.

Definition. Let $Q$ be the query $\Pr(G)$. The combined supporting set of $Q$ is the set of ground probabilistic sentences constructed from the ground p-SLD refutation tree of $Q$ by the following procedure: for each $A$, a ground selected atom or ground instance of $G$ appearing in the tree, collect all ground sentences in it which have $A$ as consequent and apply the appropriate combining rule to construct the combined sentence.

The combining rules may generate new atoms which did not occur in the ground refutation tree, as in a previous example. We need to apply the same process to these new atoms.

Definition. Let $Q$ be the query $\Pr(G)$. The augmented combined supporting set of $Q$ is constructed by augmenting the combined supporting set of $Q$ in the following recursive way: starting from the combined supporting set of $Q$, for each atom $A$ appearing in that set, if there is no sentence in that set with $A$ as consequent, then augment it with the augmented combined supporting set of the query $\Pr(A)$.

Example. Continuing the example with the query Q Prptrue. The p-SLD refutation tree of Q is the following set of p-SLD refutations:

f f hPrptrue Prpt truejpt tr ue ptrue ftjgi

true ftjgi hPrptrue Prpt truejpt tr ue p

hPrptrue Prptrue ptrue fgi g

f hPrptrue Prpt truejpt tr ue ptrue ftjgi

hPrptrue Prpt truejpt f al se ptrue ftjgi

h Prpfalse Prp f al se p f al se fgi g

f hPrptrue Prpt truejpt f al se ptrue ftjgi

hPrpfalse Prpt f al sejpt f al se p f al se ftjgi

p f al se fgi g hPrpfalse Prp f al se

f hPrptrue Prpt truejpt f al se ptrue ftjgi

hPrpfalse Prpt f al sejpt tr ue pfalse ftjgi

hPrptrue Prptrue ptrue fgi gg

The ground p-SLD refutation tree, the combined supporting set, and the augmented supporting set of $Q$ are also equal to the above set.

Proposition. Given a program $P$ and an atom $G$, we can construct the augmented combined supporting set $PS$ of the query $\Pr(G)$. If the rank function is well-defined, then the rank of each atom in $PS$ can be determined by a simple recursive procedure: if $\Pr(G) \in PS$, then $rank(G)$ takes its base value; otherwise $rank(G)$ is determined by $\sup\{rank(A) \mid \Pr(G \mid \ldots, A, \ldots) \in PS\}$.
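The recursive rank computation can be sketched as below. Two points are assumptions of this sketch rather than statements from the text: atoms quantified by an unconditional sentence get rank 0, and every other atom gets one more than the largest rank among its antecedents. The supporting set is assumed finite and acyclic (a cyclic set would make the recursion diverge), and `make_rank` is a hypothetical name.

```python
from functools import lru_cache

def make_rank(ps):
    """Build a rank function for a supporting set.

    `ps` maps each ground atom to a list of antecedent tuples, one tuple per
    sentence with that atom as consequent; an unconditional sentence is
    represented by the empty tuple.
    """
    @lru_cache(maxsize=None)
    def rank(atom):
        antecedents = [a for body in ps.get(atom, []) for a in body]
        if not antecedents:
            return 0  # assumed base rank for unconditionally quantified atoms
        return 1 + max(rank(a) for a in antecedents)  # assumed increment of 1
    return rank

# Hypothetical three-step chain: p2 depends on p1, which depends on p0 / not_p0.
ps = {
    "p0": [()],
    "not_p0": [()],
    "p1": [("p0",), ("not_p0",)],
    "p2": [("p1",)],
}
rank = make_rank(ps)
```

With a finite supporting set the supremum reduces to a maximum, which is what the sketch uses.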

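The probability of a ground atom $G$ computed from the program $P$, $\Pr^C_P(G)$, can be calculated recursively from the augmented combined supporting set $PS$ of the query $\Pr(G)$:

\[
\Pr^C_P(G_1, \ldots, G_n) =
\begin{cases}
\alpha & \text{if } n = 1 \text{ and } (\Pr(G_1) = \alpha) \in PS, \\[4pt]
\sum \{\, \alpha \cdot \Pr^C_P(G_1, \ldots, G_{i-1}, G_{i+1}, \ldots, G_n, A_1, \ldots, A_m) \mid (\Pr(G_i \mid A_1, \ldots, A_m) = \alpha) \in PS \,\} & \text{if } \{G_1, \ldots, G_n\} \text{ is coherent, where } G_i \text{ is the atom with highest rank}, \\[4pt]
0 & \text{otherwise.}
\end{cases}
\]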
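A minimal Python sketch of this recursion follows. It assumes that a ground atom is a pair `(obj, val)`, that a set of atoms is coherent when no random variable receives two different values, and that the empty conjunction has probability 1 (a base case added for convenience). The probabilities in the usage lines are hypothetical and are not the values of the paper's running example.

```python
def coherent(atoms):
    """True if no random variable (obj) is assigned two different values."""
    seen = {}
    for obj, val in atoms:
        if seen.setdefault(obj, val) != val:
            return False
    return True

def prob(goals, ps, rank):
    """Probability of a conjunction of ground atoms.

    `ps` maps an atom to a list of (body, alpha) pairs, where `body` is a
    tuple of antecedent atoms (empty for an unconditional sentence);
    `rank` is a callable giving the rank of an atom.
    """
    goals = frozenset(goals)
    if not goals:
        return 1.0  # assumed base case: empty conjunction
    if not coherent(goals):
        return 0.0
    g = max(goals, key=rank)  # expand the atom with highest rank
    rest = goals - {g}
    return sum(alpha * prob(rest | set(body), ps, rank)
               for body, alpha in ps.get(g, []))

# Hypothetical two-step chain over a variable observed at times 0 and 1.
T0, F0, T1 = ("p0", "true"), ("p0", "false"), ("p1", "true")
ps = {
    T0: [((), 0.4)],
    F0: [((), 0.6)],
    T1: [((T0,), 0.9), ((F0,), 0.2)],
}
rank = {T0: 0, F0: 0, T1: 1}.get
p_t1 = prob([T1], ps, rank)
```

Expanding the highest-ranked atom first ensures that the remaining atoms are never consequences of the atom being replaced, which is why the product with the conditional probability is justified in this sketch.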

C

Example Continuing the example with the query Q Prptrue Pr ptrue

P

C C

Pr ptrue PrptruejptruePr ptruePrptruejp f al se

P P

C

p f al se Pr

P

Theorem. Given a program $P$ with a well-defined rank and a ground atom $G$. If $rank(G)$ is finite and CRPB is completely quantified and consistent, then the probability of $G$ computed from the program $P$, $\Pr^C_P(G)$, is equal to $\Pr_P(G)$, where $\Pr_P$ is the probability function induced by the logic program $P$; that is, the p-SLD procedure will return the value $\Pr^C_P(G)$, which is equal to $\Pr_P(G)$.

The condition that $rank(G)$ be finite can be assured by the acyclicity property of probabilistic logic programs. We have soundness and completeness of p-SLD for acyclic programs.

Theorem. Given a program $P$ with an acyclic probabilistic base and self-contained combining rules. If CRPB is completely quantified and consistent, then the p-SLD procedure is sound and complete wrt finite-rank ground atoms.

As can be seen in the recursive definition of $\Pr^C_P$, p-SLD can be easily extended to evaluate the probability of a finite conjunction of atoms. In fact, we can evaluate any finite formula of the form $\bigwedge_i \bigvee_j A_{ij}$ or $\bigvee_i \bigwedge_j A_{ij}$, where the $A_{ij}$ are atoms of finite rank, by a simple extension of p-SLD.

Local Maximum Entropy: Negation As Failure For Probabilistic Logic Programs

Negation-as-failure is used as a default rule in the SLDNF proof procedure. It allows us to conclude that a ground atom $A$ has the truth value false if all attempts to prove $A$ fail. In probabilistic logic programs, such default rules are desirable both to shorten the programs and to facilitate reasoning on incompletely specified programs. For these reasons, we would like to define a probabilistic analogue to negation-as-failure.

Example Consider the example It is obvious that we should be able to infer Prpfalse

Furthermore we only know that Prq av er ag ejptr ue Prq badjptr ue

and want to have some default rule to temporarily assign a probability value to each of

q badjptr ue and Prq av er ag ejptr ue Pr

A popular principle for assigning missing probability values is the maximum entropy principle. We propose the local maximum entropy rule as a form of negation-as-failure for probabilistic logic programs.

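Definition. Given a program $P$ and its corresponding CRPB, let $A$ be an atom in RAS and $VAL(A) = \{v_1, \ldots, v_n\}$. The local maximum entropy rule (LME) can be applied to $A$ in the following situation: if $\Pr(A \mid A_1, \ldots, A_k) = \alpha$ is a sentence in CRPB such that the set

\[ V = \{\, val(A') \mid (\Pr(A' \mid A_1, \ldots, A_k) = \alpha') \in \text{CRPB} \text{ and } obj(A') = obj(A) \,\} \]

has $m$ ($m < n$) elements, and the sum of all $\alpha'$ in those sentences is $\beta$, then augment CRPB with the following set of sentences:

\[ \{\, \Pr(A'' \mid A_1, \ldots, A_k) = (1 - \beta)/(n - m) \mid A'' \in Ext(A) \text{ and } val(A'') \notin V \,\}. \]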
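The effect of the LME rule on a single conditional distribution can be sketched as follows. The sketch assumes the reading in which the probability mass not yet assigned is divided evenly among the values that have no sentence; the function name `local_max_entropy` and the numbers in the usage lines are hypothetical.

```python
def local_max_entropy(known, values):
    """Return default probabilities for the unquantified values.

    `known` maps already quantified values to their probabilities (for one
    fixed combination of antecedents); `values` lists all possible values
    of the random variable.
    """
    missing = [v for v in values if v not in known]
    if not missing:
        return {}  # nothing to fill in; the rule does not apply
    remaining = 1.0 - sum(known.values())
    return {v: remaining / len(missing) for v in missing}

# Hypothetical variable with three values, only one of which is quantified.
defaults = local_max_entropy({"good": 0.4}, ["good", "average", "bad"])
```

Spreading the remaining mass uniformly is the maximum entropy choice for the missing entries of that one distribution, hence the qualifier "local".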

Example Continuing the previous example the LME rule would assign to Prpfalse

av er ag ejptr ue and Prq badjptr ue and to Prq

We incorporate LME into the p-SLD proof procedure, and call the new procedure p-SLD-LME, by generalizing the concept of derivation and other dependent concepts. Details are given in the full paper.

Bayesian Networks Construction

Bayesian networks

Bayesian networks are finite directed acyclic graphs. Each node in a Bayesian network represents a random variable, which can be assigned values from a fixed finite set. A link represents the relationship, either causal or relevance, between the random variables at both ends. Usually, a link from random variable $A$ to random variable $B$ says that $A$ causes $B$. Associated with each node $A$ is a link matrix, which contains the conditional probabilities of random variable $A$ receiving specific values given each combination of values of $A$'s parents. The Bayesian network formalism is an efficient approach to representing probabilistic structures and calculating posterior probabilities of random variables given a set of evidence.

In our query procedure, we will find not only the probability value of, say, a random variable, but also its posterior probability after observing a set of evidence.

Definition. A set of evidence $E$ is a set of atoms such that $ground(E)$ is coherent.

We do that by first constructing from the program the portion of the Bayesian network related to the query. On the constructed network, we can use available propagation procedures to update the probabilities, taking into account the set of evidence.

Notice that in our framework an atom $A$ represents the fact that the random variable denoted by $obj(A)$ receives the value $val(A)$, and $VAL(A)$ is the set of all possible values of that random variable.
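To make the role of link matrices and evidence concrete, the following sketch computes a posterior on the smallest possible network, a single link from A to B, by direct application of Bayes' rule. It is an illustration only: real propagation procedures handle arbitrary networks, and the function name `posterior` and all numbers are hypothetical.

```python
def posterior(prior_a, link_b, observed_b):
    """Posterior distribution of A after observing B = observed_b.

    `prior_a` maps each value a to P(A = a); `link_b` is the link matrix of
    B, mapping (b, a) to P(B = b | A = a).
    """
    joint = {a: prior_a[a] * link_b[(observed_b, a)] for a in prior_a}
    total = sum(joint.values())  # P(B = observed_b)
    return {a: p / total for a, p in joint.items()}

prior_a = {"true": 0.4, "false": 0.6}
link_b = {
    ("true", "true"): 0.9, ("false", "true"): 0.1,
    ("true", "false"): 0.2, ("false", "false"): 0.8,
}
post = posterior(prior_a, link_b, "true")
```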

Definition. Given a program $P$ and a set of evidence $E$. A complete ground query is a query of the form $\Pr(G)$, where $G$ is an atom, the last argument of $G$ is a variable, and it is the only variable in $G$. The meaning of such a query is: find the posterior distribution of $obj(G)$. If $VAL(G) = \{v_1, \ldots, v_n\}$, then the answer to such a query is a vector $(\alpha_1, \ldots, \alpha_n)$, where $\sum_{i=1}^{n} \alpha_i = 1$ and $\alpha_i$ is the posterior probability of $obj(G)$ receiving the value $v_i$.

A complete query is a query of the form $\Pr(G)$, where the last argument of $G$ is a variable and the other arguments may also contain variables. The meaning of such a query is: find all ground instances $G'$ of $G$ such that the complete ground query $\Pr(G')$ has an answer, and return those answers.



Q-PROCEDURE
BEGIN
    { Build the network that supports the evidence }
    NET := {};
    FOR i := 1 TO number of elements in E DO
        temp := BUILD-NET(the i-th element E_i of E, NET);
    { Extend the network to support the ground instances of the query }
    SUBSS := BUILD-NET(G, NET);
    UPDATE(NET, E);
    { Output posterior probabilities }
    FOR each θ in SUBSS output the probability values at node obj(Gθ)
END

Figure: Query processing procedure.

Bayesian Network Construction Procedure

Because of space limitations, we drop the details, which can be found in the full paper. In this section, we present a query answering algorithm, the Q procedure, for answering complete queries. We only consider self-contained combining rules. This assumption allows us to omit the augmentation step which was presented in the p-SLD proof procedure.

Assume that we are given a program $P$, a set of evidence $E$, and a complete query $\Pr(G)$. The Q procedure has the following two main steps: build the supporting Bayesian network for the atoms of $\{A\}$ and $ground(E)$ that belong to RAS, by a backward chaining process similar to a Prolog engine; and calculate the posterior probability using the set of evidence $E$ and any available procedure.

The Q procedure is more complex than SLDNF because it needs to collect all relevant sentences before combining rules can be used.



The pseudocode for the Q procedure is shown in the figure "Query processing procedure". It simply makes calls to the BUILD-NET function, which builds the supporting network of an atom, with the atoms in $E$ and $G$ as successive arguments. The FOR loop constructs the supporting network for the set of evidence, and the final BUILD-NET call augments the constructed network with the supporting network of the ground instances of the query. UPDATE is any probability propagation algorithm on Bayesian networks. BUILD-NET receives as input an atom whose supporting network needs to be explored. It updates NET, which might have been partially built. The return value of the function is the set of substitutions for the input atom that yield all corresponding ground instances in the resulting network.
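The control flow just described can be sketched in Python as follows. The paper omits the details of BUILD-NET, so this sketch substitutes a deliberately simple version: sentences are assumed to be already ground, each node stands for a random variable, substitutions are not tracked, and the propagation step is passed in as a caller-supplied function. The names `build_net`, `q_procedure`, and `update` are hypothetical.

```python
def build_net(atom, net, sentences):
    """Backward chaining: add `atom` and, recursively, its parents to `net`.

    `net` maps each node to the set of its parents; `sentences` is a list of
    (consequent, antecedents) pairs over ground atoms.
    """
    if atom in net:
        return  # this part of the network has already been built
    parents = set()
    for head, body in sentences:
        if head == atom:
            parents.update(body)
    net[atom] = parents
    for parent in parents:
        build_net(parent, net, sentences)

def q_procedure(goal, evidence, sentences, update):
    """Build the supporting network, then delegate to a propagation routine."""
    net = {}
    for e in evidence:                 # network that supports the evidence
        build_net(e, net, sentences)
    build_net(goal, net, sentences)    # extend it to support the query
    return update(net, evidence, goal)

# Hypothetical ground sentences; "e" and "x" are irrelevant to the query.
sentences = [("c", ("a", "b")), ("a", ()), ("b", ()),
             ("d", ("c",)), ("e", ("x",))]
net = q_procedure("d", ["a"], sentences, lambda net, ev, goal: net)
```

Only the nodes reachable backward from the query and the evidence end up in the network, which is the point of constructing it on demand.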

The soundness and completeness of the Q procedure

In the Q procedure, we do not address the problem of termination. We expect that the techniques for assuring termination of Prolog programs could be applied to the Q procedure.

Theorem (Soundness). Given a program $P$. If CRPB is completely quantified, then the Q procedure is sound wrt complete queries.

Theorem (Soundness and Completeness). Given a program $P$ with an allowed and acyclic PB. If CRPB is completely quantified, then the Q procedure is sound and complete wrt complete ground queries and ground finite sets of evidence.

Related Work

In a related paper, we present a temporal variant of our logic. We describe the application of this framework to representing probabilistic temporal processes and projecting probabilistic plans.

Poole expresses an intention similar to ours: "there has not been a mapping between logical specifications of knowledge and Bayesian network representations". He provides such a mapping using probabilistic Horn abduction theory, in which knowledge is represented by Horn clauses and the independence assumption of Bayesian networks is explicitly stated. His work is developed along a different track than ours, however, by concentrating on using the theory for abduction. Our approach has several advantages over Poole's. We do not impose as many constraints on our representation language as he does. Probabilistic dependencies are directly represented in our language, while in Poole's language they are indirectly specified through the use of special predicates in the rules. Our probabilistic independence assumption is more intuitively appealing since it reflects the causality of the domain.

References

K. R. Apt and M. Bezem. Acyclic programs. New Generation Computing, Sept.

F. Bacchus. Using first-order probability logic for the construction of Bayesian networks. In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence, July.

H. A. Blair and V. S. Subrahmanian. Paraconsistent logic programming. Theoretical Computer Science.

J. S. Breese. Construction of belief and decision networks. Computational Intelligence.

F. J. Diez. Parameter adjustment in Bayes networks: the generalized noisy OR-gate. In Proceedings of the Ninth Conference on Uncertainty in Artificial Intelligence, Washington, D.C., July.

M. C. Fitting. Bilattices and the semantics of logic programming. Journal of Logic Programming.

A. V. Gelder, K. A. Ross, and J. S. Schlipf. The well-founded semantics for general logic programs. JACM, July.

R. P. Goldman and E. Charniak. A language for construction of belief networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, March.

P. Haddawy. Generating Bayesian networks from probability logic knowledge bases. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, Seattle, July.

D. Heckerman and M. P. Wellman. Bayesian networks. Communications of the ACM, March.

M. Kifer and V. S. Subrahmanian. Theory of generalized annotated logic programs and its applications. Journal of Logic Programming.

J. W. Lloyd. Foundations of Logic Programming. Second edition. Springer-Verlag.

Raymond Ng. Semantics and consistency of empirical databases. In Proceedings of the International Conference on Logic Programming.

Raymond Ng and V. S. Subrahmanian. A semantical framework for supporting subjective and conditional probability in deductive databases. In Proceedings of the International Conference on Logic Programming.

Raymond Ng and V. S. Subrahmanian. Probabilistic logic programming. Information and Computation.

L. Ngo, P. Haddawy, and J. Helwig. A theoretical framework for context-sensitive temporal probability model construction with application to plan projection. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, August.

Liem Ngo and Peter Haddawy. Plan projection as deduction and plan generation as abduction in a context-sensitive temporal probability logic. Submitted to AIPS.

J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA.

D. Poole. Probabilistic Horn abduction and Bayesian networks. Artificial Intelligence, November.

S. Srinivas. A generalization of the noisy-or model. In UAI, July.

M. H. van Emden. Quantitative deduction and its fixpoint theory. Journal of Logic Programming.

M. P. Wellman, J. S. Breese, and R. P. Goldman. From knowledge bases to decision models. The Knowledge Engineering Review.