Journal of Artificial Intelligence Research

Logarithmic-Time Updates and Queries in Probabilistic Networks

Arthur L. Delcher (delcher@cs.loyola.edu)
Computer Science Department, Loyola College in Maryland, Baltimore, MD

Adam J. Grove (grove@research.nj.nec.com)
NEC Research Institute, Princeton, NJ

Simon Kasif (kasif@cs.jhu.edu)
Department of Computer Science, Johns Hopkins University, Baltimore, MD

Judea Pearl (pearl@lanai.cs.ucla.edu)
Department of Computer Science, University of California, Los Angeles, CA

Abstract

Traditional databases commonly support efficient query and update procedures that operate in time which is sublinear in the size of the database. Our goal in this paper is to take a first step toward dynamic reasoning in probabilistic databases with comparable efficiency. We propose a dynamic data structure that supports efficient algorithms for updating and querying singly connected Bayesian networks. In the conventional algorithm, new evidence is absorbed in time O(1) and queries are processed in time O(N), where N is the size of the network. We propose an algorithm which, after a preprocessing phase, allows us to answer queries in time O(log N) at the expense of O(log N) time per evidence absorption. The usefulness of sublinear processing time manifests itself in applications requiring near real-time response over large probabilistic databases. We briefly discuss a potential application of dynamic probabilistic reasoning in computational biology.

1. Introduction

Probabilistic (Bayesian) networks are an increasingly popular modeling technique that has been used successfully in numerous applications of intelligent systems, such as real-time planning and navigation, model-based diagnosis, information retrieval, classification, Bayesian forecasting, natural language processing, computer vision, medical informatics, and computational biology. Probabilistic networks allow the user to describe the environment using a probabilistic database that consists of a large number of random variables, each corresponding to an important parameter in the environment. Some random variables could in fact be hidden and may correspond to unknown parameters (causes) that influence the observable variables. Probabilistic networks are quite general and can store information such as the probability of failure of a particular component in a computer system, the probability of page i in a computer cache being requested in the near future, the probability of a document being relevant to a particular query, or the probability of an amino-acid subsequence in a protein chain folding into an alpha-helix conformation.

© AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

The applications we have in mind include networks that are dynamically maintained to keep track of a probabilistic model of a changing system. For instance, consider the task of automated detection of power-plant failures. We might repeat a cycle that consists of the following sequence of operations. First, we perform sensing operations. These operations cause updates to be performed to specific variables in the probabilistic database. Based on this evidence, we estimate (query) the probability of failure in certain sites. More precisely, we query the probability distribution of the random variables that measure the probability of failure in these sites, based on the evidence. Since the plant requires constant monitoring, we must repeat the sense-evaluate cycle on a frequent basis.

A conventional (non-probabilistic) database tracking the plant's state would not be appropriate here, because it is not possible to directly observe whether a failure is about to occur. On the other hand, a probabilistic database based on a Bayesian network will only be useful if the operations (update and query) can be performed very quickly. Because real-time or near real-time response is so often necessary, the question of doing extremely fast reasoning in probabilistic networks is important.

Traditional (non-probabilistic) databases support efficient query and update procedures that often operate in time which is sublinear in the size of the database (e.g., using binary search). Our goal in this paper is to take a step toward systems that can perform dynamic probabilistic reasoning, such as computing the probability of an event given a set of observations, in time which is sublinear in the size of the probabilistic network. Typically, sublinear performance in complex networks is attained by using parallelism. This paper relies on preprocessing instead.

Specifically, we describe new algorithms for performing queries and updates in belief networks in the form of trees: causal trees, polytrees, and join trees. We define two natural database operations on probabilistic networks:

UpdateNode: Perform sensory input; modify the evidence at a leaf node (single variable) in the network and absorb this evidence into the network.

QueryNode: Obtain the marginal probability distribution over the values of an arbitrary node (single variable) in the network.

The standard algorithms introduced by Pearl can perform the QueryNode operation in O(1) time, although evidence absorption (i.e., the UpdateNode operation) takes O(N) time, where N is the size of the network. Alternatively, one can assume that the UpdateNode operation takes O(1) time, by simply recording the change, and that the QueryNode operation takes O(N) time, evaluating the entire network.

In this paper we describe an approach that performs both queries and updates in O(log N) time. This can be very significant in some systems, since we improve the ability of a system to respond after a change has been encountered from O(N) time to O(log N). Our approach is based on preprocessing the network, using a form of node absorption in a carefully structured way, to create a hierarchy of abstractions of the network. Previous uses of node-absorption techniques were reported by Peot and Shachter.


We note that measuring complexity only in terms of the size of the network, N, can overlook some important factors. Suppose that each variable in the network has domain size k or less. For many purposes k can be considered constant. Nevertheless, some of the algorithms we consider have a slowdown which is some power of k, and which can become significant in practice unless N is very large. Thus we will be careful to state this slowdown where it exists.

Section 2 considers the case of causal trees, i.e., singly connected networks in which each node has at most one parent. The standard algorithm (see Pearl) must use O(k²N) time for either updates or for retrieval, although one of these operations can be done in O(1) time. As we discuss briefly in Section 2, there is also a straightforward variant on this algorithm that takes O(k²D) time for both queries and updates, where D is the height of the tree.

We then present an algorithm that takes O(k³ log N) time for updates and O(k² log N) time for queries in any causal tree. This can, of course, represent a tremendous speedup, especially for large networks. Our algorithm begins with a polynomial-time preprocessing step (linear in the size of the network) that constructs another data structure, which is not itself a probabilistic tree, supporting fast queries and updates. The techniques we use are motivated by earlier algorithms for dynamic arithmetic trees, and involve caching sufficient intermediate computations during the update phase so that querying is also relatively easy. We note, however, that there are substantial and interesting differences between the algorithm for probabilistic networks and those for arithmetic trees. In particular, as will be apparent later, computation in probabilistic trees requires both bottom-up and top-down processing, whereas arithmetic trees need only the former. Perhaps even more interesting is that the relevant probabilistic operations have a different algebraic structure than arithmetic operations; for instance, they lack distributivity.

Bayesian trees have many applications in the literature, including classification. For instance, one of the most popular methods for classification is the Bayes classifier, which makes independence assumptions on the features that are used to perform classification (Duda & Hart; Rachlin, Kasif, & Salzberg; Aha). Probabilistic trees have been used in computer vision (Hel-Or & Werman; Chelberg), signal processing (Willsky), game playing (Delcher & Kasif), and statistical mechanics (Berger & Ye). Nevertheless, causal trees are fairly limited for modeling purposes. However, similar structures called join trees arise in the course of one of the standard algorithms for computing with arbitrary Bayesian networks (see Lauritzen and Spiegelhalter). Thus our algorithm for join trees has potential relevance to many networks that are not trees. Because join trees have some special structure, they allow some optimization of the basic causal-tree algorithm. We elaborate on this in Section 3.

In Section 4 we consider the case of arbitrary polytrees. We give an O(log N) algorithm for updates and queries which involves transforming the polytree to a join tree and then using the results of Sections 2 and 3. The join tree of a polytree has a particularly simple form, giving an algorithm in which updates and queries take time logarithmic in N, with a multiplicative constant that is a power of k^p, where p is the maximum number of parents of any node. Although this constant appears large, it must be noted that the original polytree takes O(k^p N) space merely to represent, if the conditional probability tables are given as explicit matrices.

[Figure 1 shows a segment of a causal tree: node U has children V and X, with conditional-probability matrices M_{V|U} and M_{X|U} on the corresponding edges; X in turn has children Y and Z, with matrices M_{Y|X} and M_{Z|X}.]

Figure 1: A segment of a causal tree.

Finally, we discuss a specific modeling application in computational biology, where probabilistic models are used to describe, analyze, and predict the functional behavior of biological sequences such as protein chains or DNA sequences (see Delcher, Kasif, Goldberg, and Hsu for references). Much of the information in computational-biology databases is noisy. However, a number of successful attempts to build probabilistic models have been made. In this case we use a probabilistic tree of bounded depth, and all the matrices of conditional probabilities are of the same small size. The tree is used to model the dependence of a protein's secondary structure on its chemical structure. The detailed description of the problem and the experimental results are given by Delcher et al. For this problem we obtain an effective speedup of roughly an order of magnitude for performing an update, as compared to the standard algorithm. Clearly, getting an order-of-magnitude improvement in the response time of a probabilistic real-time system could be of tremendous importance in future use of such systems.

2. Causal Trees

A probabilistic causal tree is a directed tree in which each node represents a discrete random variable X, and each directed edge (X, Y) is annotated by a matrix of conditional probabilities M_{Y|X}. That is, if x is a possible value of X and y of Y, then the (x, y)-th component of M_{Y|X} is Pr(Y = y | X = x). Such a tree represents a joint probability distribution over the product space of all variables; for detailed definitions and discussion see Pearl. Briefly, the idea is that we consider the product, over all nodes, of the conditional probability of the node given its parents. For example, in Figure 1 the implied distribution is

Pr(U = u, V = v, X = x, Y = y, Z = z)
  = Pr(U = u) Pr(V = v | U = u) Pr(X = x | U = u) Pr(Y = y | X = x) Pr(Z = z | X = x).

Given particular values of u, v, x, y, z, the conditional probabilities can be read from the appropriate matrices M. One advantage of such a product representation is that it is very concise. In this example we need four matrices and the unconditional probability over U, but the size of each is at most the square of the largest variable's domain size. In contrast, a general distribution over N variables requires a representation exponential in N.

Of course, not every distribution can be represented as a causal tree. But it turns out that the product decomposition implied by the tree corresponds to a particular pattern of conditional independencies, which often hold (if perhaps only approximately) in real applications. Intuitively speaking, in Figure 1 some of these implied independencies are that the conditional probability of U given V, X, Y, and Z depends only on the values of V and X, and that the probability of Y given U, V, X, and Z depends only on X. Independencies of this sort can arise for many reasons, for instance from a causal modeling of the interactions between the variables. We refer the reader to Pearl for details related to the modeling of independence assumptions using graphs.

In the following we make several assumptions that significantly simplify the presentation but do not sacrifice generality. First, we assume that each variable ranges over the same constant number of values, k. It follows that the marginal probability distribution for each variable can be viewed as a k-dimensional vector, and each conditional probability matrix such as M_{Y|X} is a square k × k matrix. A common case is that of binary random variables (k = 2); the distribution over the values {TRUE, FALSE} is then (p, 1 − p) for some probability p.

The next assumption is that the tree is binary and complete, so that each node has 0 or 2 children. Any tree can be converted into this form by at most doubling the number of nodes. For instance, suppose node p has children c1, c2, c3 in the original tree. We can create another copy p' of p, and rearrange the tree such that the two children of p are c1 and p', and the two children of p' are c2 and c3. We can constrain p' always to have the same value as p simply by choosing the identity matrix for the conditional probability between p and p'. Then the distribution represented by the new tree is effectively the same as the original. Similarly, we can always add dummy leaf nodes, if necessary, to ensure that a node has two children. As explained in the introduction, we are interested in processes in which certain variables' values are observed, upon which we wish to condition. Our final assumption is that these observed (evidence) nodes are all leaves of the tree. Again, because it is possible to copy nodes and to add dummy nodes, this is not restrictive.

The product distribution alluded to above corresponds to the distribution over the variables prior to any observations. In practice we are more interested in the conditional distribution, which is simply the result of conditioning on all the observed evidence, which by the earlier assumption corresponds to seeing values for all the leaf nodes. Thus for each non-leaf node X we are interested in the conditional marginal probability over X, i.e., the k-dimensional vector

Bel(X) = Pr(X | all evidence values).

The main algorithmic problem is to compute Bel(X) for each non-evidence node X in the tree, given the current evidence. It is well known that the probability vector Bel(X) can be computed in time linear in the size of the tree, by a popular algorithm based on the following equation:

Bel(X) = Pr(X | all evidence) = α λ(X) ⊗ π(X).

Here α is a normalizing constant, λ(X) is the probability of all the evidence in the subtree below node X given X, and π(X) is the probability of X given all the evidence in the rest of the tree. To interpret this equation, note that if X = (x_1, x_2, ..., x_k) and Y = (y_1, y_2, ..., y_k) are two vectors, we define ⊗ to be the operation of componentwise product (pairwise or dyadic product) of vectors:

X ⊗ Y = (x_1 y_1, x_2 y_2, ..., x_k y_k).

(Footnote to the assumption of a common domain size: this assumption is non-restrictive, because we can add dummy values to each variable's range, which should be given conditional probability 0. Nevertheless, there may be some computational advantage in allowing different variable domain sizes. The changes required to permit this are not difficult, but since they complicate the presentation somewhat, we omit them.)

The usefulness of λ(X) and π(X) derives from the fact that they can be computed recursively, as follows:

- If X is the root node, π(X) is the prior probability of X.

- If X is a leaf node, λ(X) is a vector with 1 in the i-th position, where the i-th value has been observed, and 0 elsewhere. If no value for X has been observed, then λ(X) is a vector consisting of all 1s.

- Otherwise, if (as shown in Figure 1) the children of node X are Y and Z, its sibling is V, and its parent is U, we have

  λ(X) = M_{Y|X} λ(Y) ⊗ M_{Z|X} λ(Z)
  π(X) = M^T_{X|U} (π(U) ⊗ M_{V|U} λ(V)).
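To make the λ-π recursion concrete, here is a small numerical sketch on the five-node tree of Figure 1, checked against brute-force enumeration of the joint distribution. The code and its variable names are our own illustration (using NumPy), not part of the paper; it is a minimal sketch assuming random row-stochastic conditional matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 2  # domain size of every variable

def cpt():
    # M[x, y] = Pr(child = y | parent = x): each row is a distribution
    M = rng.random((k, k))
    return M / M.sum(axis=1, keepdims=True)

# The tree of Figure 1: root U with children V and X; X has children Y and Z.
prior_U = np.array([0.6, 0.4])
M_V, M_X, M_Y, M_Z = cpt(), cpt(), cpt(), cpt()

# Evidence at the leaves V, Y, Z, encoded as 0/1 likelihood vectors.
lam_V, lam_Y, lam_Z = np.eye(k)[0], np.eye(k)[1], np.eye(k)[0]

# Bottom-up lambda: lambda(X) = M_{Y|X} lambda(Y) (x) M_{Z|X} lambda(Z)
lam_X = (M_Y @ lam_Y) * (M_Z @ lam_Z)

# Top-down pi: pi(X) = M_{X|U}^T (pi(U) (x) M_{V|U} lambda(V))
pi_X = M_X.T @ (prior_U * (M_V @ lam_V))

# Bel(X) = alpha lambda(X) (x) pi(X)
bel_X = lam_X * pi_X
bel_X /= bel_X.sum()

# Brute-force check: sum the joint product over all assignments.
check = np.zeros(k)
for u in range(k):
    for v in range(k):
        for x in range(k):
            for y in range(k):
                for z in range(k):
                    check[x] += (prior_U[u] * M_V[u, v] * M_X[u, x]
                                 * M_Y[x, y] * M_Z[x, z]
                                 * lam_V[v] * lam_Y[y] * lam_Z[z])
check /= check.sum()
assert np.allclose(bel_X, check)
```

The componentwise products implement ⊗, and the single matrix transpose is the M^T term in the π equation.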

Our presentation of this technique follows that of Pearl. However, we use a somewhat different notation, in that we do not describe messages sent to parents or successors, but rather discuss the direct relations among the λ and π vectors in terms of simple algebraic equations. We will take advantage of algebraic properties of these equations in our development.

It is very easy to see that the equations above can be evaluated in time proportional to the size of the network. The formal proof is given by Pearl.

Theorem 1: The belief distribution of every variable (that is, the marginal probability distribution for each variable given the evidence) in a causal tree can be evaluated in O(k²N) time, where N is the size of the tree. The factor k² is due to the multiplication of a matrix by a vector that must be performed at each node.

This theorem shows that it is possible to perform evidence absorption in O(N) time and queries in constant time (i.e., by retrieving the previously computed values from a lookup table). In the next sections we will show how to perform both queries and updates in worst-case O(log N) time. Intuitively, we will not recompute all the marginal distributions after an update, but rather make only a small number of changes, sufficient, however, to compute the value of any variable with only a logarithmic delay.

(Footnote: Or we can set to 1 all components corresponding to possible values; this is especially useful when the observed variable is part of a join-tree clique (Section 3). In general, λ(X) should be thought of as the likelihood vector over X given our observations about X.)


2.1 A Simple Preprocessing Approach

To obtain intuition about the new approach, we begin with a very simple observation. Consider a causal tree T of depth D. For each node X in the tree we initially compute its λ(X) vector; the π vectors are left uncomputed. Given an update to a node Y, we calculate the revised λ(X) vectors for all nodes X that are ancestors of Y in the tree. This clearly can be done in time proportional to the depth of the tree, i.e., O(D). The rest of the information in the tree remains unchanged. Now consider a QueryNode operation for some node V in the tree. We obviously already have the accurate λ vector for every node in the tree, including V. However, in order to compute its π(V) vector, we need to compute only the π vectors for all the nodes above V in the tree, and multiply these by the appropriate λ vectors, which are kept current. This means that to compute the accurate π(V) vector we need to perform O(D) work as well. Thus in this approach we do not perform the complete update to every λ(X) and π(X) vector in the tree.

Lemma 1: UpdateNode and QueryNode operations in a causal tree T can be performed in O(k²D) time, where D is the depth of the tree.

This implies that if the tree is balanced, both operations can be done in O(log N) time. However, in some important applications the trees are not balanced, e.g., models of temporal sequences (Delcher et al.). The obvious question, therefore, is: given a causal tree T, can we produce an equivalent balanced tree T'? While the answer to this question appears to be difficult, it is possible to use a more sophisticated approach to produce a data structure, which is not a causal tree, that processes queries and updates in O(log N) time. This approach is described in the subsequent sections.
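The O(D) scheme above can be sketched directly. The following code (our own illustrative scaffolding: the heap-indexed array layout and function names are assumptions, not the paper's) maintains λ incrementally on a complete binary tree, recomputing only the root path on an update, and computing π only down the root path on a query.

```python
import numpy as np

rng = np.random.default_rng(1)
k, D = 2, 3
n = 2 ** (D + 1) - 1                       # nodes 1..n, heap-indexed

M = {v: rng.random((k, k)) for v in range(2, n + 1)}   # M[v][pa, val] = Pr(v = val | pa)
for v in M:
    M[v] /= M[v].sum(axis=1, keepdims=True)
prior = np.full(k, 1.0 / k)                # prior at the root
lam = {v: rng.random(k) for v in range(1, n + 1)}      # leaf likelihoods

def pull_up(u):
    l, r = 2 * u, 2 * u + 1
    lam[u] = (M[l] @ lam[l]) * (M[r] @ lam[r])

for u in range(n // 2, 0, -1):             # initial bottom-up lambda pass
    pull_up(u)

def update(leaf, likelihood):
    """UpdateNode: refresh lambda only on the leaf-to-root path -- O(k^2 D)."""
    lam[leaf] = likelihood
    u = leaf // 2
    while u:
        pull_up(u)
        u //= 2

def query(v):
    """QueryNode: compute pi top-down along the root path, then Bel -- O(k^2 D)."""
    path = []
    u = v
    while u:
        path.append(u)
        u //= 2
    pi = prior
    for w in reversed(path[:-1]):          # from the root's child down to v
        sib = w ^ 1                        # heap sibling
        pi = M[w].T @ (pi * (M[sib] @ lam[sib]))
    bel = lam[v] * pi
    return bel / bel.sum()

update(2 ** D, np.array([0.9, 0.1]))       # new evidence at the leftmost leaf
b_incremental = query(5)

for u in range(n // 2, 0, -1):             # full recomputation for comparison
    pull_up(u)
assert np.allclose(b_incremental, query(5))
```

The final assertion checks that the incremental update left exactly the same state as recomputing every λ from scratch.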

2.2 A Dynamic Data Structure for Causal Trees

The data structure that will allow efficient incremental processing of a probabilistic tree T = T_0 will be a sequence of trees T_0, T_1, ..., T_{log N}. Each T_{i+1} will be a contracted version of T_i whose nodes are a subset of those in T_i. In particular, T_{i+1} will contain about half as many leaves as its predecessor.

We defer the details of this contraction process until the next section. However, one key idea is that we maintain consistency, in the sense that Bel(X), λ(X), and π(X) are given the same values by all the trees in which X appears. We choose the conditional probability matrices in the contracted trees (i.e., all trees other than T_0) to ensure this.

Recall that the λ and π equations have the form

λ(X) = M_{Y|X} λ(Y) ⊗ M_{Z|X} λ(Z)
π(X) = M^T_{X|U} (π(U) ⊗ M_{V|U} λ(V))

if Y and Z are the children of X, X is a right child of U, and V is X's sibling (Figure 1). However, these equations are not in the most convenient form, and the following notational conventions will be very helpful. First, let A_i(x) (resp. B_i(x)) denote the conditional probability matrix between X and X's left (resp. right) child in the tree T_i. Note that the identity of these children can differ from tree to tree, because some of X's original children might be removed by the contraction process. One advantage of the new notation is that the explicit dependence on the identity of the children is suppressed.

[Figure 2 shows the trees before and after a rake: in T_i, node u has children v and x, and x has children e and z; in T_{i+1}, both e and x have been removed, and z has become a child of u.]

Figure 2: The effect of the operation Rake(e, x). Here e must be a leaf, but z may or may not be a leaf.

Next, suppose X's parent in T_i is U. Then we let C_i(x) denote either A_i(u) or B_i(u), and D_i(x) denote either B_i^T(u) or A_i^T(u), depending on whether X is the right or left child, respectively, of U. It will not be necessary to keep careful track of these correspondences, but simply to note that the above equations become

λ(x) = A_i(x) λ(y) ⊗ B_i(x) λ(z)
π(x) = D_i(x) (π(u) ⊗ C_i(x) λ(v)).

In the next section we describe the preprocessing step that creates the dynamic data structure.

2.3 The Rake Operation

The basic operation used to contract the tree is Rake, which removes both a leaf and its parent from the tree. The effect of this operation on the tree is shown in Figure 2. We now define the algebraic effect of this operation on the equations associated with the tree. Recall that we want to define the conditional probability matrices in the raked tree so that the distribution over the remaining variables is unchanged. We achieve this by substituting the equations for λ(x) and π(x) into the equations for λ(u), π(z), and π(v). In the following it is important to note that λ(u), π(z), and π(v) are unaffected by the rake operation.

In the following, let Diag(w) denote the diagonal matrix whose diagonal entries are the components of the vector w. We derive the algebraic effect of the rake operation as follows:

λ(u) = A_i(u) λ(v) ⊗ B_i(u) λ(x)
     = A_i(u) λ(v) ⊗ B_i(u) (A_i(x) λ(e) ⊗ B_i(x) λ(z))
     = A_i(u) λ(v) ⊗ B_i(u) Diag(A_i(x) λ(e)) B_i(x) λ(z)
     = A_{i+1}(u) λ(v) ⊗ B_{i+1}(u) λ(z),

where A_{i+1}(u) = A_i(u) and B_{i+1}(u) = B_i(u) Diag(A_i(x) λ(e)) B_i(x). Of course, the case where the leaf being raked is a right child generates analogous equations.

(Footnote: Throughout we assume that ⊗ has lower precedence than matrix multiplication, which is indicated by juxtaposition.)

Thus, by defining A_{i+1}(u) and B_{i+1}(u) in this way, we ensure that all λ values in the raked tree are identical to the corresponding λ values in the original tree. This is not yet enough, because we must check that the π values are similarly preserved. The only two π values that could possibly change are π(z) and π(v), so we check them both. For the former, we must have

π(z) = D_i(z) (π(x) ⊗ C_i(z) λ(e))
     = D_{i+1}(z) (π(u) ⊗ C_{i+1}(z) λ(v)).

After substituting for π(x) and some algebraic manipulation, we see that this is assured if C_{i+1}(z) = C_i(x) and D_{i+1}(z) = D_i(z) Diag(C_i(z) λ(e)) D_i(x). However, recall that by definition C_{i+1}(z) = A_{i+1}(u) and C_i(x) = A_i(u), and so C_{i+1}(z) = C_i(x) follows. Furthermore,

D_{i+1}(z) = B_{i+1}^T(u)
           = (B_i(u) Diag(A_i(x) λ(e)) B_i(x))^T
           = B_i^T(x) Diag(A_i(x) λ(e)) B_i^T(u)
           = D_i(z) Diag(C_i(z) λ(e)) D_i(x),

as required.

For π(v), it is necessary to verify that

π(v) = D_i(v) (π(u) ⊗ C_i(v) λ(x))
     = D_{i+1}(v) (π(u) ⊗ C_{i+1}(v) λ(z)).

By substituting for λ(x), this can be shown to be true if D_{i+1}(v) = D_i(v) (= A_{i+1}^T(u) = A_i^T(u)) and C_{i+1}(v) = C_i(v) Diag(A_i(x) λ(e)) B_i(x) (= B_{i+1}(u)). But these identities follow by definition, so we are done.
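The matrix identity behind the rake can be checked numerically: absorbing x's equation into B(u) leaves λ(u) unchanged. The sketch below is our own (arbitrary random matrices stand in for the A's, B's, and λ's); it verifies the defining equation B_{i+1}(u) = B_i(u) Diag(A_i(x) λ(e)) B_i(x).

```python
import numpy as np

rng = np.random.default_rng(2)
k = 3
A_u, B_u, A_x, B_x = (rng.random((k, k)) for _ in range(4))
lam_v, lam_e, lam_z = (rng.random(k) for _ in range(3))

# lambda(u) in T_i, evaluated through the intermediate node x
lam_x = (A_x @ lam_e) * (B_x @ lam_z)
lam_u = (A_u @ lam_v) * (B_u @ lam_x)

# After Rake(e, x): z becomes u's child, and B(u) absorbs x's equation
B_u_raked = B_u @ np.diag(A_x @ lam_e) @ B_x
lam_u_raked = (A_u @ lam_v) * (B_u_raked @ lam_z)

assert np.allclose(lam_u, lam_u_raked)
```

The key step is that the componentwise product with a fixed vector is the same as multiplication by the corresponding diagonal matrix, which is what lets the two edge matrices be fused into one.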

Beginning with the given tree T = T_0, each successive tree is constructed by performing a sequence of rakes, so as to rake away about half of the remaining evidence nodes. More specifically, let Contract be the operation in which we apply the Rake operation to every other leaf of a causal tree, in left-to-right order, excluding the leftmost and the rightmost leaf. Let {T_i} be the set of causal trees constructed so that T_{i+1} is the causal tree generated from T_i by a single application of Contract. The following result is proved using an easy inductive argument.

Theorem 2: Let T_0 be a causal tree of size N. Then the number of leaves in T_{i+1} is equal to half the leaves in T_i, not counting the two extreme leaves, so that, starting with T_0, after O(log N) applications of Contract we produce a three-node tree: the root, the leftmost leaf, and the rightmost leaf.

Below are a few observations about this process:

- The complexity of Contract is linear in the size of the tree. Additionally, log N applications of Contract reduce the set of tree equations to a single equation involving the root, in O(N) total time.

- The total space to store all the sets of equations associated with {T_i}, i = 0, ..., log N, is about twice the space required to store the equations for T_0.

- With each equation in T_{i+1} we also store equations that describe the relationship between the conditional probability matrices in T_{i+1} and the matrices in T_i. Notice that even though T_{i+1} is produced from T_i by a series of rake operations, each matrix in T_{i+1} depends directly on matrices present in T_i. This would not be the case if we attempted to simultaneously rake adjacent children.

- We regard these equations as part of T_{i+1}. So, formally speaking, the {T_i} are causal trees augmented with some auxiliary equations. Each of the contracted trees describes a probability distribution, on a subset of the first set of variables, that is consistent with the original distribution.
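The halving behavior of Contract is easy to simulate. The toy function below is our own illustration: for a caterpillar-shaped causal tree it tracks only the spine, since raking an evidence leaf removes its spine parent, and checks that the round count matches the O(log N) bound of Theorem 2.

```python
import math

def contract_rounds(spine_len):
    """Rake every other evidence leaf (excluding the two extreme leaves)
    until the three-node tree remains; return the number of Contract rounds."""
    spine = list(range(spine_len))     # internal (spine) nodes of a caterpillar tree
    rounds = 0
    while len(spine) > 1:
        # raking leaf e_{2j} removes spine node x_{2j}; odd positions survive
        spine = spine[0::2]
        rounds += 1
    return rounds

assert contract_rounds(1024) == 10     # = log2(1024) rounds
assert all(contract_rounds(L) == math.ceil(math.log2(L)) for L in range(2, 200))
```

Each round keeps ceil(L/2) of the L spine nodes, so the number of rounds is exactly the number of halvings needed to reach a single spine node.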

We note that the ideas behind the Rake operation were originally developed by Miller and Reif in the context of parallel computation of bottom-up arithmetic expression trees (Kosaraju & Delcher; Karp & Ramachandran). In contrast, we are using it in the context of incremental update and query operations in sequential computing. A data structure similar to ours was independently proposed by Frederickson in the context of dynamic arithmetic expression trees, and a different approach for incremental computing on arithmetic trees was developed by Cohen and Tamassia. There are important and interesting differences between the arithmetic expression-tree case and our own. For arithmetic expressions, all computation is done bottom-up. However, in probabilistic networks, messages must be passed top-down as well. Furthermore, in arithmetic expressions, when two algebraic operations are allowed, we typically require the distributivity of one operation over the other, but the analogous property does not hold for us. In these respects our approach is a substantial generalization of the previous work, while remaining conceptually simple and practical.

2.4 An Example: A Chain

To obtain an intuition about the algorithms, we sketch how to generate and utilize the trees T_i, i = 0, ..., log N, and their equations, to perform λ-value queries and updates in O(log N) time on an N-node chain of length L. Consider the chain in Figure 3 and the trees that are generated by repeated application of Contract to the chain.

The equations that correspond to the contracted trees in the figure are as follows (ignoring trivial equations). Recall that A_i(x_j) is the matrix associated with the left edge of random variable x_j in T_i.

For T_0:

λ(x1) = A_0(x1) λ(e1) ⊗ B_0(x1) λ(x2)
λ(x2) = A_0(x2) λ(e2) ⊗ B_0(x2) λ(x3)
λ(x3) = A_0(x3) λ(e3) ⊗ B_0(x3) λ(x4)
λ(x4) = A_0(x4) λ(e4) ⊗ B_0(x4) λ(e5)

For T_1:

λ(x1) = A_1(x1) λ(e1) ⊗ B_1(x1) λ(x3)
λ(x3) = A_1(x3) λ(e3) ⊗ B_1(x3) λ(e5)

where

B_1(x1) = B_0(x1) Diag(A_0(x2) λ(e2)) B_0(x2)
B_1(x3) = B_0(x3) Diag(A_0(x4) λ(e4)) B_0(x4)

[Figure 3 shows the chain and its contractions: T_0 has spine nodes x1, x2, x3, x4 ending in the leaf e5, with evidence leaves e1, e2, e3, e4 hanging off the spine; T_1 retains x1 and x3 (with leaves e1, e3, e5); T_2 retains only x1 (with leaves e1 and e5).]

Figure 3: A simple chain example.

For T_2:

λ(x1) = A_2(x1) λ(e1) ⊗ B_2(x1) λ(e5)

where

B_2(x1) = B_1(x1) Diag(A_1(x3) λ(e3)) B_1(x3).

We have not listed the A matrices, because in this example they are constant across levels. Now consider a query operation on λ(x2). Rather than performing the standard computation, we will find the level at which x2 was raked. Since x2 was raked in forming T_1, its equation is the one at level 0:

λ(x2) = A_0(x2) λ(e2) ⊗ B_0(x2) λ(x3).

Thus we must compute λ(x3), and to do this we find where x3 is raked. That happened in forming T_2. However, at level 1 the equation associated with x3 is

λ(x3) = A_1(x3) λ(e3) ⊗ B_1(x3) λ(e5).

That means that we need not follow down the chain. In general, for a chain of N nodes, we can answer any λ query to a node on the chain by evaluating log N equations instead of N equations.

Now consider an update for e2. Since e2 was raked immediately, we first modify the equation

B_1(x1) = B_0(x1) Diag(A_0(x2) λ(e2)) B_0(x2)

on the first level where λ(e2) occurs on the right-hand side. Since B_1(x1) is affected by the change to e2, we subsequently modify the equation

B_2(x1) = B_1(x1) Diag(A_1(x3) λ(e3)) B_1(x3)

on the second level. In general, we clearly need to update at most log N equations, i.e., one per level. We now generalize this example and describe general algorithms for queries and updates in causal trees.
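The whole chain example can be run numerically. The sketch below is our own code (the names A0, B1, etc. mirror the A_i, B_i matrices, but the structure is our illustrative assumption): it caches the contracted B matrices, answers the λ(x1) query from T_2's single equation, and propagates an update of e2 through just the two cached matrices that mention it, checking both against direct O(N) evaluation.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 2
def mat(): return rng.random((k, k))

# The chain of Figure 3: spine x1 -> x2 -> x3 -> x4 -> e5, leaf e_j at each x_j.
A0 = {j: mat() for j in (1, 2, 3, 4)}      # A_0(x_j): edge to evidence leaf e_j
B0 = {j: mat() for j in (1, 2, 3, 4)}      # B_0(x_j): edge down the spine
lam_e = {j: rng.random(k) for j in (1, 2, 3, 4, 5)}

def lam_x1_direct():
    """Standard O(N) bottom-up evaluation of lambda(x1) in T_0."""
    lam = lam_e[5]
    for j in (4, 3, 2, 1):
        lam = (A0[j] @ lam_e[j]) * (B0[j] @ lam)
    return lam

# Contract twice: T_1 rakes e2 and e4, T_2 rakes e3.
B1 = {1: B0[1] @ np.diag(A0[2] @ lam_e[2]) @ B0[2],
      3: B0[3] @ np.diag(A0[4] @ lam_e[4]) @ B0[4]}
B2 = {1: B1[1] @ np.diag(A0[3] @ lam_e[3]) @ B1[3]}

# Query lambda(x1) from the single equation of T_2 -- log N work, not N.
lam_x1 = (A0[1] @ lam_e[1]) * (B2[1] @ lam_e[5])
assert np.allclose(lam_x1, lam_x1_direct())

# Update e2: only B_1(x1), and then B_2(x1), mention it -- one matrix per level.
lam_e[2] = rng.random(k)
B1[1] = B0[1] @ np.diag(A0[2] @ lam_e[2]) @ B0[2]
B2[1] = B1[1] @ np.diag(A0[3] @ lam_e[3]) @ B1[3]
lam_x1 = (A0[1] @ lam_e[1]) * (B2[1] @ lam_e[5])
assert np.allclose(lam_x1, lam_x1_direct())
```

On a chain of length L the same pattern touches one cached matrix per contraction level, which is the O(k³ log N) update cost discussed below.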

2.5 Performing Queries and Updates Efficiently

In this section we show how to utilize the contracted trees T_i, i = 0, ..., log N, to perform queries and updates in O(log N) time in general causal trees. We shall show that a logarithmic amount of work is necessary and sufficient to compute enough information in our data structure to update and query any λ or π value.

λ Queries

To compute λ(x) for some node x, we can do the following. We first locate ind(x), which is defined to be the highest level i such that x appears in T_i. The equation for λ(x) is of the form

λ(x) = A_i(x) λ(y) ⊗ B_i(x) λ(z),

where y and z are the left and right children, respectively, of x in T_i.

Since x does not appear in T_{i+1}, it was raked at this level of equations, which implies that one child (we assume z) is a leaf. We therefore only need to compute λ(y), which can be done recursively. (If instead y was the raked leaf, we would compute λ(z) recursively.) In either case, O(1) operations are done in addition to one recursive call, which is to a λ value at a higher level of equations. Since there are O(log N) levels, and the only operations are matrix-by-vector multiplications, the procedure takes O(k² log N) time. The function Query-λ(x) is given in Figure 4.

Updates

We now describ e how the up date op erations can mo dify enough information in the data

structure to allow us to query the vectors and vectors eciently Most imp ortantly the

reader should note that the up date op eration do es not try to maintain the correct and

values It is sucient to ensure that for all i and x the matrices A x and B x and

i i

thus also C x and D x are always up to date

i i

When we update the value of an evidence node, we are simply changing the value of some leaf e. At each level of equations, the value of e can appear at most twice: once in the equation of e's parent and once in the equation of e's sibling in T_i. When e disappears, say at level i, its value is incorporated into one of the constant matrices A_{i+1}(u) or B_{i+1}(u), where u is the grandparent of e in T_i. This constant matrix in turn affects exactly one constant matrix at the next higher level, and so on. Since the effect at each level can be computed in O(k³) time (by matrix multiplication), and there are O(log N) levels of equations, the update can be accomplished in O(k³ log N) time. The constant k³ is actually pessimistic, because faster matrix multiplication algorithms exist.

The update procedure is given in the figure below. Update is initially called as Update(E, e, i), where E is a leaf, i is the level at which it was raked, and e is the new evidence. This operation will start a sequence of O(log N) calls to the function Update(Term, Value, i), as the change propagates through O(log N) equations.
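The cascade just described, in which one constant matrix per level feeds exactly one constant matrix at the next level, can be sketched as follows. This is a heavily simplified toy, not the paper's data structure: the `Level` class, the single-matrix-per-level chain, and all numerical values are our own assumptions. It only illustrates that absorbing new evidence requires recomputing just the matrices at the affected level and above, rather than rebuilding everything.

```python
import numpy as np

k = 2                                     # domain size of each variable
rng = np.random.default_rng(0)

class Level:
    """One contraction level: a fixed local matrix, the evidence vector of
    the leaf raked at this level, and the derived constant matrix."""
    def __init__(self, base, evidence):
        self.base = base
        self.evidence = evidence
        self.const = None

def recompute(levels):
    """Preprocessing: build every derived constant matrix bottom-up."""
    below = np.eye(k)
    for lv in levels:
        lv.const = lv.base @ np.diag(lv.evidence) @ below
        below = lv.const

def absorb(levels, i, new_evidence):
    """Absorb new evidence at level i: recompute only the constant matrices
    at level i and above -- O(log N) matrix products, not a full rebuild."""
    levels[i].evidence = new_evidence
    below = levels[i - 1].const if i > 0 else np.eye(k)
    for lv in levels[i:]:
        lv.const = lv.base @ np.diag(lv.evidence) @ below
        below = lv.const

levels = [Level(rng.random((k, k)), rng.random(k)) for _ in range(5)]
recompute(levels)
absorb(levels, 2, np.array([0.3, 0.7]))   # new evidence at level 2
```

After `absorb`, the derived matrices agree with what a full recomputation would produce, which is the invariant the update procedure maintains.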


FUNCTION Query(x)

Look up the equation associated with x in T_{ind(x)}.

Case 1: x is a leaf. Then the equation is of the form λ(x) = e, where e is known. In this case we return e.

Case 2: The equation associated with x is of the form

λ(x) = A_i(x)·λ(y) + B_i(x)·λ(z),

where z is a leaf, and therefore λ(z) is known. In this case we return

A_i(x)·Query(y) + B_i(x)·λ(z).

The case where y is the leaf is analogous.

Figure: Function to compute the λ value of a node.
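As a concrete illustration of the Query function above, the sketch below hard-codes a three-level table of equations. The dictionary layout and all numerical matrices are invented for the example; only the control flow, one matrix-by-vector product plus a single recursive call per level, follows the function in the figure.

```python
import numpy as np

eqs = {
    # node -> ("leaf", evidence_vector)
    #      or ("eq", A, recursive_child, B, known_leaf_lambda)
    "e1": ("leaf", np.array([1.0, 0.0])),
    "x2": ("eq", np.array([[0.9, 0.1],
                           [0.2, 0.8]]), "e1",
                 np.array([[0.5, 0.5],
                           [0.3, 0.7]]), np.array([0.0, 1.0])),
    "x1": ("eq", np.array([[0.6, 0.4],
                           [0.1, 0.9]]), "x2",
                 np.array([[0.7, 0.3],
                           [0.2, 0.8]]), np.array([1.0, 0.0])),
}

def query(x):
    """Return lambda(x): one matrix-by-vector product per level, plus a
    single recursive call to an equation at a higher level."""
    rec = eqs[x]
    if rec[0] == "leaf":
        return rec[1]                    # Case 1: evidence is known
    _, A, y, B, z_val = rec              # Case 2: z's lambda already known
    return A @ query(y) + B @ z_val

lam_x1 = query("x1")
```

With O(log N) levels, the recursion performs O(log N) matrix-by-vector products in total, matching the bound stated in the text.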

Queries

It is relatively easy to use a similar recursive procedure to perform π(x) queries. Unfortunately, this approach yields an O(log² N)-time algorithm if we simply use recursion to calculate π terms and calculate λ terms using our earlier procedure. This is because there will be O(log N) recursive calls to calculate π values, but each is defined by an equation that also involves a λ term taking O(log N) time to compute.

To achieve O(log N) time, we shall instead implement π(x) queries by defining a procedure Calc(x, i) which returns a triple of vectors ⟨P, L, R⟩ such that P = π(x), L = λ(y), and R = λ(z), where y and z are the left and right children, respectively, of x in T_i.

To compute π(x) for some node x, we can do the following. Let i = ind(x). The equation for x in T_i is of the form

π(x) = D_i(x)·π(u) + C_i(x)·λ(v),

where u is the parent of x in T_i and v is its sibling. We then call procedure Calc(u, i), which will return the triple ⟨π(u), λ(v), λ(x)⟩, from which we can immediately compute π(x) using the above equation.

Procedure Calc(x, i) can be implemented in the following fashion.

Case 1: If T_i is a 3-node tree with x as its root, then both children of x are leaves; hence their λ values are known, and π(x) is the given vector of prior probabilities for x.

Case 2: If x does not appear in T_{i+1}, then one of x's children is a leaf, say e, which is raked at level i. Let z be the other child. We call Calc(u, i+1), where u is the parent of x in T_i, and receive back ⟨π(u), λ(z), λ(v)⟩ or ⟨π(u), λ(v), λ(z)⟩, according to whether x was a left or right child of u in T_i; here v is u's other child. We can now compute π(x) from π(u) and λ(v), and we have λ(e) and λ(z), so we can return the necessary triple. Specifically,

π(x) = D_i(x)·π(u) + A_i(u)·λ(v)    or    π(x) = D_i(x)·π(u) + B_i(u)·λ(v),

where the choice depends on whether x is the right or left child, respectively, of u in T_i.

FUNCTION Update(Term, Value, i)

Find the (at most one) equation in T_{i+1} defining some A_{i+1} or B_{i+1} in which Term appears on the right-hand side; let Term′ be the matrix defined by this equation (i.e., its left-hand side).

Set Term to Value; let Value′ be the resulting new value of Term′.

Call Update(Term′, Value′, i+1) recursively.

Figure: The update procedure.

Case 3: If x does appear in T_{i+1}, then we call Calc(x, i+1). This returns the correct value of π(x). For any child z of x in T_i that remains a child of x in T_{i+1}, it also returns the correct value of λ(z). If z is a child of x that does not occur in T_{i+1}, then it must be the case that z was raked at level i, so that one of z's children, say e, is a leaf; let the other child be q. In this situation Calc(x, i+1) has returned the value of λ(q), and we can compute

λ(z) = A_i(z)·λ(e) + B_i(z)·λ(q)

and return this value.

In all three cases there is a constant amount of work done in addition to a single recursive call that uses equations at a higher level. Since there are O(log N) levels of equations, each requiring only matrix-by-vector multiplication, the total work done is O(k² log N).

Extended Example

In this section we illustrate the application of our algorithms on a specific example. Consider the sequence of contracted trees shown in the figure below. Corresponding to these trees we have

Figure: Example of tree contraction. [Tree diagrams omitted: the figure shows the sequence of contracted trees T_0 through T_3 over internal nodes x_1, …, x_9 and evidence leaves e_1, …, e_9.]

such equations as the following.

For T_0:

λ(x_1) = A_0(x_1)·λ(x_2) + B_0(x_1)·λ(x_3),    π(x_2) = D_0(x_2)·π(x_1) + C_0(x_2)·λ(x_3)

For T_1:

λ(x_1) = A_1(x_1)·λ(x_2) + B_1(x_1)·λ(e_9),    π(x_2) = D_1(x_2)·π(x_1) + C_1(x_2)·λ(e_9)

For T_2:

λ(x_1) = A_2(x_1)·λ(x_4) + B_2(x_1)·λ(e_9),    π(x_4) = D_2(x_4)·π(x_1) + C_2(x_4)·λ(e_9)

For T_3:

λ(x_1) = A_3(x_1)·λ(e_1) + B_3(x_1)·λ(e_9)

Now consider, for instance, the effect of an update for e_2. Since it is raked immediately, the new value of e_2 is incorporated in

B_1(x_6) = B_0(x_6) · Diag_{A_0(x_8)·λ(e_2)} · B_0(x_8).

From subsequent Rake operations we know that A_2(x_4) depends on B_1(x_6), and A_3(x_1) depends on A_2(x_4), so we must also update these values, as follows:

A_2(x_4) = A_1(x_4) · Diag_{B_1(x_6)·λ(e_3)} · A_1(x_6)

A_3(x_1) = A_2(x_1) · Diag_{B_2(x_4)·λ(e_5)} · A_2(x_4)

Finally, consider a query for x_8. Since x_8 is raked together with e_2 in forming T_1, we follow the steps outlined above and generate the following calls: Calc(x_6, 0), Calc(x_6, 1), Calc(x_4, 2), and Calc(x_1, 3). This provides us with π(x_6), and hence π(x_8). In this case π(x_1) is particularly easy to compute, since both of x_1's children are leaf nodes. Then we simply compute π(x_8)·λ(x_8) componentwise and normalize, giving us the conditional marginal distribution Bel(x_8), as required.
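The final step above, multiplying π and λ componentwise and normalizing, is Pearl's standard belief computation. A minimal sketch, with made-up vectors standing in for π(x_8) and λ(x_8):

```python
import numpy as np

def belief(pi, lam):
    """Normalized componentwise product of pi(x) and lambda(x)."""
    b = pi * lam
    return b / b.sum()

pi_x8 = np.array([0.5, 0.3, 0.2])    # made-up pi vector
lam_x8 = np.array([0.4, 0.1, 0.1])   # made-up lambda vector
bel_x8 = belief(pi_x8, lam_x8)       # conditional marginal Bel(x_8)
```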

Join Trees

Perhaps the best-known technique for computing with arbitrary (i.e., not singly connected) Bayesian networks uses the idea of join trees (junction trees) (Lauritzen & Spiegelhalter). In many ways a join tree can be thought of as a causal tree, albeit one with somewhat special structure; thus the algorithm of the previous section can be applied. However, the structure of a join tree permits some optimization, which we describe in this section. This becomes especially relevant in the next section, where we use the join-tree technique to show how O(log N) updates and queries can be done for arbitrary polytrees. Our review of join trees and their utility is extremely brief and quite incomplete; for clear expositions see, for instance, Spiegelhalter et al. and Pearl.

Given any Bayesian network, the first step towards constructing a join tree is to moralize the network: insert edges between every pair of parents of a common node, and then treat all edges in the graph as being undirected (Spiegelhalter et al.). The resulting undirected graph is called the moral graph. We are interested in undirected graphs that are chordal: every cycle of length 4 or more should contain a chord, i.e., an edge between two nodes that are nonadjacent in the cycle. If the moral graph is not chordal, it is necessary to add edges to make it so; various techniques for this triangulation stage are known (see, for instance, Spiegelhalter et al.).

If p is a probability distribution represented in a Bayesian network G = (V, E), and M = (V, F) is the result of moralizing and then triangulating G, then:

1. M has at most |V| cliques, say C_1, …, C_{|V|}.

2. The cliques can be ordered so that for each i there is some j(i) < i such that

C_i ∩ (C_1 ∪ ⋯ ∪ C_{i−1}) ⊆ C_{j(i)}.

3. The tree T formed by treating the cliques as nodes and connecting each node C_i to its parent C_{j(i)} is called a join tree.

4. p = ∏_i p(C_i | C_{j(i)})

5. p(C_i | C_{j(i)}) = p(C_i | C_{j(i)} ∩ C_i)

From (4) and (5) we see that if we direct the edges in T away from the parent cliques, the resulting directed tree is in fact a Bayesian causal tree that can represent the original distribution p. This is true no matter what the form of the original graph. Of course, the price is that the cliques may be large, and so the domain size (the number of possible values) of a clique node can be exponential. This is why this technique is not guaranteed to be efficient.

We can use the Rake technique of the preceding sections on the directed join tree without any modification. However, property (5) above shows that the conditional probability matrices in the join tree have a special structure, which we can use to gain some efficiency. In the following, let k be the domain size of the variables in G, as usual. Let n be the maximum size of the cliques in the join tree; without loss of generality we can assume that all cliques are of the same size, because we can add dummy variables. Thus the domain size of each clique is K = k^n. Finally, let c be the maximum intersection size of a clique and its parent, i.e., max_i |C_i ∩ C_{j(i)}|, and let L = k^c.

In the standard algorithm we would represent p(C_i | C_{j(i)}) as a K × K matrix M_{C_i | C_{j(i)}}. However, p(C_i | C_{j(i)} ∩ C_i) can be represented as a smaller L × K matrix M_{C_i | C_{j(i)} ∩ C_i}. By property (5) above, M_{C_i | C_{j(i)}} is identical to M_{C_i | C_{j(i)} ∩ C_i} except that many rows are repeated. Thus there is a K × L matrix J such that

M_{C_i | C_{j(i)}} = J · M_{C_i | C_{j(i)} ∩ C_i}.

(J is actually a simple matrix whose entries are 0 and 1, with exactly one 1 per row; however, we do not use this fact.)

(Footnote: A clique is a maximal, completely connected subgraph.)
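The row-repetition structure described above is easy to check numerically. The sketch below (with made-up probabilities and an arbitrary row assignment, for k = 2, n = 2, c = 1) confirms that the full K × K matrix is J times the smaller L × K matrix, and that a matrix-by-vector product can be computed through the factors.

```python
import numpy as np

k, n, c = 2, 2, 1
K, L = k**n, k**c                         # K = 4, L = 2

# L x K matrix of distinct rows (each row a conditional distribution).
M_small = np.array([[0.1, 0.2, 0.3, 0.4],
                    [0.25, 0.25, 0.25, 0.25]])

rows = [0, 0, 1, 1]                       # which distinct row each full row repeats
J = np.zeros((K, L))
J[np.arange(K), rows] = 1.0               # 0/1 matrix, exactly one 1 per row

M = J @ M_small                           # full K x K matrix, with repeated rows

v = np.array([1.0, 2.0, 3.0, 4.0])
direct = M @ v                            # O(K^2) work on the full matrix
factored = J @ (M_small @ v)              # O(KL) work through the factors
```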


Our claim is that in the case of join trees, the following is true. First, the matrices A_i and B_i used in the Rake algorithm can be stored in factored form as the product of two matrices of dimension K × L and L × K, respectively; so, for instance, we factor A_i as A_i^l · A_i^r. We never need to explicitly compute or store the full matrices. As we have just seen, this claim is true when i = 0, because the M matrices factor this way. The proof for i > 0 uses an inductive argument, which we illustrate below. The second claim is that when the matrices are stored in factored form, all the matrix multiplications used in the Rake algorithm are of one of the following types: an L × K matrix times a K × L matrix; an L × K matrix times a K × K diagonal matrix; an L × L matrix times an L × K matrix; or an L × K matrix times a vector.

To prove these claims, consider for instance the equation defining B_{i+1} in terms of lower-level matrices. From the preceding section,

B_{i+1}(u) = B_i(u) · Diag_{A_i(x)·e} · B_i(x).

But by assumption this is

B_{i+1}(u) = B_i^l(u) · B_i^r(u) · Diag_{A_i^l(x)·A_i^r(x)·e} · B_i^l(x) · B_i^r(x),

which, using associativity, is clearly equivalent to

B_{i+1}(u) = B_i^l(u) · [ B_i^r(u) · Diag_{A_i^l(x)·A_i^r(x)·e} · B_i^l(x) · B_i^r(x) ].

However, every multiplication in this expression is one of the forms stated earlier. Identifying B_{i+1}^r(u) as the bracketed part of the expression, and B_{i+1}^l(u) as B_i^l(u), proves this case; of course, the case where we rake a left child, so that A_{i+1}(u) is updated, is analogous.

Thus, even using the most straightforward technique for matrix multiplication, the cost of updating B_{i+1} is O(KL²) = O(k^{n+2c}). This contrasts with O(K³) if we do not factor the matrices, and may represent a worthwhile speedup if c is small. Note that the overall time for an update using this scheme is O(k^{n+2c} log N). Queries, which involve only matrix-by-vector multiplication, require O(k^{n+c} log N) time.
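The regrouping argument above can be sanity-checked numerically: with random matrices of the stated shapes, keeping the left factor unchanged and folding everything else into a new L × K right factor reproduces the unfactored product exactly. The values below are arbitrary; only the shapes follow the text.

```python
import numpy as np

rng = np.random.default_rng(1)
K, L = 4, 2
Bl_u, Br_u = rng.random((K, L)), rng.random((L, K))   # B_i(u) = Bl_u @ Br_u
Bl_x, Br_x = rng.random((K, L)), rng.random((L, K))   # B_i(x) = Bl_x @ Br_x
D = np.diag(rng.random(K))                            # Diag(A_i(x) e), K x K diagonal

# Unfactored: B_{i+1}(u) = B_i(u) @ D @ B_i(x), built from full K x K matrices.
unfactored = (Bl_u @ Br_u) @ D @ (Bl_x @ Br_x)

# Factored: left factor unchanged; the bracketed part becomes the new right
# factor, using only the small multiplication types listed in the text:
# (L x K)(K x K diag), then (L x K)(K x L), then (L x L)(L x K).
new_Bl = Bl_u                                         # K x L, unchanged
new_Br = ((Br_u @ D) @ Bl_x) @ Br_x                   # L x K
factored = new_Bl @ new_Br
```

Because only L-sized intermediate products are formed, the dominant cost is O(KL²) rather than the O(K³) of multiplying the full matrices.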

For many join trees, the difference between N and log N is unimportant, because the clique domain size K is often enormous and dominates the complexity. Indeed, K and L may be so large that we cannot represent the required matrices explicitly; of course, in such cases our technique has little to offer. But there will be other cases in which the benefits will be worthwhile. The most important general class in which this is so, and our immediate reason for presenting the technique for join trees, is the case of polytrees.

Polytrees

A polytree is a singly connected Bayesian network: we drop the assumption of the earlier sections that each node has at most one parent. Polytrees offer much more flexibility than causal trees, and yet there is a well-known process that can update and query in O(N) time, just as for causal trees. For this reason, polytrees are an extremely popular class of networks.

We suspect that it is possible to obtain an O(log N) algorithm for updates and queries in polytrees as a direct extension of the earlier ideas. Instead, we propose a different technique, which involves converting a polytree to its join tree and then using the ideas of the preceding section. The basis for this is the simple observation that the moral graph of a polytree is already chordal. Thus, as we show in detail below, little is lost by considering the join tree instead of the original polytree. The specific property of polytrees that we require is the following; we omit the proof of this well-known proposition.


Proposition: If T is the moral graph of a polytree P = (V, E), then T is chordal, and the set of maximal cliques in T is {{v} ∪ parents(v) : v ∈ V}.

Let p be the maximum number of parents of any node. From the proposition, every maximal clique in the join tree has at most p + 1 variables, and so the domain size of a node in the join tree is K = k^{p+1}. This may be large, but recall that the conditional probability matrix in the original polytree for a variable with p parents has K entries anyway, since we must give the conditional distribution for every combination of the node's parents. Thus K is really a measure of the size of the polytree itself.

It now follows from the proposition above that we can perform queries and updates in polytrees in time O(K³ log N), simply by using the algorithm of the earlier sections on the directed join tree. But, as noted in the previous section, we can do better. Recall that the savings depend on c, the maximum size of the intersection between any node and its parent in the join tree. However, when the join tree is formed from a polytree, no two cliques can share more than a single node. This follows immediately from the Proposition: if two cliques had more than one node in common, then there would have to be either two nodes that share more than one parent, or else a node and one of its parents that both share yet another parent. Neither of these is consistent with the network being a polytree. Thus, in the complexity bounds of the previous section we can put c = 1. It follows that we can process updates in O(K·k² log N) = O(k^{p+3} log N) time, and queries in O(k^{p+2} log N) time.
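The clique structure underlying the c = 1 bound can be checked on a concrete example. The polytree below is our own small illustration: its moral-graph cliques are the sets {v} ∪ parents(v) from the proposition (dropping singleton cliques, which are contained in larger ones and hence not maximal), and no two of them share more than one node.

```python
parents = {                # child -> parents; the DAG below is a small polytree
    "a": [], "b": [],
    "c": ["a", "b"],       # moralization marries parents a and b
    "d": ["c"],
    "e": ["c", "f"],
    "f": [],
}

# Maximal cliques of the moral graph, per the proposition: {v} U parents(v).
cliques = [frozenset([v, *ps]) for v, ps in parents.items() if ps]

# No two cliques share more than one node, so c = 1 in the complexity bounds.
max_overlap = max(len(p & q) for i, p in enumerate(cliques)
                             for q in cliques[i + 1:])
```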

Application: Towards Automated Site-Specific Mutagenesis

An experiment commonly performed in biology laboratories is a procedure in which a particular site in a protein is changed (i.e., a single amino acid is mutated) and the protein is then tested to see whether it settles into a different conformation. In many cases, with overwhelming probability, the protein does not change its secondary structure outside the mutated region. This process is often called mutagenesis. Delcher et al. developed a probabilistic model of a protein structure, which is basically a long chain; the length of the chain varies from protein to protein. The nodes in the network are either protein-structure nodes (PS nodes) or evidence nodes (E nodes). Each PS node in the network is a discrete random variable X_i that assumes values corresponding to descriptors of secondary structure: helix, sheet, or coil. With each PS node, the model associates an evidence node that corresponds to an occurrence of a particular subsequence of amino acids at a particular location in the protein.

In our model, protein-structure nodes are finite strings over the alphabet {h, e, c}. For example, the string hhhhhh is a string of six residues in an α-helical conformation, while eecc is a string of two residues in a β-sheet conformation followed by two residues folded as a coil. Evidence nodes are nodes that contain information about a particular region of the protein. Thus, the main idea is to represent physical and statistical rules in the form of a probabilistic network.

In our first set of experiments, we converged on the following model which, while clearly biologically naive, seems to match many existing approaches, such as neural networks, in prediction accuracy. The network looks like a set of PS nodes connected as a chain; to each such node we connect a single evidence node. In our experiments, the PS nodes are strings of length two or three over the alphabet {h, e, c}, and the evidence nodes are strings of the


Figure: Example of a causal-tree model using pairs, showing protein segment GSAT (evidence nodes GS, SA, AT) with corresponding secondary structure cchh (PS nodes cc, ch, hh).

same length over the set of amino acids. The following example clarifies our representation. Assume we have a string of amino acids GSAT. We model the string as a network comprised of three evidence nodes (GS, SA, AT) and three PS nodes; the network is shown in the figure above. A correct prediction will assign the values cc, ch, and hh to the PS nodes, as shown in the figure.
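The mapping from a residue string to the chain just described can be sketched in a few lines; the helper name and PS-node labels below are our own, for illustration only.

```python
def build_chain(sequence, width=2):
    """Return PS-node labels and the overlapping residue substrings that the
    evidence nodes observe, for one protein sequence."""
    evidence = [sequence[i:i + width] for i in range(len(sequence) - width + 1)]
    ps_nodes = [f"X{i}" for i in range(len(evidence))]
    return ps_nodes, evidence

ps_nodes, evidence = build_chain("GSAT")   # evidence nodes GS, SA, AT
```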

Now that we have a probabilistic model, we can test the robustness of the protein, i.e., whether small changes in the protein affect the structure of certain critical sites in it. In our experiments, the probabilistic network performs a simulated evolution of the protein: the simulator repeatedly mutates a region in the chain and then tests whether some designated sites in the protein that are coiled into a helix are predicted to remain in this conformation. The main goal of the experiment was to test whether stable bonds far away from the mutated location were affected. Our previous results (Delcher et al.) support the current thesis in the biology community, namely that local changes rarely affect structure at distant sites.

The algorithms we presented in the previous sections of the paper are perfectly suited for this type of application, and are predicted to yield a substantial improvement in efficiency over the current brute-force implementation presented by Delcher et al., where each change is propagated throughout the network.

Summary

This paper has proposed several new algorithms that yield a substantial improvement in the performance of probabilistic networks in the form of causal trees. Our updating procedures absorb sufficient information into the tree that our query procedure can compute the correct probability distribution of any node given the current evidence. In addition, all procedures execute in time O(log N), where N is the size of the network. Our algorithms are expected to generate orders-of-magnitude speedups for causal trees that contain long paths (not necessarily chains) and for which the matrices of conditional probabilities are relatively small. We are currently experimenting with our approach on singly connected networks (polytrees). It is likely to be more difficult to generalize the techniques to general networks: since it is known that the general problem of inference in probabilistic networks is NP-hard (Cooper), it obviously is not possible to obtain polynomial-time incremental


solutions of the type discussed in this paper for general probabilistic networks. The other natural open question is extending the approach developed in this paper to other dynamic operations on probabilistic networks, such as addition and deletion of nodes, and modification of the matrices of conditional probabilities as a result of learning.

It would also be interesting to investigate practical logarithmic-time parallel algorithms for probabilistic networks on realistic parallel models of computation. One of the main goals of massively parallel AI research is to produce networks that perform real-time inference over large knowledge bases very efficiently (i.e., in time proportional to the depth of the network rather than its size) by exploiting massive parallelism. Jerry Feldman pioneered this philosophy in the context of neural architectures; see Stanfill and Waltz, Shastri, and Feldman and Ballard. To achieve this type of performance in the neural-network framework, we typically postulate parallel hardware that associates a processor with each node in a network, and typically ignore communication requirements. With careful mapping to parallel architectures, one can indeed achieve efficient parallel execution of specific classes of inference operations; see Mani and Shastri, Kasif, and Kasif and Delcher. The techniques outlined in this paper present an alternative architecture that supports very fast (sublinear-time) response on sequential machines, based on preprocessing. However, our approach is obviously limited to applications where the number of updates and queries at any time is constant. One would naturally hope to develop parallel computers that support real-time probabilistic reasoning for general networks.

Acknowledgements

Simon Kasif's research at Johns Hopkins University was sponsored in part by the National Science Foundation under Grants No. IRI, IRI, and IRI.

References

Berger, T., & Ye, Z. Entropic aspects of random fields on trees. IEEE Transactions on Information Theory.

Chelberg, D. M. Uncertainty in interpretation of range imagery. In Proc. International Conference on Computer Vision.

Cohen, R. F., & Tamassia, R. Dynamic trees and their applications. In Proceedings of the 2nd ACM-SIAM Symposium on Discrete Algorithms.

Cooper, G. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence.

Delcher, A., & Kasif, S. Improved decision making in game trees: Recovering from pathology. In Proceedings of the National Conference on Artificial Intelligence.

Delcher, A. L., Kasif, S., Goldberg, H. R., & Hsu, B. Probabilistic prediction of protein secondary structure using causal networks. In Proceedings of the International Conference on Intelligent Systems for Computational Biology.

Duda, R., & Hart, P. Pattern Classification and Scene Analysis. Wiley, New York.

Feldman, J. A., & Ballard, D. Connectionist models and their properties. Cognitive Science.

Frederickson, G. N. A data structure for dynamically maintaining rooted trees. In Proc. Annual ACM-SIAM Symposium on Discrete Algorithms.

Hel-Or, Y., & Werman, M. Absolute orientation from uncertain data: A unified approach. In Proc. International Conference on Computer Vision and Pattern Recognition.

Karp, R. M., & Ramachandran, V. Parallel algorithms for shared-memory machines. In Van Leeuwen, J. (Ed.), Handbook of Theoretical Computer Science. North-Holland.

Kasif, S. On the parallel complexity of discrete relaxation in constraint networks. Artificial Intelligence.

Kasif, S., & Delcher, A. Analysis of local consistency in parallel constraint networks. Artificial Intelligence.

Kosaraju, S. R., & Delcher, A. L. Optimal parallel evaluation of tree-structured computations by raking. In Reif, J. H. (Ed.), VLSI Algorithms and Architectures: Proceedings of the Aegean Workshop on Computing. Springer-Verlag LNCS.

Lauritzen, S., & Spiegelhalter, D. Local computations with probabilities on graphical structures and their applications to expert systems. J. Royal Statistical Soc., Ser. B.

Mani, D., & Shastri, L. Massively parallel reasoning with very large knowledge bases. Tech. rep., International Computer Science Institute.

Miller, G. L., & Reif, J. Parallel tree contraction and its application. In Proceedings of the IEEE Symposium on Foundations of Computer Science.

Pearl, J. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.

Peot, M. A., & Shachter, R. D. Fusion and propagation with multiple observations in belief networks. Artificial Intelligence.

Rachlin, J., Kasif, S., Salzberg, S., & Aha, D. Towards a better understanding of memory-based and Bayesian classifiers. In Proceedings of the Eleventh International Conference on Machine Learning, New Brunswick, NJ.

Shastri, L. A computational model of tractable reasoning: Taking inspiration from cognition. In Proceedings of the International Joint Conference on Artificial Intelligence.

Spiegelhalter, D., Dawid, A., Lauritzen, S., & Cowell, R. Bayesian analysis in expert systems. Statistical Science.

Stanfill, C., & Waltz, D. Toward memory-based reasoning. Communications of the ACM.

Willsky, A. Multiscale representation of Markov random fields. IEEE Transactions on Signal Processing.