Journal of Artificial Intelligence Research 28 (2007) 1-48 Submitted 4/06; published 1/07
Cutset Sampling for Bayesian Networks
Bozhena Bidyuk [email protected]
Rina Dechter [email protected]
School of Information and Computer Science
University of California Irvine
Irvine, CA 92697-3425
Abstract

The paper presents a new sampling methodology for Bayesian networks that samples only a subset of variables and applies exact inference to the rest. Cutset sampling is a network structure-exploiting application of the Rao-Blackwellisation principle to sampling in Bayesian networks. It improves convergence by exploiting memory-based inference algorithms. It can also be viewed as an anytime approximation of the exact cutset-conditioning algorithm developed by Pearl. Cutset sampling can be implemented efficiently when the sampled variables constitute a loop-cutset of the Bayesian network and, more generally, when the induced width of the network's graph, conditioned on the observed sampled variables, is bounded by a constant w. We demonstrate empirically the benefit of this scheme on a range of benchmarks.
1. Introduction

Sampling is a common method for approximate inference in Bayesian networks. When exact algorithms are impractical due to prohibitive time and memory demands, it is often the only feasible approach that offers performance guarantees. Given a Bayesian network over the variables $X = \{X_1, \ldots, X_n\}$, evidence $e$, and a set of samples $\{x^{(t)}\}$ drawn from $P(X|e)$, an estimate $\hat{f}(X)$ of the expected value of a function $f(X)$ can be obtained from the generated samples via the ergodic average:

$$E[f(X)\,|\,e] \;\approx\; \hat{f}(X) = \frac{1}{T}\sum_{t} f(x^{(t)}) , \qquad (1)$$

where $T$ is the number of samples. $\hat{f}(X)$ can be shown to converge to the exact value as $T$ increases. The central query of interest over Bayesian networks is computing the posterior marginals $P(x_i|e)$ for each value $x_i$ of variable $X_i$, also called belief updating. For this query, $f(X)$ equals a $\delta$-function, and the above equation reduces to counting the fraction of occurrences of $X_i = x_i$ in the samples,
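The ergodic average of Eq. (1) can be illustrated with a minimal sketch. The distribution below is a hypothetical stand-in for $P(X|e)$ (not from the paper), chosen only so that the true expectation is easy to check by hand:

```python
import random

random.seed(0)

def ergodic_average(samples, f):
    """Eq. (1): return (1/T) * sum_t f(x^(t)) over T samples."""
    return sum(f(x) for x in samples) / len(samples)

# Hypothetical discrete stand-in for P(X|e):
# X takes values {0, 1, 2} with probabilities (0.2, 0.5, 0.3),
# so the true expectation is E[X] = 0.5 + 0.6 = 1.1.
samples = random.choices([0, 1, 2], weights=[0.2, 0.5, 0.3], k=100_000)

estimate = ergodic_average(samples, f=lambda x: x)  # converges to 1.1 as T grows
```

As $T$ increases the estimate concentrates around the true value, which is the convergence property claimed for $\hat{f}(X)$ above.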
$$\hat{P}(x_i|e) = \frac{1}{T}\sum_{t=1}^{T} \delta(x_i \mid x^{(t)}) , \qquad (2)$$

where $\delta(x_i \mid x^{(t)}) = 1$ iff $x_i = x_i^{(t)}$ and $\delta(x_i \mid x^{(t)}) = 0$ otherwise. Alternatively, a mixture estimator can be used,

$$\hat{P}(x_i|e) = \frac{1}{T}\sum_{t=1}^{T} P(x_i \mid x_{-i}^{(t)}) , \qquad (3)$$
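The difference between the histogram estimator of Eq. (2) and the mixture estimator of Eq. (3) can be sketched on a toy two-variable example. The joint distribution below is hypothetical (not from the paper), chosen only so that the conditional $P(x_1 \mid x_2)$ needed by the mixture estimator is easy to write down:

```python
import random

random.seed(0)

# Hypothetical joint over binary (X1, X2); true marginal P(X1=1) = 0.5.
joint = {(1, 1): 0.4, (1, 0): 0.1, (0, 1): 0.2, (0, 0): 0.3}

def p_x1_given_x2(x1, x2):
    """Exact conditional P(X1=x1 | X2=x2) under the toy joint."""
    z = joint[(0, x2)] + joint[(1, x2)]
    return joint[(x1, x2)] / z

# Draw T full samples x^(t) = (x1, x2) from the joint.
outcomes, weights = zip(*joint.items())
samples = random.choices(outcomes, weights=weights, k=50_000)
T = len(samples)

# Eq. (2), histogram estimator: fraction of samples with X1 = 1.
hist = sum(1 for (x1, _) in samples if x1 == 1) / T

# Eq. (3), mixture estimator: average the exact conditional P(X1=1 | x2^(t)).
mix = sum(p_x1_given_x2(1, x2) for (_, x2) in samples) / T

# Both converge to P(X1=1) = 0.5; the mixture estimator averages
# probabilities rather than 0/1 indicators, which lowers its variance --
# the Rao-Blackwellisation effect the paper exploits.
```

Here the conditional is computed exactly from the toy joint; in cutset sampling the analogous conditionals are obtained by exact inference over the non-sampled variables.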