School of Computer Science
The Belief Propagation (Sum-Product) Algorithm
Probabilistic Graphical Models (10-708)
Lecture 5, Sep 31, 2007
Receptor A Receptor B X1 X2 Eric Xing X Kinase C 3 Kinase D X4 Kinase E X5
TF F X6 Reading: J-Chap 4 Gene G X 7 Gene H X8 1
From Elimination to Belief Propagation
z Recall that Induced dependency during marginalization is captured in elimination cliques
z Summation <-> elimination
z Intermediate term <-> elimination clique
B A B A A
C A A
C D A E F C D
E z Can this lead to an generic E E F inference algorithm? G H Eric Xing 2
1 Tree GMs
Undirected tree: a Directed tree: all Poly tree: can have unique path between nodes except the root multiple parents any pair of nodes have exactly one parent We will come back to this later
Eric Xing 3
Equivalence of directed and undirected trees
z Any undirected tree can be converted to a directed tree by choosing a root node and directing all edges away from it
z A directed tree and the corresponding undirected tree make the same conditional independence assertions
z Parameterizations are essentially the same.
z Undirected tree:
z Directed tree:
z Equivalence:
z Evidence:?
Eric Xing 4
2 From elimination to message passing
z Recall ELIMINATION algorithm:
z Choose an ordering Z in which query node f is the final node
z Place all potentials on an active list
z Eliminate node i by removing all potentials containing i, take sum/product over xi. z Place the resultant factor back on the list
z For a TREE graph:
z Choose query node f as the root of the tree
z View tree as a directed tree with edges pointing towards from f
z Elimination ordering based on depth-first traversal
z Elimination of each node can be considered as message-passing (or Belief Propagation) directly along tree branches, rather than on some transformed graphs Æ thus, we can use the tree itself as a data-structure to do general inference!!
Eric Xing 5
The elimination algorithm
Procedure Initialize (G, Z) Procedure Normalization (φ∗) ∗ ∗ 1. Let Z1, . . . ,Zk be an ordering of Z 1. P(X|E)=φ (X)/∑xφ (X) such that Zi ≺ Zj iff i < j 2. Initialize F with the full the set of factors
Procedure Evidence (E)
1. for each i∈ΙE , Procedure Sum-Product-Eliminate-Var ( F =F ∪δ(Ei, ei) F, // Set of factors Procedure Sum-Product-Variable- Z // Variable to be eliminated ) Elimination (F, Z, ≺) 1. F ′←{φ ∈ F : Z ∈ Scope[φ]} 1. for i = 1, . . . , k 2. F ′′ ← F − F ′ F ← Sum-Product-Eliminate-Var(F, Zi) ∗ 3. ψ ←∏φ∈F ′ φ 2. φ ← ∏φ∈F φ 4. τ ← ∑ ψ 3. return φ∗ Z 5. return F ′′ ∪ {τ} 4. Normalization (φ∗)
Eric Xing 6
3 Message passing for trees
Let mij(xi) denote the factor resulting from f eliminating variables from bellow up to i, which is a function of xi:
This is reminiscent of a message sent i from j to i.
j
k l
mij(xi) represents a "belief" of xi from xj!
Eric Xing 7
z Elimination on trees is equivalent to message passing along tree branches! f
i
j
k l
Eric Xing 8
4 The message passing protocol:
z A node can send a message to its neighbors when (and only when) it has received messages from all its other neighbors.
z Computing node marginals:
z Naïve approach: consider each node as the root and execute the message passing algorithm
X m21(x1) 1 Computing P(X1)
m32(x2) m42(x2) X2
X3 X4
Eric Xing 9
The message passing protocol:
z A node can send a message to its neighbors when (and only when) it has received messages from all its other neighbors.
z Computing node marginals:
z Naïve approach: consider each node as the root and execute the message passing algorithm
X m12(x2) 1 Computing P(X2)
m32(x2)m42(x2) X2
X3 X4
Eric Xing 10
5 The message passing protocol:
z A node can send a message to its neighbors when (and only when) it has received messages from all its other neighbors.
z Computing node marginals:
z Naïve approach: consider each node as the root and execute the message passing algorithm
X m12(x2) 1 Computing P(X3)
m23(x3) m42(x2) X2
X3 X4
Eric Xing 11
Computing node marginals
z Naïve approach:
z Complexity: NC
z N is the number of nodes
z C is the complexity of a complete message passing
z Alternative dynamic programming approach
z 2-Pass algorithm (next slide Î)
z Complexity: 2C!
Eric Xing 12
6 The message passing protocol:
z A two-pass algorithm:
X1
m21(X 1) m12(X 2)
m32(X 2) X2 m42(X 2)
X4 X3 m24(X 4) m23(X 3)
Eric Xing 13
Belief Propagation (SP-algorithm): Sequential implementation
Eric Xing 14
7 Belief Propagation (SP-algorithm): Parallel synchronous implementation
z For a node of degree d, whenever messages have arrived on any subset of d-1 node, compute the message for the remaining edge and send! z A pair of messages have been computed for each edge, one for each direction z All incoming messages are eventually computed for each node Eric Xing 15
Correctness of BP on tree
z Collollary: the synchronous implementation is "non-blocking"
z Thm: The Message Passage Guarantees obtaining all marginals in the tree
z What about non-tree?
Eric Xing 16
8 Another view of SP: Factor Graph
z Example 1
X X X f X 1 5 fa 1 d 5
X X 3 fc 3
f fe X2 X4 b X2 X4
P(X1) P(X2) P(X3|X1,X2) P(X5|X1,X3) P(X4|X2,X3)
fa(X1) fb(X2) fc(X3,X1,X2) fd(X5,X1,X3) fe(X4,X2,X3)
Eric Xing 17
Factor Graphs
z Example 2 X1 X1
fa fc
X2 X3 X2 X3 f ψ(x1,x2,x3) = fa(x1,x2)fb(x2,x3)fc(x3,x1) b
z Example 3 X1 X1
fa
X2 X3 X2 X3
ψ(x1,x2,x3) = fa(x1,x2,x3)
Eric Xing 18
9 Factor Tree
z A Factor graph is a Factor Tree if the undirected graph obtained by ignoring the distinction between variable nodes and factor nodes is an undirected tree
X1 X1
fa
X2 X3 X2 X3
ψ(x1,x2,x3) = fa(x1,x2,x3)
Eric Xing 19
Message Passing on a Factor Tree
z Two kinds of messages
1. ν: from variables to factors 2. µ: from factors to variables
x f1 j
x f xi fs i s
f3 xk
Eric Xing 20
10 Message Passing on a Factor Tree, con'd
z Message passing protocol:
z A node can send a message to a neighboring node only when it has received messages from all its other neighbors
z Marginal probability of nodes:
x f1 j
x f xi fs i s
f3 xk
P(xi) ∝∏s 2 N(i) µsi(xi)
∝νis(xi)µsi(xi)
Eric Xing 21
BP on a Factor Tree
X X3 1 X2
ν µ 1d µd2 e2 ν3e f f X X1 d X2 e 3 µ ν ν µ d1 2d 2e e3 µ µa1 ν3c c3 ν1a µb2 ν2b f fa fb c
Eric Xing 22
11 Why factor graph?
z Tree-like graphs to Factor trees
X1
X1
X 2 X2
X X3 4 X 3 X4
X5 X6
X5 X6
Eric Xing 23
Poly-trees to Factor trees
X X1 2 X 1 X2
X3 X4 X3 X4
X5
X5
Eric Xing 24
12 Why factor graph?
z Because FG turns tree-like X1
X1 graphs to factor trees,
z and trees are a data-structure X 2 X2 that guarantees correctness of
X X3 4 BP !
X3 X4
X5 X6
X5 X6
X1 X2 X 1 X2
X3 X4 X3 X4
X5
X5
Eric Xing 25
Max-product algorithm: computing MAP probabilities
f
i
j
k l
Eric Xing 26
13 Max-product algorithm: computing MAP configurations using a final bookkeeping backward pass
f
i
j
k l
Eric Xing 27
Summary
z Sum-Product algorithm computes singleton marginal probabilities on:
z Trees
z Tree-like graphs
z Poly-trees
z Maximum a posteriori configurations can be computed by replacing sum with max in the sum-product algorithm
z Extra bookkeeping required
Eric Xing 28
14 Inference on general GM
z Now, what if the GM is not a tree-like graph?
z Can we still directly run message message-passing protocol along its edges?
z For non-trees, we do not have the guarantee that message-passing will be consistent!
z Then what?
z Construct a graph data-structure from P that has a tree structure, and run message-passing on it!
Eric Xing 29
Elimination Clique
z Recall that Induced dependency during marginalization is captured in elimination cliques
z Summation <-> elimination
z Intermediate term <-> elimination clique
B A B A A
C A A
C D A E F C D
E z Can this lead to an generic E E F inference algorithm? G H Eric Xing 30
15 A Clique Tree
B A B A A m m C c b
A A md C D mf A E F me C D
E mh E E F mg G H
me (a,c,d)
= ∑ p(e | c,d)mg (e)m f (a,e) e Eric Xing 31
From Elimination to Message Passing
z Elimination ≡ message passing on a clique tree
B A B A B A B A B A B A B A B A A
C D C D C D C D C D C D C
E F E F E F E F E
G H G H G ≡
B A B A A m m C c b A A md C D mf A m (a,c,d) E F e me C D = p(e | c,d)mg (e)m f (a,e) ∑ E mh e E E F mg G H
z Messages can be reused
Eric Xing 32
16 From Elimination to Message Passing
z Elimination ≡ message passing on a clique tree
z Another query ...
B A B A A m m C c b A A md C D mf A E F me C D
E mh E E F mg G H
z Messages mf and mh are reused, others need to be recomputed
Eric Xing 33
17